Real-Time Event Processing for Yelo Digital Marketplace
The Challenge
Yelo's existing batch-based data processing system couldn't keep pace with rapid business growth. The platform was processing over 100,000 transactions daily, with each transaction generating dozens of events (orders, payments, inventory updates, notifications). The legacy system had 15-30 minute delays in reporting, preventing real-time operational dashboards and dynamic pricing capabilities. Data engineers spent hours debugging failed batch jobs, and business teams lacked visibility into current operations.
The Strategy
1. Design an event-driven architecture using Apache Kafka for real-time streaming
2. Build data pipelines to capture and process transaction lifecycle events with sub-second latency
3. Implement Apache Airflow for orchestrating downstream analytics workflows
4. Create real-time dashboards for operations and business intelligence teams
Kafka Event Streaming Architecture
The Problem We Found
The existing system used nightly batch ETL jobs that pulled data from operational databases. This approach created table locks during peak hours, caused data consistency issues, and prevented real-time analysis. There was no audit trail of event sequencing.
Our Approach
- Deployed 3-node Apache Kafka cluster on AWS MSK with cross-AZ replication
- Created topic structure for marketplace events: orders, payments, inventory, notifications
- Implemented event producers in microservices to publish events asynchronously (see the producer sketch after this list)
- Built Kafka Streams applications for real-time aggregations and transformations
- Set up schema registry for event schema evolution
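To make the producer pattern concrete, here is a minimal sketch using the confluent-kafka Python client. The broker address, topic name, and event payload are illustrative, not Yelo's actual configuration; keying each event by its order ID is what yields the per-partition ordering guarantee described below.

```python
import json
import socket
from confluent_kafka import Producer

# Illustrative config; the broker address is a placeholder.
producer = Producer({
    "bootstrap.servers": "msk-broker-1:9092",
    "client.id": socket.gethostname(),
    "acks": "all",               # wait for all in-sync replicas
    "enable.idempotence": True,  # avoid duplicates on retries
})

def on_delivery(err, msg):
    # Runs asynchronously from poll()/flush(); log failures, never block the request path.
    if err is not None:
        print(f"delivery failed for key={msg.key()}: {err}")

def publish_order_event(order_id: str, event_type: str, payload: dict) -> None:
    """Fire-and-forget publish; keying by order_id keeps one order's events in one partition."""
    event = {"order_id": order_id, "type": event_type, **payload}
    producer.produce(
        "orders",
        key=order_id.encode("utf-8"),
        value=json.dumps(event).encode("utf-8"),
        on_delivery=on_delivery,
    )
    producer.poll(0)  # serve pending delivery callbacks without blocking

publish_order_event("ord-1001", "order_placed", {"amount": 42.50, "currency": "USD"})
producer.flush()  # drain the queue before shutdown
```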
The Result
Achieved sub-200ms event publishing latency and guaranteed event ordering within partitions. Event producers became fire-and-forget, removing database bottlenecks, and the operations team gained real-time visibility into transaction funnel metrics.
Real-Time Analytics Pipelines
The Problem We Found
Business stakeholders needed up-to-the-minute metrics for dynamic pricing, inventory management, and demand forecasting. The batch system provided yesterday's insights for today's decisions.
Our Approach
- Built Kafka Connect pipelines to stream events into Amazon S3 for the data lake (the sink registration is sketched after this list)
- Created real-time aggregation jobs using Kafka Streams for windowed metrics (a windowed-aggregation sketch also follows)
- Implemented Flink SQL jobs for complex event processing patterns
- Developed real-time dashboards in Tableau connected to materialized views
- Set up alerting for anomaly detection (sudden demand spikes, inventory issues)
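Kafka Connect sinks are configured declaratively and registered over the Connect REST API. A hedged sketch of the S3 sink registration follows: the property names follow Confluent's S3 sink connector, while the worker endpoint, bucket, topics, and flush size are placeholders.

```python
import json
import requests  # assumes the requests package is installed

# Placeholder Connect worker endpoint.
CONNECT_URL = "http://kafka-connect:8083/connectors"

sink_config = {
    "name": "orders-s3-sink",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "orders,payments,inventory",
        "s3.bucket.name": "yelo-data-lake",   # placeholder bucket
        "s3.region": "us-east-1",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": "1000",                 # records per S3 object
        "tasks.max": "4",
    },
}

resp = requests.post(CONNECT_URL, json=sink_config)
resp.raise_for_status()
print(resp.json())
```

The production aggregations ran as Kafka Streams applications on the JVM; to keep these examples in one language, the sketch below expresses the same tumbling-window idea with a plain Python consumer. The topic, group id, one-minute window, and processing-time bucketing are all simplifying assumptions (Kafka Streams would bucket on event time with grace periods).

```python
import json
import time
from collections import defaultdict
from confluent_kafka import Consumer

WINDOW_SECONDS = 60  # tumbling one-minute windows (illustrative)

consumer = Consumer({
    "bootstrap.servers": "msk-broker-1:9092",  # placeholder
    "group.id": "order-funnel-metrics",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["orders"])

counts: dict[int, int] = defaultdict(int)  # window start -> event count

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        # Bucket by processing time into the current window.
        window = int(time.time()) // WINDOW_SECONDS * WINDOW_SECONDS
        counts[window] += 1
        # Emit and evict windows that have closed.
        for start in [w for w in counts if w + WINDOW_SECONDS <= time.time()]:
            print(f"window {start}: {counts.pop(start)} order events")
finally:
    consumer.close()
```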
The Result
The operations team now monitors live transaction metrics with latency under 5 seconds. The dynamic pricing algorithm responds to demand changes in real time, and inventory management improved as managers can identify supply gaps immediately.
Workflow Orchestration with Airflow
The Problem We Found
Downstream analytics jobs (financial reporting, ML model training, vendor payouts) still needed coordinated batch processing but were tightly coupled to the legacy ETL system.
Our Approach
- Deployed Apache Airflow on AWS ECS for scalable workflow orchestration
- Migrated 25+ legacy batch jobs to Airflow DAGs with proper dependency management
- Created sensors to trigger jobs based on Kafka event thresholds (see the DAG sketch after this list)
- Implemented data quality checks and alerting for pipeline failures
- Set up backfill capabilities for historical data reprocessing
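A minimal sketch of the DAG pattern, assuming Airflow 2.4+. The DAG id, threshold check, and task bodies are hypothetical; the structure illustrates the sensor-gated dependency chain, retry policy, and catchup-based backfills listed above.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.python import PythonSensor

def get_event_count(ds: str) -> int:
    # Placeholder: in production this would query the lake or a metrics store.
    return 0

def enough_events_landed(**context) -> bool:
    # Gate downstream jobs on a Kafka-derived event threshold for the run date.
    return get_event_count(context["ds"]) >= 100_000  # illustrative threshold

def run_quality_checks(**context) -> None:
    ...  # e.g. row counts, null checks, schema validation

def compute_vendor_payouts(**context) -> None:
    ...  # aggregate the day's settled payments per vendor

with DAG(
    dag_id="vendor_payouts",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=True,  # enables backfills over historical dates
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
) as dag:
    wait_for_events = PythonSensor(
        task_id="wait_for_events",
        python_callable=enough_events_landed,
        poke_interval=300,    # check every 5 minutes
        timeout=4 * 60 * 60,  # give up after 4 hours
        mode="reschedule",    # free the worker slot between pokes
    )
    quality_checks = PythonOperator(
        task_id="quality_checks",
        python_callable=run_quality_checks,
    )
    payouts = PythonOperator(
        task_id="compute_payouts",
        python_callable=compute_vendor_payouts,
    )

    wait_for_events >> quality_checks >> payouts
```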
The Result
Reduced pipeline failures by 85% through better dependency management and retry logic. The data engineering team gained observability through the Airflow UI, and new analytics pipelines could be deployed in hours instead of weeks.
Impact & Results
The real-time data architecture transformation enabled Yelo to scale from 100K to 500K+ daily transactions without data infrastructure limitations. Real-time dashboards empowered operations teams to respond instantly to demand surges and inventory issues. The platform now processes over 5 million events daily with sub-second latency, enabling features like dynamic pricing and predictive inventory management. Data engineering velocity increased 10x with Airflow orchestration replacing brittle legacy ETL.
"Zatsys transformed our data infrastructure from a reporting bottleneck to a real-time competitive advantage. We can now make operational decisions based on what's happening right now, not what happened yesterday."
Facing Similar Challenges?
Let's discuss how we can help transform your data infrastructure.