Agoda transitioned from disparate data pipelines to a Financial Unified Data Pipeline (FINUDP) to overcome inconsistencies and ensure a single source of truth for critical financial data. This article details the architectural decisions, technical practices like shadow testing and proactive monitoring, and challenges faced in building a robust, high-quality data system using Apache Spark.
Original article: ByteByteGo.
Agoda faced significant challenges with multiple, independently developed data pipelines for financial data. Each team's pipeline initially offered simplicity and clear ownership, but together they led to duplicate data sources, inconsistent definitions and transformations, and no centralized monitoring or quality control. The result was discrepancies in financial reporting, unreliable data, and wasted computational resources.
To address these issues, Agoda developed the Financial Unified Data Pipeline (FINUDP), a centralized system built on Apache Spark for distributed processing. The architecture comprises:
Key Non-Functional Requirements
FINUDP was designed around three critical non-functional requirements: data freshness (hourly updates, monitored by GoFresh), reliability (automated data quality checks with immediate alerts), and maintainability (peer-reviewed designs, mandatory code reviews, and shadow testing).
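The freshness and reliability requirements above boil down to automated checks that run after each load and raise an alert when one fails. The following is a minimal Python sketch, assuming a one-hour freshness window (taken from the hourly update target) and a few simple row-level rules. The function names, the `booking_id` and `amount` columns, and the alert format are hypothetical, and the sketch does not describe how GoFresh or Agoda's own checks work. In a Spark pipeline, the same rules would typically be expressed as DataFrame aggregations rather than loops over Python lists.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed freshness window, derived from the "hourly updates" requirement.
FRESHNESS_SLA = timedelta(hours=1)


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_freshness(last_updated: datetime, now: datetime,
                    sla: timedelta = FRESHNESS_SLA) -> CheckResult:
    """Fail if the table has not been refreshed within the SLA window."""
    lag = now - last_updated
    return CheckResult("freshness", lag <= sla, f"lag={lag}")


def check_not_null(rows: list[dict], column: str) -> CheckResult:
    """Fail if any row lacks a value for a required column."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return CheckResult(f"not_null:{column}", missing == 0, f"missing={missing}")


def check_unique(rows: list[dict], column: str) -> CheckResult:
    """Fail if a column that should act as a key contains duplicates."""
    values = [row.get(column) for row in rows]
    duplicates = len(values) - len(set(values))
    return CheckResult(f"unique:{column}", duplicates == 0, f"duplicates={duplicates}")


def collect_alerts(results: list[CheckResult]) -> list[str]:
    """Turn failed checks into alert messages (a stand-in for a real pager or chat hook)."""
    return [f"ALERT {r.name}: {r.detail}" for r in results if not r.passed]


if __name__ == "__main__":
    # Hypothetical sample: a table last refreshed 90 minutes ago, with one null amount.
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    rows = [{"booking_id": 1, "amount": 100.0}, {"booking_id": 2, "amount": None}]
    alerts = collect_alerts([
        check_freshness(now - timedelta(minutes=90), now),
        check_not_null(rows, "amount"),
        check_unique(rows, "booking_id"),
    ])
    for alert in alerts:
        print(alert)
    # prints:
    # ALERT freshness: lag=1:30:00
    # ALERT not_null:amount: missing=1
```

Each check returns a result instead of raising, so a single run can report every failing rule at once. This matters when an alert should tell the on-call engineer everything that is wrong with a load, not just the first problem.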
Agoda adopted several technical practices to ensure FINUDP's reliability and data quality. These practices are crucial for any production-grade data system:
These practices collectively enhance data reliability, ensure consistent data quality, and provide robust mechanisms for identifying and resolving issues quickly, significantly improving trust in financial data.
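Shadow testing, listed among the maintainability measures, generally means running a candidate version of the pipeline on the same inputs as production and comparing the outputs before the candidate is promoted. The sketch below shows that comparison step in plain Python. It assumes both outputs fit in memory as lists of rows keyed by a hypothetical `id` column, and that monetary values are floats compared with an absolute tolerance. It illustrates the general idea and is not Agoda's implementation.

```python
from typing import Any

Row = dict[str, Any]


def index_by_key(rows: list[Row], key: str) -> dict[Any, Row]:
    """Map each row's primary-key value to the row itself."""
    return {row[key]: row for row in rows}


def shadow_diff(prod: list[Row], shadow: list[Row], key: str,
                tolerance: float = 1e-9) -> dict[str, list]:
    """Compare production output with the shadow (candidate) output row by row."""
    prod_idx = index_by_key(prod, key)
    shadow_idx = index_by_key(shadow, key)

    # Rows that production produced but the candidate did not, and vice versa.
    missing = sorted(prod_idx.keys() - shadow_idx.keys())
    extra = sorted(shadow_idx.keys() - prod_idx.keys())

    # For rows present in both, record every column whose value differs.
    mismatched = []
    for k in sorted(prod_idx.keys() & shadow_idx.keys()):
        a, b = prod_idx[k], shadow_idx[k]
        for col in sorted(a.keys() | b.keys()):
            va, vb = a.get(col), b.get(col)
            if isinstance(va, float) and isinstance(vb, float):
                equal = abs(va - vb) <= tolerance
            else:
                equal = va == vb
            if not equal:
                mismatched.append((k, col, va, vb))

    return {"missing": missing, "extra": extra, "mismatched": mismatched}


def safe_to_promote(diff: dict[str, list]) -> bool:
    """Promote the candidate only if it reproduces production exactly (within tolerance)."""
    return not any(diff.values())


if __name__ == "__main__":
    prod = [{"id": 1, "amount": 100.0}, {"id": 2, "amount": 50.0}]
    shadow = [{"id": 1, "amount": 100.0}, {"id": 2, "amount": 55.0},
              {"id": 3, "amount": 1.0}]
    diff = shadow_diff(prod, shadow, "id")
    print(diff)
    print(safe_to_promote(diff))  # prints False
```

Keying the comparison by primary key separates missing, extra, and changed rows. That is usually more actionable than comparing row counts or totals, which can hide offsetting errors. For real financial amounts, `decimal.Decimal` values would avoid the need for a float tolerance.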