This article details Avanse Financial Services' migration from a legacy external analytics application to a cloud-native lakehouse architecture on AWS, leveraging Amazon SageMaker Unified Studio. The migration addressed critical issues like data synchronization bottlenecks, high licensing costs, limited auditability, and poor data discovery by unifying data engineering, analytics, and AI/ML workflows within a single, governed AWS environment. The new architecture is built on Amazon S3 for storage, Athena and EMR Serverless for compute, and SageMaker Unified Studio for a unified user experience.
Read original on AWS Architecture Blog
Avanse Financial Services faced significant challenges with its previous analytics architecture, which paired an AWS-based data lake with a separate, external analytics application. This dual-system approach created a daily 4-hour data synchronization bottleneck, so business decisions were being made on stale data. Other issues included fixed licensing costs regardless of usage, poor auditability, no centralized data discovery, and no native integration with AWS services.
The core of Avanse's modernization was a shift from a two-application model to a single, integrated cloud-native lakehouse architecture. The previous setup required batch copying data from Amazon S3 to an external application, creating silos and governance complexities. The new architecture consolidates analytics directly on data stored in Amazon S3, using Amazon SageMaker Unified Studio as the central hub. This eliminates data movement, enables usage-based pricing, and centralizes governance.
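To make "analytics directly on data stored in Amazon S3" concrete, here is a minimal sketch of submitting a query to Athena with boto3. It is an illustration under stated assumptions, not Avanse's actual code: the database `analytics_db`, the table `loans`, and the results bucket are hypothetical placeholders, and `run_query` needs AWS credentials to execute.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str,
                         workgroup: str = "primary") -> Dict[str, Any]:
    """Assemble keyword arguments for Athena's StartQueryExecution API."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
        "WorkGroup": workgroup,
    }


def run_query(request: Dict[str, Any]) -> str:
    """Submit the query and return its execution ID (requires AWS credentials)."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names: the database, table, and bucket below are placeholders.
request = build_athena_request(
    sql="SELECT loan_status, COUNT(*) AS n FROM loans GROUP BY loan_status",
    database="analytics_db",
    output_location="s3://example-athena-results/queries/",
)
```

The query runs against the files already in S3, so there is no batch copy into a second system. Athena bills per query rather than through a fixed license, which matches the usage-based pricing described above.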
Why a Lakehouse Architecture?
The lakehouse pattern combines the scalability and cost-effectiveness of data lakes (e.g., storing raw data in S3) with the data management and ACID transaction features of data warehouses (e.g., schema enforcement, data governance, and analytics capabilities). This supports workloads ranging from raw data processing to structured reporting and ML on the same data, which is crucial for financial analytics that require data consistency and auditability.
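The two warehouse-style guarantees named above, schema enforcement and all-or-nothing writes, can be illustrated with a toy in-memory table. This is a conceptual sketch only: the `Table` class and `LOAN_SCHEMA` are hypothetical, and real lakehouse table formats provide these guarantees through metadata and commit protocols rather than anything resembling this code.

```python
from typing import Any, Dict, List

# Hypothetical schema for illustration: column name -> required Python type.
LOAN_SCHEMA: Dict[str, type] = {"loan_id": str, "amount": float, "status": str}


class SchemaError(ValueError):
    """Raised when a record does not match the declared table schema."""


def validate(record: Dict[str, Any], schema: Dict[str, type]) -> None:
    """Reject records with missing, extra, or wrongly typed columns."""
    if set(record) != set(schema):
        raise SchemaError(f"columns {sorted(record)} do not match {sorted(schema)}")
    for column, expected in schema.items():
        if not isinstance(record[column], expected):
            raise SchemaError(f"{column!r} must be {expected.__name__}")


class Table:
    """Toy table: a batch is committed in full or not at all."""

    def __init__(self, schema: Dict[str, type]) -> None:
        self.schema = schema
        self.rows: List[Dict[str, Any]] = []
        self.version = 0  # incremented once per successful commit

    def append(self, batch: List[Dict[str, Any]]) -> int:
        # Validate the whole batch before touching table state, so a bad
        # record cannot leave a partially written batch behind.
        for record in batch:
            validate(record, self.schema)
        self.rows = self.rows + list(batch)
        self.version += 1
        return self.version


table = Table(LOAN_SCHEMA)
table.append([{"loan_id": "L-1", "amount": 1000.0, "status": "active"}])

try:
    table.append([
        {"loan_id": "L-2", "amount": 500.0, "status": "active"},
        {"loan_id": "L-3", "amount": "n/a", "status": "active"},  # wrong type
    ])
except SchemaError:
    pass  # the whole batch is rejected, including the valid L-2 record
```

After the failed second write the table still holds only the first record at version 1. Readers never see a half-applied batch, which is the consistency property that matters for financial reporting and audits.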