This article describes the billing architecture behind Atlassian Forge, focusing on how the serverless platform tracks and processes distributed usage signals at scale. The system ingests high-volume events from many services and validates, attributes, deduplicates, and aggregates them to produce accurate financial records with near real-time visibility.
Read original on InfoQ Architecture
Atlassian's Forge platform, a serverless extensibility ecosystem for products such as Jira and Confluence, evolved to support usage-based pricing. That shift required a billing architecture able to track and process high-volume, distributed usage events accurately. The core challenge was to collect fine-grained signals (e.g., function invocations, storage consumption) from independent services, guarantee their financial correctness, and transform them into billing-ready records without loss or duplication.
The Forge billing architecture comprises four layers: Forge services that emit usage events, a centralized ingestion and streaming layer, a Usage Tracking Service (UTS), and downstream billing and commerce systems. This decoupling lets each layer scale independently and keeps failures in one layer from cascading into the others. All services emit events in a shared schema so that every stage of the pipeline interprets them consistently.
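To make the idea of a shared event schema concrete, here is a minimal sketch of what such an event and its validation might look like. The article does not publish the Forge schema, so every field name (`event_id`, `source_service`, `installation_id`, `metric`, `quantity`, `occurred_at`), the metric names, and the `validate` helper are hypothetical illustrations.

```python
from dataclasses import dataclass

# Hypothetical metric names; the real Forge schema is not described in the article.
ALLOWED_METRICS = {"function_invocations", "storage_gb_hours"}


@dataclass(frozen=True)
class UsageEvent:
    event_id: str          # producer-assigned unique ID, usable later for deduplication
    source_service: str    # which Forge service emitted the event
    installation_id: str   # tenant/installation the usage belongs to (hypothetical field)
    metric: str            # what was consumed, e.g. function invocations
    quantity: float        # amount consumed, in the metric's unit
    occurred_at: int       # event time as Unix epoch seconds


def validate(event: UsageEvent) -> list[str]:
    """Return a list of schema violations; an empty list means the event is valid."""
    errors = []
    if not event.event_id:
        errors.append("missing event_id")
    if not event.installation_id:
        errors.append("missing installation_id")
    if event.metric not in ALLOWED_METRICS:
        errors.append(f"unknown metric: {event.metric}")
    if event.quantity < 0:
        errors.append("negative quantity")
    if event.occurred_at <= 0:
        errors.append("invalid occurred_at")
    return errors


ok = UsageEvent("e1", "functions", "inst-42", "function_invocations", 1, 1_700_000_000)
print(validate(ok))  # an empty list: the event conforms to the sketch schema
```

Because every producer builds the same structure, downstream stages can reject malformed events with one shared check instead of per-service parsing logic.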
The UTS acts as the "nervous system" of Forge billing. It validates, normalizes, enriches, and prepares incoming usage data. Crucially, UTS ensures that each event is attributed to the correct entitlement or subscription context before it is persisted and processed further. This attribution is one of the main sources of complexity in multi-tenant, distributed billing systems.
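Attribution can be pictured as binding each event to whichever entitlement was active for its tenant at the event's timestamp. The sketch below assumes a hypothetical in-memory map from installation ID to entitlements with validity intervals, plus a trivial normalization step (canonicalizing the metric name); the names `Entitlement`, `AttributedEvent`, and `attribute` are illustrative and not the UTS API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entitlement:
    entitlement_id: str
    installation_id: str
    active_from: int              # inclusive start, Unix epoch seconds
    active_until: Optional[int]   # exclusive end; None means still active

    def covers(self, ts: int) -> bool:
        return self.active_from <= ts and (self.active_until is None or ts < self.active_until)


@dataclass(frozen=True)
class AttributedEvent:
    event_id: str
    entitlement_id: str
    metric: str
    quantity: float
    occurred_at: int


def attribute(event_id: str, installation_id: str, metric: str, quantity: float,
              occurred_at: int,
              entitlements: dict[str, list[Entitlement]]) -> Optional[AttributedEvent]:
    """Normalize the metric name and bind the event to the entitlement active at event time.

    Returns None when no entitlement covers the event time; how the real UTS
    handles such events is not described in the article.
    """
    normalized_metric = metric.strip().lower()
    for ent in entitlements.get(installation_id, []):
        if ent.covers(occurred_at):
            return AttributedEvent(event_id, ent.entitlement_id, normalized_metric,
                                   quantity, occurred_at)
    return None
```

Attributing by event time rather than arrival time matters when a tenant changes plans: usage that happened under the old entitlement stays with it even if the event arrives after the switch.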
Key System Design Challenges
Building such a system raises three critical challenges: financial correctness (no double counting and no lost events), distributed consistency (idempotency and out-of-order events), and high volume at scale with near real-time visibility.
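Idempotency and order independence can be illustrated together with a small ledger: a redelivered event (common under at-least-once delivery) is recognized by its ID and ignored, and because totals are plain sums, the arrival order does not change the result. This is a minimal sketch assuming each event carries a unique `event_id`; `IdempotentUsageLedger` is a hypothetical name. A production system would likely need to bound and persist the set of seen IDs, which this in-memory sketch does not do.

```python
from collections import defaultdict


class IdempotentUsageLedger:
    """Accumulates usage totals so that redelivered events are counted only once.

    Assumes every event has a producer-assigned unique event_id (hypothetical).
    Summation is commutative, so arrival order does not affect the totals.
    """

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.totals: dict[tuple[str, str], float] = defaultdict(float)

    def apply(self, event_id: str, entitlement_id: str, metric: str, quantity: float) -> bool:
        """Apply an event; return False if it was a duplicate and was ignored."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        self.totals[(entitlement_id, metric)] += quantity
        return True


ledger = IdempotentUsageLedger()
ledger.apply("e1", "ent-1", "function_invocations", 2)
ledger.apply("e1", "ent-1", "function_invocations", 2)  # retry of e1: ignored
```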
Events flow through a Kafka-based streaming infrastructure that provides schema validation and reliable delivery. The tracking layer then validates, normalizes, enriches, deduplicates, and orders them. Idempotent event design prevents double counting, while time-based, windowed aggregation ensures that late-arriving events are still counted in the correct period. Finally, a stream processing engine aggregates raw usage events into metrics for billing and analytics.
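Windowed processing of late events is commonly done with event-time tumbling windows, a watermark, and an allowed-lateness bound. The sketch below shows that generic pattern under stated assumptions: the article does not name Forge's stream processing engine, window size, or lateness policy, so `TumblingWindowAggregator`, `window_seconds`, `allowed_lateness`, and the simple "watermark = highest event time seen" rule are all illustrative choices.

```python
from collections import defaultdict


class TumblingWindowAggregator:
    """Event-time tumbling windows with a watermark and allowed lateness (illustrative)."""

    def __init__(self, window_seconds: int, allowed_lateness: int) -> None:
        self.window = window_seconds
        self.lateness = allowed_lateness
        self.watermark = 0  # simplification: highest event time observed so far
        self.open: dict[tuple[str, str, int], float] = defaultdict(float)
        self.closed: dict[tuple[str, str, int], float] = {}
        self.too_late: list[str] = []  # event_ids that arrived after their window closed
        self._seen: set[str] = set()

    def _expired(self, window_start: int) -> bool:
        # A window [start, start + window) is final once the watermark has moved
        # past its end plus the allowed lateness.
        return window_start + self.window + self.lateness <= self.watermark

    def add(self, event_id: str, entitlement_id: str, metric: str,
            quantity: float, occurred_at: int) -> None:
        if event_id in self._seen:  # deduplicate redelivered events
            return
        self._seen.add(event_id)
        start = occurred_at - occurred_at % self.window
        if self._expired(start):  # window already finalized: set the event aside
            self.too_late.append(event_id)
            return
        self.open[(entitlement_id, metric, start)] += quantity
        self.watermark = max(self.watermark, occurred_at)
        for key in [k for k in self.open if self._expired(k[2])]:
            self.closed[key] = self.open.pop(key)


agg = TumblingWindowAggregator(window_seconds=60, allowed_lateness=30)
agg.add("e1", "ent-1", "inv", 1, 10)
agg.add("e2", "ent-1", "inv", 2, 70)
agg.add("e3", "ent-1", "inv", 4, 50)  # out of order, but within allowed lateness
agg.add("e4", "ent-1", "inv", 1, 95)  # watermark passes 90, so window [0, 60) closes
agg.add("e5", "ent-1", "inv", 7, 20)  # belongs to the closed window: recorded as too late
```

In this sketch, events that miss their window are collected rather than silently dropped, leaving room for a separate correction path; the allowed-lateness bound trades latency of final results against how much out-of-order delivery is absorbed automatically.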