InfoQ Architecture·March 31, 2026

Discord's Osprey: A High-Throughput Safety Rules Engine for Real-time Event Processing

Discord open-sourced Osprey, a scalable event stream decisions engine designed for real-time threat detection and mitigation, capable of processing 2.3 million rules per second. Its architecture combines a Rust-based coordinator for high-concurrency event stream management and stateless Python worker nodes for rule evaluation. This polyglot design pattern is key to achieving high throughput and scalability while maintaining developer agility.

Read original on InfoQ Architecture

Introduction to Osprey's Architecture

Osprey is an event stream decisions engine, open-sourced by Discord, that evaluates real-time platform activity against dynamically loadable rules and executes automated responses. It is built for high-throughput workloads: it processes 2.3 million rules per second across 400 million daily actions, a scale suited to real-time threat detection and mitigation. The system is designed for horizontal scalability and has already been adopted by other networks, including Bluesky and Matrix.org.
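
To relate the two throughput figures, here is a back-of-envelope calculation. It assumes traffic is spread evenly across the day and that both figures describe the same deployment, neither of which the article states, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-envelope check using only the two figures quoted above.
RULES_PER_SECOND = 2_300_000     # from the article
ACTIONS_PER_DAY = 400_000_000    # from the article
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: actions arrive at a uniform rate (real traffic is burstier).
actions_per_second = ACTIONS_PER_DAY / SECONDS_PER_DAY
rules_per_action = RULES_PER_SECOND / actions_per_second

print(f"{actions_per_second:,.0f} actions/s")                    # 4,630 actions/s
print(f"{rules_per_action:,.0f} rule evaluations per action")    # 497 rule evaluations per action
```

Under those assumptions, each action is checked against roughly 500 rules on average.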

Polyglot Architecture: Rust Coordinator & Python Workers

A core architectural decision in Osprey is its polyglot design: Rust for the high-performance coordinator service and Python for the stateless worker nodes. The Rust coordinator acts as the data plane, managing asynchronous event streams from message queues (like Kafka) and prioritizing synchronous gRPC requests. The Python workers handle the business logic of rule evaluation. Rules are written in SML, a domain-specific language with Python syntax, which keeps them accessible to rule authors, and User Defined Functions (UDFs) provide extensibility.
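
The prioritization idea can be illustrated with a small sketch. Osprey's coordinator is written in Rust and its actual scheduling logic is not described here. The Python below is only a hypothetical model, and `PriorityDispatcher`, `SYNC`, and `ASYNC` are invented names. It assumes that "prioritizing synchronous requests" means they are dispatched ahead of queued stream events.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any

SYNC, ASYNC = 0, 1  # hypothetical priority classes; lower value is dispatched first


@dataclass(order=True)
class _Entry:
    priority: int
    seq: int                              # tie-breaker: FIFO within a priority class
    action: Any = field(compare=False)


class PriorityDispatcher:
    """Hands out pending actions, synchronous requests before stream events."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def submit(self, action, priority):
        heapq.heappush(self._heap, _Entry(priority, next(self._seq), action))

    def next_action(self):
        return heapq.heappop(self._heap).action if self._heap else None


dispatcher = PriorityDispatcher()
dispatcher.submit({"id": "kafka-1"}, ASYNC)
dispatcher.submit({"id": "kafka-2"}, ASYNC)
dispatcher.submit({"id": "grpc-1"}, SYNC)   # arrives last, served first

order = []
while (action := dispatcher.next_action()) is not None:
    order.append(action["id"])
print(order)  # ['grpc-1', 'kafka-1', 'kafka-2']
```

Strict priority like this can starve the asynchronous queue under sustained synchronous load, so a production scheduler would likely add fairness or rate limits.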

Polyglot Architecture for Performance and Agility

The Rust-Python polyglot pattern is increasingly common in systems that require high throughput. Rust handles compute-heavy operations and network traffic (the data plane), maximizing hardware utilization. Python handles business logic, ML integrations, and user APIs (the control plane), maintaining developer velocity. The split lets teams optimize for both performance and development speed.

Scalable Rule Evaluation and State Management

The Python worker nodes are stateless and containerized, enabling easy horizontal scaling to accommodate traffic spikes. Rules are distributed via etcd, allowing dynamic updates in production without redeploying the application. To optimize execution, SML rules are parsed into an Abstract Syntax Tree (AST) at worker startup, which front-loads compilation cost and minimizes per-event processing time. Osprey tracks state across 'Entities' for classification and generates verdicts that are routed to configurable output sinks, often Apache Kafka and Apache Druid for real-time analysis.
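
The parse-once, evaluate-many idea can be sketched with Python's standard `ast` module. This is not Osprey's SML parser or evaluator. The rule text, the field names (`messages_last_minute`, `account_age_days`), and the `evaluate` helper are all hypothetical. Only a tiny subset of expressions is supported.

```python
import ast
import operator

_COMPARE = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


def evaluate(node, event):
    """Walk a pre-parsed expression tree against one event (a dict)."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, event)
    if isinstance(node, ast.BoolOp):
        results = (evaluate(value, event) for value in node.values)
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        compare = _COMPARE.get(type(node.ops[0]))
        if compare is not None:
            left = evaluate(node.left, event)
            right = evaluate(node.comparators[0], event)
            return compare(left, right)
    if isinstance(node, ast.Name):
        return event[node.id]        # rule variables resolve to event fields
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


# "Worker startup": parsing happens once, not once per event.
RULE_SOURCES = {
    "spam_burst": "messages_last_minute > 30 and account_age_days < 1",
}
RULES = {name: ast.parse(src, mode="eval") for name, src in RULE_SOURCES.items()}


def matching_rules(event):
    return [name for name, tree in RULES.items() if evaluate(tree, event)]


print(matching_rules({"messages_last_minute": 45, "account_age_days": 0}))   # ['spam_burst']
print(matching_rules({"messages_last_minute": 2, "account_age_days": 400}))  # []
```

Walking a restricted tree rather than calling `eval` keeps the set of allowed operations explicit, which matters when rules are authored by many people.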

  • Coordinator (Rust): Manages high-concurrency event streams, shapes traffic, and prioritizes requests for stable latency.
  • Worker Nodes (Python): Stateless, horizontally scalable, evaluate SML rules, support UDFs for extensibility.
  • Rule Distribution (etcd): Enables dynamic, real-time updates to rules without downtime.
  • Data Flow: Actions (JSON events) -> Osprey engine -> Verdicts/Effects -> Output Sinks (e.g., Kafka to Druid).
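
The data flow and the no-redeploy rule update in the list above can be sketched together. This is a minimal model, not Osprey's API. The `Worker` class, its method names, and the `many_reports` rule are hypothetical. The rule store and the output sink are in-memory stand-ins for what would be an etcd watch and a Kafka producer in a real deployment.

```python
import json


class Worker:
    """Stateless evaluator: holds only the current ruleset, no per-user state."""

    def __init__(self, sink):
        self._rules = {}
        self._sink = sink   # any callable that accepts a verdict dict

    def on_rules_changed(self, rules):
        # Swap the whole mapping in one assignment so an in-flight
        # evaluation keeps using the ruleset it started with.
        self._rules = dict(rules)

    def handle(self, raw_action):
        action = json.loads(raw_action)   # actions arrive as JSON events
        rules = self._rules
        matched = [name for name, predicate in rules.items() if predicate(action)]
        verdict = {"action_id": action["id"], "matched": matched}
        self._sink(verdict)               # route the verdict to the output sink
        return verdict


emitted = []                              # stand-in for a real output sink
worker = Worker(sink=emitted.append)
event = json.dumps({"id": "a1", "reports": 7})

worker.on_rules_changed({"many_reports": lambda a: a["reports"] > 5})
worker.handle(event)

# Tighten the threshold while the worker keeps running: no restart needed.
worker.on_rules_changed({"many_reports": lambda a: a["reports"] > 10})
worker.handle(event)

print(emitted)
# [{'action_id': 'a1', 'matched': ['many_reports']}, {'action_id': 'a1', 'matched': []}]
```

Because the worker keeps nothing but the ruleset, any number of identical workers can run side by side, which is what makes horizontal scaling straightforward.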

Example of a System Leveraging a Rules Engine

Imagine building a financial transaction fraud detection system. A rules engine like Osprey could evaluate incoming transactions against a set of predefined fraud rules (e.g., 'transaction amount > $1000 AND initiated from a new IP address'). The system would then generate a verdict (e.g., 'flag for review') and route it to an alert system, demonstrating real-time decision-making.
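
The fraud rule quoted above might look like the following in plain Python. This is an illustrative sketch rather than SML or Osprey code. `Transaction`, `evaluate_transaction`, the `known_ips` lookup, and the verdict strings are hypothetical names, and the alert system is modeled as a simple list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    account_id: str
    amount_usd: float
    ip_address: str


def evaluate_transaction(tx, known_ips):
    """Return a verdict for one transaction.

    known_ips maps account_id -> set of IP addresses seen before; it stands in
    for whatever per-entity state a real engine would consult.
    """
    is_new_ip = tx.ip_address not in known_ips.get(tx.account_id, set())
    if tx.amount_usd > 1000 and is_new_ip:
        return "flag_for_review"
    return "allow"


known_ips = {"acct-1": {"203.0.113.7"}}
transactions = [
    Transaction("acct-1", 2500.0, "198.51.100.9"),  # large amount, new IP
    Transaction("acct-1", 2500.0, "203.0.113.7"),   # large amount, known IP
    Transaction("acct-1", 40.0, "198.51.100.9"),    # small amount, new IP
]

alerts = []  # stand-in for the alert system
for tx in transactions:
    if evaluate_transaction(tx, known_ips) == "flag_for_review":
        alerts.append(tx)

print(len(alerts))  # 1
```

Only the first transaction satisfies both conditions, so it is the only one routed to the alert list.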

Rules Engine · Real-time Processing · Polyglot Architecture · Rust · Python · Event Streams · Scalability · Threat Detection
