Slack's Anomaly Event Response (AER) system is a proactive security mechanism designed to detect and automatically respond to suspicious activity in real time. It uses a multi-tiered architecture consisting of a detection engine, a decision framework, and a response orchestrator to identify high-confidence threats and terminate user sessions, significantly reducing the detection-to-response gap. The system is an example of automated, scalable security infrastructure.
Read original on Slack Engineering
The Anomaly Event Response (AER) system by Slack addresses the critical need for rapid detection and response to cyberattacks. Traditional security models often suffer from significant delays, allowing attackers ample time to execute their objectives. AER aims to compress this detection-to-response window from hours or days to mere minutes by autonomously identifying and neutralizing threats as they emerge on the platform. This proactive approach prevents data exfiltration and system compromise by disrupting attack chains before they fully execute.
AER employs a multi-tiered architecture for real-time anomaly detection, asynchronous job orchestration, and dynamic notifications. This design is crucial for handling billions of events daily and ensuring high availability and responsiveness. The system is composed of three primary components that work in concert to achieve automated threat mitigation.
The detection engine is responsible for monitoring Slack events at scale. It uses a hybrid approach combining rule-based heuristics with dynamic thresholds. A key architectural decision here is the calibration of these thresholds *per enterprise* based on historical usage patterns. This adaptive mechanism is vital for reducing false positives across a diverse customer base and continuously fine-tuning the sensitivity of detections. For example, a download volume that would be excessive for one organization might be normal for another, so each organization needs its own baseline.
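To make the per-enterprise calibration concrete, here is a minimal sketch. It assumes a simple "mean plus k standard deviations" rule with a fixed floor; the article does not describe Slack's actual formula, and the function names, the `k` and `floor` parameters, and the sample counts are all hypothetical.

```python
from statistics import mean, pstdev


def calibrate_threshold(history, k=3.0, floor=20.0):
    """Derive a threshold from one enterprise's historical daily counts.

    Assumption: threshold = mean + k * population stddev, never below `floor`.
    """
    if not history:
        return floor
    return max(floor, mean(history) + k * pstdev(history))


def is_anomalous(count, threshold):
    """Rule-based check: flag a count that exceeds the calibrated threshold."""
    return count > threshold


# Hypothetical per-user daily download counts for two enterprises.
baselines = {
    "small_org": [10, 12, 9, 11, 10],
    "media_org": [400, 420, 390, 410, 405],
}
thresholds = {org: calibrate_threshold(h) for org, h in baselines.items()}

# The same observed count is judged against each enterprise's own baseline.
observed = 300
flags = {org: is_anomalous(observed, t) for org, t in thresholds.items()}
```

With these sample numbers, 300 downloads is flagged for `small_org` but not for `media_org`, which is the kind of false positive an individualized baseline is meant to avoid.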
Upon identification of suspicious behavior by the detection engine, the decision framework evaluates the audit payload. This component performs validation checks against the customer's specific AER configuration and internal rules. A critical aspect of its design is the ability to analyze preceding activities within a session to determine if malicious behavior is persisting despite session terminations, while simultaneously preventing legitimate users from getting caught in continuous termination loops. Once an anomaly is validated and configured for response, an asynchronous job is enqueued to trigger the automated response.
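The decision flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Slack's implementation: the loop guard is modeled as a cap on terminations per user within a time window, and `AerConfig`, `DecisionFramework`, the payload keys, and the anomaly name `excessive_downloads` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AerConfig:
    enabled_anomalies: set       # anomaly types the customer opted in to
    max_terminations: int = 3    # hypothetical cap per user per window
    window_seconds: int = 3600   # hypothetical look-back window


@dataclass
class DecisionFramework:
    config: AerConfig
    queue: deque = field(default_factory=deque)   # stand-in for an async job queue
    recent: dict = field(default_factory=dict)    # user_id -> termination timestamps

    def evaluate(self, payload: dict, now: float) -> str:
        # 1. Validate against the customer's AER configuration.
        if payload["anomaly_type"] not in self.config.enabled_anomalies:
            return "ignored"

        # 2. Look at recent terminations for this user to avoid a loop.
        user = payload["user_id"]
        cutoff = now - self.config.window_seconds
        history = [t for t in self.recent.get(user, []) if t > cutoff]
        if len(history) >= self.config.max_terminations:
            self.recent[user] = history
            return "suppressed"

        # 3. Record the decision and enqueue the response job.
        history.append(now)
        self.recent[user] = history
        self.queue.append({"action": "terminate_sessions", **payload})
        return "enqueued"
```

A real system would likely do something more useful than silently suppressing once the cap is reached, for example escalating to a human reviewer; the sketch only shows where such a guard sits relative to validation and enqueueing.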
The response orchestrator executes the autonomous actions. Its primary function is to terminate all active user sessions associated with the malicious activity. Beyond termination, it generates a comprehensive `user_sessions_reset_by_anomaly_event_response` audit log, providing crucial context like the anomaly type, acting user details, and originating session ID for post-incident investigations. The orchestrator also handles dynamic notification routing based on customer preferences, ensuring relevant stakeholders (Org Primary Owner, Security Admins) are informed, while employing smart logic to prevent notification fatigue by consolidating duplicate alerts.
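As a rough sketch of the orchestrator's three duties (session termination, audit logging, and consolidated notifications), consider the following. Only the audit action name comes from the article; the in-memory session store, the field names in the log entry, the recipient labels, and the `dedup_seconds` window are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Audit action name as given in the article.
AUDIT_ACTION = "user_sessions_reset_by_anomaly_event_response"


@dataclass
class ResponseOrchestrator:
    sessions: dict               # user_id -> set of active session ids (stand-in store)
    recipients: list             # notification targets chosen by the customer
    dedup_seconds: int = 900     # hypothetical window for consolidating alerts
    audit_log: list = field(default_factory=list)
    sent: list = field(default_factory=list)
    _last_notified: dict = field(default_factory=dict)

    def respond(self, job: dict, now: float) -> None:
        user = job["user_id"]

        # Terminate every active session for the affected user.
        terminated = sorted(self.sessions.pop(user, set()))

        # Always record an audit entry with investigation context.
        self.audit_log.append({
            "action": AUDIT_ACTION,
            "anomaly_type": job["anomaly_type"],
            "user_id": user,
            "originating_session_id": job["session_id"],
            "terminated_sessions": terminated,
        })

        # Skip the notification if the same alert was sent recently.
        key = (user, job["anomaly_type"])
        last = self._last_notified.get(key)
        if last is not None and now - last < self.dedup_seconds:
            return
        self._last_notified[key] = now
        for recipient in self.recipients:
            self.sent.append((recipient, user, job["anomaly_type"]))
```

Note that in this sketch the audit entry is written on every response while notifications are deduplicated, so the investigation trail stays complete even when alerts are consolidated.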