This article explores a system architecture for detecting malicious code at scale, one that evolved from reviewing individual pull requests to analyzing entire dependency packages. It details how stacked LLM evaluations combined with tool-driven investigation maintain accuracy and cost efficiency in a high-throughput security scanning pipeline, and highlights the trade-offs and design considerations for such a system.
Read original on Datadog Blog
The challenge of detecting malicious code grows significantly when moving from individual code changes (like pull requests) to analyzing vast numbers of external dependency packages. This requires a robust, scalable system that can process high volumes of code efficiently while minimizing false positives and maintaining detection accuracy. The core idea is to combine the pattern recognition capabilities of Large Language Models (LLMs) with the precision of static and dynamic analysis tools.
A key architectural pattern discussed involves a multi-stage pipeline. Initial stages might use lightweight, fast heuristics or LLM-based pre-screening to filter out obvious benign cases and prioritize suspicious ones. Subsequent stages then apply more resource-intensive, but highly accurate, tools and deeper LLM analysis. This layered approach is crucial for optimizing cost and latency.
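The layered idea can be illustrated with a minimal sketch. It is not the pipeline from the original article: the regex indicators, the `Verdict` type, and the `deep_analysis` callback are all hypothetical stand-ins, chosen only to show how a cheap first stage can keep most inputs away from the expensive second stage.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical cheap indicators; a real pre-screen would use a vetted rule set.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\beval\s*\("),
    re.compile(r"\bexec\s*\("),
    re.compile(r"base64\.b64decode"),
    re.compile(r"https?://\d{1,3}(\.\d{1,3}){3}"),  # URLs that use a raw IP
]


@dataclass
class Verdict:
    stage: str        # which stage produced the verdict
    suspicious: bool  # final decision for this input
    reason: str       # short human-readable explanation


def cheap_prescreen(source: str) -> List[str]:
    """Stage 1: return the patterns that matched (fast, no model call)."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(source)]


def scan(source: str, deep_analysis: Callable[[str], bool]) -> Verdict:
    """Run the cheap stage first; escalate to the costly stage only on a hit."""
    hits = cheap_prescreen(source)
    if not hits:
        return Verdict("prescreen", False, "no cheap indicators matched")
    # Stage 2: assumed to be expensive (deeper LLM analysis, sandboxing, ...).
    flagged = deep_analysis(source)
    return Verdict("deep", flagged, "escalated on: " + ", ".join(hits))
```

The trade-off in a design like this is that anything the first stage misses is never examined further, so the pre-screen has to err on the side of escalating.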
Instead of a single LLM pass, the system employs a 'stacked LLM evaluation' model. This could involve multiple LLMs, or multiple prompts to the same LLM, each tuned or configured for a different aspect of malicious code detection (e.g., one for identifying obfuscation, another for analyzing API calls). Because a verdict draws on several evaluations rather than one, this ensemble approach raises overall confidence and reduces false positives. In effect, the stack acts as a feature extractor or classifier within a larger ML pipeline.
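One way to picture stacking is as a set of specialised evaluators whose scores are combined by an agreement rule. The sketch below assumes each evaluator is a plain callable standing in for one LLM prompt and returning a score between 0 and 1. The `threshold` and `min_agree` parameters are illustrative choices, not values from the original article.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Each evaluator stands in for one LLM prompt; it returns a score in [0, 1].
Evaluator = Callable[[str], float]


@dataclass
class StackedResult:
    scores: Dict[str, float]  # per-evaluator score, kept for auditing
    malicious: bool           # combined decision


def stacked_evaluate(
    source: str,
    evaluators: Dict[str, Evaluator],
    threshold: float = 0.7,   # assumed cut-off for a single evaluator
    min_agree: int = 2,       # assumed number of evaluators that must agree
) -> StackedResult:
    """Flag the input only when at least `min_agree` evaluators score >= threshold."""
    scores = {name: fn(source) for name, fn in evaluators.items()}
    agreeing = sum(1 for s in scores.values() if s >= threshold)
    return StackedResult(scores, agreeing >= min_agree)
```

Requiring agreement means a single over-eager evaluator cannot raise an alert on its own, which is one plausible mechanism behind the false-positive reduction described above.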
Design Consideration: LLM Integration
When integrating LLMs into a security pipeline, consider asynchronous processing for long-running analyses. Use caching for frequently scanned or unchanged code segments to reduce inference costs. Implement robust retry mechanisms and circuit breakers to handle API rate limits and LLM service outages.
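As a rough sketch of how caching, retries, and a circuit breaker might fit together, the wrapper below guards an arbitrary `call_llm` function. Everything here is an assumption made for illustration: the class name, the parameters, and the in-memory cache are hypothetical, and asynchronous processing is left out to keep the example short.

```python
import hashlib
import time
from typing import Callable, Dict, Optional


class CircuitOpenError(RuntimeError):
    """Raised when calls are short-circuited after repeated failures."""


class GuardedLLMClient:
    def __init__(
        self,
        call_llm: Callable[[str], str],        # hypothetical model call
        max_retries: int = 3,
        failure_threshold: int = 5,            # failures before the circuit opens
        cooldown_s: float = 30.0,              # how long the circuit stays open
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._call_llm = call_llm
        self._max_retries = max_retries
        self._failure_threshold = failure_threshold
        self._cooldown_s = cooldown_s
        self._sleep = sleep
        self._clock = clock
        self._cache: Dict[str, str] = {}
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None

    def analyze(self, source: str) -> str:
        # Cache on a content hash so unchanged code is never re-analyzed.
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._cooldown_s:
                raise CircuitOpenError("LLM circuit is open; try again later")
            self._opened_at = None  # cooldown elapsed: allow a trial call
        for attempt in range(self._max_retries):
            try:
                result = self._call_llm(source)
            except Exception:
                self._consecutive_failures += 1
                if self._consecutive_failures >= self._failure_threshold:
                    self._opened_at = self._clock()
                    raise CircuitOpenError("too many consecutive LLM failures")
                if attempt == self._max_retries - 1:
                    raise  # retries exhausted: surface the last error
                self._sleep(2 ** attempt)  # exponential backoff: 1s, 2s, ...
            else:
                self._consecutive_failures = 0
                self._cache[key] = result
                return result
        raise ValueError("max_retries must be at least 1")
```

The `sleep` and `clock` parameters are injected so the behaviour can be exercised in tests without real delays. A production system would more likely use a shared cache and a maintained resilience library than this hand-rolled version.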
LLMs provide strong signals, but traditional security tools provide concrete, verifiable evidence. The system integrates static analysis (SAST) tools, dynamic analysis (DAST) in sandboxed environments, and dependency graph analysis. The LLM's role might be to guide which tools to run or to interpret their output. This creates a feedback loop: LLM insights inform tool execution, and tool results refine the LLM's assessment.
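The feedback loop can be sketched as a bounded investigation in which a planner picks the next tool based on the findings gathered so far. In this sketch the planner is an ordinary function standing in for an LLM call, and the tool names and the `max_steps` budget are hypothetical. It shows the control flow only, not the original system's implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]          # package source -> finding text
Finding = Tuple[str, str]            # (tool name, finding text)
# The planner stands in for an LLM call: given the findings so far, it names
# the next tool to run, or returns None once it has enough evidence.
Planner = Callable[[List[Finding]], Optional[str]]


def investigate(
    source: str,
    tools: Dict[str, Tool],
    plan_next: Planner,
    max_steps: int = 5,              # assumed budget to bound cost and latency
) -> List[Finding]:
    """Alternate between planning and tool execution until the planner stops."""
    findings: List[Finding] = []
    for _ in range(max_steps):
        tool_name = plan_next(findings)
        if tool_name is None:
            break
        if tool_name not in tools:
            # Record the mistake so the planner can see it and correct course.
            findings.append((tool_name, "error: unknown tool"))
            continue
        findings.append((tool_name, tools[tool_name](source)))
    return findings
```

For example, a planner might request a static scan first and ask for a sandbox run only if that scan reports network activity. The step budget keeps a confused planner from looping indefinitely.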