Datadog Blog · June 2, 2026

Scaling Malicious Code Detection with LLMs and Tool-Driven Investigation

This article explores a system architecture for detecting malicious code at scale, extending coverage from individual pull requests to full dependency packages. It describes how stacked LLM evaluations combined with tool-driven investigation keep a high-throughput security scanning pipeline accurate and cost-efficient, and it highlights the trade-offs and design considerations involved.

Detecting malicious code becomes much harder when the scope grows from individual code changes (such as pull requests) to large numbers of external dependency packages. The system has to process high volumes of code efficiently while keeping false positives low and detection accuracy high. The core idea is to combine the pattern recognition capabilities of Large Language Models (LLMs) with the precision of static and dynamic analysis tools.

Architectural Overview for Scalable Code Analysis

A key architectural pattern is the multi-stage pipeline. Early stages might use lightweight, fast heuristics or LLM-based pre-screening to filter out obviously benign cases and prioritize suspicious ones. Later stages then apply slower, more resource-intensive but more accurate tools and deeper LLM analysis to the remainder. Because only a fraction of the input reaches the expensive stages, this layering keeps cost and latency under control.
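As a rough illustration of this layering, the sketch below scores every item with a cheap heuristic and escalates only the suspicious ones to a costlier stage. Everything in it is hypothetical: the function names, the pattern list, and the `escalate_at` threshold are assumptions made for illustration, and `deep_analysis` is a stand-in for whatever heavier tools or LLM calls a real pipeline would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical indicator strings; a real pre-screen would be far richer.
SUSPICIOUS_PATTERNS = ("eval(", "exec(", "base64.b64decode", "os.system(")


@dataclass
class Verdict:
    name: str
    score: float   # 0.0 (benign) .. 1.0 (malicious)
    stage: str     # which stage produced the final score


def cheap_prescreen(source: str) -> float:
    """Fast heuristic: fraction of the known-suspicious patterns present."""
    hits = sum(1 for pattern in SUSPICIOUS_PATTERNS if pattern in source)
    return hits / len(SUSPICIOUS_PATTERNS)


def run_pipeline(
    items: Dict[str, str],
    deep_analysis: Callable[[str], float],
    escalate_at: float = 0.25,
) -> List[Verdict]:
    """Score every item cheaply; escalate only those at or above the threshold."""
    verdicts = []
    for name, source in items.items():
        score = cheap_prescreen(source)
        if score >= escalate_at:
            # Expensive stage runs only for the suspicious subset.
            verdicts.append(Verdict(name, deep_analysis(source), "deep"))
        else:
            verdicts.append(Verdict(name, score, "prescreen"))
    return verdicts
```

The threshold is the main cost lever in a design like this: lowering it sends more items to the expensive stage, raising it risks letting more malicious code through the pre-screen.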

Leveraging LLMs in a Stacked Evaluation Model

Instead of a single LLM pass, the system employs a 'stacked LLM evaluation' model. This could involve multiple LLMs, or multiple prompts to the same LLM, each trained or configured for a different aspect of malicious code detection (e.g., one for identifying obfuscation, another for analyzing API calls). Combining these independent judgments raises confidence in the final verdict and reduces false positives. In effect, the ensemble acts as a feature extractor or classifier within a larger ML pipeline.
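One way to picture the stacking is a set of independent evaluators whose scores are combined by an agreement rule. The sketch below is a minimal illustration under stated assumptions: the evaluators are plain functions standing in for separately prompted LLM calls, and the "flag only if at least `min_agree` evaluators exceed `threshold`" rule is an assumed aggregation strategy, not one the article specifies.

```python
from typing import Callable, Dict

Evaluator = Callable[[str], float]  # returns a suspicion score in [0, 1]


def stacked_verdict(
    source: str,
    evaluators: Dict[str, Evaluator],
    threshold: float = 0.5,
    min_agree: int = 2,
) -> dict:
    """Run every evaluator and flag the code only if enough of them agree."""
    scores = {name: fn(source) for name, fn in evaluators.items()}
    flagged_by = sorted(name for name, s in scores.items() if s >= threshold)
    return {
        "scores": scores,
        "flagged_by": flagged_by,
        "malicious": len(flagged_by) >= min_agree,
    }


# Hypothetical stand-ins for separately prompted LLM calls.
def obfuscation_check(source: str) -> float:
    return 0.8 if "b64decode" in source else 0.1


def api_call_check(source: str) -> float:
    return 0.9 if "subprocess" in source or "os.system" in source else 0.1
```

Requiring agreement trades some recall for fewer false positives: a sample that only one evaluator flags (for example, legitimate base64 decoding) is not marked malicious on that signal alone.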

Design Consideration: LLM Integration

When integrating LLMs into a security pipeline, consider asynchronous processing for long-running analyses. Cache results for frequently scanned or unchanged code segments to reduce inference costs. Implement retries and circuit breakers to handle API rate limits and LLM service outages.
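The caching, retry, and circuit-breaker ideas can be combined in a small wrapper around the model call. The sketch below is one possible arrangement, not a reference implementation: `ResilientCachedClient`, `CircuitOpenError`, and all parameter names and defaults are hypothetical, `call_model` stands in for a real LLM API call, and the cache is an in-memory dict keyed by a SHA-256 hash of the code.

```python
import hashlib
import time
from typing import Callable, Dict, Optional


class CircuitOpenError(RuntimeError):
    """Raised while calls are short-circuited after repeated failures."""


class ResilientCachedClient:
    def __init__(self, call_model: Callable[[str], str], max_retries: int = 3,
                 base_delay: float = 0.5, failure_limit: int = 5,
                 cooldown: float = 30.0, sleep=time.sleep, clock=time.monotonic):
        self._call_model = call_model
        self._max_retries = max_retries      # assumed to be >= 1
        self._base_delay = base_delay
        self._failure_limit = failure_limit
        self._cooldown = cooldown
        self._sleep = sleep
        self._clock = clock
        self._cache: Dict[str, str] = {}
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None

    def analyze(self, source: str) -> str:
        # Unchanged code hashes to the same key, so it is never re-sent.
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._cooldown:
                raise CircuitOpenError("model backend marked unhealthy")
            self._opened_at = None  # cooldown elapsed: allow a trial call
        last_error: Optional[Exception] = None
        for attempt in range(self._max_retries):
            try:
                result = self._call_model(source)
            except Exception as exc:
                last_error = exc
                self._consecutive_failures += 1
                if self._consecutive_failures >= self._failure_limit:
                    self._opened_at = self._clock()
                    raise CircuitOpenError("too many consecutive failures") from exc
                if attempt < self._max_retries - 1:
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    self._sleep(self._base_delay * 2 ** attempt)
            else:
                self._consecutive_failures = 0
                self._cache[key] = result
                return result
        raise last_error
```

A production version would likely use a shared cache with expiry and catch only the specific exceptions its API client raises for rate limits and timeouts; the broad `except Exception` here just keeps the sketch short.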

Tool-Driven Investigation for Accuracy

LLMs provide strong signals, but traditional security tools provide the concrete evidence needed to confirm them. The system integrates static analysis (SAST) tools, dynamic analysis (DAST) in sandboxed environments, and dependency graph analysis. The LLM's role might be to decide which tools to run or to interpret their output. This creates a feedback loop: LLM insights inform which tools are executed, and tool results in turn refine the LLM's assessment.
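The feedback loop can be sketched as a planner that picks the next tool based on the findings gathered so far. This is an illustrative sketch only: `investigate`, the two toy tools, and `rule_based_planner` are hypothetical names, and the planner is a simple rule standing in for an LLM deciding what to run next. The import-listing tool uses Python's standard `ast` module as a small example of static analysis.

```python
import ast
from typing import Callable, Dict, List, Optional

Tool = Callable[[str], str]  # takes source code, returns a short finding
Planner = Callable[[List[str], List[str]], Optional[str]]


def investigate(source: str, tools: Dict[str, Tool],
                choose_next_tool: Planner, max_steps: int = 5) -> List[str]:
    """Run tools one at a time, letting the planner see earlier findings."""
    findings: List[str] = []
    remaining = list(tools)
    for _ in range(max_steps):
        if not remaining:
            break
        choice = choose_next_tool(findings, remaining)
        if choice is None or choice not in remaining:
            break  # planner is done, or asked for an unavailable tool
        remaining.remove(choice)
        findings.append(f"{choice}: {tools[choice](source)}")
    return findings


def list_imports(source: str) -> str:
    """Static check: collect imported module names without running the code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return ",".join(sorted(names)) or "none"


def find_url_literals(source: str) -> str:
    return "url-literal" if "http://" in source or "https://" in source else "none"


def rule_based_planner(findings: List[str], remaining: List[str]) -> Optional[str]:
    """Stand-in for an LLM: start with imports, follow up on network modules."""
    if not findings:
        return "imports"
    if "socket" in findings[-1] or "urllib" in findings[-1]:
        return "network" if "network" in remaining else None
    return None
```

For example, `investigate(src, {"imports": list_imports, "network": find_url_literals}, rule_based_planner)` runs the network check only when the import listing mentions a network-capable module, so benign code costs a single tool run.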

Scalability Challenges: Processing billions of lines of code efficiently.
Accuracy vs. Cost: Balancing high detection rates with inference costs and computational resources.
False Positives: Minimizing erroneous alerts to prevent alert fatigue.
Dynamic Evasion: Adapting to new obfuscation techniques and malicious patterns.

Architecture Design

Design this yourself

Design a highly scalable, real-time malicious code detection service that scans both incoming pull requests and third-party dependency packages for a large enterprise. The system should integrate multiple layers of defense, including initial LLM-based anomaly detection, detailed static and dynamic analysis tools, and a final stacked LLM evaluation for suspicious findings. Focus on how to handle high-throughput, minimize latency for critical paths, and ensure cost-effectiveness while maintaining high detection accuracy and low false positives.

Other design angles

· Design only the LLM inference pipeline for malicious code detection, focusing on model serving, prompt engineering strategies, and managing inference costs for large codebases.
· Design a secure, sandboxed environment for dynamic analysis of suspicious code packages, ensuring isolation and preventing compromise of the analysis infrastructure.
· Architect a feedback loop system where threat intelligence from detected malware is used to retrain and improve the detection models and rules.