Dev.to #systemdesign·June 24, 2026

Filesystem Work-Bus for Agent Orchestration without a Message Broker

This article introduces a "filesystem work-bus" as a lightweight alternative to message brokers or complex frameworks for orchestrating a fleet of independent AI agent CLIs. The system uses atomic file operations to manage task and result states, providing durability, language-agnostic coordination, and graceful degradation without the overhead of traditional distributed messaging systems.


Coordinating a fleet of independent command-line (CLI) AI agents usually prompts a choice between two heavyweight options: in-process orchestration frameworks (e.g., LangGraph) or dedicated message brokers (e.g., Kafka, Redis, RabbitMQ). For a small, single-operator fleet, both come at a cost: tight coupling, significant operational overhead, or extra infrastructure to manage.

The Filesystem Work-Bus Pattern

The proposed filesystem work-bus offers a simpler approach. A conductor process manages a shared directory, decomposing a goal into a Directed Acyclic Graph (DAG) of subtasks. Each subtask is represented by a `Task` file written to the bus, and workers respond by writing a `Result` file. The conductor polls for these results, absorbing them and advancing the DAG.

  1. Decomposition: Break down a complex goal into a DAG of subtasks (e.g., gather → narrate → build).
  2. Task Publication: For each ready subtask, write a `Task` file (tagged with required capabilities) to the bus.
  3. Result Polling: Continuously poll for the corresponding `Result` file with backoff.
  4. Absorption & Progression: Validate the result, absorb it, and release subsequent subtasks in the DAG.
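
As a rough illustration of steps 1 and 4, the sketch below models the example DAG (gather → narrate → build) with Python's standard-library `graphlib` and shows the order in which subtasks become ready. The `dag` dictionary and the `release_order` helper are hypothetical names chosen for this example, not part of the original design; `sorter.done(...)` stands in for "the result was absorbed".

```python
from graphlib import TopologicalSorter

# Hypothetical DAG for illustration: each subtask maps to the set of
# subtasks it depends on.
dag = {
    "gather": set(),
    "narrate": {"gather"},
    "build": {"narrate"},
}

def release_order(dag):
    """Yield batches of subtasks whose dependencies have all completed."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    while sorter.is_active():
        ready = sorter.get_ready()  # subtasks that could be published now
        yield sorted(ready)
        sorter.done(*ready)         # stand-in for "result absorbed"

batches = list(release_order(dag))
print(batches)  # [['gather'], ['narrate'], ['build']]
```

In a real conductor, each batch would be published to the bus, and `done` would be called only after the matching `Result` file had been validated.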

Atomic Writes for Durability

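The core safety mechanism is the atomic file write. Each record is written to a temporary path and then `rename`d into place. POSIX guarantees that `rename` is atomic when both paths are on the same filesystem, so a reader sees either no file or a complete one, never a half-written record. Because tasks and results are ordinary files, they also survive process restarts. Surviving a power loss additionally requires flushing the temporary file to disk (e.g., with `fsync`) before the rename.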

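```python
import os
from pathlib import Path

# atomic publish: a reader never sees a partial record
# (record is assumed to be a Pydantic v2 model, given model_dump_json)
def publish(path: Path, record) -> None:
    tmp = path.with_suffix(".tmp")  # same directory, so same filesystem as the target
    tmp.write_text(record.model_dump_json())
    os.replace(tmp, path)  # atomic on POSIX; overwrites an existing target

# the conductor loop
# (topo_order, poll, absorb, dag, and bus, a Path to the shared directory, are defined elsewhere)
for task in topo_order(dag):
    publish(bus / f"{task.id}.task.json", task)
    result = poll(bus / f"{task.id}.result.json", backoff=...)  # durable: waits for the file
    absorb(result)
```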
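
The `poll` helper in the loop above is not shown in the source. The following is a minimal sketch of what result polling with exponential backoff might look like, assuming results are JSON files. The name `poll_for_result` and all of its parameters (`initial`, `factor`, `max_delay`, `timeout`) are assumptions made for this example, not the author's API.

```python
import json
import time
from pathlib import Path

def poll_for_result(path, initial=0.1, factor=2.0, max_delay=5.0, timeout=60.0):
    """Wait until `path` exists, then return its parsed JSON content.

    Hypothetical helper: sleeps between attempts, multiplying the delay by
    `factor` up to `max_delay`, and gives up after `timeout` seconds.
    """
    path = Path(path)
    deadline = time.monotonic() + timeout
    delay = initial
    while True:
        try:
            # Writers publish via atomic rename, so a visible file is complete.
            return json.loads(path.read_text())
        except FileNotFoundError:
            pass  # not published yet; fall through and wait
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no result at {path} after {timeout}s")
        time.sleep(min(delay, remaining))
        delay = min(delay * factor, max_delay)
```

Reading the file directly and catching `FileNotFoundError` avoids a separate existence check, and capping the delay keeps long-running subtasks from being checked too rarely.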

Key Architectural Advantages

  • State vs. Events: Unlike an ephemeral event bus, the file-bus holds durable state. Tasks and results are persistent files, so a worker that starts late or restarts still finds its pending work on disk.
  • Capability-Based Routing: The conductor routes tasks based on advertised worker capabilities rather than hard-coded names. This allows for dynamic fleet composition, where workers can be added or removed without reconfiguring routing logic.
  • Graceful Degradation: A critical feature for evolving systems, the conductor marks subtasks as *skipped* if no healthy worker can fulfill a required capability. This prevents system crashes due to missing workers and provides a clear
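
Capability-based routing and the *skipped* fallback can be sketched as follows. This is an illustration under stated assumptions: the source does not say how workers advertise capabilities or report health, so the `Worker` record, the `route` and `plan` functions, and the `"publish"` / `"skipped"` status strings are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Worker:
    # Hypothetical worker record; the advertising mechanism is an assumption.
    name: str
    capabilities: frozenset
    healthy: bool = True

def route(required, workers):
    """Return the first healthy worker advertising `required`, or None."""
    for worker in workers:
        if worker.healthy and required in worker.capabilities:
            return worker
    return None

def plan(tasks, workers):
    """Map each task id to 'publish' or 'skipped' based on available capabilities."""
    statuses = {}
    for task_id, capability in tasks.items():
        statuses[task_id] = "publish" if route(capability, workers) else "skipped"
    return statuses

workers = [
    Worker("w1", frozenset({"gather"})),
    Worker("w2", frozenset({"narrate"}), healthy=False),
]
tasks = {"t1": "gather", "t2": "narrate", "t3": "build"}
statuses = plan(tasks, workers)
print(statuses)  # {'t1': 'publish', 't2': 'skipped', 't3': 'skipped'}
```

Here `t2` is skipped because its only capable worker is unhealthy, and `t3` is skipped because no worker advertises `build`. Neither case raises an error, which is the degradation behavior described above.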
Tags: orchestration, agent-fleet, serverless, microservices, distributed coordination, file-based messaging, durability, atomic operations
