This article introduces a "filesystem work-bus" as a lightweight alternative to message brokers or complex frameworks for orchestrating a fleet of independent AI agent CLIs. The system uses atomic file operations to manage task and result states, providing durability, language-agnostic coordination, and graceful degradation without the overhead of traditional distributed messaging systems.
Coordinating a fleet of independent command-line interface (CLI) based AI agents often leads to considering heavyweight solutions like in-process orchestration frameworks (e.g., LangGraph) or dedicated message brokers (e.g., Kafka, Redis, RabbitMQ). However, for smaller, single-operator fleets, these options introduce significant operational overhead, tight coupling, or complex infrastructure management.
The proposed filesystem work-bus offers a simpler approach. A conductor process manages a shared directory, decomposing a goal into a Directed Acyclic Graph (DAG) of subtasks. Each subtask is represented by a `Task` file written to the bus, and workers respond by writing a `Result` file. The conductor polls for these results, absorbing them and advancing the DAG.
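One way to derive an execution order from such a DAG is a topological sort. The following is a minimal sketch, assuming Python 3.9+ and the standard-library `graphlib` module; the subtask names and the dict-of-dependencies layout are hypothetical and not taken from the article.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each key is a subtask id, each value is the set of
# subtask ids it depends on. The names are illustrative only.
dag = {
    "research": set(),
    "outline": {"research"},
    "draft": {"outline"},
    "review": {"draft", "research"},
}


def topo_order(dag):
    """Return subtask ids so that every dependency precedes its dependents."""
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(dag).static_order())


order = topo_order(dag)
print(order)
```

This sketch returns plain ids, whereas a real conductor would probably map each id back to its full `Task` record before writing it to the bus.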
Atomic Writes for Durability
The core safety mechanism is atomic file writes. Records are written to a temporary path and then `rename`d into place. POSIX guarantees that `rename` is atomic when the temporary path and the destination are on the same filesystem, so a reader sees either no file or the complete file, never a half-written or corrupted one. Because every record lives on disk, the state of the bus also survives process restarts.

    # atomic publish: a reader never sees a partial record
    def publish(path, record):
        tmp = path.with_suffix(".tmp")  # e.g. t1.task.json -> t1.task.tmp, same directory
        tmp.write_text(record.model_dump_json())  # record is assumed to be a Pydantic model
        tmp.rename(path)  # atomic on POSIX
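
The rename makes a record appear all at once, but it does not by itself force the bytes to disk before a crash or power loss. A slightly more defensive variant might look like the sketch below. It assumes plain `dict` records serialized with `json` rather than the article's record type, and the name `publish_durable` is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def publish_durable(path, record):
    """Write record as JSON so readers see the whole file or no file."""
    path = Path(path)
    # Create the temp file next to the destination: rename is only atomic
    # when both paths are on the same filesystem. Note that mkstemp creates
    # the file readable and writable by the creating user only.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(record, f)
            f.flush()
            os.fsync(f.fileno())  # push file contents to disk before the rename
        os.replace(tmp_name, path)  # atomic swap; overwrites an existing target
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    # Where supported, also fsync the directory so the rename itself
    # is persisted.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(path.parent, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Whether the extra `fsync` calls are worth their latency depends on how costly it is to lose a record. For a small single-operator fleet the simpler version may be enough.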

    # the conductor loop
    for task in topo_order(dag):
        publish(bus / f"{task.id}.task.json", task)
        result = poll(bus / f"{task.id}.result.json", backoff=...)  # durable: waits for the file
        absorb(result)
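
The `poll` helper is not shown in the article. One plausible shape is a loop that retries with exponential backoff until the result file exists. In the sketch below, the name `poll_result` and every parameter and default are assumptions for illustration, and results are assumed to be JSON files published with the write-then-rename pattern.

```python
import json
import time
from pathlib import Path


def poll_result(path, initial=0.1, factor=2.0, max_delay=5.0, timeout=None):
    """Block until path exists, then return its parsed JSON content."""
    path = Path(path)
    delay = initial
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            # Reading as soon as the file exists is safe only because the
            # writer renames a finished temp file into place.
            return json.loads(path.read_text(encoding="utf-8"))
        except FileNotFoundError:
            pass
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"no result at {path}")
        time.sleep(delay)
        # Grow the wait between checks, capped at max_delay.
        delay = min(delay * factor, max_delay)
```

Because the result file stays on disk, a conductor that restarts mid-run can call the same function again and pick up a result that a worker wrote while it was down.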