Menu
DZone Microservices·June 9, 2026

Combining Temporal and Kafka for Resilient Distributed Systems

This article explores the synergy between Apache Kafka and Temporal for building resilient distributed systems. It emphasizes that Kafka excels at durable event streaming, while Temporal manages long-running workflow state, retries, and recovery. The combination allows for robust business process orchestration where Kafka provides the event backbone and Temporal acts as the control plane.

Read original on DZone Microservices

Kafka as Event Backbone, Temporal as Control Plane

The core thesis is that Kafka and Temporal address different failure boundaries and are complementary, not substitutes. Kafka is optimized for moving ordered, replayable event streams across many consumers and machines, ideal for integration boundaries and decoupling services. Temporal, on the other hand, provides durable orchestration for long-running application logic, ensuring workflows recover from crashes, outages, and restarts by replaying persisted event history. This combination is powerful: Kafka carries "facts" (events) and Temporal remembers "intent" (business process state, timers, retries, compensations).

Design Principle: Keep Kafka at the Boundary of Workflow Replay

A critical design rule is to avoid embedding Kafka client calls directly within Temporal Workflow code. Temporal workflows require deterministic logic on replay. Non-deterministic operations, like API calls or database queries, should be encapsulated within Activities. Workflows should function as lean state machines deciding what actions to take, while Activities perform the actual side effects, which can be retried or fail. This separation ensures Kafka remains an external event fabric without compromising Temporal's replay semantics.

java
private boolean paymentReceived;
private final OrderActivities activities = Workflow.newActivityStub(
 OrderActivities.class,
 ActivityOptions.newBuilder()
 .setStartToCloseTimeout(Duration.ofSeconds(30))
 .setRetryOptions(
 RetryOptions.newBuilder()
 .setInitialInterval(Duration.ofSeconds(1))
 .setMaximumInterval(Duration.ofSeconds(30))
 .build())
 .build());

@WorkflowMethod
public void process(String orderId) {
 activities.reserveInventory(orderId);
 boolean paid = Workflow.await(Duration.ofHours(2), () -> paymentReceived);
 if (!paid) {
 activities.releaseInventory(orderId);
 activities.publishTimedOut(orderId);
 return;
 }
 activities.publishConfirmed(orderId);
}

@SignalMethod
public void paymentCaptured(String paymentId) {
 paymentReceived = true;
}

This example shows a simple, robust workflow. Inventory reservation and event publication are delegated to Activities. The workflow merely manages state and waits for events. The two-hour wait is durable, meaning Temporal persists timers, allowing execution to resume even after worker or service interruptions. Kafka would supply the external `paymentCaptured` event, which a thin bridge translates into a Temporal signal.

Handling Duplicates and Failure Semantics

The handoff from Kafka to Temporal should be designed for duplicate tolerance. While Kafka allows manual offset control, a crash between Temporal accepting a signal and the offset commit can lead to redelivery. Making Temporal Workflow IDs stable for a given business entity and ensuring Activities are idempotent are crucial, as Temporal may retry Activity executions. The article clarifies that neither Kafka's transactional features nor Temporal's effectively-once scheduling inherently provide end-to-end exactly-once semantics for external side effects; explicit idempotency keys or transactional guarantees across all involved systems are required.

Long-Running Discipline: Operational Considerations

  • Key Alignment: Reusing the business key as both the Kafka record key and Temporal Workflow ID creates a clean ownership model, serializing updates across both systems.
  • Thin Kafka Bridge: The Kafka consumer bridge should remain thin and poll regularly to avoid being rebalanced out of the group.
  • History Management: Long-running Temporal workflows with many Signals or Activities require careful history management. Temporal's Event History, while enabling recovery, has performance limits. Using `Continue-As-New` periodically is recommended for very long-lived or event-intensive workflows.
  • Code Evolution: Workflow logic is replayed, so code changes must be handled deliberately using Temporal's versioning guidance (patching or worker versioning) to maintain determinism for in-flight executions.
KafkaTemporalEvent-DrivenWorkflowsOrchestrationResilienceIdempotencyDistributed Transactions

Comments

Loading comments...