Coordination Patterns

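How agents in a swarm communicate and coordinate: five coordination patterns, plus the communication protocols, state management, and error handling that support them.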

Pattern 1: Orchestrator-Worker

A central orchestrator (Lima) directs worker agents.

flowchart TB
    Orch[Lima Orchestrator]
    
    Orch --> A[Agent A]
    Orch --> B[Agent B]
    Orch --> C[Agent C]
    
    A --> Orch
    B --> Orch
    C --> Orch

Use Case: Most common pattern in Crella Engine.

Pros:

Clear command structure
Easy to debug
Centralized state

Cons:

Orchestrator is single point of failure
Can bottleneck at scale
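A minimal sketch of this pattern, assuming each worker exposes a single async `handle` method. `AgentTask`, `WorkerAgent`, and `Orchestrator` are hypothetical names chosen for illustration, not Lima or Crella Engine APIs.

```typescript
// Hypothetical shapes for illustration only.
interface AgentTask {
  agent: string;   // name of the worker that should handle the task
  payload: string;
}

interface WorkerAgent {
  handle(payload: string): Promise<string>;
}

class Orchestrator {
  // Centralized state: every result is recorded in one place.
  readonly log: { agent: string; result: string }[] = [];
  private readonly workers: Map<string, WorkerAgent>;

  constructor(workers: Map<string, WorkerAgent>) {
    this.workers = workers;
  }

  // Send each task to its worker, wait for the reply, then move on.
  async run(tasks: AgentTask[]): Promise<string[]> {
    const results: string[] = [];
    for (const task of tasks) {
      const worker = this.workers.get(task.agent);
      if (!worker) throw new Error(`No worker registered for ${task.agent}`);
      const result = await worker.handle(task.payload);
      this.log.push({ agent: task.agent, result });
      results.push(result);
    }
    return results;
  }
}
```

Because every task and result passes through `run`, the log gives a single place to debug from. The same property is what makes the orchestrator a bottleneck and a single point of failure.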

Pattern 2: Pipeline (Sequential)

Agents process in sequence, each passing to the next.

flowchart LR
    A[Alpha] --> B[Bravo] --> C[Charlie] --> D[Delta]

Use Case: Linear workflows like document processing.

Pros:

Simple to understand
Clear dependencies
Easy error tracking

Cons:

Slowest pattern: stages never overlap, so total latency is the sum of all stages
A failure at any stage stops the whole pipeline
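The sequential hand-off can be sketched as a loop over named stages. This is an illustrative sketch, assuming each stage is an async function from string to string; `Stage` and `runPipeline` are hypothetical names.

```typescript
// A stage takes the previous stage's output and returns the next input.
type Stage = (input: string) => Promise<string>;

// Run stages strictly in order. A thrown error stops the pipeline and
// the rethrown error names the stage that failed.
async function runPipeline(
  stages: [string, Stage][],
  input: string,
): Promise<string> {
  let current = input;
  for (const [name, stage] of stages) {
    try {
      current = await stage(current);
    } catch (err) {
      throw new Error(`Pipeline stopped at stage "${name}": ${String(err)}`);
    }
  }
  return current;
}
```

Tagging each error with its stage name is what keeps error tracking easy: the failure always points at exactly one step.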

Pattern 3: Fan-Out / Fan-In

One agent distributes work, another collects results.

flowchart TB
    Distribute[Distribute Task]
    
    Distribute --> A[Worker A]
    Distribute --> B[Worker B]
    Distribute --> C[Worker C]
    
    A --> Collect[Collect Results]
    B --> Collect
    C --> Collect

Use Case: Parallel research, batch processing.

Pros:

Maximum parallelism
Fastest for independent tasks

Cons:

Requires aggregation logic
Complex error handling
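A minimal sketch of fan-out and fan-in, assuming the work items are independent and `work` is an async function. `Outcome` and `fanOutFanIn` are hypothetical names; the aggregation here only separates successes from failures, and real aggregation logic would be task-specific.

```typescript
type Outcome<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Fan out: start every job at once. Fan in: wait for all of them and
// keep failures as data so one bad worker does not discard the rest.
async function fanOutFanIn<I, O>(
  inputs: I[],
  work: (input: I) => Promise<O>,
): Promise<{ succeeded: O[]; failed: string[] }> {
  const outcomes = await Promise.all(
    inputs.map((input): Promise<Outcome<O>> =>
      work(input).then(
        (value) => ({ ok: true as const, value }),
        (err) => ({ ok: false as const, error: String(err) }),
      ),
    ),
  );

  const succeeded: O[] = [];
  const failed: string[] = [];
  for (const outcome of outcomes) {
    if (outcome.ok) succeeded.push(outcome.value);
    else failed.push(outcome.error);
  }
  return { succeeded, failed };
}
```

Converting each rejection into a value before `Promise.all` sees it is one way to handle partial failure: the collector always receives one outcome per input and can decide whether a partial result is acceptable.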

Pattern 4: Peer-to-Peer

Agents communicate directly with each other.

flowchart LR
    A[Agent A] <--> B[Agent B]
    B <--> C[Agent C]
    A <--> C

Use Case: Collaborative tasks, real-time adjustments.

Pros:

Flexible
No single point of failure
Low latency (messages take a single hop, with no intermediary)

Cons:

Complex coordination
Harder to debug
State management challenging
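As a rough illustration, peers can hold direct references to one another and deliver messages without an intermediary. `PeerAgent` is a hypothetical class, and the in-process `inbox` array stands in for whatever transport a real deployment would use.

```typescript
class PeerAgent {
  readonly name: string;
  readonly inbox: string[] = [];
  private readonly peers: PeerAgent[] = [];

  constructor(name: string) {
    this.name = name;
  }

  // Create a bidirectional link between two peers.
  connect(other: PeerAgent): void {
    if (this.peers.indexOf(other) !== -1) return;
    this.peers.push(other);
    other.connect(this);
  }

  // Deliver directly to a connected peer; returns false if there is no link.
  send(to: string, text: string): boolean {
    const peer = this.peers.find((p) => p.name === to);
    if (!peer) return false;
    peer.inbox.push(`${this.name}: ${text}`);
    return true;
  }
}
```

Each agent sees only its own inbox, so there is no single place to inspect the whole conversation. That is the debugging and state-management cost listed above.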

Pattern 5: Event-Driven

Agents respond to events, not direct commands.

flowchart TB
    Event[Event Bus]
    
    Event --> |document.uploaded| Charlie[Charlie]
    Event --> |lead.qualified| Echo[Echo]
    Event --> |response.received| India[India]
    
    Charlie --> |document.processed| Event
    Echo --> |lead.enriched| Event
    India --> |escalation.created| Event

Use Case: Decoupled systems, async processing.

Pros:

Highly scalable
Loose coupling
Easy to add agents

Cons:

Eventually consistent
Harder to debug: a request's path must be reconstructed from the events it produced
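The publish/subscribe flow in the diagram can be sketched with a tiny in-memory bus. This is an assumption-laden illustration: `EventBus` is a hypothetical class that dispatches synchronously inside one process, whereas a production bus would be asynchronous and durable. The topic names come from the diagram.

```typescript
type EventHandler = (payload: unknown) => void;

class EventBus {
  private handlers = new Map<string, EventHandler[]>();

  subscribe(topic: string, handler: EventHandler): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler);
    this.handlers.set(topic, list);
  }

  // Publishers do not know who is listening, or whether anyone is.
  publish(topic: string, payload: unknown): void {
    for (const handler of this.handlers.get(topic) ?? []) handler(payload);
  }
}

const bus = new EventBus();
const processedLog: string[] = [];

// Charlie reacts to uploads and emits its own event when done.
bus.subscribe("document.uploaded", (doc) => {
  bus.publish("document.processed", `processed:${String(doc)}`);
});

// Any other agent can listen for Charlie's output without Charlie knowing.
bus.subscribe("document.processed", (result) => {
  processedLog.push(String(result));
});

bus.publish("document.uploaded", "report.pdf");
```

Adding an agent means adding a `subscribe` call; no existing publisher changes, which is where the loose coupling comes from.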

Communication Protocols

Synchronous (Request-Response)

// Agent A calls Agent B directly
const result = await agentB.process(data);

When to use: The caller needs the result before it can continue, and the operation is simple and fast.
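Because a synchronous call holds the caller until the other agent answers, it is usually worth bounding the wait. A minimal sketch, assuming promise-based agents; `withTimeout` is a hypothetical helper, not part of any agent API described here.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
// Note: this stops waiting, it does not cancel the underlying work.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

With this helper, the direct call could be written as `await withTimeout(agentB.process(data), 5000)`, where the 5-second limit is an arbitrary example value.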

Asynchronous (Message Queue)

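// Agent A publishes a task to Charlie's queue and continues without waiting
await queue.publish('charlie.tasks', { document: data });

// Charlie subscribes, handles each message, and publishes the result
queue.subscribe('charlie.tasks', async (msg) => {
  // processDocument stands for Charlie's own handler function
  const result = await processDocument(msg.document);
  await queue.publish('charlie.results', result);
});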

When to use: High volume, fire-and-forget, resilience needed.

State Management

Centralized State (Redis)

flowchart TB
    subgraph agents [Agents]
        A[Alpha]
        B[Bravo]
        C[Charlie]
    end
    
    Redis[(Redis State Store)]
    
    A --> Redis
    B --> Redis
    C --> Redis

Use: Shared counters, session data, rate limits.
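A shared rate limit is one case where a central store pays off: every agent increments the same counter. The sketch below assumes a store with an atomic increment, which is what the Redis `INCR` command provides. `CounterStore`, `InMemoryStore`, and `allowRequest` are hypothetical names, and the in-memory class is a stand-in so the example runs without a Redis server.

```typescript
interface CounterStore {
  // Atomically add 1 to the counter at `key` and return the new value.
  incr(key: string): Promise<number>;
}

// Stand-in for Redis, for illustration only.
class InMemoryStore implements CounterStore {
  private counts = new Map<string, number>();

  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// Fixed-window limiter: all agents share one counter per time window.
async function allowRequest(
  store: CounterStore,
  resource: string,
  limit: number,
  windowMs: number,
  now: number = Date.now(),
): Promise<boolean> {
  const windowId = Math.floor(now / windowMs);
  const count = await store.incr(`rate:${resource}:${windowId}`);
  return count <= limit;
}
```

Against a real Redis instance the per-window keys would also need an expiry so old windows do not accumulate; that step is omitted here.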

Distributed State (Per-Agent)

flowchart LR
    A[Alpha + State A]
    B[Bravo + State B]
    C[Charlie + State C]

Use: Agent-specific state, no sharing needed.

Workflow State (Lima)

{
  "workflow_id": "wf-123",
  "current_step": 3,
  "state": {
    "lead_validated": true,
    "enrichment_complete": true,
    "email_generated": false
  },
  "history": [
    {"agent": "ALPHA001", "result": "success"},
    {"agent": "ECHO001", "result": "success"}
  ]
}
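The JSON above maps naturally onto a typed record. The sketch below is an assumption about how such a record could be updated, not Lima's actual behavior: `recordStep` is a hypothetical helper, and the rule that `current_step` advances only on success, as well as the `"failure"` result value, are illustrative choices.

```typescript
interface StepRecord {
  agent: string;
  result: "success" | "failure";
}

interface WorkflowState {
  workflow_id: string;
  current_step: number;
  state: Record<string, boolean>;
  history: StepRecord[];
}

// Return a new state with the step recorded. The input is not mutated,
// so a failed write cannot leave a half-updated workflow in memory.
function recordStep(
  wf: WorkflowState,
  agent: string,
  flag: string,
  ok: boolean,
): WorkflowState {
  return {
    ...wf,
    current_step: ok ? wf.current_step + 1 : wf.current_step,
    state: { ...wf.state, [flag]: ok },
    history: [...wf.history, { agent, result: ok ? "success" : "failure" }],
  };
}
```

Keeping the full `history` alongside the flags means a workflow can be resumed or audited from the record alone.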

Error Handling

Retry Pattern

flowchart TB
    Task[Execute Task] --> Check{Success?}
    Check -->|Yes| Done[Complete]
    Check -->|No| Retry{Attempts < Max?}
    Retry -->|Yes| Wait[Backoff Wait]
    Wait --> Task
    Retry -->|No| Escalate[Escalate/Fail]
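The flowchart above can be sketched as a loop with exponential backoff. `retryWithBackoff` and `sleep` are hypothetical helpers, and the defaults (3 attempts, 100 ms base delay, doubling each time) are example values rather than recommendations from this document.

```typescript
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(); // Success: complete
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Backoff wait: 1x, 2x, 4x, ... the base delay
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError; // Attempts exhausted: escalate to the caller
}
```

Retrying is only safe when the task is idempotent; otherwise a retry after a partial success can repeat a side effect.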

Circuit Breaker

stateDiagram-v2
    [*] --> Closed
    Closed --> Open: Failures > Threshold
    Open --> HalfOpen: Timeout Elapsed
    HalfOpen --> Closed: Success
    HalfOpen --> Open: Failure
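The state diagram can be sketched as a small class. This is a simplified illustration with assumptions called out: `CircuitBreaker` is a hypothetical class, failures are counted consecutively (the diagram does not say), and concurrent calls during the half-open state are not limited to a single trial.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly threshold: number;
  private readonly resetMs: number;
  private readonly now: () => number;

  // `now` is injectable so the timeout can be tested without waiting.
  constructor(threshold: number, resetMs: number, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.resetMs = resetMs;
    this.now = now;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetMs) {
        throw new Error("Circuit open: call rejected");
      }
      this.state = "half-open"; // Timeout elapsed: allow a trial call
    }
    try {
      const value = await fn();
      this.state = "closed"; // Success closes the circuit
      this.failures = 0;
      return value;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many failures, opens the circuit.
      if (this.state === "half-open" || this.failures > this.threshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

While the circuit is open, callers fail immediately instead of queuing up behind an agent that is already struggling.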

Dead Letter Queue

Messages that still fail after all retries are moved to a dead letter queue (DLQ) for manual review:

Main Queue → Agent → Success ✓
               ↓
    Failure (after retries)
               ↓
       Dead Letter Queue → Human Review
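A minimal sketch of the consumer side, assuming an in-memory array stands in for the real dead letter queue. `QueueMessage`, `DeadLetter`, and `consume` are hypothetical names, and the default of 3 attempts is an example value.

```typescript
interface QueueMessage {
  id: string;
  body: string;
}

interface DeadLetter {
  message: QueueMessage;
  error: string;    // last error seen, kept for the human reviewer
  attempts: number;
}

// Try the handler up to maxAttempts times; park the message if all fail.
async function consume(
  message: QueueMessage,
  handler: (m: QueueMessage) => Promise<void>,
  deadLetters: DeadLetter[],
  maxAttempts = 3,
): Promise<boolean> {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await handler(message);
      return true; // Success: nothing goes to the DLQ
    } catch (err) {
      lastError = String(err);
    }
  }
  // Retries exhausted: keep the original message plus failure context.
  deadLetters.push({ message, error: lastError, attempts: maxAttempts });
  return false;
}
```

Storing the original message unchanged lets a reviewer replay it once the underlying problem is fixed.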

Best Practices

Start with Orchestrator-Worker: it is the simplest pattern to build and debug; optimize later
Use async messaging for high volume: don't block callers on slow operations
Make operations idempotent: retrying any operation must be safe
Log everything with correlation IDs: one ID should follow a request across every agent it touches
Set timeouts: don't let agents hang forever
Design for failure: assume every call can fail
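Idempotency is the practice the retry and queue patterns depend on most, so it is worth one sketch. The example assumes callers attach a stable key to each logical operation; `IdempotentExecutor` is a hypothetical class, and its in-memory map would need to be a shared store in a multi-process deployment.

```typescript
// Remember the result for each idempotency key so that a retried
// request returns the stored result instead of repeating the side effect.
class IdempotentExecutor<T> {
  private results = new Map<string, Promise<T>>();

  run(key: string, operation: () => Promise<T>): Promise<T> {
    const existing = this.results.get(key);
    if (existing) return existing;

    const pending = operation().catch((err) => {
      this.results.delete(key); // a failed attempt may be retried
      throw err;
    });
    this.results.set(key, pending);
    return pending;
  }
}
```

Storing the promise rather than the resolved value also deduplicates concurrent calls that arrive with the same key while the first is still in flight.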
