# Scaling Swarms

How to grow swarm capacity as demand increases.
## Scaling Dimensions
### 1. Horizontal Scaling (More Instances)

Add more instances of the same agent:
```mermaid
flowchart LR
    subgraph before [Before: 1 Alpha]
        A1[Alpha-1]
    end
    subgraph after [After: 3 Alphas]
        A1b[Alpha-1]
        A2[Alpha-2]
        A3[Alpha-3]
    end
    before -->|Scale Out| after
```
**When:** Single agent can't keep up with volume.
### 2. Vertical Scaling (Bigger Agents)

Use more powerful models or hardware:
| Level | Model | Cost | Speed |
|---|---|---|---|
| Base | GPT-3.5 | $0.001 | Fast |
| Standard | GPT-4o | $0.01 | Medium |
| Premium | Claude Opus | $0.03 | Slower |
**When:** Quality needs improvement, not volume.
### 3. Specialization (More Agent Types)

Add specialized agents for specific domains:
```mermaid
flowchart TB
    subgraph generic [Generic]
        Delta[Delta - General]
    end
    subgraph specialized [Specialized]
        DeltaMM[Delta-MM<br/>MoneyMatcher]
        DeltaEHMP[Delta-EHMP<br/>Sequoia]
        DeltaCustom[Delta-X<br/>Custom]
    end
    generic --> specialized
```
**When:** Domain-specific accuracy matters.
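Specialization implies a routing step before dispatch: tasks go to a domain agent when one exists, otherwise to the generic one. A minimal sketch, using the agent names from the diagram above; the routing table and the `routeTask` helper are hypothetical, not part of any real swarm API:

```javascript
// Hypothetical domain → specialized-agent routing table,
// mirroring the diagram above
const routes = {
  moneymatcher: 'Delta-MM',
  sequoia: 'Delta-EHMP',
};

// Pick a specialized agent if the task's domain has one,
// otherwise fall back to the generic Delta agent
function routeTask(task) {
  return routes[task.domain] || 'Delta';
}
```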
## Scaling Strategies
### Auto-Scaling Based on Queue Depth
```mermaid
flowchart TB
    Queue[Task Queue] --> Monitor[Monitor Depth]
    Monitor --> Check{Depth > Threshold?}
    Check -->|Yes| ScaleUp[Add Agent Instance]
    Check -->|No| CheckLow{Depth < Min?}
    CheckLow -->|Yes| ScaleDown[Remove Instance]
    CheckLow -->|No| Maintain[Maintain Count]
```
Configuration:

```yaml
autoscaling:
  agent: ALPHA001
  min_instances: 1
  max_instances: 10
  scale_up_threshold: 100  # queue depth
  scale_down_threshold: 10
  cooldown_seconds: 60
```
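The decision logic behind that config can be sketched as a pure function: respect the cooldown, add an instance above the upper threshold, remove one below the lower threshold, and clamp to the min/max bounds. This is an illustrative sketch (the camelCase config mirrors the YAML keys above; `decideScale` is not a real API):

```javascript
// Mirrors the autoscaling config above (snake_case → camelCase)
const config = {
  minInstances: 1,
  maxInstances: 10,
  scaleUpThreshold: 100,  // queue depth
  scaleDownThreshold: 10,
  cooldownSeconds: 60,
};

// Returns the desired instance count for the next tick
function decideScale(queueDepth, currentInstances, secondsSinceLastScale, cfg = config) {
  // Still cooling down from the last scale event: hold steady
  if (secondsSinceLastScale < cfg.cooldownSeconds) return currentInstances;
  if (queueDepth > cfg.scaleUpThreshold) {
    return Math.min(currentInstances + 1, cfg.maxInstances);
  }
  if (queueDepth < cfg.scaleDownThreshold) {
    return Math.max(currentInstances - 1, cfg.minInstances);
  }
  return currentInstances; // within the healthy band: maintain count
}
```

Scaling one instance at a time with a cooldown avoids thrashing when queue depth oscillates around a threshold.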
### Time-Based Scaling

Scale proactively for known patterns:

```yaml
schedule:
  - cron: "0 9 * * 1-5"   # 9 AM weekdays
    instances: 5
  - cron: "0 18 * * 1-5"  # 6 PM weekdays
    instances: 2
  - cron: "0 0 * * 6-7"   # Weekends
    instances: 1
```
### Event-Based Scaling

Scale for specific events:

```javascript
// Large batch import detected
if (batchSize > 1000) {
  await swarm.scale('ALPHA001', { instances: 5 });
  await swarm.scale('CHARLIE001', { instances: 3 });
}
```
## Capacity Planning

### Throughput Calculation

```
Agent Capacity = (1 / Avg Processing Time) × Instance Count
```

Example:

- Alpha processes 1 task in 200 ms (5 tasks/sec)
- With 3 instances: 15 tasks/sec = 54,000 tasks/hour
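The formula and the Alpha example translate directly to code (times in milliseconds; the helper name is illustrative):

```javascript
// Agent Capacity = (1 / Avg Processing Time) × Instance Count
// avgProcessingMs is per-task processing time in milliseconds
function tasksPerSecond(avgProcessingMs, instanceCount) {
  return (1000 / avgProcessingMs) * instanceCount;
}

// Alpha example from above: 200 ms per task, 3 instances
const perSec = tasksPerSecond(200, 3); // 15 tasks/sec
const perHour = perSec * 3600;         // 54,000 tasks/hour
```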
### Cost Modeling
| Scale Level | Instances | Cost/Hour | Capacity |
|---|---|---|---|
| Minimal | 1 each | $0.50 | 1,000/hr |
| Standard | 3 each | $1.50 | 3,000/hr |
| High | 10 each | $5.00 | 10,000/hr |
| Maximum | 50 each | $25.00 | 50,000/hr |
## Load Balancing
### Round-Robin

Simple rotation through instances:
```mermaid
flowchart LR
    LB[Load Balancer]
    LB -->|1| A1[Alpha-1]
    LB -->|2| A2[Alpha-2]
    LB -->|3| A3[Alpha-3]
    LB -->|4| A1
```
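A round-robin balancer is a counter modulo the instance count, wrapping back to the first instance as in step 4 above. A minimal sketch (`roundRobin` is illustrative, not a library function):

```javascript
// Returns a picker that cycles through instances in order
function roundRobin(instances) {
  let next = 0;
  return () => instances[next++ % instances.length];
}

const pick = roundRobin(['Alpha-1', 'Alpha-2', 'Alpha-3']);
// Successive calls: Alpha-1, Alpha-2, Alpha-3, then Alpha-1 again
```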
### Least Connections

Route to least busy instance:
```mermaid
flowchart TB
    Task[New Task]
    Task --> LB{Least Busy?}
    LB -->|3 active| A1[Alpha-1]
    LB -->|1 active| A2[Alpha-2]
    LB -->|5 active| A3[Alpha-3]
```
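Least-connections selection just scans the active-task counts for the minimum. A sketch using the counts from the diagram, where Alpha-2 wins with 1 active task (`leastConnections` is illustrative):

```javascript
// active: map of instance name → current active task count
function leastConnections(active) {
  return Object.entries(active).reduce((best, cur) =>
    cur[1] < best[1] ? cur : best
  )[0];
}

const chosen = leastConnections({ 'Alpha-1': 3, 'Alpha-2': 1, 'Alpha-3': 5 });
// chosen === 'Alpha-2'
```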
### Weighted Distribution

Prefer certain instances:

```yaml
weights:
  alpha-1: 50  # Faster hardware
  alpha-2: 30
  alpha-3: 20  # Slower
```
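One common way to implement weighted distribution is a weighted random pick: each instance's share of traffic is proportional to its weight (50/30/20 here). A sketch under that assumption (`weightedPick` is illustrative; `rand` is injectable for testing):

```javascript
const weights = { 'alpha-1': 50, 'alpha-2': 30, 'alpha-3': 20 };

// Roll a point in [0, totalWeight) and walk the weights until it lands
function weightedPick(weights, rand = Math.random()) {
  const total = Object.values(weights).reduce((a, b) => a + b, 0);
  let roll = rand * total;
  for (const [name, w] of Object.entries(weights)) {
    roll -= w;
    if (roll < 0) return name;
  }
}
```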
## Bottleneck Identification

### Monitoring Metrics
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Queue Depth | Under 50 | 50-200 | Over 200 |
| Processing Time | Under spec | +20% | +50% |
| Error Rate | Under 1% | 1-5% | Over 5% |
| Instance CPU | Under 70% | 70-90% | Over 90% |
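Each metric in the table maps to a three-way threshold check; queue depth, for example, looks like this (the other metrics follow the same pattern with their own bounds; `queueDepthStatus` is an illustrative helper):

```javascript
// Thresholds from the monitoring table: <50 healthy, 50-200 warning, >200 critical
function queueDepthStatus(depth) {
  if (depth < 50) return 'healthy';
  if (depth <= 200) return 'warning';
  return 'critical';
}
```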
### Common Bottlenecks

1. **Orchestrator Overload**
   - Symptom: Lima CPU high, all agents idle
   - Fix: Optimize routing logic, add Lima instances
2. **Database Contention**
   - Symptom: All agents slow, DB CPU high
   - Fix: Add read replicas, caching, query optimization
3. **External API Limits**
   - Symptom: Echo failing, rate limit errors
   - Fix: Implement queuing, request API limit increase
4. **Network Saturation**
   - Symptom: High latency across all agents
   - Fix: Optimize payloads, regional deployment
## Best Practices

- **Start small, scale gradually**: Don't over-provision initially
- **Monitor before scaling**: Understand the actual bottleneck
- **Test scale events**: Verify scaling works before production
- **Set resource limits**: Prevent runaway costs
- **Plan for bursts**: Buffer capacity for spikes
- **Scale down aggressively**: Don't pay for idle capacity