Scaling Swarms

How to grow swarm capacity as demand increases.

Scaling Dimensions

1. Horizontal Scaling (More Instances)

Add more instances of the same agent:

flowchart LR
    subgraph before [Before: 1 Alpha]
        A1[Alpha-1]
    end

    subgraph after [After: 3 Alphas]
        A1b[Alpha-1]
        A2[Alpha-2]
        A3[Alpha-3]
    end

    before --> |Scale Out| after

When: Single agent can't keep up with volume.
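
As a concrete sketch, scaling out can be a single call against the swarm's management API. This reuses the hypothetical swarm.scale client shown in the event-based example further down; the agent ID and option names are illustrative:

// Scale the Alpha agent from 1 to 3 identical instances.
// swarm.scale and the agent ID 'ALPHA001' are illustrative, not a fixed API.
await swarm.scale('ALPHA001', { instances: 3 });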

2. Vertical Scaling (Bigger Agents)

Use more powerful models or hardware:

| Level    | Model       | Cost   | Speed  |
|----------|-------------|--------|--------|
| Base     | GPT-3.5     | $0.001 | Fast   |
| Standard | GPT-4o      | $0.01  | Medium |
| Premium  | Claude Opus | $0.03  | Slower |

When: Quality needs improvement, not volume.
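
A vertical change is usually a configuration change rather than an instance-count change. A minimal sketch, assuming a hypothetical swarm.configure call; the agent ID, model name, and option keys are illustrative and depend on how your agents are provisioned:

// Move Alpha from the Standard tier to the Premium tier for higher-quality output.
// swarm.configure, the agent ID, and the model name are illustrative assumptions.
await swarm.configure('ALPHA001', { model: 'claude-opus', timeout_ms: 30000 });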

3. Specialization (More Agent Types)

Add specialized agents for specific domains:

flowchart TB
    subgraph generic [Generic]
        Delta[Delta - General]
    end

    subgraph specialized [Specialized]
        DeltaMM[Delta-MM<br/>MoneyMatcher]
        DeltaEHMP[Delta-EHMP<br/>Sequoia]
        DeltaCustom[Delta-X<br/>Custom]
    end

    generic --> specialized

When: Domain-specific accuracy matters.
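
Once specialized variants exist, the orchestrator needs a way to pick one. A minimal routing sketch; the domain keys and agent IDs mirror the diagram above and are illustrative:

// Route a task to a domain-specific Delta variant, falling back to the generic agent.
// The mapping keys, agent IDs, and task.domain field are illustrative assumptions.
const DELTA_VARIANTS = {
  moneymatcher: 'Delta-MM',
  sequoia: 'Delta-EHMP',
};

function pickDeltaAgent(task) {
  return DELTA_VARIANTS[task.domain] ?? 'Delta-General';
}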

Scaling Strategies

Auto-Scaling Based on Queue Depth

flowchart TB
    Queue[Task Queue] --> Monitor[Monitor Depth]
    Monitor --> Check{Depth > Threshold?}
    Check -->|Yes| ScaleUp[Add Agent Instance]
    Check -->|No| CheckLow{Depth < Min?}
    CheckLow -->|Yes| ScaleDown[Remove Instance]
    CheckLow -->|No| Maintain[Maintain Count]

Configuration:

autoscaling:
  agent: ALPHA001
  min_instances: 1
  max_instances: 10
  scale_up_threshold: 100   # queue depth
  scale_down_threshold: 10
  cooldown_seconds: 60
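
A sketch of the control loop behind this configuration. The client methods (queueDepth, instanceCount, scale) are assumptions; the thresholds and cooldown come from the config above:

// One iteration of the queue-depth autoscaler described in the flowchart.
// `client` is assumed to expose queueDepth(), instanceCount(), and scale();
// `cfg` mirrors the YAML block (min/max instances, thresholds, cooldown).
async function autoscaleOnce(client, cfg, lastScaleAt) {
  if (Date.now() - lastScaleAt < cfg.cooldown_seconds * 1000) return lastScaleAt;

  const depth = await client.queueDepth(cfg.agent);
  const count = await client.instanceCount(cfg.agent);

  if (depth > cfg.scale_up_threshold && count < cfg.max_instances) {
    await client.scale(cfg.agent, { instances: count + 1 });   // add an instance
    return Date.now();
  }
  if (depth < cfg.scale_down_threshold && count > cfg.min_instances) {
    await client.scale(cfg.agent, { instances: count - 1 });   // remove an instance
    return Date.now();
  }
  return lastScaleAt;                                           // maintain count
}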

Time-Based Scaling

Scale proactively for known patterns:

schedule:
  - cron: "0 9 * * 1-5"    # 9 AM weekdays
    instances: 5
  - cron: "0 18 * * 1-5"   # 6 PM weekdays
    instances: 2
  - cron: "0 0 * * 6-7"    # Weekends
    instances: 1
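
One way to apply such a schedule from application code is a cron-style scheduler. A sketch assuming a Node.js environment with the node-cron package and the same hypothetical swarm.scale client; the entries mirror the YAML above:

// Register the scheduled scale changes with node-cron (an assumption; any
// cron-capable scheduler works). swarm.scale is the hypothetical client from this page.
const cron = require('node-cron');

const schedule = [
  { cron: '0 9 * * 1-5', instances: 5 },   // 9 AM weekdays
  { cron: '0 18 * * 1-5', instances: 2 },  // 6 PM weekdays
  { cron: '0 0 * * 6-7', instances: 1 },   // weekends
];

for (const entry of schedule) {
  cron.schedule(entry.cron, () => swarm.scale('ALPHA001', { instances: entry.instances }));
}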

Event-Based Scaling

Scale for specific events:

// Large batch import detected
if (batchSize > 1000) {
  await swarm.scale('ALPHA001', { instances: 5 });
  await swarm.scale('CHARLIE001', { instances: 3 });
}

Capacity Planning

Throughput Calculation

Agent Capacity = (1 / Avg Processing Time) × Instance Count

Example:
- Alpha processes 1 task in 200ms (5 tasks/sec)
- With 3 instances: 15 tasks/sec = 54,000 tasks/hour
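
The same arithmetic as a tiny helper; the numbers match the example above:

// Capacity in tasks/second = (1 / average processing time in seconds) × instance count.
function capacityPerSecond(avgProcessingMs, instanceCount) {
  return (1000 / avgProcessingMs) * instanceCount;
}

// Alpha: 200 ms per task, 3 instances -> 15 tasks/sec, 54,000 tasks/hour.
const perSecond = capacityPerSecond(200, 3);   // 15
const perHour = perSecond * 3600;              // 54000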

Cost Modeling

| Scale Level | Instances | Cost/Hour | Capacity  |
|-------------|-----------|-----------|-----------|
| Minimal     | 1 each    | $0.50     | 1,000/hr  |
| Standard    | 3 each    | $1.50     | 3,000/hr  |
| High        | 10 each   | $5.00     | 10,000/hr |
| Maximum     | 50 each   | $25.00    | 50,000/hr |

Load Balancing

Round-Robin

Simple rotation through instances:

flowchart LR
    LB[Load Balancer]
    LB --> |1| A1[Alpha-1]
    LB --> |2| A2[Alpha-2]
    LB --> |3| A3[Alpha-3]
    LB --> |4| A1
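
A minimal round-robin selector; the instance names are illustrative:

// Cycle through instances in order, wrapping around after the last one.
function makeRoundRobin(instances) {
  let next = 0;
  return () => instances[next++ % instances.length];
}

const pickAlpha = makeRoundRobin(['Alpha-1', 'Alpha-2', 'Alpha-3']);
// pickAlpha() -> Alpha-1, Alpha-2, Alpha-3, Alpha-1, ...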

Least Connections

Route to least busy instance:

flowchart TB
    Task[New Task]
    Task --> LB{Least Busy?}
    LB --> |3 active| A1[Alpha-1]
    LB --> |1 active| A2[Alpha-2]
    LB --> |5 active| A3[Alpha-3]
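
A least-connections selector, assuming each instance tracks its in-flight task count:

// Pick the instance with the fewest in-flight tasks.
function pickLeastBusy(instances) {
  return instances.reduce((best, inst) => (inst.active < best.active ? inst : best));
}

// Example matching the diagram: Alpha-2 wins with 1 active task.
pickLeastBusy([
  { name: 'Alpha-1', active: 3 },
  { name: 'Alpha-2', active: 1 },
  { name: 'Alpha-3', active: 5 },
]);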

Weighted Distribution

Prefer certain instances:

weights:
  alpha-1: 50   # Faster hardware
  alpha-2: 30
  alpha-3: 20   # Slower
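
A weighted-random selector using the weights above; a deterministic weighted round-robin would work just as well:

// Pick an instance with probability proportional to its weight.
function pickWeighted(weights) {
  const total = Object.values(weights).reduce((sum, w) => sum + w, 0);
  let roll = Math.random() * total;
  for (const [name, weight] of Object.entries(weights)) {
    roll -= weight;
    if (roll <= 0) return name;
  }
}

pickWeighted({ 'alpha-1': 50, 'alpha-2': 30, 'alpha-3': 20 }); // 'alpha-1' about half the time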

Bottleneck Identification

Monitoring Metrics

| Metric          | Healthy    | Warning | Critical |
|-----------------|------------|---------|----------|
| Queue Depth     | Under 50   | 50-200  | Over 200 |
| Processing Time | Under spec | +20%    | +50%     |
| Error Rate      | Under 1%   | 1-5%    | Over 5%  |
| Instance CPU    | Under 70%  | 70-90%  | Over 90% |

Common Bottlenecks

  1. Orchestrator Overload
     • Symptom: Lima CPU high, all agents idle
     • Fix: Optimize routing logic, add Lima instances
  2. Database Contention
     • Symptom: All agents slow, DB CPU high
     • Fix: Add read replicas, caching, query optimization
  3. External API Limits
     • Symptom: Echo failing, rate limit errors
     • Fix: Implement queuing, request API limit increase
  4. Network Saturation
     • Symptom: High latency across all agents
     • Fix: Optimize payloads, regional deployment

Best Practices

  1. Start small, scale gradually — Don't over-provision initially
  2. Monitor before scaling — Understand the actual bottleneck
  3. Test scale events — Verify scaling works before production
  4. Set resource limits — Prevent runaway costs
  5. Plan for bursts — Buffer capacity for spikes
  6. Scale down aggressively — Don't pay for idle capacity