# Scaling Swarms

How to grow swarm capacity as demand increases.
## Scaling Dimensions
### 1. Horizontal Scaling (More Instances)

Add more instances of the same agent:
```mermaid
flowchart LR
    subgraph before [Before: 1 Alpha]
        A1[Alpha-1]
    end
    subgraph after [After: 3 Alphas]
        A1b[Alpha-1]
        A2[Alpha-2]
        A3[Alpha-3]
    end
    before -->|Scale Out| after
```
**When:** Single agent can't keep up with volume.
### 2. Vertical Scaling (Bigger Agents)

Use more powerful models or hardware:
| Level | Model | Cost | Speed |
|---|---|---|---|
| Base | GPT-3.5 | $0.001 | Fast |
| Standard | GPT-4o | $0.01 | Medium |
| Premium | Claude Opus | $0.03 | Slower |
**When:** Quality needs improvement, not volume.
### 3. Specialization (More Agent Types)

Add specialized agents for specific domains:
```mermaid
flowchart TB
    subgraph generic [Generic]
        Delta[Delta - General]
    end
    subgraph specialized [Specialized]
        DeltaMM[Delta-MM<br/>MoneyMatcher]
        DeltaEHMP[Delta-EHMP<br/>Sequoia]
        DeltaCustom[Delta-X<br/>Custom]
    end
    generic --> specialized
```
**When:** Domain-specific accuracy matters.
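Specialization implies a routing step before dispatch: tasks go to a domain agent when one exists, otherwise to the generic one. A minimal sketch, using the agent names from the diagram above; the routing table and the `routeTask` helper are hypothetical, not part of any real swarm API:

```javascript
// Hypothetical domain → specialized-agent routing table,
// mirroring the diagram above
const routes = {
  moneymatcher: 'Delta-MM',
  sequoia: 'Delta-EHMP',
};

// Pick a specialized agent if the task's domain has one,
// otherwise fall back to the generic Delta agent
function routeTask(task) {
  return routes[task.domain] || 'Delta';
}
```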
## Scaling Strategies
### Auto-Scaling Based on Queue Depth
```mermaid
flowchart TB
    Queue[Task Queue] --> Monitor[Monitor Depth]
    Monitor --> Check{Depth > Threshold?}
    Check -->|Yes| ScaleUp[Add Agent Instance]
    Check -->|No| CheckLow{Depth < Min?}
    CheckLow -->|Yes| ScaleDown[Remove Instance]
    CheckLow -->|No| Maintain[Maintain Count]
```
Configuration:

```yaml
autoscaling:
  agent: ALPHA001
  min_instances: 1
  max_instances: 10
  scale_up_threshold: 100  # queue depth
  scale_down_threshold: 10
  cooldown_seconds: 60
```
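The decision logic behind that config can be sketched as a pure function: respect the cooldown, add an instance above the upper threshold, remove one below the lower threshold, and clamp to the min/max bounds. This is an illustrative sketch (the camelCase config mirrors the YAML keys above; `decideScale` is not a real API):

```javascript
// Mirrors the autoscaling config above (snake_case → camelCase)
const config = {
  minInstances: 1,
  maxInstances: 10,
  scaleUpThreshold: 100,  // queue depth
  scaleDownThreshold: 10,
  cooldownSeconds: 60,
};

// Returns the desired instance count for the next tick
function decideScale(queueDepth, currentInstances, secondsSinceLastScale, cfg = config) {
  // Still cooling down from the last scale event: hold steady
  if (secondsSinceLastScale < cfg.cooldownSeconds) return currentInstances;
  if (queueDepth > cfg.scaleUpThreshold) {
    return Math.min(currentInstances + 1, cfg.maxInstances);
  }
  if (queueDepth < cfg.scaleDownThreshold) {
    return Math.max(currentInstances - 1, cfg.minInstances);
  }
  return currentInstances; // within the healthy band: maintain count
}
```

Scaling one instance at a time with a cooldown avoids thrashing when queue depth oscillates around a threshold.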
### Time-Based Scaling

Scale proactively for known patterns:

```yaml
schedule:
  - cron: "0 9 * * 1-5"   # 9 AM weekdays
    instances: 5
  - cron: "0 18 * * 1-5"  # 6 PM weekdays
    instances: 2
  - cron: "0 0 * * 6-7"   # Weekends
    instances: 1
```
### Event-Based Scaling

Scale for specific events:

```javascript
// Large batch import detected
if (batchSize > 1000) {
  await swarm.scale('ALPHA001', { instances: 5 });
  await swarm.scale('CHARLIE001', { instances: 3 });
}
```
## Capacity Planning

### Throughput Calculation

```
Agent Capacity = (1 / Avg Processing Time) × Instance Count
```

Example:

- Alpha processes 1 task in 200 ms (5 tasks/sec)
- With 3 instances: 15 tasks/sec = 54,000 tasks/hour
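The formula and the Alpha example translate directly to code (times in milliseconds; the helper name is illustrative):

```javascript
// Agent Capacity = (1 / Avg Processing Time) × Instance Count
// avgProcessingMs is per-task processing time in milliseconds
function tasksPerSecond(avgProcessingMs, instanceCount) {
  return (1000 / avgProcessingMs) * instanceCount;
}

// Alpha example from above: 200 ms per task, 3 instances
const perSec = tasksPerSecond(200, 3); // 15 tasks/sec
const perHour = perSec * 3600;         // 54,000 tasks/hour
```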
### Cost Modeling
| Scale Level | Instances | Cost/Hour | Capacity |
|---|---|---|---|
| Minimal | 1 each | $0.50 | 1,000/hr |
| Standard | 3 each | $1.50 | 3,000/hr |
| High | 10 each | $5.00 | 10,000/hr |
| Maximum | 50 each | $25.00 | 50,000/hr |
## Load Balancing
### Round-Robin

Simple rotation through instances:
```mermaid
flowchart LR
    LB[Load Balancer]
    LB -->|1| A1[Alpha-1]
    LB -->|2| A2[Alpha-2]
    LB -->|3| A3[Alpha-3]
    LB -->|4| A1
```
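A round-robin balancer is a counter modulo the instance count, wrapping back to the first instance as in step 4 above. A minimal sketch (`roundRobin` is illustrative, not a library function):

```javascript
// Returns a picker that cycles through instances in order
function roundRobin(instances) {
  let next = 0;
  return () => instances[next++ % instances.length];
}

const pick = roundRobin(['Alpha-1', 'Alpha-2', 'Alpha-3']);
// Successive calls: Alpha-1, Alpha-2, Alpha-3, then Alpha-1 again
```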
### Least Connections

Route to least busy instance:
```mermaid
flowchart TB
    Task[New Task]
    Task --> LB{Least Busy?}
    LB -->|3 active| A1[Alpha-1]
    LB -->|1 active| A2[Alpha-2]
    LB -->|5 active| A3[Alpha-3]
```
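Least-connections selection just scans the active-task counts for the minimum. A sketch using the counts from the diagram, where Alpha-2 wins with 1 active task (`leastConnections` is illustrative):

```javascript
// active: map of instance name → current active task count
function leastConnections(active) {
  return Object.entries(active).reduce((best, cur) =>
    cur[1] < best[1] ? cur : best
  )[0];
}

const chosen = leastConnections({ 'Alpha-1': 3, 'Alpha-2': 1, 'Alpha-3': 5 });
// chosen === 'Alpha-2'
```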
### Weighted Distribution

Prefer certain instances:

```yaml
weights:
  alpha-1: 50  # Faster hardware
  alpha-2: 30
  alpha-3: 20  # Slower
```
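One common way to implement weighted distribution is a weighted random pick: each instance's share of traffic is proportional to its weight (50/30/20 here). A sketch under that assumption (`weightedPick` is illustrative; `rand` is injectable for testing):

```javascript
const weights = { 'alpha-1': 50, 'alpha-2': 30, 'alpha-3': 20 };

// Roll a point in [0, totalWeight) and walk the weights until it lands
function weightedPick(weights, rand = Math.random()) {
  const total = Object.values(weights).reduce((a, b) => a + b, 0);
  let roll = rand * total;
  for (const [name, w] of Object.entries(weights)) {
    roll -= w;
    if (roll < 0) return name;
  }
}
```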
## Bottleneck Identification

### Monitoring Metrics
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Queue Depth | Under 50 | 50-200 | Over 200 |
| Processing Time | Under spec | +20% | +50% |
| Error Rate | Under 1% | 1-5% | Over 5% |
| Instance CPU | Under 70% | 70-90% | Over 90% |
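Each metric in the table maps to a three-way threshold check; queue depth, for example, looks like this (the other metrics follow the same pattern with their own bounds; `queueDepthStatus` is an illustrative helper):

```javascript
// Thresholds from the monitoring table: <50 healthy, 50-200 warning, >200 critical
function queueDepthStatus(depth) {
  if (depth < 50) return 'healthy';
  if (depth <= 200) return 'warning';
  return 'critical';
}
```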
### Common Bottlenecks

1. **Orchestrator Overload**
   - Symptom: Lima CPU high, all agents idle
   - Fix: Optimize routing logic, add Lima instances
2. **Database Contention**
   - Symptom: All agents slow, DB CPU high
   - Fix: Add read replicas, caching, query optimization
3. **External API Limits**
   - Symptom: Echo failing, rate limit errors
   - Fix: Implement queuing, request API limit increase
4. **Network Saturation**
   - Symptom: High latency across all agents
   - Fix: Optimize payloads, regional deployment
## Best Practices

- **Start small, scale gradually**: Don't over-provision initially
- **Monitor before scaling**: Understand the actual bottleneck
- **Test scale events**: Verify scaling works before production
- **Set resource limits**: Prevent runaway costs
- **Plan for bursts**: Buffer capacity for spikes
- **Scale down aggressively**: Don't pay for idle capacity