Scalability
Canyon scales horizontally at every layer: a stateless control plane, auto-scaling sandbox pools, multi-AZ managed databases, and a governed data layer that caches by default. Customers run hundreds of Canyon apps in production today.
What it holds up to
Canyon is production infrastructure for customers running hundreds of apps on a single tenant. The shape of the scale problem is not millions of QPS — it is hundreds of simultaneous agent runs, thousands of concurrent sandboxes, and bursty generation workloads against governed data sources.
Control plane
The API is stateless and horizontally scalable. A load balancer fronts the cluster; a distributed cache handles session data and pub/sub events. Autoscaling is triggered by CPU saturation, request latency, and queue depth — whichever binds first.
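The whichever-binds-first policy amounts to scaling out as soon as any one signal crosses its threshold. A minimal sketch, using the trigger thresholds from this section (the metric field names are illustrative, not Canyon's actual API):

```python
from dataclasses import dataclass

@dataclass
class ControlPlaneMetrics:
    cpu_utilisation: float   # 0.0 - 1.0, averaged across control-plane pods
    p99_latency_ms: float    # request latency
    queue_depth: int         # pending work items

# Illustrative thresholds matching the scaling triggers in this section.
CPU_THRESHOLD = 0.70
LATENCY_THRESHOLD_MS = 500
QUEUE_THRESHOLD = 100

def should_add_pod(m: ControlPlaneMetrics) -> bool:
    """Scale out when any one signal binds -- whichever crosses first."""
    return (
        m.cpu_utilisation > CPU_THRESHOLD
        or m.p99_latency_ms > LATENCY_THRESHOLD_MS
        or m.queue_depth > QUEUE_THRESHOLD
    )
```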
- CPU > 70% → add a control-plane pod
- Request latency > 500 ms → add a control-plane pod
- Queue depth > 100 → add a control-plane pod

Topology
Load balancer
│
├──► control-plane pod 1 ──┐
├──► control-plane pod 2 ──┼──► Managed PostgreSQL (multi-AZ)
├──► control-plane pod 3 ──┤
└──► control-plane pod N ──┴──► Distributed cache (clustered)

Sandbox fleet
Sandbox capacity auto-scales across two node pools: a cost-optimised pool (spot / preemptible) for ephemeral sandboxes and an on-demand pool for long-running or premium-tier workloads. A small pre-warmed buffer absorbs bursts without cold-start latency.
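One way to model the two-pool placement with a pre-warmed buffer. This is a sketch under assumptions: the pool names, buffer size, and `acquire` interface are illustrative, not Canyon's actual API.

```python
from collections import deque

class SandboxFleet:
    """Sketch: two node pools plus a pre-warmed buffer for burst absorption."""

    def __init__(self, warm_buffer_size: int = 5):
        # Pre-warmed sandboxes absorb bursts without cold-start latency.
        self.warm = deque(f"warm-{i}" for i in range(warm_buffer_size))
        self.next_id = 0

    def acquire(self, long_running: bool = False) -> tuple[str, str]:
        # Long-running / premium-tier workloads go to on-demand nodes;
        # ephemeral sandboxes go to the cost-optimised spot pool.
        pool = "on-demand" if long_running else "spot"
        if not long_running and self.warm:
            return self.warm.popleft(), pool   # burst absorbed, no cold start
        self.next_id += 1
        return f"cold-{self.next_id}", pool    # cold start: boot a fresh sandbox
```

A real fleet would also refill the warm buffer in the background; that bookkeeping is omitted here.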
Data layer
Managed PostgreSQL
- Multi-AZ deployment
- Read replicas
- Connection pooling
- Auto-scaling storage

Distributed cache
- Clustered
- Session caching
- Rate limiting
- Pub/sub events

Governed data layer
- Pre-aggregations
- Query caching
- Rollup scheduling
- Horizontal shards
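The caches-by-default behaviour of the governed data layer can be sketched as a read-through cache in front of the query source. The class, TTL, and `run_query` hook below are illustrative assumptions, not the actual implementation:

```python
import hashlib
import time

class QueryCache:
    """Read-through sketch: serve cached results, fall back to the source."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self.store: dict[str, tuple[float, object]] = {}

    def fetch(self, sql: str, run_query) -> object:
        key = hashlib.sha256(sql.encode()).hexdigest()
        hit = self.store.get(key)
        if hit and time.monotonic() - hit[0] < self.ttl:
            return hit[1]                       # cache hit: no source round-trip
        result = run_query(sql)                 # cache miss: execute against source
        self.store[key] = (time.monotonic(), result)
        return result
```

Pre-aggregations and rollup scheduling sit behind the same idea: answer from precomputed results when possible, touch the source only on a miss.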
Model traffic
Model usage is bounded by prompt caching, request batching, and per-role model routing: small, fast models for classification, larger reasoning models for generation. Provider rate limits are handled with exponential backoff, and token budgets are enforced per request at the orchestrator.
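Per-role routing, backoff, and a per-request token budget can be sketched together. The model names, budget, retry count, and `call_model` hook are all illustrative assumptions:

```python
import random
import time

# Illustrative role -> model map: small fast models for classification,
# larger reasoning models for generation.
MODEL_BY_ROLE = {
    "classify": "small-fast-model",
    "generate": "large-reasoning-model",
}

MAX_TOKENS_PER_REQUEST = 8_000  # token budget enforced at the orchestrator

class RateLimited(Exception):
    """Raised by call_model when the provider rate-limits the request."""

def run_with_backoff(role: str, prompt_tokens: int, call_model, retries: int = 5):
    if prompt_tokens > MAX_TOKENS_PER_REQUEST:
        raise ValueError("request exceeds its token budget")
    model = MODEL_BY_ROLE[role]
    for attempt in range(retries):
        try:
            return call_model(model, prompt_tokens)
        except RateLimited:
            # Exponential backoff with jitter (scaled down here for brevity).
            time.sleep(min(2 ** attempt, 30) * random.random() * 0.01)
    raise RateLimited(f"gave up after {retries} attempts on {model}")
```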