
Scalability

Canyon scales horizontally at every layer: a stateless control plane, auto-scaling sandbox pools, multi-AZ managed databases, and a governed data layer that caches by default. Customers run hundreds of Canyon apps in production today.

Beta · Updated April 2026 · Reference · Canyon v1.0

Beta: these docs are in beta
Expect gaps and drift from the live product. Something unclear or missing? Grab 30 minutes with the team and we will walk you through it.
Talk to support

What it holds up to

Canyon is production infrastructure for customers running hundreds of apps on a single tenant. The shape of the scale problem is not millions of QPS — it is hundreds of simultaneous agent runs, thousands of concurrent sandboxes, and bursty generation workloads against governed data sources.

240+    apps in production on a single tenant
70      new apps shipped in 10 days at one customer
< 2s    cold start for a fresh sandbox
99.9%   control-plane SLO target

Control plane

The API is stateless and horizontally scalable. A load balancer fronts the cluster; a distributed cache handles session data and pub/sub events. Autoscaling is triggered by CPU saturation, request latency, and queue depth — whichever binds first.

CPU > 70%                → add a control-plane pod
Request latency > 500ms  → add a control-plane pod
Queue depth > 100        → add a control-plane pod
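The triggers above amount to a simple scale-out check. A minimal sketch, using the thresholds from the table (the metric and function names are illustrative, not Canyon's API):

```python
from dataclasses import dataclass

@dataclass
class ControlPlaneMetrics:
    cpu_utilisation: float  # 0.0-1.0, averaged across pods
    p95_latency_ms: float   # request latency, 95th percentile
    queue_depth: int        # pending jobs in the work queue

def should_scale_out(m: ControlPlaneMetrics) -> bool:
    """Add a control-plane pod when any trigger binds (whichever binds first)."""
    return (
        m.cpu_utilisation > 0.70
        or m.p95_latency_ms > 500
        or m.queue_depth > 100
    )
```

In a real deployment this decision lives in the autoscaler (for example a Kubernetes HPA evaluating the same three metrics), not in application code.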

Topology

Load balancer
      │
      ├──► control-plane pod 1 ──┐
      ├──► control-plane pod 2 ──┼──► Managed PostgreSQL (multi-AZ)
      ├──► control-plane pod 3 ──┤
      └──► control-plane pod N ──┴──► Distributed cache (clustered)

Sandbox fleet

Sandbox capacity auto-scales across two node pools: a cost-optimised pool (spot / preemptible) for ephemeral sandboxes and an on-demand pool for long-running or premium-tier workloads. A small pre-warmed buffer absorbs bursts without cold-start latency.

Cost-optimised pool
Ephemeral sandboxes on spot / preemptible capacity. Canyon drains gracefully on interruption and recreates on the next request.
On-demand pool
Reserved capacity for long-running generation jobs and premium workloads where interruption is unacceptable.
Pre-warm pools
Canyon keeps a small buffer of initialised sandboxes ready to attach. First sandbox attach returns in under two seconds, even under cold-cache conditions.
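The pool routing and pre-warm buffer described above can be sketched as follows. This is a simplified model, assuming a fixed buffer size and synchronous refill; the class, buffer size, and pool names are illustrative, not Canyon internals:

```python
import collections
import uuid

PREWARM_TARGET = 5  # illustrative buffer size, not Canyon's actual default

class SandboxFleet:
    """Sketch of pool selection plus a pre-warmed sandbox buffer."""

    def __init__(self) -> None:
        # Buffer of already-initialised sandboxes, ready to attach.
        self.prewarmed = collections.deque(
            f"sbx-{uuid.uuid4().hex[:8]}" for _ in range(PREWARM_TARGET)
        )

    def select_pool(self, long_running: bool, premium: bool) -> str:
        # Interruption-sensitive work goes on-demand; everything else rides spot.
        return "on-demand" if (long_running or premium) else "cost-optimised"

    def attach(self) -> str:
        # Serve from the warm buffer when possible; cold-start only when empty.
        if self.prewarmed:
            sandbox = self.prewarmed.popleft()
        else:
            sandbox = f"sbx-{uuid.uuid4().hex[:8]}"  # cold-start path
        # Top the buffer back up (done asynchronously in practice).
        self.prewarmed.append(f"sbx-{uuid.uuid4().hex[:8]}")
        return sandbox
```

The key property is that a burst of attaches drains the buffer before any request pays cold-start latency, which is what keeps first attach under two seconds.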

Data layer

Operational PostgreSQL
  • Multi-AZ deployment
  • Read replicas
  • Connection pooling
  • Auto-scaling storage
Distributed cache
  • Clustered
  • Session caching
  • Rate limiting
  • Pub/sub events
Canyon semantic layer
  • Pre-aggregations
  • Query caching
  • Rollup scheduling
  • Horizontal shards
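The caches-by-default behaviour of the semantic layer follows a standard cache-aside pattern: check the cache, fall through to PostgreSQL on a miss, then populate the cache. A minimal sketch, with an in-memory dict standing in for the clustered cache (all names and the TTL are illustrative):

```python
import hashlib
import json
import time

_cache: dict[str, tuple[float, object]] = {}  # stand-in for the clustered cache
TTL_SECONDS = 300  # illustrative; real TTLs track rollup scheduling

def cached_query(sql: str, params: dict, run_query) -> object:
    """Cache-aside: serve from cache when fresh, otherwise hit the database."""
    key = hashlib.sha256(
        json.dumps([sql, params], sort_keys=True).encode()
    ).hexdigest()
    entry = _cache.get(key)
    if entry and time.monotonic() - entry[0] < TTL_SECONDS:
        return entry[1]                       # cache hit
    result = run_query(sql, params)           # cache miss: query PostgreSQL
    _cache[key] = (time.monotonic(), result)  # populate for the next caller
    return result
```

Pre-aggregations work the same way one level earlier: the rollup scheduler populates the cache proactively so hot queries rarely reach the operational database at all.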

Model traffic

Model usage is bounded by prompt caching, batching, and per-role model routing — small fast models for classification, larger reasoning models for generation. Rate limits are handled with exponential backoff; token budgets are enforced per request at the orchestrator.

Prompt caching
Large cached system prompts shared across Intent, Planner and Conversation. Significant cost reductions on cache hits.
Request batching
Where the provider supports it, validation and multi-file operations are batched into single API calls.
Per-role model routing
Intent uses a fast, cheap classifier. Planner and Builder use the larger reasoning model. Model selection is configurable per organisation.
Provider failover
Automatic failover to a secondary provider or region when a rate cap is hit.
Token budgets
Per-request budgets enforced at the orchestrator level. A single user turn cannot exceed its allotted tokens.
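The backoff and failover behaviour above can be sketched in a few lines. The provider interface, exception name, and retry parameters here are assumptions for illustration, not Canyon's actual client:

```python
import time

class RateLimited(Exception):
    """Raised by a provider client when a rate cap is hit (illustrative)."""

def call_with_backoff(providers, request, max_attempts=4, base_delay=0.5):
    """Retry each provider with exponential backoff; fail over to the next
    provider once the current one exhausts its attempts."""
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return provider(request)
            except RateLimited:
                # 0.5s, 1s, 2s, 4s with the default base_delay
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all providers rate-limited")
```

Token budgets slot in one level above this: the orchestrator checks the request's remaining budget before each call and refuses the call (rather than retrying) once the turn's allotment is spent.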
Multi-region deployments
Active-active multi-region deployments are in alpha with design partners for customers with strict latency or data-residency requirements. Talk to us if that’s relevant.