
Scalability

Canyon scales horizontally at every layer: a stateless control plane, auto-scaling sandbox pools, multi-AZ managed databases, and a governed data layer that caches by default. Customers run hundreds of Canyon apps in production today.

Beta · Updated April 2026 · Reference · Canyon v1.0

Beta: these docs are in beta
Expect gaps and drift from the live product. Something unclear or missing? Grab 30 minutes with the team and we will walk you through it.
Talk to support

What it holds up to

Canyon is production infrastructure for customers running hundreds of apps on a single tenant. The shape of the scale problem is not millions of QPS — it is hundreds of simultaneous agent runs, thousands of concurrent sandboxes, and bursty generation workloads against governed data sources.

240+    apps in production on a single tenant
70      new apps shipped in 10 days at one customer
< 2s    cold start for a fresh sandbox
99.9%   control-plane SLO target

Control plane

The API is stateless and horizontally scalable. A load balancer fronts the cluster; a distributed cache handles session data and pub/sub events. Autoscaling is triggered by CPU saturation, request latency, and queue depth — whichever binds first.

CPU > 70%                → add a control-plane pod
Request latency > 500ms  → add a control-plane pod
Queue depth > 100        → add a control-plane pod
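The triggers above amount to a simple scale-out check. A minimal sketch, using the thresholds from the table (the metric and function names are illustrative, not Canyon's API):

```python
from dataclasses import dataclass

@dataclass
class ControlPlaneMetrics:
    cpu_utilisation: float  # 0.0-1.0, averaged across pods
    p95_latency_ms: float   # request latency, 95th percentile
    queue_depth: int        # pending jobs in the work queue

def should_scale_out(m: ControlPlaneMetrics) -> bool:
    """Add a control-plane pod when any trigger binds (whichever binds first)."""
    return (
        m.cpu_utilisation > 0.70
        or m.p95_latency_ms > 500
        or m.queue_depth > 100
    )
```

In a real deployment this decision lives in the autoscaler (for example a Kubernetes HPA evaluating the same three metrics), not in application code.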

Topology

Load balancer
      │
      ├──► control-plane pod 1 ──┐
      ├──► control-plane pod 2 ──┼──► Managed PostgreSQL (multi-AZ)
      ├──► control-plane pod 3 ──┤
      └──► control-plane pod N ──┴──► Distributed cache (clustered)

Sandbox fleet

Sandbox capacity auto-scales across two node pools: a cost-optimised pool (spot / preemptible) for ephemeral sandboxes and an on-demand pool for long-running or premium-tier workloads. A small pre-warmed buffer absorbs bursts without cold-start latency.

Cost-optimised pool
Ephemeral sandboxes on spot / preemptible capacity. Canyon drains gracefully on interruption and recreates on the next request.
On-demand pool
Reserved capacity for long-running generation jobs and premium workloads where interruption is unacceptable.
Pre-warm pools
Canyon keeps a small buffer of initialised sandboxes ready to attach. First sandbox attach returns in under two seconds, even under cold-cache conditions.
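The pool routing and pre-warm buffer described above can be sketched as follows. This is a simplified model, assuming a fixed buffer size and synchronous refill; the class, buffer size, and pool names are illustrative, not Canyon internals:

```python
import collections
import uuid

PREWARM_TARGET = 5  # illustrative buffer size, not Canyon's actual default

class SandboxFleet:
    """Sketch of pool selection plus a pre-warmed sandbox buffer."""

    def __init__(self) -> None:
        # Buffer of already-initialised sandboxes, ready to attach.
        self.prewarmed = collections.deque(
            f"sbx-{uuid.uuid4().hex[:8]}" for _ in range(PREWARM_TARGET)
        )

    def select_pool(self, long_running: bool, premium: bool) -> str:
        # Interruption-sensitive work goes on-demand; everything else rides spot.
        return "on-demand" if (long_running or premium) else "cost-optimised"

    def attach(self) -> str:
        # Serve from the warm buffer when possible; cold-start only when empty.
        if self.prewarmed:
            sandbox = self.prewarmed.popleft()
        else:
            sandbox = f"sbx-{uuid.uuid4().hex[:8]}"  # cold-start path
        # Top the buffer back up (done asynchronously in practice).
        self.prewarmed.append(f"sbx-{uuid.uuid4().hex[:8]}")
        return sandbox
```

The key property is that a burst of attaches drains the buffer before any request pays cold-start latency, which is what keeps first attach under two seconds.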

Data layer

Operational PostgreSQL
  • Multi-AZ deployment
  • Read replicas
  • Connection pooling
  • Auto-scaling storage
Distributed cache
  • Clustered
  • Session caching
  • Rate limiting
  • Pub/sub events
Canyon semantic layer
  • Pre-aggregations
  • Query caching
  • Rollup scheduling
  • Horizontal shards
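The caches-by-default behaviour of the semantic layer follows a standard cache-aside pattern: check the cache, fall through to PostgreSQL on a miss, then populate the cache. A minimal sketch, with an in-memory dict standing in for the clustered cache (all names and the TTL are illustrative):

```python
import hashlib
import json
import time

_cache: dict[str, tuple[float, object]] = {}  # stand-in for the clustered cache
TTL_SECONDS = 300  # illustrative; real TTLs track rollup scheduling

def cached_query(sql: str, params: dict, run_query) -> object:
    """Cache-aside: serve from cache when fresh, otherwise hit the database."""
    key = hashlib.sha256(
        json.dumps([sql, params], sort_keys=True).encode()
    ).hexdigest()
    entry = _cache.get(key)
    if entry and time.monotonic() - entry[0] < TTL_SECONDS:
        return entry[1]                       # cache hit
    result = run_query(sql, params)           # cache miss: query PostgreSQL
    _cache[key] = (time.monotonic(), result)  # populate for the next caller
    return result
```

Pre-aggregations work the same way one level earlier: the rollup scheduler populates the cache proactively so hot queries rarely reach the operational database at all.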

Model traffic

Model usage is bounded by prompt caching, batching, and per-role model routing — small fast models for classification, larger reasoning models for generation. Rate limits are handled with exponential backoff; token budgets are enforced per request at the orchestrator.

Prompt caching
Large cached system prompts shared across Intent, Planner and Conversation. Significant cost reductions on cache hits.
Request batching
Where the provider supports it, validation and multi-file operations are batched into single API calls.
Per-role model routing
Intent uses a fast, cheap classifier. Planner and Builder use the larger reasoning model. Model selection is configurable per organisation.
Provider failover
Automatic failover to a secondary provider or region when a rate cap is hit.
Token budgets
Per-request budgets enforced at the orchestrator level. A single user turn cannot exceed its allotted tokens.
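The backoff and failover behaviour above can be sketched in a few lines. The provider interface, exception name, and retry parameters here are assumptions for illustration, not Canyon's actual client:

```python
import time

class RateLimited(Exception):
    """Raised by a provider client when a rate cap is hit (illustrative)."""

def call_with_backoff(providers, request, max_attempts=4, base_delay=0.5):
    """Retry each provider with exponential backoff; fail over to the next
    provider once the current one exhausts its attempts."""
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return provider(request)
            except RateLimited:
                # 0.5s, 1s, 2s, 4s with the default base_delay
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all providers rate-limited")
```

Token budgets slot in one level above this: the orchestrator checks the request's remaining budget before each call and refuses the call (rather than retrying) once the turn's allotment is spent.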
Multi-region deployments
Active-active multi-region deployments are in alpha with design partners for customers with strict latency or data-residency requirements. Talk to us if that’s relevant.