Heatwave Kamal Stack — Architecture & Index
The containerized deployment stack that replaced Capistrano + Passenger. This directory is the single source of truth for the new infrastructure.
| Doc | Covers |
|---|---|
| README.md (this file) | Stack inventory, status, master architecture + network diagrams |
| DEPLOYING.md | The deploy guidebook — bin/deploy, the deploy lifecycle, migrations, rollback |
| MANAGING.md | Day-2 operations — accessories, DB restore, mailpit, secrets, scaling, provisioning a new box |
| TROUBLESHOOTING.md | Runbook for the failure modes we’ve actually hit |
Status (2026-06-14). Production and staging both run on Kamal on Latitude bare-metal. Dallas (`dal-latitude-heatwave-01`, Tailscale `100.123.47.52`) is the primary and hosts both environments; a cross-DC PostgreSQL standby runs in Chicago (`chi-latitude-heatwave-02`, `100.68.157.49`). The Capistrano + Passenger + Vultr stack was retired at the 2026-06-07 cutover. Historical record: `doc/tasks/202606022303_KAMAL_MIGRATION.md` (cutover) and `doc/tasks/202606112045_DB_TIER_HA_ARCHITECTURE.md` (two-region HA end-state).

Note: the network diagram and a few body sections below still describe the pre-cutover Vultr topology and are being refreshed.
What changed vs. Capistrano
| Concern | Old (Capistrano + Passenger) | New (Kamal) |
|---|---|---|
| Unit of deploy | git pull + bundle on the host | An OCI image built once, pushed, rolled out |
| Web server | Passenger (Apache/nginx) | Thruster → Puma in a container |
| Zero-downtime | Passenger restart | kamal-proxy rolling swap on a /up health check |
| Ingress | nginx + origin TLS | Cloudflare Tunnel (no public ports, no origin TLS) |
| Asset bridging | linked_dirs public/javascripts/webpack | Kamal asset_path host volume |
| Datastores | External Postgres / Managed Valkey | Kamal accessories (PG18 + Valkey ×3 + pgbouncer + HAProxy), co-located on the Latitude boxes in both envs |
| Secrets | config/master.key on host | 1Password resolver-only .kamal/secrets* (tracked in git) |
| Deploy command | bin/deploy | bin/deploy → kamal deploy |
| Provisioning | Hand-built hosts | Terraform/OpenTofu (infra/terraform/) + cloud-init |
Stack inventory
Every moving piece of the new stack and where it’s configured.
Compute & orchestration
- Kamal 2.x: orchestrates build → push → rolling deploy. Config: `config/deploy.yml` (base/prod), `config/deploy.staging.yml` (staging overrides).
- kamal-proxy: per-host reverse proxy giving zero-downtime rolling swaps. Listens on host `:80`, health-checks `/up`, no TLS (`ssl: false`).
- Docker: installed by cloud-init (`get.docker.com`). All app + accessory containers attach to the `kamal` docker network and resolve each other by name.
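
The rolling swap above depends on the new container answering `/up` before traffic moves to it. The Python sketch below models only that gate. It is not kamal-proxy's implementation, and `wait_until_healthy`, `rolling_swap`, `probe`, `attempts`, and `delay` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll a health probe; return True as soon as it passes, False if it never does."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before asking again
    return False


def rolling_swap(current: str, candidate: str, probe: Callable[[], bool], **kwargs) -> str:
    """Return the container that should serve traffic after the attempted swap."""
    # Traffic only moves once the candidate is healthy; otherwise the old
    # container keeps serving, so a bad release never takes requests.
    return candidate if wait_until_healthy(probe, **kwargs) else current
```

In a real check, `probe` would presumably issue an HTTP GET against `/up` and treat a success status as healthy; here it is injected so the gate logic can be exercised without a running container.
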
The application image
- `Dockerfile`: multi-stage (`base` → `build` → `final`), base `ruby:4.0.5-slim`. Build stage compiles gems + Yarn 4 / webpack assets; final stage is a slim runtime (gems + app + built assets, non-root `rails` user uid 1001). Entry: `bin/docker-entrypoint`; CMD `bin/thrust bin/rails server`.
- Thruster: HTTP/2 + X-Sendfile front, listens `:80`, proxies to Puma `:3000`.
- Registry: everything is on GitHub Container Registry (no Vultr CR). The app image is `ghcr.io/warmlyyours/heatwave` and the custom Postgres accessory image is `ghcr.io/warmlyyours/heatwave-postgres:18` (the host’s single `ghcr.io` login covers both).
Roles (containers Kamal runs)
- `web`: Puma (4 workers × 3 threads, jemalloc), behind kamal-proxy.
- `sidekiq`: a single consolidated Sidekiq process (`SIDEKIQ_CONSOLIDATED=1`) running the high/low/campaign capsules + the `default` set + the scheduler in one container. `cmd: bundle exec sidekiq -C config/sidekiq.yml`. Sidekiq Pro `super_fetch` makes rolling restarts safe; `.kamal/hooks/pre-deploy` quiets it (TSTP) before the swap.
Accessories (co-located on the box; staging detail below)
Both environments run their datastores as Kamal accessories on the Latitude boxes (prod splits Postgres across Dallas + Chicago; see the note below the list). The staging-specific accessories are:
- `postgres`: custom PG18 image (`ghcr.io/warmlyyours/heatwave-postgres:18`, built from `docker/postgresql.Dockerfile`) with `pgvector`, `hypopg`, `pg_repack`, `pg_stat_statements`. Tuned down (`shared_buffers=8GB`) because the box is shared with the prod stack. Data on a host volume. Host-published `127.0.0.1:5432` for local `psql`; the app reaches it as `heatwave-postgres` on the `kamal` network.
- Valkey ×3: `valkey/valkey:9.1` in a 3-flavor split of `heatwave-staging-valkey-cache` (`allkeys-lru`), `-sessions` (`noeviction`), and `-queue` (`noeviction` + AOF). `RedisConfig` routes to them per logical DB via `REDIS_CACHE_HOST` / `REDIS_SESSIONS_HOST` / `REDIS_QUEUE_HOST` (no single `REDIS_HOST`). Internal to the `kamal` network, not host-published. Mirrors the prod split (`heatwave-valkey-{cache,sessions,queue}`).
- `mailpit`: SMTP sink + web UI. App/sidekiq deliver to `heatwave-mailpit:1025`; the UI is bound to the Tailscale interface only (`http://100.123.47.52:8025`), so captured staging mail (reset tokens etc.) is never publicly exposed.
Production runs the same accessories, just split across two Latitude boxes: a PG18 primary in Dallas (`heatwave-postgres`) with a cross-DC streaming standby in Chicago (`heatwave-postgres-replica`), fronted by per-node `pgbouncer` and a TCP write-VIP HAProxy (`heatwave-haproxy:6433`, the app’s `DATABASE_HOST`) so a `pg_promote` flip reroutes with no app redeploy; the same 3-flavor Valkey split (`heatwave-valkey-cache` / `-sessions` / `-queue`); and Databasus PITR → Cloudflare R2 backups off the Chicago standby. The old Vultr Postgres (db4/db3) and Vultr Managed Valkey are gone. Full current topology, hosts, ports, and image tags: `doc/infrastructure/INFRASTRUCTURE_INVENTORY.md` and `doc/tasks/202606112045_DB_TIER_HA_ARCHITECTURE.md`.
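
The per-role Valkey routing described above can be pictured with a small Python sketch. The real `RedisConfig` lives in the application and is not shown in this document, so treat the following as an illustration only: the environment variable names come from the text, while `ROLE_ENV_VARS` and `valkey_host_for` are hypothetical.

```python
import os
from typing import Mapping, Optional

# Env var names come from the accessory description above; the role keys
# mirror the three Valkey flavors (cache / sessions / queue).
ROLE_ENV_VARS = {
    "cache": "REDIS_CACHE_HOST",
    "sessions": "REDIS_SESSIONS_HOST",
    "queue": "REDIS_QUEUE_HOST",
}


def valkey_host_for(role: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the Valkey host for one role (hypothetical helper)."""
    env = os.environ if env is None else env
    if role not in ROLE_ENV_VARS:
        raise ValueError(f"unknown Valkey role: {role!r}")
    var = ROLE_ENV_VARS[role]
    host = env.get(var)
    if not host:
        # There is deliberately no shared REDIS_HOST to fall back to.
        raise KeyError(f"{var} is not set")
    return host
```

Failing loudly on a missing variable, rather than falling back to a shared host, keeps a misconfigured role from silently writing queue data into an instance that evicts keys.
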
Ingress & network
- Cloudflare Tunnel (`cloudflared`, host systemd service, remotely managed: ingress is configured in Cloudflare, not on the box). Outbound-only QUIC; the only inbound web path. Routes `crm/www/api/mcp.warmlyyours.ws → http://localhost:80`.
- Cloudflare Access: SSO gate (the `wy-employees` group) in front of every staging hostname.
- Tailscale: the admin/SSH plane (and, in the HA end-state, cross-region DB replication). Hosts get `100.x` addresses; SSH is Tailscale-only.
- Firewall, defense-in-depth: Latitude edge firewall (SSH from the Tailscale CGNAT range `100.64.0.0/10` only) + host UFW (default-deny inbound, allow `lo` + `tailscale0` + `:22`) + a DOCKER-USER iptables chain that blocks public `:80`/`:443` (Docker bypasses UFW for published ports) + Cloudflare Access.
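
As a quick check on the edge-firewall rule above, this Python sketch tests whether a source address falls inside the Tailscale CGNAT range. It uses only the standard-library `ipaddress` module; `ssh_allowed_at_edge` is a hypothetical helper name, and the real rule is enforced by the Latitude firewall, not by application code.

```python
import ipaddress

# Range quoted in the firewall bullet above.
TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def ssh_allowed_at_edge(source_ip: str) -> bool:
    """True if source_ip is inside the CGNAT range the edge firewall admits for SSH."""
    return ipaddress.ip_address(source_ip) in TAILSCALE_CGNAT
```

Both Tailscale addresses named in this document (`100.123.47.52` and `100.68.157.49`) fall inside the range, while a public address such as the old `45.63.79.22` does not.
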
Secrets
- `.kamal/secrets-common` (shared): `RAILS_MASTER_KEY` (= `config/master.key`), `BUNDLE_GEMS__CONTRIBSYS__COM` (Sidekiq Pro), `KAMAL_REGISTRY_PASSWORD` (GHCR).
- `.kamal/secrets.staging`: staging PG password + the `staging` `Heatwave::Configuration` env-key.
- `.kamal/secrets`: prod PG password + `production` env-key (`op://IT/Heatwave-Postgres` must be created before cutover).
- All three are resolver-only (Kamal’s 1Password adapter, no literal secrets) and therefore committed. See MANAGING.md → Secrets.
Provisioning (Infrastructure as Code)
- `infra/terraform/latitude/`: provisions a Latitude bare-metal box with SSH keys, cloud-init (deploy user uid 1001, Docker, Tailscale, UFW + DOCKER-USER, cloudflared), RAID-1, and the edge firewall.
- `infra/terraform/cloudflare/`: the tunnel (remotely managed) + DNS CNAMEs + Access app/policy for `*.warmlyyours.ws`.
- `infra/terraform/` (root): the original Vultr provisioning module (being retired in favour of Latitude).
Deploy tooling & lifecycle hooks
- `bin/deploy`: the wrapper around `kamal deploy` (clean-tree gate, 1Password unlock, gated migrations, sourcemap upload, edge-cache purge). See DEPLOYING.md.
- `.kamal/hooks/pre-build`: stamps `REVISION` (git SHA) into the build context so webpack/AppSignal report a real revision.
- `.kamal/hooks/pre-deploy`: quiets Sidekiq (TSTP) before the swap.
- `.kamal/hooks/post-deploy`: clears `REVISION` + the Sidekiq quiet marker.
- `script/db_restore_kamal.sh`: fast+deferred DB restore into the staging Postgres accessory (see MANAGING.md → Database restore).
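
The `REVISION` stamp and its cleanup bracket the build, which a short Python sketch can make concrete. The actual hooks are scripts under `.kamal/hooks/` and their contents are not shown here, so `stamp_revision` and `clear_revision` are hypothetical stand-ins that only mirror the behaviour described in the list above.

```python
from pathlib import Path

REVISION_FILE = "REVISION"  # file name taken from the hook descriptions above


def stamp_revision(build_dir: Path, git_sha: str) -> Path:
    """pre-build step: write the git SHA into the build context."""
    target = build_dir / REVISION_FILE
    target.write_text(git_sha.strip() + "\n")
    return target


def clear_revision(build_dir: Path) -> None:
    """post-deploy step: remove the stamp once the deploy has finished."""
    # missing_ok makes the cleanup safe to run twice.
    (build_dir / REVISION_FILE).unlink(missing_ok=True)
```

A caller would pass the output of `git rev-parse HEAD` as `git_sha`; that wiring is an assumption about how the hook obtains the SHA.
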
Master architecture — staging (live)
```mermaid
flowchart TB
  user([User / browser])

  subgraph CF["Cloudflare edge"]
    tls["TLS termination<br/>+ WAF + cache"]
    access["Access SSO gate<br/>(wy-employees group)"]
    cft["Cloudflare Tunnel<br/>crm/www/api/mcp.warmlyyours.ws"]
  end

  subgraph BOX["Latitude bare-metal: dal-latitude-heatwave-01 (Tailscale 100.123.47.52)"]
    direction TB
    cfd["cloudflared<br/>(host systemd, outbound QUIC)"]
    proxy["kamal-proxy :80<br/>(rolling swap, /up healthcheck)"]

    subgraph NET["docker network: kamal"]
      direction TB
      web["web container<br/>Thruster :80 → Puma :3000"]
      sidekiq["sidekiq container<br/>consolidated capsules + scheduler"]
      pg[("postgres accessory<br/>PG18 · heatwave + heatwave_versions")]
      valkey[("valkey accessories ×3<br/>cache / sessions / queue")]
      mailpit["mailpit accessory<br/>SMTP :1025 / UI :8025"]
    end
  end

  admin([Operator]) -. "SSH / psql / mailpit UI<br/>over Tailscale" .-> BOX

  user -->|HTTPS| tls --> access --> cft
  cft -->|"QUIC (dialed out by cloudflared)"| cfd
  cfd -->|"http://localhost:80"| proxy --> web
  web --> pg & valkey
  web -->|SMTP| mailpit
  sidekiq --> pg & valkey
```

Request path: browser → Cloudflare (TLS, Access SSO) → Cloudflare Tunnel → `cloudflared` on the box → `http://localhost:80` (kamal-proxy) → web container (Thruster `:80` → Puma `:3000`). No inbound web ports are open on the host; the tunnel is dialed outbound.
Production topology — pre-cutover snapshot (historical)
Historical. This section and the diagram below capture the pre-cutover Vultr + Capistrano topology and the original Kamal target. Production cut over to Kamal on Latitude on 2026-06-07 (Dallas primary + Chicago standby); see the Status note at the top of this file and `INFRASTRUCTURE_INVENTORY.md` for the current state.

```mermaid
flowchart TB
  user([User]) -->|HTTPS| cf["Cloudflare edge<br/>(TLS + WAF + Access on CRM)"]
  cf -->|Tunnel| cfd["cloudflared (host)"]
  cfd -->|"localhost:80"| proxy["kamal-proxy"]

  subgraph WEB1["web1 (Vultr, Ubuntu 26.04): TODO provision"]
    proxy --> web["web container (Puma)"]
    proxy -.-> sk["sidekiq container<br/>(consolidated, co-located to start)"]
  end

  web -->|"public IP 45.63.79.22:5432<br/>(firewall allowlist + SCRAM)"| db4[("db4: Postgres PRIMARY<br/>heatwave + heatwave_versions")]
  db4 -. "async replication" .-> db3[("db3: replica")]
  web -->|"TLS, allowlist"| valkey[("Vultr Managed Valkey 7")]
  sk --> db4 & valkey
```

Cutover prerequisites (gated): create `op://IT/Heatwave-Postgres`, provision web1, add its public IP to the db4 firewall group + the Valkey allowlist, then `bin/deploy production`. Full sequence in `doc/tasks/202606022303_KAMAL_MIGRATION.md`.
Network & security layers
```mermaid
flowchart LR
  subgraph internet["Public internet"]
    u([User]) ; op([Operator])
  end

  subgraph edge["Layer 1: Cloudflare"]
    e1["TLS + WAF + rate limiting"]
    e2["Access SSO (Zero Trust)"]
  end

  subgraph latfw["Layer 2: Latitude edge firewall"]
    l1["inbound :22 ← 100.64.0.0/10 only<br/>(Tailscale CGNAT) · default-deny"]
  end

  subgraph host["Layer 3: host (UFW + DOCKER-USER)"]
    h1["UFW: default-deny in,<br/>allow lo + tailscale0 + :22"]
    h2["DOCKER-USER: DROP public :80/:443,<br/>RETURN on tailscale0"]
  end

  subgraph app["Layer 4: app"]
    a1["accessories bound to 127.0.0.1<br/>or the Tailscale IP, never 0.0.0.0"]
    a2["web reachable only via the Tunnel"]
  end

  u -->|web| e1 --> e2 -->|"Tunnel (outbound)"| a2
  op -->|SSH/psql/UI| l1 --> h1 --> a1
  h2 --- a2
```

The web tier is reachable only through the Cloudflare Tunnel (no public port). The operator tier (SSH, psql, mailpit UI) is reachable only over Tailscale. DOCKER-USER exists because Docker inserts iptables rules ahead of UFW for published ports; without it, a published `:80` would be world-reachable despite UFW’s default-deny.
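
The reason DOCKER-USER is needed is easier to see as a decision function. The Python sketch below is a toy model of the rule order described above, not a dump of the real iptables rules. It assumes the `tailscale0` RETURN rule is evaluated before the DROP, and the function name and the `eth0` interface used in the example are hypothetical.

```python
PUBLIC_WEB_PORTS = {80, 443}


def published_port_reachable(in_iface: str, dport: int, docker_user_chain: bool) -> bool:
    """Toy model: can a packet arriving on in_iface reach a Docker-published dport?"""
    # Modelled assumption: UFW's default-deny never sees this packet, because
    # Docker's rules for published ports are evaluated ahead of UFW's.
    if docker_user_chain:
        if in_iface == "tailscale0":
            return True   # RETURN: hand the packet back to Docker's own rules
        if dport in PUBLIC_WEB_PORTS:
            return False  # DROP public :80/:443
    return True  # Docker accepts traffic for the published port
```

With the chain absent, `published_port_reachable("eth0", 80, False)` evaluates to `True`, which is the world-reachable case the paragraph warns about; with the chain present, the same packet is dropped while `tailscale0` traffic still passes.
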
Key facts at a glance
| Thing | Value |
|---|---|
| Live staging host | dal-latitude-heatwave-01, Tailscale 100.123.47.52 (Latitude bare metal, RAID-1) |
| Staging hostnames | crm / www / api / mcp.warmlyyours.ws (TLD env = warmlyyours.ws) |
| Staging Access group | wy-employees (0de0f290-f12c-4046-ae47-b66146f1a4ac) |
| App image | ghcr.io/warmlyyours/heatwave (GHCR) |
| PG accessory image | ghcr.io/warmlyyours/heatwave-postgres:18 (GHCR) |
| Docker network | kamal (app + accessories resolve by name) |
| Web port path | Cloudflare → tunnel → kamal-proxy :80 → Thruster :80 → Puma :3000 |
| Deploy user | deploy, uid 1001 (must match container USER 1001) |
| Cloudflare account | 79b7f58cf035093b5ad11747df30369a |
| Staging zone | warmlyyours.ws (d39acaed475782c4901d4a8e5908c1cb) |
| Prod DB | PG18 primary heatwave-postgres on Dallas (100.123.47.52) + cross-DC streaming standby heatwave-postgres-replica on Chicago (100.68.157.49); app reaches it via HAProxy write-VIP heatwave-haproxy:6433 → pgbouncer. See INFRASTRUCTURE_INVENTORY.md |
| Prod cache/queue | Valkey ×3 — heatwave-valkey-cache / -sessions / -queue (3-flavor split, routed per logical DB) |
| Prod backups | Databasus PITR → Cloudflare R2 (off the Chicago standby) |
| Deploy command | `bin/deploy [staging |