
Heatwave Kamal Stack — Architecture & Index

The containerized deployment stack that replaced Capistrano + Passenger. This directory is the single source of truth for the new infrastructure.

| Doc | Covers |
| --- | --- |
| README.md (this file) | Stack inventory, status, master architecture + network diagrams |
| DEPLOYING.md | The deploy guidebook — bin/deploy, the deploy lifecycle, migrations, rollback |
| MANAGING.md | Day-2 operations — accessories, DB restore, mailpit, secrets, scaling, provisioning a new box |
| TROUBLESHOOTING.md | Runbook for the failure modes we’ve actually hit |

Status (2026-06-14). Production and staging both run on Kamal on Latitude bare-metal. Dallas (dal-latitude-heatwave-01, Tailscale 100.123.47.52) is the primary and hosts both environments; a cross-DC PostgreSQL standby runs in Chicago (chi-latitude-heatwave-02, 100.68.157.49). The Capistrano + Passenger + Vultr stack was retired at the 2026-06-07 cutover. Historical record: doc/tasks/202606022303_KAMAL_MIGRATION.md (cutover) and doc/tasks/202606112045_DB_TIER_HA_ARCHITECTURE.md (two-region HA end-state).

Note: the network diagram and a few body sections below still describe the pre-cutover Vultr topology and are being refreshed.


| Concern | Old (Capistrano + Passenger) | New (Kamal) |
| --- | --- | --- |
| Unit of deploy | git pull + bundle on the host | An OCI image built once, pushed, rolled out |
| Web server | Passenger (Apache/nginx) | Thruster → Puma in a container |
| Zero-downtime | Passenger restart | kamal-proxy rolling swap on a /up health check |
| Ingress | nginx + origin TLS | Cloudflare Tunnel (no public ports, no origin TLS) |
| Asset bridging | linked_dirs public/javascripts/webpack | Kamal asset_path host volume |
| Datastores | External Postgres / Managed Valkey | Kamal accessories (PG18 + Valkey ×3 + pgbouncer + HAProxy), co-located on the Latitude boxes in both envs |
| Secrets | config/master.key on host | 1Password resolver-only .kamal/secrets* (tracked in git) |
| Deploy command | bin/deploy | bin/deploy → kamal deploy |
| Provisioning | Hand-built hosts | Terraform/OpenTofu (infra/terraform/) + cloud-init |

Every moving piece of the new stack and where it’s configured.

  • Kamal 2.x — orchestrates build → push → rolling deploy. Config: config/deploy.yml (base/prod), config/deploy.staging.yml (staging overrides).
  • kamal-proxy — per-host reverse proxy giving zero-downtime rolling swaps. Listens on host :80, health-checks /up, no TLS (ssl: false).
  • Docker — installed by cloud-init (get.docker.com). All app + accessory containers attach to the kamal docker network and resolve each other by name.
  • Dockerfile — multi-stage (base → build → final), base ruby:4.0.5-slim. Build stage compiles gems + Yarn 4 / webpack assets; final stage is a slim runtime (gems + app + built assets, non-root rails user uid 1001). Entry: bin/docker-entrypoint; CMD bin/thrust bin/rails server.
  • Thruster — HTTP/2 + X-Sendfile front, listens :80, proxies to Puma :3000.
  • Registry: everything is on GitHub Container Registry (no Vultr CR). The app image is ghcr.io/warmlyyours/heatwave and the custom Postgres accessory image is ghcr.io/warmlyyours/heatwave-postgres:18 (the host’s single ghcr.io login covers both).
  • web — Puma (4 workers × 3 threads, jemalloc), behind kamal-proxy.
  • sidekiq — a single consolidated Sidekiq process (SIDEKIQ_CONSOLIDATED=1) running the high/low/campaign capsules + the default set + the scheduler in one container. cmd: bundle exec sidekiq -C config/sidekiq.yml. Sidekiq Pro super_fetch makes rolling restarts safe; .kamal/hooks/pre-deploy quiets it (TSTP) before the swap.
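
The kamal-proxy rolling swap listed above can be pictured as a health gate: the new web container only takes traffic once /up answers successfully. The Ruby sketch below is a simplified model of that idea, not kamal-proxy's implementation. The names `wait_until_healthy` and `rolling_swap`, the probe lambda, and the retry count are all hypothetical and chosen for illustration.

```ruby
# Illustrative model of a health-gated swap; not kamal-proxy's actual code.
# `probe` is any callable that returns the HTTP status of GET /up.
def wait_until_healthy(probe, attempts: 5, interval: 0)
  attempts.times do |i|
    status = begin
      probe.call
    rescue StandardError
      nil # connection errors count as "not ready yet"
    end
    return i + 1 if status == 200 # number of checks it took
    sleep(interval) if interval.positive?
  end
  nil # never became healthy
end

# Route to the candidate only once it is healthy; otherwise keep the old one.
def rolling_swap(current:, candidate:, probe:)
  wait_until_healthy(probe) ? candidate : current
end

# Simulated container that becomes healthy on its third check.
responses = [503, 503, 200].each
booting = -> { responses.next }
puts rolling_swap(current: "web-old", candidate: "web-new", probe: booting)
```

If the candidate never returns 200, the model keeps serving from the old container, which is the property that makes a failed deploy non-disruptive.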

Accessories (co-located on the box; staging detail below)


Both environments run their datastores as Kamal accessories on the Latitude boxes (prod splits Postgres across Dallas + Chicago — see the note below the table). The staging-specific accessories are:

  • postgres — custom PG18 image (ghcr.io/warmlyyours/heatwave-postgres:18, built from docker/postgresql.Dockerfile) with pgvector, hypopg, pg_repack, pg_stat_statements. Tuned down (shared_buffers=8GB) because the box is shared with the prod stack. Data on a host volume. Host-published 127.0.0.1:5432 for local psql; the app reaches it as heatwave-postgres on the kamal network.
  • Valkey ×3: image valkey/valkey:9.1, in a 3-flavor split of heatwave-staging-valkey-cache (allkeys-lru), -sessions (noeviction), and -queue (noeviction + AOF). RedisConfig routes to them per logical DB via REDIS_CACHE_HOST / REDIS_SESSIONS_HOST / REDIS_QUEUE_HOST (no single REDIS_HOST). Internal to the kamal network and not host-published. Mirrors the prod split (heatwave-valkey-{cache,sessions,queue}).
  • mailpit — SMTP sink + web UI. App/sidekiq deliver to heatwave-mailpit:1025; the UI is bound to the Tailscale interface only (http://100.123.47.52:8025), so captured staging mail (reset tokens etc.) is never publicly exposed.
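
As a rough illustration of the 3-flavor Valkey routing described above, the sketch below resolves a connection URL per flavor from the three environment variables. It is not the real RedisConfig, whose internals are not shown here. The `valkey_url` helper, the default port 6379, and the database index 0 are assumptions made for the example.

```ruby
# Hypothetical sketch of per-flavor Valkey routing; the real RedisConfig may differ.
VALKEY_FLAVORS = {
  cache:    "REDIS_CACHE_HOST",
  sessions: "REDIS_SESSIONS_HOST",
  queue:    "REDIS_QUEUE_HOST"
}.freeze

# Builds a URL for one flavor. Port and db index are assumed defaults.
def valkey_url(flavor, env: ENV, port: 6379, db: 0)
  var = VALKEY_FLAVORS.fetch(flavor) { raise ArgumentError, "unknown flavor: #{flavor}" }
  # There is deliberately no REDIS_HOST fallback: each flavor needs its own host.
  host = env.fetch(var) { raise KeyError, "#{var} is not set" }
  "redis://#{host}:#{port}/#{db}"
end

staging_env = {
  "REDIS_CACHE_HOST"    => "heatwave-staging-valkey-cache",
  "REDIS_SESSIONS_HOST" => "heatwave-staging-valkey-sessions",
  "REDIS_QUEUE_HOST"    => "heatwave-staging-valkey-queue"
}

VALKEY_FLAVORS.each_key do |flavor|
  puts "#{flavor}: #{valkey_url(flavor, env: staging_env)}"
end
```

Failing loudly on a missing variable, rather than falling back to a shared host, keeps cache evictions from ever touching session or queue data.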

Production runs the same accessories, just split across two Latitude boxes: a PG18 primary in Dallas (heatwave-postgres) with a cross-DC streaming standby in Chicago (heatwave-postgres-replica), fronted by per-node pgbouncer and a TCP write-VIP HAProxy (heatwave-haproxy:6433, the app’s DATABASE_HOST) so a pg_promote flip reroutes with no app redeploy; the same 3-flavor Valkey split (heatwave-valkey-cache / -sessions / -queue); and Databasus PITR → Cloudflare R2 backups off the Chicago standby. The old Vultr Postgres (db4/db3) and Vultr Managed Valkey are gone. Full current topology, hosts, ports, and image tags: doc/infrastructure/INFRASTRUCTURE_INVENTORY.md and doc/tasks/202606112045_DB_TIER_HA_ARCHITECTURE.md.
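
The point of the write-VIP is that the app's database endpoint stays fixed while the backend behind it changes. The toy model below illustrates that property only. It does not describe how HAProxy actually detects the primary, and the `WriteVip` class, the role symbols, and the `promote!` method are hypothetical.

```ruby
# Illustrative model only; HAProxy's real primary detection is not shown here.
class WriteVip
  ENDPOINT = "heatwave-haproxy:6433" # the app's DATABASE_HOST, per the text above

  def initialize(roles)
    @roles = roles.dup # backend name => :primary, :standby or :down
  end

  # Backend that currently receives writes.
  def write_backend
    @roles.key(:primary) || raise("no primary available")
  end

  # Model of a pg_promote flip: the named standby becomes the primary.
  def promote!(name)
    raise ArgumentError, "unknown backend: #{name}" unless @roles.key?(name)
    @roles = @roles.to_h { |backend, _| [backend, backend == name ? :primary : :down] }
    self
  end
end

vip = WriteVip.new(
  "heatwave-postgres"         => :primary, # Dallas
  "heatwave-postgres-replica" => :standby  # Chicago
)
puts "#{WriteVip::ENDPOINT} -> #{vip.write_backend}"
vip.promote!("heatwave-postgres-replica")
puts "#{WriteVip::ENDPOINT} -> #{vip.write_backend}"
```

Both lines print the same endpoint with a different backend, which is why a failover needs no app redeploy.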

  • Cloudflare Tunnel (cloudflared, host systemd service, remotely managed — ingress configured in Cloudflare, not on the box). Outbound-only QUIC; the only inbound web path. Routes crm/www/api/mcp.warmlyyours.ws → http://localhost:80.
  • Cloudflare Access — SSO gate (the wy-employees group) in front of every staging hostname.
  • Tailscale — the admin/SSH plane (and, in the HA end-state, cross-region DB replication). Hosts get 100.x addresses; SSH is Tailscale-only.
  • Firewall, defense-in-depth — Latitude edge firewall (SSH from the Tailscale CGNAT range 100.64.0.0/10 only) + host UFW (default-deny inbound, allow lo + tailscale0 + :22) + a DOCKER-USER iptables chain that blocks public :80/:443 (Docker bypasses UFW for published ports) + Cloudflare Access.
  • .kamal/secrets-common — shared: RAILS_MASTER_KEY (= config/master.key), BUNDLE_GEMS__CONTRIBSYS__COM (Sidekiq Pro), KAMAL_REGISTRY_PASSWORD (GHCR).
  • .kamal/secrets.staging — staging PG password + the staging Heatwave::Configuration env-key.
  • .kamal/secrets — prod PG password + production env-key (op://IT/Heatwave-Postgres must be created before cutover).
  • All three are resolver-only (Kamal’s 1Password adapter — no literal secrets) and therefore committed. See MANAGING.md → Secrets.
  • infra/terraform/latitude/ — provisions a Latitude bare-metal box: SSH keys, cloud-init (deploy user uid 1001, Docker, Tailscale, UFW + DOCKER-USER, cloudflared), RAID-1, edge firewall.
  • infra/terraform/cloudflare/ — the tunnel (remotely managed) + DNS CNAMEs + Access app/policy for *.warmlyyours.ws.
  • infra/terraform/ (root) — the original Vultr provisioning module (being retired in favour of Latitude).
  • bin/deploy — the wrapper around kamal deploy (clean-tree gate, 1Password unlock, gated migrations, sourcemap upload, edge-cache purge). See DEPLOYING.md.
  • .kamal/hooks/pre-build — stamps REVISION (git SHA) into the build context so webpack/AppSignal report a real revision.
  • .kamal/hooks/pre-deploy — quiets Sidekiq (TSTP) before the swap.
  • .kamal/hooks/post-deploy — clears REVISION + the Sidekiq quiet marker.
  • script/db_restore_kamal.sh — fast+deferred DB restore into the staging Postgres accessory (see MANAGING.md → Database restore).
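
The REVISION handling in the pre-build and post-deploy hooks above amounts to writing a file into the build context and removing it afterwards. The Ruby sketch below models that lifecycle under stated assumptions: the real hooks may well be shell scripts, and `stamp_revision`, `clear_revision`, and the sample SHA are hypothetical. A real hook would presumably obtain the SHA from `git rev-parse HEAD`.

```ruby
require "tmpdir"

# Writes the given git SHA to <build_context>/REVISION (pre-build step).
def stamp_revision(build_context, sha)
  raise ArgumentError, "not a git SHA: #{sha.inspect}" unless sha.match?(/\A[0-9a-f]{7,40}\z/)
  path = File.join(build_context, "REVISION")
  File.write(path, "#{sha}\n")
  path
end

# Removes the stamp again (post-deploy step); a no-op if it is already gone.
def clear_revision(build_context)
  path = File.join(build_context, "REVISION")
  File.delete(path) if File.exist?(path)
end

# Demo in a throwaway directory with a made-up SHA.
Dir.mktmpdir do |context|
  path = stamp_revision(context, "0123abc")
  puts "stamped: #{File.read(path).strip}"
  clear_revision(context)
  puts "cleared: #{!File.exist?(path)}"
end
```

Clearing the file after the deploy keeps the working tree clean, which matters because bin/deploy gates on a clean tree.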

```mermaid
flowchart TB
  user([User / browser])
  subgraph CF["Cloudflare edge"]
    tls["TLS termination<br/>+ WAF + cache"]
    access["Access SSO gate<br/>(wy-employees group)"]
    cft["Cloudflare Tunnel<br/>crm/www/api/mcp.warmlyyours.ws"]
  end
  subgraph BOX["Latitude bare-metal — dal-latitude-heatwave-01 (Tailscale 100.123.47.52)"]
    direction TB
    cfd["cloudflared<br/>(host systemd, outbound QUIC)"]
    proxy["kamal-proxy :80<br/>(rolling swap, /up healthcheck)"]
    subgraph NET["docker network: kamal"]
      direction TB
      web["web container<br/>Thruster :80 → Puma :3000"]
      sidekiq["sidekiq container<br/>consolidated capsules + scheduler"]
      pg[("postgres accessory<br/>PG18 · heatwave + heatwave_versions")]
      valkey[("valkey accessories ×3<br/>cache / sessions / queue")]
      mailpit["mailpit accessory<br/>SMTP :1025 / UI :8025"]
    end
  end
  admin([Operator]) -. "SSH / psql / mailpit UI<br/>over Tailscale" .-> BOX
  user -->|HTTPS| tls --> access --> cft
  cft -->|"QUIC (dialed out by cloudflared)"| cfd
  cfd -->|"http://localhost:80"| proxy --> web
  web --> pg & valkey
  web -->|SMTP| mailpit
  sidekiq --> pg & valkey
```

Request path: browser → Cloudflare (TLS, Access SSO) → Cloudflare Tunnel → cloudflared on the box → http://localhost:80 (kamal-proxy) → web container (Thruster :80 → Puma :3000). No inbound web ports are open on the host; the tunnel is dialed outbound.


Production topology — pre-cutover snapshot (historical)


Historical. This section and the diagram below capture the pre-cutover Vultr + Capistrano topology and the original Kamal target. Production cut over to Kamal on Latitude on 2026-06-07 (Dallas primary + Chicago standby); see the Status note at the top of this file and INFRASTRUCTURE_INVENTORY.md for the current state.

```mermaid
flowchart TB
  user([User]) -->|HTTPS| cf["Cloudflare edge<br/>(TLS + WAF + Access on CRM)"]
  cf -->|Tunnel| cfd["cloudflared (host)"]
  cfd -->|"localhost:80"| proxy["kamal-proxy"]
  subgraph WEB1["web1 (Vultr, Ubuntu 26.04) — TODO provision"]
    proxy --> web["web container (Puma)"]
    proxy -.-> sk["sidekiq container<br/>(consolidated, co-located to start)"]
  end
  web -->|"public IP 45.63.79.22:5432<br/>(firewall allowlist + SCRAM)"| db4[("db4 — Postgres PRIMARY<br/>heatwave + heatwave_versions")]
  db4 -. "async replication" .-> db3[("db3 — replica")]
  web -->|"TLS, allowlist"| valkey[("Vultr Managed Valkey 7")]
  sk --> db4 & valkey
```

Cutover prerequisites (gated): create op://IT/Heatwave-Postgres, provision web1, add its public IP to the db4 firewall group + the Valkey allowlist, then bin/deploy production. Full sequence in doc/tasks/202606022303_KAMAL_MIGRATION.md.


```mermaid
flowchart LR
  subgraph internet["Public internet"]
    u([User])
    op([Operator])
  end
  subgraph edge["Layer 1 — Cloudflare"]
    e1["TLS + WAF + rate limiting"]
    e2["Access SSO (Zero Trust)"]
  end
  subgraph latfw["Layer 2 — Latitude edge firewall"]
    l1["inbound :22 ← 100.64.0.0/10 only<br/>(Tailscale CGNAT) · default-deny"]
  end
  subgraph host["Layer 3 — host (UFW + DOCKER-USER)"]
    h1["UFW: default-deny in,<br/>allow lo + tailscale0 + :22"]
    h2["DOCKER-USER: DROP public :80/:443,<br/>RETURN on tailscale0"]
  end
  subgraph app["Layer 4 — app"]
    a1["accessories bound to 127.0.0.1<br/>or the Tailscale IP — never 0.0.0.0"]
    a2["web reachable only via the Tunnel"]
  end
  u -->|web| e1 --> e2 -->|"Tunnel (outbound)"| a2
  op -->|SSH/psql/UI| l1 --> h1 --> a1
  h2 --- a2
```

The web tier is reachable only through the Cloudflare Tunnel (no public port). The operator tier (SSH, psql, mailpit UI) is reachable only over Tailscale. DOCKER-USER exists because Docker inserts iptables rules ahead of UFW for published ports — without it, a published :80 would be world-reachable despite UFW’s default-deny.
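
To make the layered rules concrete, the sketch below models two of them in Ruby: the edge-firewall check that an SSH source address sits inside the Tailscale CGNAT range 100.64.0.0/10, and the DOCKER-USER decision for published web ports. It is a model for reasoning about the rules, not the firewall configuration itself. The function names and the `eth0` interface name are hypothetical.

```ruby
require "ipaddr"

TAILSCALE_CGNAT = IPAddr.new("100.64.0.0/10") # 100.64.0.0 through 100.127.255.255

# Edge firewall model: inbound :22 is accepted only from the CGNAT range.
def ssh_allowed?(source_ip)
  TAILSCALE_CGNAT.include?(IPAddr.new(source_ip))
end

# DOCKER-USER model: RETURN on tailscale0, DROP public :80/:443.
def docker_user_verdict(port:, iface:)
  return :return if iface == "tailscale0"
  return :drop if [80, 443].include?(port)
  :return
end

puts ssh_allowed?("100.123.47.52") # Dallas Tailscale address   -> true
puts ssh_allowed?("100.68.157.49") # Chicago Tailscale address  -> true
puts ssh_allowed?("45.63.79.22")   # a public address           -> false
puts docker_user_verdict(port: 80, iface: "eth0")       # -> drop
puts docker_user_verdict(port: 80, iface: "tailscale0") # -> return
```

Both host addresses fall inside the CGNAT range, so operators on the tailnet pass the edge rule while any public source is rejected before it reaches UFW.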


| Thing | Value |
| --- | --- |
| Live staging host | dal-latitude-heatwave-01, Tailscale 100.123.47.52 (Latitude bare metal, RAID-1) |
| Staging hostnames | crm / www / api / mcp.warmlyyours.ws (TLD env = warmlyyours.ws) |
| Staging Access group | wy-employees (0de0f290-f12c-4046-ae47-b66146f1a4ac) |
| App image | ghcr.io/warmlyyours/heatwave (GHCR) |
| PG accessory image | ghcr.io/warmlyyours/heatwave-postgres:18 (GHCR) |
| Docker network | kamal (app + accessories resolve by name) |
| Web port path | Cloudflare → tunnel → kamal-proxy :80 → Thruster :80 → Puma :3000 |
| Deploy user | deploy, uid 1001 (must match container USER 1001) |
| Cloudflare account | 79b7f58cf035093b5ad11747df30369a |
| Staging zone | warmlyyours.ws (d39acaed475782c4901d4a8e5908c1cb) |
| Prod DB | PG18 primary heatwave-postgres on Dallas (100.123.47.52) + cross-DC streaming standby heatwave-postgres-replica on Chicago (100.68.157.49); app reaches it via HAProxy write-VIP heatwave-haproxy:6433 → pgbouncer. See INFRASTRUCTURE_INVENTORY.md |
| Prod cache/queue | Valkey ×3 — heatwave-valkey-cache / -sessions / -queue (3-flavor split, routed per logical DB) |
| Prod backups | Databasus PITR → Cloudflare R2 (off the Chicago standby) |
| Deploy command | `bin/deploy [staging