Heatwave Kamal Stack — Architecture & Index
The containerized deployment stack that replaced Capistrano + Passenger.
This directory is the single source of truth for the new infrastructure.
| Doc | Covers |
|---|---|
| README.md (this file) | Stack inventory, status, master architecture + network diagrams |
| DEPLOYING.md | The deploy guidebook — bin/deploy, the deploy lifecycle, migrations, rollback |
| MANAGING.md | Day-2 operations — accessories, DB restore, mailpit, secrets, scaling, provisioning a new box |
| TROUBLESHOOTING.md | Runbook for the failure modes we've actually hit |
Status (2026-06-14). Production and staging both run on Kamal on
Latitude bare-metal. Dallas (dal-latitude-heatwave-01, Tailscale
100.123.47.52) is the primary and hosts both environments; a cross-DC
PostgreSQL standby runs in Chicago (chi-latitude-heatwave-02,
100.68.157.49). The Capistrano + Passenger + Vultr stack was retired at the
2026-06-07 cutover. Historical record:
doc/tasks/202606022303_KAMAL_MIGRATION.md (cutover) and
doc/tasks/202606112045_DB_TIER_HA_ARCHITECTURE.md (two-region HA end-state).

Note: the network diagram and a few body sections below still describe the
pre-cutover Vultr topology and are being refreshed.
What changed vs. Capistrano
| Concern | Old (Capistrano + Passenger) | New (Kamal) |
|---|---|---|
| Unit of deploy | git pull + bundle on the host | An OCI image built once, pushed, rolled out |
| Web server | Passenger (Apache/nginx) | Thruster → Puma in a container |
| Zero-downtime | Passenger restart | kamal-proxy rolling swap on a /up health check |
| Ingress | nginx + origin TLS | Cloudflare Tunnel (no public ports, no origin TLS) |
| Asset bridging | linked_dirs public/javascripts/webpack | Kamal asset_path host volume |
| Datastores | External Postgres / Managed Valkey | Kamal accessories (PG18 + Valkey ×3 + pgbouncer + HAProxy), co-located on the Latitude boxes in both envs |
| Secrets | config/master.key on host | 1Password resolver-only .kamal/secrets* (tracked in git) |
| Deploy command | bin/deploy | bin/deploy → kamal deploy |
| Provisioning | Hand-built hosts | Terraform/OpenTofu (infra/terraform/) + cloud-init |
Stack inventory
Every moving piece of the new stack and where it's configured.
Compute & orchestration
- Kamal 2.x — orchestrates build → push → rolling deploy. Config:
  config/deploy.yml (base/prod), config/deploy.staging.yml (staging overrides).
- kamal-proxy — per-host reverse proxy giving zero-downtime rolling swaps.
  Listens on host :80, health-checks /up, no TLS (ssl: false).
- Docker — installed by cloud-init (get.docker.com). All app + accessory
  containers attach to the kamal docker network and resolve each other by name.
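The health-gated swap that kamal-proxy performs can be pictured with a small model. This is only a sketch under stated assumptions: `Target`, `healthy_after`, and `rolling_swap` are hypothetical names invented for illustration, and the real proxy probes `/up` over HTTP rather than counting probes.

```ruby
# Hypothetical model of a health-gated swap; not kamal-proxy's real code.
Target = Struct.new(:name, :healthy_after) do
  # Pretend the container starts answering /up after N probes.
  def up?(probe_number)
    probe_number >= healthy_after
  end
end

# Returns the target that should receive traffic after the attempt.
def rolling_swap(current, candidate, max_probes: 5)
  1.upto(max_probes) do |probe|
    return candidate if candidate.up?(probe) # healthy: switch traffic
  end
  current # never became healthy: keep serving from the old container
end

old_web = Target.new("web-old", 0)
good    = Target.new("web-new", 3)
bad     = Target.new("web-broken", 99)

puts rolling_swap(old_web, good).name # web-new
puts rolling_swap(old_web, bad).name  # web-old
```

The point of the model is the failure branch: traffic stays on the old container whenever the new one never passes its health check, which is what makes the swap zero-downtime.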
The application image
- Dockerfile — multi-stage (base → build → final), base
  ruby:4.0.5-slim. Build stage compiles gems + Yarn 4 / webpack assets; final
  stage is a slim runtime (gems + app + built assets, non-root rails user,
  uid 1001). Entry: bin/docker-entrypoint; CMD bin/thrust bin/rails server.
- Thruster — HTTP/2 + X-Sendfile front, listens :80, proxies to Puma :3000.
- Registry — everything is on GitHub Container Registry (no Vultr CR):
  the app image is ghcr.io/warmlyyours/heatwave and the custom Postgres
  accessory image is ghcr.io/warmlyyours/heatwave-postgres:18 (the host's single
  ghcr.io login covers both).
Roles (containers Kamal runs)
- web — Puma (4 workers × 3 threads, jemalloc), behind kamal-proxy.
- sidekiq — a single consolidated Sidekiq process (SIDEKIQ_CONSOLIDATED=1)
  running the high/low/campaign capsules + the default set + the scheduler in one
  container. cmd: bundle exec sidekiq -C config/sidekiq.yml. Sidekiq Pro
  super_fetch makes rolling restarts safe; .kamal/hooks/pre-deploy quiets it
  (TSTP) before the swap.
Accessories (co-located on the box; staging detail below)
Both environments run their datastores as Kamal accessories on the Latitude
boxes (prod splits Postgres across Dallas + Chicago — see the note below the
table). The staging-specific accessories are:
- postgres — custom PG18 image (ghcr.io/warmlyyours/heatwave-postgres:18,
  built from docker/postgresql.Dockerfile) with pgvector, hypopg, pg_repack,
  pg_stat_statements. Tuned down (shared_buffers=8GB) because the box is
  shared with the prod stack. Data on a host volume. Host-published
  127.0.0.1:5432 for local psql; the app reaches it as heatwave-postgres on
  the kamal network.
- Valkey ×3 — valkey/valkey:9.1 in a 3-flavor split:
  heatwave-staging-valkey-cache (allkeys-lru), -sessions (noeviction),
  -queue (noeviction + AOF). RedisConfig routes to them per logical DB via
  REDIS_CACHE_HOST / REDIS_SESSIONS_HOST / REDIS_QUEUE_HOST (no single
  REDIS_HOST). Internal to the kamal network — not host-published. Mirrors
  the prod split (heatwave-valkey-{cache,sessions,queue}).
- mailpit — SMTP sink + web UI. App/sidekiq deliver to heatwave-mailpit:1025;
  the UI is bound to the Tailscale interface only (http://100.123.47.52:8025),
  so captured staging mail (reset tokens etc.) is never publicly exposed.
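The per-role Valkey routing might look roughly like the sketch below. The real RedisConfig is not shown in this README, so `ROLE_ENV` and `valkey_host` are hypothetical names; only the three environment variable names and the staging hostnames come from the list above.

```ruby
# Hypothetical sketch; the real RedisConfig is not shown in this README.
ROLE_ENV = {
  cache:    "REDIS_CACHE_HOST",
  sessions: "REDIS_SESSIONS_HOST",
  queue:    "REDIS_QUEUE_HOST"
}.freeze

# Resolve the host for a logical role; fail loudly rather than fall back
# to a shared REDIS_HOST, which this stack deliberately does not define.
def valkey_host(role, env = ENV)
  key = ROLE_ENV.fetch(role) { raise ArgumentError, "unknown role: #{role}" }
  env.fetch(key) { raise KeyError, "#{key} is not set" }
end

staging_env = {
  "REDIS_CACHE_HOST"    => "heatwave-staging-valkey-cache",
  "REDIS_SESSIONS_HOST" => "heatwave-staging-valkey-sessions",
  "REDIS_QUEUE_HOST"    => "heatwave-staging-valkey-queue"
}

puts valkey_host(:queue, staging_env) # heatwave-staging-valkey-queue
```

Raising on a missing variable, instead of defaulting, keeps a misconfigured container from silently writing queue data into the evictable cache instance.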
Production runs the same accessories, just split across two Latitude boxes:
a PG18 primary in Dallas (heatwave-postgres) with a cross-DC streaming
standby in Chicago (heatwave-postgres-replica), fronted by per-node
pgbouncer and a TCP write-VIP HAProxy (heatwave-haproxy:6433, the app's
DATABASE_HOST) so a pg_promote flip reroutes with no app redeploy; the same
3-flavor Valkey split (heatwave-valkey-cache / -sessions / -queue);
and Databasus PITR → Cloudflare R2 backups off the Chicago standby. The old
Vultr Postgres (db4/db3) and Vultr Managed Valkey are gone. Full current
topology, hosts, ports, and image tags: doc/infrastructure/INFRASTRUCTURE_INVENTORY.md
and doc/tasks/202606112045_DB_TIER_HA_ARCHITECTURE.md.
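Why a promotion needs no app redeploy can be sketched as a routing decision made behind a fixed address. The model below is an assumption, not the real HAProxy configuration: `Node` and `write_target` are hypothetical, and it assumes the VIP follows whichever node is not in recovery (in PostgreSQL, a standby reports `pg_is_in_recovery()` as true until it is promoted).

```ruby
# Hypothetical model of write-VIP routing; not the real HAProxy config.
Node = Struct.new(:name, :in_recovery)

# The write VIP should point at the one node that is not in recovery.
def write_target(nodes)
  primaries = nodes.reject(&:in_recovery)
  raise "expected exactly one primary, found #{primaries.size}" unless primaries.size == 1
  primaries.first.name
end

dallas  = Node.new("heatwave-postgres", false)
chicago = Node.new("heatwave-postgres-replica", true)
puts write_target([dallas, chicago]) # heatwave-postgres

# After a flip: Chicago has been promoted and Dallas is now the standby.
dallas.in_recovery  = true
chicago.in_recovery = false
puts write_target([dallas, chicago]) # heatwave-postgres-replica
```

The app keeps connecting to the same DATABASE_HOST throughout; only the target behind it changes. Refusing to route when zero or two nodes claim to be primary is a deliberate choice in this sketch, since writing to the wrong node is worse than a brief outage.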
Ingress & network
- Cloudflare Tunnel (cloudflared, host systemd service, remotely managed
  — ingress configured in Cloudflare, not on the box). Outbound-only QUIC; the
  only inbound web path. Routes crm/www/api/mcp.warmlyyours.ws → http://localhost:80.
- Cloudflare Access — SSO gate (the wy-employees group) in front of every
  staging hostname.
- Tailscale — the admin/SSH plane (and, in the HA end-state, cross-region DB
  replication). Hosts get 100.x addresses; SSH is Tailscale-only.
- Firewall, defense-in-depth — Latitude edge firewall (SSH from the Tailscale
  CGNAT range 100.64.0.0/10 only) + host UFW (default-deny inbound, allow
  lo + tailscale0 + :22) + a DOCKER-USER iptables chain that blocks
  public :80/:443 (Docker bypasses UFW for published ports) + Cloudflare Access.
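The edge-firewall rule boils down to a CIDR membership test on the source address. A minimal sketch using Ruby's standard `ipaddr` library follows; `tailscale_source?` is a hypothetical helper name, and the addresses are the ones quoted elsewhere in this file.

```ruby
require "ipaddr"

# The Tailscale CGNAT range that the edge firewall allows SSH from.
TAILSCALE_CGNAT = IPAddr.new("100.64.0.0/10")

# True when the source address falls inside the allowed range.
def tailscale_source?(ip)
  TAILSCALE_CGNAT.include?(IPAddr.new(ip))
end

puts tailscale_source?("100.123.47.52") # true  (Dallas, Tailscale)
puts tailscale_source?("100.68.157.49") # true  (Chicago, Tailscale)
puts tailscale_source?("45.63.79.22")   # false (a public address)
```

A /10 starting at 100.64.0.0 spans 100.64.0.0 through 100.127.255.255, which is why both Tailscale addresses pass and any public address does not.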
Secrets
- .kamal/secrets-common — shared: RAILS_MASTER_KEY (= config/master.key),
  BUNDLE_GEMS__CONTRIBSYS__COM (Sidekiq Pro), KAMAL_REGISTRY_PASSWORD (GHCR).
- .kamal/secrets.staging — staging PG password + the staging
  Heatwave::Configuration env-key.
- .kamal/secrets — prod PG password + production env-key (op://IT/Heatwave-Postgres
  must be created before cutover).
- All three are resolver-only (Kamal's 1Password adapter — no literal secrets)
  and therefore committed. See MANAGING.md → Secrets.
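Because these files are committed, it can be worth checking that every value really is a resolver expression. The lint below is a sketch, not part of the repo: `literal_secrets` is a hypothetical helper, the sample content (account name, vault path, `POSTGRES_PASSWORD`) is made up for illustration, and it assumes resolver values always begin with `$`.

```ruby
# Hypothetical lint: flags any assignment whose value is not a resolver
# expression. Returns the offending variable names.
def literal_secrets(text)
  text.each_line.filter_map do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    name, value = line.split("=", 2)
    name unless value.to_s.start_with?("$")
  end
end

# Illustrative content only, not the real secrets file.
sample = <<~'SECRETS'
  # comment lines and blank lines are skipped

  SECRETS=$(kamal secrets fetch --adapter 1password --account example --from Vault/Item KAMAL_REGISTRY_PASSWORD)
  KAMAL_REGISTRY_PASSWORD=$(kamal secrets extract KAMAL_REGISTRY_PASSWORD $SECRETS)
  POSTGRES_PASSWORD=hunter2
SECRETS

p literal_secrets(sample) # ["POSTGRES_PASSWORD"]
```

A check like this could run in CI or a pre-commit hook so that a pasted literal never reaches the tracked files.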
Provisioning (Infrastructure as Code)
- infra/terraform/latitude/ — provisions a Latitude bare-metal box: SSH keys,
  cloud-init (deploy user uid 1001, Docker, Tailscale, UFW + DOCKER-USER,
  cloudflared), RAID-1, edge firewall.
- infra/terraform/cloudflare/ — the tunnel (remotely managed) + DNS CNAMEs +
  Access app/policy for *.warmlyyours.ws.
- infra/terraform/ (root) — the original Vultr provisioning module (being
  retired in favour of Latitude).
Deploy tooling & lifecycle hooks
- bin/deploy — the wrapper around kamal deploy (clean-tree gate, 1Password
  unlock, gated migrations, sourcemap upload, edge-cache purge). See DEPLOYING.md.
- .kamal/hooks/pre-build — stamps REVISION (git SHA) into the build context
  so webpack/AppSignal report a real revision.
- .kamal/hooks/pre-deploy — quiets Sidekiq (TSTP) before the swap.
- .kamal/hooks/post-deploy — clears REVISION + the Sidekiq quiet marker.
- script/db_restore_kamal.sh — fast+deferred DB restore into the staging
  Postgres accessory (see MANAGING.md → Database restore).
Master architecture — staging (live)
flowchart TB
user([User / browser])
subgraph CF["Cloudflare edge"]
tls["TLS termination<br/>+ WAF + cache"]
access["Access SSO gate<br/>(wy-employees group)"]
cft["Cloudflare Tunnel<br/>crm/www/api/mcp.warmlyyours.ws"]
end
subgraph BOX["Latitude bare-metal — dal-latitude-heatwave-01 (Tailscale 100.123.47.52)"]
direction TB
cfd["cloudflared<br/>(host systemd, outbound QUIC)"]
proxy["kamal-proxy :80<br/>(rolling swap, /up healthcheck)"]
subgraph NET["docker network: kamal"]
direction TB
web["web container<br/>Thruster :80 → Puma :3000"]
sidekiq["sidekiq container<br/>consolidated capsules + scheduler"]
pg[("postgres accessory<br/>PG18 · heatwave + heatwave_versions")]
valkey[("valkey accessories ×3<br/>cache / sessions / queue")]
mailpit["mailpit accessory<br/>SMTP :1025 / UI :8025"]
end
end
admin([Operator]) -. "SSH / psql / mailpit UI<br/>over Tailscale" .-> BOX
user -->|HTTPS| tls --> access --> cft
cft -->|"QUIC (dialed out by cloudflared)"| cfd
cfd -->|"http://localhost:80"| proxy --> web
web --> pg & valkey
web -->|SMTP| mailpit
sidekiq --> pg & valkey
Request path: browser → Cloudflare (TLS, Access SSO) → Cloudflare Tunnel →
cloudflared on the box → http://localhost:80 (kamal-proxy) → web container
(Thruster :80 → Puma :3000). No inbound web ports are open on the host; the
tunnel is dialed outbound.
Production topology — pre-cutover snapshot (historical)
Historical. This section and the diagram below capture the pre-cutover
Vultr + Capistrano topology and the original Kamal target. Production cut over
to Kamal on Latitude on 2026-06-07 (Dallas primary + Chicago standby); see the
Status note at the top of this file and INFRASTRUCTURE_INVENTORY.md for the
current state.
flowchart TB
user([User]) -->|HTTPS| cf["Cloudflare edge<br/>(TLS + WAF + Access on CRM)"]
cf -->|Tunnel| cfd["cloudflared (host)"]
cfd -->|"localhost:80"| proxy["kamal-proxy"]
subgraph WEB1["web1 (Vultr, Ubuntu 26.04) — TODO provision"]
proxy --> web["web container (Puma)"]
proxy -.-> sk["sidekiq container<br/>(consolidated, co-located to start)"]
end
web -->|"public IP 45.63.79.22:5432<br/>(firewall allowlist + SCRAM)"| db4[("db4 — Postgres PRIMARY<br/>heatwave + heatwave_versions")]
db4 -. "async replication" .-> db3[("db3 — replica")]
web -->|"TLS, allowlist"| valkey[("Vultr Managed Valkey 7")]
sk --> db4 & valkey
Cutover prerequisites (gated): create op://IT/Heatwave-Postgres, provision
web1, add its public IP to the db4 firewall group + the Valkey allowlist, then
bin/deploy production. Full sequence in
doc/tasks/202606022303_KAMAL_MIGRATION.md.
Network & security layers
flowchart LR
subgraph internet["Public internet"]
u([User]) ; op([Operator])
end
subgraph edge["Layer 1 — Cloudflare"]
e1["TLS + WAF + rate limiting"]
e2["Access SSO (Zero Trust)"]
end
subgraph latfw["Layer 2 — Latitude edge firewall"]
l1["inbound :22 ← 100.64.0.0/10 only<br/>(Tailscale CGNAT) · default-deny"]
end
subgraph host["Layer 3 — host (UFW + DOCKER-USER)"]
h1["UFW: default-deny in,<br/>allow lo + tailscale0 + :22"]
h2["DOCKER-USER: DROP public :80/:443,<br/>RETURN on tailscale0"]
end
subgraph app["Layer 4 — app"]
a1["accessories bound to 127.0.0.1<br/>or the Tailscale IP — never 0.0.0.0"]
a2["web reachable only via the Tunnel"]
end
u -->|web| e1 --> e2 -->|"Tunnel (outbound)"| a2
op -->|SSH/psql/UI| l1 --> h1 --> a1
h2 --- a2
The web tier is reachable only through the Cloudflare Tunnel (no public port).
The operator tier (SSH, psql, mailpit UI) is reachable only over Tailscale.
DOCKER-USER exists because Docker inserts iptables rules ahead of UFW for
published ports — without it, a published :80 would be world-reachable despite
UFW's default-deny.
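The effect of the DOCKER-USER chain can be illustrated with a first-match model of the two rules named in the diagram. This is a simplification under stated assumptions: `Packet`, `RULES`, and `docker_user_verdict` are hypothetical, `eth0` stands in for whatever the public interface is called, and real iptables matching involves more fields than interface and destination port.

```ruby
# Hypothetical first-match model of the DOCKER-USER chain described above.
Packet = Struct.new(:in_iface, :dport)

RULES = [
  # Traffic arriving over Tailscale skips the rest of this chain.
  { match: ->(pkt) { pkt.in_iface == "tailscale0" }, verdict: :return },
  # Anything else aimed at the published web ports is dropped.
  { match: ->(pkt) { [80, 443].include?(pkt.dport) }, verdict: :drop }
].freeze

def docker_user_verdict(pkt)
  rule = RULES.find { |r| r[:match].call(pkt) }
  rule ? rule[:verdict] : :return # falling off the end returns to the caller
end

puts docker_user_verdict(Packet.new("eth0", 80))       # drop
puts docker_user_verdict(Packet.new("tailscale0", 80)) # return
puts docker_user_verdict(Packet.new("eth0", 5432))     # return
```

Rule order matters: the Tailscale RETURN has to come before the DROP, otherwise operator traffic to :80 over Tailscale would be dropped too.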
Key facts at a glance
| Thing | Value |
|---|---|
| Live staging host | dal-latitude-heatwave-01, Tailscale 100.123.47.52 (Latitude bare metal, RAID-1) |
| Staging hostnames | crm / www / api / mcp.warmlyyours.ws (TLD env = warmlyyours.ws) |
| Staging Access group | wy-employees (0de0f290-f12c-4046-ae47-b66146f1a4ac) |
| App image | ghcr.io/warmlyyours/heatwave (GHCR) |
| PG accessory image | ghcr.io/warmlyyours/heatwave-postgres:18 (GHCR) |
| Docker network | kamal (app + accessories resolve by name) |
| Web port path | Cloudflare → tunnel → kamal-proxy :80 → Thruster :80 → Puma :3000 |
| Deploy user | deploy, uid 1001 (must match container USER 1001) |
| Cloudflare account | 79b7f58cf035093b5ad11747df30369a |
| Staging zone | warmlyyours.ws (d39acaed475782c4901d4a8e5908c1cb) |
| Prod DB | PG18 primary heatwave-postgres on Dallas (100.123.47.52) + cross-DC streaming standby heatwave-postgres-replica on Chicago (100.68.157.49); app reaches it via HAProxy write-VIP heatwave-haproxy:6433 → pgbouncer. See INFRASTRUCTURE_INVENTORY.md |
| Prod cache/queue | Valkey ×3 — heatwave-valkey-cache / -sessions / -queue (3-flavor split, routed per logical DB) |
| Prod backups | Databasus PITR → Cloudflare R2 (off the Chicago standby) |
| Deploy command | `bin/deploy [staging |