DB-tier target architecture — merged kamal config + controlled-failover HA + PITR

Status: Stage 1 (config merge) + the prod app cutover DONE 2026-06-12 — config/deploy.yml is live as production (bare kamal deploy); the kamal-proxy service was renamed heatwave-web-production-dallas → heatwave-web via script/cutover_prod_proxy_handoff.sh (Option A, single ~25-40s .com boot gap; deferred no-op migrate; old containers retired; verified serving real CRM traffic). The PROD_CUTOVER_PENDING guard in bin/deploy is removed — future bin/deploy production is a normal in-place rolling update of heatwave-web. Chicago accessory rename also DONE 2026-06-12 (heatwave-postgres-replica + heatwave-pgbouncer-replica, RO VIP live; archive_mode NOT yet flipped). STILL PENDING: the Databasus PITR agent + WAL archiving + the 2 logical-source host updates + the restore-test gate — a focused pass (apply-step 3 below has the gathered pieces; the Databasus API is finicky). Living doc — amend as decisions firm up.

Current state (2026-06-11)

Bare-metal Latitude, PG18 ping-pong. Dallas (100.123.47.52) = live prod primary PG18.4 — full stack (web/sidekiq + postgres/pgbouncer/valkey/playwright) via deploy.production-dallas.yml. Chicago (100.68.157.49) = streaming standby + RO services via deploy.production.yml. W3 moves prod home to Chicago.
Routing: Tailscale VIP services (heatwave-db RW + heatwave-db-ro) → pgbouncer (session mode) → local postgres. The RW VIP is repointed manually at a flip.
pgbouncer = session mode and must stay so — the app uses session-level advisory locks (advisory-locks skill); transaction mode would silently break them (also LISTEN/NOTIFY, session SET). PG18 + pgbouncer 1.21 named-prepared-statements does NOT rescue advisory locks.
PITR: Databasus agent-mode “Cluster PITR (physical + WAL)” DB created (id b8a23021-…, token in op://IT/Databasus PITR agent token); the standby recreate (WAL-queue volume + archive_mode=always) is pending. See 202606101600_DATABASUS_PITR_PILOT.md.
Config sprawl: deploy.yml (DEAD Vultr stub), deploy.production.yml (Chicago), deploy.production-dallas.yml (Dallas live), deploy.staging.yml.

Decisions

One merged deploy.yml = production. Accessories: postgres (primary) + postgres_replica (standby) + pgbouncer(s) + valkey + app roles; deploy.staging.yml = staging; delete the dead stub + the two split files. Rationale: failover is external (not kamal) — the kamal config only reflects reality, so at a flip you change the host IP post-promote. The two-file split existed only to keep both DB containers named heatwave-postgres; with external failover + role-named accessories that trick isn’t needed.
Controlled failover (Approach A), NOT Patroni. External/scripted promote + automatic routing. Patroni/etcd is reserved for if/when a same-DC HA pair + a 3rd-site witness exist and seconds-not-minutes RTO is required (cross-DC auto-failover = split-brain risk; see the earlier HA feasibility analysis). Today: HA = uptime via routing; PITR (Databasus) = the data-loss axis.
HAProxy routing layer replaces the manual VIP repoint. Two listeners: write → current primary, read → replica(s). Primary detection — two surveyed mechanisms:
- (A) pg_hba reject-by-role (Percona): native option pgsql-check user primaryuser / standbyuser; primary rejects standbyuser and each standby rejects primaryuser. No sidecar, but the failover script must flip pg_hba on every node + reload.
- (B) pg_is_in_recovery() health endpoint (virtualstaticvoid/pgsql_haproxy, Patroni-style): a tiny per-node HTTP/TCP check reporting primary/replica from the LIVE recovery state; HAProxy option httpchk. Self-detecting — on pg_promote() recovery flips false, the check follows, HAProxy reroutes with NO pg_hba editing.
- Lean (B) — self-detecting; the failover script’s only job is promote + re-stand-up the old primary. Less cross-node state to keep in sync.
pgbouncer stays session-mode (advisory locks — see Current state).

Stages (execute in order)

Config merge → clean deploy.yml (the foundation everything else plugs into). Decided.
PITR finish — Databasus agent on the standby; the standby recreate does double duty (WAL-queue volume + the postgres_replica rename). In flight — 202606101600.
HAProxy + health endpoint (B) + failover script — the controlled-failover routing layer. The script: pg_promote() the replica + reconfigure the old primary as a standby; HAProxy reroutes off the recovery-state check. Keep the Tailscale VIP as the stable cross-DC address in front of HAProxy (or fold in).
Rails read/write routing (ActiveRecord::Middleware::DatabaseSelector) — the replica is currently RO-VIP/analyst-only; app-level read-splitting offloads the primary (GET/HEAD → replica, writes → primary, delay: to outlast replication lag). Future.
PG18 tuning — idle_replication_slot_timeout (native fix for the dead-slot-fills-the-primary- disk footgun — relevant to PITR/replication safety; complements max_slot_wal_keep_size=200GB), io_method=io_uring (read-heavy replica), uuidv7() for sequential UUID keys. Future.

Stage 1 — config merge: detailed plan (mapped 2026-06-11)

Live topology found (don’t re-derive):

bin/deploy production is an alias → .kamal/prod-active-destination (= production-dallas) → live prod app on Dallas (web/sidekiq, container label dest=production-dallas). config/deploy.production.yml (Chicago) is the idle W3 target, not live.
Accessories: Dallas runs heatwave-{postgres,pgbouncer,valkey,playwright,sftp}; Chicago runs heatwave-postgres (standby) + heatwave-pgbouncer (the heatwave-db-ro VIP pooler). All on the single kamal docker network (prod + staging share it today).
sftp is host-pinned to Dallas (100.123.47.52) but defined in deploy.production.yml (Chicago) — a concrete symptom of the split-file mess.

Target merged deploy.yml (= production, reflecting current reality; at W3 you edit the hosts):

service: heatwave; app web/sidekiq on Dallas; proxy.host = prod hostnames; builder remote = Dallas; env → heatwave-pgbouncer / heatwave-valkey. Clean out the stale Vultr base (TODO hosts, db4 45.63.79.22).
Accessories (all heatwave-*, network kamal — prod KEEPS kamal so no prod recreate):
- postgres (primary) → Dallas /data/prod-replica/data (+ /data/prod-replica/tbs)
- postgres_replica (standby) → Chicago /data/postgres/data (+ tbs) + the PITR /opt/databasus/wal-queue:/wal-queue volume → container heatwave-postgres-replica
- pgbouncer (RW) → Dallas /data/pgbouncer-prod
- pgbouncer_replica (RO-VIP) → Chicago /data/pgbouncer → container heatwave-pgbouncer-replica
- valkey → Dallas; sftp → Dallas; playwright → Dallas
deploy.staging.yml stays on the shared kamal network — the “split to kamal-staging” idea is not achievable in Kamal 2.11: the app (web/sidekiq) and kamal-proxy containers are hardcoded to --network kamal (commands/app.rb, commands/proxy.rb); only accessories take a network: key, and moving just those to kamal-staging would break app→accessory DNS. Multi-app isolation is by unique service name instead — the canonical Kamal-2 multi-app pattern (nts.strzibny.name/multiple-apps-single-server-kamal-2: one shared kamal net + one shared proxy, distinct service: per app). Staging floats as heatwave-staging-*; any prod-only accessory it structurally inherits is inert (never auto-booted by kamal deploy). Decided 2026-06-11 (user: “fine as long as all the network names are unique, we don’t need two networks”).

Rename cascade (the *_replica accessories on Chicago — accepted ripple):

PITR agent --pg-docker-container-name → heatwave-postgres-replica.
pg-maintenance.sh (in provision-host.sh / TFC host-config) does docker exec heatwave-postgres — parameterize PG_CONTAINER per host (Chicago = heatwave-postgres-replica); it already self-skips on the standby via pg_is_in_recovery().
Chicago RO pgbouncer backend (/data/pgbouncer/conf.d/databases.ini) host → heatwave-postgres-replica.
heatwave-db-ro Tailscale VIP serve target = the pgbouncer published port → unaffected by the rename.

bin/deploy changes: production → bare kamal deploy (uses deploy.yml); drop the .kamal/prod-active-destination alias indirection (W3 becomes “edit deploy.yml host”, per the external-failover model); keep staging → -d staging; delete .kamal/prod-active-destination. DFLAG is now () for production (empty-array expansion under set -u is fine on the bash 5.x the deploy runs on), (-d "$DEST") otherwise; the destination-validity check skips the config/deploy.<dest>.yml existence test for production (it has no per-dest file — it IS the base).

Deletions: deploy.production.yml + deploy.production-dallas.yml (folded into deploy.yml); the dead Vultr deploy.yml stub IS replaced by the new production deploy.yml. .kamal/secrets.production → renamed to .kamal/secrets (see corrections); .kamal/secrets.production-dallas deleted (was an identical copy).

Two Kamal-2.11 corrections (found while reviewing the committed merge; both applied):

service: override on the *_replica accessories. Kamal names the container/DNS as <service>-<accessory-key>, so the keys postgres_replica/pgbouncer_replica would resolve to heatwave-postgres_replica/heatwave-pgbouncer_replica (underscore — an invalid hostname that libpq/DNS reject), NOT the hyphenated heatwave-postgres-replica/heatwave-pgbouncer-replica the PITR agent / pg-maintenance / RO-pgbouncer backend / every comment assume. Fix: explicit service: heatwave-postgres-replica and service: heatwave-pgbouncer-replica on the two accessories (Accessory#service_name honours service:; the validator allows it). Verified via YAML render.
.kamal/secrets base file. Bare kamal deploy (no -d) loads .kamal/secrets-common + .kamal/secrets (Kamal::Secrets#secrets_filenames). The base .kamal/secrets didn’t exist — only .production / .production-dallas / .staging — so DATABASE_PASSWORD, DATABASE_PASSWORD_VERSIONS, and the production env_key would not resolve. Fix: renamed .kamal/secrets.production → .kamal/secrets.

Apply sequence (gated — file changes are inert w.r.t. live infra; nothing on a DB until step 2):

DONE (working tree, pending commit): merged deploy.yml + the two service: overrides + the bin/deploy rework (alias dropped → bare kamal deploy) + .kamal/secrets base file + delete the two dest files + .kamal/secrets.production-dallas + .kamal/prod-active-destination + dangling-ref cleanup. Why this is inert: kamal deploy (production) boots/swaps only the app on Dallas and ensures the proxy — it does NOT boot accessories and never SSHes to Chicago, and the Dallas app/env/hosts are identical to the live production-dallas config. So committing changes only which file bin/deploy production reads, not any running container. (The bin/deploy cutover IS the alias drop in this step — there’s no separate “cut over” later.)
DONE 2026-06-12 — Chicago accessory rename. docker stop heatwave-postgres (kept stopped = rollback) → kamal accessory boot postgres_replica → heatwave-postgres-replica (streaming, 0 lag, /opt/databasus/wal-queue:/wal-queue mounted; /opt/databasus/wal-queue created uid 999 on the host). archive_mode left on (NOT flipped to always) — deferred to step 3 until the agent’s WAL mechanism is confirmed (avoids an archive backlog with no drain; disk risk is anyway negligible — /opt has 406 GB free + WAL ~idle). Repointed /data/pgbouncer/conf.d/databases.ini host → heatwave-postgres-replica, docker stop heatwave-pgbouncer → kamal accessory boot pgbouncer_replica → heatwave-pgbouncer-replica (RO VIP live). STILL TODO: pg-maintenance.sh PG_CONTAINER per host (provision-host.sh / TFC — Chicago = heatwave-postgres-replica; it self-skips on a standby so harmless until W3), and rm the old stopped heatwave-{postgres,pgbouncer}.
Finish PITR (remaining — a focused pass; the Databasus API is finicky/undocumented). Pieces gathered:
- Databasus = Postgresus rebrand. Controller databasus on Chicago :4005 (tailnet); admin op://IT/Databasus (Postgres Backup) (username=admin); signin POST /api/v1/users/signin {email,password} → JWT. Workspace c5ab3ebb-ce00-4b59-a1fa-8c8f92d0eb9b. List DBs: GET /api/v1/databases?workspace_id=….
- Update the 2 Phase-1 logical sources (else the next scheduled logical backup fails — heatwave daily 03:00 UTC, ~21 h buffer from 2026-06-12 ~06:00 UTC): 4c468fe8-21ab-49e5-8dee-84495636380f (heatwave)
  - d98fb103-621b-4f46-883c-e981b6f6cc7e (versions), postgresql.host heatwave-postgres → heatwave-postgres-replica. ⚠️ The update endpoint is elusive — PUT/POST/PATCH /databases/{id} hit the SPA catch-all (200 + HTML, no write); passwords come back masked (need the real one = op://IT/Databasus backup role (postgres), len 28). Easiest reliable path = the UI at http://100.68.157.49:4005, or find the SPA’s actual save call.
- Deploy the agent (binary in the controller image at /app/agent-binaries/databasus-agent-linux-amd64; commands: start/stop/status/restore). Agent DB b8a23021-b93c-4f12-ad0e-00b5b45c233a (“Cluster PITR”) is a stub (host/container null — the agent configures it). Run on the Chicago host (it docker-execs the container): databasus-agent start -databasus-host http://100.68.157.49:4005 -db-id b8a23021-… -pg-type docker -pg-docker-container-name heatwave-postgres-replica -pg-wal-dir /opt/databasus/wal-queue -token <op://IT/Databasus PITR agent token> -pg-user databasus -pg-password <…role pw…> -pg-port 5432. Watch its logs to learn the WAL mechanism: pg_receivewal (streaming — then NO archive_mode change needed) vs archive_command (then ALTER SYSTEM SET archive_mode='always' + the cp %p /wal-queue/%f.tmp && mv … command + a restart). Either way add the standby pg_hba line host replication databasus 127.0.0.1/32 scram-sha-256 + reload for the agent’s replication connection.
- Acceptance gate: a real PITR restore to a chosen second into a throwaway container (the whole point; multi-day) — see 202606101600_DATABASUS_PITR_PILOT.md. Keep SimpleBackups/logical in parallel ≥1 cycle.
Production cutover is NOT a no-op — it RENAMES the kamal-proxy service. Live prod is registered on the shared Dallas kamal-proxy as heatwave-web-production-dallas (proxy service = container_prefix = service-role-destination; role.rb:124). A bare kamal deploy (destination nil) registers a NEW service heatwave-web claiming the SAME five .com hosts → kamal-proxy rejects it with Error: host settings conflict with another service (the new container boots + health-checks, then the deploy aborts and the new container is stopped; live prod keeps serving — graceful, no downtime, but the cutover does NOT complete). To actually cut over, FIRST retire the old service — kamal app stop -d production-dallas (or remove its kamal-proxy host claims) — THEN bare kamal deploy. That leaves a ~boot-time (~30–60s) gap on the .com hosts unless you script a manual kamal-proxy host hand-off. Plan it as a deliberate cutover event, not a routine deploy. After this one-time rename, every future bare kamal deploy is an in-place rolling update of heatwave-web (no further rename). kamal config + kamal secrets print are verified to resolve the merged config
- base secrets, with zero underscore accessory names (2026-06-11).
Surfaced by the staging test deploy, which ALSO caught that the merge made staging inherit prod’s proxy.host (same conflict class) — fixed: deploy.staging.yml now overrides proxy.host with staging’s own .ws hosts. Staging redeployed clean; prod + staging now coexist on the shared proxy, each host-routed to its distinct hostname set.

References

fromthekeyboard.com/hosting-multiple-postgres-databases-with-kamal/ — merged multi-accessory config
percona.com/blog/configure-haproxy-with-postgresql-using-built-in-pgsql-check/ — mechanism (A)
github.com/virtualstaticvoid/pgsql_haproxy — mechanism (B), Docker-packaged demo
Rails 8 multi-db DatabaseSelector blueprint + HAProxy/Patroni failover blueprint (user-provided)
doc/tasks/202606101600_DATABASUS_PITR_PILOT.md — PITR; the earlier Patroni-vs-controlled HA analysis