Skip to content

timestamptz backfill — converting legacy naive timestamp columns

Drafted 2026-06-05. Status: NOT STARTED — planning only. Nothing in here is approved to run. The default recommendation is opportunistic conversion (Tier 0 below), not a wholesale sweep; the full sweep is documented so that if/when we decide to do it, the scope, risks, and mechanics are already worked out.

As of 2026-06-05, config/initializers/071_active_record_timestamptz.rb sets PostgreSQLAdapter.datetime_type = :timestamptz, so new datetime columns are timestamp with time zone. Per rails/rails#41084 that flip only affects DDL generated going forward — the existing schema is untouched and deliberately mixed.

This doc is about the other half: the ~621 legacy timestamp without time zone columns already in the schema.

Key fact that de-risks the whole thing: the legacy columns are not broken. Rails stores and reads them in UTC (config.active_record.default_timezone defaults to :utc, and we don’t override it), so every naive column round-trips correctly through ActiveRecord today. The only benefit of converting is correctness for consumers that bypass Rails and read the raw column with a non-UTC session TimeZone: raw SQL, pghero, logical replicas, BI/ETL tools, psql spelunking. That benefit is real but marginal, which is why this is opportunistic, not urgent.

Authoritative inventory (from db/structure.sql, the schema source):

  • 253 tables carry at least one timestamp without time zone column.
  • 621 naive columns total.
  • Out of scope: the PaperTrail versions database (POSTGRES_VERSIONS_URI, db/versions_structure.sql). It’s a separate FDW-backed partitioned database with its own restore gotchas (see the versions-db-partition-restore-gotcha note). Converting partitioned timestamp columns is a distinct, riskier exercise — handle it on its own if ever, never bundled into this.

Prod size tiering (row/byte estimates from the replica, 2026-06-05 — must be re-measured at execution time per the bulk-op protocol):

TierSize bandTablesCombinedStrategy
A≥ 1 GB~18~54 GBOne table per migration, off-peak, online_migrations, monitored
B100 MB – 1 GB~36~13 GBSmall batches, online_migrations
C10 – 100 MB~59~2 GBBatched, brief locks acceptable
D< 10 MBlong tail< 300 MBCan go in one migration

(Tier counts inflate slightly vs. the 253 logical tables because the size query joins partition children by name; treat them as order-of-magnitude, not exact.)

Tier-A / hot tables (handle individually, with named risks)

Section titled “Tier-A / hot tables (handle individually, with named risks)”
Tablenaive cols~rows~sizeRisk note
communications38.7M12 GBThe hash-index table — index rebuild took 82 min during a restore. A TYPE rewrite reindexes the whole table. See 202606041530_HASH_INDEX_REMEDIATION.md. Worst single table.
visits / visit_events1 / 110M / 15M9.5 GB / 3.7 GBHigh write volume; pick a genuine traffic trough.
edi_communication_logs50.5M3.9 GBWide rows (5 cols × rewrite).
communication_recipients38.9M3.3 GB
line_items211M2.9 GBCore commerce path — hot.
activities65.3M2.4 GB6 cols in one rewrite.
campaign_deliveries213M1.8 GBHighest row count.
event_store_events / …_in_streams2 / 12.0M / 2.0M1.4 GB / 0.6 GBRails Event Store internals. RES reads created_at/valid_at. Verify RES still reads correctly post-conversion in staging before prod.
ledger_entries32.5M766 MBAppend-only ledger with closed-period triggers (heatwave.allow_ledger_admin GUC). ALTER COLUMN TYPE is a table rewrite, not row DML, so row-level triggers should not fire — but verify on staging that the rewrite isn’t blocked, and whether SET LOCAL heatwave.allow_ledger_admin='on' is needed. See ledger-append-only-admin-bypass note.
audit_trails23.8M514 MBAudit data — don’t mutate values, only the type.
receipts / receipt_details2 / 2Part of the receipts/ledger integrity program; same closed-period caution as ledger_entries.

The naive values are UTC wall-clock, so the conversion must pin the source zone explicitly. A bare ALTER COLUMN … TYPE timestamptz uses the session TimeZone and silently shifts every value if the session isn’t UTC — this is the data-corruption trap the dev.to timezone article and the PG wiki warn about.

# ✅ correct: interpret the naive value AS UTC, produce timestamptz
execute <<~SQL
ALTER TABLE foos
ALTER COLUMN created_at TYPE timestamptz USING (created_at AT TIME ZONE 'UTC'),
ALTER COLUMN updated_at TYPE timestamptz USING (updated_at AT TIME ZONE 'UTC')
SQL

Belt-and-suspenders: SET LOCAL TimeZone = 'UTC'; at the top of the migration so even a USING-less column can’t go wrong.

Combine all of a table’s naive columns into one ALTER TABLE so the table is rewritten once, not once per column.

ALTER COLUMN … TYPE takes an ACCESS EXCLUSIVE lock and fully rewrites the table (and reindexes it). For Tier-A tables that’s tens of minutes of exclusive lock — unacceptable inline. This is exactly the kind of operation CLAUDE.md’s bulk-operation protocol governs:

  1. Count first. Re-measure rows/size against prod (not this doc’s stale replica estimates) immediately before each batch.
  2. Two confirmations, separated by a scope/window question.
  3. Narrow default. Start with Tier D, prove the pattern end-to-end, then widen one tier at a time. Do not attempt all 253 at once.
  4. Route big tables through online_migrations so locks are taken with a timeout and retried, not held indefinitely against live traffic. See the migrations skill.
  5. Off-peak only for Tiers A/B. Coordinate with deploys; a rewrite mid-deploy is asking for a lock pile-up.
  • Tier 0 (default, no approval needed beyond normal review): opportunistic. Whenever a migration already rewrites a table for another reason, convert that table’s naive columns in the same migration. Zero marginal cost. This alone drains the list slowly with no dedicated risk.
  • Phase 1 — Tier D (small tail): one migration converting all <10 MB tables (brief locks, low risk). Validates the USING AT TIME ZONE pattern and the verification queries at scale.
  • Phase 2 — Tier C: batched, online_migrations.
  • Phase 3 — Tier B: small batches, off-peak.
  • Phase 4 — Tier A: one table per migration, each its own bulk-op confirmation, monitored live. communications last and alone.
  • Never in scope here: the versions FDW database — separate plan.

After each batch, confirm the type changed and no values shifted (spot-check a few rows pre/post — the wall-clock instant in UTC must be identical):

-- type flipped?
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'foos' AND data_type LIKE 'timestamp%';
-- value preserved? (run pre- and post-, compare)
SELECT id, created_at AT TIME ZONE 'UTC' AS created_utc FROM foos ORDER BY id LIMIT 5;

Then db:migrate regenerates db/structure.sql (schema_format = :sql) — diff it to confirm only the intended tables flipped from timestamp without time zone to timestamp with time zone.

App-level: run the model’s tests; for event_store_events and the ledger/receipts tables, exercise RES replay and a closed-period guard respectively on staging before prod.

ALTER COLUMN … TYPE timestamp without time zone USING (col AT TIME ZONE 'UTC') reverses it (another full rewrite). Because the instant is preserved in both directions, rollback is value-safe — but it’s another long lock, so the real mitigation is doing each Tier-A table in its own small, separately-deployable migration rather than a batch that’s hard to partially revert.