timestamptz backfill — converting legacy naive timestamp columns

Drafted 2026-06-05. Status: NOT STARTED — planning only. Nothing in here is approved to run. The default recommendation is opportunistic conversion (Tier 0 below), not a wholesale sweep; the full sweep is documented so that if/when we decide to do it, the scope, risks, and mechanics are already worked out.

Background

As of 2026-06-05, config/initializers/071_active_record_timestamptz.rb sets PostgreSQLAdapter.datetime_type = :timestamptz, so new datetime columns are timestamp with time zone. Per rails/rails#41084 that flip only affects DDL generated going forward — the existing schema is untouched and deliberately mixed.

This doc is about the other half: the ~621 legacy timestamp without time zone columns already in the schema.

Key fact that de-risks the whole thing: the legacy columns are not broken. Rails stores and reads them in UTC (config.active_record.default_timezone defaults to :utc, and we don’t override it), so every naive column round-trips correctly through ActiveRecord today. The only benefit of converting is correctness for consumers that bypass Rails and read the raw column with a non-UTC session TimeZone: raw SQL, pghero, logical replicas, BI/ETL tools, psql spelunking. That benefit is real but marginal, which is why this is opportunistic, not urgent.

Scope

Authoritative inventory (from db/structure.sql, the schema source):

253 tables carry at least one timestamp without time zone column.
621 naive columns total.
Out of scope: the PaperTrail versions database (POSTGRES_VERSIONS_URI, db/versions_structure.sql). It’s a separate FDW-backed partitioned database with its own restore gotchas (see the versions-db-partition-restore-gotcha note). Converting partitioned timestamp columns is a distinct, riskier exercise — handle it on its own if ever, never bundled into this.

Prod size tiering (row/byte estimates from the replica, 2026-06-05 — must be re-measured at execution time per the bulk-op protocol):

Tier	Size band	Tables	Combined	Strategy
A	≥ 1 GB	~18	~54 GB	One table per migration, off-peak, `online_migrations`, monitored
B	100 MB – 1 GB	~36	~13 GB	Small batches, `online_migrations`
C	10 – 100 MB	~59	~2 GB	Batched, brief locks acceptable
D	< 10 MB	long tail	< 300 MB	Can go in one migration

(Tier counts inflate slightly vs. the 253 logical tables because the size query joins partition children by name; treat them as order-of-magnitude, not exact.)

Tier-A / hot tables (handle individually, with named risks)

Table	naive cols	~rows	~size	Risk note
`communications`	3	8.7M	12 GB	The hash-index table — index rebuild took 82 min during a restore. A `TYPE` rewrite reindexes the whole table. See `202606041530_HASH_INDEX_REMEDIATION.md`. Worst single table.
`visits` / `visit_events`	1 / 1	10M / 15M	9.5 GB / 3.7 GB	High write volume; pick a genuine traffic trough.
`edi_communication_logs`	5	0.5M	3.9 GB	Wide rows (5 cols × rewrite).
`communication_recipients`	3	8.9M	3.3 GB
`line_items`	2	11M	2.9 GB	Core commerce path — hot.
`activities`	6	5.3M	2.4 GB	6 cols in one rewrite.
`campaign_deliveries`	2	13M	1.8 GB	Highest row count.
`event_store_events` / `…_in_streams`	2 / 1	2.0M / 2.0M	1.4 GB / 0.6 GB	Rails Event Store internals. RES reads `created_at`/`valid_at`. Verify RES still reads correctly post-conversion in staging before prod.
`ledger_entries`	3	2.5M	766 MB	Append-only ledger with closed-period triggers (`heatwave.allow_ledger_admin` GUC). `ALTER COLUMN TYPE` is a table rewrite, not row DML, so row-level triggers should not fire — but verify on staging that the rewrite isn’t blocked, and whether `SET LOCAL heatwave.allow_ledger_admin='on'` is needed. See `ledger-append-only-admin-bypass` note.
`audit_trails`	2	3.8M	514 MB	Audit data — don’t mutate values, only the type.
`receipts` / `receipt_details`	2 / 2	—	—	Part of the receipts/ledger integrity program; same closed-period caution as `ledger_entries`.

The conversion mechanic

The naive values are UTC wall-clock, so the conversion must pin the source zone explicitly. A bare ALTER COLUMN … TYPE timestamptz uses the session TimeZone and silently shifts every value if the session isn’t UTC — this is the data-corruption trap the dev.to timezone article and the PG wiki warn about.

# ✅ correct: interpret the naive value AS UTC, produce timestamptz
execute <<~SQL
  ALTER TABLE foos
    ALTER COLUMN created_at TYPE timestamptz USING (created_at AT TIME ZONE 'UTC'),
    ALTER COLUMN updated_at TYPE timestamptz USING (updated_at AT TIME ZONE 'UTC')
SQL

Belt-and-suspenders: SET LOCAL TimeZone = 'UTC'; at the top of the migration so even a USING-less column can’t go wrong.

Combine all of a table’s naive columns into one ALTER TABLE so the table is rewritten once, not once per column.

Why this is bulk-op gated

ALTER COLUMN … TYPE takes an ACCESS EXCLUSIVE lock and fully rewrites the table (and reindexes it). For Tier-A tables that’s tens of minutes of exclusive lock — unacceptable inline. This is exactly the kind of operation CLAUDE.md’s bulk-operation protocol governs:

Count first. Re-measure rows/size against prod (not this doc’s stale replica estimates) immediately before each batch.
Two confirmations, separated by a scope/window question.
Narrow default. Start with Tier D, prove the pattern end-to-end, then widen one tier at a time. Do not attempt all 253 at once.
Route big tables through online_migrations so locks are taken with a timeout and retried, not held indefinitely against live traffic. See the migrations skill.
Off-peak only for Tiers A/B. Coordinate with deploys; a rewrite mid-deploy is asking for a lock pile-up.

Suggested phasing

Tier 0 (default, no approval needed beyond normal review): opportunistic. Whenever a migration already rewrites a table for another reason, convert that table’s naive columns in the same migration. Zero marginal cost. This alone drains the list slowly with no dedicated risk.
Phase 1 — Tier D (small tail): one migration converting all <10 MB tables (brief locks, low risk). Validates the USING AT TIME ZONE pattern and the verification queries at scale.
Phase 2 — Tier C: batched, online_migrations.
Phase 3 — Tier B: small batches, off-peak.
Phase 4 — Tier A: one table per migration, each its own bulk-op confirmation, monitored live. communications last and alone.
Never in scope here: the versions FDW database — separate plan.

Verification

After each batch, confirm the type changed and no values shifted (spot-check a few rows pre/post — the wall-clock instant in UTC must be identical):

-- type flipped?
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'foos' AND data_type LIKE 'timestamp%';

-- value preserved? (run pre- and post-, compare)
SELECT id, created_at AT TIME ZONE 'UTC' AS created_utc FROM foos ORDER BY id LIMIT 5;

Then db:migrate regenerates db/structure.sql (schema_format = :sql) — diff it to confirm only the intended tables flipped from timestamp without time zone to timestamp with time zone.

App-level: run the model’s tests; for event_store_events and the ledger/receipts tables, exercise RES replay and a closed-period guard respectively on staging before prod.

Rollback

ALTER COLUMN … TYPE timestamp without time zone USING (col AT TIME ZONE 'UTC') reverses it (another full rewrite). Because the instant is preserved in both directions, rollback is value-safe — but it’s another long lock, so the real mitigation is doing each Tier-A table in its own small, separately-deployable migration rather than a batch that’s hard to partially revert.

References

config/initializers/071_active_record_timestamptz.rb — the going-forward default
AGENTS.md → Migration safety → “New datetime columns are timestamptz”
rails/rails#41084 — datetime_type config + t.timestamp/t.timestamptz helpers
PostgreSQL: Don’t use timestamp without time zone
Handling Timezones in Rails (dev.to)
doc/tasks/202606041530_HASH_INDEX_REMEDIATION.md — communications index context
migrations skill — online_migrations usage