Class: OnlineMigrations::DataMigrations::BackfillVisitMetaJsonb
- Inherits:
-
OnlineMigrations::DataMigration
- Object
- OnlineMigrations::DataMigration
- OnlineMigrations::DataMigrations::BackfillVisitMetaJsonb
- Defined in:
- lib/online_migrations/data_migrations/backfill_visit_meta_jsonb.rb
Overview
Backfills the new visits.marketing_meta and visits.device_meta JSONB
columns from the existing scalar tracking/device columns, as Phase 0 of
the Visit JSONB consolidation (see
doc/tasks/202605141109_OPENAI_ADS_INTEGRATION.md).
Strategy
The Visit model dual-writes scalar → JSONB on every save from the
moment the schema migration deploys, so brand-new rows and any updated
legacy row already arrive with their JSONB blobs populated. This
backfill handles the long tail — historical visits that won't be saved
again — by issuing one UPDATE … SET marketing_meta = jsonb_build_object (…) SQL statement per batch.
The work is idempotent: each batch's WHERE clause filters to rows
whose target JSONB blob is still the default empty hash, so re-running
the migration touches only un-migrated rows. We do this at the SQL
level (no per-row Ruby roundtrips) because visits is a ~9.5M-row
table and per-record UPDATE would dominate runtime.
Drained when…
BackfillVisitMetaJsonb.new.count returns 0, which means every row
with any scalar tracking/device value has its JSONB blobs populated.
That's the gate for proceeding to PR 6 (scalar column drop).
Monitor:
OnlineMigrations::BackgroundDataMigrations::Migration
.find_by(migration_name: 'BackfillVisitMetaJsonb')
Constant Summary collapse
- BATCH_SIZE =
Rows per batched SQL UPDATE. Tuned so each statement finishes well
under the autovacuum/lock-timeout windows on the primary; on local
dev a 10k-row batch completes in ~250ms. 5_000- MARKETING_COLUMNS =
Scalar (now-legacy) columns to mirror into the
marketing_metaJSONB
blob. The Tracker writes JSONB directly (Phase 2) and the underlying
columns were renamed tolegacy_*in Phase 3 — this class survives
only as a rollback safety net: ifmarketing_metaever needs to be
rebuilt from the frozen scalar data, run this migration against the
legacy_* columns. %w[legacy_gclid legacy_gbraid legacy_wbraid].freeze
- DEVICE_COLUMNS =
Scalar (now-legacy) columns to mirror into the
device_metaJSONB blob. %w[ legacy_browser legacy_os legacy_device_type legacy_screen_width legacy_screen_height legacy_dnt legacy_locale ].freeze
Instance Method Summary collapse
-
#collection ⇒ Array<Array<Integer>>
Yields one ID batch per
BATCH_SIZErows that still need backfilling (either JSONB blob still equals'{}'::jsonb). -
#count ⇒ Integer
Drives the online_migrations progress bar — counts rows that still have an empty JSONB blob on either side.
-
#process(ids) ⇒ void
Migrates one batch of visit IDs in a single SQL
UPDATE.
Instance Method Details
#collection ⇒ Array<Array<Integer>>
Yields one ID batch per BATCH_SIZE rows that still need backfilling
(either JSONB blob still equals '{}'::jsonb). Returned eagerly as
an Array because OnlineMigrations::MigrationJob#build_enumerator
only handles ActiveRecord::Relation, BatchEnumerator, and
Array from #collection — a raw Enumerator would raise
ArgumentError before on_start runs and wedge the migration row
at status='enqueued'.
66 67 68 69 70 71 72 73 74 |
# File 'lib/online_migrations/data_migrations/backfill_visit_meta_jsonb.rb', line 66 def collection batches = [] ::Visit.where("marketing_meta = '{}'::jsonb OR device_meta = '{}'::jsonb") .in_batches(of: BATCH_SIZE) do |relation| ids = relation.pluck(:id) batches << ids if ids.any? end batches end |
#count ⇒ Integer
Drives the online_migrations progress bar — counts rows that still
have an empty JSONB blob on either side.
135 136 137 |
# File 'lib/online_migrations/data_migrations/backfill_visit_meta_jsonb.rb', line 135 def count ::Visit.where("marketing_meta = '{}'::jsonb OR device_meta = '{}'::jsonb").count end |
#process(ids) ⇒ void
This method returns an undefined value.
Migrates one batch of visit IDs in a single SQL UPDATE. jsonb_strip_nulls
drops keys whose scalar value is NULL so we don't store {"gclid": null}
noise. The AND … = '{}'::jsonb predicate keeps the write idempotent:
rows already migrated (e.g. by the model's dual-write) are skipped.
Per-blob CASE: the WHERE clause matches when either blob is empty,
but the SET only rewrites the side that's actually empty. Without the
per-blob guard, a row with populated marketing_meta and empty
device_meta would have marketing_meta rebuilt from legacy_* —
and the legacy columns are NULL on every row created after Phase 2
(PR #820), so the rewrite would silently overwrite good JSONB data
with {} (CodeRabbit catch on PR #822).
91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 |
# File 'lib/online_migrations/data_migrations/backfill_visit_meta_jsonb.rb', line 91 def process(ids) return if ids.empty? # `ids` are integers from `Visit#pluck(:id)` in `#collection`, so the # `IN (…)` interpolation is safe by construction — no untrusted input # can reach the SQL string. We interpolate (rather than using a # placeholder) because the batch contains up to BATCH_SIZE (5_000) # IDs, which pushes against PostgreSQL's prepared-statement parameter # limits at the higher end of the per-iteration sub-batch fan-out. sql = <<~SQL.squish UPDATE visits SET marketing_meta = CASE WHEN marketing_meta = '{}'::jsonb THEN jsonb_strip_nulls(jsonb_build_object( 'gclid', legacy_gclid, 'gbraid', legacy_gbraid, 'wbraid', legacy_wbraid )) ELSE marketing_meta END, device_meta = CASE WHEN device_meta = '{}'::jsonb THEN jsonb_strip_nulls(jsonb_build_object( 'browser', legacy_browser, 'os', legacy_os, 'device_type', legacy_device_type, 'screen_width', legacy_screen_width, 'screen_height', legacy_screen_height, 'dnt', legacy_dnt, 'locale', legacy_locale )) ELSE device_meta END WHERE id IN (#{ids.join(',')}) AND (marketing_meta = '{}'::jsonb OR device_meta = '{}'::jsonb) SQL ApplicationRecord.with_connection { |conn| conn.execute(sql) } end |