Class: OnlineMigrations::DataMigrations::BackfillVisitMetaJsonb

Inherits:
OnlineMigrations::DataMigration
  • Object
show all
Defined in:
lib/online_migrations/data_migrations/backfill_visit_meta_jsonb.rb

Overview

Backfills the new visits.marketing_meta and visits.device_meta JSONB
columns from the existing scalar tracking/device columns, as Phase 0 of
the Visit JSONB consolidation (see
doc/tasks/202605141109_OPENAI_ADS_INTEGRATION.md).

Strategy

The Visit model dual-writes scalar → JSONB on every save from the
moment the schema migration deploys, so brand-new rows and any updated
legacy row already arrive with their JSONB blobs populated. This
backfill handles the long tail — historical visits that won't be saved
again — by issuing one UPDATE … SET marketing_meta = jsonb_build_object (…) SQL statement per batch.

The work is idempotent: each batch's WHERE clause filters to rows
whose target JSONB blob is still the default empty hash, so re-running
the migration touches only un-migrated rows. We do this at the SQL
level (no per-row Ruby roundtrips) because visits is a ~9.5M-row
table and per-record UPDATE would dominate runtime.

Drained when…

BackfillVisitMetaJsonb.new.count returns 0, which means every row
with any scalar tracking/device value has its JSONB blobs populated.
That's the gate for proceeding to PR 6 (scalar column drop).

Monitor:
OnlineMigrations::BackgroundDataMigrations::Migration
.find_by(migration_name: 'BackfillVisitMetaJsonb')

Constant Summary collapse

BATCH_SIZE =

Rows per batched SQL UPDATE. Tuned so each statement finishes well
under the autovacuum/lock-timeout windows on the primary; on local
dev a 10k-row batch completes in ~250ms.

5_000
MARKETING_COLUMNS =

Scalar (now-legacy) columns to mirror into the marketing_meta JSONB
blob. The Tracker writes JSONB directly (Phase 2) and the underlying
columns were renamed to legacy_* in Phase 3 — this class survives
only as a rollback safety net: if marketing_meta ever needs to be
rebuilt from the frozen scalar data, run this migration against the
legacy_* columns.

%w[legacy_gclid legacy_gbraid legacy_wbraid].freeze
DEVICE_COLUMNS =

Scalar (now-legacy) columns to mirror into the device_meta JSONB blob.

%w[
  legacy_browser legacy_os legacy_device_type
  legacy_screen_width legacy_screen_height legacy_dnt legacy_locale
].freeze

Instance Method Summary collapse

Instance Method Details

#collectionArray<Array<Integer>>

Yields one ID batch per BATCH_SIZE rows that still need backfilling
(either JSONB blob still equals '{}'::jsonb). Returned eagerly as
an Array because OnlineMigrations::MigrationJob#build_enumerator
only handles ActiveRecord::Relation, BatchEnumerator, and
Array from #collection — a raw Enumerator would raise
ArgumentError before on_start runs and wedge the migration row
at status='enqueued'.

Returns:

  • (Array<Array<Integer>>)

    one array of IDs per batch



66
67
68
69
70
71
72
73
74
# File 'lib/online_migrations/data_migrations/backfill_visit_meta_jsonb.rb', line 66

def collection
  batches = []
  ::Visit.where("marketing_meta = '{}'::jsonb OR device_meta = '{}'::jsonb")
         .in_batches(of: BATCH_SIZE) do |relation|
    ids = relation.pluck(:id)
    batches << ids if ids.any?
  end
  batches
end

#countInteger

Drives the online_migrations progress bar — counts rows that still
have an empty JSONB blob on either side.

Returns:

  • (Integer)


135
136
137
# File 'lib/online_migrations/data_migrations/backfill_visit_meta_jsonb.rb', line 135

def count
  ::Visit.where("marketing_meta = '{}'::jsonb OR device_meta = '{}'::jsonb").count
end

#process(ids) ⇒ void

This method returns an undefined value.

Migrates one batch of visit IDs in a single SQL UPDATE. jsonb_strip_nulls
drops keys whose scalar value is NULL so we don't store {"gclid": null}
noise. The AND … = '{}'::jsonb predicate keeps the write idempotent:
rows already migrated (e.g. by the model's dual-write) are skipped.

Per-blob CASE: the WHERE clause matches when either blob is empty,
but the SET only rewrites the side that's actually empty. Without the
per-blob guard, a row with populated marketing_meta and empty
device_meta would have marketing_meta rebuilt from legacy_*
and the legacy columns are NULL on every row created after Phase 2
(PR #820), so the rewrite would silently overwrite good JSONB data
with {} (CodeRabbit catch on PR #822).

Parameters:

  • ids (Array<Integer>)

    visit IDs to migrate in this batch.



91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
# File 'lib/online_migrations/data_migrations/backfill_visit_meta_jsonb.rb', line 91

def process(ids)
  return if ids.empty?

  # `ids` are integers from `Visit#pluck(:id)` in `#collection`, so the
  # `IN (…)` interpolation is safe by construction — no untrusted input
  # can reach the SQL string. We interpolate (rather than using a
  # placeholder) because the batch contains up to BATCH_SIZE (5_000)
  # IDs, which pushes against PostgreSQL's prepared-statement parameter
  # limits at the higher end of the per-iteration sub-batch fan-out.
  sql = <<~SQL.squish
    UPDATE visits
    SET marketing_meta = CASE
          WHEN marketing_meta = '{}'::jsonb THEN
            jsonb_strip_nulls(jsonb_build_object(
              'gclid',  legacy_gclid,
              'gbraid', legacy_gbraid,
              'wbraid', legacy_wbraid
            ))
          ELSE marketing_meta
        END,
        device_meta = CASE
          WHEN device_meta = '{}'::jsonb THEN
            jsonb_strip_nulls(jsonb_build_object(
              'browser',       legacy_browser,
              'os',            legacy_os,
              'device_type',   legacy_device_type,
              'screen_width',  legacy_screen_width,
              'screen_height', legacy_screen_height,
              'dnt',           legacy_dnt,
              'locale',        legacy_locale
            ))
          ELSE device_meta
        END
    WHERE id IN (#{ids.join(',')})
      AND (marketing_meta = '{}'::jsonb OR device_meta = '{}'::jsonb)
  SQL

  ApplicationRecord.with_connection { |conn| conn.execute(sql) }
end