Class: OnlineMigrations::DataMigrations::ScrubNullClickIdsFromVisitMarketingMeta
- Inherits:
-
OnlineMigrations::DataMigration
- Object
- OnlineMigrations::DataMigration
- OnlineMigrations::DataMigrations::ScrubNullClickIdsFromVisitMarketingMeta
- Defined in:
- lib/online_migrations/data_migrations/scrub_null_click_ids_from_visit_marketing_meta.rb
Overview
Strips literal JSON-null keys (gclid, gbraid, wbraid) from
visits.marketing_meta. Pairs with the tracker fix in PR #878
(Tracking::Tracker was routing nil click-ids through
data[:marketing_meta_*], which the jsonb_attributes accessor
persisted as {gclid: null} rather than dropping the key).
Scope
~46.8K rows at write time on a ~10.1M-row, ~7.9GB visits table.
Way under the "must be background" threshold by row count, but
visits is a high-write table — the single-statement UPDATE
variant of this migration would hold its transaction open long
enough to interfere with autovacuum and produce contention on the
marketing_meta GIN index. Splitting into batched UPDATE … WHERE id IN (…) statements keeps each transaction sub-second and lets
autovacuum/lock-retries breathe.
Mirrors the BackfillVisitMetaJsonb shape so the monitoring +
operational pattern is the same one ops/dev are already familiar
with from the Phase 0 backfill.
Idempotency
Each batch's UPDATE uses PG's - operator (marketing_meta - 'gclid' - 'gbraid' - 'wbraid') which is a no-op on rows where the
named keys aren't present. The WHERE clause filters batches to
the still-polluted rows; re-running after a partial drain only
touches the remainder.
Monitor:
OnlineMigrations::BackgroundDataMigrations::Migration
.find_by(migration_name: 'ScrubNullClickIdsFromVisitMarketingMeta')
Constant Summary collapse
- BATCH_SIZE =
Rows per batched SQL UPDATE. Tuned the same as
BackfillVisitMetaJsonb — a 5K-row batch finishes well under
the lock-timeout/autovacuum windows on the primary and stays
below PostgreSQL's prepared-statement parameter limits. 5_000- POLLUTED_PREDICATE =
Filter for rows where the JSONB accessor wrote a literal JSON
null for any of the three click-id keys. Selected as a module
constant so #collection and #count share the exact predicate. <<~SQL.squish marketing_meta -> 'gclid' = 'null'::jsonb OR marketing_meta -> 'gbraid' = 'null'::jsonb OR marketing_meta -> 'wbraid' = 'null'::jsonb SQL
Instance Method Summary collapse
-
#collection ⇒ Array<Array<Integer>>
Yields one ID batch per
BATCH_SIZErows that still carry at least one JSON-null click-id key. -
#count ⇒ Integer
Drives the online_migrations progress bar — counts rows still carrying a JSON-null click-id key.
-
#process(ids) ⇒ void
Strips the three click-id keys from one batch of visits in a single SQL
UPDATE.
Instance Method Details
#collection ⇒ Array<Array<Integer>>
Yields one ID batch per BATCH_SIZE rows that still carry at
least one JSON-null click-id key. Eagerly materialized to an
Array because OnlineMigrations::MigrationJob#build_enumerator
only handles ActiveRecord::Relation, BatchEnumerator, and
Array from #collection — a raw Enumerator would wedge the
migration at status='enqueued'.
64 65 66 67 68 69 70 71 72 |
# File 'lib/online_migrations/data_migrations/scrub_null_click_ids_from_visit_marketing_meta.rb', line 64 def collection batches = [] ::Visit.where(POLLUTED_PREDICATE) .in_batches(of: BATCH_SIZE) do |relation| ids = relation.pluck(:id) batches << ids if ids.any? end batches end |
#count ⇒ Integer
Drives the online_migrations progress bar — counts rows still
carrying a JSON-null click-id key.
105 106 107 |
# File 'lib/online_migrations/data_migrations/scrub_null_click_ids_from_visit_marketing_meta.rb', line 105 def count ::Visit.where(POLLUTED_PREDICATE).count end |
#process(ids) ⇒ void
This method returns an undefined value.
Strips the three click-id keys from one batch of visits in a
single SQL UPDATE. PG's jsonb - text operator removes the
named key, no-op when the key isn't present — so the statement
is safe on rows where the tracker had only stamped one or two of
the three keys as null.
82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 |
# File 'lib/online_migrations/data_migrations/scrub_null_click_ids_from_visit_marketing_meta.rb', line 82 def process(ids) return if ids.empty? # `ids` are integers from `Visit#pluck(:id)` in `#collection`, so # the `IN (…)` interpolation is safe by construction — no # untrusted input can reach the SQL string. We interpolate (not # bind) because the batch contains up to BATCH_SIZE (5_000) IDs, # which pushes against PG's prepared-statement parameter limits # at the higher end of the per-iteration fan-out. sql = <<~SQL.squish UPDATE visits SET marketing_meta = marketing_meta - 'gclid' - 'gbraid' - 'wbraid' WHERE id IN (#{ids.join(',')}) AND (#{POLLUTED_PREDICATE}) SQL ApplicationRecord.with_connection { |conn| conn.execute(sql) } end |