Class: TextUnifiedEmbeddingBackfillWorker
- Inherits: Object
  - Object
    - TextUnifiedEmbeddingBackfillWorker
- Includes: Sidekiq::IterableJob, Sidekiq::Job
- Defined in: app/workers/text_unified_embedding_backfill_worker.rb
Overview
Backfills Gemini Embedding 2 vectors into the unified vector space for TEXT
content, so text and images become cross-modally searchable from one query.
Dual-write, no delete: for each source content_type='primary' text row it
upserts a content_type='unified' sibling tagged gemini-embedding-2 (via
Embedding::TextUnifier), leaving the OpenAI embedding rows and the live
search path untouched. The read path is cut over to the unified space only
after coverage reaches 100%; that cutover is a deliberate follow-up and not part of this worker.
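A minimal in-memory sketch of the dual-write rule, assuming embedding rows can be modeled as plain structs keyed by (source_id, content_type, model). Row, dual_write, coverage, and the "openai" model tag are hypothetical; the real worker delegates this to Embedding::TextUnifier against the database. What it illustrates: the unified sibling is upserted next to its primary row, the primary row is never modified or deleted, and coverage (the fraction of primary rows that have a gemini-embedding-2 unified sibling) is the number that gates the read-path cutover.

```ruby
# Hypothetical model of an embedding row; the real rows live in the database.
Row = Struct.new(:source_id, :content_type, :model, :vector, keyword_init: true)

UNIFIED_MODEL = "gemini-embedding-2"

# Dual-write, no delete: upsert a 'unified' sibling for each 'primary' row.
# Primary rows are read but never modified or removed.
def dual_write(rows, embed)
  index = rows.each_with_index.to_h { |row, i| [[row.source_id, row.content_type, row.model], i] }
  rows.select { |row| row.content_type == "primary" }.each do |primary|
    key = [primary.source_id, "unified", UNIFIED_MODEL]
    sibling = Row.new(source_id: primary.source_id, content_type: "unified",
                      model: UNIFIED_MODEL, vector: embed.call(primary.source_id))
    if index.key?(key)
      rows[index[key]] = sibling # update the existing sibling in place
    else
      index[key] = rows.size
      rows << sibling # insert a new sibling
    end
  end
  rows
end

# Fraction of primary rows that already have a unified Gemini sibling (1.0 = cutover-ready).
def coverage(rows)
  primaries = rows.select { |row| row.content_type == "primary" }.map(&:source_id).uniq
  return 1.0 if primaries.empty?

  unified = rows.select { |row| row.content_type == "unified" && row.model == UNIFIED_MODEL }
                .map(&:source_id)
  (primaries & unified).size.fdiv(primaries.size)
end
```

Running dual_write a second time rewrites the existing siblings rather than adding new ones, so the row count stays stable across runs.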
Self-throttling: Embedding::Gemini caps at 300 req/min, and each iteration sends
exactly one batchEmbedContents request (≤100 rows), so throughput tops out at
300 × 100 = 30,000 rows/min and the corpus drains from this single job without
any mass enqueue. Resumable: Sidekiq::IterableJob checkpoints the primary-key
cursor after every batch, so a deploy or restart resumes from the saved cursor
instead of starting over.
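The checkpoint-and-resume behaviour can be pictured as keyset pagination: batches are taken in primary-key order, and the last id of each finished batch is the cursor worth saving. This is a plain-Ruby sketch; pk_batches is a hypothetical helper and is not how Sidekiq's active_record_batches_enumerator is implemented internally. In the sketch, resuming from the saved cursor visits every id exactly once; in the real worker a batch that was in flight when the process stopped may run again, which the idempotent skip absorbs.

```ruby
# Keyset batching over primary keys (hypothetical helper).
# Yields each batch together with its last id, which is the cursor a checkpoint would persist.
# Only ids strictly greater than `cursor` are visited, so resuming never repeats a finished batch.
def pk_batches(ids, cursor: nil, batch_size: 100)
  ids.sort
     .select { |id| cursor.nil? || id > cursor }
     .each_slice(batch_size) { |batch| yield batch, batch.last }
end
```

Keying the cursor on the primary key rather than an offset keeps resumption correct even if rows are inserted while the job runs, since new ids sort after the cursor instead of shifting earlier pages.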
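Scheduled nightly and idempotent: rows that already have a Gemini unified sibling
are skipped, so re-running is safe, and a row that failed on one run still lacks a
sibling on the next and is retried there, so repeated runs converge to full coverage.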
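A sketch of why repeated runs converge, assuming a hypothetical backfill_pass that skips ids already covered and leaves failed ids uncovered. The counter keys match the worker's processed/skipped/failed counts; everything else (the Set of covered ids, the embed callable) is illustrative. For simplicity the sketch walks every id and counts already-covered ones as skipped.

```ruby
require "set"

# One idempotent pass (hypothetical). Covered ids are skipped; a failed embed leaves the
# id uncovered, so the next pass picks it up again.
def backfill_pass(primary_ids, covered, embed)
  counts = { processed: 0, skipped: 0, failed: 0 }
  primary_ids.each do |id|
    if covered.include?(id)
      counts[:skipped] += 1
    elsif embed.call(id)
      covered << id
      counts[:processed] += 1
    else
      counts[:failed] += 1
    end
  end
  counts
end
```

With an embed call that fails once for a single id, the first pass reports one failure, the second pass processes only that id, and a third pass over the fully covered corpus does no embedding work.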
Constant Summary
- BATCH_SIZE = Embedding::Gemini::MAX_BATCH_SIZE
  One Gemini batchEmbedContents request per iteration.
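Worked arithmetic for this constant, assuming MAX_BATCH_SIZE is 100 (the overview's ≤100 rows per request) and the 300 requests/minute cap from the overview. Because one iteration is one request, n rows need ⌈n / 100⌉ requests, and the cap turns that into a lower bound on wall-clock time: 1,000,000 rows need 10,000 requests, or at least about 33.3 minutes. The constants and helper names below are hypothetical stand-ins, and the estimate ignores request latency.

```ruby
# Hypothetical stand-ins for the values stated in the overview.
MAX_BATCH_SIZE = 100            # rows per batchEmbedContents request
MAX_REQUESTS_PER_MINUTE = 300   # Embedding::Gemini rate cap

# Requests needed for `rows` rows: ceiling division by the batch size.
def requests_needed(rows, batch_size: MAX_BATCH_SIZE)
  (rows + batch_size - 1) / batch_size
end

# Lower bound on drain time in minutes under the rate cap.
def minimum_minutes(rows)
  requests_needed(rows).fdiv(MAX_REQUESTS_PER_MINUTE)
end
```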
Instance Method Summary
- #build_enumerator(options = nil, cursor:) ⇒ Object
- #each_iteration(batch, *_args) ⇒ Object
- #on_complete ⇒ Object
Instance Method Details
#build_enumerator(options = nil, cursor:) ⇒ Object
    # File 'app/workers/text_unified_embedding_backfill_worker.rb', line 36

    def build_enumerator(options = nil, cursor:)
      opts = options.to_h.with_indifferent_access
      @types = Array(opts[:types]).presence || Embedding::TextUnifier::TEXT_TYPES
      @counts = { processed: 0, skipped: 0, failed: 0 }

      scope = candidate_scope
      log_info "Starting: #{scope.count} primary rows need a Gemini unified sibling (types: #{@types.join(',')})"
      active_record_batches_enumerator(scope, cursor: cursor, batch_size: BATCH_SIZE)
    end
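A plain-Ruby sketch of the option handling in build_enumerator, written without ActiveSupport so it runs standalone. Sidekiq serializes job arguments as JSON, so a Hash enqueued with Symbol keys reaches the job with String keys; that is what with_indifferent_access papers over, and the sketch reads either key style. resolve_types, json_round_trip, and DEFAULT_TEXT_TYPES (standing in for Embedding::TextUnifier::TEXT_TYPES, whose real values are not shown here) are hypothetical.

```ruby
require "json"

DEFAULT_TEXT_TYPES = %w[post comment].freeze # hypothetical stand-in for TEXT_TYPES

# Mirrors Array(opts[:types]).presence || TEXT_TYPES for Hashes with either
# Symbol or String keys (the behaviour with_indifferent_access provides).
def resolve_types(options = nil)
  opts = options.to_h # nil becomes {}
  raw = opts.key?(:types) ? opts[:types] : opts["types"]
  types = Array(raw) # nil -> [], "post" -> ["post"]
  types.empty? ? DEFAULT_TEXT_TYPES : types
end

# Simulates the JSON round trip job arguments go through: Symbol keys come back as Strings.
def json_round_trip(args)
  JSON.parse(JSON.generate(args))
end
```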
#each_iteration(batch, *_args) ⇒ Object
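    # File 'app/workers/text_unified_embedding_backfill_worker.rb', line 47

    def each_iteration(batch, *_args)
      processed_before = @counts[:processed]
      result = Embedding::TextUnifier.backfill(batch)
      @counts.each_key { |key| @counts[key] += result[key] }
      # Log each time the running total crosses a 1,000-row boundary; an exact
      # `% 1000` check can be stepped over when a batch has skipped or failed rows.
      log_info "Progress: #{@counts.inspect}" if @counts[:processed] / 1000 > processed_before / 1000
    end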
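A progress check of the form `(processed % 1000).zero?` fires only when the running total lands exactly on a multiple of 1000. Each batch adds at most BATCH_SIZE processed rows, and fewer whenever rows are skipped or fail, so the total can step over every boundary and the progress line may never appear. The sketch below (hypothetical helper names) replays per-batch increments through that exact-multiple check and through a boundary-crossing alternative that compares `total / 1000` before and after each batch.

```ruby
# Exact-multiple check: fires only if the running total lands exactly on a multiple of `every`.
def exact_multiple_logs(increments, every: 1000)
  total = 0
  increments.each_with_object([]) do |inc, logs|
    total += inc
    logs << total if total.positive? && (total % every).zero?
  end
end

# Boundary-crossing check: fires whenever a batch carries the total past a multiple of `every`.
def boundary_crossing_logs(increments, every: 1000)
  total = 0
  increments.each_with_object([]) do |inc, logs|
    before = total
    total += inc
    logs << total if total / every > before / every
  end
end
```

With full batches of 100 both checks log at 1,000 and 2,000. With 97 processed rows per batch, the exact-multiple check stays silent through 2,425 rows, while the boundary check logs at 1,067 and 2,037.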
#on_complete ⇒ Object
    # File 'app/workers/text_unified_embedding_backfill_worker.rb', line 54

    def on_complete
      log_info "Complete: #{@counts.inspect}"
    end
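The three hooks run in a fixed order: build_enumerator once per (re)start, each_iteration once per batch, and on_complete once when the enumerator is exhausted. A toy driver makes one consequence visible: only the cursor survives an interruption, and build_enumerator reinitializes @counts, so the "Complete" log after a resume covers only the batches processed since that resume. ToyBackfillJob and drive_job are hypothetical and greatly simplified relative to Sidekiq::IterableJob (no Redis state, no cancellation, no retries).

```ruby
# Toy stand-in for the worker (hypothetical); mirrors the three hook names.
class ToyBackfillJob
  attr_reader :counts, :completed

  def initialize(ids, batch_size:)
    @ids = ids
    @batch_size = batch_size
    @completed = false
  end

  # Called on every (re)start, so @counts starts from zero each time.
  def build_enumerator(cursor:)
    @counts = { processed: 0 }
    @ids.select { |id| cursor.nil? || id > cursor }.each_slice(@batch_size)
  end

  def each_iteration(batch)
    @counts[:processed] += batch.size
  end

  def on_complete
    @completed = true
  end
end

# Drives one execution from `cursor`. If `stop_after` is set, stops before that batch
# index to mimic a deploy and returns the cursor to resume from; returns nil when done.
def drive_job(job, cursor: nil, stop_after: nil)
  job.build_enumerator(cursor: cursor).each_with_index do |batch, i|
    return cursor if stop_after && i == stop_after # simulated interruption
    job.each_iteration(batch)
    cursor = batch.last
  end
  job.on_complete
  nil
end
```

If a whole-run total is needed after resumes, it would have to be persisted alongside the cursor rather than kept in instance variables.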