Class: ImageEmbeddingPopulationWorker
- Inherits:
- Object
- Includes:
- Sidekiq::IterableJob, Sidekiq::Worker
- Defined in:
- app/workers/image_embedding_population_worker.rb
Overview
Nightly worker to backfill images missing Gemini Embedding 2 unified embeddings,
replacing all legacy Jina v4 embeddings in the process.
Uses Sidekiq::IterableJob so progress is checkpointed after each image — a
mid-run deploy or worker restart resumes from the last successful record rather
than restarting from scratch.
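The resumption behaviour described above can be modelled in a few lines of plain Ruby. This is a simplified sketch, not Sidekiq's implementation: `CheckpointedRun`, its `run` method, and the `max:` parameter are hypothetical names used only to imitate an interrupted run.

```ruby
# Simplified model of cursor checkpointing; NOT Sidekiq's actual implementation.
# CheckpointedRun and its method names are hypothetical.
class CheckpointedRun
  attr_reader :cursor, :processed

  def initialize(ids)
    @ids = ids.sort   # stable order, comparable to ordering records by id
    @cursor = nil     # id of the last successfully processed record
    @processed = []
  end

  # Processes records after the saved cursor. Passing `max` stops the run
  # early to imitate a deploy or worker restart.
  def run(max: nil)
    remaining = @cursor ? @ids.select { |id| id > @cursor } : @ids
    remaining = remaining.first(max) if max
    remaining.each do |id|
      @processed << id
      @cursor = id    # checkpoint after each record
    end
    self
  end
end

run = CheckpointedRun.new([3, 1, 2, 5, 4])
run.run(max: 2)   # interrupted after records 1 and 2
run.run           # resumes with record 3, not from the start
run.processed     # => [1, 2, 3, 4, 5]
```

Because the cursor is saved after every record, no record is processed twice in this model, which is the property the worker relies on.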
Each iteration queues an ImageFullAnalysisWorker job which runs the full
pipeline: pHash → Gemini Embedding 2 → Gemini Flash vision.
Prioritizes product primary images first (most critical for search), then all
remaining active images, both groups ordered by id for stable cursor resumption.
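The ordering described above can be illustrated with an in-memory sketch. The real `build_scope` is not shown on this page, so `ImageRow`, its `primary` and `active` fields, and the `prioritized` helper are assumptions made for illustration. The sketch also assumes that primary images are drawn from the active set.

```ruby
# Hypothetical in-memory stand-in for image rows; field names are assumptions.
ImageRow = Struct.new(:id, :primary, :active, keyword_init: true)

# Primary images first, then the remaining active images, each group by id.
def prioritized(images)
  active = images.select(&:active)
  primary, rest = active.partition(&:primary)
  primary.sort_by(&:id) + rest.sort_by(&:id)
end

rows = [
  ImageRow.new(id: 4, primary: false, active: true),
  ImageRow.new(id: 2, primary: true,  active: true),
  ImageRow.new(id: 3, primary: false, active: false),
  ImageRow.new(id: 1, primary: false, active: true),
  ImageRow.new(id: 5, primary: true,  active: true)
]

prioritized(rows).map(&:id)   # => [2, 5, 1, 4]
```

Sorting each group by id keeps the sequence deterministic between runs, which is what a saved cursor needs in order to point at the same position after a restart.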
Scheduled: Nightly at 2:30 AM CT
Constant Summary
- DEFAULT_LIMIT = 5000
- TARGET_MODEL = 'gemini-embedding-2-preview'
Instance Method Summary
- #build_enumerator(options = nil, cursor:) ⇒ Object
- #each_iteration(image, *_args) ⇒ Object
- #on_complete ⇒ Object
Instance Method Details
#build_enumerator(options = nil, cursor:) ⇒ Object
# File 'app/workers/image_embedding_population_worker.rb', line 36
def build_enumerator(options = nil, cursor:)
  # Accept nil, string-keyed, or symbol-keyed options.
  opts = options.to_h.with_indifferent_access
  @limit = (opts[:limit] || DEFAULT_LIMIT).to_i
  @product_images_only = opts[:product_images_only].to_b
  @queued_count = 0

  scope = build_scope
  candidate_count = scope.count
  log_info "Starting: #{candidate_count} images need Gemini embedding (limit: #{@limit}, product_only: #{@product_images_only})"

  # Nothing to backfill, so there is nothing to iterate over.
  return nil if candidate_count.zero?

  active_record_records_enumerator(scope.limit(@limit), cursor: cursor)
end
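The option handling at the top of build_enumerator can be approximated in plain Ruby. This is a sketch under stated assumptions: `normalize_options` is a hypothetical helper, string keys stand in for ActiveSupport's `with_indifferent_access`, and the truthy-value list only guesses at what the application's `to_b` accepts.

```ruby
DEFAULT_LIMIT = 5000

# Plain-Ruby approximation of the option parsing. The accepted truthy values
# are an assumption; the real `to_b` helper is not shown in the source.
def normalize_options(options)
  opts = options.to_h.transform_keys(&:to_s)   # nil.to_h => {}
  {
    limit: (opts['limit'] || DEFAULT_LIMIT).to_i,
    product_images_only: [true, 'true', '1', 1].include?(opts['product_images_only'])
  }
end

normalize_options(nil)
# => {:limit=>5000, :product_images_only=>false}
normalize_options('limit' => '250', product_images_only: true)
# => {:limit=>250, :product_images_only=>true}
```

Calling the worker with no arguments therefore falls back to DEFAULT_LIMIT and processes all candidate images rather than product images only.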
#each_iteration(image, *_args) ⇒ Object
# File 'app/workers/image_embedding_population_worker.rb', line 51
def each_iteration(image, *_args)
  ImageFullAnalysisWorker.perform_async(image.id, { 'force' => false })
  @queued_count += 1
  log_info "Progress: #{@queued_count} queued" if (@queued_count % 500).zero?
end
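To make the logging cadence in each_iteration concrete, the following sketch lists the queued counts at which a progress line would be written. `progress_checkpoints` is a hypothetical helper, not part of the worker.

```ruby
# Returns the queued counts at which a progress line would be emitted,
# mirroring the `(@queued_count % 500).zero?` check.
def progress_checkpoints(total, every: 500)
  (1..total).select { |n| (n % every).zero? }
end

progress_checkpoints(1200)        # => [500, 1000]
progress_checkpoints(5000).size   # => 10
```

With the default limit of 5000, a full run writes at most ten progress lines in addition to the start and completion messages.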
#on_complete ⇒ Object
# File 'app/workers/image_embedding_population_worker.rb', line 58
def on_complete
  if @queued_count.zero?
    log_info 'All images already have Gemini embeddings — nothing to queue'
  else
    log_info "Complete: queued #{@queued_count} images for AI analysis"
  end
end