Class: ImageEmbeddingPopulationWorker

Inherits:
Object
Includes:
Sidekiq::IterableJob, Sidekiq::Worker
Defined in:
app/workers/image_embedding_population_worker.rb

Overview

Nightly worker to backfill images missing Gemini Embedding 2 unified embeddings,
replacing all legacy Jina v4 embeddings in the process.

Uses Sidekiq::IterableJob so progress is checkpointed after each image. A
mid-run deploy or worker restart resumes from the last successful record rather
than restarting from scratch.
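The checkpoint-and-resume behavior described above can be illustrated with a small plain-Ruby simulation. This is only a sketch: `CheckpointedRun` and `interrupt_after` are hypothetical names invented for illustration, and the class stands in for the cursor that Sidekiq manages; it is not how Sidekiq::IterableJob is implemented.

```ruby
# Hypothetical stand-in for the Sidekiq-managed cursor (illustration only).
class CheckpointedRun
  attr_reader :cursor, :processed

  def initialize(ids)
    @ids = ids.sort   # a stable id order is what makes the cursor meaningful
    @cursor = nil     # id of the last successfully processed record
    @processed = []
  end

  # Processes records whose id is greater than the cursor. Stops early after
  # `interrupt_after` records to mimic a deploy or worker restart.
  def run(interrupt_after: nil)
    handled = 0
    @ids.each do |id|
      next if @cursor && id <= @cursor
      return :interrupted if interrupt_after && handled >= interrupt_after

      @processed << id
      @cursor = id    # checkpoint after each record
      handled += 1
    end
    :complete
  end
end

run = CheckpointedRun.new([101, 102, 103, 104, 105])
first = run.run(interrupt_after: 2) # a restart hits after two images
second = run.run                    # resumes after id 102, not from 101
```

In this simulation the second call skips ids 101 and 102 because the cursor already points past them, so no image is handled twice.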

Each iteration queues an ImageFullAnalysisWorker job which runs the full
pipeline: pHash → Gemini Embedding 2 → Gemini Flash vision.

Prioritizes product primary images first (most critical for search), then all
remaining active images, both groups ordered by id for stable cursor resumption.
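The two-tier ordering can be sketched in plain Ruby as follows. `ImageRow` and its `product_primary` flag are hypothetical stand-ins; the worker's actual scope (`build_scope`) is an ActiveRecord relation that is not shown in this page, so this only illustrates the intended order.

```ruby
# Hypothetical record shape used only for this illustration.
ImageRow = Struct.new(:id, :product_primary)

# Product primary images first, then everything else, each group ordered by id.
def prioritized(rows)
  primary, rest = rows.partition(&:product_primary)
  primary.sort_by(&:id) + rest.sort_by(&:id)
end

rows = [
  ImageRow.new(7, true),
  ImageRow.new(3, false),
  ImageRow.new(5, true),
  ImageRow.new(1, false)
]
ordered_ids = prioritized(rows).map(&:id)
```

Here `ordered_ids` places the primary images (5, 7) ahead of the remaining images (1, 3), with ids ascending inside each group.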

Scheduled: Nightly at 2:30 AM CT

Examples:

Manual trigger (default: 5000 images/run)

ImageEmbeddingPopulationWorker.perform_async

Custom limit

ImageEmbeddingPopulationWorker.perform_async('limit' => 1000)

Product primary images only

ImageEmbeddingPopulationWorker.perform_async('product_images_only' => true)

Constant Summary

DEFAULT_LIMIT = 5000
TARGET_MODEL = 'gemini-embedding-2-preview'

Instance Method Summary

Instance Method Details

#build_enumerator(options = nil, cursor:) ⇒ Object



# File 'app/workers/image_embedding_population_worker.rb', line 36

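def build_enumerator(options = nil, cursor:)
  # Accept nil, string-keyed, or symbol-keyed options.
  opts = options.to_h.with_indifferent_access
  @limit = (opts[:limit] || DEFAULT_LIMIT).to_i
  # to_b is not core Ruby; it is assumed to come from an app-level extension or gem.
  @product_images_only = opts[:product_images_only].to_b
  @queued_count = 0

  scope = build_scope
  candidate_count = scope.count
  log_info "Starting: #{candidate_count} images need Gemini embedding (limit: #{@limit}, product_only: #{@product_images_only})"

  # Nothing to backfill: return nil instead of an enumerator.
  return nil if candidate_count.zero?

  # Cap the run at @limit records and resume from the checkpointed cursor.
  active_record_records_enumerator(scope.limit(@limit), cursor: cursor)
end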
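The option handling at the top of build_enumerator can be approximated without ActiveSupport. This is a sketch under stated assumptions: `normalize_options` is a hypothetical helper, and the set of values treated as true is a guess at what `to_b` accepts, not its documented behavior.

```ruby
DEFAULT_LIMIT = 5000

# Plain-Ruby approximation of the worker's option parsing (hypothetical helper).
def normalize_options(options)
  # nil.to_h is {}, so a bare perform_async falls through to the defaults.
  opts = options.to_h.transform_keys(&:to_s)
  {
    limit: (opts['limit'] || DEFAULT_LIMIT).to_i,
    # Assumed truthy values; the real to_b may accept more or fewer.
    product_images_only: [true, 'true', 1, '1'].include?(opts['product_images_only'])
  }
end

defaults = normalize_options(nil)
custom = normalize_options('limit' => 1000)
primary_only = normalize_options('product_images_only' => true)
```

With no arguments the limit falls back to DEFAULT_LIMIT and product_images_only is false, which matches the "Manual trigger" example in the overview.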

#each_iteration(image, *_args) ⇒ Object



# File 'app/workers/image_embedding_population_worker.rb', line 51

def each_iteration(image, *_args)
  ImageFullAnalysisWorker.perform_async(image.id, { 'force' => false })
  @queued_count += 1

  log_info "Progress: #{@queued_count} queued" if (@queued_count % 500).zero?
end
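The progress cadence in each_iteration (one log line per 500 queued images) can be checked with a small simulation. `simulate_progress` is a hypothetical helper that collects log lines in an array instead of calling `log_info`; the counter tracks jobs queued, not analyses finished.

```ruby
# Stand-in for the worker's counter and logger (illustration only).
def simulate_progress(total, every: 500)
  queued = 0
  logs = []
  total.times do
    queued += 1
    # Same modulo check as each_iteration: log only on exact multiples.
    logs << "Progress: #{queued} queued" if (queued % every).zero?
  end
  logs
end

logs = simulate_progress(1200)
```

For a run of 1200 images this yields two progress lines, at 500 and 1000; the final total is reported separately by on_complete.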

#on_complete ⇒ Object



# File 'app/workers/image_embedding_population_worker.rb', line 58

def on_complete
  if @queued_count.zero?
    log_info 'All images already have Gemini embeddings — nothing to queue'
  else
    log_info "Complete: queued #{@queued_count} images for AI analysis"
  end
end