Class: ImageEmbeddingPopulationWorker

Inherits:
Object
Includes:
Sidekiq::IterableJob, Sidekiq::Job
Defined in:
app/workers/image_embedding_population_worker.rb

Overview

Nightly worker to backfill images missing Gemini Embedding 2 unified embeddings,
replacing all legacy Jina v4 embeddings in the process.

Uses Sidekiq::IterableJob so progress is checkpointed after each image — a
mid-run deploy or worker restart resumes from the last successful record rather
than restarting from scratch.
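
The checkpoint-and-resume behavior described above can be illustrated with a minimal plain-Ruby sketch. It does not use Sidekiq: `process_with_checkpoint`, its `stop_after` parameter, and the integer ids are hypothetical stand-ins, and the sketch assumes the cursor is simply the id of the last record processed.

```ruby
# Hypothetical helper: processes ids in ascending order, skipping anything
# at or below the saved cursor. `stop_after` simulates an interruption
# (deploy or worker restart) after that many records.
def process_with_checkpoint(ids, cursor: nil, stop_after: nil)
  processed = []
  ids.sort.each do |id|
    next if cursor && id <= cursor # already handled in an earlier run
    break if stop_after && processed.size >= stop_after

    processed << id
    cursor = id # checkpoint after each record
  end
  [processed, cursor]
end

ids = [5, 3, 1, 4, 2]

# First run is interrupted after two records; the second run resumes
# from the saved cursor instead of starting over.
first_run, saved_cursor = process_with_checkpoint(ids, stop_after: 2)
second_run, final_cursor = process_with_checkpoint(ids, cursor: saved_cursor)
```

Ordering by id is what makes the cursor meaningful here: a record is either at or below the cursor (done) or above it (pending), regardless of when the run was interrupted.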

Each iteration queues an ImageFullAnalysisWorker job, which runs the full
pipeline: pHash → Gemini Embedding 2 → Gemini Flash vision.

Processes product primary images first (most critical for search), then all
remaining active images. Both groups are ordered by id for stable cursor resumption.
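
The two-group ordering can be sketched in memory as follows. This is only an illustration: the real worker builds a database scope, and `SketchImage`, its attributes, and `prioritized` are hypothetical names. The sketch also assumes product primary images are always eligible, which the source does not state.

```ruby
# Hypothetical in-memory model; the real worker queries the database.
SketchImage = Struct.new(:id, :product_primary, :active, keyword_init: true)

# Returns product primary images first, then the remaining active images,
# each group sorted by id.
def prioritized(images, product_images_only: false)
  primary, others = images.partition(&:product_primary)
  ordered = primary.sort_by(&:id)
  ordered += others.select(&:active).sort_by(&:id) unless product_images_only
  ordered
end

images = [
  SketchImage.new(id: 4, product_primary: false, active: true),
  SketchImage.new(id: 3, product_primary: true,  active: true),
  SketchImage.new(id: 2, product_primary: false, active: false),
  SketchImage.new(id: 1, product_primary: true,  active: true),
  SketchImage.new(id: 5, product_primary: false, active: true)
]

all_ids = prioritized(images).map(&:id)
primary_ids = prioritized(images, product_images_only: true).map(&:id)
```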

Scheduled: Nightly at 2:30 AM CT

Examples:

Manual trigger (default: 5000 images/run)

ImageEmbeddingPopulationWorker.perform_async

Custom limit

ImageEmbeddingPopulationWorker.perform_async('limit' => 1000)

Product primary images only

ImageEmbeddingPopulationWorker.perform_async('product_images_only' => true)

Constant Summary

DEFAULT_LIMIT = 5000

Maximum number of images queued per run when no 'limit' option is passed.

TARGET_MODEL = AiModelConstants.id(:unified_embedding)

Target embedding model id (GA), read from the canonical model registry.
Switching the registry id from the pre-GA preview id to the GA id arms a
throttled re-embed: images still tagged with the old id fall back into the
candidate scope and are re-embedded with the GA model over successive
nightly runs.
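
To make the re-embed trigger concrete, here is a minimal sketch of how a candidate scope might behave when the target id changes. The model ids, the `embedding_model` key, and `embedding_candidates` are all hypothetical; the actual scope lives in `build_scope`, which is not shown on this page.

```ruby
# Hypothetical ids; the real value comes from AiModelConstants.id(:unified_embedding).
PREVIEW_MODEL_ID = 'example-embedding-preview'
GA_MODEL_ID      = 'example-embedding-ga'

# An image is a candidate when it has no embedding, or was embedded by any
# model other than the current target.
def embedding_candidates(images, target_model)
  images.reject { |image| image[:embedding_model] == target_model }
        .sort_by { |image| image[:id] }
end

images = [
  { id: 1, embedding_model: PREVIEW_MODEL_ID },
  { id: 2, embedding_model: PREVIEW_MODEL_ID },
  { id: 3, embedding_model: nil }
]

# While the registry still points at the preview id, only the image
# without an embedding qualifies.
before_switch = embedding_candidates(images, PREVIEW_MODEL_ID).map { |i| i[:id] }

# After the registry id changes to GA, preview-tagged images qualify again.
after_switch = embedding_candidates(images, GA_MODEL_ID).map { |i| i[:id] }
```

Under this reading, the per-run limit is what throttles the re-embed: each nightly run takes at most that many candidates from the enlarged scope.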

Instance Method Summary

Instance Method Details

#build_enumerator(options = nil, cursor:) ⇒ Object



# File 'app/workers/image_embedding_population_worker.rb', line 41

def build_enumerator(options = nil, cursor:)
  opts = options.to_h.with_indifferent_access
  @limit = (opts[:limit] || DEFAULT_LIMIT).to_i
  @product_images_only = opts[:product_images_only].to_b
  @queued_count = 0

  scope = build_scope
  candidate_count = scope.count
  log_info "Starting: #{candidate_count} images need Gemini embedding (limit: #{@limit}, product_only: #{@product_images_only})"

  return nil if candidate_count.zero?

  active_record_records_enumerator(scope.limit(@limit), cursor: cursor)
end
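
As a rough illustration of the option handling in build_enumerator, the following plain-Ruby sketch normalizes the same two options without Rails. `normalize_options`, `SKETCH_DEFAULT_LIMIT`, and `TRUTHY_VALUES` are hypothetical names. `to_b` is not core Ruby, so the set of values it treats as true in the application is an assumption here.

```ruby
SKETCH_DEFAULT_LIMIT = 5000

# Assumed truthy values; the application's `to_b` may accept a different set.
TRUTHY_VALUES = [true, 1, '1', 'true', 'yes'].freeze

# Accepts nil, string-keyed, or symbol-keyed options, approximating what
# `with_indifferent_access` provides in the original method.
def normalize_options(options)
  opts = options.to_h.transform_keys(&:to_s) # nil.to_h => {}
  {
    limit: (opts['limit'] || SKETCH_DEFAULT_LIMIT).to_i,
    product_images_only: TRUTHY_VALUES.include?(opts['product_images_only'])
  }
end

defaults = normalize_options(nil)
custom   = normalize_options('limit' => '1000', product_images_only: 'true')
```

Sidekiq serializes job arguments as JSON, so string keys and string values are what the worker usually receives. That is presumably why the original method coerces both the keys and the values.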

#each_iteration(image, *_args) ⇒ Object



# File 'app/workers/image_embedding_population_worker.rb', line 56

def each_iteration(image, *_args)
  ImageFullAnalysisWorker.perform_async(image.id, { 'force' => false })
  @queued_count += 1

  log_info "Progress: #{@queued_count} queued" if (@queued_count % 500).zero?
end

#on_complete ⇒ Object



# File 'app/workers/image_embedding_population_worker.rb', line 63

def on_complete
  if @queued_count.zero?
    log_info 'All images already have Gemini embeddings — nothing to queue'
  else
    log_info "Complete: queued #{@queued_count} images for AI analysis"
  end
end