Class: TextEmbeddingPopulationWorker

Inherits:
Object
Includes:
Sidekiq::IterableJob, Sidekiq::Worker
Defined in:
app/workers/text_embedding_population_worker.rb

Overview

Nightly worker to backfill text-based content embeddings for any embeddable type.

Uses Sidekiq::IterableJob so progress is saved after each record — a mid-run
deploy or worker restart resumes from the last successful record rather than
restarting from scratch.
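
The resume behaviour described above can be illustrated with a minimal plain-Ruby sketch. It is not Sidekiq's implementation: `CursorRun`, its in-memory `cursor`, and the `interrupt_after` parameter are hypothetical stand-ins for the persisted cursor and a mid-run deploy.

```ruby
# Hypothetical illustration of cursor-based resumption; not Sidekiq's code.
class CursorRun
  attr_reader :cursor, :processed

  def initialize(ids)
    @ids = ids.sort
    @cursor = nil    # id of the last record that finished successfully
    @processed = []
  end

  # Processes every id greater than the saved cursor.
  # `interrupt_after` simulates a deploy or restart after N records.
  def run(interrupt_after: nil)
    remaining = @ids.select { |id| @cursor.nil? || id > @cursor }
    remaining.each_with_index do |id, index|
      return :interrupted if interrupt_after && index >= interrupt_after

      @processed << id
      @cursor = id   # progress is recorded after each record
    end
    :complete
  end
end

run = CursorRun.new([1, 2, 3, 4, 5])
run.run(interrupt_after: 2) # => :interrupted (cursor is now 2)
run.run                     # => :complete (picks up at id 3)
run.processed               # => [1, 2, 3, 4, 5]
```

No record is processed twice in this sketch because the second call filters on the saved cursor rather than starting from the first id.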

Each iteration enqueues an EmbeddingWorker job rather than generating the
embedding inline, keeping this worker fast and letting the ai_embeddings
queue handle rate limiting and retries.

Supports all text-embedding types: Activity, Post, Article, Showcase, Video,
Item, ProductLine, SiteMap, ReviewsIo, CallRecord, AssistantBrainEntry.
(Images use a separate pipeline via ImageEmbeddingPopulationWorker.)

Scheduled: Nightly via config/sidekiq_production_schedule.yml

Examples:

Backfill Activity embeddings (default limit: 1000 records per run)

TextEmbeddingPopulationWorker.perform_async('embeddable_type' => 'Activity')

Backfill with a custom limit

TextEmbeddingPopulationWorker.perform_async('embeddable_type' => 'Activity', 'limit' => 5000)

Constant Summary

ALLOWED_TYPES = %w[
  Activity Post Article Showcase Video Item ProductLine
  SiteMap ReviewsIo CallRecord AssistantBrainEntry
].freeze
DEFAULT_LIMIT = 1000

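Instance Method Summary

  • #build_enumerator(options = nil, cursor:) ⇒ Object
  • #each_iteration(record, *_args) ⇒ Object
  • #on_complete ⇒ Object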

Instance Method Details

#build_enumerator(options = nil, cursor:) ⇒ Object



# File 'app/workers/text_embedding_population_worker.rb', line 38

def build_enumerator(options = nil, cursor:)
  opts = options.to_h.with_indifferent_access
  @embeddable_type = opts[:embeddable_type]
  @limit = (opts[:limit] || DEFAULT_LIMIT).to_i
  @queued_count = 0

  if @embeddable_type.blank?
    log_error 'embeddable_type is required'
    return nil
  end

  unless ALLOWED_TYPES.include?(@embeddable_type)
    log_error "Invalid embeddable_type: #{@embeddable_type}"
    return nil
  end

  scope = build_scope
  log_info "Starting population for #{@embeddable_type}: #{scope.count} candidates (limit: #{@limit})"

  active_record_records_enumerator(scope.limit(@limit), cursor: cursor)
end
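
The option handling in build_enumerator can be exercised in isolation with a small plain-Ruby sketch. `PopulationOptionsSketch` and its `parse` method are hypothetical names introduced for illustration; the sketch avoids ActiveSupport, so it normalises keys with `transform_keys` instead of `with_indifferent_access` and approximates `blank?` with `to_s.strip.empty?`.

```ruby
# Hypothetical, Rails-free approximation of the validation in build_enumerator.
module PopulationOptionsSketch
  ALLOWED_TYPES = %w[
    Activity Post Article Showcase Video Item ProductLine
    SiteMap ReviewsIo CallRecord AssistantBrainEntry
  ].freeze
  DEFAULT_LIMIT = 1000

  module_function

  # Returns [embeddable_type, limit] for valid options, or nil when the
  # type is missing or not in ALLOWED_TYPES (the worker logs and skips).
  def parse(options)
    opts = options.to_h.transform_keys(&:to_s)
    type = opts['embeddable_type']
    return nil if type.to_s.strip.empty?
    return nil unless ALLOWED_TYPES.include?(type)

    [type, (opts['limit'] || DEFAULT_LIMIT).to_i]
  end
end

PopulationOptionsSketch.parse('embeddable_type' => 'Activity')
# => ["Activity", 1000]
PopulationOptionsSketch.parse('embeddable_type' => 'Activity', 'limit' => '5000')
# => ["Activity", 5000]
PopulationOptionsSketch.parse('embeddable_type' => 'Image')
# => nil
```

Because `nil.to_h` is an empty hash, calling the method with no options falls through to the missing-type branch, which mirrors the `options = nil` default in the worker's signature.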

#each_iteration(record, *_args) ⇒ Object



# File 'app/workers/text_embedding_population_worker.rb', line 60

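def each_iteration(record, *_args)
  # Treat a failure to build content the same as missing content: skip the record.
  content = begin
    record.content_for_embedding
  rescue StandardError
    nil
  end

  if content.blank?
    log_debug "Skipping #{@embeddable_type}##{record.id}: no embeddable content"
    return
  end

  # Hand the embedding work to EmbeddingWorker instead of generating it inline.
  EmbeddingWorker.perform_async(@embeddable_type, record.id)
  @queued_count += 1

  log_info "Progress: #{@queued_count} queued" if (@queued_count % 200).zero?
end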
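
The skip-or-enqueue flow of each_iteration can be tried without Sidekiq or Rails using the sketch below. Everything in it is a hypothetical stand-in: `FakeRecord` replaces a model, an in-memory array replaces `EmbeddingWorker.perform_async`, and `progress_every` makes the hard-coded 200 configurable so the progress message is easy to trigger.

```ruby
# Hypothetical stand-ins; the real worker enqueues EmbeddingWorker jobs.
FakeRecord = Struct.new(:id, :content)

class PopulationSketch
  attr_reader :queued, :skipped, :progress_logs

  def initialize(progress_every: 200)
    @progress_every = progress_every
    @queued = []         # ids that would be sent to the embedding queue
    @skipped = []        # ids with no embeddable content
    @progress_logs = []  # messages that would go to log_info
  end

  def each_iteration(record)
    # Approximates ActiveSupport's blank? for nil and whitespace-only strings.
    if record.content.to_s.strip.empty?
      @skipped << record.id
      return
    end

    @queued << record.id
    return unless (@queued.size % @progress_every).zero?

    @progress_logs << "Progress: #{@queued.size} queued"
  end
end

records = [
  FakeRecord.new(1, 'hello'),
  FakeRecord.new(2, nil),
  FakeRecord.new(3, '   '),
  FakeRecord.new(4, 'world')
]
sketch = PopulationSketch.new(progress_every: 2)
records.each { |record| sketch.each_iteration(record) }
sketch.queued        # => [1, 4]
sketch.skipped       # => [2, 3]
sketch.progress_logs # => ["Progress: 2 queued"]
```

Note that the progress counter tracks queued records only, so skipped records never trigger a progress message.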

#on_complete ⇒ Object



# File 'app/workers/text_embedding_population_worker.rb', line 73

def on_complete
  log_info "Complete: queued #{@queued_count} #{@embeddable_type || 'mixed'} records for embedding"
end