Class: TextEmbeddingPopulationWorker

Inherits:
Object
Includes:
Sidekiq::IterableJob, Sidekiq::Job
Defined in:
app/workers/text_embedding_population_worker.rb

Overview

Nightly worker to backfill text-based content embeddings for any embeddable type.

Uses Sidekiq::IterableJob so progress is saved after each record — a mid-run
deploy or worker restart resumes from the last successful record rather than
restarting from scratch.
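
The resume behaviour described above can be illustrated without Sidekiq. The following is a minimal sketch in plain Ruby, not the library's implementation: CursorStore, records_after and run are hypothetical names, and the sketch assumes a checkpoint is written after every record and that a resumed run continues with ids strictly greater than the saved cursor.

```ruby
# Hypothetical in-memory stand-in for the state persisted between runs.
class CursorStore
  attr_accessor :cursor
end

# Returns the records whose id is greater than the saved cursor, in id order.
def records_after(records, cursor)
  records.select { |r| cursor.nil? || r[:id] > cursor }.sort_by { |r| r[:id] }
end

# Processes records, checkpointing the cursor after each one.
# `stop_after` simulates a deploy or worker restart part-way through a run.
def run(records, store, processed, stop_after: nil)
  handled = 0
  records_after(records, store.cursor).each do |record|
    processed << record[:id]
    store.cursor = record[:id] # checkpoint: last successfully handled id
    handled += 1
    break if stop_after && handled >= stop_after
  end
end

records = (1..5).map { |i| { id: i } }
store = CursorStore.new
processed = []

run(records, store, processed, stop_after: 2) # interrupted after ids 1 and 2
run(records, store, processed)                # second run resumes at id 3
```

After both runs every id has been handled exactly once, because the second run starts from the stored cursor rather than from the first record.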

Each iteration enqueues an EmbeddingWorker job rather than generating the
embedding inline, keeping this worker fast and letting the ai_embeddings
queue handle rate limiting and retries.
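
The split between this worker (producer) and EmbeddingWorker (consumer) can be sketched as follows. This is an illustration only: FakeQueue, populate and drain are hypothetical stand-ins for the ai_embeddings queue and the two workers, and the block passed to drain takes the place of the slow embedding call.

```ruby
# Hypothetical stand-in for the ai_embeddings queue: an in-memory job list.
class FakeQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def push(type, id)
    @jobs << [type, id]
  end
end

# Producer: only enqueues one job per id and returns how many were queued.
# No embedding is generated here, so this step stays fast.
def populate(queue, type, ids)
  ids.each { |id| queue.push(type, id) }
  ids.size
end

# Consumer: drains the queue at its own pace. The block represents the
# slow, rate-limited work that would happen in a separate job.
def drain(queue)
  done = []
  done << yield(*queue.jobs.shift) until queue.jobs.empty?
  done
end

queue = FakeQueue.new
queued = populate(queue, 'Activity', [10, 11, 12])
pending = queue.jobs.size
results = drain(queue) { |type, id| "#{type}##{id}" }
```

Because the producer never waits on the embedding call, a slow or failing consumer does not hold up the backfill loop itself.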

Supports all text-embedding types: Activity, Post, Article, Showcase, Video,
Item, ProductLine, SiteMap, ReviewsIo, CallRecord, AssistantBrainEntry.
(Images use a separate pipeline via ImageEmbeddingPopulationWorker.)

Scheduled: Nightly via config/sidekiq_production_schedule.yml

Examples:

Backfill activity embeddings (default: 1000/run)

TextEmbeddingPopulationWorker.perform_async('embeddable_type' => 'Activity')

Backfill with custom limit

TextEmbeddingPopulationWorker.perform_async('embeddable_type' => 'Activity', 'limit' => 5000)

Constant Summary

ALLOWED_TYPES =

Embeddable types accepted by this worker. build_enumerator logs an error and returns nil for any other embeddable_type.

%w[
  Activity Post Article Showcase Video Item ProductLine
  SiteMap ReviewsIo CallRecord AssistantBrainEntry
].freeze
DEFAULT_LIMIT =

Default maximum number of records enumerated per run, used when no limit option is given.

1000

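Instance Method Summary

#build_enumerator(options = nil, cursor:) ⇒ Object
#each_iteration(record, *_args) ⇒ Object
#on_complete ⇒ Object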

Instance Method Details

#build_enumerator(options = nil, cursor:) ⇒ Object



# File 'app/workers/text_embedding_population_worker.rb', line 40

def build_enumerator(options = nil, cursor:)
  # Accept string or symbol keys; nil options become an empty hash.
  opts = options.to_h.with_indifferent_access
  @embeddable_type = opts[:embeddable_type]
  @limit = (opts[:limit] || DEFAULT_LIMIT).to_i
  @queued_count = 0

  # Returning nil (no enumerator) means there is nothing to iterate.
  if @embeddable_type.blank?
    log_error 'embeddable_type is required'
    return nil
  end

  unless ALLOWED_TYPES.include?(@embeddable_type)
    log_error "Invalid embeddable_type: #{@embeddable_type}"
    return nil
  end

  # scope.count is the full candidate count; at most @limit records are enumerated.
  scope = build_scope
  log_info "Starting population for #{@embeddable_type}: #{scope.count} candidates (limit: #{@limit})"

  active_record_records_enumerator(scope.limit(@limit), cursor: cursor)
end
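
The option handling in build_enumerator can be tried outside Rails. The sketch below is an approximation under stated assumptions: normalize_options, ALLOWED and FALLBACK_LIMIT are hypothetical names, the type list is shortened for illustration, and transform_keys stands in for ActiveSupport's with_indifferent_access.

```ruby
ALLOWED = %w[Activity Post Article].freeze # shortened list, illustration only
FALLBACK_LIMIT = 1000

# Returns [type, limit] for valid options, or nil when the type is
# missing, blank, or not in the allowed list.
def normalize_options(options)
  opts = (options || {}).transform_keys(&:to_s) # tolerate symbol or string keys
  type = opts['embeddable_type']
  return nil if type.nil? || type.to_s.strip.empty?
  return nil unless ALLOWED.include?(type)

  [type, (opts['limit'] || FALLBACK_LIMIT).to_i]
end

default_limit = normalize_options({ 'embeddable_type' => 'Activity' })
custom_limit  = normalize_options({ embeddable_type: 'Post', 'limit' => '50' })
unknown_type  = normalize_options({ 'embeddable_type' => 'Unknown' })
no_options    = normalize_options(nil)
```

Here default_limit falls back to 1000, custom_limit converts the string '50' with to_i, and the last two calls return nil, mirroring the early returns in the method above.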

#each_iteration(record, *_args) ⇒ Object



# File 'app/workers/text_embedding_population_worker.rb', line 62

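def each_iteration(record, *_args)
  # Treat any error while building the content as "no content" and skip the record.
  content = begin
    record.content_for_embedding
  rescue StandardError
    nil
  end
  if content.blank?
    log_debug "Skipping #{@embeddable_type}##{record.id}: no embeddable content"
    return
  end

  # Enqueue only; the embedding itself is generated by EmbeddingWorker.
  EmbeddingWorker.perform_async(@embeddable_type, record.id)
  @queued_count += 1

  # Log progress every 200 queued records.
  log_info "Progress: #{@queued_count} queued" if (@queued_count % 200).zero?
end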
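
The skip and progress rules in each_iteration can be exercised with plain objects. This is a sketch, not the worker itself: Rec and plan are hypothetical, the interval is a parameter so the example stays small, and a blank check on strings stands in for ActiveSupport's blank?.

```ruby
# Hypothetical record type; content_for_embedding may return nil/empty or raise.
Rec = Struct.new(:id, :content) do
  def content_for_embedding
    raise 'broken record' if content == :boom

    content
  end
end

# Returns the ids that would be enqueued and the progress lines that would be logged.
def plan(records, every: 200)
  queued = []
  progress = []
  records.each do |record|
    content = begin
      record.content_for_embedding
    rescue StandardError
      nil # an error is treated the same as missing content
    end
    next if content.nil? || content.to_s.strip.empty?

    queued << record.id
    progress << "Progress: #{queued.size} queued" if (queued.size % every).zero?
  end
  [queued, progress]
end

records = [Rec.new(1, 'hello'), Rec.new(2, ''), Rec.new(3, :boom), Rec.new(4, 'world')]
queued, progress = plan(records, every: 2)
```

Records 2 and 3 are skipped (empty content and a raised error respectively), so only ids 1 and 4 are queued and a single progress line is produced at the second queued record.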

#on_complete ⇒ Object



# File 'app/workers/text_embedding_population_worker.rb', line 79

def on_complete
  log_info "Complete: queued #{@queued_count} #{@embeddable_type || 'mixed'} records for embedding"
end