Class: TextEmbeddingPopulationWorker
- Inherits: Object
  - Object
  - TextEmbeddingPopulationWorker
- Includes: Sidekiq::IterableJob, Sidekiq::Worker
- Defined in: app/workers/text_embedding_population_worker.rb
Overview
Nightly worker to backfill text-based content embeddings for any embeddable type.
Uses Sidekiq::IterableJob so progress is saved after each record: a mid-run
deploy or worker restart resumes from the last successful record rather than
restarting from scratch.
Each iteration enqueues an EmbeddingWorker job rather than generating the
embedding inline, keeping this worker fast and letting the ai_embeddings
queue handle rate limiting and retries.
Supports all text-embedding types: Activity, Post, Article, Showcase, Video,
Item, ProductLine, SiteMap, ReviewsIo, CallRecord, AssistantBrainEntry.
(Images use a separate pipeline via ImageEmbeddingPopulationWorker.)
Scheduled: Nightly via config/sidekiq_production_schedule.yml
Constant Summary
- ALLOWED_TYPES = %w[ Activity Post Article Showcase Video Item ProductLine SiteMap ReviewsIo CallRecord AssistantBrainEntry ].freeze
- DEFAULT_LIMIT = 1000
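The two constants drive option handling in #build_enumerator. A minimal stand-alone sketch of that validation, assuming plain Ruby without ActiveSupport: `normalize_options` is a hypothetical helper written for illustration and is not part of the worker.

```ruby
# Hypothetical stand-alone sketch; not the worker's actual code.
ALLOWED_TYPES = %w[
  Activity Post Article Showcase Video Item ProductLine
  SiteMap ReviewsIo CallRecord AssistantBrainEntry
].freeze
DEFAULT_LIMIT = 1000

# Returns [embeddable_type, limit], or nil when the options are unusable.
# String keys stand in for with_indifferent_access, which needs ActiveSupport.
def normalize_options(options)
  opts = (options || {}).to_h.transform_keys(&:to_s)
  type = opts['embeddable_type']

  # Mirrors the worker's two guards: the type must be present and allowed.
  return nil if type.nil? || type.to_s.strip.empty?
  return nil unless ALLOWED_TYPES.include?(type)

  [type, (opts['limit'] || DEFAULT_LIMIT).to_i]
end

normalize_options({ 'embeddable_type' => 'Post' })
normalize_options({ embeddable_type: 'Video', limit: '50' })
normalize_options({ 'embeddable_type' => 'Image' })
```

In this sketch a missing limit falls back to DEFAULT_LIMIT, a string limit is coerced with `to_i`, and an unknown type such as `Image` is rejected.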
Instance Method Summary
- #build_enumerator(options = nil, cursor:) ⇒ Object
- #each_iteration(record, *_args) ⇒ Object
- #on_complete ⇒ Object
Instance Method Details
#build_enumerator(options = nil, cursor:) ⇒ Object
# File 'app/workers/text_embedding_population_worker.rb', line 38

def build_enumerator(options = nil, cursor:)
  opts = options.to_h.with_indifferent_access
  @embeddable_type = opts[:embeddable_type]
  @limit = (opts[:limit] || DEFAULT_LIMIT).to_i
  @queued_count = 0

  if @embeddable_type.blank?
    log_error 'embeddable_type is required'
    return nil
  end

  unless ALLOWED_TYPES.include?(@embeddable_type)
    log_error "Invalid embeddable_type: #{@embeddable_type}"
    return nil
  end

  scope = build_scope
  log_info "Starting population for #{@embeddable_type}: #{scope.count} candidates (limit: #{@limit})"

  active_record_records_enumerator(scope.limit(@limit), cursor: cursor)
end
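The `cursor:` argument is what lets a restarted job resume instead of starting over. A minimal in-memory sketch of that idea, assuming the enumerator yields `[record, cursor]` pairs with the record id as the cursor: `Row`, `ROWS`, and `records_after` are hypothetical stand-ins, not Sidekiq or ActiveRecord APIs.

```ruby
# Hypothetical in-memory stand-in for a table of records ordered by id.
Row = Struct.new(:id)
ROWS = (1..10).map { |i| Row.new(i) }.freeze

# Yields [record, cursor] pairs for records whose id is greater than the
# given cursor, stopping after `limit` records. The cursor is the last id seen.
def records_after(cursor, limit:)
  Enumerator.new do |yielder|
    ROWS.select { |row| cursor.nil? || row.id > cursor }
        .first(limit)
        .each { |row| yielder << [row, row.id] }
  end
end

# The first run is interrupted after three records; only the cursor survives.
saved_cursor = nil
first_run = []
records_after(nil, limit: 10).each do |record, cursor|
  first_run << record.id
  saved_cursor = cursor
  break if first_run.size == 3
end

# The resumed run picks up after the saved cursor.
resumed = records_after(saved_cursor, limit: 10).map { |record, _cursor| record.id }
```

Here the first run handles ids 1 to 3 and the resumed run handles ids 4 to 10, so no record is processed twice.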
#each_iteration(record, *_args) ⇒ Object
# File 'app/workers/text_embedding_population_worker.rb', line 60

def each_iteration(record, *_args)
  content = record. rescue nil
  if content.blank?
    log_debug "Skipping #{@embeddable_type}##{record.id}: no embeddable content"
    return
  end

  EmbeddingWorker.perform_async(@embeddable_type, record.id)
  @queued_count += 1
  log_info "Progress: #{@queued_count} queued" if (@queued_count % 200).zero?
end
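The skip, enqueue, and count flow of #each_iteration can be exercised without Sidekiq. The sketch below is illustrative only: `FakeQueue` stands in for EmbeddingWorker.perform_async, `Doc#text` stands in for whatever content accessor the real models expose, and `progress_every` is a hypothetical parameter replacing the hard-coded 200.

```ruby
# Hypothetical record type; `text` is an assumed content accessor.
Doc = Struct.new(:id, :text)

# Hypothetical stand-in for the ai_embeddings queue.
class FakeQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def push(type, id)
    @jobs << [type, id]
  end
end

class Dispatcher
  attr_reader :queued_count, :progress_lines

  def initialize(type, queue, progress_every: 200)
    @type = type
    @queue = queue
    @progress_every = progress_every
    @queued_count = 0
    @progress_lines = []
  end

  # Enqueues one job per record with content; blank records are skipped.
  def each_iteration(record)
    content = record.text
    return if content.nil? || content.strip.empty?

    @queue.push(@type, record.id)
    @queued_count += 1
    return unless (@queued_count % @progress_every).zero?

    @progress_lines << "Progress: #{@queued_count} queued"
  end
end

queue = FakeQueue.new
dispatcher = Dispatcher.new('Post', queue, progress_every: 2)
[Doc.new(1, 'hello'), Doc.new(2, '  '), Doc.new(3, nil), Doc.new(4, 'world')]
  .each { |doc| dispatcher.each_iteration(doc) }
```

Only records 1 and 4 are enqueued in this run. Because the dispatcher does nothing but push a job, the embedding work itself, along with its rate limiting and retries, stays on the downstream queue.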
#on_complete ⇒ Object
# File 'app/workers/text_embedding_population_worker.rb', line 73

def on_complete
  log_info "Complete: queued #{@queued_count} #{@embeddable_type || 'mixed'} records for embedding"
end