Class: TextEmbeddingPopulationWorker
- Inherits:
- Object
- Object
- TextEmbeddingPopulationWorker
- Includes:
- Sidekiq::IterableJob, Sidekiq::Job
- Defined in:
- app/workers/text_embedding_population_worker.rb
Overview
Nightly worker to backfill text-based content embeddings for any embeddable type.
Uses Sidekiq::IterableJob so progress is saved after each record — a mid-run
deploy or worker restart resumes from the last successful record rather than
restarting from scratch.
Each iteration enqueues an EmbeddingWorker job rather than generating the
embedding inline, keeping this worker fast and letting the ai_embeddings
queue handle rate limiting and retries.
Supports all text-embedding types: Activity, Post, Article, Showcase, Video,
Item, ProductLine, SiteMap, ReviewsIo, CallRecord, AssistantBrainEntry.
(Images use a separate pipeline via ImageEmbeddingPopulationWorker.)
Scheduled: Nightly via config/sidekiq_production_schedule.yml
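The resume behaviour described in the overview can be illustrated with a plain-Ruby simulation. This is only a sketch of the cursor idea, not Sidekiq's implementation: `ResumableRun`, its in-memory cursor, and the `interrupt_after` parameter are hypothetical names introduced for illustration.

```ruby
# Simulates checkpointing after every record so an interrupted run can
# resume where it stopped. No Sidekiq is involved; all names are hypothetical.
class ResumableRun
  attr_reader :processed

  def initialize(ids)
    @ids = ids.sort
    @cursor = nil     # id of the last record that finished successfully
    @processed = []
  end

  # Processes every record after the saved cursor. Passing interrupt_after
  # stops the run early to imitate a deploy or worker restart.
  def run(interrupt_after: nil)
    remaining = @cursor ? @ids.select { |id| id > @cursor } : @ids
    remaining.each_with_index do |id, index|
      return :interrupted if interrupt_after && index >= interrupt_after

      @processed << id
      @cursor = id    # checkpoint after each record
    end
    :complete
  end
end

job = ResumableRun.new([1, 2, 3, 4, 5])
first = job.run(interrupt_after: 2)  # handles ids 1 and 2, then stops
second = job.run                     # picks up at id 3
```

Because the cursor advances only after a record is handled, the second call neither repeats ids 1 and 2 nor skips id 3.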
Constant Summary
- ALLOWED_TYPES =
Embeddable types this worker accepts; any other embeddable_type is rejected.
%w[ Activity Post Article Showcase Video Item ProductLine SiteMap ReviewsIo CallRecord AssistantBrainEntry ].freeze
- DEFAULT_LIMIT =
Maximum number of records enumerated per run when no limit option is given.
1000
Instance Method Summary
- #build_enumerator(options = nil, cursor:) ⇒ Object
- #each_iteration(record, *_args) ⇒ Object
- #on_complete ⇒ Object
Instance Method Details
#build_enumerator(options = nil, cursor:) ⇒ Object
# File 'app/workers/text_embedding_population_worker.rb', line 40
def build_enumerator(options = nil, cursor:)
  opts = options.to_h.with_indifferent_access
  @embeddable_type = opts[:embeddable_type]
  @limit = (opts[:limit] || DEFAULT_LIMIT).to_i
  @queued_count = 0

  if @embeddable_type.blank?
    log_error 'embeddable_type is required'
    return nil
  end

  unless ALLOWED_TYPES.include?(@embeddable_type)
    log_error "Invalid embeddable_type: #{@embeddable_type}"
    return nil
  end

  scope = build_scope
  log_info "Starting population for #{@embeddable_type}: #{scope.count} candidates (limit: #{@limit})"
  active_record_records_enumerator(scope.limit(@limit), cursor: cursor)
end
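The option handling in build_enumerator can be exercised outside Rails with a small sketch. It assumes plain Ruby only, so `transform_keys` and an explicit emptiness check stand in for `with_indifferent_access` and `blank?`. `resolve_options` is a hypothetical helper, not part of the worker, and it returns the error message as a string where the worker logs it and returns nil.

```ruby
ALLOWED_TYPES = %w[
  Activity Post Article Showcase Video Item ProductLine
  SiteMap ReviewsIo CallRecord AssistantBrainEntry
].freeze
DEFAULT_LIMIT = 1000

# Hypothetical helper mirroring the checks in build_enumerator.
# Returns [type, limit] on success, or an error message string.
def resolve_options(options)
  opts = options.to_h.transform_keys(&:to_s)  # nil.to_h is {}, so nil is safe
  type = opts['embeddable_type']
  limit = (opts['limit'] || DEFAULT_LIMIT).to_i

  return 'embeddable_type is required' if type.nil? || type.to_s.strip.empty?
  return "Invalid embeddable_type: #{type}" unless ALLOWED_TYPES.include?(type)

  [type, limit]
end

missing  = resolve_options(nil)
invalid  = resolve_options('embeddable_type' => 'Comment')
explicit = resolve_options(embeddable_type: 'Article', limit: '250')
default  = resolve_options('embeddable_type' => 'Post')
```

String and symbol keys are both accepted here, which approximates the indifferent access the worker relies on when options come back from Sidekiq's JSON-serialized arguments.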
#each_iteration(record, *_args) ⇒ Object
# File 'app/workers/text_embedding_population_worker.rb', line 62
def each_iteration(record, *_args)
  content = begin
    record.
  rescue StandardError
    nil
  end

  if content.blank?
    log_debug "Skipping #{@embeddable_type}##{record.id}: no embeddable content"
    return
  end

  EmbeddingWorker.perform_async(@embeddable_type, record.id)
  @queued_count += 1
  log_info "Progress: #{@queued_count} queued" if (@queued_count % 200).zero?
end
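The skip-or-enqueue flow of each_iteration can be sketched in plain Ruby with stand-ins. Everything here is hypothetical: `FakeRecord`, the `queue` array, and the `text` accessor, which stands in for the content method whose name is missing from the listing above. The array push replaces the call to EmbeddingWorker.perform_async.

```ruby
# Stand-in record type; `text` is an assumed accessor used only in this sketch.
FakeRecord = Struct.new(:id, :text)

# Walks the records, skips those without usable content, and pushes the rest
# onto `queue`. Returns the queued count and the counts at which a progress
# line would have been logged.
def enqueue_embeddable(records, type, queue, progress_every: 200)
  queued = 0
  progress = []

  records.each do |record|
    content = begin
      record.text
    rescue StandardError
      nil   # a failing accessor is treated the same as missing content
    end
    next if content.nil? || content.strip.empty?  # plain-Ruby substitute for blank?

    queue << [type, record.id]  # the worker enqueues an EmbeddingWorker job here
    queued += 1
    progress << queued if (queued % progress_every).zero?
  end

  [queued, progress]
end

records = [
  FakeRecord.new(1, 'hello'),
  FakeRecord.new(2, ''),
  FakeRecord.new(3, nil),
  FakeRecord.new(4, 'world')
]
queue = []
queued, progress = enqueue_embeddable(records, 'Article', queue, progress_every: 2)
```

Records 2 and 3 are skipped because they have no content, so only ids 1 and 4 reach the queue.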
#on_complete ⇒ Object
# File 'app/workers/text_embedding_population_worker.rb', line 79
def on_complete
  log_info "Complete: queued #{@queued_count} #{@embeddable_type || 'mixed'} records for embedding"
end