Class: EmbeddingRefreshWorker
- Inherits:
- Object (hierarchy: Object → EmbeddingRefreshWorker)
- Includes:
- Sidekiq::Worker
- Defined in:
- app/workers/embedding_refresh_worker.rb
Overview
Scheduled worker to detect and regenerate stale embeddings.
Runs weekly to catch content changes that weren't detected by model callbacks,
such as HABTM association changes (product_lines, categories, items).
For Images, checks for missing unified embeddings and Vision analysis.
Uses the Gemini pipeline: pHash → Gemini Embedding 2 → Gemini Flash Vision
Scheduled: Sundays at 3:30am (after sitemap generation at 1:10am)
Queue: low priority to avoid impacting user-facing operations
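The schedule and queue described above could be expressed roughly as follows. This is a minimal sketch: the documentation does not say which scheduler is used, so the hash layout, the entry name `embedding_refresh`, and the queue name `low` are all assumptions. Only the worker class name and the timing come from the overview.

```ruby
# Hypothetical schedule entry; key names and the 'low' queue name are assumptions.
# Cron fields: minute hour day-of-month month day-of-week (0 = Sunday).
EMBEDDING_REFRESH_SCHEDULE = {
  'embedding_refresh' => {
    'cron'  => '30 3 * * 0', # Sundays at 3:30am
    'class' => 'EmbeddingRefreshWorker',
    'queue' => 'low'
  }
}.freeze

# Break a five-field cron expression into named parts to make the timing explicit.
def cron_fields(expression)
  minute, hour, day_of_month, month, day_of_week = expression.split
  { minute: minute, hour: hour, day_of_month: day_of_month,
    month: month, day_of_week: day_of_week }
end
```

Calling `cron_fields` on the expression above separates the minute (`30`), hour (`3`), and day of week (`0`, Sunday), which makes it easy to confirm the run lands after the 1:10am sitemap job.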
Constant Summary
- EMBEDDABLE_TYPES =
Content types to check for stale embeddings
Order matters: check smaller collections first to spread load.
Note: CallRecord transcription is handled by DailyCallRecordTranscriptionWorker, but we still check here for:
  - LeMUR re-runs that update content without regenerating embeddings
  - Failed embedding jobs (transcribed but no embedding)
  - Manual edits to call record metadata
%w[ Post Article Showcase Video Image SiteMap ReviewsIo Item CallRecord ].freeze
- MAX_RECORDS_PER_TYPE =
Maximum records to queue per run to avoid overwhelming the queue
500
- MAX_IMAGES_FULL_ANALYSIS =
Maximum images to queue for full analysis per run
100
- TYPE_CHECK_DELAY =
Delay between checking each type (seconds)
2
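To illustrate how MAX_RECORDS_PER_TYPE might bound a single run, here is a minimal sketch that uses plain Ruby structs in place of database records. The `Record` struct, its `content_updated_at` and `embedded_at` fields, and the `stale?` and `stale_ids` helpers are hypothetical; the worker's actual staleness check is not shown in this documentation.

```ruby
MAX_RECORDS_PER_TYPE = 500

# Hypothetical stand-in for a database row; field names are assumptions.
Record = Struct.new(:id, :content_updated_at, :embedded_at)

# A record is treated as stale when it has no embedding yet, or when its
# content changed after the embedding was generated.
def stale?(record)
  record.embedded_at.nil? || record.content_updated_at > record.embedded_at
end

# Return at most `limit` stale record ids, preserving input order.
# The lazy enumerator stops scanning once the limit is reached.
def stale_ids(records, limit: MAX_RECORDS_PER_TYPE)
  records.lazy.select { |r| stale?(r) }.first(limit).map(&:id)
end

now = Time.now
records = [
  Record.new(1, now - 60, now),  # embedded after last change: fresh
  Record.new(2, now, now - 60),  # changed after embedding: stale
  Record.new(3, now, nil)        # never embedded: stale
]

stale_ids(records)           # => [2, 3]
stale_ids(records, limit: 1) # => [2]
```

Capping the result rather than the scan keeps the sketch simple; a real implementation would more likely push the limit into the database query.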
Instance Method Summary
Instance Method Details
#perform ⇒ Object
# File 'app/workers/embedding_refresh_worker.rb', line 50

def perform
  log_info 'Starting weekly embedding refresh check...'

  total_queued = 0
  stats = {}

  EMBEDDABLE_TYPES.each do |type|
    # Small delay between types to spread database load
    sleep(TYPE_CHECK_DELAY) unless total_queued.zero?

    count = check_and_queue_stale(type)
    stats[type] = count
    total_queued += count

    log_info "#{type}: #{count} stale embeddings queued" if count.positive?
  end

  # Queue images missing Vision analysis or unified embedding for full pipeline
  full_analysis_count = queue_images_for_full_analysis
  stats['Image_FullAnalysis'] = full_analysis_count
  total_queued += full_analysis_count

  # Also extract fresh content for static pages before embedding
  extract_static_pages_if_needed

  log_info "Embedding refresh complete. Total queued: #{total_queued}"
  log_info "Stats: #{stats.select { |_, v| v.positive? }}" if total_queued.positive?

  stats
rescue StandardError => e
  log_error "Embedding refresh failed: #{e.message}"
  ErrorReporting.error(e)
  raise
end
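One detail of `perform` is easy to miss: the delay is skipped while `total_queued` is zero, so types are checked back to back until the first stale record has been queued. The following sketch reproduces only that loop logic. The `refresh_stats` method, the `counts` hash, and the injectable `sleeper` are hypothetical stand-ins for the worker's private helpers and for `sleep`.

```ruby
TYPE_CHECK_DELAY = 2

# Simulates the per-type loop in #perform.
# counts:  hypothetical map of type name => number of stale records found
# sleeper: callable used instead of Kernel#sleep so the sketch runs instantly
def refresh_stats(types, counts, sleeper: ->(_seconds) {})
  total_queued = 0
  stats = {}

  types.each do |type|
    # Mirrors the original guard: no delay until something has been queued.
    sleeper.call(TYPE_CHECK_DELAY) unless total_queued.zero?

    count = counts.fetch(type, 0)
    stats[type] = count
    total_queued += count
  end

  [stats, total_queued]
end

delays = []
stats, total = refresh_stats(
  %w[Post Article Video],
  { 'Article' => 3 },
  sleeper: ->(seconds) { delays << seconds }
)
# stats  => {"Post"=>0, "Article"=>3, "Video"=>0}
# total  => 3
# delays => [2]  (only the check after Article was delayed)
```

If every type should be spaced out regardless of results, the guard would need to track the loop position rather than the running total; the sketch keeps the behavior shown in the source.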