Class: ImageFullAnalysisWorker
- Inherits:
- Object
- Includes:
- Sidekiq::Job, Workers::StatusBroadcastable
- Defined in:
- app/workers/image_full_analysis_worker.rb
Overview
Background worker for running the complete AI analysis pipeline on an image.
Pipeline (Gemini)
- pHash Fingerprint - Perceptual hash for duplicate detection (local)
- Gemini Embedding 2 - Native multimodal embedding (image + metadata text)
- Gemini Flash Vision - Describes image content (independent, for CRM/metadata)
The embedding step no longer depends on vision analysis — Gemini Embedding 2
natively understands images without a text intermediary.
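The independence of the three steps can be sketched as follows. This is a minimal illustration, not the worker's code: `run_pipeline` and the step lambdas are hypothetical stand-ins, and for brevity the sketch isolates every failure, whereas the worker lets transient embedding errors propagate so the job is retried.

```ruby
# Hypothetical helper: runs each step against the image itself, so no step
# consumes another step's output.
def run_pipeline(image, steps)
  steps.each_with_object({}) do |(name, step), results|
    results[name] = step.call(image)
  rescue StandardError => e
    # A failing step is recorded and the remaining steps still run.
    results[name] = "failed: #{e.message}"
  end
end

# Hypothetical steps; the real ones call pHash, Gemini Embedding 2 and
# Gemini Flash Vision.
STEPS = {
  phash:     ->(image) { "phash(#{image[:url]})" },
  vision:    ->(_image) { raise "vision model unavailable" },
  embedding: ->(image) { "embedding(#{image[:url]} + #{image[:title]})" }
}.freeze

results = run_pipeline({ url: "a.jpg", title: "Sunset" }, STEPS)
puts results.inspect
```

Even with the vision step failing before it, the embedding step still produces a result, because its only input is the image and its metadata text.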
Constant Summary
- JOB_TRACKING_KEY = "image_full_analysis_jid"
  Redis key prefix used for job tracking; the image id is appended to it.
- JOB_TRACKING_TTL = 2.hours.to_i
  Time-to-live of a job tracking key, in seconds.
- MODEL = AiModelConstants.id(:unified_embedding)
  Model (GA), sourced from the canonical registry.
- DIMENSIONS = 1536
  Dimensions.
Instance Attribute Summary
Attributes included from Workers::StatusBroadcastable
Class Method Summary
Instance Method Summary
Methods included from Workers::StatusBroadcastable::Overrides
Class Method Details
.find_running_jid(image_id) ⇒ Object
# File 'app/workers/image_full_analysis_worker.rb', line 37
def self.find_running_jid(image_id)
  jid = Sidekiq.redis { |r| r.get("#{JOB_TRACKING_KEY}:#{image_id}") }
  return nil unless jid

  Sidekiq::Status.status(jid).in?(%i[queued working]) ? jid : nil
end
.lock_args(args) ⇒ Object
# File 'app/workers/image_full_analysis_worker.rb', line 24
def self.lock_args(args)
  [args[0]]
end
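A short sketch of what this hook returns. It assumes the method feeds a job-uniqueness lock (the convention used by gems such as sidekiq-unique-jobs; the source does not say which one is in use). The method body is copied from the worker; the sample arguments are made up.

```ruby
# Same body as the worker's lock_args, lifted into a standalone method.
def lock_args(args)
  [args[0]]
end

first  = lock_args([42, { "force" => true }])
second = lock_args([42, {}])
other  = lock_args([43, {}])

puts first == second # true: the options hash is ignored
puts first == other  # false: a different image id
```

Under that assumption, two jobs for the same image would share one lock regardless of their options, while jobs for different images would not block each other.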
.track_jid(image_id, jid) ⇒ Object
# File 'app/workers/image_full_analysis_worker.rb', line 33
def self.track_jid(image_id, jid)
  Sidekiq.redis { |r| r.set("#{JOB_TRACKING_KEY}:#{image_id}", jid, ex: JOB_TRACKING_TTL) }
end
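How `track_jid` and `find_running_jid` work together can be sketched without Redis. In this illustration `STORE` and `STATUSES` are hypothetical in-memory stand-ins for Redis and `Sidekiq::Status`, and key expiry (`JOB_TRACKING_TTL`) is left out.

```ruby
JOB_TRACKING_KEY = "image_full_analysis_jid"

# Hypothetical in-memory stand-ins (not part of the worker).
STORE    = {} # tracking key => jid
STATUSES = {} # jid => :queued, :working, :complete, ...

def track_jid(image_id, jid)
  STORE["#{JOB_TRACKING_KEY}:#{image_id}"] = jid
end

def find_running_jid(image_id)
  jid = STORE["#{JOB_TRACKING_KEY}:#{image_id}"]
  return nil unless jid

  # Only queued or working jobs count as running.
  %i[queued working].include?(STATUSES[jid]) ? jid : nil
end

track_jid(7, "abc123")
STATUSES["abc123"] = :working
running = find_running_jid(7)   # "abc123"

STATUSES["abc123"] = :complete
finished = find_running_jid(7)  # nil: tracked, but no longer running

untracked = find_running_jid(8) # nil: never tracked
```

A caller could use the returned jid to attach to an in-flight job instead of enqueueing a second one for the same image.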
Instance Method Details
#perform(image_id, options = {}) ⇒ Object
# File 'app/workers/image_full_analysis_worker.rb', line 53
def perform(image_id, options = {})
  @force = options[:force].to_b
  total 4
  at 0, 'Starting AI analysis...'

  image = Image.find_by(id: image_id)
  unless image
    store error_message: "Image #{image_id} not found"
    return log_info("Image #{image_id} not found")
  end

  redirect_to_path = options[:redirect_to]
  redirect_to_path ||= "/en-US/images/#{image.slug}/tab_embeddings?target_id=ai_embeddings"
  store redirect_to: redirect_to_path

  if image.inactive?
    store error_message: "Image #{image_id} is inactive"
    return log_info("Image #{image_id} inactive")
  end

  if image.asset&.dig('file_type') == 'non-image'
    store error_message: 'Non-image file type'
    return log_info("Image #{image_id} is non-image")
  end

  analysis = analyze_needs(image)
  log_info "Image #{image_id}: #{analysis}"

  unless analysis[:needs_anything]
    at 4, 'Already complete'
    store info_message: 'All up to date'
    return log_info("Image #{image_id}: Up to date")
  end

  # Step 1: pHash
  at 1, 'Step 1/3: pHash...'
  if analysis[:needs_phash]
    begin
      ImageFingerprintWorker.new.perform(image_id, { force: @force })
      log_info "Image #{image_id}: pHash complete"
    rescue StandardError => e
      log_error "Image #{image_id}: pHash failed: #{e.message}"
    end
  end

  # Step 2: Multimodal embedding (Gemini Embedding 2: image + metadata text)
  at 2, 'Step 2/3: Multimodal embedding...'
  if analysis[:needs_embedding]
    begin
      (image)
      log_info "Image #{image_id}: Multimodal embedding complete"
    rescue Embedding::Gemini::PermanentError => e
      # Broken/invalid source image (404 URL, corrupt file). Retrying can't fix
      # it, so QUARANTINE the asset (invalidate! excludes it from fetches +
      # embedding and starts the 30-day purge clock) and continue; do NOT
      # re-raise. Transient errors (rate limit, timeout, connection, 5xx) are
      # deliberately NOT rescued here: they propagate to the outer rescue, which
      # re-raises so Sidekiq RETRIES the embed instead of silently dropping it.
      log_error "Image #{image_id}: Embedding permanently failed (quarantining): #{e.message}"
      image.invalidate!(reason: "embedding: #{e.message.truncate(120)}")
    end
  end

  # Step 3: Vision description (Gemini Flash: independent, for CRM/metadata)
  at 3, 'Step 3/3: Vision analysis...'
  if analysis[:needs_vision]
    begin
      result = ImageAnalysis::VisionAnalyzer.call(image, force: @force)
      if result.success?
        log_info "Image #{image_id}: Vision analysis complete"
      else
        log_error "Image #{image_id}: Vision failed: #{result.error}"
      end
    rescue StandardError => e
      log_error "Image #{image_id}: Vision failed: #{e.message}"
    end
  end

  at 4, 'Complete!'
  store info_message: 'AI analysis complete'
  log_info "Image #{image_id}: Complete"
rescue Embedding::Gemini::RateLimitError, Faraday::TimeoutError, Faraday::ConnectionFailed => e
  # Transient (rate limit / network). Let Sidekiq retry with backoff; don't
  # spam error reporting for expected throttling.
  log_error "Image #{image_id}: transient embedding error, will retry: #{e.message}"
  raise
rescue StandardError => e
  store error_message: "Error: #{e.message}"
  log_error "Error for Image #{image_id}: #{e.message}"
  ErrorReporting.error(e)
  raise
end
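The error policy of the embedding step (quarantine on permanent failure, re-raise on transient failure) can be sketched in isolation. This is a simplified illustration under stated assumptions: `PermanentError`, `TransientError` and `embed_with_policy` are hypothetical names, and the `quarantined` array stands in for the worker's `image.invalidate!` call.

```ruby
# Hypothetical error classes standing in for Embedding::Gemini::PermanentError
# and for the transient errors (rate limit, timeout, connection failure).
class PermanentError < StandardError; end
class TransientError < StandardError; end

# Returns :ok or :quarantined. Transient errors are not rescued here, so they
# propagate to the caller.
def embed_with_policy(quarantined, image_id)
  yield
  :ok
rescue PermanentError => e
  # Retrying cannot help, so record the failure and carry on.
  quarantined << [image_id, e.message]
  :quarantined
end

quarantined = []
ok_result  = embed_with_policy(quarantined, 1) { :vector }
bad_result = embed_with_policy(quarantined, 2) { raise PermanentError, "404 source URL" }

retried = begin
  embed_with_policy(quarantined, 3) { raise TransientError, "rate limited" }
  false
rescue TransientError
  true # in the worker, this is the point where Sidekiq would schedule a retry
end
```

Keeping the two error families apart means a broken source image never triggers endless retries, while a throttled request is never silently dropped.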