
Image Embeddings (Gemini Embedding 2)

Note: The CLIP/Jina pipeline was removed in early 2026. This document now describes the replacement: Gemini Embedding 2 multimodal embeddings.

Images are embedded using Gemini Embedding 2 (gemini-embedding-2-preview), a native multimodal model that encodes images and text into a shared vector space; this pipeline stores 1536-dimensional vectors. Because text and images live in the same space, cross-modal search works: describe what you want in words and find matching images.

What this enables:
  • Text → image search: find images using natural-language descriptions.
  • Duplicate detection: identify duplicate or near-duplicate images using perceptual hash (pHash) fingerprints stored in digital_assets.fingerprint.
  • Visual similarity: find images visually similar to a product by querying the unified embedding space.

ImageFullAnalysisWorker
├─ Step 1: pHash fingerprint (local, no API)
│    → stored in digital_assets.fingerprint
├─ Step 2: Gemini Embedding 2 (image + metadata text)
│    → stored in content_embeddings (content_type='unified',
│      embedding_model='gemini-embedding-2-preview',
│      unified_embedding vector(1536))
└─ Step 3: Gemini Flash vision analysis (optional, for CRM metadata)
     → stored in digital_assets.ai_visual_description
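Step 1 needs no API call. To illustrate what a perceptual hash computes, here is a minimal pure-Ruby sketch of one common DCT-based pHash variant. It assumes the image has already been decoded, converted to grayscale, and resized to 32×32 (decoding and resizing are out of scope). The method names are hypothetical, and this document does not say which library or pHash variant the worker actually uses.

```ruby
# Hypothetical sketch of a DCT-based perceptual hash (pHash).
# Input: 32x32 grayscale pixels (Array of 32 rows, each an Array of 32 numbers).
# Output: unsigned 64-bit Integer, one bit per low-frequency DCT coefficient.
PHASH_SIZE = 32
PHASH_LOW = 8

# Unnormalised 1-D DCT-II; constant scaling does not change the median comparison.
def dct_1d(values)
  n = values.size
  Array.new(n) do |k|
    values.each_with_index.sum { |v, i| v * Math.cos(Math::PI * k * (i + 0.5) / n) }
  end
end

# Separable 2-D DCT: transform every row, then every column.
def dct_2d(matrix)
  rows = matrix.map { |row| dct_1d(row) }
  rows.transpose.map { |col| dct_1d(col) }.transpose
end

def phash_from_grayscale(pixels)
  unless pixels.size == PHASH_SIZE && pixels.all? { |row| row.size == PHASH_SIZE }
    raise ArgumentError, "expected #{PHASH_SIZE}x#{PHASH_SIZE} grayscale pixels"
  end

  # Keep the top-left 8x8 block of low-frequency coefficients (row-major, 64 values).
  coeffs = dct_2d(pixels).first(PHASH_LOW).flat_map { |row| row.first(PHASH_LOW) }
  # Median of the 64 coefficients (mean of the two middle values).
  sorted = coeffs.sort
  mid = sorted.size / 2
  median = (sorted[mid - 1] + sorted[mid]) / 2.0
  # Bit i is set when coefficient i lies above the median.
  coeffs.each_with_index.reduce(0) do |bits, (c, i)|
    c > median ? bits | (1 << i) : bits
  end
end
```

Visually similar images yield hashes that differ in only a few bits, which is why duplicates are matched by Hamming distance rather than by equality.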

ImageEmbeddingPopulationWorker (Sidekiq::IterableJob) runs nightly at 2:30 AM CT. Each run queues up to 5,000 images, prioritising product primary images. It uses cursor-based checkpointing, so if a deploy interrupts a run, the job resumes from the last processed record instead of starting over.
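The checkpointing idea can be illustrated without Sidekiq. The following is a minimal, hypothetical pure-Ruby sketch: candidates are ordered by a (priority, id) key with product primary images first, each run processes at most a fixed cap, and the key of the last processed item becomes the cursor that the next run resumes after. The real worker relies on Sidekiq::IterableJob's own cursor handling; `Candidate` and `run_batch` are illustrative names only.

```ruby
# Hypothetical sketch: priority-ordered, capped, resumable processing with a keyset cursor.
Candidate = Struct.new(:id, :product_primary, keyword_init: true)

# Sort key: product primary images first (0 before 1), then by id for a stable order.
def sort_key(candidate)
  [candidate.product_primary ? 0 : 1, candidate.id]
end

# Processes up to `cap` candidates strictly after `cursor`.
# Returns [processed_ids, new_cursor]; pass new_cursor to the next run.
def run_batch(candidates, cursor: nil, cap: 5_000)
  ordered = candidates.sort_by { |c| sort_key(c) }
  remaining = cursor ? ordered.select { |c| (sort_key(c) <=> cursor) == 1 } : ordered
  processed = []
  new_cursor = cursor
  remaining.first(cap).each do |c|
    processed << c.id        # stand-in for enqueuing the per-image analysis job
    new_cursor = sort_key(c) # advance the cursor per item (a real job would persist it)
  end
  [processed, new_cursor]
end
```

Ordering by a composite key rather than by id alone is what lets prioritisation and resumption coexist: the cursor is always a position in the same total order.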

The content_embeddings table (partitioned; image partition: content_embeddings_images):

| Column | Type | Description |
|---|---|---|
| content_type | string | 'unified' for Gemini image embeddings |
| embedding_model | string | 'gemini-embedding-2-preview' |
| unified_embedding | vector | 1536-dimensional vector |
| embedding_dimensions | integer | 1536 |

Fingerprints live on the parent record:

| Column | Type | Description |
|---|---|---|
| digital_assets.fingerprint | bigint | pHash perceptual hash |
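PostgreSQL's bigint is a signed 64-bit type, so a 64-bit pHash with its top bit set does not fit as an unsigned value; one common approach is to store its two's-complement signed equivalent and mask it back when comparing. The sketch below assumes the fingerprint starts as an unsigned 64-bit integer (this document does not say how the app encodes it); `to_signed64`, `from_signed64`, `hamming_distance`, and `near_duplicate?` are hypothetical helpers, and the default threshold mirrors the `find_phash_duplicates[10]` example.

```ruby
# Hypothetical helpers for storing a 64-bit pHash in a signed bigint column.
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63

# Unsigned 64-bit value -> signed value within PostgreSQL bigint range.
def to_signed64(u)
  (u & SIGN_BIT).zero? ? u : u - (1 << 64)
end

# Signed bigint read back from the database -> unsigned 64-bit value.
def from_signed64(s)
  s & MASK64
end

# Number of differing bits; works on either the signed or the unsigned form.
def hamming_distance(a, b)
  ((a ^ b) & MASK64).to_s(2).count("1")
end

def near_duplicate?(a, b, threshold: 10)
  hamming_distance(a, b) <= threshold
end
```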

The Gemini API key is stored in Rails credentials at google.gemini.api_key.

Rate limiting is handled by Embedding::Gemini via a Redis sliding-window limiter (default: 300 requests/minute, configurable via GEMINI_EMBED_REQUESTS_PER_MINUTE).
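The limiting rule itself can be sketched in memory. This is a sliding-window-log sketch, not Embedding::Gemini's code: it keeps request timestamps from the last 60 seconds and admits a request only while fewer than the limit are recorded. Unlike the Redis-backed limiter described above, an in-memory version is per-process and is not shared across workers. The `SlidingWindowLimiter` class and its injectable clock are assumptions for illustration; the environment variable name and 300/minute default come from this page.

```ruby
# Hypothetical in-memory sliding-window-log limiter (single process only).
class SlidingWindowLimiter
  def initialize(limit: Integer(ENV.fetch("GEMINI_EMBED_REQUESTS_PER_MINUTE", "300")),
                 window: 60.0,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @limit = limit
    @window = window
    @clock = clock
    @timestamps = []
  end

  # Returns true and records the request if it fits in the current window.
  def allow?
    now = @clock.call
    # Drop timestamps that have slid out of the window.
    @timestamps.shift while @timestamps.any? && @timestamps.first <= now - @window
    return false if @timestamps.size >= @limit

    @timestamps << now
    true
  end
end
```

A sliding window avoids the burst a fixed per-minute counter allows at a minute boundary, at the cost of storing one timestamp per admitted request.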

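# Enqueue the full analysis pipeline for a single image
ImageFullAnalysisWorker.perform_async(image.id)

# Find images matching a text description
ContentEmbedding::ImageEmbedding.semantic_search("bathroom with heated floors", limit: 10)

# Via the top-level service
ContentEmbedding.unified_visual_search("snow melting driveway", limit: 10)

# Find images visually similar to an existing image
image = Image.find(123)
# Query vector: this image's stored unified embedding
image_vector = image.image_embeddings.unified_content.first.unified_embedding
# Search all image embeddings, not just this image's own rows
# (assumes unified_content is a scope on ContentEmbedding::ImageEmbedding)
ContentEmbedding::ImageEmbedding.unified_content
  .nearest_neighbors(:unified_embedding, image_vector, distance: :cosine)
  .limit(10) # the query image itself ranks first (distance 0)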
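Conceptually, the HNSW index approximates an exact cosine-distance ranking. For small in-memory sets the exact version is a few lines of plain Ruby, which can be handy for sanity-checking index results in a console. This sketch uses cosine distance = 1 − cosine similarity, the same definition pgvector uses for its cosine distance operator; the `cosine_distance` and `nearest` names are hypothetical.

```ruby
# Exact (brute-force) cosine-distance nearest neighbours; HNSW approximates this ranking.
def cosine_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  raise ArgumentError, "zero-length vector" if norm_a.zero? || norm_b.zero?

  1.0 - dot / (norm_a * norm_b)
end

# Returns the ids of the `limit` closest vectors to `query`, nearest first.
# `vectors` is a Hash of id => Array<Float>.
def nearest(query, vectors, limit: 10)
  vectors.min_by(limit) { |_id, vec| cosine_distance(query, vec) }.map(&:first)
end
```

Cosine distance ignores vector length, so it ranges from 0 (same direction) to 2 (opposite directions).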
# Check backfill progress
rake embeddings:progress
# Check detailed image essentials (fingerprints + Gemini coverage)
rake embeddings:essentials_stats
# Trigger nightly backfill worker manually (all active images)
# Enqueues ImageEmbeddingPopulationWorker via Sidekiq
rake embeddings:queue_all_image_full
# Product primary images only
rake embeddings:queue_all_product_full
# Incremental batches (resumable)
rake embeddings:populate_image_full # full pipeline, batch: 50
rake embeddings:populate_image_vision # vision analysis only, batch: 100
# Fingerprints
rake embeddings:populate_fingerprints # incremental, batch: 100
rake embeddings:queue_all_fingerprints # all at once
# Duplicate detection (pHash)
rake embeddings:find_phash_duplicates
rake "embeddings:find_phash_duplicates[10]" # Hamming distance <= 10 (quoted so zsh does not glob the brackets)
# Test Gemini API connectivity
rake embeddings:test_gemini_embed
# Embedding statistics
rake embeddings:stats
  • Embedding latency: ~1–3s per image (includes image download + Gemini API call)
  • Rate limit: 300 requests/minute (shared across all workers)
  • Vector dimensions: 1536 (Matryoshka truncation of the model's full 3072-dimensional output)
  • Index type: HNSW with cosine distance, per-partition (~1–5ms queries)
  • Storage: ~6KB per embedding (1536 floats × 4 bytes)
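The last two bullets can be checked with a little arithmetic. Below is a minimal sketch of Matryoshka-style truncation, assuming the full 3072-dimensional output is available: keep the leading 1536 components and rescale to unit length. Rescaling does not change cosine distance (which is scale-invariant) but matters if vectors are ever compared by raw dot product; whether this pipeline re-normalises is not stated here. `truncate_embedding` and `embedding_bytes` are hypothetical names. pgvector documents 4 bytes per dimension plus 8 bytes per vector, so a stored 1536-dimensional value is 6,152 bytes, consistent with the ~6 KB figure.

```ruby
# Hypothetical sketch: Matryoshka-style truncation and raw storage size.
STORED_DIMS = 1536
BYTES_PER_FLOAT = 4 # pgvector's vector type stores single-precision floats

# Keep the leading components and rescale to unit L2 norm.
def truncate_embedding(full, dims: STORED_DIMS)
  raise ArgumentError, "need at least #{dims} components" if full.size < dims

  head = full.first(dims)
  norm = Math.sqrt(head.sum { |x| x * x })
  raise ArgumentError, "zero vector" if norm.zero?

  head.map { |x| x / norm }
end

# Raw float payload in bytes (pgvector adds 8 bytes of per-vector overhead on top).
def embedding_bytes(dims = STORED_DIMS)
  dims * BYTES_PER_FLOAT
end
```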