Class: Embedding::TextUnifier
- Inherits: Object
- Defined in: app/services/embedding/text_unifier.rb
Overview
Backfills Gemini Embedding 2 vectors into content_embeddings.unified_embedding
for text content, so text and images share one multimodal vector space —
the headline benefit of Gemini Embedding 2 over the legacy OpenAI text path.
It reads each source content_type = 'primary' text row, re-embeds its
content with Gemini (batched), and upserts a sibling content_type = 'unified' row tagged gemini-embedding-2. The OpenAI embedding column and
rows are left untouched, so the existing ContentEmbedding.semantic_search
keeps serving search until the read path is cut over to unified_search
(a deliberate follow-up once coverage reaches 100%).
INERT by default: nothing invokes this from a model callback or the live
search path. Run it explicitly via rake embeddings:backfill_unified_text
(count-first, gated) per the runbook.
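The sibling-row idea described above can be sketched in isolation. This is a minimal illustration, not the service's actual code: FakeStore, Row, unify, the fake embedder, the "openai-legacy" label, and the (source_type, source_id, content_type) upsert key are all hypothetical stand-ins, since the real table schema and Gemini client are not shown on this page.

```ruby
# Hypothetical in-memory stand-in for content_embeddings. The real table,
# its columns, and its unique key are assumptions made for illustration.
Row = Struct.new(:source_type, :source_id, :content_type, :content, :model, :vector,
                 keyword_init: true)

class FakeStore
  def initialize
    @rows = {}
  end

  # Upsert keyed on (source_type, source_id, content_type): writing the same
  # key twice replaces the row instead of adding a duplicate.
  def upsert(row)
    @rows[[row.source_type, row.source_id, row.content_type]] = row
  end

  def find(source_type, source_id, content_type)
    @rows[[source_type, source_id, content_type]]
  end

  def size
    @rows.size
  end
end

# Writes a 'unified' sibling for a 'primary' row without touching the primary.
def unify(store, primary, embed)
  store.upsert(
    Row.new(source_type: primary.source_type, source_id: primary.source_id,
            content_type: "unified", content: primary.content,
            model: "gemini-embedding-2", vector: embed.call(primary.content))
  )
end

# Deterministic fake embedder; no network call is made.
fake_embed = ->(text) { [text.length.to_f] }

store = FakeStore.new
primary = Row.new(source_type: "Post", source_id: 1, content_type: "primary",
                  content: "Hello world", model: "openai-legacy", vector: [0.1, 0.2])
store.upsert(primary)

# Running the backfill twice is safe: the second run overwrites the sibling.
2.times { unify(store, primary, fake_embed) }
```

Because the unified vector lives in its own row, the primary row and its legacy vector stay exactly as they were, which is what lets the existing search path keep working until the cutover.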
Constant Summary
- TEXT_TYPES =
Embeddable TEXT types eligible for unification: every embeddable type
except Image (images are embedded multimodally by the image pipeline).
Includes the sensitive internal types (CallRecord/Activity/Communication).
%w[Post Article Showcase Video Item ProductLine SiteMap ReviewsIo CallRecord Activity Communication AssistantBrainEntry].freeze
- MODEL =
GA multimodal model written into unified_embedding.
ContentEmbedding::UNIFIED_MODEL
- DIMENSIONS =
MRL output width (HNSW-compatible; matches image unified embeddings).
1536
- BATCH_SIZE =
Items per Gemini batchEmbedContents request.
Embedding::Gemini::MAX_BATCH_SIZE
- MAX_CONTENT_LENGTH =
Truncate to stay within the model's ~8k-token text window.
Models::Embeddable::MAX_CONTENT_LENGTH
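How these constants might work together can be sketched as follows. The numeric values below are placeholders chosen to keep the example small: the real BATCH_SIZE and MAX_CONTENT_LENGTH come from Embedding::Gemini::MAX_BATCH_SIZE and Models::Embeddable::MAX_CONTENT_LENGTH, whose values are not shown here. Truncating by character count is also an assumption, and Candidate is a hypothetical stand-in for a source record.

```ruby
# TEXT_TYPES is copied from the constant summary above.
TEXT_TYPES = %w[
  Post Article Showcase Video Item ProductLine SiteMap ReviewsIo
  CallRecord Activity Communication AssistantBrainEntry
].freeze

# Placeholder values for illustration only (assumptions, not the real limits).
BATCH_SIZE = 3
MAX_CONTENT_LENGTH = 20

# Hypothetical stand-in for a source record.
Candidate = Struct.new(:type, :content)

candidates = [
  Candidate.new("Post", "short post"),
  Candidate.new("Image", "handled by the image pipeline"),
  Candidate.new("Article", "a" * 50),
  Candidate.new("Video", "video transcript"),
  Candidate.new("Item", "item description")
]

# 1. Keep only the text types eligible for unification (Image is excluded).
eligible = candidates.select { |c| TEXT_TYPES.include?(c.type) }

# 2. Truncate each text to the content limit (character-based here).
texts = eligible.map { |c| c.content[0, MAX_CONTENT_LENGTH] }

# 3. Group the texts into request-sized batches.
batches = texts.each_slice(BATCH_SIZE).to_a
```

With four eligible candidates and a batch size of 3, this produces one full batch and one partial batch; the 50-character article is cut to 20 characters before it is batched.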
Class Method Summary
- .backfill(primary_rows, dimensions: DIMENSIONS) ⇒ Hash
Backfill a set of source primary rows.
Instance Method Summary
- #backfill(primary_rows) ⇒ Object
- #initialize(dimensions: DIMENSIONS) ⇒ TextUnifier (constructor)
A new instance of TextUnifier.
Constructor Details
#initialize(dimensions: DIMENSIONS) ⇒ TextUnifier
Returns a new instance of TextUnifier.
# File 'app/services/embedding/text_unifier.rb', line 48

def initialize(dimensions: DIMENSIONS)
  @dimensions = dimensions
end
Class Method Details
.backfill(primary_rows, dimensions: DIMENSIONS) ⇒ Hash
Backfill a set of source primary rows.
# File 'app/services/embedding/text_unifier.rb', line 44

def self.backfill(primary_rows, dimensions: DIMENSIONS)
  new(dimensions: dimensions).backfill(primary_rows)
end
Instance Method Details
#backfill(primary_rows) ⇒ Object
Processes primary_rows in batches, skips rows whose content is blank, embeds
the remaining content, and returns a counts hash with :processed, :skipped,
and :failed keys.
# File 'app/services/embedding/text_unifier.rb', line 52

def backfill(primary_rows)
  counts = { processed: 0, skipped: 0, failed: 0 }

  each_batch(primary_rows) do |rows|
    prepared = rows.filter_map do |row|
      content = content_for(row)
      if content.blank?
        counts[:skipped] += 1
        nil
      else
        { row: row, content: content }
      end
    end
    next if prepared.empty?

    vectors = Embedding::Gemini.(prepared.pluck(:content), dimensions: @dimensions)
    write_batch(prepared, vectors, counts)
  end

  counts
end
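The counting logic in #backfill can be exercised without Rails or a network call. The sketch below is a plain-Ruby approximation under stated assumptions: stub_embed stands in for the Gemini batch call, each_slice(2) stands in for each_batch, and the rule that a missing vector counts as failed is a guess, because write_batch is not shown on this page.

```ruby
# Stub embedder standing in for the Gemini batch call: one fake vector per
# input text. Hypothetical; no network access.
stub_embed = ->(texts) { texts.map { |t| [t.length.to_f] } }

# Plain hashes standing in for primary rows.
rows = [
  { id: 1, content: "first" },
  { id: 2, content: "second" },
  { id: 3, content: "   " }, # blank, so skipped
  { id: 4, content: nil },   # blank, so skipped
  { id: 5, content: "fifth" }
]

counts = { processed: 0, skipped: 0, failed: 0 }
written = {}

rows.each_slice(2) do |batch|
  # filter_map drops the nil returned for blank rows while counting them.
  prepared = batch.filter_map do |row|
    content = row[:content].to_s.strip
    if content.empty?
      counts[:skipped] += 1
      nil
    else
      { row: row, content: content }
    end
  end
  # A batch made up entirely of blank rows never reaches the embedder.
  next if prepared.empty?

  vectors = stub_embed.call(prepared.map { |item| item[:content] })

  # Assumed write step: a missing vector is counted as failed.
  prepared.zip(vectors).each do |item, vector|
    if vector.nil?
      counts[:failed] += 1
    else
      written[item[:row][:id]] = vector
      counts[:processed] += 1
    end
  end
end
```

The second batch contains only blank rows, so it exercises the early `next` and is never sent for embedding. Counting skipped rows inside filter_map keeps the tally accurate even for batches that are dropped entirely.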