Module: ContentEmbedding::UnifiedSearchable
- Extended by: ActiveSupport::Concern
- Included in: ContentEmbedding
- Defined in: app/models/concerns/content_embedding/unified_searchable.rb
Overview
Concern providing cross-modal "unified" search over Gemini Embedding 2 vectors
stored in the unified_embedding column. Text and images are embedded into the
same multimodal space, so a single query retrieves both.
These class methods live alongside the OpenAI-space search methods (semantic_search / hybrid_search)
on the base model. SemanticSearchService routes queries here when the unified-cutover
flag is on. Sensitive types are excluded by default (exclude_sensitive: true), matching
the OpenAI path; that parity is required before any public/MCP cutover.
Base-model constants are referenced fully qualified (e.g.
ContentEmbedding::UNIFIED_MODEL) because the compact `module ContentEmbedding::UnifiedSearchable`
definition form does not put ContentEmbedding in the lexical scope used for constant lookup.
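The lookup rule is plain Ruby and can be checked without Rails. The sketch below uses hypothetical modules (Outer, Outer::Nested, Outer::Compact) standing in for ContentEmbedding and this concern. A bare constant resolves from the nested form but not from the compact form, while a fully qualified reference works in both.

```ruby
module Outer
  GREETING = "hello from Outer"
end

# Nested form: Module.nesting is [Outer::Nested, Outer], so a bare
# GREETING is found in the enclosing Outer scope.
module Outer
  module Nested
    def self.greeting
      GREETING
    end
  end
end

# Compact form: Module.nesting is only [Outer::Compact], so Outer is not
# searched lexically and the bare constant raises NameError.
module Outer::Compact
  def self.greeting
    GREETING
  rescue NameError
    :not_found
  end

  # Fully qualifying the constant works in either form.
  def self.qualified_greeting
    Outer::GREETING
  end
end

puts Outer::Nested.greeting            # hello from Outer
puts Outer::Compact.greeting           # not_found
puts Outer::Compact.qualified_greeting # hello from Outer
```

Lookup follows the lexical nesting where the code is written, not the receiver the method is later called on, which is why qualifying the constants is the robust choice here.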
Class Method Summary
- .generate_unified_query_embedding(query, model: ContentEmbedding::UNIFIED_MODEL, dimensions: ContentEmbedding::UNIFIED_DIMENSIONS) ⇒ Array<Float>?
  Generate (and cache) a query embedding via Gemini Embedding 2 (the only embedding model).
- .unified_hybrid_search(query, limit: 10, types: nil, locale: 'en', published_only: true, k: 60, min_similarity: ContentEmbedding::SEMANTIC_SIMILARITY_THRESHOLD, exclude_sensitive: true, model: ContentEmbedding::UNIFIED_MODEL) ⇒ Array<ContentEmbedding>
  Hybrid (vector + keyword RRF) search over the unified space; parity with hybrid_search, but on Gemini vectors and cross-modal.
- .unified_search(query, model: ContentEmbedding::UNIFIED_MODEL, limit: 10, types: nil, locale: 'en', published_only: true, min_similarity: ContentEmbedding::SEMANTIC_SIMILARITY_THRESHOLD, exclude_sensitive: true) ⇒ ActiveRecord::Relation
  Vector search over the unified (Gemini) space.
- .unified_visual_search(query, model: ContentEmbedding::UNIFIED_MODEL, limit: 10) ⇒ ActiveRecord::Relation
  Cross-modal visual search (text -> image) over the unified space.
Class Method Details
.generate_unified_query_embedding(query, model: ContentEmbedding::UNIFIED_MODEL, dimensions: ContentEmbedding::UNIFIED_DIMENSIONS) ⇒ Array<Float>?
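Generate (and cache) a query embedding via Gemini Embedding 2, the only
embedding model. The model argument is retained for cache-key namespacing and
backward compatibility with callers; every query is embedded through Gemini regardless.
Vectors are cached for 24 hours under a key built from the model name and the first
16 hex characters of the SHA-256 of the downcased, stripped query. Any error is
logged and nil is returned.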
# File 'app/models/concerns/content_embedding/unified_searchable.rb', line 157

def (query, model: ContentEmbedding::UNIFIED_MODEL, dimensions: ContentEmbedding::UNIFIED_DIMENSIONS)
  # Namespace by model; hash the normalized query text.
  cache_key = "unified_query_embedding:#{model}:#{Digest::SHA256.hexdigest(query.downcase.strip)[0..15]}"
  cached = Rails.cache.read(cache_key)
  return cached if cached.present?

  vector = Embedding::Gemini.(query, dimensions: dimensions)
  # Only non-empty vectors are cached, so failures are retried on the next call.
  Rails.cache.write(cache_key, vector, expires_in: 24.hours) if vector.present?
  vector
rescue StandardError => e
  Rails.logger.error "Failed to generate unified query embedding (#{model}): #{e.}"
  nil
end
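A minimal sketch of the read-through caching above, assuming a plain Hash in place of Rails.cache and a block in place of the Gemini call. QueryCacheSketch, SketchEmbeddingCache, fake_embed and "example-model" are all hypothetical names for illustration only; the key format is copied from the method.

```ruby
require "digest"

# Hypothetical helper reproducing the cache-key format shown above:
# model name plus the first 16 hex characters of SHA-256 over the
# downcased, stripped query.
module QueryCacheSketch
  module_function

  def cache_key(query, model)
    digest = Digest::SHA256.hexdigest(query.downcase.strip)[0..15]
    "unified_query_embedding:#{model}:#{digest}"
  end
end

# Hypothetical read-through cache: a Hash stands in for Rails.cache and the
# block stands in for the Gemini embedding call.
class SketchEmbeddingCache
  attr_reader :misses

  def initialize
    @store = {}
    @misses = 0
  end

  def fetch(query, model)
    key = QueryCacheSketch.cache_key(query, model)
    cached = @store[key]
    return cached if cached && !cached.empty?

    @misses += 1
    vector = yield(query)
    # Like the original, only non-empty vectors are cached.
    @store[key] = vector if vector && !vector.empty?
    vector
  end
end

cache = SketchEmbeddingCache.new
fake_embed = ->(q) { [q.length.to_f, 0.5] } # stand-in, not a real embedding
first  = cache.fetch("  Hybrid Search ", "example-model", &fake_embed)
second = cache.fetch("hybrid search", "example-model", &fake_embed)
p first, second, cache.misses
```

Because the key normalizes case and surrounding whitespace but the embedding call receives the raw query, the second lookup returns the vector computed for "  Hybrid Search " without a new call: case variants share whichever vector was cached first.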
.unified_hybrid_search(query, limit: 10, types: nil, locale: 'en', published_only: true, k: 60, min_similarity: ContentEmbedding::SEMANTIC_SIMILARITY_THRESHOLD, exclude_sensitive: true, model: ContentEmbedding::UNIFIED_MODEL) ⇒ Array<ContentEmbedding>
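Hybrid search over the unified space that fuses vector and keyword rankings with
Reciprocal Rank Fusion (RRF). It mirrors hybrid_search but runs on Gemini vectors
and is cross-modal. Each half fetches max(limit × 3, 30) candidates; the top limit
ids by RRF score are returned, each with neighbor_distance set to 1.0 minus its RRF score.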
# File 'app/models/concerns/content_embedding/unified_searchable.rb', line 86

def unified_hybrid_search(query, limit: 10, types: nil, locale: 'en', published_only: true, k: 60,
                          min_similarity: ContentEmbedding::SEMANTIC_SIMILARITY_THRESHOLD,
                          exclude_sensitive: true, model: ContentEmbedding::UNIFIED_MODEL)
  return [] if query.blank?

  fetch_limit = [limit * 3, 30].max
  vector_results = unified_search(query, model: model, limit: fetch_limit, types: types, locale: locale,
                                  published_only: published_only, min_similarity: min_similarity,
                                  exclude_sensitive: exclude_sensitive).to_a

  # Keyword half must rank the SAME row set as the vector half (unified rows),
  # so RRF fuses by aligned content_embedding ids; mirror unified_search's
  # model selection + with_unified_embedding exactly.
  search_models = model == ContentEmbedding::UNIFIED_MODEL ? ContentEmbedding::UNIFIED_MODELS : model
  unified_scope = where(content_type: 'unified', embedding_model: search_models).
  keyword_results = keyword_search_for_rrf(query, fetch_limit, types, locale, published_only,
                                           exclude_sensitive: exclude_sensitive, base_scope: unified_scope)

  rrf_scores = calculate_rrf_scores(vector_results, keyword_results, k)
  sorted_entries = rrf_scores.sort_by { |_id, score| -score }.first(limit)
  return [] if sorted_entries.empty?

  score_map = sorted_entries.to_h
  sorted_ids = sorted_entries.map(&:first)
  records = where(id: sorted_ids).includes(:embeddable).index_by(&:id)

  sorted_ids.filter_map do |id|
    record = records[id]
    next unless record

    # Store RRF score as a virtual distance for similarity_score parity.
    record.define_singleton_method(:neighbor_distance) { 1.0 - score_map[id] }
    record
  end
end
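calculate_rrf_scores is defined on the base model and not shown here. The sketch below assumes it implements textbook Reciprocal Rank Fusion with 1-based ranks, score(id) = Σ 1 / (k + rank), and simplifies the inputs to id lists rather than records; rrf_scores is a hypothetical name.

```ruby
# Hypothetical stand-in for calculate_rrf_scores (not shown in this file):
# textbook Reciprocal Rank Fusion with 1-based ranks. Each input is a list
# of ids ordered best-first; the result maps id => fused score.
def rrf_scores(vector_ids, keyword_ids, k = 60)
  scores = Hash.new(0.0)
  [vector_ids, keyword_ids].each do |ids|
    ids.each_with_index do |id, index|
      scores[id] += 1.0 / (k + index + 1)
    end
  end
  scores
end

limit = 3
fused = rrf_scores([11, 12, 13], [13, 11, 14])
top = fused.sort_by { |_id, score| -score }.first(limit)

top.each do |id, score|
  # Same mapping as unified_hybrid_search: virtual distance = 1.0 - RRF score.
  puts format("id=%d rrf=%.6f neighbor_distance=%.6f", id, score, 1.0 - score)
end
```

Id 11 ranks first because it places well in both lists (1st and 2nd), ahead of 13 (3rd and 1st) and 12 (vector list only). With k = 60 every fused score is small (about 0.03 at most), so the virtual neighbor_distance of each fused hit sits close to 1.0.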
.unified_search(query, model: ContentEmbedding::UNIFIED_MODEL, limit: 10, types: nil, locale: 'en', published_only: true, min_similarity: ContentEmbedding::SEMANTIC_SIMILARITY_THRESHOLD, exclude_sensitive: true) ⇒ ActiveRecord::Relation
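Vector search over the unified (Gemini) space. It spans every embeddable type,
images included, because all types share one vector space. Results are ordered by
pgvector cosine distance (<=>) and can be narrowed by min_similarity, types, locale,
publication status and sensitivity.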
# File 'app/models/concerns/content_embedding/unified_searchable.rb', line 35

def unified_search(query, model: ContentEmbedding::UNIFIED_MODEL, limit: 10, types: nil, locale: 'en',
                   published_only: true, min_similarity: ContentEmbedding::SEMANTIC_SIMILARITY_THRESHOLD,
                   exclude_sensitive: true)
  return none if query.blank?

  model_config = ContentEmbedding::EMBEDDING_MODELS[model]
  raise ArgumentError, "Unknown embedding model: #{model}" unless model_config

  dimensions = model_config[:dimensions]
   = (query, model: model, dimensions: dimensions)
  return none unless

  # Query vector is bound via sanitize_sql_array; dimensions is a trusted integer.
  vector_literal = "[#{.join(',')}]"
  distance_sql = sanitize_sql_array(["unified_embedding::vector(#{dimensions.to_i}) <=> ?::vector", vector_literal])

  # A GA-model search also matches transitional preview rows pending re-embed.
  scope = by_model(model == ContentEmbedding::UNIFIED_MODEL ? ContentEmbedding::UNIFIED_MODELS : model)
          .
          .select("#{table_name}.*, #{distance_sql} AS neighbor_distance")
          .order(Arel.sql(distance_sql))
  scope = scope.mcp_safe if exclude_sensitive

  if min_similarity.positive?
    # Cosine distance is 0–2; convert the similarity floor with the same
    # convention as ContentEmbedding#similarity_score (similarity =
    # 1 - distance/2) so min_similarity matches the displayed score.
    max_distance = (2.0 * (1.0 - min_similarity)).round(6)
    scope = scope.where(sanitize_sql_array(["unified_embedding::vector(#{dimensions.to_i}) <=> ?::vector <= ?",
                                            vector_literal, max_distance]))
  end

  scope = scope.by_type(types) if types.present?
  scope = scope.for_locale(locale)
  scope = scope.published_only if published_only
  scope.limit(limit).includes(:embeddable)
end
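The similarity floor is plain arithmetic and can be checked outside the database. This sketch assumes pgvector's `<=>` cosine distance (1 minus cosine similarity, range 0 to 2) and the similarity = 1 - distance / 2 convention stated in the comments; the helper names are hypothetical.

```ruby
# Cosine distance as pgvector's <=> operator defines it: 1 - cosine similarity,
# so it ranges from 0 (same direction) to 2 (opposite directions).
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / norm
end

# Displayed similarity convention: similarity = 1 - distance / 2.
def similarity_from_distance(distance)
  1.0 - distance / 2.0
end

# The floor conversion used in unified_search.
def max_distance_for(min_similarity)
  (2.0 * (1.0 - min_similarity)).round(6)
end

# Text literal bound into the SQL as ?::vector.
def vector_literal(vector)
  "[#{vector.join(',')}]"
end

puts max_distance_for(0.75)                           # 0.5
puts similarity_from_distance(max_distance_for(0.75)) # 0.75
puts vector_literal([0.1, 0.2])                       # [0.1,0.2]
```

A min_similarity of 0.75 therefore admits rows with cosine distance up to 0.5, and a row exactly at that distance displays a similarity of 0.75. A min_similarity of 0 skips the filter entirely, because of the positive? guard.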
.unified_visual_search(query, model: ContentEmbedding::UNIFIED_MODEL, limit: 10) ⇒ ActiveRecord::Relation
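Cross-modal visual search: a text query retrieves the nearest Image embeddings in the
unified space. Unlike unified_search, it applies no sensitivity, locale, publication
or similarity filters.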
# File 'app/models/concerns/content_embedding/unified_searchable.rb', line 128

def unified_visual_search(query, model: ContentEmbedding::UNIFIED_MODEL, limit: 10)
  return none if query.blank?

  model_config = ContentEmbedding::EMBEDDING_MODELS[model]
  raise ArgumentError, "Unknown embedding model: #{model}" unless model_config

  dimensions = model_config[:dimensions]
   = (query, model: model, dimensions: dimensions)
  return none unless

  distance_sql = sanitize_sql_array(["unified_embedding::vector(#{dimensions.to_i}) <=> ?::vector", "[#{.join(',')}]"])

  by_model(model == ContentEmbedding::UNIFIED_MODEL ? ContentEmbedding::UNIFIED_MODELS : model)
    .where(embeddable_type: 'Image')
    .
    .select("#{table_name}.*, #{distance_sql} AS neighbor_distance")
    .order(Arel.sql(distance_sql))
    .limit(limit)
    .includes(:embeddable)
end
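The query's semantics (keep only Image rows, order by cosine distance to the text-query vector, take the first limit) can be modelled in memory. This is a brute-force sketch under stated assumptions: Record is a hypothetical stand-in for a ContentEmbedding row, "Article" is a hypothetical non-image type, and the 2-D vectors are toy data. A brute-force scan is exact, whereas the database may return approximate neighbours if an approximate pgvector index (HNSW or IVFFlat) backs the column.

```ruby
# Hypothetical in-memory model of the query: Record stands in for a
# ContentEmbedding row, and a plain array stands in for unified_embedding.
Record = Struct.new(:id, :embeddable_type, :vector)

# 1 - cosine similarity, matching pgvector's <=> operator.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  1.0 - dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Equivalent of .where(embeddable_type: 'Image').order(distance).limit(limit)
# as an exact brute-force scan.
def visual_search_in_memory(records, query_vector, limit: 10)
  records
    .select { |r| r.embeddable_type == "Image" }
    .sort_by { |r| cosine_distance(r.vector, query_vector) }
    .first(limit)
end

rows = [
  Record.new(1, "Article", [1.0, 0.0]), # closest overall, but not an image
  Record.new(2, "Image",   [0.9, 0.1]),
  Record.new(3, "Image",   [0.0, 1.0]),
  Record.new(4, "Image",   [-1.0, 0.0])
]
p visual_search_in_memory(rows, [1.0, 0.0], limit: 2).map(&:id) # [2, 3]
```

Row 1 is identical to the query vector but is dropped by the type filter, which is what makes this a text-to-image search rather than a general nearest-neighbour search.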