Module: ContentEmbedding::TextSearchable
- Extended by: ActiveSupport::Concern
- Included in: ActivityEmbedding, ArticleEmbedding, CallRecordEmbedding, ItemEmbedding, PostEmbedding, ProductLineEmbedding, ReviewsIoEmbedding, ShowcaseEmbedding, SiteMapEmbedding, VideoEmbedding
- Defined in: app/models/concerns/content_embedding/text_searchable.rb
Overview
Concern providing semantic search for text-based content embeddings.
Query vectors are generated by Gemini Embedding 2 (the only embedding model);
stored vectors live in the unified_embedding column.
Include this in partition models that use text embeddings:
- PostEmbedding, ArticleEmbedding, ShowcaseEmbedding
- VideoEmbedding, SiteMapEmbedding, ItemEmbedding
- CallRecordEmbedding, ProductLineEmbedding, ReviewsIoEmbedding
Constant Summary
- SIMILARITY_THRESHOLD = 0.0
  Minimum similarity threshold (0 = no filtering)
Class Method Summary
- .apply_locale_filter(scope, locale) ⇒ ActiveRecord::Relation
  Apply locale filter using the partition's table name.
- .apply_published_filter(scope) ⇒ ActiveRecord::Relation
  Override in subclass to apply model-specific published filter.
- .generate_text_query_embedding(query) ⇒ Array<Float>?
  Generate a query embedding via Gemini (cached).
- .locale_filtered? ⇒ Boolean
  Override in partition models that have locale-specific content.
- .semantic_search(query, limit: 10, locale: 'en', published_only: true, min_similarity: SIMILARITY_THRESHOLD) ⇒ ActiveRecord::Relation
  Semantic search within this partition's content type.
Class Method Details
.apply_locale_filter(scope, locale) ⇒ ActiveRecord::Relation
Apply locale filter using the partition's table name.
Only called when locale_filtered? returns true.
# File 'app/models/concerns/content_embedding/text_searchable.rb', line 83
def apply_locale_filter(scope, locale)
  locale_str = locale.to_s
  if locale_str.include?('-')
    # Exact match for regional locales (en-US, en-CA, fr-CA)
    scope.where(locale: locale_str)
  else
    # Base locale matches itself and all regional variants
    scope.where("#{table_name}.locale = ? OR #{table_name}.locale LIKE ?", locale_str, "#{locale_str}-%")
  end
end
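The matching rule above can be illustrated without a database. The following is a minimal plain-Ruby sketch of the same predicate; `locale_matches?` is a hypothetical helper introduced only for illustration and is not part of the concern, and `start_with?` stands in for the SQL `LIKE 'xx-%'` pattern.

```ruby
# Plain-Ruby mirror of the predicate that apply_locale_filter builds in SQL.
# locale_matches? is a hypothetical helper, not part of the concern.
def locale_matches?(row_locale, requested)
  requested = requested.to_s
  if requested.include?('-')
    # Regional locale (en-US, fr-CA): exact match only
    row_locale == requested
  else
    # Base locale: itself, or any regional variant such as "en-CA"
    row_locale == requested || row_locale.start_with?("#{requested}-")
  end
end

rows = %w[en en-US en-CA fr fr-CA]

# A base locale keeps the base row and every regional variant.
en_matches = rows.select { |l| locale_matches?(l, :en) }

# A regional locale keeps only the exact row.
en_us_matches = rows.select { |l| locale_matches?(l, 'en-US') }
```

Under this rule, a request for `en` also returns `en-US` and `en-CA` rows, while a request for `en-US` does not fall back to plain `en` rows.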
.apply_published_filter(scope) ⇒ ActiveRecord::Relation
Override in subclass to apply a model-specific published filter.
The default returns the scope unchanged.
# File 'app/models/concerns/content_embedding/text_searchable.rb', line 106
def apply_published_filter(scope)
  scope # Default: no filtering (override in subclass)
end
.generate_text_query_embedding(query) ⇒ Array<Float>?
Generate a query embedding via Gemini (cached). Delegates to the shared
ContentEmbedding helper so all query embeddings come from one place.
# File 'app/models/concerns/content_embedding/text_searchable.rb', line 99
def (query)
  ContentEmbedding.(query)
end
.locale_filtered? ⇒ Boolean
Override in partition models that have locale-specific content.
Default is false - most content is English-only.
# File 'app/models/concerns/content_embedding/text_searchable.rb', line 73
def locale_filtered?
  false
end
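The two override hooks, locale_filtered? and apply_published_filter, follow the same pattern: the concern supplies a permissive default and a partition model redefines it. Here is a rough sketch of that pattern, assuming a plain module in place of ActiveSupport::Concern so that it runs without Rails. `TextSearchableDefaults`, `DemoPostEmbedding`, `DemoSiteMapEmbedding`, and the `:published` key are all hypothetical names, and an Array of hashes stands in for an ActiveRecord::Relation.

```ruby
# Stand-in for the concern's class-level defaults (assumption: a plain
# module instead of ActiveSupport::Concern, so the sketch runs without Rails).
module TextSearchableDefaults
  def locale_filtered?
    false
  end

  def apply_published_filter(scope)
    scope
  end
end

# Hypothetical English-only partition: keeps both defaults.
class DemoPostEmbedding
  extend TextSearchableDefaults
end

# Hypothetical locale-aware partition: overrides both hooks.
class DemoSiteMapEmbedding
  extend TextSearchableDefaults

  def self.locale_filtered?
    true
  end

  # An Array of hashes stands in for an ActiveRecord::Relation here.
  def self.apply_published_filter(scope)
    scope.select { |row| row[:published] }
  end
end

rows = [{ id: 1, published: true }, { id: 2, published: false }]
default_ids = DemoPostEmbedding.apply_published_filter(rows).map { |r| r[:id] }
published_ids = DemoSiteMapEmbedding.apply_published_filter(rows).map { |r| r[:id] }
```

In a real partition model the override would return a filtered relation (for example via a `where` clause on the model's own published column), which is not shown in the source.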
.semantic_search(query, limit: 10, locale: 'en', published_only: true, min_similarity: SIMILARITY_THRESHOLD) ⇒ ActiveRecord::Relation
Semantic search within this partition's content type.
Returns an empty relation when the query is blank or no query embedding is available.
# File 'app/models/concerns/content_embedding/text_searchable.rb', line 37
def semantic_search(query, limit: 10, locale: 'en', published_only: true, min_similarity: SIMILARITY_THRESHOLD)
  return none if query.blank?

  = (query)
  return none unless

  # `threshold:` (neighbor 1.2+) pushes the similarity floor into the
  # nearest-neighbors query itself as a max cosine-distance filter, with no
  # second hand-rolled `<=>` expression re-binding the query vector.
  # Convert min_similarity with the same 0–2 cosine convention as
  # ContentEmbedding#similarity_score (similarity = 1 - distance/2), so a
  # min_similarity of 0.8 keeps exactly the rows whose displayed
  # similarity_score is >= 0.8, i.e. cosine distance <= 2 * (1 - 0.8).
  = { distance: :cosine }
  [:threshold] = (2.0 * (1.0 - min_similarity)).round(6) if min_similarity.positive?

  # Search the canonical Gemini "unified" rows for this partition (the
  # cross-modal space text and images share).
  scope = unified_content
    .by_model(ContentEmbedding::UNIFIED_MODELS)
    .
    .nearest_neighbors(:unified_embedding, , **)

  # Apply locale filter only for models with locale-specific content (e.g., SiteMap)
  scope = apply_locale_filter(scope, locale) if locale_filtered?

  # Apply published filter if model supports it
  scope = apply_published_filter(scope) if published_only

  scope.limit(limit).includes(:embeddable)
end
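The similarity-to-distance conversion described in the comments above can be checked in isolation. This is a small sketch of the arithmetic only; `similarity_from_distance` and `max_distance_for` are hypothetical helper names, not methods of the concern.

```ruby
# similarity = 1 - distance / 2, with cosine distance in the range 0..2
# (hypothetical helper mirroring the convention stated in the source comment).
def similarity_from_distance(distance)
  1.0 - distance / 2.0
end

# Inverse mapping used for the threshold. Returns nil when min_similarity
# is not positive, which corresponds to passing no threshold at all.
def max_distance_for(min_similarity)
  return nil unless min_similarity.positive?

  (2.0 * (1.0 - min_similarity)).round(6)
end

threshold = max_distance_for(0.8)                 # max cosine distance for 0.8
round_trip = similarity_from_distance(threshold)  # back to a similarity score
no_threshold = max_distance_for(0.0)              # default: no filtering
```

With the default SIMILARITY_THRESHOLD of 0.0 the conversion is skipped entirely, so the nearest-neighbors query applies only the limit.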