Module: ContentEmbedding::TextSearchable

Extended by:
ActiveSupport::Concern
Included in:
ActivityEmbedding, ArticleEmbedding, CallRecordEmbedding, ItemEmbedding, PostEmbedding, ProductLineEmbedding, ReviewsIoEmbedding, ShowcaseEmbedding, SiteMapEmbedding, VideoEmbedding
Defined in:
app/models/concerns/content_embedding/text_searchable.rb

Overview

Concern providing semantic search for text-based content embeddings.
Query vectors are generated by Gemini Embedding 2 (the only embedding model);
stored vectors live in the unified_embedding column.

Include this in partition models that use text embeddings:

  • PostEmbedding, ArticleEmbedding, ShowcaseEmbedding
  • VideoEmbedding, SiteMapEmbedding, ItemEmbedding
  • CallRecordEmbedding, ProductLineEmbedding, ReviewsIoEmbedding
  • ActivityEmbedding

Examples:

ContentEmbedding::PostEmbedding.semantic_search("spa wellness articles")

Constant Summary

SIMILARITY_THRESHOLD = 0.0

Minimum similarity threshold (0.0 = no filtering).


Class Method Details

.apply_locale_filter(scope, locale) ⇒ ActiveRecord::Relation

Apply locale filter using the partition's table name.
Only called when locale_filtered? returns true.

Parameters:

  • scope (ActiveRecord::Relation)
  • locale (String)

Returns:

  • (ActiveRecord::Relation)


# File 'app/models/concerns/content_embedding/text_searchable.rb', line 83

def apply_locale_filter(scope, locale)
  locale_str = locale.to_s
  if locale_str.include?('-')
    # Exact match for regional locales (en-US, en-CA, fr-CA)
    scope.where(locale: locale_str)
  else
    # Base locale matches itself and all regional variants
    scope.where("#{table_name}.locale = ? OR #{table_name}.locale LIKE ?", locale_str, "#{locale_str}-%")
  end
end
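The SQL above encodes a simple matching rule. As an illustration only, the same rule can be written as a plain-Ruby predicate over a stored locale string. `locale_match?` is a hypothetical helper, not part of the concern; it is handy for reasoning about which rows a given `locale` argument will reach.

```ruby
# Hypothetical helper mirroring the WHERE clause built by apply_locale_filter.
# Regional locales (containing '-') match exactly; a base locale matches itself
# and any "<base>-..." variant, like `locale = ? OR locale LIKE '<base>-%'`.
def locale_match?(row_locale, requested)
  return false if row_locale.nil? # SQL NULL never satisfies either condition

  requested = requested.to_s
  return row_locale == requested if requested.include?('-')

  row_locale == requested || row_locale.start_with?("#{requested}-")
end

locale_match?('en-CA', 'en')   # => true  (base locale covers regional variants)
locale_match?('en', 'en-CA')   # => false (regional locale is exact-only)
locale_match?('english', 'en') # => false (LIKE 'en-%' requires the hyphen)
```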

.apply_published_filter(scope) ⇒ ActiveRecord::Relation

Override in partition models to apply a model-specific published filter. The default returns scope unchanged.

Parameters:

  • scope (ActiveRecord::Relation)

Returns:

  • (ActiveRecord::Relation)


# File 'app/models/concerns/content_embedding/text_searchable.rb', line 106

def apply_published_filter(scope)
  scope # Default: no filtering (override in subclass)
end
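A concern's class methods are extended onto each including model, so a model-level `def self.apply_published_filter` takes precedence over the default. The sketch below reproduces that lookup in plain Ruby so it runs without Rails: an Array stands in for the ActiveRecord::Relation, and `TextSearchableHooks`, `FakePostEmbedding`, `FakeVideoEmbedding` and the `published` flag are all hypothetical. A real override would filter the relation according to that partition's own schema.

```ruby
# Plain-Ruby stand-in for the concern's class-method hooks.
module TextSearchableHooks
  # Default hook: return the scope unchanged (no published filtering).
  def apply_published_filter(scope)
    scope
  end
end

# Hypothetical partition whose rows carry a `published` flag.
class FakePostEmbedding
  extend TextSearchableHooks

  # Singleton-class method wins over the extended module's default.
  def self.apply_published_filter(scope)
    scope.select { |row| row[:published] }
  end
end

# Hypothetical partition that keeps the default hook.
class FakeVideoEmbedding
  extend TextSearchableHooks
end

rows = [{ id: 1, published: true }, { id: 2, published: false }]
FakePostEmbedding.apply_published_filter(rows).map { |r| r[:id] } # => [1]
FakeVideoEmbedding.apply_published_filter(rows).size              # => 2
```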

.generate_text_query_embedding(query) ⇒ Array<Float>?

Generate a query embedding via Gemini (cached). Delegates to the shared
ContentEmbedding helper so all query embeddings come from one place.

Parameters:

  • query (String)

    Text to embed

Returns:

  • (Array<Float>, nil)

    Embedding vector or nil on error



# File 'app/models/concerns/content_embedding/text_searchable.rb', line 99

def generate_text_query_embedding(query)
  ContentEmbedding.generate_query_embedding(query)
end
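How ContentEmbedding.generate_query_embedding caches is not documented on this page. The following is only a sketch of the general idea, assuming an in-process Hash and an injected embedder callable in place of the Gemini client: repeated queries reuse the stored vector, and errors yield nil without being cached, so a transient failure can be retried. `QueryEmbeddingCache` is a hypothetical name.

```ruby
# Illustrative memoizing wrapper; the real helper's cache store, key format
# and error handling may differ.
class QueryEmbeddingCache
  def initialize(&embedder)
    @embedder = embedder # callable: String -> Array<Float>
    @cache = {}
  end

  # Returns the cached or freshly computed Array<Float>, or nil on error.
  def fetch(query)
    key = query.to_s
    return @cache[key] if @cache.key?(key)

    @cache[key] = @embedder.call(key)
  rescue StandardError
    nil # failures are not stored, so the next call retries the embedder
  end
end

calls = 0
cache = QueryEmbeddingCache.new { |_q| calls += 1; [0.1, 0.2, 0.3] }
cache.fetch("spa wellness")
cache.fetch("spa wellness")
calls # => 1
```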

.locale_filtered? ⇒ Boolean

Override in partition models that have locale-specific content.
Default is false; most content is English-only.

Returns:

  • (Boolean)

    true if locale filtering should be applied



# File 'app/models/concerns/content_embedding/text_searchable.rb', line 73

def locale_filtered?
  false
end

.semantic_search(query, limit: 10, locale: 'en', published_only: true, min_similarity: SIMILARITY_THRESHOLD) ⇒ ActiveRecord::Relation

Semantic search within this partition's content type

Examples:

Basic search

PostEmbedding.semantic_search("bathroom heating tips")

With options

PostEmbedding.semantic_search("internal docs", published_only: false, limit: 20)

Parameters:

  • query (String)

    Natural language search query

  • limit (Integer) (defaults to: 10)

    Maximum results (default: 10)

  • locale (String) (defaults to: 'en')

    Locale filter (only used if locale_filtered? returns true)

  • published_only (Boolean) (defaults to: true)

    Filter to published/active content (default: true)

  • min_similarity (Float) (defaults to: SIMILARITY_THRESHOLD)

    Minimum similarity score on the 0.0 to 1.0 scale of ContentEmbedding#similarity_score; any value <= 0.0 disables the filter (default: 0.0)

Returns:

  • (ActiveRecord::Relation)

    Embeddings ordered by similarity, most similar first (ascending cosine distance), with embeddable preloaded
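Per the source comments, min_similarity is converted into a maximum cosine distance before querying. Writing d for cosine distance (1 minus cosine similarity, so 0 ≤ d ≤ 2) and s for the similarity score, the conversion works out as:

```latex
\begin{aligned}
s &= 1 - \tfrac{d}{2}, \qquad 0 \le d \le 2 \;\Rightarrow\; 0 \le s \le 1 \\
s \ge s_{\min} &\iff d \le 2\,(1 - s_{\min}) \\
s_{\min} = 0.8 &\implies d_{\max} = 2\,(1 - 0.8) = 0.4
\end{aligned}
```

At s_min = 0 the bound would be d_max = 2, which every row already satisfies, so skipping the threshold unless min_similarity is positive returns the same rows.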



# File 'app/models/concerns/content_embedding/text_searchable.rb', line 37

def semantic_search(query, limit: 10, locale: 'en', published_only: true, min_similarity: SIMILARITY_THRESHOLD)
  return none if query.blank?

  query_embedding = generate_text_query_embedding(query)
  return none unless query_embedding

  # `threshold:` (neighbor 1.2+) pushes the similarity floor into the
  # nearest-neighbors query itself as a max cosine-distance filter — no
  # second hand-rolled `<=>` expression re-binding the query vector.
  # Convert min_similarity with the same 0–2 cosine convention as
  # ContentEmbedding#similarity_score (similarity = 1 - distance/2), so a
  # min_similarity of 0.8 keeps exactly the rows whose displayed
  # similarity_score is >= 0.8, i.e. cosine distance <= 2 * (1 - 0.8).
  nn_options = { distance: :cosine }
  nn_options[:threshold] = (2.0 * (1.0 - min_similarity)).round(6) if min_similarity.positive?

  # Search the canonical Gemini "unified" rows for this partition (the
  # cross-modal space text and images share).
  scope = unified_content
          .by_model(ContentEmbedding::UNIFIED_MODELS)
          .with_unified_embedding
          .nearest_neighbors(:unified_embedding, query_embedding, **nn_options)

  # Apply locale filter only for models with locale-specific content (e.g., SiteMap)
  scope = apply_locale_filter(scope, locale) if locale_filtered?

  # Apply published filter if model supports it
  scope = apply_published_filter(scope) if published_only

  scope.limit(limit).includes(:embeddable)
end
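For reasoning about thresholds and ordering in tests, an in-memory reference of what semantic_search returns can be useful. This is a sketch under stated assumptions: rows are hypothetical Hashes with an `:embedding` vector, vectors are non-zero, and nothing here touches the database, pgvector, or the neighbor gem. It applies the same distance threshold as the method (including the `round(6)`), orders nearest first, applies the limit, and attaches the similarity score.

```ruby
# Cosine distance = 1 - cosine similarity; assumes non-zero vectors.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / norm
end

# Hypothetical in-memory counterpart of semantic_search (no locale/published hooks).
def reference_search(rows, query_vec, limit: 10, min_similarity: 0.0)
  # Same conversion as the method: similarity >= s  <=>  distance <= 2 * (1 - s)
  max_distance = min_similarity.positive? ? (2.0 * (1.0 - min_similarity)).round(6) : nil

  rows
    .map { |row| [row, cosine_distance(row[:embedding], query_vec)] }
    .select { |_row, d| max_distance.nil? || d <= max_distance }
    .sort_by { |_row, d| d } # nearest (most similar) first
    .first(limit)
    .map { |row, d| row.merge(similarity: 1.0 - d / 2.0) }
end

rows = [
  { id: 1, embedding: [1.0, 0.0] },  # same direction: d = 0, similarity 1.0
  { id: 2, embedding: [0.0, 1.0] },  # orthogonal:     d = 1, similarity 0.5
  { id: 3, embedding: [-1.0, 0.0] }  # opposite:       d = 2, similarity 0.0
]
reference_search(rows, [1.0, 0.0], min_similarity: 0.5).map { |r| r[:id] } # => [1, 2]
```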