Module: ContentEmbedding::TextSearchable

Extended by:
ActiveSupport::Concern
Included in:
ActivityEmbedding, ArticleEmbedding, CallRecordEmbedding, ItemEmbedding, PostEmbedding, ProductLineEmbedding, ReviewsIoEmbedding, ShowcaseEmbedding, SiteMapEmbedding, VideoEmbedding
Defined in:
app/models/concerns/content_embedding/text_searchable.rb

Overview

Concern providing semantic search for text-based content embeddings.
Query embeddings are generated with the OpenAI text-embedding-3-small model.

Include this in partition models that use text embeddings:

  • PostEmbedding, ArticleEmbedding, ShowcaseEmbedding
  • VideoEmbedding, SiteMapEmbedding, ItemEmbedding
  • CallRecordEmbedding, ProductLineEmbedding, ReviewsIoEmbedding

Examples:

ContentEmbedding::PostEmbedding.semantic_search("spa wellness articles")

Constant Summary

EMBEDDING_MODEL = 'text-embedding-3-small'

OpenAI text embedding model for text content

SIMILARITY_THRESHOLD = 0.0

Minimum similarity threshold (0 = no filtering)

Class Method Details

.apply_locale_filter(scope, locale) ⇒ ActiveRecord::Relation

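Applies the locale filter, qualifying the locale column with the partition's table name.
A regional locale (e.g. en-US) matches exactly; a base locale (e.g. en) matches itself and all of its regional variants.
Only called when locale_filtered? returns true.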

Parameters:

  • scope (ActiveRecord::Relation)
  • locale (String)

Returns:

  • (ActiveRecord::Relation)


# File 'app/models/concerns/content_embedding/text_searchable.rb', line 82

def apply_locale_filter(scope, locale)
  locale_str = locale.to_s
  if locale_str.include?('-')
    # Exact match for regional locales (en-US, en-CA, fr-CA)
    scope.where(locale: locale_str)
  else
    # Base locale matches itself and all regional variants
    scope.where("#{table_name}.locale = ? OR #{table_name}.locale LIKE ?", locale_str, "#{locale_str}-%")
  end
end
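
The matching rule above can be illustrated without a database. The following is a minimal sketch in plain Ruby; `locale_matches?` is a hypothetical helper (not part of the concern) that mirrors the SQL predicate, assuming stored locales use the `xx` / `xx-YY` form shown in the source comments.

```ruby
# Hypothetical helper: mirrors the SQL predicate of apply_locale_filter in plain Ruby.
def locale_matches?(requested, stored)
  requested = requested.to_s
  if requested.include?('-')
    # Regional locale (en-US, fr-CA): exact match only
    stored == requested
  else
    # Base locale: matches itself and any regional variant ("en", "en-US", ...)
    stored == requested || stored.start_with?("#{requested}-")
  end
end

stored_locales = %w[en en-US en-CA fr fr-CA]

# A base locale selects itself plus its regional variants.
en_matches = stored_locales.select { |l| locale_matches?('en', l) }

# A regional locale selects only the exact value.
fr_ca_matches = stored_locales.select { |l| locale_matches?('fr-CA', l) }

puts en_matches.inspect
puts fr_ca_matches.inspect
```

Note that a base locale does not match in the other direction: requesting `en-US` does not return rows stored as plain `en`.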

.apply_published_filter(scope) ⇒ ActiveRecord::Relation

Override in the including model to apply a model-specific published filter.
The default implementation returns the scope unchanged.

Parameters:

  • scope (ActiveRecord::Relation)

Returns:

  • (ActiveRecord::Relation)


# File 'app/models/concerns/content_embedding/text_searchable.rb', line 116

def apply_published_filter(scope)
  scope # Default: no filtering (override in subclass)
end
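
As a rough illustration of the override pattern, the sketch below uses plain Ruby stand-ins rather than ActiveRecord. Everything in it is hypothetical: `DefaultPublishedFilter` imitates the concern's class-level default, the two `Example...` classes imitate partition models, and an Array of Hashes with a `:published` key stands in for a relation. A real override would return an `ActiveRecord::Relation` filtered by whatever column or association the model actually uses.

```ruby
# Stand-in for the concern's class-level default (no ActiveRecord required).
module DefaultPublishedFilter
  def apply_published_filter(scope)
    scope # Default: no filtering
  end
end

# Hypothetical partition model that overrides the hook.
class ExamplePostEmbedding
  extend DefaultPublishedFilter

  # Keeps only rows flagged as published; the :published key is an assumption.
  def self.apply_published_filter(scope)
    scope.select { |row| row[:published] }
  end
end

# Hypothetical partition model that keeps the default behaviour.
class ExampleVideoEmbedding
  extend DefaultPublishedFilter
end

rows = [{ id: 1, published: true }, { id: 2, published: false }]

puts ExamplePostEmbedding.apply_published_filter(rows).inspect
puts ExampleVideoEmbedding.apply_published_filter(rows).inspect
```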

.generate_text_query_embedding(query) ⇒ Array<Float>?

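Generates the query embedding using OpenAI.
Results are cached for 24 hours under a key built from the model name and a truncated SHA-256 digest of the normalized (downcased, stripped) query.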

Parameters:

  • query (String)

    Text to embed

Returns:

  • (Array<Float>, nil)

    Embedding vector or nil on error



# File 'app/models/concerns/content_embedding/text_searchable.rb', line 97

def generate_text_query_embedding(query)
  cache_key = "query_embedding:#{EMBEDDING_MODEL}:#{Digest::SHA256.hexdigest(query.downcase.strip)[0..15]}"

  cached = Rails.cache.read(cache_key)
  return cached if cached.present?

  result = RubyLLM.embed(query, model: EMBEDDING_MODEL, provider: :openai, assume_model_exists: true)
  vector = result.vectors

  Rails.cache.write(cache_key, vector, expires_in: 24.hours) if vector.present?
  vector
rescue StandardError => e
  Rails.logger.error "[#{name}] Failed to generate query embedding: #{e.message}"
  nil
end
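
The cache key derivation can be tried in isolation. This is a minimal sketch using only the Ruby standard library; `query_cache_key` is a hypothetical helper that reproduces the key format from the method above, so that queries differing only in case or surrounding whitespace share one cache entry.

```ruby
require 'digest'

# Hypothetical helper reproducing the cache key format used above.
def query_cache_key(query, model: 'text-embedding-3-small')
  normalized = query.downcase.strip
  # First 16 hex characters of the SHA-256 digest keep the key short
  digest = Digest::SHA256.hexdigest(normalized)[0..15]
  "query_embedding:#{model}:#{digest}"
end

key_a = query_cache_key('  Spa Wellness Articles ')
key_b = query_cache_key('spa wellness articles')

puts key_a
puts key_a == key_b
```

Because the model name is part of the key, changing the embedding model yields different keys, so cached vectors from one model are not returned for another.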

.locale_filtered? ⇒ Boolean

Override in partition models that have locale-specific content.
Defaults to false, since most content is English-only.

Returns:

  • (Boolean)

    true if locale filtering should be applied



# File 'app/models/concerns/content_embedding/text_searchable.rb', line 72

def locale_filtered?
  false
end

.semantic_search(query, limit: 10, locale: 'en', published_only: true, min_similarity: SIMILARITY_THRESHOLD) ⇒ ActiveRecord::Relation

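Performs a semantic search within this partition's content type.
Returns an empty relation when the query is blank or the query embedding cannot be generated.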

Examples:

Basic search

PostEmbedding.semantic_search("bathroom heating tips")

With options

PostEmbedding.semantic_search("internal docs", published_only: false, limit: 20)

Parameters:

  • query (String)

    Natural language search query

  • limit (Integer) (defaults to: 10)

    Maximum results (default: 10)

  • locale (String) (defaults to: 'en')

    Locale filter (only used if locale_filtered? returns true)

  • published_only (Boolean) (defaults to: true)

    Filter to published/active content (default: true)

  • min_similarity (Float) (defaults to: SIMILARITY_THRESHOLD)

    Minimum cosine similarity; the threshold filter is applied only when the value is positive (default: 0.0)

Returns:

  • (ActiveRecord::Relation)

    Embeddings ordered by similarity



# File 'app/models/concerns/content_embedding/text_searchable.rb', line 39

def semantic_search(query, limit: 10, locale: 'en', published_only: true, min_similarity: SIMILARITY_THRESHOLD)
  return none if query.blank?

  query_embedding = generate_text_query_embedding(query)
  return none unless query_embedding

  # Build query using nearest_neighbors
  scope = primary_content
          .with_embedding
          .nearest_neighbors(:embedding, query_embedding, distance: :cosine)

  # Apply locale filter only for models with locale-specific content (e.g., SiteMap)
  scope = apply_locale_filter(scope, locale) if locale_filtered?

  # Apply published filter if model supports it
  scope = apply_published_filter(scope) if published_only

  # Apply similarity threshold if specified
  if min_similarity.positive?
    max_distance = 1.0 - min_similarity
    vector_literal = "[#{query_embedding.join(',')}]"
    scope = scope.where(
      sanitize_sql_array(['embedding <=> ?::vector <= ?', vector_literal, max_distance])
    )
  end

  scope.limit(limit).includes(:embeddable)
end
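
The threshold step converts a minimum similarity into a maximum cosine distance (`max_distance = 1.0 - min_similarity`), since pgvector's `<=>` operator returns cosine distance. The sketch below reproduces that arithmetic in plain Ruby; `cosine_similarity`, `cosine_distance`, and `within_threshold?` are hypothetical helpers for illustration, and the two-dimensional vectors are toy inputs, not real embeddings.

```ruby
# Cosine similarity of two equal-length, non-zero float vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Cosine distance, the quantity compared against max_distance.
def cosine_distance(a, b)
  1.0 - cosine_similarity(a, b)
end

# Mirrors the threshold branch: no filtering unless min_similarity is positive.
def within_threshold?(query_vec, candidate_vec, min_similarity)
  return true unless min_similarity.positive?

  max_distance = 1.0 - min_similarity
  cosine_distance(query_vec, candidate_vec) <= max_distance
end

query      = [1.0, 0.0]
identical  = [1.0, 0.0]
orthogonal = [0.0, 1.0]

puts within_threshold?(query, identical, 0.9)   # identical direction passes
puts within_threshold?(query, orthogonal, 0.5)  # orthogonal vector is rejected
puts within_threshold?(query, orthogonal, 0.0)  # threshold disabled
```

With the default `SIMILARITY_THRESHOLD` of 0.0 the branch is skipped entirely, so results are limited only by `limit` and the nearest-neighbor ordering.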