Module: Models::Embeddable

Extended by:
ActiveSupport::Concern
Included in:
Activity, Article, AssistantBrainEntry, CallRecord, Communication, Image, Item, ProductLine, ReviewsIo, Showcase, SiteMap, Video
Defined in:
app/concerns/models/embeddable.rb

Overview

Concern for models that support vector embeddings for semantic search.
Include this in any model that should be searchable via AI-powered
semantic search.

Examples:

Basic usage

class Showcase < ApplicationRecord
  include Models::Embeddable

  def self.embeddable_content_types
    [:primary, :visual]
  end

  def content_for_embedding(content_type = :primary)
    case content_type.to_sym
    when :primary
      [name, description, tags&.join(', ')].compact.join("\n\n")
    when :visual
      main_image&.meta_description
    end
  end
end

Finding similar content

showcase = Showcase.find(123)
showcase.find_similar(limit: 5)

Manual embedding generation

showcase.generate_embedding!(:primary, force: true)

Constant Summary

MAX_CONTENT_LENGTH = 30_000

Maximum content length for embedding (roughly 30k chars, within the
Gemini text window).

Class Method Details

.embeddable_content_types ⇒ Array<Symbol>

Override in model to define what content types are embeddable.
Common types: :primary, :visual, :transcript, :specifications.

Examples:

def self.embeddable_content_types
  [:primary, :transcript]
end

Returns:

  • (Array<Symbol>)

    list of content types to embed.



# File 'app/concerns/models/embeddable.rb', line 54

def embeddable_content_types
  [:primary]
end

.embedding_partition_class ⇒ Class?

Returns the partition embedding class for this model. Maps model
names to their ContentEmbedding partition subclasses by convention.

Examples:

Post.embedding_partition_class # => ContentEmbedding::PostEmbedding
Image.embedding_partition_class # => ContentEmbedding::ImageEmbedding

Returns:

  • (Class, nil)

    partition class, or nil when no
    ContentEmbedding::<Model>Embedding constant is defined.



# File 'app/concerns/models/embeddable.rb', line 105

def embedding_partition_class
  partition_class_name = "ContentEmbedding::#{name}Embedding"
  partition_class_name.safe_constantize
end
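
The naming convention above can be tried outside Rails. The following is a minimal sketch, assuming a hypothetical ContentEmbeddingSketch namespace in place of the real ContentEmbedding; it approximates safe_constantize with const_defined? and const_get, returning nil when no partition class exists.

```ruby
# Hypothetical stand-in for the real ContentEmbedding namespace.
module ContentEmbeddingSketch
  class PostEmbedding; end
end

# Plain-Ruby approximation of
# "ContentEmbedding::#{name}Embedding".safe_constantize:
# returns the constant when it is defined directly in the namespace,
# nil otherwise.
def partition_class_for(model_name, namespace: ContentEmbeddingSketch)
  const_name = "#{model_name}Embedding"
  return nil unless namespace.const_defined?(const_name, false)

  namespace.const_get(const_name, false)
end

partition_class_for('Post')  # the PostEmbedding stand-in
partition_class_for('Video') # nil, no such constant in the sketch
```

Returning nil rather than raising lets callers treat a model without a dedicated partition as a normal case.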

.regenerate_all_embeddings(batch_size: 100, scope: nil) ⇒ Integer

Batch regenerate embeddings for all records by enqueueing
EmbeddingWorker for each record in scope.

Examples:

Regenerate all

Post.regenerate_all_embeddings

Regenerate published only

Post.regenerate_all_embeddings(scope: Post.published)

Parameters:

  • batch_size (Integer) (defaults to: 100)

    number of records to process per batch.

  • scope (ActiveRecord::Relation, nil) (defaults to: nil)

    optional scope to filter
    records; defaults to all.

Returns:

  • (Integer)

    count of records queued for embedding generation.



# File 'app/concerns/models/embeddable.rb', line 69

def regenerate_all_embeddings(batch_size: 100, scope: nil)
  records = scope || all
  count = 0

  records.find_each(batch_size: batch_size) do |record|
    EmbeddingWorker.perform_async(record.class.name, record.id)
    count += 1
  end

  Rails.logger.info "Queued #{count} #{name} records for embedding generation"
  count
end
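
To see the enqueue-and-count behaviour in isolation, here is a minimal sketch that runs without Rails or a job backend. FakeEmbeddingWorker, FakeRecord, and enqueue_embeddings are hypothetical names, and each_slice stands in for find_each; only the counting logic mirrors the method above.

```ruby
# Hypothetical in-memory stand-in for EmbeddingWorker; it only records jobs.
class FakeEmbeddingWorker
  JOBS = []

  def self.perform_async(class_name, id)
    JOBS << [class_name, id]
  end
end

# Hypothetical record type carrying just an id.
FakeRecord = Struct.new(:id)

# Walks the records in batches, enqueues one job per record,
# and returns how many jobs were queued.
def enqueue_embeddings(records, batch_size: 100, worker: FakeEmbeddingWorker)
  count = 0
  records.each_slice(batch_size) do |batch|
    batch.each do |record|
      worker.perform_async(record.class.name, record.id)
      count += 1
    end
  end
  count
end
```

The return value counts queued jobs, not generated embeddings; a worker may still skip a record whose embedding is not stale.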

.semantic_search(query, limit: 10, **kwargs) ⇒ Array<ApplicationRecord>

Semantic search within this model type, over the unified Gemini space.

Examples:

Post.semantic_search("spa wellness tips")
Image.semantic_search("bathroom with heated floors")

Parameters:

  • query (String)

    natural-language search query.

  • limit (Integer) (defaults to: 10)

    maximum results to return.

  • kwargs (Hash)

    forwarded to ContentEmbedding.unified_hybrid_search
    (e.g. :locale, :exclude_sensitive).

Returns:

  • (Array<ApplicationRecord>)

    matching records of this model type.



# File 'app/concerns/models/embeddable.rb', line 92

def semantic_search(query, limit: 10, **)
  ContentEmbedding.unified_hybrid_search(query, limit: limit, types: [name], **)
                  .map(&:embeddable)
end
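
The keyword forwarding can be illustrated without the real search backend. In this minimal sketch, unified_search_stub is a hypothetical stand-in for ContentEmbedding.unified_hybrid_search that simply echoes its arguments, and a named **options splat is used in place of the anonymous ** from the source so that the sketch also runs on older Ruby versions.

```ruby
# Hypothetical stand-in for ContentEmbedding.unified_hybrid_search;
# it echoes the arguments it receives instead of searching.
def unified_search_stub(query, limit:, types:, **filters)
  { query: query, limit: limit, types: types, filters: filters }
end

# Mirrors the shape of semantic_search: the model name becomes the
# single entry in types:, and extra keywords pass through untouched.
def semantic_search_sketch(model_name, query, limit: 10, **options)
  unified_search_stub(query, limit: limit, types: [model_name], **options)
end

semantic_search_sketch('Post', 'spa wellness tips', locale: 'en')
```

Because the extra keywords are forwarded as-is, which filters are accepted is decided by the search method, not by the concern.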

Instance Method Details

#content_embeddings ⇒ ActiveRecord::Relation<ContentEmbedding>

Returns:

  • (ActiveRecord::Relation<ContentEmbedding>)

    embeddings that belong to this record.



# File 'app/concerns/models/embeddable.rb', line 39

has_many :content_embeddings, as: :embeddable, dependent: :destroy

#content_for_embedding(_content_type = :primary) ⇒ String

Override in model to provide content for embedding. The returned
string is what gets converted to a vector embedding.

Examples:

def content_for_embedding(content_type = :primary)
  [title, description, body].compact.join("\n\n")
end

Parameters:

  • _content_type (Symbol) (defaults to: :primary)

    type of content to embed.

Returns:

  • (String)

    text content to embed.

Raises:

  • (NotImplementedError)


# File 'app/concerns/models/embeddable.rb', line 120

def content_for_embedding(_content_type = :primary)
  raise NotImplementedError, "#{self.class} must implement #content_for_embedding"
end

#embeddable_locales ⇒ Array<String>

Override in model to specify all locales that should have embeddings.
Return an array if content exists in multiple languages. Defaults to
only the primary locale.

Examples:

Model with multiple translations

def embeddable_locales
  publication_locales.presence || ['en']
end

Returns:

  • (Array<String>)

    list of locale codes.



# File 'app/concerns/models/embeddable.rb', line 150

def embeddable_locales
  [locale_for_embedding]
end
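
The source does not show which caller consumes embeddable_locales. As an illustration only, a caller might pair every content type with every locale and generate one embedding per pair; embedding_jobs is a hypothetical helper, not part of the concern.

```ruby
# Hypothetical helper: builds one (content_type, locale) pair for every
# combination, which a caller could then pass to generate_embedding!.
def embedding_jobs(content_types, locales)
  content_types.product(locales).map do |content_type, locale|
    { content_type: content_type, locale: locale.to_s }
  end
end

embedding_jobs(%i[primary visual], %w[en fr])
# Four pairs: primary/en, primary/fr, visual/en, visual/fr.
```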

#embedding_content_hash(content_type = :primary, locale: nil) ⇒ String

Generate content hash for change detection.

When a model's content_for_embedding accepts a locale: keyword
(e.g. Post, where Liquid rendering varies per locale) the hash is
computed for that locale so stale-detection is also per-locale.
Models that do not declare a locale: keyword receive the same hash
regardless of locale, preserving their existing behaviour.

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    type of content.

  • locale (String, nil) (defaults to: nil)

    locale to hash for; defaults to the
    model's locale_for_embedding.

Returns:

  • (String)

    first 32 hex characters of the SHA-256 digest of the content.



# File 'app/concerns/models/embeddable.rb', line 166

def embedding_content_hash(content_type = :primary, locale: nil)
  content = locale_aware_content_for_embedding(content_type, locale || locale_for_embedding).to_s
  Digest::SHA256.hexdigest(content)[0..31]
end
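
The hashing step depends only on Ruby's standard library, so it can be checked in isolation. A minimal sketch, where content_hash_sketch is a hypothetical name for the last line of the method above:

```ruby
require 'digest'

# Same computation as the method body: SHA-256 hex digest of the
# content, truncated to its first 32 hex characters (128 bits).
def content_hash_sketch(content)
  Digest::SHA256.hexdigest(content.to_s)[0..31]
end

before = content_hash_sketch("Heated floors\n\nInstallation guide")
after  = content_hash_sketch("Heated floors\n\nInstallation guide, revised")
before == after # false, so the stored embedding would be treated as stale
```

Identical content always yields the same hash, which is what lets an unchanged record skip regeneration.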

#embedding_stale?(content_type = :primary, locale: nil) ⇒ Boolean

Check whether the embedding for content_type/locale needs
regeneration.

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    type of content.

  • locale (String, nil) (defaults to: nil)

    locale to check; defaults to the model's
    locale_for_embedding.

Returns:

  • (Boolean)

    true when the embedding is stale or missing.



# File 'app/concerns/models/embeddable.rb', line 178

def embedding_stale?(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding
  embedding = find_content_embedding(content_type, locale: locale)
  return true unless embedding
  # A row with no Gemini vector yet (e.g. an old OpenAI row after the column
  # drop) needs (re)embedding even if the content hash is unchanged.
  return true if embedding.unified_embedding.blank?

  embedding.content_hash != embedding_content_hash(content_type, locale: locale)
end
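
The three staleness conditions can be exercised without a database. This is a minimal sketch in which StoredEmbedding is a hypothetical struct standing in for a ContentEmbedding row, and a nil-or-empty check replaces ActiveSupport's blank?.

```ruby
# Hypothetical stand-in for a ContentEmbedding row.
StoredEmbedding = Struct.new(:unified_embedding, :content_hash)

# Mirrors the decision order of embedding_stale?:
# 1. no row at all, 2. row without a vector, 3. content hash changed.
def stale_sketch?(stored, current_hash)
  return true if stored.nil?
  return true if stored.unified_embedding.nil? || stored.unified_embedding.empty?

  stored.content_hash != current_hash
end

stale_sketch?(nil, 'abc')                                # true, missing row
stale_sketch?(StoredEmbedding.new(nil, 'abc'), 'abc')    # true, no vector yet
stale_sketch?(StoredEmbedding.new([0.1, 0.2], 'abc'), 'abc') # false, up to date
```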

#embedding_type_name ⇒ String

Returns the type name to use for content_embeddings. Uses the actual
class name instead of the base class for STI models, so semantic
searches can filter by type.

Returns:

  • (String)

    type name for the polymorphic association.



# File 'app/concerns/models/embeddable.rb', line 284

def embedding_type_name
  self.class.name
end

#embedding_vector ⇒ Array<Float>?

Returns the primary embedding vector for this record.

Returns:

  • (Array<Float>, nil)

    embedding vector, or nil when no primary
    embedding exists.



# File 'app/concerns/models/embeddable.rb', line 396

def embedding_vector
  find_content_embedding(:primary)&.unified_embedding
end

#find_content_embedding(content_type = :primary, locale: nil) ⇒ ContentEmbedding?

Find a content embedding using the correct type name for STI models.
The logical content_type is mapped to its canonical Gemini "unified"
row (see #unified_content_type), since that is where vectors now live.

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    type of content.

  • locale (String, nil) (defaults to: nil)

    locale to find; defaults to the model's
    locale_for_embedding.

Returns:

  • (ContentEmbedding, nil)

    matching embedding, or nil when none exists.



# File 'app/concerns/models/embeddable.rb', line 209

def find_content_embedding(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding

  ContentEmbedding.find_by(
    embeddable_type: embedding_type_name,
    embeddable_id: id,
    content_type: unified_content_type(content_type),
    locale: locale.to_s
  )
end

#find_similar(limit: 5, same_type_only: true) ⇒ Array<ApplicationRecord>

Find content similar to this record via the shared ContentEmbedding
similarity index.

Examples:

showcase.find_similar(limit: 5)
post.find_similar(same_type_only: false) # Cross-type search

Parameters:

  • limit (Integer) (defaults to: 5)

    maximum number of results to return.

  • same_type_only (Boolean) (defaults to: true)

    when true, restrict results to the
    same model type.

Returns:

  • (Array<ApplicationRecord>)

    records similar to this one.



# File 'app/concerns/models/embeddable.rb', line 387

def find_similar(limit: 5, same_type_only: true)
  ContentEmbedding.find_similar(self, limit: limit, same_type_only: same_type_only)
                  .map(&:embeddable)
end
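
The source does not state which distance metric ContentEmbedding.find_similar uses. As an illustration only, assuming cosine similarity (a common choice for embedding vectors), ranking candidates against a target vector could look like the sketch below; cosine_similarity and most_similar are hypothetical helpers.

```ruby
# Cosine similarity between two equal-length numeric vectors.
# Returns 0.0 when either vector has zero magnitude.
def cosine_similarity(a, b)
  raise ArgumentError, 'vectors must have the same length' unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Ranks a { key => vector } hash by similarity to target, best first,
# and returns the top keys.
def most_similar(target, candidates, limit: 5)
  candidates
    .sort_by { |_key, vector| -cosine_similarity(target, vector) }
    .first(limit)
    .map(&:first)
end
```

In production this ranking would normally happen inside the database or vector index rather than in Ruby.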

#generate_all_embeddings!(force: false) ⇒ Array<ContentEmbedding>

Generate embeddings for all content types declared by
embeddable_content_types.

Parameters:

  • force (Boolean) (defaults to: false)

    regenerate even if existing embeddings are not
    stale.

Returns:

  • (Array<ContentEmbedding>)

    created or updated embeddings (one
    per content type, blank entries skipped).



# File 'app/concerns/models/embeddable.rb', line 370

def generate_all_embeddings!(force: false)
  self.class.embeddable_content_types.filter_map do |content_type|
    generate_embedding!(content_type, force: force)
  end
end

#generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil) ⇒ Array<ContentEmbedding>

Generate chunked embeddings for long content. Splits content into
overlapping chunks and creates an embedding for each. Use this for
documents that exceed the token limit (e.g. long articles, PDFs).

Examples:

Embed a long document in chunks

article.generate_chunked_embeddings!(:primary)

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    type of content to embed.

  • force (Boolean) (defaults to: false)

    regenerate even if the existing embedding is
    not stale.

  • locale (String, nil) (defaults to: nil)

    locale for the content; defaults to the
    model's locale_for_embedding.

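Returns:

  • (Array<ContentEmbedding>)

    created embeddings; empty when the embedding is not stale or the
    content is blank.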



# File 'app/concerns/models/embeddable.rb', line 300

def generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil)
  locale ||= locale_for_embedding
  locale_str = locale.to_s

  return [] unless force || embedding_stale?(content_type, locale: locale)

  content = locale_aware_content_for_embedding(content_type, locale_str)
  return [] if content.blank?

  chunker = Embedding::ContentChunker.new(content)

  # If no chunking needed, use regular embedding
  return [generate_embedding!(content_type, force: force, locale: locale)].compact unless chunker.needs_chunking?

  # Delete existing chunk embeddings for this content type and locale
  chunk_prefix = unified_content_type(content_type)
  ContentEmbedding.where(
    embeddable_type: embedding_type_name,
    embeddable_id: id,
    locale: locale_str
  ).where('content_type LIKE ?', "#{chunk_prefix}_chunk_%").delete_all

  # Generate embedding for each chunk (batched Gemini call).
  chunks = chunker.chunks
  vectors = begin
    Embedding::Gemini.embed_texts(chunks, dimensions: ContentEmbedding::UNIFIED_DIMENSIONS)
  rescue Embedding::Gemini::Error => e
    Rails.logger.error "Chunk embedding failed for #{self.class}##{id}: #{e.message}"
    []
  end

  embeddings = []
  chunks.each_with_index do |chunk, index|
    vector = vectors[index]
    next if vector.blank?

    emb = ContentEmbedding.create!(
      embeddable_type: embedding_type_name,
      embeddable_id: id,
      content_type: "#{chunk_prefix}_chunk_#{index}",
      locale: locale_str,
      unified_embedding: vector,
      embedding_model: ContentEmbedding::UNIFIED_MODEL,
      embedding_dimensions: ContentEmbedding::UNIFIED_DIMENSIONS,
      content_hash: Digest::SHA256.hexdigest(chunk)[0..31]
    )
    embeddings << emb
  end

  Rails.logger.info "Generated #{embeddings.size} chunk embeddings for #{self.class}##{id} (locale: #{locale_str})"
  embeddings
end
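
The internals of Embedding::ContentChunker are not shown here. The following is a minimal sketch of overlapping chunking under stated assumptions: chunks are measured in characters, and the size and overlap defaults are hypothetical. The real chunker may split on tokens or sentence boundaries instead. chunk_content_type reproduces the "<prefix>_chunk_<index>" naming used above, with 'primary' as a placeholder prefix.

```ruby
# Splits text into fixed-size windows where each chunk repeats the last
# `overlap` characters of the previous one.
def overlapping_chunks(text, size: 1000, overlap: 100)
  raise ArgumentError, 'overlap must be smaller than size' unless overlap < size
  return [text] if text.length <= size

  step = size - overlap
  chunks = []
  start = 0
  while start < text.length
    chunks << text[start, size]
    break if start + size >= text.length # last window reached the end

    start += step
  end
  chunks
end

# Row naming for chunk embeddings: "<prefix>_chunk_<index>".
def chunk_content_type(prefix, index)
  "#{prefix}_chunk_#{index}"
end

overlapping_chunks('abcdefghij', size: 4, overlap: 1)
# => ["abcd", "defg", "ghij"]
```

The overlap keeps a sentence that straddles a boundary present in full in at least one chunk, at the cost of embedding some text twice.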

#generate_embedding!(content_type = :primary, force: false, locale: nil) ⇒ ContentEmbedding?

Generate or update an embedding for this record.

Examples:

post.generate_embedding!(:primary)
video.generate_embedding!(:transcript, force: true)
site_map.generate_embedding!(:primary, locale: 'fr')

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    type of content to embed.

  • force (Boolean) (defaults to: false)

    regenerate even if the existing embedding is
    not stale.

  • locale (String, nil) (defaults to: nil)

    locale for the content; defaults to the
    model's locale_for_embedding.

Returns:

  • (ContentEmbedding, nil)

    created or updated embedding, or nil
    when content is blank or generation failed.



# File 'app/concerns/models/embeddable.rb', line 233

def generate_embedding!(content_type = :primary, force: false, locale: nil)
  locale ||= locale_for_embedding
  locale_str = locale.to_s

  return unless force || embedding_stale?(content_type, locale: locale)

  content = locale_aware_content_for_embedding(content_type, locale_str)
  return if content.blank?

  # Truncate content to fit within the Gemini text window (~8k tokens).
  truncated_content = content.to_s.truncate(MAX_CONTENT_LENGTH, omission: '...')

  begin
    vector = Embedding::Gemini.embed_text(truncated_content, dimensions: ContentEmbedding::UNIFIED_DIMENSIONS)
    return if vector.blank?

    # Use the actual class name for STI models (Video, Image, Post) rather than base class
    # This allows proper filtering by type in semantic searches. The row lives in
    # the canonical Gemini "unified" space (cross-modal with images).
    embedding_attrs = {
      embeddable_type: embedding_type_name,
      embeddable_id: id,
      content_type: unified_content_type(content_type),
      locale: locale_str
    }
    embedding_values = {
      unified_embedding: vector,
      embedding_model: ContentEmbedding::UNIFIED_MODEL,
      embedding_dimensions: ContentEmbedding::UNIFIED_DIMENSIONS,
      content_hash: embedding_content_hash(content_type, locale: locale)
    }
    begin
      ContentEmbedding.find_or_initialize_by(embedding_attrs).tap do |emb|
        emb.assign_attributes(embedding_values)
        emb.save!
      end
    rescue ActiveRecord::RecordNotUnique
      # Two concurrent workers raced to insert — find the winner's record and update it.
      ContentEmbedding.find_by(embedding_attrs)&.update!(embedding_values)
    end
  rescue Embedding::Gemini::Error => e
    Rails.logger.error "Embedding generation failed for #{self.class}##{id} (#{content_type}): #{e.message}"
    nil
  end
end
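
The truncation step relies on ActiveSupport's String#truncate, where the limit includes the omission marker. A plain-Ruby sketch of that behaviour, with truncate_sketch as a hypothetical name:

```ruby
# Returns text unchanged when it fits; otherwise cuts it so that the
# result, including the omission marker, is exactly max_length long.
def truncate_sketch(text, max_length, omission: '...')
  return text if text.length <= max_length

  keep = max_length - omission.length
  "#{text[0, keep]}#{omission}"
end

truncate_sketch('a' * 10, 8) # => "aaaaa..."
truncate_sketch('short', 8)  # => "short"
```

Content longer than the limit loses its tail silently, which is the case generate_chunked_embeddings! is meant to cover.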

#has_embedding?(content_type = :primary, locale: nil) ⇒ Boolean

Check whether this record has an embedding for the given content type
and locale.

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    type of content.

  • locale (String, nil) (defaults to: nil)

    locale to check; defaults to the model's
    locale_for_embedding.

Returns:

  • (Boolean)

    true when an embedding exists.



# File 'app/concerns/models/embeddable.rb', line 196

def has_embedding?(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding
  find_content_embedding(content_type, locale: locale).present?
end

#locale_for_embedding ⇒ String

Override in model to specify the locale for embedding content. This
determines which locale's content is embedded and enables
locale-filtered searches.

Examples:

SiteMap with locale column

def locale_for_embedding
  locale.to_s.split('-').first # 'en-US' -> 'en'
end

Item with publication_locales array

def locale_for_embedding
  publication_locales&.first || 'en'
end

Returns:

  • (String)

    locale code (e.g. 'en', 'fr', 'en-US').



# File 'app/concerns/models/embeddable.rb', line 137

def locale_for_embedding
  'en' # Default to English
end
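
The SiteMap example above reduces a regional code to its primary language subtag. A minimal standalone sketch of that normalization follows; primary_locale is a hypothetical helper, and the fallback to 'en' for nil or empty input is an assumption added here to match the concern's default.

```ruby
# Reduces a locale such as 'en-US' to its language subtag ('en').
# Falls back to `default` when the input is nil or empty (assumption).
def primary_locale(locale, default: 'en')
  code = locale.to_s.split('-').first
  code.nil? || code.empty? ? default : code
end

primary_locale('en-US') # => "en"
primary_locale(:fr)     # => "fr"
primary_locale(nil)     # => "en"
```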

#needs_chunking?(content_type = :primary) ⇒ Boolean

Check whether content for content_type would need chunking before
embedding (i.e. exceeds the token limit).

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    type of content to evaluate.

Returns:

  • (Boolean)

    true when content exceeds the token limit.



# File 'app/concerns/models/embeddable.rb', line 358

def needs_chunking?(content_type = :primary)
  content = locale_aware_content_for_embedding(content_type, locale_for_embedding).to_s
  Embedding::ContentChunker.new(content).needs_chunking?
end