Module: Models::Embeddable

Extended by:
ActiveSupport::Concern
Included in:
Activity, Article, AssistantBrainEntry, CallRecord, Communication, Image, Item, ProductLine, ReviewsIo, Showcase, SiteMap, Video
Defined in:
app/concerns/models/embeddable.rb

Overview

Concern for models that support vector embeddings for semantic search.
Include this in any model that should be searchable via AI-powered semantic search.

Examples:

Basic usage

class Showcase < ApplicationRecord
  include Models::Embeddable

  def self.embeddable_content_types
    [:primary, :visual]
  end

  def content_for_embedding(content_type = :primary)
    case content_type.to_sym
    when :primary
      [name, description, tags&.join(', ')].compact.join("\n\n")
    when :visual
      main_image&.meta_description
    end
  end
end

Finding similar content

showcase = Showcase.find(123)
showcase.find_similar(limit: 5)

Manual embedding generation

showcase.generate_embedding!(:primary, force: true)

Constant Summary

MAX_CONTENT_LENGTH = 30_000

Maximum content length for embedding, in characters (roughly 30k characters, or ~7,500 tokens)

DEFAULT_MODEL = 'text-embedding-3-small'

Default embedding model
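MAX_CONTENT_LENGTH is applied in #generate_embedding! through ActiveSupport's String#truncate with omission: '...'. The plain-Ruby sketch below mimics that call, where the omission counts toward the limit, and spells out the ~4 characters-per-token rule of thumb (an approximation for English text) behind the 7,500-token estimate. The helper name truncate_for_embedding and the CHARS_PER_TOKEN constant are hypothetical.

```ruby
MAX_CONTENT_LENGTH = 30_000
CHARS_PER_TOKEN = 4 # rough rule of thumb for English text (assumption)

# Hypothetical helper mirroring String#truncate(MAX_CONTENT_LENGTH, omission: '...'):
# the omission counts toward the limit, so the result never exceeds `max` characters.
def truncate_for_embedding(text, max = MAX_CONTENT_LENGTH, omission: '...')
  return text if text.length <= max

  text[0, max - omission.length] + omission
end

estimated_tokens = MAX_CONTENT_LENGTH / CHARS_PER_TOKEN # => 7500
long_text = 'a' * 45_000
short_text = 'spa wellness tips'

puts estimated_tokens
puts truncate_for_embedding(long_text).length
puts truncate_for_embedding(short_text)
```

Short content passes through unchanged; only content over the limit loses its tail.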


Class Method Details

.embeddable_content_types ⇒ Array<Symbol>

Override in the model to define which content types are embeddable (defaults to [:primary]).
Common types: :primary, :visual, :transcript, :specifications

Examples:

def self.embeddable_content_types
  [:primary, :transcript]
end

Returns:

  • (Array<Symbol>)

    List of content types to embed



# File 'app/concerns/models/embeddable.rb', line 59

def embeddable_content_types
  [:primary]
end

.embedding_partition_class ⇒ Class?

Returns the partition embedding class for this model.
Maps model names to their ContentEmbedding partition subclasses.

Examples:

Post.embedding_partition_class # => ContentEmbedding::PostEmbedding
Image.embedding_partition_class # => ContentEmbedding::ImageEmbedding

Returns:

  • (Class, nil)

    The partition class or nil if not found



# File 'app/concerns/models/embeddable.rb', line 121

def embedding_partition_class
  partition_class_name = "ContentEmbedding::#{name}Embedding"
  partition_class_name.safe_constantize
end
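The lookup relies on ActiveSupport's String#safe_constantize, which returns nil instead of raising when the constant does not exist. A plain-Ruby approximation of that resolution, with a hypothetical stand-in ContentEmbedding namespace holding two partition classes:

```ruby
# Stand-ins for the real partition classes (hypothetical, for illustration only).
module ContentEmbedding
  class PostEmbedding; end
  class ImageEmbedding; end
end

# Approximates "ContentEmbedding::#{name}Embedding".safe_constantize:
# return the constant if it is defined, otherwise nil instead of raising.
def partition_class_for(model_name)
  Object.const_get("ContentEmbedding::#{model_name}Embedding")
rescue NameError
  nil
end

p partition_class_for('Post')     # => ContentEmbedding::PostEmbedding
p partition_class_for('Showcase') # => nil (no ContentEmbedding::ShowcaseEmbedding defined)
```

The nil result is what lets .semantic_search fall back to ContentEmbedding for models without a dedicated partition.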

.regenerate_all_embeddings(batch_size: 100, scope: nil) ⇒ Integer

Queue embedding regeneration for every record (or every record in scope).
Iterates with find_each and enqueues one EmbeddingWorker job per record
rather than generating embeddings inline.

Examples:

Regenerate all

Post.regenerate_all_embeddings

Regenerate published only

Post.regenerate_all_embeddings(scope: Post.published)

Parameters:

  • batch_size (Integer) (defaults to: 100)

    Number of records to load per batch

  • scope (ActiveRecord::Relation) (defaults to: nil)

    Optional scope to filter records

Returns:

  • (Integer)

    Number of records queued


# File 'app/concerns/models/embeddable.rb', line 74

def regenerate_all_embeddings(batch_size: 100, scope: nil)
  records = scope || all
  count = 0

  records.find_each(batch_size: batch_size) do |record|
    EmbeddingWorker.perform_async(record.class.name, record.id)
    count += 1
  end

  Rails.logger.info "Queued #{count} #{name} records for embedding generation"
  count
end
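A plain-Ruby sketch of the same control flow: each_slice stands in for find_each, and a hypothetical FakeWorker records jobs instead of enqueuing them through a Sidekiq-style perform_async (the queue backend is an assumption, inferred only from the perform_async call).

```ruby
Record = Struct.new(:id)

# Hypothetical stand-in for EmbeddingWorker: records jobs instead of enqueuing them.
class FakeWorker
  @jobs = []

  class << self
    attr_reader :jobs

    def perform_async(class_name, id)
      @jobs << [class_name, id]
    end
  end
end

# Mirrors the loop in regenerate_all_embeddings: iterate in batches,
# enqueue one job per record, and return the number of jobs queued.
def queue_embedding_jobs(records, class_name:, batch_size: 100)
  count = 0
  records.each_slice(batch_size) do |batch|
    batch.each do |record|
      FakeWorker.perform_async(class_name, record.id)
      count += 1
    end
  end
  count
end

records = (1..250).map { |i| Record.new(i) }
queued = queue_embedding_jobs(records, class_name: 'Post', batch_size: 100)
puts queued # prints 250
```

batch_size only controls how many records are loaded at a time; every record still gets its own job.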

.semantic_search(query, limit: 10, **options) ⇒ Array<ApplicationRecord>

Semantic search within this model type.
Delegates to the appropriate partition embedding class for model-specific
embedding configuration (OpenAI for text, Gemini for images).

Examples:

Post.semantic_search("spa wellness tips")
Image.semantic_search("bathroom with heated floors")

Parameters:

  • query (String)

    Natural language search query

  • limit (Integer) (defaults to: 10)

    Maximum results

  • options (Hash)

    Additional options passed to partition class

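Returns:

  • (Array<ApplicationRecord>)

    Records attached to the matching embeddings, in the order the search returns them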

# File 'app/concerns/models/embeddable.rb', line 100

def semantic_search(query, limit: 10, **options)
  partition_class = embedding_partition_class
  unless partition_class
    Rails.logger.warn "[#{name}] No embedding partition class found, falling back to ContentEmbedding"
    embeddings = ContentEmbedding.semantic_search(query, limit: limit, types: [name], **options)
    return embeddings.map(&:embeddable)
  end

  embeddings = partition_class.semantic_search(query, limit: limit, **options)
  embeddings.map(&:embeddable)
end
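The ranking itself happens inside the partition class (or ContentEmbedding) and is not shown on this page. Assuming it orders stored vectors by cosine similarity to the query's embedding, the metric OpenAI recommends for its embedding models, the core computation looks roughly like this. The toy 3-dimensional vectors and the INDEX hash are hypothetical; real vectors come from the embedding model, and the production code may delegate ranking to the database.

```ruby
# Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means the vectors point the same way.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Hypothetical in-memory index: record label => embedding vector.
INDEX = {
  'heated floors bathroom' => [0.9, 0.1, 0.0],
  'outdoor hot tub'        => [0.1, 0.9, 0.2],
  'spa wellness tips'      => [0.2, 0.8, 0.5]
}.freeze

# Return the labels of the `limit` most similar vectors, best match first.
def rank_by_similarity(query_vector, index, limit: 10)
  index.max_by(limit) { |_label, vec| cosine_similarity(query_vector, vec) }
       .map(&:first)
end

query = [0.2, 0.75, 0.55] # stand-in for the embedded query text
p rank_by_similarity(query, INDEX, limit: 2) # => ["spa wellness tips", "outdoor hot tub"]
```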

Instance Method Details

#content_embeddings ⇒ ActiveRecord::Relation<ContentEmbedding>

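Returns:

  • (ActiveRecord::Relation<ContentEmbedding>)

    All embeddings attached to this record (destroyed along with it)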



# File 'app/concerns/models/embeddable.rb', line 42

has_many :content_embeddings, as: :embeddable, dependent: :destroy

#content_for_embedding(_content_type = :primary) ⇒ String

Override in model to provide content for embedding.
This is the text that will be converted to a vector embedding.

Examples:

def content_for_embedding(content_type = :primary)
  [title, description, body].compact.join("\n\n")
end

Parameters:

  • _content_type (Symbol) (defaults to: :primary)

    The type of content to embed

Returns:

  • (String)

    Text content to embed

Raises:

  • (NotImplementedError)

    If the including model does not override this method


# File 'app/concerns/models/embeddable.rb', line 139

def content_for_embedding(_content_type = :primary)
  raise NotImplementedError, "#{self.class} must implement #content_for_embedding"
end
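A self-contained sketch of the override contract in plain Ruby. EmbeddableSketch, ArticleSketch, and UnconfiguredModel are hypothetical stand-ins that use plain attributes instead of ActiveRecord columns.

```ruby
# Base behaviour, as in the concern: including models must override this.
module EmbeddableSketch
  def content_for_embedding(_content_type = :primary)
    raise NotImplementedError, "#{self.class} must implement #content_for_embedding"
  end
end

# Hypothetical model overriding the method for the :primary content type.
class ArticleSketch
  include EmbeddableSketch
  attr_reader :title, :description, :body

  def initialize(title:, description: nil, body: nil)
    @title, @description, @body = title, description, body
  end

  # compact drops nil fields so no empty paragraphs end up in the embedded text.
  def content_for_embedding(content_type = :primary)
    case content_type.to_sym
    when :primary then [title, description, body].compact.join("\n\n")
    end
  end
end

# A model that forgot to override the method.
class UnconfiguredModel
  include EmbeddableSketch
end
```

Returning nil for an unsupported content type is safe in the real concern, because #generate_embedding! skips blank content.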

#embeddable_locales ⇒ Array<String>

Override in the model to list every locale that should have embeddings,
for example when content exists in multiple languages.
By default, returns only the primary locale ([locale_for_embedding]).

Examples:

Model with multiple translations

def embeddable_locales
  publication_locales.presence || ['en']
end

Returns:

  • (Array<String>)

    List of locale codes



# File 'app/concerns/models/embeddable.rb', line 174

def embeddable_locales
  [locale_for_embedding]
end
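This page does not show a caller of embeddable_locales. One natural use, sketched here as an assumption, is to loop over every locale and generate one embedding per locale, since #generate_embedding! accepts a locale: keyword and embeddings are stored per locale. ItemSketch and embed_all_locales are hypothetical; the stub records calls instead of hitting the embedding API.

```ruby
class ItemSketch
  attr_reader :publication_locales, :calls

  def initialize(publication_locales)
    @publication_locales = publication_locales
    @calls = []
  end

  # Same idea as the example above: fall back to English when no locales are set.
  def embeddable_locales
    publication_locales.nil? || publication_locales.empty? ? ['en'] : publication_locales
  end

  # Stand-in for the concern's generate_embedding!; only records its arguments.
  def generate_embedding!(content_type = :primary, force: false, locale: nil)
    @calls << [content_type, locale]
  end
end

# Hypothetical helper: one embedding per locale the record publishes in.
def embed_all_locales(record, content_type = :primary)
  record.embeddable_locales.each do |locale|
    record.generate_embedding!(content_type, locale: locale)
  end
end
```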

#embedding_content_hash(content_type = :primary, locale: nil) ⇒ String

Generate content hash for change detection.

When a model's content_for_embedding accepts a locale: keyword
(e.g. Post, where Liquid rendering varies per locale), the hash is
computed for that locale so stale-detection is also per-locale.
Models that do not declare a locale: keyword receive the same
hash regardless of locale, preserving their existing behaviour.

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    The type of content

  • locale (String, nil) (defaults to: nil)

    Locale to hash for (defaults to model's locale)

Returns:

  • (String)

    SHA256 hash of the content (first 32 chars)



# File 'app/concerns/models/embeddable.rb', line 190

def embedding_content_hash(content_type = :primary, locale: nil)
  content = locale_aware_content_for_embedding(content_type, locale || locale_for_embedding).to_s
  Digest::SHA256.hexdigest(content)[0..31]
end
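The helper locale_aware_content_for_embedding is not shown on this page. One plausible way to get the behaviour described above, offered as an assumption rather than the actual code, is to inspect Method#parameters for a locale keyword and pass it only when declared. The hash then follows embedding_content_hash: the first 32 hex characters of the SHA-256 digest. PostSketch, ImageSketch, locale_aware_content, and content_hash are hypothetical names.

```ruby
require 'digest'

class PostSketch
  # Declares a locale: keyword, so its content (and hash) varies per locale.
  def content_for_embedding(_content_type = :primary, locale: 'en')
    locale == 'fr' ? 'Conseils spa' : 'Spa tips'
  end
end

class ImageSketch
  # No locale: keyword, so the same content is hashed for every locale.
  def content_for_embedding(_content_type = :primary)
    'Bathroom with heated floors'
  end
end

# Hypothetical stand-in for locale_aware_content_for_embedding.
def locale_aware_content(record, content_type, locale)
  params = record.method(:content_for_embedding).parameters
  accepts_locale = params.any? do |kind, name|
    kind == :keyrest || (%i[key keyreq].include?(kind) && name == :locale)
  end

  if accepts_locale
    record.content_for_embedding(content_type, locale: locale)
  else
    record.content_for_embedding(content_type)
  end
end

# Same truncation as embedding_content_hash: first 32 hex chars of SHA-256.
def content_hash(record, content_type = :primary, locale: 'en')
  content = locale_aware_content(record, content_type, locale).to_s
  Digest::SHA256.hexdigest(content)[0..31]
end
```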

#embedding_stale?(content_type = :primary, locale: nil) ⇒ Boolean

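Check whether the embedding needs regeneration: it is stale when no embedding exists for this content type and locale, or when the stored content hash no longer matches the current content.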

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    The type of content

  • locale (String) (defaults to: nil)

    The locale to check (defaults to model's locale)

Returns:

  • (Boolean)

    true if embedding is stale or missing



# File 'app/concerns/models/embeddable.rb', line 201

def embedding_stale?(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding
  embedding = find_content_embedding(content_type, locale: locale)
  return true unless embedding

  embedding.content_hash != embedding_content_hash(content_type, locale: locale)
end
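The stale check combines two lookups: the stored row for (type, id, content type, locale) and a fresh hash of the current content. The sketch below reproduces that decision with a Hash standing in for the content_embeddings table; STORE, short_hash, stale?, and save_embedding are hypothetical.

```ruby
require 'digest'

def short_hash(text)
  Digest::SHA256.hexdigest(text.to_s)[0..31]
end

# Hypothetical in-memory stand-in for content_embeddings, keyed the same way
# find_content_embedding queries: [embeddable_type, embeddable_id, content_type, locale].
STORE = {}

def stale?(type, id, content, content_type: :primary, locale: 'en')
  row = STORE[[type, id, content_type.to_s, locale.to_s]]
  return true unless row # a missing embedding counts as stale

  row[:content_hash] != short_hash(content)
end

def save_embedding(type, id, content, content_type: :primary, locale: 'en')
  STORE[[type, id, content_type.to_s, locale.to_s]] = { content_hash: short_hash(content) }
end
```

Because the locale is part of the key, saving an English embedding leaves the French one stale until it is generated too.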

#embedding_type_name ⇒ String

Returns the type name stored in content_embeddings.embeddable_type.
Uses the actual class name rather than the STI base class, so searches can filter by subclass.

Returns:

  • (String)

    Type name for polymorphic association



# File 'app/concerns/models/embeddable.rb', line 297

def embedding_type_name
  self.class.name
end

#embedding_vector ⇒ Array<Float>?

Get the vector of the :primary embedding for the model's default locale

Returns:

  • (Array<Float>, nil)

    The embedding vector or nil



# File 'app/concerns/models/embeddable.rb', line 399

def embedding_vector
  find_content_embedding(:primary)&.embedding
end

#find_content_embedding(content_type = :primary, locale: nil) ⇒ ContentEmbedding?

Find the stored embedding for this record, content type, and locale, using embedding_type_name so STI subclasses resolve correctly

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    The type of content

  • locale (String) (defaults to: nil)

    The locale to find (defaults to model's locale)

Returns:

  • (ContentEmbedding, nil)

    The matching embedding, or nil if none exists

# File 'app/concerns/models/embeddable.rb', line 226

def find_content_embedding(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding

  ContentEmbedding.find_by(
    embeddable_type: embedding_type_name,
    embeddable_id: id,
    content_type: content_type.to_s,
    locale: locale.to_s
  )
end

#find_similar(limit: 5, same_type_only: true) ⇒ Array<ApplicationRecord>

Find similar content to this record

Examples:

showcase.find_similar(limit: 5)
post.find_similar(same_type_only: false) # Cross-type search

Parameters:

  • limit (Integer) (defaults to: 5)

    Maximum number of results

  • same_type_only (Boolean) (defaults to: true)

    Only return same model type

Returns:

  • (Array<ApplicationRecord>)

    Records whose embeddings are most similar to this record's

# File 'app/concerns/models/embeddable.rb', line 390

def find_similar(limit: 5, same_type_only: true)
  ContentEmbedding.find_similar(self, limit: limit, same_type_only: same_type_only)
                  .map(&:embeddable)
end

#generate_all_embeddings!(force: false) ⇒ Array<ContentEmbedding>

Generate embeddings for all content types

Parameters:

  • force (Boolean) (defaults to: false)

    Regenerate even if not stale

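Returns:

  • (Array<ContentEmbedding>)

    Embeddings that were generated; content types that were skipped (up to date, blank, or failed) are omitted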

# File 'app/concerns/models/embeddable.rb', line 374

def generate_all_embeddings!(force: false)
  self.class.embeddable_content_types.filter_map do |content_type|
    generate_embedding!(content_type, force: force)
  end
end

#generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil) ⇒ Array<ContentEmbedding>

Generate chunked embeddings for long content.
Splits content into overlapping chunks and creates one embedding per chunk, stored under content types such as primary_chunk_0, primary_chunk_1.
Use this for documents that exceed the token limit (e.g., long articles, PDFs).

Examples:

Embed a long document in chunks

article.generate_chunked_embeddings!(:primary)

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    The type of content to embed

  • force (Boolean) (defaults to: false)

    Regenerate even if not stale

  • locale (String) (defaults to: nil)

    The locale for the content (defaults to model's locale)

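Returns:

  • (Array<ContentEmbedding>)

    One embedding per successfully embedded chunk, or an empty array when the content is up to date or blank. Content short enough for a single embedding is delegated to #generate_embedding!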

# File 'app/concerns/models/embeddable.rb', line 313

def generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil)
  locale ||= locale_for_embedding
  locale_str = locale.to_s

  return [] unless force || embedding_stale?(content_type, locale: locale)

  content = locale_aware_content_for_embedding(content_type, locale_str)
  return [] if content.blank?

  chunker = Embedding::ContentChunker.new(content)

  # If no chunking needed, use regular embedding
  return [generate_embedding!(content_type, force: force, locale: locale)].compact unless chunker.needs_chunking?

  # Delete existing chunk embeddings for this content type and locale
  ContentEmbedding.where(
    embeddable_type: embedding_type_name,
    embeddable_id: id,
    locale: locale_str
  ).where('content_type LIKE ?', "#{content_type}_chunk_%").delete_all

  # Generate embedding for each chunk
  embeddings = []
  chunker.chunks.each_with_index do |chunk, index|
    chunk_type = "#{content_type}_chunk_#{index}"

    begin
      result = RubyLLM.embed(chunk, model: DEFAULT_MODEL, provider: :openai, assume_model_exists: true)

      emb = ContentEmbedding.create!(
        embeddable_type: embedding_type_name,
        embeddable_id: id,
        content_type: chunk_type,
        locale: locale_str,
        embedding: result.vectors,
        content_hash: Digest::SHA256.hexdigest(chunk)[0..31],
        token_count: result.input_tokens,
        model: result.model
      )
      embeddings << emb
    rescue RubyLLM::Error => e
      Rails.logger.error "Chunk embedding failed for #{self.class}##{id} chunk #{index}: #{e.message}"
    end
  end

  Rails.logger.info "Generated #{embeddings.size} chunk embeddings for #{self.class}##{id} (locale: #{locale_str})"
  embeddings
end
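Embedding::ContentChunker's chunk size, overlap, and splitting rules are not documented here. The sketch below shows the general technique of fixed-size character windows with overlap, using hypothetical chunk_size and overlap values, plus the "<content_type>_chunk_<index>" naming the method stores. It is an illustration of overlapping chunking, not ContentChunker's actual algorithm; OverlappingChunker and chunk_content_types are hypothetical names.

```ruby
# Hypothetical sliding-window chunker: each chunk repeats the last `overlap`
# characters of the previous one so context is not lost at chunk boundaries.
class OverlappingChunker
  def initialize(text, chunk_size: 30_000, overlap: 1_000)
    raise ArgumentError, 'overlap must be smaller than chunk_size' if overlap >= chunk_size

    @text = text
    @chunk_size = chunk_size
    @overlap = overlap
  end

  def needs_chunking?
    @text.length > @chunk_size
  end

  def chunks
    return [@text] unless needs_chunking?

    step = @chunk_size - @overlap
    result = []
    start = 0
    loop do
      result << @text[start, @chunk_size]
      break if start + @chunk_size >= @text.length

      start += step
    end
    result
  end
end

# Content types stored per chunk, e.g. "primary_chunk_0", "primary_chunk_1".
def chunk_content_types(content_type, chunk_count)
  Array.new(chunk_count) { |i| "#{content_type}_chunk_#{i}" }
end
```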

#generate_embedding!(content_type = :primary, force: false, locale: nil) ⇒ ContentEmbedding?

Generate or update embedding for this record

Examples:

post.generate_embedding!(:primary)
video.generate_embedding!(:transcript, force: true)
site_map.generate_embedding!(:primary, locale: 'fr')

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    The type of content to embed

  • force (Boolean) (defaults to: false)

    Regenerate even if not stale

  • locale (String) (defaults to: nil)

    The locale for the content (defaults to model's locale)

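Returns:

  • (ContentEmbedding, nil)

    The saved embedding, or nil if the embedding is up to date, the content is blank, or the API call failed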

# File 'app/concerns/models/embeddable.rb', line 249

def generate_embedding!(content_type = :primary, force: false, locale: nil)
  locale ||= locale_for_embedding
  locale_str = locale.to_s

  return unless force || embedding_stale?(content_type, locale: locale)

  content = locale_aware_content_for_embedding(content_type, locale_str)
  return if content.blank?

  # Truncate content to fit within token limits
  truncated_content = content.to_s.truncate(MAX_CONTENT_LENGTH, omission: '...')

  begin
    result = RubyLLM.embed(truncated_content, model: DEFAULT_MODEL, provider: :openai, assume_model_exists: true)

    # Use the actual class name for STI models (Video, Image, Post) rather than base class
    # This allows proper filtering by type in semantic searches
    embedding_attrs = {
      embeddable_type: embedding_type_name,
      embeddable_id: id,
      content_type: content_type.to_s,
      locale: locale_str
    }
    embedding_values = {
      embedding: result.vectors,
      content_hash: embedding_content_hash(content_type, locale: locale_str),
      token_count: result.input_tokens,
      model: result.model
    }
    begin
      ContentEmbedding.find_or_initialize_by(embedding_attrs).tap do |emb|
        emb.assign_attributes(embedding_values)
        emb.save!
      end
    rescue ActiveRecord::RecordNotUnique
      # Two concurrent workers raced to insert: find the winner's record, update it, and return it.
      ContentEmbedding.find_by(embedding_attrs)&.tap { |emb| emb.update!(embedding_values) }
    end
  rescue RubyLLM::Error => e
    Rails.logger.error "Embedding generation failed for #{self.class}##{id} (#{content_type}): #{e.message}"
    nil
  end
end

#has_embedding?(content_type = :primary, locale: nil) ⇒ Boolean

Check if record has an embedding

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    The type of content

  • locale (String) (defaults to: nil)

    The locale to check (defaults to model's locale)

Returns:

  • (Boolean)

    true if embedding exists



# File 'app/concerns/models/embeddable.rb', line 215

def has_embedding?(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding
  find_content_embedding(content_type, locale: locale).present?
end

#locale_for_embedding ⇒ String

Override in model to specify the locale for embedding content.
This determines which locale's content is embedded and enables
locale-filtered searches.

Examples:

SiteMap with locale column

def locale_for_embedding
  locale.to_s.split('-').first # 'en-US' -> 'en'
end

Item with publication_locales array

def locale_for_embedding
  publication_locales&.first || 'en'
end

Returns:

  • (String)

    The locale code (e.g., 'en', 'fr', 'en-US')



# File 'app/concerns/models/embeddable.rb', line 159

def locale_for_embedding
  'en' # Default to English
end
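The SiteMap example above returns nil when the locale column is blank, because ''.split('-').first is nil; find_content_embedding would then query with an empty locale string. A defensive variant in plain Ruby falls back to the default 'en'. The helper name normalize_embedding_locale and the downcasing step are hypothetical additions.

```ruby
DEFAULT_LOCALE = 'en'

# Reduce a locale such as 'en-US' to its language code ('en'),
# falling back to DEFAULT_LOCALE when the value is nil or blank.
def normalize_embedding_locale(locale)
  code = locale.to_s.strip.split('-').first
  code.nil? || code.empty? ? DEFAULT_LOCALE : code.downcase
end
```

Inside a model, the override would then read `def locale_for_embedding = normalize_embedding_locale(locale)` or the equivalent multi-line form.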

#needs_chunking?(content_type = :primary) ⇒ Boolean

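Check whether the content for the model's default locale is long enough to need chunking

Parameters:

  • content_type (Symbol) (defaults to: :primary)

    The type of content to check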

Returns:

  • (Boolean)

    true if content exceeds the token limit



# File 'app/concerns/models/embeddable.rb', line 364

def needs_chunking?(content_type = :primary)
  content = locale_aware_content_for_embedding(content_type, locale_for_embedding).to_s
  Embedding::ContentChunker.new(content).needs_chunking?
end