Module: Models::Embeddable

Extended by: ActiveSupport::Concern

Included in: Activity, Article, AssistantBrainEntry, CallRecord, Communication, Image, Item, ProductLine, ReviewsIo, Showcase, SiteMap, Video

Defined in: app/concerns/models/embeddable.rb

Overview

Concern for models that support vector embeddings for semantic search.
Include this in any model that should be searchable via AI-powered semantic search.

Examples:

Basic usage

class Showcase < ApplicationRecord
  include Models::Embeddable

  def self.embeddable_content_types
    [:primary, :visual]
  end

  def content_for_embedding(content_type = :primary)
    case content_type.to_sym
    when :primary
      [name, description, tags&.join(', ')].compact.join("\n\n")
    when :visual
      main_image&.meta_description
    end
  end
end

Finding similar content

showcase = Showcase.find(123)
showcase.find_similar(limit: 5)

Manual embedding generation

showcase.generate_embedding!(:primary, force: true)

Constant Summary

MAX_CONTENT_LENGTH = 30_000
Maximum content length for embedding (roughly 30k chars, about 7,500 tokens).

DEFAULT_MODEL = 'text-embedding-3-small'
Default embedding model.

Has many

#content_embeddings ⇒ ActiveRecord::Relation<ContentEmbedding>

Class Method Summary

.embeddable_content_types ⇒ Array<Symbol>
Override in model to define what content types are embeddable.
.embedding_partition_class ⇒ Class?
Returns the partition embedding class for this model.
.regenerate_all_embeddings(batch_size: 100, scope: nil) ⇒ Object
Batch regenerate embeddings for all records.
.semantic_search(query, limit: 10, **options) ⇒ Array<ApplicationRecord>
Semantic search within this model type.

Instance Method Summary

#content_for_embedding(_content_type = :primary) ⇒ String
Override in model to provide content for embedding.
#embeddable_locales ⇒ Array<String>
Override in model to specify all locales that should have embeddings.
#embedding_content_hash(content_type = :primary, locale: nil) ⇒ String
Generate content hash for change detection.
#embedding_stale?(content_type = :primary, locale: nil) ⇒ Boolean
Check if embedding needs regeneration.
#embedding_type_name ⇒ String
Returns the type name to use for content_embeddings. Uses the actual class name instead of the base class for STI models.
#embedding_vector ⇒ Array<Float>?
Get the primary embedding vector.
#find_content_embedding(content_type = :primary, locale: nil) ⇒ ContentEmbedding?
Find content embedding using correct type name for STI models.
#find_similar(limit: 5, same_type_only: true) ⇒ Array<ApplicationRecord>
Find similar content to this record.
#generate_all_embeddings!(force: false) ⇒ Array<ContentEmbedding>
Generate embeddings for all content types.
#generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil) ⇒ Array<ContentEmbedding>
Generate chunked embeddings for long content. Splits content into overlapping chunks and creates an embedding for each.
#generate_embedding!(content_type = :primary, force: false, locale: nil) ⇒ ContentEmbedding?
Generate or update embedding for this record.
#has_embedding?(content_type = :primary, locale: nil) ⇒ Boolean
Check if record has an embedding.
#locale_for_embedding ⇒ String
Override in model to specify the locale for embedding content.
#needs_chunking?(content_type = :primary) ⇒ Boolean
Check if content needs chunking.

Class Method Details

.embeddable_content_types ⇒ `Array<Symbol>`

Override in model to define what content types are embeddable.
Common types: :primary, :visual, :transcript, :specifications

Examples:

def self.embeddable_content_types
  [:primary, :transcript]
end

Returns:

(Array<Symbol>) —
List of content types to embed

# File 'app/concerns/models/embeddable.rb', line 59

def embeddable_content_types
  [:primary]
end

.embedding_partition_class ⇒ `Class`?

Returns the partition embedding class for this model.
Maps model names to their ContentEmbedding partition subclasses.

Examples:

Post.embedding_partition_class # => ContentEmbedding::PostEmbedding
Image.embedding_partition_class # => ContentEmbedding::ImageEmbedding

Returns:

(Class, nil) —
The partition class or nil if not found

# File 'app/concerns/models/embeddable.rb', line 121

def embedding_partition_class
  partition_class_name = "ContentEmbedding::#{name}Embedding"
  partition_class_name.safe_constantize
end
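
The name-to-constant mapping above can be tried outside Rails. The following is a minimal sketch in plain Ruby, assuming stub classes in place of the real ContentEmbedding partitions; `lookup_constant` and `partition_class_for` are hypothetical helpers that approximate what ActiveSupport's `safe_constantize` provides.

```ruby
# Hypothetical stubs standing in for the real partition classes.
class ContentEmbedding
  class PostEmbedding < ContentEmbedding; end
end

# Returns the constant named by `name`, or nil when it is not defined.
def lookup_constant(name)
  Object.const_get(name)
rescue NameError
  nil
end

# Builds the partition class name from a model name and resolves it.
def partition_class_for(model_name)
  lookup_constant("ContentEmbedding::#{model_name}Embedding")
end

found   = partition_class_for("Post")    # a defined partition class
missing = partition_class_for("Unknown") # nil, so a caller can fall back
```

Returning nil rather than raising is what lets a caller such as .semantic_search choose a fallback path.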

.regenerate_all_embeddings(batch_size: 100, scope: nil) ⇒ `Object`

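Queues embedding regeneration for every record in the scope. Records are loaded in batches, one EmbeddingWorker job is enqueued per record, and the number of queued records is returned.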

Examples:

Regenerate all

Post.regenerate_all_embeddings

Regenerate published only

Post.regenerate_all_embeddings(scope: Post.published)

Parameters:

batch_size (Integer) (defaults to: 100) —
Number of records to process per batch
scope (ActiveRecord::Relation) (defaults to: nil) —
Optional scope to filter records

# File 'app/concerns/models/embeddable.rb', line 74

def regenerate_all_embeddings(batch_size: 100, scope: nil)
  records = scope || all
  count = 0

  records.find_each(batch_size: batch_size) do |record|
    EmbeddingWorker.perform_async(record.class.name, record.id)
    count += 1
  end

  Rails.logger.info "Queued #{count} #{name} records for embedding generation"
  count
end
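
To make the queue-and-count behaviour concrete, here is a small sketch that runs without Rails or a job backend. `FakeEmbeddingWorker`, `Record`, and `queue_embeddings` are hypothetical stand-ins, and `each_slice` takes the place of `find_each`.

```ruby
# Hypothetical in-memory worker; it only records what would be enqueued.
class FakeEmbeddingWorker
  JOBS = []

  def self.perform_async(class_name, id)
    JOBS << [class_name, id]
  end
end

Record = Struct.new(:id)

# Enqueue one job per record, walking the collection in batches,
# and return how many jobs were queued.
def queue_embeddings(records, class_name:, batch_size: 100)
  count = 0
  records.each_slice(batch_size) do |batch|
    batch.each do |record|
      FakeEmbeddingWorker.perform_async(class_name, record.id)
      count += 1
    end
  end
  count
end

records = (1..250).map { |i| Record.new(i) }
queued = queue_embeddings(records, class_name: "Post")
```

Because the method only enqueues work, the returned count says how many jobs were submitted, not how many embeddings were actually generated.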

.semantic_search(query, limit: 10, **options) ⇒ `Array<ApplicationRecord>`

Semantic search within this model type.
Delegates to the appropriate partition embedding class for model-specific
embedding configuration (OpenAI for text, Gemini for images).

Examples:

Post.semantic_search("spa wellness tips")
Image.semantic_search("bathroom with heated floors")

Parameters:

query (String) —
Natural language search query
limit (Integer) (defaults to: 10) —
Maximum results
options (Hash) —
Additional options passed to partition class

Returns:

(Array<ApplicationRecord>) —
Records ordered by similarity

# File 'app/concerns/models/embeddable.rb', line 100

def semantic_search(query, limit: 10, **options)
  partition_class = embedding_partition_class
  unless partition_class
    Rails.logger.warn "[#{name}] No embedding partition class found, falling back to ContentEmbedding"
    embeddings = ContentEmbedding.semantic_search(query, limit: limit, types: [name], **options)
    return embeddings.map(&:embeddable)
  end

  embeddings = partition_class.semantic_search(query, limit: limit, **options)
  embeddings.map(&:embeddable)
end
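
The dispatch in the listing above (use the partition class when one exists, otherwise search the generic table restricted to this type) can be illustrated with stubs. Everything in this sketch is hypothetical: `GenericIndex` and `PostIndex` stand in for ContentEmbedding and a partition class, and the search results are canned strings.

```ruby
Hit = Struct.new(:embeddable)

# Hypothetical stand-in for the generic embeddings table.
class GenericIndex
  def self.semantic_search(query, limit:, types:, **_options)
    [Hit.new("generic result for #{types.first}: #{query}")].first(limit)
  end
end

# Hypothetical stand-in for one partition class.
class PostIndex
  def self.semantic_search(query, limit:, **_options)
    [Hit.new("partition result: #{query}")].first(limit)
  end
end

PARTITIONS = { "Post" => PostIndex }.freeze

# Prefer the partition; fall back to the generic index filtered by type.
def search(model_name, query, limit: 10, **options)
  partition = PARTITIONS[model_name]
  hits =
    if partition
      partition.semantic_search(query, limit: limit, **options)
    else
      GenericIndex.semantic_search(query, limit: limit, types: [model_name], **options)
    end
  hits.map(&:embeddable)
end

with_partition    = search("Post", "spa wellness tips")
without_partition = search("Video", "spa wellness tips")
```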

Instance Method Details

#content_embeddings ⇒ `ActiveRecord::Relation<ContentEmbedding>`

Returns:

(ActiveRecord::Relation<ContentEmbedding>)

#content_for_embedding(_content_type = :primary) ⇒ `String`

Override in model to provide content for embedding.
This is the text that will be converted to a vector embedding.

Examples:

def content_for_embedding(content_type = :primary)
  [title, description, body].compact.join("\n\n")
end

Parameters:

_content_type (Symbol) (defaults to: :primary) —
The type of content to embed

Returns:

(String) —
Text content to embed

Raises:

(NotImplementedError)

# File 'app/concerns/models/embeddable.rb', line 139

def content_for_embedding(_content_type = :primary)
  raise NotImplementedError, "#{self.class} must implement #content_for_embedding"
end

#embeddable_locales ⇒ `Array<String>`

Override in model to specify all locales that should have embeddings.
Return an array if content exists in multiple languages.
By default, returns only the primary locale.

Examples:

Model with multiple translations

def embeddable_locales
  publication_locales.presence || ['en']
end

Returns:

(Array<String>) —
List of locale codes

# File 'app/concerns/models/embeddable.rb', line 174

def embeddable_locales
  [locale_for_embedding]
end
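
As listed on this page, #generate_all_embeddings! calls #generate_embedding! without a locale, so it covers only the default locale. One way a caller might cover every locale returned by #embeddable_locales is sketched below. `FakeArticle` is a stub and `generate_for_all_locales` is a hypothetical helper, not part of the concern.

```ruby
# Stub model exposing the same method names as the concern.
class FakeArticle
  def initialize(locales)
    @locales = locales
  end

  def self.embeddable_content_types
    [:primary, :visual]
  end

  def embeddable_locales
    @locales
  end

  # Stub: returns a label instead of calling an embedding API.
  def generate_embedding!(content_type = :primary, force: false, locale: nil)
    "#{content_type}/#{locale}"
  end
end

# Hypothetical helper: one embedding per (locale, content type) pair,
# skipping nil results the same way filter_map does in the concern.
def generate_for_all_locales(record, force: false)
  record.embeddable_locales.flat_map do |locale|
    record.class.embeddable_content_types.filter_map do |content_type|
      record.generate_embedding!(content_type, force: force, locale: locale)
    end
  end
end

results = generate_for_all_locales(FakeArticle.new(%w[en fr]))
```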

#embedding_content_hash(content_type = :primary, locale: nil) ⇒ `String`

Generate content hash for change detection.

When a model's +content_for_embedding+ accepts a +locale:+ keyword
(e.g. Post, where Liquid rendering varies per locale) the hash is
computed for that locale so stale-detection is also per-locale.
Models that do not declare a +locale:+ keyword receive the same
hash regardless of locale, preserving their existing behaviour.

Parameters:

content_type (Symbol) (defaults to: :primary) —
The type of content
locale (String, nil) (defaults to: nil) —
Locale to hash for (defaults to model's locale)

Returns:

(String) —
SHA256 hash of the content (first 32 chars)

# File 'app/concerns/models/embeddable.rb', line 190

def embedding_content_hash(content_type = :primary, locale: nil)
  content = locale_aware_content_for_embedding(content_type, locale || locale_for_embedding).to_s
  Digest::SHA256.hexdigest(content)[0..31]
end
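
The private helper `locale_aware_content_for_embedding` is not shown on this page. The sketch below is one possible way to get the behaviour described above, assuming the helper inspects the method signature with `Method#parameters`. The two model classes and both helper functions are hypothetical.

```ruby
require "digest"

# Model whose content does not depend on locale.
class PlainModel
  def content_for_embedding(content_type = :primary)
    "same text for every locale"
  end
end

# Model that declares a locale: keyword.
class LocalizedModel
  def content_for_embedding(content_type = :primary, locale: "en")
    locale == "fr" ? "Bonjour" : "Hello"
  end
end

# Pass locale only when the method declares a locale: keyword.
def content_for(record, content_type, locale)
  params = record.method(:content_for_embedding).parameters
  accepts_locale = params.any? { |kind, name| %i[key keyreq].include?(kind) && name == :locale }
  if accepts_locale
    record.content_for_embedding(content_type, locale: locale)
  else
    record.content_for_embedding(content_type)
  end
end

# First 32 hex characters of the SHA256 digest, as in the listing above.
def content_hash(record, content_type, locale)
  Digest::SHA256.hexdigest(content_for(record, content_type, locale).to_s)[0..31]
end

plain_en = content_hash(PlainModel.new, :primary, "en")
plain_fr = content_hash(PlainModel.new, :primary, "fr")
local_en = content_hash(LocalizedModel.new, :primary, "en")
local_fr = content_hash(LocalizedModel.new, :primary, "fr")
```

With this approach the plain model hashes identically for every locale, while the localized model gets a distinct hash per locale.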

#embedding_stale?(content_type = :primary, locale: nil) ⇒ `Boolean`

Check if embedding needs regeneration.

Parameters:

content_type (Symbol) (defaults to: :primary) —
The type of content
locale (String) (defaults to: nil) —
The locale to check (defaults to model's locale)

Returns:

(Boolean) —
true if embedding is stale or missing

# File 'app/concerns/models/embeddable.rb', line 201

def embedding_stale?(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding
  embedding = find_content_embedding(content_type, locale: locale)
  return true unless embedding

  embedding.content_hash != embedding_content_hash(content_type, locale: locale)
end
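
The staleness rule (missing embedding, or stored hash differs from the current content hash) is easy to check in isolation. This sketch replaces the database with a hypothetical in-memory hash keyed by content type and locale; `StoredEmbedding`, `digest_for`, and `stale?` are illustrative names only.

```ruby
require "digest"

StoredEmbedding = Struct.new(:content_hash)

def digest_for(text)
  Digest::SHA256.hexdigest(text.to_s)[0..31]
end

# store maps [content_type, locale] to a StoredEmbedding.
def stale?(store, content_type, locale, current_text)
  stored = store[[content_type.to_s, locale.to_s]]
  return true unless stored # nothing stored yet counts as stale

  stored.content_hash != digest_for(current_text)
end

store = { ["primary", "en"] => StoredEmbedding.new(digest_for("original text")) }

unchanged = stale?(store, :primary, "en", "original text")
edited    = stale?(store, :primary, "en", "edited text")
missing   = stale?(store, :primary, "fr", "original text")
```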

#embedding_type_name ⇒ `String`

Returns the type name to use for content_embeddings.
Uses the actual class name instead of the base class for STI models.

Returns:

(String) —
Type name for polymorphic association

# File 'app/concerns/models/embeddable.rb', line 297

def embedding_type_name
  self.class.name
end

#embedding_vector ⇒ `Array<Float>`?

Get the primary embedding vector.

Returns:

(Array<Float>, nil) —
The embedding vector or nil

# File 'app/concerns/models/embeddable.rb', line 399

def embedding_vector
  find_content_embedding(:primary)&.embedding
end

#find_content_embedding(content_type = :primary, locale: nil) ⇒ `ContentEmbedding`?

Find content embedding using the correct type name for STI models.

Parameters:

content_type (Symbol) (defaults to: :primary) —
The type of content
locale (String) (defaults to: nil) —
The locale to find (defaults to model's locale)

Returns:

(ContentEmbedding, nil) —
The embedding or nil

# File 'app/concerns/models/embeddable.rb', line 226

def find_content_embedding(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding

  ContentEmbedding.find_by(
    embeddable_type: embedding_type_name,
    embeddable_id: id,
    content_type: content_type.to_s,
    locale: locale.to_s
  )
end

#find_similar(limit: 5, same_type_only: true) ⇒ `Array<ApplicationRecord>`

Find similar content to this record.

Examples:

showcase.find_similar(limit: 5)
post.find_similar(same_type_only: false) # Cross-type search

Parameters:

limit (Integer) (defaults to: 5) —
Maximum number of results
same_type_only (Boolean) (defaults to: true) —
Only return same model type

Returns:

(Array<ApplicationRecord>) —
Similar records

# File 'app/concerns/models/embeddable.rb', line 390

def find_similar(limit: 5, same_type_only: true)
  ContentEmbedding.find_similar(self, limit: limit, same_type_only: same_type_only)
                  .map(&:embeddable)
end

#generate_all_embeddings!(force: false) ⇒ `Array<ContentEmbedding>`

Generate embeddings for all content types.

Parameters:

force (Boolean) (defaults to: false) —
Regenerate even if not stale

Returns:

(Array<ContentEmbedding>) —
Created/updated embeddings

# File 'app/concerns/models/embeddable.rb', line 374

def generate_all_embeddings!(force: false)
  self.class.embeddable_content_types.filter_map do |content_type|
    generate_embedding!(content_type, force: force)
  end
end

#generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil) ⇒ `Array<ContentEmbedding>`

Generate chunked embeddings for long content.
Splits content into overlapping chunks and creates an embedding for each.
Use this for documents that exceed the token limit (e.g., long articles, PDFs).

Examples:

Embed a long document in chunks

article.generate_chunked_embeddings!(:primary)

Parameters:

content_type (Symbol) (defaults to: :primary) —
The type of content to embed
force (Boolean) (defaults to: false) —
Regenerate even if not stale
locale (String) (defaults to: nil) —
The locale for the content (defaults to model's locale)

Returns:

(Array<ContentEmbedding>) —
Created embeddings for each chunk

# File 'app/concerns/models/embeddable.rb', line 313

def generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil)
  locale ||= locale_for_embedding
  locale_str = locale.to_s

  return [] unless force || embedding_stale?(content_type, locale: locale)

  content = locale_aware_content_for_embedding(content_type, locale_str)
  return [] if content.blank?

  chunker = Embedding::ContentChunker.new(content)

  # If no chunking needed, use regular embedding
  return [generate_embedding!(content_type, force: force, locale: locale)].compact unless chunker.needs_chunking?

  # Delete existing chunk embeddings for this content type and locale
  ContentEmbedding.where(
    embeddable_type: embedding_type_name,
    embeddable_id: id,
    locale: locale_str
  ).where('content_type LIKE ?', "#{content_type}_chunk_%").delete_all

  # Generate embedding for each chunk
  embeddings = []
  chunker.chunks.each_with_index do |chunk, index|
    chunk_type = "#{content_type}_chunk_#{index}"

    begin
      result = RubyLLM.embed(chunk, model: DEFAULT_MODEL, provider: :openai, assume_model_exists: true)

      emb = ContentEmbedding.create!(
        embeddable_type: embedding_type_name,
        embeddable_id: id,
        content_type: chunk_type,
        locale: locale_str,
        embedding: result.vectors,
        content_hash: Digest::SHA256.hexdigest(chunk)[0..31],
        token_count: result.input_tokens,
        model: result.model
      )
      embeddings << emb
    rescue RubyLLM::Error => e
      Rails.logger.error "Chunk embedding failed for #{self.class}##{id} chunk #{index}: #{e.message}"
    end
  end

  Rails.logger.info "Generated #{embeddings.size} chunk embeddings for #{self.class}##{id} (locale: #{locale_str})"
  embeddings
end
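
Embedding::ContentChunker is not documented on this page, so its chunk size and overlap are unknown. The sketch below shows one common way to split text into overlapping character windows and to derive the `<content_type>_chunk_<index>` names used in the listing above. `overlapping_chunks` and its `chunk_size` and `overlap` parameters are assumptions for illustration.

```ruby
# Split text into windows of chunk_size characters, where each window
# starts (chunk_size - overlap) characters after the previous one.
def overlapping_chunks(text, chunk_size: 1000, overlap: 200)
  raise ArgumentError, "overlap must be smaller than chunk_size" unless overlap < chunk_size
  return [] if text.empty?
  return [text] if text.length <= chunk_size

  step = chunk_size - overlap
  chunks = []
  start = 0
  loop do
    chunks << text[start, chunk_size]
    break if start + chunk_size >= text.length

    start += step
  end
  chunks
end

chunks = overlapping_chunks("abcdefghij", chunk_size: 4, overlap: 2)
# chunks is ["abcd", "cdef", "efgh", "ghij"]

# Pair each chunk with the content_type naming scheme from the listing.
named = chunks.each_with_index.map { |chunk, index| ["primary_chunk_#{index}", chunk] }
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of embedding some text twice.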

#generate_embedding!(content_type = :primary, force: false, locale: nil) ⇒ `ContentEmbedding`?

Generate or update embedding for this record.

Examples:

post.generate_embedding!(:primary)
video.generate_embedding!(:transcript, force: true)
site_map.generate_embedding!(:primary, locale: 'fr')

Parameters:

content_type (Symbol) (defaults to: :primary) —
The type of content to embed
force (Boolean) (defaults to: false) —
Regenerate even if not stale
locale (String) (defaults to: nil) —
The locale for the content (defaults to model's locale)

Returns:

(ContentEmbedding, nil) —
The created/updated embedding or nil

# File 'app/concerns/models/embeddable.rb', line 249

def generate_embedding!(content_type = :primary, force: false, locale: nil)
  locale ||= locale_for_embedding
  locale_str = locale.to_s

  return unless force || embedding_stale?(content_type, locale: locale)

  content = locale_aware_content_for_embedding(content_type, locale_str)
  return if content.blank?

  # Truncate content to fit within token limits
  truncated_content = content.to_s.truncate(MAX_CONTENT_LENGTH, omission: '...')

  begin
    result = RubyLLM.embed(truncated_content, model: DEFAULT_MODEL, provider: :openai, assume_model_exists: true)

    # Use the actual class name for STI models (Video, Image, Post) rather than base class
    # This allows proper filtering by type in semantic searches
    embedding_attrs = {
      embeddable_type: embedding_type_name,
      embeddable_id: id,
      content_type: content_type.to_s,
      locale: locale_str
    }
    embedding_values = {
      embedding: result.vectors,
      content_hash: embedding_content_hash(content_type, locale: locale_str),
      token_count: result.input_tokens,
      model: result.model
    }
    begin
      ContentEmbedding.find_or_initialize_by(embedding_attrs).tap do |emb|
        emb.assign_attributes(embedding_values)
        emb.save!
      end
    rescue ActiveRecord::RecordNotUnique
      # Two concurrent workers raced to insert — find the winner's record and update it.
      ContentEmbedding.find_by(embedding_attrs)&.update!(embedding_values)
    end
  rescue RubyLLM::Error => e
    Rails.logger.error "Embedding generation failed for #{self.class}##{id} (#{content_type}): #{e.message}"
    nil
  end
end
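
The truncation step relies on ActiveSupport's `String#truncate`, which counts the omission marker toward the limit. A plain-Ruby approximation (without the optional separator handling) is sketched here. `truncate_for_embedding` and `MAX_LEN` are hypothetical names, with `MAX_LEN` mirroring MAX_CONTENT_LENGTH.

```ruby
MAX_LEN = 30_000 # mirrors MAX_CONTENT_LENGTH from this page

# Return text unchanged when it fits; otherwise cut it so that the
# result, including the omission marker, is exactly max characters long.
def truncate_for_embedding(text, max: MAX_LEN, omission: "...")
  return text if text.length <= max

  text[0, max - omission.length] + omission
end

short_text = truncate_for_embedding("heated floors")
long_text  = truncate_for_embedding("a" * 30_050)
```

Anything past the limit is dropped before embedding, which is why #generate_chunked_embeddings! exists for long documents.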

#has_embedding?(content_type = :primary, locale: nil) ⇒ `Boolean`

Check if record has an embedding.

Parameters:

content_type (Symbol) (defaults to: :primary) —
The type of content
locale (String) (defaults to: nil) —
The locale to check (defaults to model's locale)

Returns:

(Boolean) —
true if embedding exists

# File 'app/concerns/models/embeddable.rb', line 215

def has_embedding?(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding
  find_content_embedding(content_type, locale: locale).present?
end

#locale_for_embedding ⇒ `String`

Override in model to specify the locale for embedding content.
This determines which locale's content is embedded and enables
locale-filtered searches.

Examples:

SiteMap with locale column

def locale_for_embedding
  locale.to_s.split('-').first # 'en-US' -> 'en'
end

Item with publication_locales array

def locale_for_embedding
  publication_locales&.first || 'en'
end

Returns:

(String) —
The locale code (e.g., 'en', 'fr', 'en-US')

# File 'app/concerns/models/embeddable.rb', line 159

def locale_for_embedding
  'en' # Default to English
end

#needs_chunking?(content_type = :primary) ⇒ `Boolean`

Check if content needs chunking.

Returns:

(Boolean) —
true if content exceeds the token limit

# File 'app/concerns/models/embeddable.rb', line 364

def needs_chunking?(content_type = :primary)
  content = locale_aware_content_for_embedding(content_type, locale_for_embedding).to_s
  Embedding::ContentChunker.new(content).needs_chunking?
end
