Module: Models::Embeddable
- Extended by: ActiveSupport::Concern
- Included in: Activity, Article, AssistantBrainEntry, CallRecord, Communication, Image, Item, ProductLine, ReviewsIo, Showcase, SiteMap, Video
- Defined in: app/concerns/models/embeddable.rb
Overview
Concern for models that support vector embeddings for semantic search.
Include this in any model that should be searchable via AI-powered semantic search.
Constant Summary
- MAX_CONTENT_LENGTH = 30_000
  Maximum content length for embedding, in characters (30,000 characters, roughly 7,500 tokens).
- DEFAULT_MODEL = 'text-embedding-3-small'
  Default embedding model.
Has many
- #content_embeddings ⇒ ActiveRecord::Relation<ContentEmbedding>
Class Method Summary
- .embeddable_content_types ⇒ Array<Symbol>
  Override in model to define what content types are embeddable.
- .embedding_partition_class ⇒ Class?
  Returns the partition embedding class for this model.
- .regenerate_all_embeddings(batch_size: 100, scope: nil) ⇒ Object
  Queue embedding regeneration for all records in batches.
- .semantic_search(query, limit: 10, **options) ⇒ Array<ApplicationRecord>
  Semantic search within this model type.
Instance Method Summary
- #content_for_embedding(_content_type = :primary) ⇒ String
  Override in model to provide content for embedding.
- #embeddable_locales ⇒ Array<String>
  Override in model to specify all locales that should have embeddings.
- #embedding_content_hash(content_type = :primary, locale: nil) ⇒ String
  Generate content hash for change detection.
- #embedding_stale?(content_type = :primary, locale: nil) ⇒ Boolean
  Check if embedding needs regeneration.
- #embedding_type_name ⇒ String
  Returns the type name to use for content_embeddings. Uses the actual class name instead of the base class for STI models.
- #embedding_vector ⇒ Array<Float>?
  Get the primary embedding vector.
- #find_content_embedding(content_type = :primary, locale: nil) ⇒ ContentEmbedding?
  Find content embedding using the correct type name for STI models.
- #find_similar(limit: 5, same_type_only: true) ⇒ Array<ApplicationRecord>
  Find content similar to this record.
- #generate_all_embeddings!(force: false) ⇒ Array<ContentEmbedding>
  Generate embeddings for all content types.
- #generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil) ⇒ Array<ContentEmbedding>
  Generate chunked embeddings for long content. Splits content into overlapping chunks and creates an embedding for each.
- #generate_embedding!(content_type = :primary, force: false, locale: nil) ⇒ ContentEmbedding?
  Generate or update embedding for this record.
- #has_embedding?(content_type = :primary, locale: nil) ⇒ Boolean
  Check if record has an embedding.
- #locale_for_embedding ⇒ String
  Override in model to specify the locale for embedding content.
- #needs_chunking?(content_type = :primary) ⇒ Boolean
  Check if content needs chunking.
Class Method Details
.embeddable_content_types ⇒ Array<Symbol>
Override in model to define what content types are embeddable.
Common types: :primary, :visual, :transcript, :specifications
# File 'app/concerns/models/embeddable.rb', line 59
def embeddable_content_types
  [:primary]
end
.embedding_partition_class ⇒ Class?
Returns the partition embedding class for this model.
Maps model names to their ContentEmbedding partition subclasses.
# File 'app/concerns/models/embeddable.rb', line 121
def embedding_partition_class
  partition_class_name = "ContentEmbedding::#{name}Embedding"
  partition_class_name.safe_constantize
end
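The lookup is purely a naming convention: a model named Article resolves to ContentEmbedding::ArticleEmbedding, or nil when no such constant is defined. Below is a plain-Ruby sketch of that mapping. The stand-in classes are hypothetical, and Object.const_get with a NameError rescue approximates ActiveSupport's safe_constantize so the example runs without Rails.

```ruby
# Hypothetical stand-ins for the app's partition classes (not the real models).
class ContentEmbedding
  class ArticleEmbedding; end
  class VideoEmbedding; end
end

# Resolve "ContentEmbedding::<Model>Embedding", returning nil when it is undefined.
# const_get plus a NameError rescue approximates ActiveSupport's safe_constantize.
def partition_class_for(model_name)
  Object.const_get("ContentEmbedding::#{model_name}Embedding")
rescue NameError
  nil
end

p partition_class_for('Article') # => ContentEmbedding::ArticleEmbedding
p partition_class_for('SiteMap') # => nil
```

A nil result is what sends .semantic_search down its fallback path to the shared ContentEmbedding table.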
.regenerate_all_embeddings(batch_size: 100, scope: nil) ⇒ Object
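Queue embedding regeneration for all records, or for scope when one is given. Records are walked in batches of batch_size, each one is enqueued as an EmbeddingWorker job, and the method returns the number of records queued.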
# File 'app/concerns/models/embeddable.rb', line 74
def regenerate_all_embeddings(batch_size: 100, scope: nil)
  records = scope || all
  count = 0
  records.find_each(batch_size: batch_size) do |record|
    EmbeddingWorker.perform_async(record.class.name, record.id)
    count += 1
  end
  Rails.logger.info "Queued #{count} #{name} records for embedding generation"
  count
end
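The method only enqueues work, so its control flow can be shown without Rails. This is a minimal sketch: FakeQueue is a hypothetical in-memory stand-in for EmbeddingWorker.perform_async, and each_slice approximates find_each's batching.

```ruby
# Hypothetical in-memory queue standing in for EmbeddingWorker.perform_async.
class FakeQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def perform_async(class_name, id)
    @jobs << [class_name, id]
  end
end

Record = Struct.new(:id)

# Walk records batch by batch, enqueue one job per record, and return how many
# were queued. each_slice approximates ActiveRecord's find_each batching.
def queue_embeddings(records, queue, model_name:, batch_size: 100)
  count = 0
  records.each_slice(batch_size) do |batch|
    batch.each do |record|
      queue.perform_async(model_name, record.id)
      count += 1
    end
  end
  count
end

queue = FakeQueue.new
records = (1..250).map { |i| Record.new(i) }
queued = queue_embeddings(records, queue, model_name: 'Article')
puts queued # => 250
```

The returned count reflects jobs queued, not embeddings generated; any API failures surface later, inside the workers.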
.semantic_search(query, limit: 10, **options) ⇒ Array<ApplicationRecord>
Semantic search within this model type.
Delegates to the appropriate partition embedding class for model-specific
embedding configuration (OpenAI for text, Gemini for images).
# File 'app/concerns/models/embeddable.rb', line 100
def semantic_search(query, limit: 10, **)
  partition_class = embedding_partition_class
  unless partition_class
    Rails.logger.warn "[#{name}] No embedding partition class found, falling back to ContentEmbedding"
    return ContentEmbedding.semantic_search(query, limit: limit, types: [name], **).map(&:embeddable)
  end

  partition_class.semantic_search(query, limit: limit, **).map(&:embeddable)
end
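The delegation reduces to one decision: search the model's partition class if it exists, otherwise search the shared table filtered to this model's type, then map hits back to their records. A sketch with hypothetical stand-ins (SharedIndex, ArticlePartition, Hit) in place of the real embedding classes:

```ruby
Hit = Struct.new(:embeddable)

# Hypothetical shared index: searched with a type filter, like the ContentEmbedding fallback.
module SharedIndex
  def self.semantic_search(query, limit:, types:)
    [Hit.new("#{types.first} result for #{query}")].first(limit)
  end
end

# Hypothetical per-model partition class.
module ArticlePartition
  def self.semantic_search(query, limit:)
    [Hit.new("partitioned result for #{query}")].first(limit)
  end
end

# Prefer the partition class; otherwise fall back to the shared index filtered by type.
def search(model_name, partition_class, query, limit: 10)
  hits = if partition_class
           partition_class.semantic_search(query, limit: limit)
         else
           SharedIndex.semantic_search(query, limit: limit, types: [model_name])
         end
  hits.map(&:embeddable)
end

p search('Article', ArticlePartition, 'heated floors') # => ["partitioned result for heated floors"]
p search('SiteMap', nil, 'heated floors')              # => ["SiteMap result for heated floors"]
```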
Instance Method Details
#content_embeddings ⇒ ActiveRecord::Relation<ContentEmbedding>
# File 'app/concerns/models/embeddable.rb', line 42
has_many :content_embeddings, as: :embeddable, dependent: :destroy
#content_for_embedding(_content_type = :primary) ⇒ String
Override in model to provide content for embedding.
This is the text that will be converted to a vector embedding.
# File 'app/concerns/models/embeddable.rb', line 139
def content_for_embedding(_content_type = :primary)
  raise NotImplementedError, "#{self.class} must implement #content_for_embedding"
end
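A model satisfies this contract by overriding the method and returning text per content type. The sketch below is plain Ruby: EmbeddableDefaults is a hypothetical stand-in for the concern's default, and the Article attributes and content types are illustrative, not the app's real model.

```ruby
# Stand-in for the concern's default: raise until a model overrides it.
module EmbeddableDefaults
  def content_for_embedding(_content_type = :primary)
    raise NotImplementedError, "#{self.class} must implement #content_for_embedding"
  end
end

# Hypothetical model; attribute names and content types are illustrative only.
class Article
  include EmbeddableDefaults

  def initialize(title:, body:, specs:)
    @title = title
    @body = body
    @specs = specs
  end

  def content_for_embedding(content_type = :primary)
    case content_type
    when :primary then [@title, @body].join("\n\n")
    when :specifications then @specs.map { |k, v| "#{k}: #{v}" }.join("\n")
    else ''
    end
  end
end

# A model that forgot to override the method.
class Untouched
  include EmbeddableDefaults
end

article = Article.new(title: 'Heated floors', body: 'Radiant heat under tile.', specs: { voltage: 120 })
puts article.content_for_embedding(:specifications) # => voltage: 120
```

Returning an empty string for unsupported types is a choice made here; generate_embedding! skips blank content rather than embedding it.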
#embeddable_locales ⇒ Array<String>
Override in model to specify all locales that should have embeddings.
Return an array if content exists in multiple languages.
By default, returns only the primary locale.
# File 'app/concerns/models/embeddable.rb', line 174
def embeddable_locales
  [locale_for_embedding]
end
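Overriding this method widens the set of embedding targets from one locale to several. A minimal sketch with a hypothetical bilingual model: each content type paired with each locale is one (content_type, locale) target. Note that #generate_all_embeddings! as listed on this page only covers the default locale, so a caller that wants every locale would iterate pairs like these itself.

```ruby
# Hypothetical model with content in two languages; names are illustrative.
class BilingualArticle
  def locale_for_embedding
    'en'
  end

  # The concern's default would be [locale_for_embedding]; this model adds French.
  def embeddable_locales
    [locale_for_embedding, 'fr']
  end
end

content_types = [:primary, :specifications]
article = BilingualArticle.new

# One embedding target per (content type, locale) pair.
targets = content_types.product(article.embeddable_locales)
p targets # => [[:primary, "en"], [:primary, "fr"], [:specifications, "en"], [:specifications, "fr"]]
```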
#embedding_content_hash(content_type = :primary, locale: nil) ⇒ String
Generate content hash for change detection.
When a model's content_for_embedding accepts a locale: keyword
(e.g. Post, where Liquid rendering varies per locale), the hash is
computed for that locale, so stale detection is also per-locale.
Models that do not declare a locale: keyword receive the same
hash regardless of locale, preserving their existing behaviour.
# File 'app/concerns/models/embeddable.rb', line 190
def embedding_content_hash(content_type = :primary, locale: nil)
  content = (content_type, locale || locale_for_embedding).to_s
  Digest::SHA256.hexdigest(content)[0..31]
end
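The per-locale behaviour described above depends on detecting whether a model's content_for_embedding declares a locale: keyword. The listing does not show how the concern's own helper does this, so the sketch below assumes a Method#parameters check; the two model classes and the content_for and content_hash helpers are hypothetical. The hash itself is the first 32 hex characters of a SHA-256 digest, as in the listing.

```ruby
require 'digest'

# Hypothetical model whose content varies by locale via a locale: keyword.
class LocaleAwareModel
  def content_for_embedding(_content_type = :primary, locale: 'en')
    locale == 'fr' ? 'Bonjour' : 'Hello'
  end
end

# Hypothetical model that ignores locale entirely.
class LocaleBlindModel
  def content_for_embedding(_content_type = :primary)
    'Same transcript'
  end
end

# Assumed dispatch: pass locale: only when the method declares that keyword.
def content_for(record, content_type, locale)
  params = record.method(:content_for_embedding).parameters
  if params.any? { |kind, name| %i[key keyreq].include?(kind) && name == :locale }
    record.content_for_embedding(content_type, locale: locale)
  else
    record.content_for_embedding(content_type)
  end
end

# First 32 hex characters of the SHA-256 digest of the content.
def content_hash(record, content_type = :primary, locale = 'en')
  Digest::SHA256.hexdigest(content_for(record, content_type, locale).to_s)[0..31]
end

aware = LocaleAwareModel.new
blind = LocaleBlindModel.new
p content_hash(aware, :primary, 'en') == content_hash(aware, :primary, 'fr') # => false
p content_hash(blind, :primary, 'en') == content_hash(blind, :primary, 'fr') # => true
```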
#embedding_stale?(content_type = :primary, locale: nil) ⇒ Boolean
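Check whether the embedding needs regeneration: true when no embedding exists for this content type and locale, or when its stored content hash no longer matches the current content.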
# File 'app/concerns/models/embeddable.rb', line 201
def embedding_stale?(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding
  # A missing embedding (nil) is always stale; otherwise compare stored and current hashes.
  find_content_embedding(content_type, locale: locale)&.content_hash !=
    embedding_content_hash(content_type, locale: locale)
end
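The staleness rule is small enough to check in isolation. In this sketch StoredEmbedding and hash_of are hypothetical stand-ins for a ContentEmbedding row and embedding_content_hash; the rule itself (missing, or hash mismatch) matches the listing.

```ruby
require 'digest'

StoredEmbedding = Struct.new(:content_hash)

# Same truncated digest format as embedding_content_hash.
def hash_of(content)
  Digest::SHA256.hexdigest(content)[0..31]
end

# Stale when nothing is stored yet, or when the stored hash no longer matches
# the hash of the current content.
def stale?(stored, current_content)
  stored.nil? || stored.content_hash != hash_of(current_content)
end

saved = StoredEmbedding.new(hash_of('Radiant heat under tile.'))
p stale?(nil, 'anything')                       # => true
p stale?(saved, 'Radiant heat under tile.')     # => false
p stale?(saved, 'Radiant heat under hardwood.') # => true
```

Because only a hash is compared, unchanged content never triggers a new API call unless force: true is passed.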
#embedding_type_name ⇒ String
Returns the type name to use for content_embeddings.
Uses the actual class name instead of the base class for STI models.
# File 'app/concerns/models/embeddable.rb', line 297
def embedding_type_name
  self.class.name
end
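The distinction matters because Rails' polymorphic associations record the STI base class name in the _type column by default, which is why the listings on this page set embeddable_type explicitly. A plain-Ruby sketch with a hypothetical Media hierarchy; base_class here is a simplified stand-in for ActiveRecord's method of the same name.

```ruby
# Hypothetical STI-style hierarchy: Video and Image share a base class.
class Media
  # Simplified stand-in for ActiveRecord's base_class: the topmost class below Object.
  def self.base_class
    superclass == Object ? self : superclass.base_class
  end

  def embedding_type_name
    self.class.name
  end
end

class Video < Media; end
class Image < Media; end

video = Video.new
p video.embedding_type_name   # => "Video"
p video.class.base_class.name # => "Media"
```

Storing "Video" rather than "Media" is what lets a search filter on types: ['Video'] without also matching images.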
#embedding_vector ⇒ Array<Float>?
Get the primary embedding vector.
# File 'app/concerns/models/embeddable.rb', line 399
def embedding_vector
  find_content_embedding(:primary)&.embedding
end
#find_content_embedding(content_type = :primary, locale: nil) ⇒ ContentEmbedding?
Find content embedding using the correct type name for STI models.
# File 'app/concerns/models/embeddable.rb', line 226
def find_content_embedding(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding
  ContentEmbedding.find_by(
    embeddable_type: embedding_type_name,
    embeddable_id: id,
    content_type: content_type.to_s,
    locale: locale.to_s
  )
end
#find_similar(limit: 5, same_type_only: true) ⇒ Array<ApplicationRecord>
Find content similar to this record.
# File 'app/concerns/models/embeddable.rb', line 390
def find_similar(limit: 5, same_type_only: true)
  ContentEmbedding.find_similar(self, limit: limit, same_type_only: same_type_only)
                  .map(&:embeddable)
end
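ContentEmbedding.find_similar is not shown on this page, so the distance metric it uses is unknown. The sketch below assumes cosine similarity, a common choice for text embeddings, and shows the two filters implied by the signature: exclude the record itself, and optionally restrict results to its own type. Candidate and the two-dimensional vectors are hypothetical.

```ruby
Candidate = Struct.new(:id, :type, :vector)

# Cosine similarity; returns 0.0 when either vector has zero length.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Rank other candidates by similarity to target (most similar first), skipping the
# target itself and, when same_type_only is true, candidates of a different type.
def find_similar(target, candidates, limit: 5, same_type_only: true)
  candidates
    .reject { |c| c.id == target.id }
    .select { |c| !same_type_only || c.type == target.type }
    .max_by(limit) { |c| cosine_similarity(target.vector, c.vector) }
end

items = [
  Candidate.new(1, 'Article', [1.0, 0.0]),
  Candidate.new(2, 'Article', [0.9, 0.1]),
  Candidate.new(3, 'Article', [0.0, 1.0]),
  Candidate.new(4, 'Video',   [1.0, 0.0])
]
p find_similar(items[0], items, limit: 2).map(&:id)                         # => [2, 3]
p find_similar(items[0], items, limit: 1, same_type_only: false).map(&:id) # => [4]
```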
#generate_all_embeddings!(force: false) ⇒ Array<ContentEmbedding>
Generate embeddings for every content type returned by embeddable_content_types, using the default locale (locale_for_embedding).
# File 'app/concerns/models/embeddable.rb', line 374
def generate_all_embeddings!(force: false)
  self.class.embeddable_content_types.filter_map do |content_type|
    generate_embedding!(content_type, force: force)
  end
end
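filter_map is what keeps the returned array free of nils: generate_embedding! returns nil when content is blank, when the embedding is not stale, or when the API call fails, and those entries are dropped. A self-contained illustration with a hypothetical generator that models only the blank-content case:

```ruby
# Hypothetical generator: returns nil for blank or missing content (one of the
# cases where generate_embedding! returns nil) and a label otherwise.
def fake_generate(content_type, contents)
  text = contents[content_type]
  return nil if text.nil? || text.strip.empty?

  "embedding(#{content_type})"
end

contents = { primary: 'Heated floors', visual: '', transcript: 'Step one...' }
results = %i[primary visual transcript].filter_map { |type| fake_generate(type, contents) }
p results # => ["embedding(primary)", "embedding(transcript)"]
```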
#generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil) ⇒ Array<ContentEmbedding>
Generate chunked embeddings for long content.
Splits content into overlapping chunks and creates an embedding for each.
Use this for documents that exceed the token limit (e.g., long articles, PDFs).
# File 'app/concerns/models/embeddable.rb', line 313
def generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil)
  locale ||= locale_for_embedding
  locale_str = locale.to_s
  return [] unless force || embedding_stale?(content_type, locale: locale)

  content = (content_type, locale_str)
  return [] if content.blank?

  chunker = Embedding::ContentChunker.new(content)

  # If no chunking is needed, use a regular embedding.
  # force is a keyword argument of generate_embedding!, so it must be passed as force: force.
  return [generate_embedding!(content_type, force: force, locale: locale)].compact unless chunker.needs_chunking?

  # Delete existing chunk embeddings for this content type and locale
  ContentEmbedding.where(
    embeddable_type: embedding_type_name,
    embeddable_id: id,
    locale: locale_str
  ).where('content_type LIKE ?', "#{content_type}_chunk_%").delete_all

  # Generate an embedding for each chunk
  embeddings = []
  chunker.chunks.each_with_index do |chunk, index|
    chunk_type = "#{content_type}_chunk_#{index}"
    begin
      result = RubyLLM.embed(chunk, model: DEFAULT_MODEL, provider: :openai, assume_model_exists: true)
      emb = ContentEmbedding.create!(
        embeddable_type: embedding_type_name,
        embeddable_id: id,
        content_type: chunk_type,
        locale: locale_str,
        embedding: result.vectors,
        content_hash: Digest::SHA256.hexdigest(chunk)[0..31],
        token_count: result.input_tokens,
        model: result.model
      )
      embeddings << emb
    rescue RubyLLM::Error => e
      Rails.logger.error "Chunk embedding failed for #{self.class}##{id} chunk #{index}: #{e.message}"
    end
  end

  Rails.logger.info "Generated #{embeddings.size} chunk embeddings for #{self.class}##{id} (locale: #{locale_str})"
  embeddings
end
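Embedding::ContentChunker's chunk size, overlap, and splitting rules are not shown on this page. The sketch below is a hypothetical character-based chunker (SimpleChunker, with assumed chunk_size and overlap parameters) that illustrates the overlapping-window idea and the "<content_type>_chunk_<index>" naming used by the listing.

```ruby
# Hypothetical stand-in for Embedding::ContentChunker; sizes are in characters
# and chosen for illustration only.
class SimpleChunker
  def initialize(content, chunk_size: 1_000, overlap: 200)
    raise ArgumentError, 'overlap must be smaller than chunk_size' unless overlap < chunk_size

    @content = content
    @chunk_size = chunk_size
    @overlap = overlap
  end

  def needs_chunking?
    @content.length > @chunk_size
  end

  # Fixed-size windows; each starts (chunk_size - overlap) characters after the previous one,
  # so consecutive chunks share `overlap` characters.
  def chunks
    return [@content] unless needs_chunking?

    step = @chunk_size - @overlap
    result = []
    start = 0
    loop do
      result << @content[start, @chunk_size]
      break if start + @chunk_size >= @content.length

      start += step
    end
    result
  end
end

chunker = SimpleChunker.new(('a'..'z').to_a.join, chunk_size: 10, overlap: 3)
p chunker.chunks # => ["abcdefghij", "hijklmnopq", "opqrstuvwx", "vwxyz"]
p chunker.chunks.each_index.map { |i| "primary_chunk_#{i}" }
```

The overlap keeps a sentence that straddles a boundary present in full in at least one chunk, at the cost of embedding some text twice.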
#generate_embedding!(content_type = :primary, force: false, locale: nil) ⇒ ContentEmbedding?
Generate or update embedding for this record.
# File 'app/concerns/models/embeddable.rb', line 249
def generate_embedding!(content_type = :primary, force: false, locale: nil)
  locale ||= locale_for_embedding
  locale_str = locale.to_s
  return unless force || embedding_stale?(content_type, locale: locale)

  content = (content_type, locale_str)
  return if content.blank?

  # Truncate content to fit within token limits
  truncated_content = content.to_s.truncate(MAX_CONTENT_LENGTH, omission: '...')

  begin
    result = RubyLLM.embed(truncated_content, model: DEFAULT_MODEL, provider: :openai, assume_model_exists: true)

    # Use the actual class name for STI models (Video, Image, Post) rather than the base class.
    # This allows proper filtering by type in semantic searches.
    lookup_attrs = {
      embeddable_type: embedding_type_name,
      embeddable_id: id,
      content_type: content_type.to_s,
      locale: locale_str
    }
    embedding_attrs = {
      embedding: result.vectors,
      # Hash the same locale that embedding_stale? compares against
      content_hash: embedding_content_hash(content_type, locale: locale),
      token_count: result.input_tokens,
      model: result.model
    }

    begin
      ContentEmbedding.find_or_initialize_by(lookup_attrs).tap do |emb|
        emb.assign_attributes(embedding_attrs)
        emb.save!
      end
    rescue ActiveRecord::RecordNotUnique
      # Two concurrent workers raced to insert: find the winner's record, update it, and return it.
      ContentEmbedding.find_by(lookup_attrs)&.tap { |emb| emb.update!(embedding_attrs) }
    end
  rescue RubyLLM::Error => e
    Rails.logger.error "Embedding generation failed for #{self.class}##{id} (#{content_type}): #{e.message}"
    nil
  end
end
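Truncation caps the input at MAX_CONTENT_LENGTH characters, omission marker included. A plain-Ruby approximation of ActiveSupport's String#truncate(limit, omission:) behaviour for this case (truncate_for_embedding is a hypothetical helper name):

```ruby
MAX_CONTENT_LENGTH = 30_000

# Approximates ActiveSupport's String#truncate with an omission marker:
# the result, marker included, never exceeds `limit` characters.
def truncate_for_embedding(text, limit = MAX_CONTENT_LENGTH, omission: '...')
  return text if text.length <= limit

  text[0, limit - omission.length] + omission
end

long = 'x' * 40_000
short = 'Radiant heat under tile.'
p truncate_for_embedding(long).length    # => 30000
p truncate_for_embedding(long)[-3..]     # => "..."
p truncate_for_embedding(short) == short # => true
```

Everything past the limit is simply dropped from the single embedding, which is the case #generate_chunked_embeddings! exists to handle for long documents.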
#has_embedding?(content_type = :primary, locale: nil) ⇒ Boolean
Check if record has an embedding.
# File 'app/concerns/models/embeddable.rb', line 215
def has_embedding?(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding
  find_content_embedding(content_type, locale: locale).present?
end
#locale_for_embedding ⇒ String
Override in model to specify the locale for embedding content.
This determines which locale's content is embedded and enables
locale-filtered searches.
# File 'app/concerns/models/embeddable.rb', line 159
def locale_for_embedding
  'en' # Default to English
end
#needs_chunking?(content_type = :primary) ⇒ Boolean
Check if content needs chunking.
# File 'app/concerns/models/embeddable.rb', line 364
def needs_chunking?(content_type = :primary)
  content = (content_type, locale_for_embedding).to_s
  Embedding::ContentChunker.new(content).needs_chunking?
end