Module: Models::Embeddable
- Extended by:
- ActiveSupport::Concern
- Included in:
- Activity, Article, AssistantBrainEntry, CallRecord, Communication, Image, Item, ProductLine, ReviewsIo, Showcase, SiteMap, Video
- Defined in:
- app/concerns/models/embeddable.rb
Overview
Concern for models that support vector embeddings for semantic search.
Include this in any model that should be searchable via AI-powered
semantic search.
Constant Summary
- MAX_CONTENT_LENGTH = 30_000
Maximum content length for embedding (roughly 30k chars, within the
Gemini text window).
Has many
Class Method Summary
-
.embeddable_content_types ⇒ Array<Symbol>
Override in model to define what content types are embeddable.
-
.embedding_partition_class ⇒ Class?
Returns the partition embedding class for this model.
-
.regenerate_all_embeddings(batch_size: 100, scope: nil) ⇒ Integer
Batch regenerate embeddings for all records by enqueueing
EmbeddingWorker for each record in scope.
-
.semantic_search(query, limit: 10) ⇒ Array<ApplicationRecord>
Semantic search within this model type, over the unified Gemini space.
Instance Method Summary
-
#content_for_embedding(_content_type = :primary) ⇒ String
Override in model to provide content for embedding.
-
#embeddable_locales ⇒ Array<String>
Override in model to specify all locales that should have embeddings.
-
#embedding_content_hash(content_type = :primary, locale: nil) ⇒ String
Generate content hash for change detection.
-
#embedding_stale?(content_type = :primary, locale: nil) ⇒ Boolean
Check whether the embedding for content_type/locale needs
regeneration.
-
#embedding_type_name ⇒ String
Returns the type name to use for content_embeddings.
-
#embedding_vector ⇒ Array<Float>?
Returns the primary embedding vector for this record.
-
#find_content_embedding(content_type = :primary, locale: nil) ⇒ ContentEmbedding?
Find a content embedding using the correct type name for STI models.
-
#find_similar(limit: 5, same_type_only: true) ⇒ Array<ApplicationRecord>
Find content similar to this record via the shared ContentEmbedding
similarity index.
-
#generate_all_embeddings!(force: false) ⇒ Array<ContentEmbedding>
Generate embeddings for all content types declared by
embeddable_content_types.
-
#generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil) ⇒ Array<ContentEmbedding>
Generate chunked embeddings for long content.
-
#generate_embedding!(content_type = :primary, force: false, locale: nil) ⇒ ContentEmbedding?
Generate or update an embedding for this record.
-
#has_embedding?(content_type = :primary, locale: nil) ⇒ Boolean
Check whether this record has an embedding for the given content type and locale.
-
#locale_for_embedding ⇒ String
Override in model to specify the locale for embedding content.
-
#needs_chunking?(content_type = :primary) ⇒ Boolean
Check whether content for content_type would need chunking before
embedding (i.e. exceeds the token limit).
Class Method Details
.embeddable_content_types ⇒ Array<Symbol>
Override in model to define what content types are embeddable.
Common types: :primary, :visual, :transcript, :specifications.
# File 'app/concerns/models/embeddable.rb', line 54
def embeddable_content_types
  [:primary]
end
.embedding_partition_class ⇒ Class?
Returns the partition embedding class for this model. Maps model
names to their ContentEmbedding partition subclasses by convention.
# File 'app/concerns/models/embeddable.rb', line 105
def embedding_partition_class
  partition_class_name = "ContentEmbedding::#{name}Embedding"
  partition_class_name.safe_constantize
end
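The naming convention above can be tried outside Rails. The following is a minimal sketch, not the concern's own code: `safe_constantize` here is a plain-Ruby stand-in for the ActiveSupport string method, and `partition_class_for` and the `ArticleEmbedding` class are hypothetical names introduced only for the demonstration.

```ruby
# Plain-Ruby stand-in for ActiveSupport's String#safe_constantize:
# resolve a constant path and return nil instead of raising when it is missing.
def safe_constantize(path)
  Object.const_get(path)
rescue NameError
  nil
end

# Hypothetical partition class, defined here only so the lookup has a target.
class ContentEmbedding
  class ArticleEmbedding < ContentEmbedding; end
end

# Build the conventional partition class name from a model name and resolve it.
def partition_class_for(model_name)
  safe_constantize("ContentEmbedding::#{model_name}Embedding")
end

partition_class_for("Article") # => ContentEmbedding::ArticleEmbedding
partition_class_for("Video")   # => nil (no such class in this sketch)
```

A nil result is why the return type is documented as Class?: callers should be prepared for models that have no partition subclass.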
.regenerate_all_embeddings(batch_size: 100, scope: nil) ⇒ Integer
Batch regenerate embeddings for all records by enqueueing
EmbeddingWorker for each record in scope.
# File 'app/concerns/models/embeddable.rb', line 69
def regenerate_all_embeddings(batch_size: 100, scope: nil)
  records = scope || all
  count = 0

  records.find_each(batch_size: batch_size) do |record|
    EmbeddingWorker.perform_async(record.class.name, record.id)
    count += 1
  end

  Rails.logger.info "Queued #{count} #{name} records for embedding generation"
  count
end
.semantic_search(query, limit: 10) ⇒ Array<ApplicationRecord>
Semantic search within this model type, over the unified Gemini space.
# File 'app/concerns/models/embeddable.rb', line 92
def semantic_search(query, limit: 10, **)
  ContentEmbedding.unified_hybrid_search(query, limit: limit, types: [name], **)
                  .map(&:embeddable)
end
Instance Method Details
#content_embeddings ⇒ ActiveRecord::Relation<ContentEmbedding>
# File 'app/concerns/models/embeddable.rb', line 39
has_many :content_embeddings, as: :embeddable, dependent: :destroy
#content_for_embedding(_content_type = :primary) ⇒ String
Override in model to provide content for embedding. The returned
string is what gets converted to a vector embedding.
# File 'app/concerns/models/embeddable.rb', line 120
def content_for_embedding(_content_type = :primary)
  raise NotImplementedError, "#{self.class} must implement #content_for_embedding"
end
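A model is expected to replace this default with its own content. The sketch below illustrates the override pattern without Rails: `EmbeddableSketch` is a stand-in that copies only the default shown above, and `Lecture` with its `title`, `body` and `transcript` attributes is a hypothetical model, not one of the classes listed under "Included in".

```ruby
# Stand-in for the concern's default, so the sketch runs without Rails.
module EmbeddableSketch
  def content_for_embedding(_content_type = :primary)
    raise NotImplementedError, "#{self.class} must implement #content_for_embedding"
  end
end

# Hypothetical model with assumed attributes title, body and transcript.
Lecture = Struct.new(:title, :body, :transcript) do
  include EmbeddableSketch

  # Return a different string per content type; :primary is the fallback.
  def content_for_embedding(content_type = :primary)
    case content_type
    when :transcript then transcript.to_s
    else [title, body].compact.join("\n\n")
    end
  end
end

lecture = Lecture.new("Intro to Vectors", "Embeddings map text to points.", "Welcome, everyone.")
lecture.content_for_embedding              # => "Intro to Vectors\n\nEmbeddings map text to points."
lecture.content_for_embedding(:transcript) # => "Welcome, everyone."
```

A model that serves more than :primary would presumably also override embeddable_content_types so that generate_all_embeddings! covers every type.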
#embeddable_locales ⇒ Array<String>
Override in model to specify all locales that should have embeddings.
Return an array if content exists in multiple languages. Defaults to
only the primary locale.
# File 'app/concerns/models/embeddable.rb', line 150
def embeddable_locales
  [locale_for_embedding]
end
#embedding_content_hash(content_type = :primary, locale: nil) ⇒ String
Generate content hash for change detection.
When a model's content_for_embedding accepts a locale: keyword
(e.g. Post, where Liquid rendering varies per locale) the hash is
computed for that locale so stale-detection is also per-locale.
Models that do not declare a locale: keyword receive the same hash
regardless of locale, preserving their existing behaviour.
# File 'app/concerns/models/embeddable.rb', line 166
def embedding_content_hash(content_type = :primary, locale: nil)
  content = (content_type, locale || locale_for_embedding).to_s
  Digest::SHA256.hexdigest(content)[0..31]
end
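The hashing step itself is easy to reproduce in isolation. This is a minimal sketch of the truncated SHA-256 digest used for change detection; `content_hash` is a hypothetical helper name, and the per-locale content lookup performed by the concern is left out.

```ruby
require "digest"

# First 32 hex characters (128 bits) of the SHA-256 digest of the content.
def content_hash(content)
  Digest::SHA256.hexdigest(content.to_s)[0..31]
end

original = content_hash("Blue widget, 10 cm")
edited   = content_hash("Blue widget, 12 cm")

original.length                             # => 32
original == content_hash("Blue widget, 10 cm") # => true, same content gives the same hash
original == edited                          # => false, any edit changes the hash
```

Storing the short digest next to the vector lets the staleness check compare two strings instead of re-embedding the content.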
#embedding_stale?(content_type = :primary, locale: nil) ⇒ Boolean
Check whether the embedding for content_type/locale needs
regeneration.
# File 'app/concerns/models/embeddable.rb', line 178
def embedding_stale?(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding
  = find_content_embedding(content_type, locale: locale)
  return true unless
  # A row with no Gemini vector yet (e.g. an old OpenAI row after the column
  # drop) needs (re)embedding even if the content hash is unchanged.
  return true if ..blank?

  .content_hash != embedding_content_hash(content_type, locale: locale)
end
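The three staleness conditions (no row, a row without a vector, a changed content hash) can be sketched as a standalone function. This is an illustration under assumptions, not the concern's code: `StoredEmbedding`, `digest_for` and `stale?` are hypothetical names, and a plain Struct stands in for the ContentEmbedding row.

```ruby
require "digest"

# Hypothetical stand-in for a stored embedding row.
StoredEmbedding = Struct.new(:vector, :content_hash)

# Same truncated SHA-256 digest as in the hashing step.
def digest_for(content)
  Digest::SHA256.hexdigest(content.to_s)[0..31]
end

# An embedding is stale when it is missing, has no vector, or was
# computed from content that has since changed.
def stale?(stored, current_content)
  return true if stored.nil?
  return true if stored.vector.nil? || stored.vector.empty?

  stored.content_hash != digest_for(current_content)
end

fresh = StoredEmbedding.new([0.1, 0.2], digest_for("hello"))
stale?(nil, "hello")                                            # => true, no row yet
stale?(StoredEmbedding.new(nil, digest_for("hello")), "hello")  # => true, row without a vector
stale?(fresh, "hello")                                          # => false
stale?(fresh, "hello, world")                                   # => true, content changed
```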
#embedding_type_name ⇒ String
Returns the type name to use for content_embeddings. Uses the actual
class name instead of the base class for STI models, so semantic
searches can filter by type.
# File 'app/concerns/models/embeddable.rb', line 284
def embedding_type_name
  self.class.name
end
#embedding_vector ⇒ Array<Float>?
Returns the primary embedding vector for this record.
# File 'app/concerns/models/embeddable.rb', line 396
def embedding_vector
  find_content_embedding(:primary)&.
end
#find_content_embedding(content_type = :primary, locale: nil) ⇒ ContentEmbedding?
Find a content embedding using the correct type name for STI models.
The logical content_type is mapped to its canonical Gemini "unified"
row (see #unified_content_type), since that is where vectors now live.
# File 'app/concerns/models/embeddable.rb', line 209
def find_content_embedding(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding

  ContentEmbedding.find_by(
    embeddable_type: embedding_type_name,
    embeddable_id: id,
    content_type: unified_content_type(content_type),
    locale: locale.to_s
  )
end
#find_similar(limit: 5, same_type_only: true) ⇒ Array<ApplicationRecord>
Find content similar to this record via the shared ContentEmbedding
similarity index.
# File 'app/concerns/models/embeddable.rb', line 387
def find_similar(limit: 5, same_type_only: true)
  ContentEmbedding.find_similar(self, limit: limit, same_type_only: same_type_only)
                  .map(&:embeddable)
end
#generate_all_embeddings!(force: false) ⇒ Array<ContentEmbedding>
Generate embeddings for all content types declared by
embeddable_content_types.
# File 'app/concerns/models/embeddable.rb', line 370
def generate_all_embeddings!(force: false)
  self.class.embeddable_content_types.filter_map do |content_type|
    generate_embedding!(content_type, force: force)
  end
end
#generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil) ⇒ Array<ContentEmbedding>
Generate chunked embeddings for long content. Splits content into
overlapping chunks and creates an embedding for each. Use this for
documents that exceed the token limit (e.g. long articles, PDFs).
# File 'app/concerns/models/embeddable.rb', line 300
def generate_chunked_embeddings!(content_type = :primary, force: false, locale: nil)
  locale ||= locale_for_embedding
  locale_str = locale.to_s
  return [] unless force || embedding_stale?(content_type, locale: locale)

  content = (content_type, locale_str)
  return [] if content.blank?

  chunker = Embedding::ContentChunker.new(content)

  # If no chunking needed, use regular embedding
  return [(content_type, force, locale: locale)].compact unless chunker.needs_chunking?

  # Delete existing chunk embeddings for this content type and locale
  chunk_prefix = unified_content_type(content_type)
  ContentEmbedding.where(
    embeddable_type: embedding_type_name,
    embeddable_id: id,
    locale: locale_str
  ).where('content_type LIKE ?', "#{chunk_prefix}_chunk_%").delete_all

  # Generate embedding for each chunk (batched Gemini call).
  chunks = chunker.chunks
  vectors = begin
    Embedding::Gemini.(chunks, dimensions: ContentEmbedding::UNIFIED_DIMENSIONS)
  rescue Embedding::Gemini::Error => e
    Rails.logger.error "Chunk embedding failed for #{self.class}##{id}: #{e.}"
    []
  end

  = []
  chunks.each_with_index do |chunk, index|
    vector = vectors[index]
    next if vector.blank?

    emb = ContentEmbedding.create!(
      embeddable_type: embedding_type_name,
      embeddable_id: id,
      content_type: "#{chunk_prefix}_chunk_#{index}",
      locale: locale_str,
      unified_embedding: vector,
      embedding_model: ContentEmbedding::UNIFIED_MODEL,
      embedding_dimensions: ContentEmbedding::UNIFIED_DIMENSIONS,
      content_hash: Digest::SHA256.hexdigest(chunk)[0..31]
    )
    << emb
  end

  Rails.logger.info "Generated #{.size} chunk embeddings for #{self.class}##{id} (locale: #{locale_str})"
end
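The listing does not show how Embedding::ContentChunker splits text, so the following is only a sketch of the general idea of overlapping chunks, measured in characters. `overlapping_chunks` and its `size:` and `overlap:` parameters are hypothetical; the real chunker may count tokens and split on sentence or paragraph boundaries instead.

```ruby
# Split text into fixed-size windows where each window repeats the last
# `overlap` characters of the previous one.
def overlapping_chunks(text, size:, overlap:)
  raise ArgumentError, "overlap must be smaller than size" unless overlap < size
  return [text] if text.length <= size

  step = size - overlap
  chunks = []
  start = 0
  while start < text.length
    chunks << text[start, size]
    break if start + size >= text.length # the last window reached the end

    start += step
  end
  chunks
end

overlapping_chunks("abcdefghij", size: 4, overlap: 1) # => ["abcd", "defg", "ghij"]
overlapping_chunks("abc", size: 4, overlap: 1)        # => ["abc"]
```

The overlap means that a sentence cut at a chunk boundary still appears whole, or nearly so, in one of the two neighbouring chunks, which tends to help retrieval for queries that match text near a boundary.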
#generate_embedding!(content_type = :primary, force: false, locale: nil) ⇒ ContentEmbedding?
Generate or update an embedding for this record.
# File 'app/concerns/models/embeddable.rb', line 233
def generate_embedding!(content_type = :primary, force: false, locale: nil)
  locale ||= locale_for_embedding
  locale_str = locale.to_s
  return unless force || embedding_stale?(content_type, locale: locale)

  content = (content_type, locale_str)
  return if content.blank?

  # Truncate content to fit within the Gemini text window (~8k tokens).
  truncated_content = content.to_s.truncate(MAX_CONTENT_LENGTH, omission: '...')

  begin
    vector = Embedding::Gemini.(truncated_content, dimensions: ContentEmbedding::UNIFIED_DIMENSIONS)
    return if vector.blank?

    # Use the actual class name for STI models (Video, Image, Post) rather than base class
    # This allows proper filtering by type in semantic searches. The row lives in
    # the canonical Gemini "unified" space (cross-modal with images).
    = {
      embeddable_type: embedding_type_name,
      embeddable_id: id,
      content_type: unified_content_type(content_type),
      locale: locale_str
    }
    = {
      unified_embedding: vector,
      embedding_model: ContentEmbedding::UNIFIED_MODEL,
      embedding_dimensions: ContentEmbedding::UNIFIED_DIMENSIONS,
      content_hash: embedding_content_hash(content_type, locale: locale)
    }

    begin
      ContentEmbedding.find_or_initialize_by().tap do |emb|
        emb.assign_attributes()
        emb.save!
      end
    rescue ActiveRecord::RecordNotUnique
      # Two concurrent workers raced to insert — find the winner's record and update it.
      ContentEmbedding.find_by()&.update!()
    end
  rescue Embedding::Gemini::Error => e
    Rails.logger.error "Embedding generation failed for #{self.class}##{id} (#{content_type}): #{e.}"
    nil
  end
end
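The truncation step relies on ActiveSupport's String#truncate, where the omission string counts toward the length limit. A plain-Ruby sketch of the same behaviour, assuming a hypothetical helper name `truncate_for_embedding`:

```ruby
MAX_CONTENT_LENGTH = 30_000

# Cut content to at most `max` characters, with the omission marker
# included in that budget; shorter content is returned unchanged.
def truncate_for_embedding(content, max: MAX_CONTENT_LENGTH, omission: "...")
  text = content.to_s
  return text if text.length <= max

  text[0, max - omission.length] + omission
end

long = truncate_for_embedding("a" * 40_000)
long.length                          # => 30000
long.end_with?("...")                # => true
truncate_for_embedding("short text") # => "short text"
```

Content beyond the limit is simply dropped here, which is why generate_chunked_embeddings! is the better fit for long documents.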
#has_embedding?(content_type = :primary, locale: nil) ⇒ Boolean
Check whether this record has an embedding for the given content type
and locale.
# File 'app/concerns/models/embeddable.rb', line 196
def has_embedding?(content_type = :primary, locale: nil)
  locale ||= locale_for_embedding
  find_content_embedding(content_type, locale: locale).present?
end
#locale_for_embedding ⇒ String
Override in model to specify the locale for embedding content. This
determines which locale's content is embedded and enables
locale-filtered searches.
# File 'app/concerns/models/embeddable.rb', line 137
def locale_for_embedding
  'en' # Default to English
end
#needs_chunking?(content_type = :primary) ⇒ Boolean
Check whether content for content_type would need chunking before
embedding (i.e. exceeds the token limit).
# File 'app/concerns/models/embeddable.rb', line 358
def needs_chunking?(content_type = :primary)
  content = (content_type, locale_for_embedding).to_s
  Embedding::ContentChunker.new(content).needs_chunking?
end