
AI Agent Architecture

Status: Production Ready
Last Updated: February 2026

The AI Agent (Sunny) is an internal assistant for WarmlyYours employees, accessible at /assistant in the CRM. It answers business questions using real-time database queries, content search, blog management, and multi-model LLM reasoning. The system is built on RubyLLM’s acts_as_chat + Agent patterns, with role-based data access, a persistent self-learning brain, tool-loop guardrails, audit logging, and streaming Turbo Stream responses.

Key capabilities:

  • Natural-language SQL: Users ask questions; the LLM writes, executes, and interprets read-only SQL
  • Content search: Semantic search across products, FAQs, blog posts, videos, showcases, and reviews
  • Blog management: Create, update, and enrich blog posts with linked assets (images, videos, FAQs, product cards), proper tagging, and SEO metadata
  • Multi-model support: Supports Claude, GPT-4.1, and Gemini models; auto mode selects a model based on query complexity
  • Extended thinking: Activates reasoning scratchpad for analytical queries (Anthropic/Gemini)
  • Data domain access control: CanCanCan roles map to data domains; each database table declares which domains it belongs to
  • Streaming UI: Real-time token-by-token streaming with live preview, tool status, and formatted output
  • Conversation sharing: Owner can share conversations with colleagues at viewer or collaborator level
  • Context compaction: Sliding-window summary for long conversations to stay within token limits
  • Sunny Brain: Persistent, editable knowledge base of learned rules injected into every system prompt; self-updates via semantic embeddings

Browser (CRM)                    Rails Server                           External

┌───────────────────┐    POST    ┌──────────────────────────────────┐
│ Stimulus          │───/ask────►│ Crm::AssistantChatController     │
│ assistant_chat    │            │  ├─ build_user_context()         │
│ controller        │            │  ├─ sanitize_tool_services()     │
│                   │            │  └─ AssistantChatWorker          │
│ MutationObserver  │            │       .perform_async()           │
│ auto-scroll       │            └────────────────┬─────────────────┘
│ completion detect │                             │ Sidekiq
└─────────▲─────────┘                             ▼
          │                      ┌──────────────────────────────────┐
          │ Turbo Streams        │ AssistantChatWorker              │
          │ (ActionCable)        │  ├─ ChatService.new(...)         │
          │                      │  │   ├─ auto_select_model()      │
          │                      │  │   ├─ configure_conversation() │
          │                      │  │   │   ├─ with_model()         │
          │                      │  │   │   ├─ with_instructions()  │
          │                      │  │   │   ├─ with_tools()         │   ┌───────────┐
          │                      │  │   │   ├─ with_thinking()      │──►│ Anthropic │
          │                      │  │   │   └─ on_tool_call()       │   │ OpenAI    │
          │                      │  │   └─ conversation.ask()       │   │ Gemini    │
          │                      │  │       ├─ LLM streaming        │   └───────────┘
          │                      │  │       ├─ tool execution ──────┤
          │                      │  │       └─ auto-persist msgs    │
          │                      │  ├─ broadcast_chunk()            │
          └──────────────────────│  ├─ broadcast_complete()         │
                                 │  │   └─ ResponseFormatter        │
                                 │  └─ finalize_response()          │
                                 └──────────────────────────────────┘

Tool Execution Layer
┌──────────────────────────────────────────────┐
│ Assistant::ChatToolBuilder                   │
│  ├─ Content tools (semantic_search, FAQs)    │──► pgvector / OpenAI Embeddings
│  └─ PostgreSQL tools                         │
│      ├─ describe_available_data              │──► CommentManifest (db/comments/*.yml)
│      ├─ execute_sql ──► SqlBroker            │──► PostgreSQL (read-only)
│      ├─ list_schemas                         │
│      ├─ list_objects                         │
│      ├─ get_object_details                   │
│      └─ explain_query                        │
└──────────────────────────────────────────────┘

Security Layer
┌──────────────────────────────────────────────────┐
│ Assistant::DataDomainPolicy ◄── data_domains.yml │
│  ├─ role → domain mapping                        │
│  └─ CommentManifest.objects_for_domains()        │
│                                                  │
│ Assistant::DataPolicy                            │
│  ├─ ALWAYS_BLOCKED_OBJECTS                       │
│  ├─ object_allowed?()                            │
│  └─ sensitive_columns_for_query()                │
│                                                  │
│ Assistant::ToolLoopGuard                         │
│  ├─ MAX_IDENTICAL_CALLS = 2                      │
│  └─ MAX_CONSECUTIVE_SQL_FAILURES = 3             │
│                                                  │
│ AssistantSqlAuditLog (every SQL execution)       │
└──────────────────────────────────────────────────┘
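
The two ToolLoopGuard limits named in the diagram could be enforced along the following lines. This is a minimal sketch, not the actual Assistant::ToolLoopGuard: the class name, method names, and the reading of the limits (two identical calls are allowed and the third is blocked; three SQL failures in a row trip the guard) are assumptions.

```ruby
# Hypothetical sketch of a tool-loop guard; not the real Assistant::ToolLoopGuard.
class ToolLoopGuardSketch
  MAX_IDENTICAL_CALLS = 2          # value taken from the diagram
  MAX_CONSECUTIVE_SQL_FAILURES = 3 # value taken from the diagram

  def initialize
    @call_counts = Hash.new(0) # [tool_name, arguments] => times seen this turn
    @sql_failures = 0          # current streak of failed execute_sql calls
  end

  # Returns true while the same tool + arguments pair is within the limit.
  def allow_call?(tool_name, arguments)
    key = [tool_name, arguments]
    @call_counts[key] += 1
    @call_counts[key] <= MAX_IDENTICAL_CALLS
  end

  # Record the outcome of an execute_sql call; a success resets the streak.
  def record_sql_result(success)
    @sql_failures = success ? 0 : @sql_failures + 1
  end

  def sql_failure_limit_reached?
    @sql_failures >= MAX_CONSECUTIVE_SQL_FAILURES
  end
end
```

A caller would check allow_call? before each tool execution and stop the turn (or return an explanatory tool result) once either limit is hit.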

1. Agent — Assistant::SunnyAgent

File: app/agents/assistant/sunny_agent.rb

RubyLLM::Agent subclass that serves as the canonical factory and configuration baseline for Sunny conversations. Follows the RubyLLM Agent pattern: chat_model wires create!/find to AssistantConversation, and instructions { nil } disables automatic discovery so ChatService can apply per-request configuration.

# Create a new conversation
conversation = Assistant::SunnyAgent.create!(user: current_user, messages: [])
# Load an existing conversation
conversation = Assistant::SunnyAgent.find(conversation_id)

All dynamic configuration (model, temperature, system prompt, tools, thinking, Anthropic prompt caching) is applied per-request by ChatService#configure_conversation. The agent is intentionally lean — it’s a factory, not an orchestrator.

2. Controller — Crm::AssistantChatController


File: app/controllers/crm/assistant_chat_controller.rb

The HTTP entry point. Handles conversation CRUD and delegates LLM processing to a Sidekiq worker.

| Action | Method | Description |
| --- | --- | --- |
| index | GET | Load most recent conversation; list sidebar conversations |
| show | GET | Load a specific conversation (owned or shared) |
| create | POST | Create a new conversation via SunnyAgent.create! |
| ask | POST | Accept user message, enqueue AssistantChatWorker |
| cancel | POST | Set Redis cancellation flag; worker checks between chunks |
| destroy | DELETE | Delete a conversation |

Key responsibilities:

  • User context assembly: build_user_context() serializes identity (name, party_id, department, job_title), role flags (is_admin, is_manager), and resolved analytics_domains into a Hash passed to the worker.
  • Tool service selection: available_chat_services() limits non-admins to content + postgres_production. Admins also get postgres_versions (audit trail DB).
  • Processing lock: Checks conversation.processing? before enqueuing; blocks if a job is already running.
  • Access control: Viewers (shared conversations) cannot submit new messages.
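
As a rough illustration of the tool service selection described above, the sketch below intersects the requested services with what the user's role allows. The service keys come from the text; the method bodies, the constant names, and the shape of the user context Hash are assumptions, and other service keys (such as blog_management) are left out for brevity.

```ruby
# Service keys are from the text; everything else here is illustrative.
BASE_SERVICES  = %w[content postgres_production].freeze
ADMIN_SERVICES = %w[postgres_versions].freeze

# Services a user may activate, based on the is_admin flag in the user context.
def available_chat_services(user_context)
  user_context[:is_admin] ? BASE_SERVICES + ADMIN_SERVICES : BASE_SERVICES
end

# Drop any requested service key the user is not entitled to.
def sanitize_tool_services(requested, user_context)
  Array(requested).map(&:to_s) & available_chat_services(user_context)
end
```

Filtering on the server means a tampered form parameter cannot widen a non-admin's tool set.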

3. Worker — AssistantChatWorker

File: app/workers/assistant_chat_worker.rb

Sidekiq job that runs the full LLM interaction loop in the background.

Lifecycle:

  1. Acquire processing lock (acquire_processing_lock!)
  2. Load conversation, instantiate Assistant::ChatService
  3. Call service.call { |chunk| broadcast_chunk(chunk) }
  4. LLM streams response tokens → worker broadcasts live preview via Turbo::StreamsChannel
  5. On tool calls: LLM pauses, tool executes, result fed back, LLM continues
  6. On completion: ResponseFormatter renders final markdown → HTML; broadcast replaces preview
  7. finalize_response(): track metrics, sync token totals, dual-write legacy JSONB
  8. Release processing lock

Streaming: The worker throttles preview updates to ~8fps (0.12s interval) for smooth UX.

Cancellation: cancel action sets cancelled:{jid} in Redis (TTL 300s). Worker checks cancelled? between chunks and stops streaming.
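
The throttling and cancellation checks can be pictured as one loop over the streamed chunks. The sketch below is an assumption-laden stand-in: PreviewThrottle, stream_preview, the injected clock, and the cancelled callable (which replaces the Redis lookup) are all hypothetical names; only the 0.12s interval comes from the text.

```ruby
# Sketch only: class and method names are hypothetical.
class PreviewThrottle
  INTERVAL = 0.12 # seconds between preview broadcasts (about 8 per second)

  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @last_sent_at = nil
  end

  # True when enough time has passed since the last broadcast.
  def ready?
    now = @clock.call
    return false if @last_sent_at && (now - @last_sent_at) < INTERVAL

    @last_sent_at = now
    true
  end
end

# Accumulate chunks, broadcast at most once per interval, stop when cancelled.
def stream_preview(chunks, throttle:, cancelled:, broadcast:)
  buffer = +""
  chunks.each do |chunk|
    break if cancelled.call # stand-in for checking the cancelled:{jid} key

    buffer << chunk
    broadcast.call(buffer.dup) if throttle.ready?
  end
  buffer
end
```

Injecting the clock keeps the throttle deterministic in tests; the final, complete text is returned so the caller can broadcast it once regardless of throttling.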

Error handling: broadcast_error maps known error patterns (rate limits, timeouts, invalid model IDs, JSON parse failures) to user-friendly messages.
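
One plausible shape for such a mapping is an ordered table of patterns and messages. The patterns, the message wording, and the method name below are illustrative guesses, not the worker's real table.

```ruby
# Hypothetical pattern table; the real broadcast_error mappings may differ.
ERROR_MESSAGES = {
  /rate limit|429/i                 => "The AI provider is rate-limiting requests. Please retry shortly.",
  /timeout|timed out/i              => "The request took too long. Try a narrower question.",
  /invalid model|model.*not found/i => "The selected model is unavailable. Choose another model.",
  /unexpected token|parse error/i   => "The model returned malformed data. Please try again."
}.freeze
DEFAULT_ERROR_MESSAGE = "Something went wrong while processing your request."

# Return the first matching user-facing message, or a generic fallback.
def friendly_error_message(error)
  text = "#{error.class}: #{error.message}"
  _, message = ERROR_MESSAGES.find { |pattern, _| pattern.match?(text) }
  message || DEFAULT_ERROR_MESSAGE
end
```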

4. Chat Service — Assistant::ChatService


File: app/services/assistant/chat_service.rb

Orchestrates a single chat turn: model selection, system prompt assembly, tool registration, thinking activation, and streaming execution.

MODELS = {
  'claude-haiku'  => { id: LlmDefaults::DEFAULT_HAIKU_MODEL,  provider: :anthropic, cost: :low,    supports_thinking: false },
  'claude-sonnet' => { id: LlmDefaults::DEFAULT_SONNET_MODEL, provider: :anthropic, cost: :medium, supports_thinking: true, thinking_effort_default: :medium },
  'claude-opus'   => { id: LlmDefaults::DEFAULT_OPUS_MODEL,   provider: :anthropic, cost: :high,   supports_thinking: true, thinking_effort_default: :high },
  'gpt-4.1'       => { id: 'gpt-4.1',          provider: :openai, cost: :medium, supports_thinking: false },
  'gpt-4.1-mini'  => { id: 'gpt-4.1-mini',     provider: :openai, cost: :low,    supports_thinking: false },
  'gemini-flash'  => { id: 'gemini-2.5-flash', provider: :gemini, cost: :low,    supports_thinking: true, thinking_effort_default: :low },
  'gemini-pro'    => { id: 'gemini-2.5-pro',   provider: :gemini, cost: :medium, supports_thinking: true, thinking_effort_default: :medium },
}

Auto-selection (model: 'auto') uses regex pattern matching + token estimation:

  • Simple lookups (< 50 tokens, “show me”, “list”, “count”) → claude-haiku
  • Complex/comparison queries (“trend”, “correlate”, “why”, > 150 tokens) → claude-sonnet
  • Default → claude-sonnet
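
The selection rules above can be sketched as follows. The exact regexes, the four-characters-per-token estimate, and the precedence (complex signals win over simple ones) are assumptions made for illustration; only the thresholds, keywords, and model keys come from the list.

```ruby
# Illustrative patterns; the real THINKING/selection regexes are not shown in this document.
SIMPLE_PATTERN  = /\b(show me|list|count)\b/i
COMPLEX_PATTERN = /\b(trend|correlate|why)\b/i

# Rough token estimate: about four characters per token (an assumption).
def estimate_tokens(text)
  (text.length / 4.0).ceil
end

def auto_select_model(query)
  tokens = estimate_tokens(query)
  # Assumed precedence: any complex signal wins over a simple one.
  return "claude-sonnet" if tokens > 150 || query.match?(COMPLEX_PATTERN)
  return "claude-haiku"  if tokens < 50 && query.match?(SIMPLE_PATTERN)

  "claude-sonnet" # default
end
```

For example, auto_select_model("show me open orders") falls into the simple branch, while a query containing "why" is routed to claude-sonnet even when it is short.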

Anthropic model IDs are referenced through LlmDefaults (config/initializers/llm_defaults.rb), which delegates to AiModelConstants, so every model ID has a single source of truth.

Assembled from layers, joined with "\n\n":

  1. Base template (app/prompts/assistant/sunny_agent/instructions.txt.erb): Identity, date macros, formatting guidelines, source citation rules
  2. user_context_prompt: User identity (## CURRENT USER), party_id mappings for “my” queries, data access domains
  3. schema_hint_for_role: Instructs the LLM to call describe_available_data; power users also get list_schemas/get_object_details
  4. tools_system_prompt: Tool names, tool-mode rules, and service-specific rules (content, blog, GA4, GSC, Ahrefs, etc.)
  5. crm_url_templates_prompt: CRM URL patterns for linking records by ID
  6. brain_prompt: Hybrid-retrieved AssistantBrainEntry rules injected under ## LEARNED RULES; contextually filtered to entries whose applies_to_services overlap with the active tool services
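
The layering amounts to joining the non-empty parts in order. In the sketch below, the method name and the sample layer bodies are placeholders, and skipping blank layers is an assumption about how missing sections are handled.

```ruby
# Join prompt layers with a blank line between them, skipping empty ones.
def assemble_system_prompt(layers)
  layers
    .map { |layer| layer.to_s.strip } # nil layers become ""
    .reject(&:empty?)                 # assumed: blank layers are omitted
    .join("\n\n")
end

prompt = assemble_system_prompt([
  "You are Sunny.",                          # base template (placeholder text)
  "## CURRENT USER\nName: Pat",              # user_context_prompt (placeholder)
  nil,                                       # e.g. a layer with nothing to add
  "## LEARNED RULES\n- Always cite sources." # brain_prompt (placeholder)
])
```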

For Anthropic models, the system prompt is wrapped with cache: true via RubyLLM::Providers::Anthropic::Content. This can cut the cost of the cached input tokens by up to 90% on subsequent turns.

When the model supports it and the query matches THINKING_QUERY_PATTERNS:

  • Activated via conversation.with_thinking(effort:, budget:)
  • Budget tokens: 4K (low) / 8K (medium) / 16K (high, Opus only)
  • Anthropic requires temperature: 1 when thinking is active
  • Thinking traces are persisted to assistant_messages.thinking_text
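
A hedged sketch of how these rules might combine: the budgets and the temperature rule come from the list above, and the max_tokens headroom of budget + 16_000 mirrors the configuration listing in this section. The pattern list, the trimmed model table, and the method name are hypothetical.

```ruby
THINKING_BUDGETS = { low: 4_000, medium: 8_000, high: 16_000 }.freeze
# Hypothetical stand-in for THINKING_QUERY_PATTERNS.
SKETCH_THINKING_PATTERNS = [/\bwhy\b/i, /\bcompare\b/i, /\btrend/i].freeze
# Trimmed copy of the model registry, keeping only the thinking-related keys.
SKETCH_MODELS = {
  "claude-haiku"  => { supports_thinking: false },
  "claude-sonnet" => { supports_thinking: true, thinking_effort_default: :medium },
  "claude-opus"   => { supports_thinking: true, thinking_effort_default: :high }
}.freeze

# Returns nil when thinking stays off, otherwise the settings to apply.
def thinking_settings(model_key, query)
  model = SKETCH_MODELS.fetch(model_key)
  return nil unless model[:supports_thinking]
  return nil unless SKETCH_THINKING_PATTERNS.any? { |pattern| query.match?(pattern) }

  effort = model[:thinking_effort_default]
  budget = THINKING_BUDGETS.fetch(effort)
  # temperature 1 reflects the Anthropic requirement noted above.
  { effort: effort, budget: budget, temperature: 1, max_tokens: budget + 16_000 }
end
```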

Conversation Configuration (configure_conversation)

conversation.with_model(model_id, provider:, assume_exists: true)
conversation.with_instructions(cacheable_system_prompt, replace: true)
conversation.with_thinking(effort:, budget:) # if applicable
conversation.with_params(max_tokens: budget + 16_000) # if thinking with budget
conversation.with_temperature(thinking? ? 1 : 0.3)
conversation.with_tools(*tools) # from ChatToolBuilder
conversation.on_new_message { emit_status(:thinking) }
conversation.on_tool_call { |tc| emit_status(...) }
conversation.on_tool_result { ... }
conversation.on_end_message { emit_status(:composing) }

5. Tool Builder — Assistant::ChatToolBuilder


File: app/services/assistant/chat_tool_builder.rb

Dynamically builds RubyLLM::Tool subclasses for the chat. No HTTP hop to the MCP server — tools execute in-process.

Tool registration is keyed on tool_services — an array passed from the controller that determines which tool groups are active for this conversation. Non-admins are limited to content + postgres_production; admins also get postgres_versions.

| Tool | Description |
| --- | --- |
| semantic_search | Semantic vector search across all content types |
| find_faqs | FAQ search with product line filtering |
| find_call_recordings | Permission-gated call transcript search |

PostgreSQL Tools (postgres_production, postgres_versions)


Built per database service key. Access level depends on role:

| Tool | Admin/Manager | Employee |
| --- | --- | --- |
| describe_available_data | Yes | Yes |
| execute_sql | Full | Domain-restricted |
| list_schemas | Yes | No |
| list_objects | Yes | No |
| get_object_details | Yes | No |
| explain_query | Yes | No |

Blog Tools (blog_management)

| Tool | Description |
| --- | --- |
| create_blog_post | Create a draft post (body, SEO, tags, preview image) |
| update_blog_post | Update fields + atomically replace tags; creates a revision |
| get_blog_post | Read all editable fields, current tags, revision count |
| insert_image | oEmbed rendered HTML for an image embed |
| insert_video | oEmbed rendered HTML for a video embed (Cloudflare or YouTube) |
| insert_faqs | oEmbed rendered HTML for an FAQ section |
| insert_product | oEmbed rendered HTML for a product card (Liquid tag) |

Brain Tools

| Tool | Description |
| --- | --- |
| propose_brain_entry | Propose a new learned rule; creates a pending brain entry |

6. SQL Broker — Assistant::SqlBroker

File: app/services/assistant/sql_broker.rb

Centralized execution layer for all AI-generated SQL.

Enforcement chain:

  1. Read-only check: Only SELECT, SHOW, WITH, EXPLAIN, SET are allowed
  2. SQL normalization: Strip comments, collapse whitespace
  3. Object extraction: PgQuery.parse(sql).tables via AST
  4. Object-level access: DataPolicy.object_allowed?() checks against ALWAYS_BLOCKED_OBJECTS + domain-resolved allowed_objects
  5. Query execution: Read-only transaction with statement timeout (8s)
  6. Column-level redaction: PII columns redacted from results
  7. Row cap: Truncates to 50 rows; adds truncation notice
  8. Audit logging: Every execution logged to assistant_sql_audit_logs
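
Three of the steps above (normalization, the read-only keyword check, and the row cap) are simple enough to sketch without a database. The helper names and the notice wording are assumptions. The regex-based comment stripping is deliberately naive (it would also alter string literals containing "--"), and a first-keyword check alone cannot catch a data-modifying CTE, so the read-only transaction in step 5 remains the real safeguard.

```ruby
ALLOWED_LEADING_KEYWORDS = %w[SELECT SHOW WITH EXPLAIN SET].freeze
ROW_CAP = 50

# Strip /* */ and -- comments, then collapse whitespace.
def normalize_sql(sql)
  sql.gsub(%r{/\*.*?\*/}m, " ").gsub(/--[^\n]*/, " ").gsub(/\s+/, " ").strip
end

# Naive first-keyword check on the normalized statement.
def read_only?(sql)
  keyword = normalize_sql(sql)[/\A\w+/]
  !keyword.nil? && ALLOWED_LEADING_KEYWORDS.include?(keyword.upcase)
end

# Keep at most ROW_CAP rows and report whether anything was dropped.
def cap_rows(rows)
  return { rows: rows, notice: nil } if rows.length <= ROW_CAP

  { rows: rows.first(ROW_CAP), notice: "Showing first #{ROW_CAP} of #{rows.length} rows" }
end
```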

Same domain system as described in the Analytics section:

  • File: config/analytics/data_domains.yml
  • 5 business domains: sales, financial, workforce, operations, marketing
  • Role mappings: sales_rep, sales_manager, accounting_manager, marketing_manager, etc.
  • Database tables/views declare their domain(s) in db/comments/*.yml manifests
  • Admins get nil (unrestricted); DataDomainPolicy resolves roles → domains → allowed objects
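
The roles → domains → allowed objects resolution can be illustrated with two lookup tables. Every mapping below is a made-up example rather than the contents of data_domains.yml or the db/comments manifests, and the method name is hypothetical.

```ruby
# Example role mappings (invented for this sketch).
ROLE_DOMAINS = {
  "sales_rep"         => %w[sales],
  "marketing_manager" => %w[marketing sales]
}.freeze

# Stand-in for the per-table domain declarations in db/comments/*.yml.
OBJECT_DOMAINS = {
  "orders"    => %w[sales financial],
  "campaigns" => %w[marketing],
  "payroll"   => %w[workforce]
}.freeze

# nil means unrestricted (admins); otherwise a sorted list of allowed objects.
def allowed_objects(roles, admin: false)
  return nil if admin

  domains = roles.flat_map { |role| ROLE_DOMAINS.fetch(role, []) }.uniq
  OBJECT_DOMAINS.select { |_, object_domains| (object_domains & domains).any? }.keys.sort
end
```

An object is allowed when it shares at least one domain with the user; an unknown role resolves to no domains and therefore no objects.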

8. Context Compactor — Assistant::ContextCompactor


File: app/services/assistant/context_compactor.rb

Sliding-window context management for long conversations:

  • When token count exceeds the threshold, ensure_context_summary! generates (or returns a cached) summary of older messages
  • Summary injected as synthetic user + assistant messages in to_llm (after system messages)
  • Only recent messages (after compaction_through_message_id) are sent verbatim
  • Summary and cutoff ID are cached in AssistantConversation’s JSONB metadata
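
The message list that results from compaction might be assembled as in this sketch. The Message struct, the method name, and the wording of the synthetic messages are assumptions, and message IDs are assumed to increase over time.

```ruby
Message = Struct.new(:id, :role, :content)

# System messages first, then the summary as a synthetic user/assistant pair,
# then only the messages newer than the compaction cutoff.
def compacted_messages(messages, summary:, compaction_through_message_id:)
  system, rest = messages.partition { |m| m.role == :system }
  return system + rest if summary.nil? # nothing compacted yet

  recent = rest.select { |m| m.id > compaction_through_message_id }
  synthetic = [
    Message.new(nil, :user, "Summary of the earlier conversation:\n#{summary}"),
    Message.new(nil, :assistant, "Understood. I will continue from that summary.")
  ]
  system + synthetic + recent
end
```

Using a user/assistant pair keeps the role alternation that chat APIs generally expect, while older turns are represented only by the summary.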

9. Cost Calculator — Assistant::CostCalculator


File: app/services/assistant/cost_calculator.rb

Computes per-turn and per-conversation cost from token counts and model pricing:

  • Looks up per-token price from ChatService::MODELS
  • Accounts for cached tokens (90% discount) and cache-creation tokens
  • Called by AssistantConversation#computed_total_cost and sync_token_totals!
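
A worked sketch of the arithmetic, under stated assumptions: the prices are placeholders in USD per million tokens, the 1.25 surcharge for cache-creation tokens is assumed, and input_tokens is assumed to exclude both cached and cache-creation tokens. Only the 90% cached-token discount comes from the text.

```ruby
# Placeholder prices (USD per million tokens), not real provider pricing.
SKETCH_PRICING = { "example-model" => { input: 3.00, output: 15.00 } }.freeze
CACHED_READ_MULTIPLIER = 0.10 # cached tokens billed at a 90% discount
CACHE_WRITE_MULTIPLIER = 1.25 # assumed surcharge for cache-creation tokens

def turn_cost(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0)
  price = SKETCH_PRICING.fetch(model_key)
  # Express every input-side token in units of full-price input tokens.
  input_units = input_tokens +
                cached_tokens * CACHED_READ_MULTIPLIER +
                cache_creation_tokens * CACHE_WRITE_MULTIPLIER
  ((input_units * price[:input] + output_tokens * price[:output]) / 1_000_000.0).round(6)
end
```

With these placeholder prices, 1,000 fresh input tokens, 10,000 cached tokens, and 500 output tokens cost (2,000 × 3 + 500 × 15) / 1,000,000 = 0.0135 USD.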

10. Response Formatter — Assistant::ResponseFormatter


File: app/services/assistant/response_formatter.rb

| Transformation | Description |
| --- | --- |
| Markdown → HTML | Kramdown with GFM + Rouge syntax highlighting |
| HTML sanitization | Sanitize gem strips XSS vectors |
| SQL collapse | SQL blocks wrapped in Bootstrap collapsible accordions |
| Table enhancement | Bootstrap tables with cell formatting and export buttons (Copy, CSV, Excel) |
| Entity linking | Order numbers, customer IDs, SKUs linked to CRM pages |
| Source panel | Source citations styled distinctively |

Files: app/models/assistant_brain_entry.rb, app/controllers/crm/assistant_brain_controller.rb

The Brain is a persistent, CRM-editable knowledge base of rules and facts that are injected into Sunny’s system prompt on every conversation. It replaces hard-coded domain rules in prompt files with database-backed entries that any manager can inspect, approve, or correct without a code deploy.

assistant_brain_entries
id, category, title, rule (text),
scope ('global'|'user'), user_id,
applies_to_services (varchar[]),
status ('active'|'pending'|'inactive'),
source ('manual'|'auto_learned'),
suggested_by_id, approved_by_id,
created_at, updated_at
  • scope: 'global' — shared across all users; managed by managers / sunny_administrator role
  • scope: 'user' — personal preferences for one user; self-managed, never shown to others
  • applies_to_services — contextual filtering: an empty array means “always inject”; a populated array means “only inject when one of these tool services is active in the current conversation” (e.g. ['blog_management'] keeps blog rules out of analytics sessions)

Categories (values of the category column): url_rules, product_data, content_rules, analytics, schema_knowledge, general

ChatService#brain_prompt is called during prompt assembly and appends a ## LEARNED RULES section. It uses hybrid retrieval:

  1. Universal entries (applies_to_services: []) — always injected
  2. Service-specific entries — when the total exceeds BRAIN_SEMANTIC_THRESHOLD (40) and a user message is present, pgvector cosine similarity selects the top BRAIN_SEMANTIC_TOP_K (20) most relevant entries from candidates; otherwise all are injected
  3. Entries are sorted by category then title before injection
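
The retrieval steps can be mimicked in memory as below. In production the similarity ranking is done by pgvector; here a plain cosine function stands in for it. The struct, the method names, and the reading of "total" as universal plus service-matched entries are assumptions.

```ruby
BrainEntry = Struct.new(:category, :title, :services, :embedding)
BRAIN_SEMANTIC_THRESHOLD = 40
BRAIN_SEMANTIC_TOP_K = 20

def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
  norm.zero? ? 0.0 : dot / norm
end

# entries: active entries; query_embedding: vector for the user message, or nil.
def select_brain_entries(entries, active_services:, query_embedding: nil,
                         threshold: BRAIN_SEMANTIC_THRESHOLD, top_k: BRAIN_SEMANTIC_TOP_K)
  universal, scoped = entries.partition { |e| e.services.empty? }
  candidates = scoped.select { |e| (e.services & active_services).any? }

  # Only rank semantically when there are too many entries to inject them all.
  if query_embedding && (universal.size + candidates.size) > threshold
    candidates = candidates.max_by(top_k) { |e| cosine_similarity(e.embedding, query_embedding) }
  end

  (universal + candidates).sort_by { |e| [e.category, e.title] }
end
```

Universal entries bypass the ranking entirely, so rules that must always apply are never dropped by the top-K cut.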

Each AssistantBrainEntry includes Models::Embeddable. When title or rule changes, an Events::ContentEmbeddingRequired event is published to the Rails Event Store. Embedding::ContentChangedHandler (async handler) enqueues an EmbeddingWorker job on the ai_embeddings queue, which calls OpenAI text-embedding-3-small and stores a 1536-dimension vector in the content_embeddings_assistant_brain_entries partition (PostgreSQL list partition of content_embeddings keyed on embeddable_type = 'AssistantBrainEntry').

The partition has:

  • A UNIQUE index on (embeddable_id, content_type, locale)
  • An HNSW index with vector_cosine_ops for fast approximate nearest-neighbor search

Sunny can propose new rules after user corrections by calling the propose_brain_entry tool (registered in ChatToolBuilder). Proposed entries are created with status: 'pending'. Managers see a Pending Approval panel in the Brain CRM page and can approve or reject with a single click.

GET /assistant_brain — lists:

  • Global active entries (grouped by category) — visible to all users with access
  • Pending entries awaiting approval — sunny_admin only
  • Inactive entries — sunny_admin only
  • Current user’s personal entries

Access control uses CanCan:

# ability.rb — inside is_manager? block
can :manage, AssistantBrainEntry
# Plus standalone for non-managers with explicit role
can :manage, AssistantBrainEntry if account.has_role?('sunny_administrator', admin_check: false)

The sunny_administrator role (created in 20260227140000_add_sunny_administrator_role) can be granted to non-managers who need brain management access (e.g. content specialists).


File: app/services/assistant/blog_tool_builder.rb

The blog_management tool service provides Sunny with 8 tools for creating and maintaining blog posts.

| Tool | Description |
| --- | --- |
| create_blog_post | Create a draft post with HTML body, SEO fields, tags, preview image |
| update_blog_post | Update any post field; creates a revision; replaces tags atomically |
| get_blog_post | Read all editable fields + current tags, revisions, effective meta description |
| insert_image | Fetch rendered oEmbed HTML for an image embed |
| insert_video | Fetch rendered oEmbed HTML for a video embed |
| insert_faqs | Fetch rendered oEmbed HTML for an FAQ section |
| insert_product | Fetch rendered oEmbed HTML for a product card |
| propose_brain_entry | Propose a new brain rule (available in all services) |

Blog posts use two distinct tag types, both applied via the tags: parameter:

Tier 1 — Page-placement tags (public: false): internal tags that make a post appear in the content section of a specific landing page. Format: for-{page-path-parameterized}-page. Used by PagesHelper#page_posts, page_videos, page_showcases, etc. Never shown to visitors (filtered by BlogHelper#tags_with_links).

Examples: for-towel-warmer-page, for-towel-warmer-matte-black-page, for-floor-heating-bathroom-page

Tier 2 — Public navigation tags (public: true): drive the blog tag cloud at /posts/{tag}/tag. 22 tags in production: towel-warmers, installation, indoor-heating, snow-melting, design-trends, etc.

The tags: parameter in update_blog_post replaces the full tag set (clear + add). Always call get_blog_post first to read existing tags before updating. Only tags that already exist in the database are applied — unrecognised names are silently skipped (assign_tags never creates new tags).
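
The replace semantics and the read-before-update advice can be shown with plain arrays. Both helper names are hypothetical, and existing_tag_names stands in for the tags already stored in the database.

```ruby
# Apply only tags that already exist; report the ones that would be skipped.
def replace_tags(existing_tag_names, requested)
  wanted = requested.uniq
  { applied: wanted & existing_tag_names, skipped: wanted - existing_tag_names }
end

# Read-modify-write: start from the current tags so none are dropped by accident.
def tags_for_update(current_tags, additions)
  (current_tags + additions).uniq
end
```

Because the update replaces the whole set, passing only the new tag would silently remove every tag the post already had; merging with the result of get_blog_post avoids that.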

Before calling update_blog_post or create_blog_post with any HTML containing <a> elements, Sunny must validate every link using fetch_url. Links that return 404, redirect loops, or error pages must be corrected before the post is saved. This is enforced by a Brain rule (Validate All Links Before Saving a Blog Post) injected when blog_management + web_fetch are active.
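
A minimal sketch of that validation pass, assuming a checker callable that returns an HTTP status for a URL (a stand-in for the fetch_url tool) and treating any non-2xx status as broken. The helper names are hypothetical, and a regex is used for brevity where an HTML parser would be more robust.

```ruby
# Collect unique href values from <a> tags.
def extract_links(html)
  html.scan(/<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/i).flatten.uniq
end

# Return the links whose status is outside the 2xx range.
def broken_links(html, checker:)
  extract_links(html).reject { |url| (200..299).cover?(checker.call(url)) }
end
```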

Style Guide (BlogToolBuilder::STYLE_GUIDE)


A condensed style guide is injected into the create_blog_post and update_blog_post tool descriptions, covering headings (H2+), callouts, feature boxes, sidebars, tables, Liquid tags, Liquid variables, internal link patterns (always use {{ locale }}), and the oEmbed embed workflow.


Files: app/concerns/models/embeddable.rb, app/subscribers/embedding/content_changed_handler.rb, app/workers/embedding_worker.rb

Included by any model that needs semantic search. Provides:

  • content_for_embedding(content_type) — override to define what text is embedded
  • embedding_content_changed? — override to define which attribute changes trigger re-embedding
  • queue_embedding_generation — publishes Events::ContentEmbeddingRequired to the event store

The concern uses the Rails Event Store pub/sub system rather than direct worker enqueue, ensuring every embedding regeneration trigger has a permanent, auditable record in event_store_events.

Model#save → after_commit → queue_embedding_generation
→ event_store.publish(Events::ContentEmbeddingRequired, data: {type, id})
→ Embedding::ContentChangedHandler (AsyncJob, ai_embeddings queue)
→ EmbeddingWorker.perform_async(type, id)
→ OpenAI text-embedding-3-small API
→ ContentEmbedding.upsert(embeddable, vector)

The content_embeddings parent table is list-partitioned by embeddable_type. Each model that uses embeddings has its own partition created via pg_party’s create_list_partition_of. Per-partition indexes:

  • UNIQUE on (embeddable_id, content_type, locale)
  • HNSW (vector_cosine_ops) for cosine ANN search

pgvector HNSW indexes cannot be placed on partitioned parent tables; they must be on each partition individually.

| Model | Partition | Embedding content |
| --- | --- | --- |
| Article / Post / Faq | content_embeddings_articles | Subject + solution body |
| Video | content_embeddings_videos | Title + description + transcript |
| Showcase | content_embeddings_showcases | Title + description |
| AssistantBrainEntry | content_embeddings_assistant_brain_entries | Title + rule text |

Table: assistant_conversations

id, user_id, title, messages (jsonb, legacy), metadata (jsonb), llm_model_id,
processing_by_id, processing_since, timestamps
  • acts_as_chat from RubyLLM — provides ask(), with_model(), with_tools(), with_instructions(), with_thinking()
  • Messages auto-persist to assistant_messages table
  • metadata stores via jsonb_accessor: token totals, cost, queries, compaction summary, tool services
  • with_instructions override fixes RubyLLM v1.11.0 bug where Content::Raw objects were serialized as #<RubyLLM::Content::Raw:0x...> instead of their content
  • to_llm override: eager-loads to prevent N+1 queries, integrates context compaction, filters empty/orphaned/duplicate messages
  • Processing lock via processing_by_id + processing_since columns (5-minute staleness threshold)
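
The staleness rule for the processing lock can be sketched with an in-memory struct. The struct and method names are hypothetical, and a real implementation would need an atomic database update rather than this check-then-set, which is not race-safe.

```ruby
STALE_AFTER = 5 * 60 # seconds; the 5-minute staleness threshold

ConversationLock = Struct.new(:processing_by_id, :processing_since) do
  # A lock counts only while it is younger than the staleness threshold.
  def processing?(now: Time.now)
    !processing_by_id.nil? && !processing_since.nil? && (now - processing_since) < STALE_AFTER
  end

  # Take the lock unless another fresh lock is held; returns true on success.
  def acquire!(owner_id, now: Time.now)
    return false if processing?(now: now)

    self.processing_by_id = owner_id
    self.processing_since = now
    true
  end

  def release!
    self.processing_by_id = nil
    self.processing_since = nil
  end
end
```

The staleness window means a crashed job cannot block a conversation indefinitely: after five minutes the next request simply takes over the lock.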

Table: assistant_messages

id, assistant_conversation_id, role, content, content_raw (json),
input_tokens, output_tokens, cached_tokens, cache_creation_tokens,
thinking_text, thinking_tokens, thinking_signature,
llm_model_id, assistant_tool_call_id, timestamps

Table: assistant_tool_calls

id, assistant_message_id, tool_call_id, name, arguments (jsonb), timestamps

Table: assistant_conversation_shares

id, assistant_conversation_id, shared_with_type, shared_with_id, access_level, timestamps

Access levels: viewer (read-only), collaborator (can send messages).

Table: assistant_brain_entries

id, category, title, rule (text),
scope ('global'|'user'), user_id,
applies_to_services (varchar[]),
status ('active'|'pending'|'inactive'),
source ('manual'|'auto_learned'),
suggested_by_id, approved_by_id,
created_at, updated_at
  • Includes Models::Auditable (PaperTrail) — all changes tracked in record_versions table
  • Includes Models::Embeddable — rule text vectorised to content_embeddings_assistant_brain_entries partition
  • Scopes: .active, .pending, .global, .for_user(user_id), .for_services(service_keys)

Table: content_embeddings (list-partitioned by embeddable_type)

id, embeddable_type, embeddable_id, content_type, locale, embedding (vector(1536)), timestamps

Used by: AssistantBrainEntry, Article/Post/Faq, Video, Showcase. Cosine ANN search via has_neighbors from the neighbor gem.

Stimulus Controller — assistant_chat_controller.js

| Target | Description |
| --- | --- |
| messages | Chat message container (auto-scroll target) |
| input | Text input field |
| submitButton | Send button (loading state management) |
| placeholder | Empty-state placeholder |
| modelSelect | Model dropdown selector |

Features:

  • MutationObserver: Watches messages target; auto-scrolls on new content
  • Completion detection: Scans for [data-chat-complete="true"] marker to re-enable input
  • Double-submit prevention: isProcessing flag blocks re-submission while worker runs
  1. User submits → Turbo handles POST /assistant/ask
  2. Controller responds with ask.turbo_stream.erb → appends user message bubble + processing indicator
  3. Worker broadcasts via Turbo::StreamsChannel:
    • broadcast_replace_to → live preview with gray monospace text + blinking caret
    • Throttled to ~8fps for smooth streaming
  4. On completion: Worker broadcasts final rendered HTML replacing the preview
  5. Stimulus detects [data-chat-complete] → re-enables input

Stream name: assistant_chat:{conversation_id}

Model IDs — config/initializers/ai_model_constants.rb


AiModelConstants is the canonical registry for every model ID. LlmDefaults is a thin backward-compatibility shim that delegates to it:

module LlmDefaults
  DEFAULT_SONNET_MODEL = AiModelConstants.id(:anthropic_sonnet) # claude-sonnet-4-6
  DEFAULT_OPUS_MODEL   = AiModelConstants.id(:anthropic_opus)   # claude-opus-4-8
  DEFAULT_HAIKU_MODEL  = AiModelConstants.id(:anthropic_haiku)  # claude-haiku-4-5-20251001
end

All model references in ChatService::MODELS use these constants. Never hardcode model IDs elsewhere. WeeklyLlmModelSyncWorker syncs the live provider registry weekly and emails a report when a pinned default has fallen behind.

| Provider | Env Variable |
| --- | --- |
| Anthropic | ANTHROPIC_API_KEY |
| OpenAI | OPENAI_API_KEY |
| Google | GEMINI_API_KEY |

| Service Key | Label | Connection Class |
| --- | --- | --- |
| postgres_production | App DB | ActiveRecord::Base |
| postgres_versions | Versions DB | VersionsDb::Base |

| File | Purpose |
| --- | --- |
| app/agents/assistant/sunny_agent.rb | RubyLLM Agent: factory for AssistantConversation records |
| app/prompts/assistant/sunny_agent/instructions.txt.erb | Base system prompt template (identity, date macros, guidelines) |
| app/controllers/crm/assistant_chat_controller.rb | HTTP entry point, user context, tool service selection |
| app/controllers/crm/assistant_brain_controller.rb | CRM CRUD for brain entries; access-gated via CanCan |
| app/workers/assistant_chat_worker.rb | Sidekiq job: LLM loop, streaming, response formatting |
| app/services/assistant/chat_service.rb | Model selection, prompt assembly (incl. brain_prompt), thinking, tool registration |
| app/services/assistant/chat_tool_builder.rb | Builds RubyLLM::Tool instances (content, DB, blog, brain tools) |
| app/services/assistant/blog_tool_builder.rb | Blog create/update/get/embed tools + tagging helpers |
| app/services/assistant/sql_broker.rb | SQL execution: read-only, access control, redaction, audit |
| app/services/assistant/data_policy.rb | Object + column access rules, PII redaction |
| app/services/assistant/data_domain_policy.rb | Role → domain → allowed objects resolution |
| app/services/assistant/comment_manifest.rb | YAML manifest reader: domains, restricted columns, comments |
| app/services/assistant/tool_loop_guard.rb | Prevents repetitive tool call loops |
| app/services/assistant/context_compactor.rb | Sliding-window conversation summarisation |
| app/services/assistant/cost_calculator.rb | Per-turn and per-conversation cost from token counts |
| app/services/assistant/response_formatter.rb | Markdown → HTML with tables, SQL collapse, entity links |
| app/models/assistant_conversation.rb | Conversation record (acts_as_chat) with token tracking |
| app/models/assistant_message.rb | Per-message record (acts_as_message) with token tracking |
| app/models/assistant_tool_call.rb | Tool invocation record (acts_as_tool_call) |
| app/models/assistant_conversation_share.rb | Conversation sharing (viewer / collaborator) |
| app/models/assistant_brain_entry.rb | Learned rules model; embeddable + auditable |
| app/models/content_embedding.rb | Polymorphic vector store (has_neighbors); partitioned by type |
| app/concerns/models/embeddable.rb | Concern: queue_embedding_generation via Rails Event Store |
| app/events/events/content_embedding_required.rb | Pub/sub event: signals an embedding regeneration is needed |
| app/subscribers/embedding/content_changed_handler.rb | Async handler: enqueues EmbeddingWorker on ai_embeddings queue |
| app/workers/embedding_worker.rb | Calls OpenAI text-embedding-3-small; upserts to content_embeddings |
| config/initializers/llm_defaults.rb | Anthropic model ID constants (backward-compatibility shim that delegates to AiModelConstants) |
| config/analytics/data_domains.yml | Domain descriptions + role mappings |
| db/comments/*.yml | Per-table manifests with domain, comments, restricted flags |
| db/data/brain_entry_embeddings_seed.json | Pre-computed embeddings for initial brain entries (avoids API costs on deploy) |

Sunny is available to all CRM accounts. The tool services available vary by role:

| Tool service | Non-admin | Admin / Manager |
| --- | --- | --- |
| content | Yes | Yes |
| postgres_production | Domain-restricted | Full |
| postgres_versions (audit DB) | No | Yes |
| blog_management | No | Yes |

The can :manage, AssistantBrainEntry CanCan ability is granted to:

  1. Any account where is_manager? returns true (inside the shared manager ability block in ability.rb)
  2. Any account with the sunny_administrator role (can be assigned to non-managers, e.g. content specialists)

All CRM views (index.html.erb, histories.html.erb) use can?(:manage, AssistantBrainEntry) to conditionally render the “Sunny Brain” button. The controller uses the same ability check via require_sunny_admin!.


| Test File | Coverage |
| --- | --- |
| test/agents/assistant/sunny_agent_create_test.rb | create! returns persisted conversation; no llm_model; no system messages at factory time; title defaults and custom title |
| test/agents/assistant/sunny_agent_find_test.rb | find returns existing conversation without mutating it; no llm_model; no system messages |
| test/integration/sunny_blog_editor_end_to_end_test.rb | Julia-style blog tool flows (conv 1565); assertion helpers in test/support/sunny_blog_editor_julia_flow_helpers.rb |
| test/models/assistant_conversation_test.rb | to_llm nil guard, track_query!, processing lock, scopes |
| test/models/assistant_brain_entry_test.rb | Scopes, status transitions, for_services filtering |
| test/services/assistant/chat_service_test.rb | MODELS registry, auto_select_model, estimate_tokens, base prompt, brain prompt injection |
| test/workers/assistant_chat_worker_test.rb | broadcast_error mappings, graceful error handling |
| test/initializers/llm_defaults_test.rb | Model IDs valid, Anthropic pattern, no future dates, RubyLLM resolution |

Run AI assistant tests:

mise exec -- bin/rails test test/agents/ \
test/models/assistant_conversation_test.rb \
test/models/assistant_brain_entry_test.rb \
test/services/assistant/chat_service_test.rb \
test/workers/assistant_chat_worker_test.rb \
test/initializers/llm_defaults_test.rb

| Migration | Description |
| --- | --- |
| 20260227100000_create_assistant_brain_entries | Creates assistant_brain_entries table |
| 20260227110000_add_scope_to_brain_entries | Adds scope, user_id, applies_to_services columns |
| 20260227140000_add_sunny_administrator_role | Creates sunny_administrator role in roles table |
| 20260228100001_seed_assistant_brain_entries | Seeds 14 initial brain entries (stub class pattern) |
| 20260228160000_add_brain_entry_embeddings | Creates content_embeddings partitioned table + AssistantBrainEntry partition; seeds pre-computed vectors |
| 20260228170000_seed_brain_entry_embeddings | Inserts pre-computed embedding vectors for the 14 initial entries |
| 20260228190000_add_blog_tagging_brain_entry | Seeds two-tier tagging brain entry; queues embedding via after_commit |

Migration pattern: Data migrations that create AssistantBrainEntry records during a deploy sequence must use a local stub class (class BrainEntry < ApplicationRecord; self.table_name = 'assistant_brain_entries'; end) if the migration runs before all schema migrations are applied. This avoids NoMethodError from model validations referencing not-yet-existing columns. The final tagging entry (20260228190000) intentionally uses the real model so the after_commit embedding callback fires.