Class: Assistant::ContextCompactor
- Inherits: Object
- Defined in: app/services/assistant/context_compactor.rb
Overview
Two-layer context compaction to keep AI assistant conversations within
token budgets and reduce cost.
Strategy 1 — Ephemeral Tool Result Compaction (post-exchange):
After a complete exchange, replace ALL large tool result messages with
compact summaries. Tool results (blog HTML, SQL output, API JSON) are
only needed verbatim during the exchange that consumed them — once the
response is streamed they are stale. Keeping them verbatim forces every
subsequent API call to re-send the same payload, bloating context rapidly.
This mirrors how Cursor treats file reads: ephemeral per-request injection,
not accumulated history.
Strategy 2 — Sliding Window + Summary (pre-exchange):
When estimated context exceeds a threshold, compress older messages
into a cached summary and keep only recent messages verbatim.
Usage:
  After a response completes (in the worker):
    Assistant::ContextCompactor.compact_tool_results!(conversation)
  Before building the LLM payload (in to_llm):
    Assistant::ContextCompactor.ensure_context_summary!(conversation)
The lifecycle hooks (compact / summarize / ensure / fork / emergency_compact)
all share the same conversation scan and tool-call/message classification,
so splitting the class would scatter knowledge of the message lifecycle. The
exempt list of schema-discovery suffixes also has to live next to the
truncation routine that consults it.
Constant Summary
- TOOL_RESULT_CHAR_THRESHOLD =
Tool results shorter than this (chars) are left as-is.
1_500
- SCHEMA_DISCOVERY_TOOL_SUFFIXES =
Schema-discovery results (column lists, table summaries) are referenced
repeatedly across tool calls in the same turn — Sunny needs to know what
columns exist on view_opportunities / view_activities every time it writes
SQL, not just the first time. Truncating these to 800 chars stripped the
answer from context and led to 290+ guessed-column SQL failures over a
10-day window. Exempt them from both the verbatim-truncation pass and the
repeated-call dedup so the column list stays in scope for the whole turn.
%w[_describe_available_data _get_object_details _list_objects _list_schemas].freeze
- CONTEXT_TOKEN_THRESHOLD =
When estimated context exceeds this many tokens, trigger sliding-window.
When actual input_tokens from the API are available (preferred over char
estimation), this threshold is applied directly against those real counts.
Kept conservatively low so that tool-definition overhead + mid-flight
tool-call additions don't push the next request over the 200k limit.
50_000
- MAX_MESSAGES_BEFORE_SUMMARY =
Also trigger sliding-window when the raw message count crosses this threshold,
regardless of token estimates. Large-context models (Gemini 1M) never hit the
token ceiling naturally, so we need a count-based safety valve. 40 messages ≈
5-8 user turns with tool activity — enough history for coherent continuity.
40
- MIN_RECENT_MESSAGES =
Keep at least this many messages verbatim in the recent window.
Ensures the LLM always sees enough context for coherent follow-ups.
6 messages ≈ 2 user/assistant exchanges — sufficient for continuity.
Combined with immediate tool-result compaction, these messages are short.
6
- SUMMARIZER_MODEL =
Summarizer model.
AiModelConstants.id(:summarization)
- CHARS_PER_TOKEN =
Rough chars-per-token ratio. Conservative (real ratio is ~3.5 for
English) so we trigger compaction a bit early rather than too late.
4
- MAX_FORK_DEPTH =
Maximum number of parent→child forks before we refuse to fork again.
Prevents infinite cascade: conv → cont → cont → cont → …
3
- OVERHEAD_TOKENS =
Check whether the conversation needs a sliding-window summary.
If context exceeds the threshold and the cached summary is stale
(or absent), generate a new one.
Returns the summary text (or nil if compaction is not needed).
The caller (to_llm) uses this to decide whether to inject the
summary and truncate old messages.
param conversation [AssistantConversation]
Estimated token overhead from system prompt and tool schemas that is NOT
reflected in stored message content. Conservative estimate based on typical
Sunny configurations: ~3K system prompt + ~2-15K per tool service.
15_000
- DEDUP_STUB =
Collapse repeated identical tool calls across all turns.
When the same tool+args combination appears N times in history (e.g. the model
called get_blog_post(868) in 6 different turns), the context re-sends the same
data on every subsequent API call. This replaces the content of all but the
MOST RECENT result for each unique (tool_name, arguments) signature with a
lightweight stub, dramatically reducing context for chatty read-only tools.
Skips messages already stubbed (content starts with "[Already retrieved").
Safe: the latest result is always preserved so the model can reference it.
'[Already retrieved earlier — omitted to reduce context]'
- MIN_CALLS_TO_DEDUP =
Only dedup when ≥ this many identical calls exist.
2
- TRUNCATION_KEEP_CHARS =
Truncation threshold: keep this many leading chars from the tool result.
Enough to preserve key data points (IDs, titles, status, error messages)
without carrying full blog HTML or SQL result sets into future turns.
800
- CONTEXT_OVERFLOW_PATTERNS =
Regex patterns that identify a provider-level context-length error across
Anthropic, OpenAI, and Gemini. RubyLLM raises BadRequestError (400) for all.
[
  /prompt is too long/i,
  /maximum context length/i,
  /context_length_exceeded/i,
  /exceeds.*context.*window/i,
  /input.*too long/i,
  /too many tokens/i,
  /tokens.*exceed/i,
  /prompt.*exceeds.*limit/i,
  /reduce.*size.*message/i
].freeze
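The thresholds above suggest how the sliding-window trigger might be evaluated. The following is a minimal sketch, not the class's actual implementation. The helper names `estimated_context_tokens` and `sliding_window_needed?` are hypothetical, and the sketch assumes that the overhead is added to the character-based estimate, that the message-count check uses `>=`, and that real `input_tokens` replace the estimate whenever they are available.

```ruby
# Values copied from the constant list above.
CONTEXT_TOKEN_THRESHOLD = 50_000
MAX_MESSAGES_BEFORE_SUMMARY = 40
CHARS_PER_TOKEN = 4
OVERHEAD_TOKENS = 15_000

# Hypothetical helper: estimate tokens from stored message contents,
# then add the fixed overhead for system prompt and tool schemas.
def estimated_context_tokens(contents)
  chars = contents.sum { |content| content.to_s.length }
  (chars.to_f / CHARS_PER_TOKEN).ceil + OVERHEAD_TOKENS
end

# Hypothetical helper: true when either the count-based safety valve
# or the token threshold says the sliding window should kick in.
def sliding_window_needed?(contents, actual_input_tokens: nil)
  return true if contents.size >= MAX_MESSAGES_BEFORE_SUMMARY

  tokens = actual_input_tokens || estimated_context_tokens(contents)
  tokens > CONTEXT_TOKEN_THRESHOLD
end
```

Under these assumptions, about 140,000 characters of stored content (35,000 estimated tokens plus 15,000 overhead) is where the estimate reaches the 50,000-token threshold.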
Class Method Summary
- .compact_tool_results!(conversation) ⇒ Object
Summarize large tool-result messages from the most recent exchange.
- .compaction_cutoff_id(conversation) ⇒ Integer?
Return the ID of the last message covered by the cached summary.
- .context_overflow?(error) ⇒ Boolean
Returns true when the error message matches a known context-overflow pattern.
- .emergency_compact!(conversation, level: 1) ⇒ Boolean
Strategy 3: Emergency Compaction (context overflow recovery).
- .ensure_context_summary!(conversation) ⇒ Object
- .fork_continuation!(conversation, pending_user_message: nil, tool_services: [], summarize: true) ⇒ AssistantConversation?
Strategy 4: Conversation Fork (manual / power-user).
- .fork_from_message!(conversation, message_id:, summarize: true) ⇒ Array(AssistantConversation, String)?
Fork a new conversation branching from a specific user message.
- .generate_parent_summary!(conversation) ⇒ String?
Generate and persist the context summary for a forked conversation whose summary was deferred at fork time (fork_from_message! with summarize: false).
- .schema_discovery_tool?(tool_name) ⇒ Boolean
Whether a tool name belongs to the schema-discovery family.
Class Method Details
.compact_tool_results!(conversation) ⇒ Object
Summarize large tool-result messages from the most recent exchange.
Called after finalize_response in the worker so the assistant has
already consumed the raw data.
    # File 'app/services/assistant/context_compactor.rb', line 100
    def self.compact_tool_results!(conversation)
       = large_tool_results(conversation)
      unless .empty?
        Rails.logger.info do
          "[ContextCompactor] Compacting #{.size} tool result(s) " \
            "for conversation #{conversation.id}"
        end
        .each { |msg| summarize_tool_result!(msg) }
      end

      # Deduplicate: when the same tool+args has been called multiple times across
      # turns, stub out older results so the context doesn't keep re-sending the
      # same data. The most recent result is preserved — it reflects current state.
      deduplicate_repeated_tool_calls!(conversation)
    rescue StandardError => e
      # Compaction is best-effort — never break the main flow.
      Rails.logger.warn("[ContextCompactor] Tool result compaction failed: #{e.}")
    end
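The two passes in this method (shrinking large results, then stubbing older duplicates) can be illustrated on plain Ruby objects. This is only a sketch under stated assumptions: `ToolResult`, `truncate_result`, and `deduplicate!` are hypothetical names, results are assumed to be ordered oldest first, the stub text is a shortened stand-in for DEDUP_STUB, the `[truncated]` marker is invented for illustration, and the schema-discovery exemption is not modelled. The real `summarize_tool_result!` may do more than leading-character truncation.

```ruby
# Values copied from the constant list; the stub is a shortened stand-in.
TOOL_RESULT_CHAR_THRESHOLD = 1_500
TRUNCATION_KEEP_CHARS = 800
MIN_CALLS_TO_DEDUP = 2
DEDUP_STUB = '[Already retrieved earlier]'

# Hypothetical in-memory stand-in for a stored tool-result message.
ToolResult = Struct.new(:tool_name, :arguments, :content)

# Results shorter than the threshold are left as-is; larger ones keep
# only their leading characters plus an illustrative marker.
def truncate_result(content)
  return content if content.length < TOOL_RESULT_CHAR_THRESHOLD

  "#{content[0, TRUNCATION_KEEP_CHARS]} [truncated]"
end

# For each (tool_name, arguments) signature, stub every result except
# the most recent one. Assumes `results` is ordered oldest first.
def deduplicate!(results)
  results.group_by { |r| [r.tool_name, r.arguments] }.each_value do |group|
    next if group.size < MIN_CALLS_TO_DEDUP

    group[0...-1].each do |result|
      next if result.content.start_with?('[Already retrieved')

      result.content = DEDUP_STUB
    end
  end
  results
end
```

Keeping the last element of each group untouched is what makes the pass safe: the model can still read the latest result for every distinct call.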
.compaction_cutoff_id(conversation) ⇒ Integer?
Return the ID of the last message covered by the cached summary.
Used by to_llm to filter old messages.
    # File 'app/services/assistant/context_compactor.rb', line 236
    def self.compaction_cutoff_id(conversation)
      conversation.
    end
.context_overflow?(error) ⇒ Boolean
Returns true when the error message matches a known context-overflow pattern.
    # File 'app/services/assistant/context_compactor.rb', line 697
    def self.context_overflow?(error)
       = error..to_s
      CONTEXT_OVERFLOW_PATTERNS.any? { |pattern| .match?(pattern) }
    end
.emergency_compact!(conversation, level: 1) ⇒ Boolean
Strategy 3: Emergency Compaction (context overflow recovery)
Called when the LLM rejects a request due to context length. Performs
progressively more aggressive compaction and returns true if the context
was reduced (caller should retry the LLM call).
Levels:
1. Force-compact all tool results (including small ones from current turn)
   and regenerate the sliding-window summary with a tighter recent window.
2. Nuclear option — summarize everything except the last 2 messages,
   compact ALL tool results regardless of size.
    # File 'app/services/assistant/context_compactor.rb', line 434
    def self.emergency_compact!(conversation, level: 1)
      case level
      when 1 then emergency_compact_level_one!(conversation)
      when 2 then emergency_compact_level_two!(conversation)
      else false
      end
    rescue StandardError => e
      Rails.logger.warn("[ContextCompactor] Emergency compaction level #{level} failed: #{e.}")
      false
    end
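A caller could combine `context_overflow?` and `emergency_compact!` into a retry loop along the following lines. This is a hedged sketch rather than the worker's actual code: `with_overflow_recovery` is a hypothetical wrapper, the `compact` callable stands in for `emergency_compact!(conversation, level: ...)`, and escalating from level 1 to level 2 on consecutive failures is an assumption. The pattern list is abbreviated to three of the documented entries.

```ruby
# Abbreviated copy of CONTEXT_OVERFLOW_PATTERNS for illustration.
OVERFLOW_PATTERNS = [
  /prompt is too long/i,
  /maximum context length/i,
  /context_length_exceeded/i
].freeze

# Same matching idea as context_overflow?, written out in full here.
def overflow_error?(error)
  message = error.message.to_s
  OVERFLOW_PATTERNS.any? { |pattern| message.match?(pattern) }
end

# Hypothetical wrapper: run the block (the LLM call); on a context-overflow
# error, call `compact` with the next level and retry if it reports success.
def with_overflow_recovery(compact, max_level: 2)
  level = 0
  begin
    yield
  rescue StandardError => e
    raise unless overflow_error?(e) # unrelated errors propagate unchanged

    level += 1
    raise if level > max_level      # both levels already tried
    raise unless compact.call(level) # compaction did not reduce the context

    retry
  end
end
```

The wrapper gives up by re-raising the original error, so callers see the provider's message rather than a new exception type.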
.ensure_context_summary!(conversation) ⇒ Object
Check whether the conversation needs a sliding-window summary and, if the
context exceeds the threshold and the cached summary is stale or absent,
generate a new one. Returns the summary text, or nil if compaction is not needed.
    # File 'app/services/assistant/context_compactor.rb', line 137
    def self.ensure_context_summary!(conversation)
       = (conversation)
      return nil unless sliding_window_should_summarize?(conversation, )

      split_index = sliding_window_split_index()
      return nil if split_index <= 0

       = [split_index - 1][0] # last message included in "old"
      cached = sliding_window_cached_summary(conversation, )
      return cached if cached

      summary = generate_conversation_summary([0...split_index], conversation: conversation)
      sliding_window_persist!(conversation, summary, )
      summary
    rescue StandardError => e
      Rails.logger.warn("[ContextCompactor] Sliding window summary failed: #{e.}")
      nil
    end
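On the consuming side, the summary and the cutoff ID can be used together to build a shorter payload. The sketch below shows one way this might look. `Message` and `build_payload` are hypothetical, message IDs are assumed to increase over time, and injecting the summary as a leading system message is an assumption about `to_llm`, not something this page documents.

```ruby
# Hypothetical in-memory stand-in for a stored conversation message.
Message = Struct.new(:id, :role, :content)

# Build the LLM payload. When a summary and cutoff are present, messages up
# to and including the cutoff are dropped and replaced by the summary.
def build_payload(messages, summary: nil, cutoff_id: nil)
  compacted = !summary.nil? && !cutoff_id.nil?
  kept = compacted ? messages.select { |m| m.id > cutoff_id } : messages

  payload = kept.map { |m| { role: m.role, content: m.content } }
  if compacted
    payload.unshift({ role: 'system', content: "Summary of earlier conversation:\n#{summary}" })
  end
  payload
end

history = [
  Message.new(1, 'user', 'Show me last week\'s pipeline'),
  Message.new(2, 'assistant', 'Here it is ...'),
  Message.new(3, 'user', 'Now filter by region')
]
payload = build_payload(history, summary: 'User asked for pipeline data.', cutoff_id: 2)
```

When either value is nil the payload is simply the full history, which matches the documented behaviour of returning nil when compaction is not needed.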
.fork_continuation!(conversation, pending_user_message: nil, tool_services: [], summarize: true) ⇒ AssistantConversation?
Strategy 4: Conversation Fork (manual / power-user)
Creates a new "continuation" conversation with the same owner, carrying a
compact summary of the full prior history as injected context. Used by
manual "Continue with context" and "Branch from here" actions.
NOTE: This is NOT used for automatic context overflow recovery — that is
handled by emergency_compact! + retry. Forking is a deliberate user action.
    # File 'app/services/assistant/context_compactor.rb', line 534
    def self.fork_continuation!(conversation, pending_user_message: nil, tool_services: [], summarize: true)
      return nil if fork_depth_exceeded?(conversation)

      summary = fork_continuation_summary(conversation, summarize: summarize)
      continuation = build_fork_record(conversation, suffix: '(cont.)', summary: summary, tool_services: tool_services)
      continuation.save!

      log_fork(conversation, continuation, summary: summary, pending: .present?)
      continuation
    rescue StandardError => e
      Rails.logger.error("[ContextCompactor] fork_continuation! failed: #{e.}")
      nil
    end
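The guard at the top of this method relies on MAX_FORK_DEPTH. One plausible way to compute fork depth is to walk the parent chain, as sketched below. `Conversation`, `fork_depth`, and `too_deep_to_fork?` are hypothetical names, and the `>=` comparison is an assumption about how "refuse to fork again" is applied. The real `fork_depth_exceeded?` is not shown on this page.

```ruby
MAX_FORK_DEPTH = 3 # value from the constant list

# Hypothetical stand-in: each conversation optionally points at its parent.
Conversation = Struct.new(:id, :parent)

# Count how many parent links separate this conversation from the root.
def fork_depth(conversation)
  depth = 0
  node = conversation
  while node.parent
    depth += 1
    node = node.parent
  end
  depth
end

# Refuse further forks once the chain is already MAX_FORK_DEPTH links long.
def too_deep_to_fork?(conversation)
  fork_depth(conversation) >= MAX_FORK_DEPTH
end

root = Conversation.new(1, nil)
cont1 = Conversation.new(2, root)
cont2 = Conversation.new(3, cont1)
cont3 = Conversation.new(4, cont2)
```

Here `cont2` could still be forked, while `cont3` already sits three links below the root and would be refused.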
.fork_from_message!(conversation, message_id:, summarize: true) ⇒ Array(AssistantConversation, String)?
Fork a new conversation branching from a specific user message.
    # File 'app/services/assistant/context_compactor.rb', line 597
    def self.fork_from_message!(conversation, message_id:, summarize: true)
      fork_msg = conversation..where(role: 'user').find_by(id: )
      return nil unless fork_msg

      summary = summarize ? (conversation, fork_msg) : nil
      fork_convo = build_fork_record(conversation, suffix: '(fork)', summary: summary,
                                     tool_services: conversation.tool_services)
      # Store the fork-point so the worker can summarise the right slice
      # of the parent conversation when summarize: false is used.
      fork_convo. = fork_convo..merge('fork_message_id' => fork_msg.id)
      fork_convo.save!

      log_fork(conversation, fork_convo, summary: summary)
      [fork_convo, fork_msg.content.to_s]
    rescue StandardError => e
      Rails.logger.error("[ContextCompactor] fork_from_message! failed: #{e.}")
      nil
    end
.generate_parent_summary!(conversation) ⇒ String?
Generate and persist the context summary for a forked conversation whose summary
was deferred at fork time (fork_from_message! with summarize: false).
Reads prior messages from the parent conversation up to the stored fork_message_id,
generates the LLM summary, and persists it directly via a JSONB merge so concurrent
writes (e.g. the AssistantChatWorker token-count sync) cannot clobber it.
Safe to call multiple times — no-op if the summary already exists.
    # File 'app/services/assistant/context_compactor.rb', line 639
    def self.generate_parent_summary!(conversation)
      return nil if conversation.parent_conversation_summary.present?

      parent = AssistantConversation.find_by(id: conversation.parent_conversation_id)
      return nil unless parent

      prior_msgs = parent_summary_prior_msgs(conversation, parent)
      return nil if prior_msgs.empty?

      summary = generate_conversation_summary(prior_msgs, conversation: parent)
      return nil if summary.blank?

      persist_parent_summary!(conversation, summary)
      summary
    rescue StandardError => e
      Rails.logger.warn("[ContextCompactor] generate_parent_summary! failed: #{e.}")
      nil
    end
.schema_discovery_tool?(tool_name) ⇒ Boolean
Whether a tool name belongs to the schema-discovery family. Schema
discovery results are exempt from both verbatim-truncation and the
repeated-call dedup so Sunny keeps the column list in scope all turn.
    # File 'app/services/assistant/context_compactor.rb', line 58
    def self.schema_discovery_tool?(tool_name)
      name = tool_name.to_s
      SCHEMA_DISCOVERY_TOOL_SUFFIXES.any? { |suffix| name.end_with?(suffix) }
    end
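A short usage example may help show what the suffix match accepts. The method body below repeats the documented logic as a standalone function so it runs without the class. The tool names (`warehouse_list_schemas`, `crm_get_object_details`) are made up for illustration, and only the suffixes come from the constant list.

```ruby
SCHEMA_DISCOVERY_TOOL_SUFFIXES = %w[
  _describe_available_data _get_object_details _list_objects _list_schemas
].freeze

# Standalone copy of the documented check, for illustration only.
def schema_discovery_tool?(tool_name)
  name = tool_name.to_s
  SCHEMA_DISCOVERY_TOOL_SUFFIXES.any? { |suffix| name.end_with?(suffix) }
end

# Hypothetical tool names; the service prefix is free-form, the suffix decides.
prefixed   = schema_discovery_tool?('warehouse_list_schemas')
symbolized = schema_discovery_tool?(:crm_get_object_details)
unrelated  = schema_discovery_tool?('get_blog_post')
missing    = schema_discovery_tool?(nil)
```

Because the argument goes through `to_s`, symbols and nil are handled without raising. Under this sketch the first two calls match and the last two do not.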