Class: Assistant::ToolLoopGuard

Inherits:
Object
Includes:
ResponseMessages
Defined in:
app/services/assistant/tool_loop_guard.rb,
app/services/assistant/tool_loop_guard/response_messages.rb

Overview

Wraps RubyLLM tools with lightweight guardrails to prevent repetitive
tool loops and force recovery steps when SQL calls repeatedly fail.

Guards:

  1. Identical call dedup: blocks the same tool+args after MAX_IDENTICAL_CALLS.
  2. Consecutive SQL failures: forces schema discovery after MAX_CONSECUTIVE_SQL_FAILURES.
  3. Total call ceiling: hard-stops all tool calls after the effective limit to
    prevent unbounded recursion in RubyLLM's handle_tool_calls → complete loop.
    Budget scales with plan size: BASE_TOOL_CALLS without a plan, up to
    MAX_PLAN_TOOL_CALLS when a plan is declared (CALLS_PER_PLAN_STEP × steps).
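
As a rough illustration of guard 1, the sketch below counts calls by a tool+args signature and blocks a call once its count exceeds the limit. The `DedupSketch` class, its `signature_for` helper, and the JSON-based signature are assumptions made for illustration; the guard's actual signature scheme is not shown on this page.

```ruby
require "json"

# Hypothetical sketch: per-turn dedup keyed on tool name plus arguments.
class DedupSketch
  MAX_IDENTICAL_CALLS = 1 # same value as the documented constant

  def initialize
    @call_counts = Hash.new(0)
  end

  # Returns true when the call may run, false when it repeats an earlier call.
  def allow?(tool_name, args)
    signature = signature_for(tool_name, args)
    @call_counts[signature] += 1
    @call_counts[signature] <= MAX_IDENTICAL_CALLS
  end

  private

  # Sort keys so {a: 1, b: 2} and {b: 2, a: 1} produce the same signature.
  def signature_for(tool_name, args)
    sorted = args.sort_by { |key, _| key.to_s }.to_h
    "#{tool_name}:#{JSON.generate(sorted)}"
  end
end

guard = DedupSketch.new
guard.allow?("get_blog_post", { id: 7 }) # first call runs
guard.allow?("get_blog_post", { id: 7 }) # exact repeat is blocked
guard.allow?("get_blog_post", { id: 8 }) # different args run
```

Normalizing key order before hashing is one way to keep argument ordering from defeating the dedup check.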

Many orthogonal guardrails intentionally share one lifecycle; splitting files
would obscure the single-turn flow and duplicate shared constants.

Defined Under Namespace

Modules: ResponseMessages

Constant Summary

MAX_IDENTICAL_CALLS =

Block ANY repeat of the exact same call within a turn

1
MAX_CONSECUTIVE_SQL_FAILURES =

Maximum consecutive SQL failures before schema discovery is forced.

3
MAX_CONSECUTIVE_PATCH_FAILURES =

Maximum consecutive patch failures.

3
BASE_TOOL_CALLS =

Default limit without a plan. Was 15; bumped after a 10-day audit showed
read-only sales-rep workflows ("review opportunity X, draft a follow-up email")
routinely consume 15-20 calls per turn (describe ×2 + bundled lookup +
activity search + quote/order pulls + email draft) and were hitting the limit
before the model could answer.

20
PLAN_STEP_TOOL_CALLS =

Per-step budget during server-orchestrated plan execution.

40
CALLS_PER_PLAN_STEP =

Extra budget granted per declared plan step.

20
MAX_PLAN_TOOL_CALLS =

Upper bound on the plan-scaled budget when a plan is declared.

200
BUDGET_WARNING_THRESHOLD =

Threshold for budget warning.

5
MAX_TURN_DURATION =

Seconds. Default wall-clock cap per turn (Flash, GPT, Haiku).

180
MAX_TURN_DURATION_THINKING =

Seconds. Extended cap for thinking-capable models (Gemini Pro / Claude
Sonnet+Opus / GPT-5 reasoning). The first describe_available_data call alone
can take ~60-130s on Pro because of extended thinking, leaving almost no
budget under the default cap.

360
MAX_TURN_DURATION_AUTHORING =

Seconds. Cap for long-form content-authoring turns (blog_management /
email_management) on thinking models. These edit large HTML bodies with many
media/embed/product tool calls plus extended thinking, and legitimately ran
~420-450s, past the 360s thinking cap, so they were being halted mid-edit
(AppSignal #4714 on edit_email_template). Matches MAX_STEP_DURATION;
non-authoring turns keep the 360s telemetry cap.

600
MAX_STEP_DURATION =

Seconds. Wall-clock cap per plan step (Gemini Pro needs 60-90s/call).

600
WRITE_TOOLS =

Write tools. Running one clears the cached signatures of the tools listed in
READS_INVALIDATED_BY_WRITES; see also #loaded_write_tool_names.

%w[
  update_blog_post create_blog_post edit_blog_post
  update_email_template create_email_template edit_email_template clone_email_template
  remove_embed replace_embed move_embed
  create_faq insert_faqs refresh_blog_oembeds
].to_set.freeze
PATCHABLE_TOOLS =

Tools that write to a blog post body — participate in the patch-failure
circuit breaker and per-post reread gate (Stage 6).

%w[update_blog_post edit_blog_post].to_set.freeze
READS_INVALIDATED_BY_WRITES =

Read tools whose cached signatures should be cleared after any write tool
executes, because the underlying content has changed and re-reading is valid.
get_block_html / get_email_block_html are included so the model can re-read
a block's exact HTML after editing it (and read a different block later)
without the per-turn dedup guard blocking the call as a repeat.

%w[get_blog_post get_block_html get_email_block_html].freeze
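
The invalidation rule described for READS_INVALIDATED_BY_WRITES can be pictured as follows. This is only a sketch: the `record_call!` helper, the `*_SKETCH` constants, and the assumption that a signature starts with `"<tool_name>:"` are hypothetical and not taken from the guard's source.

```ruby
require "set"

# Hypothetical sketch: forget cached read signatures once a write tool runs.
WRITE_TOOLS_SKETCH = %w[edit_blog_post update_blog_post].to_set.freeze
READS_INVALIDATED_SKETCH = %w[get_blog_post get_block_html get_email_block_html].freeze

def record_call!(call_counts, tool_name, signature)
  call_counts[signature] += 1
  return unless WRITE_TOOLS_SKETCH.include?(tool_name)

  # Signatures are assumed to start with "<tool_name>:" in this sketch.
  call_counts.delete_if do |sig, _|
    READS_INVALIDATED_SKETCH.any? { |read| sig.start_with?("#{read}:") }
  end
end

counts = Hash.new(0)
record_call!(counts, "get_blog_post", 'get_blog_post:{"id":7}')
record_call!(counts, "edit_blog_post", 'edit_blog_post:{"id":7}')
counts.key?('get_blog_post:{"id":7}') # false, so the same read may run again
```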
COMPLETION_MARKING_TOOLS =

Tools that mark SEO work as "done" — must not run until writes are verified.

%w[seo_update_action_item seo_batch_update_action_items].to_set.freeze
VERIFICATION_READS =

Read tools that count as verification of a prior write.

%w[get_blog_post].to_set.freeze
WRITES_REQUIRING_VERIFICATION =

Historically patch_blog_post needed a follow-up get_blog_post to verify
because find/replace could silently mismatch. That tool was removed in
Stage 7 of the Sunny blog editor fix plan, so no current write tool
needs post-hoc verification — block-ID ops in edit_blog_post are
self-verifying (each op_result reports applied / not_found / invalid),
update_blog_post replaces the full body, and create_faq / insert_faqs
manipulate separate records.

Set.new.freeze
SELF_VERIFYING_WRITES =

A successful full-body write is authoritative — it makes any prior pending
patch verifications moot because the entire content was replaced.

%w[update_blog_post create_blog_post].to_set.freeze
MAX_CONSECUTIVE_EMPTY_SEARCHES =

Consecutive search-type calls that return zero results before forcing
the model to stop and ask the user for clarification.

3
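
A streak counter of this kind might look like the sketch below. The `EmptySearchStreak` class, the `:ask_user` / `:continue` return values, and the choice of `>=` for the threshold comparison are assumptions; only the separate consecutive and cumulative counters mirror instance variables set in the constructor.

```ruby
# Hypothetical sketch: a streak counter that resets on any non-empty result.
class EmptySearchStreak
  MAX_CONSECUTIVE_EMPTY_SEARCHES = 3

  attr_reader :consecutive, :cumulative

  def initialize
    @consecutive = 0
    @cumulative = 0
  end

  # Returns :ask_user once the streak reaches the limit, otherwise :continue.
  def record(result_count)
    if result_count.zero?
      @consecutive += 1
      @cumulative += 1
    else
      @consecutive = 0
    end
    @consecutive >= MAX_CONSECUTIVE_EMPTY_SEARCHES ? :ask_user : :continue
  end
end

streak = EmptySearchStreak.new
streak.record(0) # one empty search, keep going
streak.record(4) # a hit resets the consecutive count but not the cumulative one
```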
MAX_CONSECUTIVE_SEO_LOOKUP_FAILURES =

Maximum consecutive failed seo_get_page / seo_get_action_items(path:) lookups,
which occur when the model repeatedly guesses wrong paths.

3
SEO_LOOKUP_TOOLS =

SEO lookup tools.

%w[seo_get_page seo_get_action_items].freeze
SEARCH_TOOLS =

Search/lookup tools that are prone to "spiral" loops when the model
can't find what it's looking for and keeps retrying with variations.

%w[
  semantic_search find_images search_images find_faqs find_call_recordings
  search_activity_notes search_brain search_products
  find_support_cases search_support_notes
  find_employee get_team_availability get_pipeline_summary
].to_set.freeze

Constructor Details

#initialize(role:, conversation_id: nil, plan_step_mode: false, cross_step_cache: nil, supports_thinking: false, cancel_check: nil, available_tool_names: nil, described_views: nil, long_form_authoring: false) ⇒ ToolLoopGuard

Returns a new instance of ToolLoopGuard.



# File 'app/services/assistant/tool_loop_guard.rb', line 105

def initialize(role:, conversation_id: nil, plan_step_mode: false, cross_step_cache: nil,
               supports_thinking: false, cancel_check: nil, available_tool_names: nil,
               described_views: nil, long_form_authoring: false)
  @role = role.to_sym
  @conversation_id = conversation_id
  @plan_step_mode = plan_step_mode
  @cross_step_cache = cross_step_cache
  @supports_thinking = supports_thinking
  # True when the turn loads long-form content-authoring tools (blog/email).
  # Grants a longer wall-clock cap (MAX_TURN_DURATION_AUTHORING) so large
  # template/body edits finish instead of being halted mid-edit (#4714).
  @long_form_authoring = long_form_authoring
  # Optional Proc returning truthy when the user has requested cancellation.
  # Checked in preflight so the guard halts mid-tool-loop, not just at
  # plan-step boundaries — see PlanOrchestrator and AssistantChatWorker.
  @cancel_check = cancel_check
  # Names of every tool registered for this turn. Used by response messages
  # so the budget-exhaustion / hard-stop guidance only suggests a write
  # tool when one is actually loaded — read-only sales workflows used to
  # be told "use a write op" with no write op available, which the model
  # acted on by ending mid-task.
  @available_tool_names = Array(available_tool_names).map(&:to_s).freeze
  @call_counts = Hash.new(0)
  # Tools whose execute body actually ran (preflight passed and
  # original_execute was invoked). @call_counts is bumped before dedup /
  # recovery short-circuits, so it must not drive plan "services used"
  # carryover — see PR 644 review.
  @executed_tool_names = Set.new
  @consecutive_sql_failures = 0
  @consecutive_patch_failures = 0
  @consecutive_empty_searches = 0
  @cumulative_empty_searches = 0
  @cumulative_sql_errors = 0
  @cumulative_patch_errors = 0
  @cross_step_cache_hits = 0
  @total_calls = 0
  @final_write_used = false
  @limit_reached = false
  @cancelled = false
  @writes_pending_verification = 0
  @plan_declared = false
  @plan_step_count = 0
  @plan_nudge_sent = false
  @consecutive_seo_lookup_failures = 0
  # Stage 6 of the Sunny blog editor fix plan: per-post reread breaker.
  # Any failed patch_blog_post / edit_blog_post sets the post_id here
  # so the next write to that post is blocked until get_blog_post(post_id)
  # is called. Cleared by the verification read.
  @posts_requiring_reread = Set.new
  # SQL HYGIENE: track which views have been resolved via
  # describe_available_data this turn. Subsequent *_execute_sql
  # calls that reference a view NOT in this set are blocked with a
  # structured error pointing the model at describe first — prevents
  # the column-hallucination loop that PR #720 first tried to address
  # with prompt-only guidance (kept inventing fresh column names like
  # `assigned_resource_name` / `assigned_rep_name`).
  #
  # In plan_step_mode the orchestrator passes a shared Set so a
  # describe call in step 1 still satisfies SQL in step 5 — otherwise
  # the model would be forced to re-describe each step, which is wasteful
  # and itself burns budget.
  @described_views = described_views || Set.new
  @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
end
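
The constructor records `@started_at` from the monotonic clock. One plausible way to turn that timestamp into a wall-clock check is sketched below; the `TurnClock` class and its injectable `now:` parameter are hypothetical and exist only to make the idea testable. How the guard itself compares elapsed time against the cap is not shown on this page.

```ruby
# Hypothetical sketch: compare monotonic elapsed time against a cap in seconds.
class TurnClock
  def initialize(cap_seconds, now: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @cap_seconds = cap_seconds
    @now = now
    @started_at = @now.call
  end

  # Seconds since the clock was created.
  def elapsed
    @now.call - @started_at
  end

  def expired?
    elapsed > @cap_seconds
  end
end

clock = TurnClock.new(180)
clock.expired? # false immediately after the turn starts
```

A monotonic clock is unaffected by system time adjustments, which makes it a safer basis for duration caps than `Time.now`.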

Instance Method Details

#apply!(tools) ⇒ Object



# File 'app/services/assistant/tool_loop_guard.rb', line 238

def apply!(tools)
  Array(tools).each { |tool| wrap!(tool) }
  tools
end
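
The private `wrap!` method is not shown here. A minimal sketch of how a tool's `execute` could be wrapped with a preflight check, assuming tools respond to `name` and `execute(**args)`, is given below. `EchoTool`, `wrap_sketch!`, and the error hash are all hypothetical.

```ruby
# Hypothetical stand-in for a tool object.
class EchoTool
  def name
    "echo"
  end

  def execute(**args)
    { ok: true, args: args }
  end
end

# Hypothetical sketch: replace #execute on one instance with a guarded version.
def wrap_sketch!(tool, call_counts, limit:)
  original_execute = tool.method(:execute) # capture before redefining
  tool.define_singleton_method(:execute) do |**args|
    call_counts[name] += 1
    if call_counts[name] > limit
      { error: "tool call limit reached" } # short-circuit, original not invoked
    else
      original_execute.call(**args)
    end
  end
  tool
end

tool = wrap_sketch!(EchoTool.new, Hash.new(0), limit: 1)
tool.execute(q: "a") # runs the original execute
tool.execute(q: "a") # returns the error hash instead
```

As in the constructor comments, the count is bumped before the short-circuit, so a counter like this reflects attempted calls rather than executed ones.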

#cancelled? ⇒ Boolean

True after the user-cancellation check fired at least once.
Used by PlanOrchestrator to short-circuit remaining steps without
waiting for the next outer cancel_check at the step boundary.

Returns:

  • (Boolean)


# File 'app/services/assistant/tool_loop_guard.rb', line 193

def cancelled?
  @cancelled
end

#effective_limit ⇒ Object

Dynamic tool call limit.

  • plan_step_mode: server-orchestrated step execution → PLAN_STEP_TOOL_CALLS
  • plan_declared: model declared a plan inline → scales with step count
  • otherwise: BASE_TOOL_CALLS (keeps initial turns fast)


# File 'app/services/assistant/tool_loop_guard.rb', line 174

def effective_limit
  if @plan_step_mode
    PLAN_STEP_TOOL_CALLS
  elsif @plan_declared && @plan_step_count.positive?
    [BASE_TOOL_CALLS, @plan_step_count * CALLS_PER_PLAN_STEP].max.clamp(BASE_TOOL_CALLS, MAX_PLAN_TOOL_CALLS)
  else
    BASE_TOOL_CALLS
  end
end
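
To make the arithmetic concrete, the standalone sketch below reproduces the same branching with the documented constant values. The `LimitSketch` module is hypothetical, and it folds `@plan_declared` into "step count is positive" for brevity. It also drops the `.max` call, since `clamp` with `BASE_TOOL_CALLS` as its lower bound already covers that case.

```ruby
# Sketch of the plan-scaled budget using the documented constant values.
module LimitSketch
  BASE_TOOL_CALLS = 20
  PLAN_STEP_TOOL_CALLS = 40
  CALLS_PER_PLAN_STEP = 20
  MAX_PLAN_TOOL_CALLS = 200

  def self.effective_limit(plan_step_mode: false, plan_step_count: 0)
    return PLAN_STEP_TOOL_CALLS if plan_step_mode
    return BASE_TOOL_CALLS unless plan_step_count.positive?

    (plan_step_count * CALLS_PER_PLAN_STEP).clamp(BASE_TOOL_CALLS, MAX_PLAN_TOOL_CALLS)
  end
end

LimitSketch.effective_limit                       # => 20
LimitSketch.effective_limit(plan_step_count: 1)   # => 20
LimitSketch.effective_limit(plan_step_count: 3)   # => 60
LimitSketch.effective_limit(plan_step_count: 15)  # => 200 (capped)
LimitSketch.effective_limit(plan_step_mode: true) # => 40
```

With these values a one-step plan gets no more budget than an unplanned turn, and the cap takes effect from ten steps onward.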

#limit_reached? ⇒ Boolean

True after the total call ceiling was hit at least once.
Used by the worker to render a "Continue in new conversation" button.

Returns:

  • (Boolean)


# File 'app/services/assistant/tool_loop_guard.rb', line 186

def limit_reached?
  @limit_reached
end

#loaded_write_tool_names ⇒ Array<String>

Names of write tools that are loaded for this turn. Used by response
messages to tailor the budget-exhaustion guidance — read-only sales
workflows shouldn't be told to "use a write tool" when none exist.

Returns:

  • (Array<String>)

    loaded write-tool names (frozen, memoized)



# File 'app/services/assistant/tool_loop_guard.rb', line 218

def loaded_write_tool_names
  @loaded_write_tool_names ||= (@available_tool_names & WRITE_TOOLS.to_a).freeze
end
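
The method relies on `Array#&`, which returns the elements common to both arrays in the order of the left-hand array. A small standalone illustration follows; the `loaded_write_tools` helper and the shortened tool list are for demonstration only.

```ruby
require "set"

# Subset of WRITE_TOOLS, shortened for illustration.
WRITE_TOOLS_SUBSET = %w[update_blog_post edit_blog_post create_faq].to_set.freeze

# Hypothetical helper mirroring the intersection used above.
def loaded_write_tools(available_tool_names)
  (available_tool_names & WRITE_TOOLS_SUBSET.to_a).freeze
end

loaded_write_tools(%w[get_blog_post edit_blog_post semantic_search]) # => ["edit_blog_post"]
loaded_write_tools(%w[semantic_search find_employee])                # => []
```

An empty result corresponds to the read-only case, where the guidance should not suggest a write tool.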

#max_turn_duration ⇒ Integer

Wall-clock cap for the current turn. Thinking-capable models (Gemini Pro,
Claude Sonnet+Opus reasoning, etc.) get a longer cap because their first
tool call commonly spends 60-130s on extended thinking. Long-form
content-authoring turns (blog/email) get the longest cap because they edit
large HTML bodies across many tool calls and were being halted mid-edit (#4714).

Returns:

  • (Integer)



# File 'app/services/assistant/tool_loop_guard.rb', line 206

def max_turn_duration
  return MAX_TURN_DURATION unless @supports_thinking
  return MAX_TURN_DURATION_AUTHORING if @long_form_authoring

  MAX_TURN_DURATION_THINKING
end
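
The four flag combinations can be enumerated with a standalone sketch that copies the same guard clauses and the documented constant values. The `DurationSketch` module is hypothetical. Note that, as the code is written, the authoring cap applies only when the model also supports thinking.

```ruby
# Sketch of the cap selection using the documented constant values (seconds).
module DurationSketch
  MAX_TURN_DURATION = 180
  MAX_TURN_DURATION_THINKING = 360
  MAX_TURN_DURATION_AUTHORING = 600

  def self.max_turn_duration(supports_thinking:, long_form_authoring:)
    return MAX_TURN_DURATION unless supports_thinking
    return MAX_TURN_DURATION_AUTHORING if long_form_authoring

    MAX_TURN_DURATION_THINKING
  end
end

DurationSketch.max_turn_duration(supports_thinking: false, long_form_authoring: false) # => 180
DurationSketch.max_turn_duration(supports_thinking: false, long_form_authoring: true)  # => 180
DurationSketch.max_turn_duration(supports_thinking: true,  long_form_authoring: false) # => 360
DurationSketch.max_turn_duration(supports_thinking: true,  long_form_authoring: true)  # => 600
```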

#stats ⇒ Object



# File 'app/services/assistant/tool_loop_guard.rb', line 222

def stats
  tool_names = @executed_tool_names.to_a
  {
    total_tool_calls: @total_calls,
    tool_call_limit: effective_limit,
    sql_errors: @cumulative_sql_errors,
    patch_errors: @cumulative_patch_errors,
    empty_searches: @cumulative_empty_searches,
    writes_pending: @writes_pending_verification,
    unique_tools: tool_names.size,
    unique_tool_names: tool_names,
    limit_reached: @limit_reached,
    cross_step_cache_hits: @cross_step_cache_hits
  }
end