Module: AssistantConversationTokenTrackable
- Extended by:
- ActiveSupport::Concern
- Included in:
- AssistantConversation
- Defined in:
- app/models/concerns/assistant_conversation_token_trackable.rb
Instance Method Summary
-
#computed_token_totals ⇒ Object
Compute token totals from assistant_messages (source of truth).
-
#computed_total_cost ⇒ Object
Compute total cost (USD) for this conversation from per-message token data.
-
#sync_token_totals! ⇒ Object
Sync aggregate metadata from assistant_messages (call after responses complete).
-
#total_tokens ⇒ Object
Total tokens used — returns cached metadata totals (synced after each response), falls back to computing from assistant_messages only when metadata is empty.
-
#track_error! ⇒ Object
Track an error.
-
#track_query!(model:, input_tokens: 0, output_tokens: 0, response_time: nil, tool_stats: {}) ⇒ Object
Track a completed query with its metrics.
Instance Method Details
#computed_token_totals ⇒ Object
Compute token totals from assistant_messages (source of truth).
Returns { input: N, output: N, thinking: N, cached: N, cache_creation: N, total: N }
# File 'app/models/concerns/assistant_conversation_token_trackable.rb', line 47

def computed_token_totals
  sums = assistant_messages
    .unscope(:order)
    .where(role: 'assistant')
    .pick(
      Arel.sql('COALESCE(SUM(input_tokens), 0)'),
      Arel.sql('COALESCE(SUM(output_tokens), 0)'),
      Arel.sql('COALESCE(SUM(thinking_tokens), 0)'),
      Arel.sql('COALESCE(SUM(cached_tokens), 0)'),
      Arel.sql('COALESCE(SUM(cache_creation_tokens), 0)')
    ) || [0, 0, 0, 0, 0]

  {
    input: sums[0].to_i,
    output: sums[1].to_i,
    thinking: sums[2].to_i,
    cached: sums[3].to_i,
    cache_creation: sums[4].to_i,
    total: sums[0].to_i + sums[1].to_i
  }
end
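Stripped of the SQL, the aggregation reduces to summing nullable per-message counters for assistant-role rows. A minimal plain-Ruby sketch, where the message hashes are hypothetical stand-ins for assistant_messages records:

```ruby
# Hypothetical in-memory messages; nil models a column with no data,
# which nil.to_i coerces to 0 just as COALESCE(SUM(...), 0) does in SQL.
messages = [
  { role: 'assistant', input_tokens: 100, output_tokens: 40, thinking_tokens: nil },
  { role: 'user',      input_tokens: nil, output_tokens: nil, thinking_tokens: nil },
  { role: 'assistant', input_tokens: 50,  output_tokens: 10, thinking_tokens: 5 }
]

# Only assistant messages contribute, mirroring where(role: 'assistant').
assistant = messages.select { |m| m[:role] == 'assistant' }

totals = {
  input:    assistant.sum { |m| m[:input_tokens].to_i },
  output:   assistant.sum { |m| m[:output_tokens].to_i },
  thinking: assistant.sum { |m| m[:thinking_tokens].to_i }
}
# :total counts only input + output, matching the method above.
totals[:total] = totals[:input] + totals[:output]
```

Note that thinking, cached, and cache-creation tokens are reported but deliberately excluded from :total.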
#computed_total_cost ⇒ Object
Compute total cost (USD) for this conversation from per-message token data.
Uses each message's associated LlmModel to look up the correct pricing.
Falls back to the conversation-level model when a message has no model association.
# File 'app/models/concerns/assistant_conversation_token_trackable.rb', line 70

def computed_total_cost
  # Build a model_id → model_key lookup from ChatService::MODELS
  model_id_to_key = Assistant::ChatService::MODELS.transform_values { |v| v[:id] }.invert

  assistant_messages
    .where(role: 'assistant')
    .includes(:llm_model)
    .sum do |message|
      model_key =
        if message.llm_model
          model_id_to_key[message.llm_model.model_id] || llm_model_name
        else
          llm_model_name
        end

      Assistant::CostCalculator.cost_for(
        model_key,
        input_tokens: message.input_tokens || 0,
        output_tokens: message.output_tokens || 0,
        cached_tokens: message.cached_tokens || 0,
        cache_creation_tokens: message.cache_creation_tokens || 0
      )
    end
end
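The costing loop amounts to pricing each message's token classes at that message's model rates and summing. A standalone sketch, with a made-up per-million-token pricing table standing in for Assistant::CostCalculator's real rates:

```ruby
# Hypothetical pricing in USD per million tokens (NOT real rates).
PRICING = {
  'claude-sonnet' => { input: 3.0, output: 15.0, cached: 0.3 }
}.freeze

# Rough analogue of CostCalculator.cost_for: price each token class, return USD.
def cost_for(model_key, input_tokens: 0, output_tokens: 0, cached_tokens: 0)
  p = PRICING.fetch(model_key)
  (input_tokens * p[:input] +
   output_tokens * p[:output] +
   cached_tokens * p[:cached]) / 1_000_000.0
end

# Hypothetical per-message token counts for one conversation.
messages = [
  { model: 'claude-sonnet', input_tokens: 1_000, output_tokens: 500, cached_tokens: 0 },
  { model: 'claude-sonnet', input_tokens: 2_000, output_tokens: 0,   cached_tokens: 10_000 }
]

total = messages.sum do |m|
  cost_for(m[:model],
           input_tokens: m[:input_tokens],
           output_tokens: m[:output_tokens],
           cached_tokens: m[:cached_tokens])
end
```

Summing per message (rather than summing tokens first) is what lets each message be priced against its own model.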
#sync_token_totals! ⇒ Object
Sync aggregate metadata from assistant_messages (call after responses complete).
Fixes the issue where track_query! only captures last-chunk tokens.
Also computes and caches total cost for fast sidebar display.
Uses a database-level JSONB merge (metadata || patch) instead of a
Ruby-side Hash#merge so that keys written by concurrent callers — most
importantly compaction_summary set by ContextCompactor — are never
clobbered by a stale in-memory copy of metadata.
# File 'app/models/concerns/assistant_conversation_token_trackable.rb', line 102

def sync_token_totals!
  totals = computed_token_totals
  cost = computed_total_cost

  patch = {
    'total_input_tokens'  => totals[:input],
    'total_output_tokens' => totals[:output],
    'total_cost_cents'    => cost
  }.to_json

  self.class.where(id: id).update_all(["metadata = metadata || ?::jsonb", patch])
  reload
end
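The JSONB-merge rationale can be illustrated with plain hashes: a writer that merges into a stale in-memory snapshot silently drops keys a concurrent writer added, while a small patch merged server-side cannot. A toy simulation (no Postgres involved; `db` stands in for the stored metadata column):

```ruby
# Shared row state, as if ContextCompactor already wrote its key.
db = { 'compaction_summary' => 'old chats summarized' }

# Hazard: a snapshot loaded BEFORE the compactor ran is empty, so a
# Ruby-side merge-and-write-back would overwrite the whole column.
stale_snapshot = {}
clobbered = stale_snapshot.merge('total_input_tokens' => 42)
# Persisting `clobbered` wholesale would lose 'compaction_summary'.

# Patch-style update: send only the changed keys and let the store
# merge them, which is what `metadata = metadata || ?::jsonb` does.
patch = { 'total_input_tokens' => 42 }
db = db.merge(patch)  # untouched keys survive
```

In Postgres the `||` operator performs a shallow merge of the two jsonb values, with the right-hand side winning on key conflicts, which is exactly the semantics simulated here.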
#total_tokens ⇒ Object
Total tokens used — returns cached metadata totals (synced after each response),
falls back to computing from assistant_messages only when metadata is empty.
This avoids N+1 queries when displaying token counts in conversation lists.
# File 'app/models/concerns/assistant_conversation_token_trackable.rb', line 38

def total_tokens
  cached = (total_input_tokens || 0) + (total_output_tokens || 0)
  return cached if cached.positive?

  computed_token_totals[:total]
end
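The read path is a cheap-first fallback: trust the cached counters when they are nonzero, recompute only otherwise. A sketch with hypothetical helper names (`expensive_recompute` stands in for `computed_token_totals[:total]`):

```ruby
# Prefer cached counters; fall back to the expensive aggregate only when
# both are nil/zero (e.g. rows created before sync_token_totals! existed).
def total_tokens(cached_input, cached_output)
  cached = cached_input.to_i + cached_output.to_i
  return cached if cached.positive?

  expensive_recompute
end

# Stand-in for the SQL aggregation over assistant_messages.
def expensive_recompute
  123
end
```

One consequence of this shape: a conversation whose cached totals are legitimately zero still triggers the recompute, which is harmless here because the recomputed answer is also zero in that case.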
#track_error! ⇒ Object
Track an error.

# File 'app/models/concerns/assistant_conversation_token_trackable.rb', line 30

def track_error!
  self.error_count = (error_count || 0) + 1
  save!
end
#track_query!(model:, input_tokens: 0, output_tokens: 0, response_time: nil, tool_stats: {}) ⇒ Object
Track a completed query with its metrics.

# File 'app/models/concerns/assistant_conversation_token_trackable.rb', line 7

def track_query!(model:, input_tokens: 0, output_tokens: 0, response_time: nil, tool_stats: {})
  self.llm_model_name = model
  self.total_input_tokens = (total_input_tokens || 0) + input_tokens
  self.total_output_tokens = (total_output_tokens || 0) + output_tokens
  self.total_queries = (total_queries || 0) + 1
  self.last_query_at = Time.current

  if response_time
    current_avg = average_response_time || 0
    current_count = (total_queries || 1) - 1
    self.average_response_time = ((current_avg * current_count) + response_time) / total_queries
  end

  if tool_stats.present?
    self.total_tool_calls = (total_tool_calls || 0) + (tool_stats[:total_tool_calls] || 0)
    self.total_tool_errors = (total_tool_errors || 0) +
      (tool_stats[:sql_errors] || 0) +
      (tool_stats[:patch_errors] || 0)
  end

  save!
end
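The response-time bookkeeping above is the standard incremental mean: new_avg = (old_avg × n_prev + x) / n. A quick standalone check that the update rule reproduces the plain arithmetic mean:

```ruby
# Incremental mean, as used for average_response_time: fold in one new
# sample given the average over count_before earlier samples.
def update_average(avg, count_before, new_value)
  ((avg * count_before) + new_value) / (count_before + 1).to_f
end

times = [1.0, 3.0, 2.0]
avg = 0.0
times.each_with_index do |t, i|
  avg = update_average(avg, i, t)  # i samples seen before this one
end
```

This avoids storing every response time: only the running average and the query count need to persist on the conversation row.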