Class: Assistant::CostCalculator

Inherits:
Object
Defined in:
app/services/assistant/cost_calculator.rb

Overview

Calculates the cost of AI model usage based on token counts and provider pricing.

Pricing is per million tokens (USD), sourced from provider pricing pages.
Cache prices are derived by multiplying the base input price:
Anthropic: cache_read = 0.1x, cache_write = 1.25x
OpenAI: cache_read = 0.25x (cache_write is 0.0 in MODEL_PRICING)
Gemini: cache_read = 0.25x (cache_write is 0.0 in MODEL_PRICING)
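
The multipliers above can be checked against the cache_read and cache_write columns of MODEL_PRICING. The following is a minimal sketch only: CACHE_MULTIPLIERS and derive_cache_prices are hypothetical names that do not exist in Assistant::CostCalculator, and the write multiplier of 0.0 for OpenAI and Gemini is an assumption read off the cache_write: 0.0 entries.

```ruby
# Hypothetical helper, not part of Assistant::CostCalculator.
# Read multipliers come from the overview; a write multiplier of 0.0 is an
# assumption based on the cache_write: 0.0 entries in MODEL_PRICING.
CACHE_MULTIPLIERS = {
  anthropic: { read: 0.1,  write: 1.25 },
  openai:    { read: 0.25, write: 0.0 },
  gemini:    { read: 0.25, write: 0.0 }
}.freeze

# Derive per-million-token cache prices (USD) from a base input price.
# Rounded to 4 decimals to hide floating-point noise (3.0 * 0.1 is not exactly 0.3).
def derive_cache_prices(provider, input_price)
  m = CACHE_MULTIPLIERS.fetch(provider)
  {
    cache_read:  (input_price * m[:read]).round(4),
    cache_write: (input_price * m[:write]).round(4)
  }
end

derive_cache_prices(:anthropic, 3.00) # claude-sonnet base input price
derive_cache_prices(:openai, 1.25)    # gpt-5 base input price
```

For the claude-sonnet base price of 3.00 this reproduces the 0.30 and 3.75 listed in the pricing table.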

Usage:
  Assistant::CostCalculator.cost_for('claude-sonnet', input_tokens: 50_000, output_tokens: 2_000)
  # => 0.18 (approximately; the returned Float is not rounded)

  Assistant::CostCalculator.pricing_for('claude-sonnet')
  # => { input: 3.00, output: 15.00, cache_read: 0.30, cache_write: 3.75 }

Constant Summary

MODEL_PRICING =

Per-million-token pricing (USD). Updated Mar 2026 from provider pricing pages.
Keys match Assistant::ChatService::MODELS keys.

{
  'claude-haiku'  => { input: 1.00,  output: 5.00,  cache_read: 0.10,   cache_write: 1.25 },
  'claude-sonnet' => { input: 3.00,  output: 15.00, cache_read: 0.30,   cache_write: 3.75 },
  'claude-opus'   => { input: 5.00,  output: 25.00, cache_read: 0.50,   cache_write: 6.25 },
  'gpt-5'         => { input: 1.25,  output: 10.00, cache_read: 0.3125, cache_write: 0.0 },
  'gpt-5.4'       => { input: 2.50,  output: 20.00, cache_read: 0.625,  cache_write: 0.0 },
  'gpt-5-mini'    => { input: 0.25,  output: 2.00,  cache_read: 0.0625, cache_write: 0.0 },
  'gemini-flash'  => { input: 0.30,  output: 2.50,  cache_read: 0.075,  cache_write: 0.0 },
  'gemini-pro'    => { input: 1.25,  output: 10.00, cache_read: 0.3125, cache_write: 0.0 }
}.freeze
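
Note that .freeze on a Hash is shallow in Ruby: it prevents adding or removing model keys, but the nested per-model hashes stay mutable. The sketch below illustrates this with a one-entry stand-in table; SAMPLE_PRICING and DEEP_FROZEN_PRICING are hypothetical names, and the deep-freeze step is one possible hardening, not something the source file is known to do.

```ruby
# Stand-in table with one entry, shaped like MODEL_PRICING.
SAMPLE_PRICING = {
  'claude-sonnet' => { input: 3.00, output: 15.00, cache_read: 0.30, cache_write: 3.75 }
}.freeze

SAMPLE_PRICING.frozen?                  # true: the outer hash cannot gain or lose keys
SAMPLE_PRICING['claude-sonnet'].frozen? # false: the inner hash is still mutable

# One possible way to freeze the nested hashes as well (an assumption, not the
# project's code): freeze a copy of each value, then freeze the outer hash.
DEEP_FROZEN_PRICING = SAMPLE_PRICING.transform_values { |prices| prices.dup.freeze }.freeze
```

With the deep-frozen variant, an accidental write such as DEEP_FROZEN_PRICING['claude-sonnet'][:input] = 0 raises FrozenError instead of silently changing prices for every caller.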

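Class Method Summary

  • .cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0) ⇒ Float

    Calculate the cost (in USD) for a single response given token counts and model key.

  • .pricing_for(model_key) ⇒ Hash?

    Look up pricing for a model key.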

Class Method Details

.cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0) ⇒ Float

Calculate the cost (in USD) for a single response given token counts and model key.

Parameters:

  • model_key (String)

    Key from ChatService::MODELS (e.g. 'claude-sonnet')

  • input_tokens (Integer)

    Total input tokens, including any cached_tokens

  • output_tokens (Integer)

    Total output tokens

  • cached_tokens (Integer) (defaults to: 0)

    Subset of input_tokens served from cache, billed at the cheaper cache_read rate

  • cache_creation_tokens (Integer) (defaults to: 0)

    Additional tokens written to cache, billed at the cache_write rate

Returns:

  • (Float)

    Cost in USD (unrounded); 0.0 if model_key is not in MODEL_PRICING



# File 'app/services/assistant/cost_calculator.rb', line 41

def self.cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0)
  pricing = MODEL_PRICING[model_key]
  return 0.0 unless pricing

  # Cached tokens are a subset of input tokens billed at the cache_read rate.
  # Cache creation tokens are additional tokens billed at the cache_write rate.
  # Regular input = total input - cached (the non-cached portion at full price).
  regular_input = [input_tokens - cached_tokens, 0].max

  cost = 0.0
  cost += (regular_input / 1_000_000.0) * pricing[:input]
  cost += (output_tokens / 1_000_000.0) * pricing[:output]
  cost += (cached_tokens / 1_000_000.0) * pricing[:cache_read]
  cost += (cache_creation_tokens / 1_000_000.0) * pricing[:cache_write]
  cost
end
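
To make the cache handling concrete, the sketch below repeats the same arithmetic as cost_for but returns each term separately. It is an illustration under stated assumptions: SONNET_PRICING is a stand-in copy of the 'claude-sonnet' row, and cost_breakdown is a hypothetical helper that does not exist in the class.

```ruby
# Stand-in for the 'claude-sonnet' row of MODEL_PRICING (USD per million tokens).
SONNET_PRICING = { input: 3.00, output: 15.00, cache_read: 0.30, cache_write: 3.75 }.freeze

# Same arithmetic as cost_for, returned per component so each term is visible.
# cost_breakdown is a hypothetical helper used only for this illustration.
def cost_breakdown(pricing, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0)
  # Cached tokens are a subset of input tokens; never let the remainder go negative.
  regular_input = [input_tokens - cached_tokens, 0].max
  {
    input:       (regular_input / 1_000_000.0) * pricing[:input],
    output:      (output_tokens / 1_000_000.0) * pricing[:output],
    cache_read:  (cached_tokens / 1_000_000.0) * pricing[:cache_read],
    cache_write: (cache_creation_tokens / 1_000_000.0) * pricing[:cache_write]
  }
end

# 50,000 input tokens of which 40,000 were cache hits, plus 2,000 output tokens:
#   input:      10,000 tokens at $3.00 per million
#   output:      2,000 tokens at $15.00 per million
#   cache_read: 40,000 tokens at $0.30 per million
parts = cost_breakdown(SONNET_PRICING, input_tokens: 50_000, output_tokens: 2_000, cached_tokens: 40_000)
total = parts.values.sum
puts format('%.4f', total) # roughly 0.072 USD, versus roughly 0.18 with no cache hits
```

The clamp on regular_input only protects the full-price term: if a caller passes cached_tokens larger than input_tokens, all cached tokens are still billed at the cache_read rate.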

.pricing_for(model_key) ⇒ Hash?

Look up pricing for a model key.

Parameters:

  • model_key (String)

    Key from ChatService::MODELS (e.g. 'claude-sonnet')

Returns:

  • (Hash, nil)

    { input:, output:, cache_read:, cache_write: } in USD per million tokens, or nil if model_key is not in MODEL_PRICING



# File 'app/services/assistant/cost_calculator.rb', line 62

def self.pricing_for(model_key)
  MODEL_PRICING[model_key]
end
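
Both class methods treat an unknown model key quietly: pricing_for returns nil and cost_for returns 0.0. A caller that needs to tell "free" apart from "unpriced" might prefer a strict lookup. The following is a sketch under assumptions: KNOWN_PRICING is a one-entry stand-in for MODEL_PRICING, and lookup_pricing and lookup_pricing! are hypothetical names, not methods of Assistant::CostCalculator.

```ruby
# Stand-in for MODEL_PRICING with a single entry; names here are illustrative.
KNOWN_PRICING = {
  'gpt-5-mini' => { input: 0.25, output: 2.00, cache_read: 0.0625, cache_write: 0.0 }
}.freeze

# Mirrors pricing_for: a plain Hash#[] lookup, so unknown keys give nil.
def lookup_pricing(model_key)
  KNOWN_PRICING[model_key]
end

# Hypothetical strict variant: raise instead of silently treating an
# unknown model as costing nothing.
def lookup_pricing!(model_key)
  KNOWN_PRICING.fetch(model_key) { raise ArgumentError, "no pricing for #{model_key.inspect}" }
end

lookup_pricing('gpt-5-mini')    # the pricing hash
lookup_pricing('unknown-model') # nil
```

Whether a missing price should raise or fall back to 0.0 is a design choice; the silent fallback keeps request handling from failing when a new model is added before its pricing entry.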