Class: Assistant::CostCalculator

Inherits:
Object
Defined in:
app/services/assistant/cost_calculator.rb

Overview

Calculates the cost of AI model usage based on token counts and provider pricing.

Pricing is per million tokens (USD), sourced from provider pricing pages.
Cache pricing:
Anthropic: cache_read = 0.1x input, cache_write = 1.25x input
OpenAI: cache_read = 0.25x input (gpt-5, gpt-5-mini); gpt-5.5 is listed at 0.1x input
Gemini: provider-published per-model rates (not a single multiplier) —
e.g. 3.5 Flash cache_read $0.15/M, 3.1 Pro $0.20/M
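
As a rough illustration of the multiplier rules above, the sketch below derives cache rates from an input rate. CACHE_MULTIPLIERS and derive_cache_rates are hypothetical names introduced for this example and are not part of Assistant::CostCalculator. The OpenAI cache_write value of 0.0 is an assumption that mirrors the MODEL_PRICING table, and Gemini is omitted because its rates are published per model rather than as a multiplier.

```ruby
# Hypothetical helper, not part of Assistant::CostCalculator.
# Multipliers follow the cache pricing notes; the OpenAI cache_write
# value of 0.0 is assumed from the MODEL_PRICING table.
CACHE_MULTIPLIERS = {
  anthropic: { cache_read: 0.1,  cache_write: 1.25 },
  openai:    { cache_read: 0.25, cache_write: 0.0 }
}.freeze

# Derive per-million-token cache rates (USD) from a provider's input rate.
# Rounding to 4 places hides floating-point noise such as 0.30000000000000004.
def derive_cache_rates(provider, input_rate)
  multipliers = CACHE_MULTIPLIERS.fetch(provider)
  multipliers.transform_values { |m| (input_rate * m).round(4) }
end

p derive_cache_rates(:anthropic, 3.00) # claude-sonnet input rate
p derive_cache_rates(:openai, 1.25)    # gpt-5 input rate
```

The derived values line up with the 'claude-sonnet' and 'gpt-5' rows of MODEL_PRICING.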

Usage:
Assistant::CostCalculator.cost_for('claude-sonnet', input_tokens: 50_000, output_tokens: 2_000)
# => 0.18

A::CostCalculator.pricing_for('claude-sonnet')
# => { input: 3.00, output: 15.00, cache_read: 0.30, cache_write: 3.75 }

Constant Summary

MODEL_PRICING =

Per-million-token pricing (USD). Updated Jun 2026 from provider pricing
pages (Gemini 3.5 Flash / Gemini 3.1 Pro). Keys match
Assistant::ChatService::MODELS keys.

{
  'claude-haiku'  => { input: 1.00,  output: 5.00,  cache_read: 0.10,   cache_write: 1.25 },
  'claude-sonnet' => { input: 3.00,  output: 15.00, cache_read: 0.30,   cache_write: 3.75 },
  'claude-opus'   => { input: 5.00,  output: 25.00, cache_read: 0.50,   cache_write: 6.25 },
  'gpt-5'         => { input: 1.25,  output: 10.00, cache_read: 0.3125, cache_write: 0.0 },
  'gpt-5.5'       => { input: 5.00,  output: 30.00, cache_read: 0.50,   cache_write: 0.0 },
  'gpt-5-mini'    => { input: 0.25,  output: 2.00,  cache_read: 0.0625, cache_write: 0.0 },
  'gemini-flash'  => { input: 1.50,  output: 9.00,  cache_read: 0.15,   cache_write: 0.0 },
  'gemini-pro'    => { input: 2.00,  output: 12.00, cache_read: 0.20,   cache_write: 0.0 }
}.freeze
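
Since the pricing keys are meant to match Assistant::ChatService::MODELS, a small drift check can flag models that were added without a pricing row. The sketch below is only an assumption about how such a check might look: MODELS_SKETCH, PRICING_SKETCH, and pricing_drift are hypothetical stand-ins, and the placeholder hashes do not reflect the real constants.

```ruby
# Hypothetical stand-ins: the real constants live in Assistant::ChatService
# and Assistant::CostCalculator. The values here are placeholders.
MODELS_SKETCH  = { 'claude-sonnet' => {}, 'gpt-5' => {}, 'gemini-pro' => {} }.freeze
PRICING_SKETCH = { 'claude-sonnet' => {}, 'gpt-5' => {} }.freeze

# Returns model keys that have no pricing row (unpriced) and
# pricing rows that have no matching model (orphaned).
def pricing_drift(models, pricing)
  { unpriced: models.keys - pricing.keys, orphaned: pricing.keys - models.keys }
end

p pricing_drift(MODELS_SKETCH, PRICING_SKETCH)
```

In a test suite, both lists would be expected to be empty when the two constants are in sync.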

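Class Method Summary

  • .cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0) ⇒ Float

    Calculate the cost (in USD) for a single response given token counts and model key.

  • .pricing_for(model_key) ⇒ Hash?

    Look up pricing for a model key.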

Class Method Details

.cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0) ⇒ Float

Calculate the cost (in USD) for a single response given token counts and model key.

RubyLLM 1.15 normalized token accounting across providers: input_tokens
now means "standard input only" — prompt cache reads and writes are
reported separately as cached_tokens and cache_creation_tokens. The
three buckets are additive (no subtraction needed).

Parameters:

  • model_key (String)

    Key from ChatService::MODELS (e.g. 'claude-sonnet')

  • input_tokens (Integer)

    Standard (non-cached) input tokens

  • output_tokens (Integer)

    Output tokens

  • cached_tokens (Integer) (defaults to: 0)

    Tokens served from cache (cache-read rate)

  • cache_creation_tokens (Integer) (defaults to: 0)

    Tokens written to cache (cache-write rate)

Returns:

  • (Float)

    Cost in USD; 0.0 when model_key is not present in MODEL_PRICING



# File 'app/services/assistant/cost_calculator.rb', line 48

def self.cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0)
  pricing = MODEL_PRICING[model_key]
  # Unknown model keys cost 0.0 rather than raising.
  return 0.0 unless pricing

  # Each bucket is billed at its own per-million-token rate, then summed.
  ((input_tokens          / 1_000_000.0) * pricing[:input]) +
    ((output_tokens         / 1_000_000.0) * pricing[:output]) +
    ((cached_tokens         / 1_000_000.0) * pricing[:cache_read]) +
    ((cache_creation_tokens / 1_000_000.0) * pricing[:cache_write])
end
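
The additive accounting described for .cost_for can be sketched with a request that touches all four buckets. CostSketch is a hypothetical stand-in so the example runs without the application loaded; only the 'claude-sonnet' row is copied from MODEL_PRICING, and the token counts are made up for illustration.

```ruby
# Stand-in for Assistant::CostCalculator so the example runs on its own.
# Only the 'claude-sonnet' row is copied from MODEL_PRICING.
module CostSketch
  PRICING = {
    'claude-sonnet' => { input: 3.00, output: 15.00, cache_read: 0.30, cache_write: 3.75 }
  }.freeze

  def self.cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0)
    pricing = PRICING[model_key]
    return 0.0 unless pricing

    buckets = {
      input: input_tokens, output: output_tokens,
      cache_read: cached_tokens, cache_write: cache_creation_tokens
    }
    # Each bucket is billed at its own per-million rate, then summed.
    buckets.sum { |bucket, tokens| (tokens / 1_000_000.0) * pricing[bucket] }
  end
end

cost = CostSketch.cost_for('claude-sonnet',
                           input_tokens: 50_000, output_tokens: 2_000,
                           cached_tokens: 200_000, cache_creation_tokens: 10_000)
puts cost.round(4) # 0.15 + 0.03 + 0.06 + 0.0375 = 0.2775
```

Rounding before display keeps floating-point noise out of reports; the method itself returns the unrounded Float.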

.pricing_for(model_key) ⇒ Hash?

Look up pricing for a model key.

Parameters:

  • model_key (String)

    Key from ChatService::MODELS (e.g. 'claude-sonnet')

Returns:

  • (Hash, nil)

    { input:, output:, cache_read:, cache_write: } in USD per million tokens, or nil if the key is not in MODEL_PRICING



# File 'app/services/assistant/cost_calculator.rb', line 62

def self.pricing_for(model_key)
  MODEL_PRICING[model_key]
end
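
Because .cost_for returns 0.0 for an unknown key while .pricing_for returns nil, a caller that needs to tell a free request from an unpriced model may want to check pricing first. The following is a minimal sketch under that assumption: PricingSketch and cost_or_nil are hypothetical names, and the single pricing row is copied from MODEL_PRICING.

```ruby
# Stand-in for Assistant::CostCalculator; one row copied from MODEL_PRICING.
module PricingSketch
  PRICING = {
    'gpt-5-mini' => { input: 0.25, output: 2.00, cache_read: 0.0625, cache_write: 0.0 }
  }.freeze

  def self.pricing_for(model_key)
    PRICING[model_key]
  end
end

# Hypothetical caller-side helper: returns nil (not 0.0) for unpriced models.
def cost_or_nil(model_key, input_tokens:, output_tokens:)
  pricing = PricingSketch.pricing_for(model_key)
  return nil unless pricing

  ((input_tokens / 1_000_000.0) * pricing[:input]) +
    ((output_tokens / 1_000_000.0) * pricing[:output])
end

p cost_or_nil('gpt-5-mini', input_tokens: 1_000_000, output_tokens: 500_000) # 0.25 + 1.00
p cost_or_nil('not-a-model', input_tokens: 1_000, output_tokens: 1_000)      # nil
```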