Class: Assistant::CostCalculator

Inherits:

Object

Defined in: app/services/assistant/cost_calculator.rb

Overview

Calculates the cost of AI model usage based on token counts and provider pricing.

Pricing is per million tokens (USD), sourced from provider pricing pages.
Cache pricing:
Anthropic: cache_read = 0.1x input, cache_write = 1.25x input
OpenAI: cache_read = 0.25x input for gpt-5 and gpt-5-mini (the gpt-5.5 entry lists $0.50/M, i.e. 0.1x input)
Gemini: provider-published per-model rates rather than a single multiplier,
e.g. 3.5 Flash cache_read $0.15/M, 3.1 Pro cache_read $0.20/M

Usage:
Assistant::CostCalculator.cost_for('claude-sonnet', input_tokens: 50_000, output_tokens: 2_000)
# => ~0.18  (0.05M input * $3.00 + 0.002M output * $15.00 = $0.15 + $0.03)

A::CostCalculator.pricing_for('claude-sonnet')
# => { input: 3.00, output: 15.00, cache_read: 0.30, cache_write: 3.75 }

Constant Summary

MODEL_PRICING: per-million-token pricing (USD), updated Jun 2026 from provider pricing pages (the Gemini entries reflect Gemini 3.5 Flash and Gemini 3.1 Pro). Keys match the Assistant::ChatService::MODELS keys.

{
  'claude-haiku'  => { input: 1.00,  output: 5.00,  cache_read: 0.10,   cache_write: 1.25 },
  'claude-sonnet' => { input: 3.00,  output: 15.00, cache_read: 0.30,   cache_write: 3.75 },
  'claude-opus'   => { input: 5.00,  output: 25.00, cache_read: 0.50,   cache_write: 6.25 },
  'gpt-5'         => { input: 1.25,  output: 10.00, cache_read: 0.3125, cache_write: 0.0 },
  'gpt-5.5'       => { input: 5.00,  output: 30.00, cache_read: 0.50,   cache_write: 0.0 },
  'gpt-5-mini'    => { input: 0.25,  output: 2.00,  cache_read: 0.0625, cache_write: 0.0 },
  'gemini-flash'  => { input: 1.50,  output: 9.00,  cache_read: 0.15,   cache_write: 0.0 },
  'gemini-pro'    => { input: 2.00,  output: 12.00, cache_read: 0.20,   cache_write: 0.0 }
}.freeze
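As a hedged sanity check, the Ruby sketch below compares each entry against the cache multipliers stated in the Overview (Anthropic 0.1x read / 1.25x write, OpenAI 0.25x read). The helper name `multiplier_mismatches`, the prefix-based provider detection, and treating OpenAI's cache_write as 0.0x are assumptions for illustration, not part of the class. Gemini entries are skipped because the Overview says they use per-model rates rather than a single multiplier.

```ruby
# Hypothetical audit helper (not part of Assistant::CostCalculator): checks
# MODEL_PRICING-style entries against the Overview's cache multipliers.
PRICING = {
  'claude-haiku'  => { input: 1.00, output: 5.00,  cache_read: 0.10,   cache_write: 1.25 },
  'claude-sonnet' => { input: 3.00, output: 15.00, cache_read: 0.30,   cache_write: 3.75 },
  'claude-opus'   => { input: 5.00, output: 25.00, cache_read: 0.50,   cache_write: 6.25 },
  'gpt-5'         => { input: 1.25, output: 10.00, cache_read: 0.3125, cache_write: 0.0 },
  'gpt-5.5'       => { input: 5.00, output: 30.00, cache_read: 0.50,   cache_write: 0.0 },
  'gpt-5-mini'    => { input: 0.25, output: 2.00,  cache_read: 0.0625, cache_write: 0.0 },
  'gemini-flash'  => { input: 1.50, output: 9.00,  cache_read: 0.15,   cache_write: 0.0 },
  'gemini-pro'    => { input: 2.00, output: 12.00, cache_read: 0.20,   cache_write: 0.0 }
}.freeze

# Expected multipliers keyed by model-key prefix (assumed from the Overview).
MULTIPLIERS = {
  'claude' => { cache_read: 0.10, cache_write: 1.25 },
  'gpt'    => { cache_read: 0.25, cache_write: 0.0 }
}.freeze

TOLERANCE = 1e-9 # absorbs Float rounding such as 3.00 * 0.1

# Returns the keys whose cache rates do not match input * multiplier.
def multiplier_mismatches(pricing)
  pricing.each_with_object([]) do |(key, rates), bad|
    expected = MULTIPLIERS[key.split('-').first]
    next unless expected # e.g. Gemini: per-model rates, nothing to check

    ok = expected.all? do |bucket, factor|
      (rates[bucket] - rates[:input] * factor).abs < TOLERANCE
    end
    bad << key unless ok
  end
end

p multiplier_mismatches(PRICING)
```

Against the table above this prints ["gpt-5.5"]: its cache_read of 0.50 is 0.1x input rather than 0.25x. The check cannot tell whether the table entry or the stated multiplier is the stale one; it only surfaces the disagreement.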

Class Method Summary

.cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0) ⇒ Float
Calculate the cost (in USD) for a single response given token counts and model key.
.pricing_for(model_key) ⇒ Hash, nil
Look up pricing for a model key.

Class Method Details

.cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0) ⇒ Float

Calculate the cost (in USD) for a single response given token counts and model key.

RubyLLM 1.15 normalized token accounting across providers: input_tokens
now means standard (non-cached) input only, while prompt-cache reads and
writes are reported separately as cached_tokens and cache_creation_tokens.
Because the three input buckets do not overlap, their costs simply add; no
subtraction is needed.
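To make the additive accounting concrete, here is a standalone Ruby sketch that applies the same per-bucket formula as cost_for to the claude-sonnet rates. The token counts are invented for illustration, and `cost_breakdown` is a hypothetical helper, not part of the class; it returns each bucket's cost separately so the sum is visible.

```ruby
# Hypothetical helper mirroring cost_for's formula: the token buckets are
# disjoint, so each is priced at its own per-million rate and then summed.
SONNET = { input: 3.00, output: 15.00, cache_read: 0.30, cache_write: 3.75 }.freeze

def cost_breakdown(rates, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0)
  {
    input:       input_tokens          / 1_000_000.0 * rates[:input],
    output:      output_tokens         / 1_000_000.0 * rates[:output],
    cache_read:  cached_tokens         / 1_000_000.0 * rates[:cache_read],
    cache_write: cache_creation_tokens / 1_000_000.0 * rates[:cache_write]
  }
end

# Illustrative request: 10k fresh input, 40k read from cache, 8k written to cache.
parts = cost_breakdown(SONNET, input_tokens: 10_000, output_tokens: 2_000,
                               cached_tokens: 40_000, cache_creation_tokens: 8_000)
total = parts.values.sum

parts.each { |bucket, usd| puts format('%-11s $%.4f', bucket, usd) }
puts format('total       $%.4f', total)
```

This prints a total of $0.1020 ($0.03 input + $0.03 output + $0.012 cache read + $0.03 cache write). Had the 40k cached tokens been billed at the standard input rate instead, they alone would cost $0.12 rather than $0.012.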

Parameters:

model_key (String) —
Key from ChatService::MODELS (e.g. 'claude-sonnet')
input_tokens (Integer) —
Standard (non-cached) input tokens
output_tokens (Integer) —
Output tokens
cached_tokens (Integer) (defaults to: 0) —
Tokens served from cache (cache-read rate)
cache_creation_tokens (Integer) (defaults to: 0) —
Tokens written to cache (cache-write rate)

Returns:

(Float) —
Cost in USD

# File 'app/services/assistant/cost_calculator.rb', line 48

def self.cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0)
  pricing = MODEL_PRICING[model_key]
  # Unknown model keys are priced at zero instead of raising.
  return 0.0 unless pricing

  # The token buckets are disjoint, so each is priced at its own
  # per-million rate and the results are summed.
  ((input_tokens / 1_000_000.0) * pricing[:input]) +
    ((output_tokens         / 1_000_000.0) * pricing[:output]) +
    ((cached_tokens         / 1_000_000.0) * pricing[:cache_read]) +
    ((cache_creation_tokens / 1_000_000.0) * pricing[:cache_write])
end

.pricing_for(model_key) ⇒ Hash, nil

Look up pricing for a model key.
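The source of pricing_for is not shown on this page. A plausible minimal sketch, assuming it is a plain lookup into MODEL_PRICING that returns nil for unknown keys (consistent with the nil-able return type and with cost_for falling back to 0.0 when the same lookup fails), might look like this. The module name `CostCalculatorSketch` and the trimmed table are illustrative only.

```ruby
# Hypothetical stand-in for Assistant::CostCalculator: only the lookup is
# sketched, using a trimmed copy of MODEL_PRICING.
module CostCalculatorSketch
  MODEL_PRICING = {
    'claude-sonnet' => { input: 3.00, output: 15.00, cache_read: 0.30,   cache_write: 3.75 },
    'gpt-5-mini'    => { input: 0.25, output: 2.00,  cache_read: 0.0625, cache_write: 0.0 }
  }.freeze

  # Returns the rate hash for a known key, or nil for an unknown one.
  def self.pricing_for(model_key)
    MODEL_PRICING[model_key]
  end
end

CostCalculatorSketch.pricing_for('claude-sonnet') # returns the claude-sonnet rate hash
CostCalculatorSketch.pricing_for('no-such-model') # => nil
```

An alternative would be MODEL_PRICING.fetch(model_key), which raises KeyError for unknown keys; the nil-returning Hash#[] lookup matches cost_for's silent 0.0 fallback instead.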