Class: Assistant::CostCalculator
- Inherits: Object
  - Object
    - Assistant::CostCalculator
- Defined in: app/services/assistant/cost_calculator.rb
Overview
Calculates the cost of AI model usage based on token counts and provider pricing.
Pricing is per million tokens (USD), sourced from provider pricing pages.
Cache pricing:
Anthropic: cache_read = 0.1x input, cache_write = 1.25x input
OpenAI: cache_read = 0.25x input (gpt-5, gpt-5-mini); the gpt-5.5 entry in MODEL_PRICING is 0.1x input
Gemini: provider-published per-model rates (not a single multiplier) —
e.g. 3.5 Flash cache_read $0.15/M, 3.1 Pro $0.20/M
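As a quick sanity check on the Anthropic multipliers above, the following sketch derives the cache rates from the input rates and can be compared against the MODEL_PRICING rows. The helper name `anthropic_cache_rates` is hypothetical and not part of the class; the input rates are copied from the constant.

```ruby
# Input rates (USD per million tokens) copied from the Anthropic rows of MODEL_PRICING.
anthropic_input_rates = { 'claude-haiku' => 1.00, 'claude-sonnet' => 3.00, 'claude-opus' => 5.00 }

# Hypothetical helper: apply the documented multipliers (0.1x read, 1.25x write).
# Rounding to 4 places hides floating-point noise such as 0.30000000000000004.
def anthropic_cache_rates(input_rate)
  {
    cache_read: (input_rate * 0.1).round(4),
    cache_write: (input_rate * 1.25).round(4)
  }
end

derived = anthropic_input_rates.transform_values { |rate| anthropic_cache_rates(rate) }
derived.each { |model, rates| puts "#{model}: #{rates}" }
```

The derived values match the cache_read and cache_write columns for the three Claude keys. The same approach does not apply to Gemini, whose cache rates are published per model.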
Usage:
Assistant::CostCalculator.cost_for('claude-sonnet', input_tokens: 50_000, output_tokens: 2_000)
=> 0.18
Assistant::CostCalculator.pricing_for('claude-sonnet')
=> { input: 3.00, output: 15.00, cache_read: 0.30, cache_write: 3.75 }
Constant Summary
- MODEL_PRICING =
Per-million-token pricing (USD). Updated Jun 2026 from provider pricing
pages (Gemini 3.5 Flash / Gemini 3.1 Pro). Keys match
Assistant::ChatService::MODELS keys.

  {
    'claude-haiku'  => { input: 1.00, output: 5.00,  cache_read: 0.10,   cache_write: 1.25 },
    'claude-sonnet' => { input: 3.00, output: 15.00, cache_read: 0.30,   cache_write: 3.75 },
    'claude-opus'   => { input: 5.00, output: 25.00, cache_read: 0.50,   cache_write: 6.25 },
    'gpt-5'         => { input: 1.25, output: 10.00, cache_read: 0.3125, cache_write: 0.0 },
    'gpt-5.5'       => { input: 5.00, output: 30.00, cache_read: 0.50,   cache_write: 0.0 },
    'gpt-5-mini'    => { input: 0.25, output: 2.00,  cache_read: 0.0625, cache_write: 0.0 },
    'gemini-flash'  => { input: 1.50, output: 9.00,  cache_read: 0.15,   cache_write: 0.0 },
    'gemini-pro'    => { input: 2.00, output: 12.00, cache_read: 0.20,   cache_write: 0.0 }
  }.freeze
Class Method Summary
- .cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0) ⇒ Float
  Calculate the cost (in USD) for a single response given token counts and model key.
- .pricing_for(model_key) ⇒ Hash?
  Look up pricing for a model key.
Class Method Details
.cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0) ⇒ Float
Calculate the cost (in USD) for a single response given token counts and model key.
RubyLLM 1.15 normalized token accounting across providers: input_tokens
now means "standard input only" — prompt cache reads and writes are
reported separately as cached_tokens and cache_creation_tokens. The
three buckets are additive (no subtraction needed).
# File 'app/services/assistant/cost_calculator.rb', line 48

def self.cost_for(model_key, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0)
  pricing = MODEL_PRICING[model_key]
  return 0.0 unless pricing

  ((input_tokens / 1_000_000.0) * pricing[:input]) +
    ((output_tokens / 1_000_000.0) * pricing[:output]) +
    ((cached_tokens / 1_000_000.0) * pricing[:cache_read]) +
    ((cache_creation_tokens / 1_000_000.0) * pricing[:cache_write])
end
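To illustrate the additive token buckets, here is a minimal standalone sketch that applies the same formula to a response with cache activity. It copies only the 'claude-sonnet' pricing row so it runs outside the application; `sketch_cost` is a hypothetical stand-in for `cost_for`, and the token counts are made up.

```ruby
# Pricing row copied from MODEL_PRICING['claude-sonnet'] (USD per million tokens).
sonnet = { input: 3.00, output: 15.00, cache_read: 0.30, cache_write: 3.75 }

# Hypothetical stand-in for cost_for: each bucket is priced on its own and summed.
def sketch_cost(pricing, input_tokens:, output_tokens:, cached_tokens: 0, cache_creation_tokens: 0)
  buckets = {
    input: input_tokens,
    output: output_tokens,
    cache_read: cached_tokens,
    cache_write: cache_creation_tokens
  }
  buckets.sum { |bucket, tokens| (tokens / 1_000_000.0) * pricing[bucket] }
end

# 10k standard input + 40k cache reads + 5k cache writes + 2k output tokens.
cost = sketch_cost(sonnet,
                   input_tokens: 10_000,
                   output_tokens: 2_000,
                   cached_tokens: 40_000,
                   cache_creation_tokens: 5_000)
puts format('%.5f', cost) # prints 0.09075
```

The four terms are 0.03 (input), 0.03 (output), 0.012 (cache read), and 0.01875 (cache write). Because input_tokens already excludes cached tokens, nothing is subtracted before pricing.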
.pricing_for(model_key) ⇒ Hash?
Look up pricing for a model key.
# File 'app/services/assistant/cost_calculator.rb', line 62

def self.pricing_for(model_key)
  MODEL_PRICING[model_key]
end
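Because the lookup returns nil for an unknown key (and cost_for returns 0.0 in that case), a caller may want to tell "unpriced" apart from "free". One possible approach, sketched below, is a strict lookup that raises. `fetch_pricing!` and the one-row table are hypothetical and not part of the class.

```ruby
# One-row stand-in for MODEL_PRICING so the sketch runs on its own.
pricing_table = {
  'claude-sonnet' => { input: 3.00, output: 15.00, cache_read: 0.30, cache_write: 3.75 }
}

# Hypothetical strict lookup: raise KeyError instead of returning nil.
def fetch_pricing!(table, model_key)
  table.fetch(model_key) do
    raise KeyError, "no pricing configured for #{model_key.inspect}"
  end
end

known = fetch_pricing!(pricing_table, 'claude-sonnet')

# An unknown key surfaces as an error rather than a silent 0.0 cost.
missing = begin
  fetch_pricing!(pricing_table, 'not-a-model')
rescue KeyError => e
  e.message
end

puts known[:input]
puts missing
```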