Class: CallRecordProcessing::BulkTranscriptionService

Inherits:

Object
  CallRecordProcessing::BulkTranscriptionService


Defined in: app/services/call_record_processing/bulk_transcription_service.rb

Overview

Lightweight transcription service for historical call record backfill.

Uses RubyLLM.transcribe with Gemini's native audio (generateContent) path
instead of AssemblyAI, substantially reducing the cost of transcribing bulk
historical data. Approximate costs, per audio hour unless noted (Gemini
bills audio input at 32 tokens/sec, i.e. 1,920 tokens/min):

AssemblyAI (+ LeMUR): $0.06/call (premium recent-call pipeline)
gpt-4o-mini-transcribe: $0.18/hour ($0.003/min); prior backfill model
gemini-3.1-flash-lite: $0.076/hour (~$0.0013/min); ~2.4× cheaper than gpt-4o-mini-transcribe, current default
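The per-minute, per-hour, and ratio figures in the list above follow from simple arithmetic. The sketch below reproduces them using only the prices quoted in this overview; it does not query any provider's pricing, and the constant names are illustrative, not part of the service.

```ruby
# Worked arithmetic behind the cost figures above. The dollar amounts are
# the ones quoted in this overview; nothing here queries a pricing API.
GEMINI_AUDIO_TOKENS_PER_SEC = 32

TOKENS_PER_MIN  = GEMINI_AUDIO_TOKENS_PER_SEC * 60 # 1,920
TOKENS_PER_HOUR = TOKENS_PER_MIN * 60              # 115,200

OPENAI_PER_MIN  = 0.003                # gpt-4o-mini-transcribe, $/min
OPENAI_PER_HOUR = OPENAI_PER_MIN * 60  # ~$0.18
GEMINI_PER_HOUR = 0.076                # gemini-3.1-flash-lite, $/hour (quoted)
GEMINI_PER_MIN  = GEMINI_PER_HOUR / 60 # ~$0.00127

# How many times cheaper Gemini is per audio hour (~2.37).
SAVINGS_RATIO = OPENAI_PER_HOUR / GEMINI_PER_HOUR

puts format('Gemini audio tokens per hour: %d', TOKENS_PER_HOUR)
puts format('Gemini cost per minute: $%.4f', GEMINI_PER_MIN)
puts format('Savings vs gpt-4o-mini-transcribe: %.1fx', SAVINGS_RATIO)
```

The ratio works out to about 2.37, which rounds to 2.4× at one decimal place.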

The model comes from AiModelConstants.id(:transcription); pass an explicit
model: (e.g. 'gpt-4o-mini-transcribe') to override. The provider is
inferred from the model id, so the same code path serves both Gemini and
OpenAI transcription models.
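The service's own (private) `provider` method is not shown on this page. As a hypothetical illustration of how a provider could be inferred from a model id, the sketch below maps id prefixes to RubyLLM provider symbols; the prefix table and the `infer_provider` name are assumptions, not the service's actual logic.

```ruby
# Hypothetical sketch: infer a RubyLLM provider symbol from a model id.
# The prefix table is an assumption for illustration only.
PROVIDER_PREFIXES = {
  'gemini'  => :gemini,
  'gpt-'    => :openai,
  'whisper' => :openai
}.freeze

def infer_provider(model_id)
  # find returns nil when no prefix matches, so provider ends up nil too.
  _prefix, provider = PROVIDER_PREFIXES.find { |prefix, _| model_id.start_with?(prefix) }
  provider or raise ArgumentError, "cannot infer provider for #{model_id.inspect}"
end

infer_provider('gemini-3.1-flash-lite')  # => :gemini
infer_provider('gpt-4o-mini-transcribe') # => :openai
infer_provider('whisper-1')              # => :openai
```

Raising on an unknown id, rather than defaulting to one provider, keeps a typo in `model:` from silently routing audio to the wrong API.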

Trade-offs vs AssemblyAI TranscriptionService:

No speaker diarization (plain text transcript, no Speaker A/B labels)
No LeMUR analysis (no ai_summary, action_items, call_phases, etc.)
No PII redaction, custom spelling, or sentiment analysis
Good enough for semantic search and keyword discovery on historical calls

The transcript is still useful for:

Embedding generation (semantic search over call content)
Full-text search (tsvector)
Manual review and keyword discovery

Examples:

Transcribe a single call record

service = CallRecordProcessing::BulkTranscriptionService.new(call_record)
result = service.transcribe
# => { status: :success, word_count: 342, model: DEFAULT_MODEL, duration_secs: call_record.duration_secs }

Force re-transcription with a specific model

service = CallRecordProcessing::BulkTranscriptionService.new(call_record, model: "whisper-1")
result = service.transcribe(force: true)
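For a backfill, the service is typically run over many records and the outcomes tallied. A minimal sketch, assuming only that each result is a Hash with a `:status` key as in the examples; `FakeTranscriptionService` is a hypothetical stand-in so the sketch runs without Rails, and its `:skipped` status name is an assumption.

```ruby
# Hypothetical stand-in for BulkTranscriptionService. It mirrors the
# constructor signature and returns hashes with a :status key.
class FakeTranscriptionService
  def initialize(record, model: 'gpt-4o-mini-transcribe')
    @record = record
    @model = model
  end

  def transcribe(force: false)
    return { status: :skipped, reason: :too_short } if @record[:duration_secs] < 10

    { status: :success, word_count: @record[:words], model: @model }
  end
end

# Transcribe each record and count outcomes by :status.
def backfill(records, service_class:, model:, force: false)
  records.each_with_object(Hash.new(0)) do |record, tally|
    result = service_class.new(record, model: model).transcribe(force: force)
    tally[result[:status]] += 1
  end
end

records = [
  { duration_secs: 120, words: 342 },
  { duration_secs: 3,   words: 0 },
  { duration_secs: 95,  words: 210 }
]
tally = backfill(records, service_class: FakeTranscriptionService, model: 'whisper-1')
# tally => { success: 2, skipped: 1 }
```

Passing the service class in makes it easy to swap the real service back in once the loop runs inside the Rails app.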

Constant Summary

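MIN_DURATION_SECONDS = Minimum call duration, in seconds, required for transcription; shorter calls are skipped as too short.

MIN_DURATION_SECONDS_VOICEMAIL = Minimum duration, in seconds, applied to voicemail recordings.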

DEFAULT_MODEL = Default transcription model, sourced from the canonical model registry (AiModelConstants).

AiModelConstants.id(:transcription)

MAX_FILE_SIZE_GEMINI = Maximum raw audio file size for Gemini models. Gemini transcription inlines the audio as base64 in the request body (RubyLLM's generateContent path), and base64 inflates size by about 33% (4 output bytes per 3 input bytes), so the raw file must stay under Gemini's ~20MB inline-request cap: 15MB raw encodes to about 20MB.

15.megabytes

MAX_FILE_SIZE_OPENAI = Maximum audio file size for OpenAI models. OpenAI's transcription endpoint accepts uploads up to 25MB.

25.megabytes
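The 15MB Gemini limit follows from base64 overhead. The sketch below checks that arithmetic and models a per-provider limit check; `base64_size` and `within_limit?` are hypothetical helpers, and the hash of limits simply mirrors the two constants above.

```ruby
# ActiveSupport's Integer#megabytes is binary: 1.megabyte == 1024 * 1024.
MB = 1024 * 1024

# Per-provider raw-size limits, mirroring the constants above.
MAX_FILE_SIZE = { gemini: 15 * MB, openai: 25 * MB }.freeze

# Strict base64 (no line breaks) emits 4 bytes for every 3 input bytes,
# padding the final group: encoded size = 4 * ceil(n / 3).
def base64_size(raw_bytes)
  4 * ((raw_bytes + 2) / 3)
end

# Hypothetical helper: the service rejects files strictly larger than the limit.
def within_limit?(raw_bytes, provider)
  raw_bytes <= MAX_FILE_SIZE.fetch(provider)
end

base64_size(15 * MB) == 20 * MB                   # => true
base64_size(10) == ['x' * 10].pack('m0').bytesize # => true
within_limit?(20 * MB, :gemini)                   # => false
within_limit?(20 * MB, :openai)                   # => true
```

A 15MB file encodes to exactly 20MB (binary megabytes), so the Gemini limit leaves little headroom for the rest of the request body.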

Instance Attribute Summary

#call_record ⇒ Object readonly
Returns the value of attribute call_record.
#model ⇒ Object readonly
Returns the value of attribute model.

Instance Method Summary

#initialize(call_record, model: DEFAULT_MODEL) ⇒ BulkTranscriptionService constructor
A new instance of BulkTranscriptionService.
#transcribe(force: false) ⇒ Object
Transcribes the call record's audio and returns a result hash describing the outcome.

Constructor Details

#initialize(call_record, model: DEFAULT_MODEL) ⇒ `BulkTranscriptionService`

Returns a new instance of BulkTranscriptionService.

# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 56

def initialize(call_record, model: DEFAULT_MODEL)
  @call_record = call_record
  @model = model
end

Instance Attribute Details

#call_record ⇒ `Object` (readonly)

Returns the value of attribute call_record.



# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 54

def call_record
  @call_record
end

#model ⇒ `Object` (readonly)

Returns the value of attribute model.



# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 54

def model
  @model
end

Instance Method Details

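#transcribe(force: false) ⇒ `Object`

Transcribes the call record's audio. Returns a skip result if the record is already transcribed (unless force: true), is too short, or has no audio. On success it saves the transcript, sets transcription_state to :completed, enqueues an EmbeddingWorker job, and returns a Hash with :status, :word_count, :model, and :duration_secs. When the transcription call raises, it sets transcription_state to :error and returns an error result.
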
# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 61

def transcribe(force: false)
  return skip_result(:already_transcribed) if already_transcribed? && !force
  return skip_result(:too_short) if too_short?
  return skip_result(:no_audio) unless has_audio?

  temp_file = nil

  begin
    call_record.update!(transcription_state: :processing)

    temp_file = download_audio
    unless temp_file
      # Mark the record as errored so it is not left stuck in :processing.
      call_record.update!(transcription_state: :error)
      return error_result(:no_audio_file, 'Could not download audio')
    end

    file_size = File.size(temp_file)
    if file_size > max_file_size
      call_record.update!(transcription_state: :error)
      return error_result(:file_too_large, "#{file_size} bytes exceeds #{max_file_size} limit")
    end

    # The provider is inferred from the model id (Gemini or OpenAI).
    transcription = RubyLLM.transcribe(
      temp_file,
      model: model,
      provider: provider,
      assume_model_exists: true,
      language: 'en'
    )

    transcript_text = transcription.text.to_s.strip
    if transcript_text.blank?
      call_record.update!(transcription_state: :error)
      return error_result(:empty_transcript, 'Transcription returned empty text')
    end

    call_record.update!(
      transcript: transcript_text,
      transcription_state: :completed,
      transcribed_at: Time.current
    )

    # Queue embedding generation for semantic search.
    EmbeddingWorker.perform_async('CallRecord', call_record.id)

    {
      status: :success,
      word_count: transcript_text.split.size,
      model: model,
      duration_secs: call_record.duration_secs
    }
  rescue RubyLLM::Error => e
    log_error "RubyLLM error: #{e.message}"
    call_record.update!(transcription_state: :error)
    error_result(:transcription_failed, e.message)
  rescue StandardError => e
    log_error "Unexpected error: #{e.message}"
    call_record.update!(transcription_state: :error)
    error_result(:transcription_failed, e.message)
  ensure
    # Always remove the downloaded audio, including on early returns.
    cleanup_temp_file(temp_file) if temp_file
  end
end
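The guard clauses at the top of #transcribe run in a fixed order, and force: true bypasses only the already-transcribed check. The pure-Ruby sketch below mirrors that ordering. It uses a hypothetical `Record` struct in place of the ActiveRecord model and an assumed minimum duration, because the real MIN_DURATION_SECONDS value and the bodies of `already_transcribed?`, `too_short?`, and `has_audio?` are not shown on this page.

```ruby
# Hypothetical stand-in for CallRecord; attribute names are assumptions.
Record = Struct.new(:transcript, :duration_secs, :audio_url, keyword_init: true)

ASSUMED_MIN_DURATION = 10 # placeholder, not the service's real constant

# Mirrors the order of the skip checks in #transcribe; nil means "proceed".
def skip_reason(record, force: false)
  return :already_transcribed if !record.transcript.to_s.empty? && !force
  return :too_short if record.duration_secs.to_i < ASSUMED_MIN_DURATION
  return :no_audio if record.audio_url.to_s.empty?

  nil
end

done = Record.new(transcript: 'hello', duration_secs: 5, audio_url: 'a.mp3')
skip_reason(done)              # => :already_transcribed
skip_reason(done, force: true) # => :too_short
```

Forcing a re-run on a record that is too short or has no audio therefore still produces a skip result rather than an API call.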