Class: CallRecordProcessing::BulkTranscriptionService

Inherits:
Object
Defined in:
app/services/call_record_processing/bulk_transcription_service.rb

Overview

Lightweight transcription service for historical call record backfill.

Uses RubyLLM.transcribe with Gemini's native audio (generateContent) path
instead of AssemblyAI, which substantially reduces cost for bulk historical
data. Approximate costs (Gemini bills audio input at 32 tokens/sec ⇒
1,920 tokens/min; AssemblyAI is priced per call, the others per audio-hour):

AssemblyAI (+ LeMUR): $0.06/call (premium recent-call pipeline)
gpt-4o-mini-transcribe: $0.18/hour ($0.003/min), prior backfill model
gemini-3.1-flash-lite: $0.076/hour (~$0.0013/min), ~2.3× cheaper, current
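
The arithmetic behind these figures can be checked with a short sketch. It uses only the rounded prices quoted above; the constant and variable names are illustrative and do not come from the service.

```ruby
# Gemini audio token rate quoted above.
TOKENS_PER_SECOND = 32

tokens_per_minute = TOKENS_PER_SECOND * 60     # 1,920 tokens per audio-minute
tokens_per_hour   = TOKENS_PER_SECOND * 3_600  # 115,200 tokens per audio-hour

# Rounded per-hour prices (USD) taken from the list above.
hourly_cost = {
  "gpt-4o-mini-transcribe" => 0.18,
  "gemini-3.1-flash-lite"  => 0.076
}

# Per-minute cost for each model, rounded to four decimal places.
per_minute = hourly_cost.transform_values { |usd| (usd / 60).round(4) }

# Cost ratio between the prior and the current backfill model.
ratio = (hourly_cost["gpt-4o-mini-transcribe"] / hourly_cost["gemini-3.1-flash-lite"]).round(2)

puts "tokens/min: #{tokens_per_minute}, tokens/hour: #{tokens_per_hour}"
puts "per-minute cost: #{per_minute}"
puts "ratio: #{ratio}"
```

With these rounded inputs the ratio comes out slightly above 2.3, consistent with the approximate figure quoted.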

The model comes from AiModelConstants.id(:transcription); pass an explicit
model: (e.g. 'gpt-4o-mini-transcribe') to override. The provider is
inferred from the model id, so the same code path serves both Gemini and
OpenAI transcription models.
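
A minimal sketch of how a provider might be inferred from a model id. The service's own provider helper is private and not shown on this page, so the method name infer_provider, the prefix rules, and the returned symbols are all assumptions made for illustration.

```ruby
# Hypothetical helper, not the service's actual implementation.
# Maps a transcription model id to a provider symbol by its prefix.
def infer_provider(model_id)
  case model_id.to_s
  when /\Agemini/          then :gemini
  when /\A(gpt-|whisper-)/ then :openai
  else
    # Fail loudly rather than guess a provider for an unrecognised id.
    raise ArgumentError, "Unknown transcription model: #{model_id}"
  end
end

puts infer_provider("gemini-3.1-flash-lite")
puts infer_provider("gpt-4o-mini-transcribe")
```

Raising on an unrecognised id is one possible design choice; a real implementation could equally fall back to a default provider.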

Trade-offs vs AssemblyAI TranscriptionService:

  • No speaker diarization (plain text transcript, no Speaker A/B labels)
  • No LeMUR analysis (no ai_summary, action_items, call_phases, etc.)
  • No PII redaction, custom spelling, or sentiment analysis
  • Good enough for semantic search and keyword discovery on historical calls

The transcript is still useful for:

  • Embedding generation (semantic search over call content)
  • Full-text search (tsvector)
  • Manual review and keyword discovery

Examples:

Transcribe a single call record

service = CallRecordProcessing::BulkTranscriptionService.new(call_record)
result = service.transcribe
# => { status: :success, word_count: 342, model: AiModelConstants.id(:transcription), duration_secs: call_record.duration_secs }

Force re-transcription with a specific model

service = CallRecordProcessing::BulkTranscriptionService.new(call_record, model: "whisper-1")
result = service.transcribe(force: true)

Constant Summary

MIN_DURATION_SECONDS =

Minimum call duration, in seconds, below which a call is skipped as too short.

30
MIN_DURATION_SECONDS_VOICEMAIL =

Minimum duration, in seconds, for voicemail recordings.

5
DEFAULT_MODEL =

Default model — sourced from the canonical registry.

AiModelConstants.id(:transcription)
MAX_FILE_SIZE_GEMINI =

Maximum audio file size. Gemini transcription inlines the audio as base64
in the request body (RubyLLM's generateContent path), and base64 inflates
by ~33%, so the raw file must stay under Gemini's ~20MB inline-request cap
— 15MB raw ≈ 20MB encoded. OpenAI's transcribe endpoint accepts up to 25MB.

15.megabytes
MAX_FILE_SIZE_OPENAI =
25.megabytes
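
The 15MB raw to 20MB encoded relationship follows from base64 emitting 4 output characters for every 3 input bytes. A small sketch of that arithmetic, with illustrative names (base64_size, MEGABYTE) that are not part of the service:

```ruby
# Size of the base64 encoding of raw_bytes input bytes: every 3 input bytes
# become 4 output characters, rounded up to a whole 4-character group.
def base64_size(raw_bytes)
  ((raw_bytes + 2) / 3) * 4
end

MEGABYTE = 1024 * 1024 # same value as ActiveSupport's 1.megabyte

raw_limit     = 15 * MEGABYTE
encoded_limit = base64_size(raw_limit)

# Cross-check the formula against Ruby's own encoder on a small payload.
sample = "x" * 1_000
actual = [sample].pack("m0").bytesize # "m0" is base64 without line breaks

puts "raw: #{raw_limit} bytes, encoded: #{encoded_limit} bytes"
puts "formula matches encoder: #{actual == base64_size(sample.bytesize)}"
```

The encoded size covers only the audio payload; the rest of the request body adds further overhead.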

Constructor Details

#initialize(call_record, model: DEFAULT_MODEL) ⇒ BulkTranscriptionService

Returns a new instance of BulkTranscriptionService.



# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 56

def initialize(call_record, model: DEFAULT_MODEL)
  @call_record = call_record
  @model = model
end

Instance Attribute Details

#call_record ⇒ Object (readonly)

Returns the value of attribute call_record.



# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 54

def call_record
  @call_record
end

#model ⇒ Object (readonly)

Returns the value of attribute model.



# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 54

def model
  @model
end

Instance Method Details

#transcribe(force: false) ⇒ Object



# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 61

def transcribe(force: false)
  # Guard clauses: skip records that need no work or cannot be transcribed.
  return skip_result(:already_transcribed) if already_transcribed? && !force
  return skip_result(:too_short) if too_short?
  return skip_result(:no_audio) unless has_audio?

  begin
    call_record.update!(transcription_state: :processing)

    # NOTE: the early error returns below run after the state is set to
    # :processing; unless error_result resets it, the record stays in that state.
    temp_file = download_audio
    return error_result(:no_audio_file, 'Could not download audio') unless temp_file

    file_size = File.size(temp_file)
    return error_result(:file_too_large, "#{file_size} bytes exceeds #{max_file_size} limit") if file_size > max_file_size

    # The provider is passed explicitly and assume_model_exists skips the
    # model registry lookup.
    transcription = RubyLLM.transcribe(
      temp_file,
      model: model,
      provider: provider,
      assume_model_exists: true,
      language: 'en'
    )

    transcript_text = transcription.text.to_s.strip
    return error_result(:empty_transcript, 'Transcription returned empty text') if transcript_text.blank?

    call_record.update!(
      transcript: transcript_text,
      transcription_state: :completed,
      transcribed_at: Time.current
    )

    # Queue embedding generation for the new transcript.
    EmbeddingWorker.perform_async('CallRecord', call_record.id)

    {
      status: :success,
      word_count: transcript_text.split.size,
      model: model,
      duration_secs: call_record.duration_secs
    }
  rescue RubyLLM::Error => e
    log_error "RubyLLM error: #{e.message}"
    call_record.update!(transcription_state: :error)
    error_result(:transcription_failed, e.message)
  rescue StandardError => e
    log_error "Unexpected error: #{e.message}"
    call_record.update!(transcription_state: :error)
    error_result(:transcription_failed, e.message)
  ensure
    # Runs on every exit path, including the early returns above.
    cleanup_temp_file(temp_file) if temp_file
  end
end
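
For a backfill run, a caller might want to tally the outcomes of many transcribe calls. The sketch below is a hypothetical helper, not part of the service. Only the :success hash shape is confirmed by the method above. The :skipped and :error shapes, and the presence of a :status key on them, are assumptions, since skip_result and error_result are not shown.

```ruby
# Hypothetical helper: summarise a batch of result hashes from #transcribe.
# Assumes every result carries a :status key (confirmed only for :success).
def summarize_results(results)
  counts = results.map { |r| r[:status] }.tally
  words  = results.sum { |r| r[:status] == :success ? r[:word_count].to_i : 0 }
  { counts: counts, total_words: words }
end

# Illustrative inputs; the :skipped and :error hashes are assumed shapes.
results = [
  { status: :success, word_count: 342, model: "gpt-4o-mini-transcribe", duration_secs: 120 },
  { status: :skipped, reason: :too_short },
  { status: :error, code: :file_too_large }
]

summary = summarize_results(results)
puts summary
```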