Class: CallRecordProcessing::BulkTranscriptionService

Inherits:
Object
Defined in:
app/services/call_record_processing/bulk_transcription_service.rb

Overview

Lightweight transcription service for historical call record backfill.

Uses RubyLLM.transcribe (OpenAI Whisper/GPT-4o-mini-transcribe) instead of
AssemblyAI, substantially reducing cost for bulk historical data (by the
figures below, roughly 12x per call and over 100x per minute):

AssemblyAI: $0.37/minute ($0.06/call with LeMUR)
GPT-4o-mini-transcribe: $0.003/minute (~$0.005/call)
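
To make the quoted prices concrete, the sketch below computes the ratios implied by the figures above and an estimated spend for a backfill. The 10,000-call volume and the helper names (cost_ratio, backfill_cost) are illustrative assumptions, not part of the service.

```ruby
# Prices as quoted in the overview above (USD).
ASSEMBLYAI_PER_MINUTE = 0.37
GPT4O_MINI_PER_MINUTE = 0.003
ASSEMBLYAI_PER_CALL = 0.06
GPT4O_MINI_PER_CALL = 0.005

# Hypothetical helper: how many times more expensive `expensive` is than `cheap`.
def cost_ratio(expensive, cheap)
  (expensive / cheap).round(1)
end

# Hypothetical helper: estimated spend for a backfill of `call_count` calls.
def backfill_cost(call_count, price_per_call)
  (call_count * price_per_call).round(2)
end

puts "per-minute ratio: #{cost_ratio(ASSEMBLYAI_PER_MINUTE, GPT4O_MINI_PER_MINUTE)}x"
puts "per-call ratio:   #{cost_ratio(ASSEMBLYAI_PER_CALL, GPT4O_MINI_PER_CALL)}x"

# Assumed volume of 10,000 historical calls, for illustration only.
puts "AssemblyAI:  $#{backfill_cost(10_000, ASSEMBLYAI_PER_CALL)}"
puts "GPT-4o-mini: $#{backfill_cost(10_000, GPT4O_MINI_PER_CALL)}"
```

The per-minute and per-call ratios differ because the per-call AssemblyAI figure includes LeMUR, so which one applies depends on how the backfill is billed.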

Trade-offs vs AssemblyAI TranscriptionService:

  • No speaker diarization (plain text transcript, no Speaker A/B labels)
  • No LeMUR analysis (no ai_summary, action_items, call_phases, etc.)
  • No PII redaction, custom spelling, or sentiment analysis
  • Good enough for semantic search and keyword discovery on historical calls

The transcript is still useful for:

  • Embedding generation (semantic search over call content)
  • Full-text search (tsvector)
  • Manual review and keyword discovery

Examples:

Transcribe a single call record

service = CallRecordProcessing::BulkTranscriptionService.new(call_record)
result = service.transcribe
# => { status: :success, word_count: 342, model: "gpt-4o-mini-transcribe", duration_secs: 187 }

Force re-transcription with a specific model

service = CallRecordProcessing::BulkTranscriptionService.new(call_record, model: "whisper-1")
result = service.transcribe(force: true)
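
For a backfill, the single-record calls above would typically run in a loop that tallies outcomes. The following is a minimal sketch under stated assumptions: FakeTranscriptionService is a hypothetical stand-in so the example runs without Rails, the `:skipped` status and the hash-based records are invented for illustration, and backfill is not a method of the real service.

```ruby
# Hypothetical stand-in for BulkTranscriptionService; it mirrors the
# constructor and #transcribe signatures documented on this page.
class FakeTranscriptionService
  def initialize(call_record, model: "gpt-4o-mini-transcribe")
    @call_record = call_record
    @model = model
  end

  def transcribe(force: false)
    # Assumed skip shape; the real skip_result helper is not shown here.
    return { status: :skipped } if @call_record[:transcribed] && !force

    { status: :success, word_count: 342, model: @model }
  end
end

# Runs the service over each record and counts results by :status.
def backfill(call_records, service_class:, force: false)
  call_records.each_with_object(Hash.new(0)) do |record, tally|
    result = service_class.new(record).transcribe(force: force)
    tally[result[:status]] += 1
  end
end

records = [{ id: 1, transcribed: false }, { id: 2, transcribed: true }]
puts backfill(records, service_class: FakeTranscriptionService).inspect
```

Passing the service class as an argument keeps the loop testable; in the application it would presumably be CallRecordProcessing::BulkTranscriptionService.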

Constant Summary

MIN_DURATION_SECONDS = 30
MIN_DURATION_SECONDS_VOICEMAIL = 5
DEFAULT_MODEL = 'gpt-4o-mini-transcribe'
MAX_FILE_SIZE = 25.megabytes
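
The private predicates that consume these constants are not shown on this page. As a rough sketch of how such thresholds might be applied, assuming a record exposes duration_secs and a voicemail flag (both the `voicemail` attribute and the CallRecordStub type are hypothetical):

```ruby
MIN_DURATION_SECONDS = 30
MIN_DURATION_SECONDS_VOICEMAIL = 5
MAX_FILE_SIZE = 25 * 1024 * 1024 # same value as ActiveSupport's 25.megabytes

# Hypothetical record shape; the real CallRecord is an application model.
CallRecordStub = Struct.new(:duration_secs, :voicemail, keyword_init: true)

# Voicemails get a lower minimum duration than regular calls.
def too_short?(record)
  minimum = record.voicemail ? MIN_DURATION_SECONDS_VOICEMAIL : MIN_DURATION_SECONDS
  record.duration_secs.to_i < minimum
end

# True only when the file is strictly larger than the limit.
def too_large?(byte_size)
  byte_size > MAX_FILE_SIZE
end

puts too_short?(CallRecordStub.new(duration_secs: 10, voicemail: false))
puts too_short?(CallRecordStub.new(duration_secs: 10, voicemail: true))
puts too_large?(MAX_FILE_SIZE + 1)
```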

Constructor Details

#initialize(call_record, model: DEFAULT_MODEL) ⇒ BulkTranscriptionService

Returns a new instance of BulkTranscriptionService.



# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 40

def initialize(call_record, model: DEFAULT_MODEL)
  @call_record = call_record
  @model = model
end

Instance Attribute Details

#call_record ⇒ Object (readonly)

Returns the value of attribute call_record.



# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 38

def call_record
  @call_record
end

#model ⇒ Object (readonly)

Returns the value of attribute model.



# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 38

def model
  @model
end

Instance Method Details

#transcribe(force: false) ⇒ Object

Transcribes the call record's audio and stores the transcript on the record. Skips records that are already transcribed (unless force is true), too short, or missing audio. On success, marks the record as completed, enqueues EmbeddingWorker, and returns a Hash with :status, :word_count, :model, and :duration_secs. On failure, marks the record as :error and returns an error result.



# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 45

def transcribe(force: false)
  return skip_result(:already_transcribed) if already_transcribed? && !force
  return skip_result(:too_short) if too_short?
  return skip_result(:no_audio) unless has_audio?

  begin
    call_record.update!(transcription_state: :processing)

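    temp_file = download_audio
    # NOTE: the early returns in this block leave transcription_state at
    # :processing unless error_result resets it (that helper is not shown here).
    return error_result(:no_audio_file, 'Could not download audio') unless temp_file
    return error_result(:file_too_large, "#{File.size(temp_file)} bytes exceeds 25MB limit") if File.size(temp_file) > MAX_FILE_SIZE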

    transcription = RubyLLM.transcribe(temp_file, model: model, language: 'en')

    transcript_text = transcription.text.to_s.strip
    return error_result(:empty_transcript, 'Transcription returned empty text') if transcript_text.blank?

    call_record.update!(
      transcript: transcript_text,
      transcription_state: :completed,
      transcribed_at: Time.current
    )

    EmbeddingWorker.perform_async('CallRecord', call_record.id)

    {
      status: :success,
      word_count: transcript_text.split.size,
      model: model,
      duration_secs: call_record.duration_secs
    }
  rescue RubyLLM::Error => e
    log_error "RubyLLM error: #{e.message}"
    call_record.update!(transcription_state: :error)
    error_result(:transcription_failed, e.message)
  rescue StandardError => e
    log_error "Unexpected error: #{e.message}"
    call_record.update!(transcription_state: :error)
    error_result(:transcription_failed, e.message)
  ensure
    cleanup_temp_file(temp_file) if temp_file
  end
end
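
The method above relies on an ensure clause so the downloaded audio is removed on every exit path, including early returns. The sketch below isolates that pattern with Ruby's standard Tempfile. The helper name with_audio_tempfile, the error hash shape, and the in-memory bytes standing in for a download are all assumptions for illustration, not the service's actual helpers.

```ruby
require "tempfile"

MAX_BYTES = 25 * 1024 * 1024

# Hypothetical helper: writes audio bytes to a temp file, yields the path if
# the file is within the size limit, and always deletes the file afterwards.
def with_audio_tempfile(bytes, max_bytes: MAX_BYTES)
  file = Tempfile.new(["call_audio", ".mp3"])
  file.binmode
  file.write(bytes)
  file.flush

  # Early return: the ensure clause below still runs.
  return { status: :error, error: :file_too_large } if File.size(file.path) > max_bytes

  { status: :success, value: yield(file.path) }
ensure
  # Closes and unlinks the temp file on success, early return, or exception.
  file&.close!
end

result = with_audio_tempfile("fake audio bytes") { |path| File.size(path) }
puts result.inspect
```

Because the cleanup lives in ensure rather than after each return, adding another guard clause later cannot leak a temp file.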