Class: CallRecordProcessing::BulkTranscriptionService

Inherits: Object

Defined in: app/services/call_record_processing/bulk_transcription_service.rb

Overview

Lightweight transcription service for historical call record backfill.

Uses RubyLLM.transcribe (OpenAI Whisper/GPT-4o-mini-transcribe) instead of
AssemblyAI, providing ~60x cost reduction for bulk historical data:

AssemblyAI: $0.37/minute ($0.06/call with LeMUR)
GPT-4o-mini-transcribe: $0.003/minute (~$0.005/call)
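
As a rough sanity check, the per-minute rate above can be turned into a backfill estimate. This is a minimal sketch, not part of the service: `RATE_PER_MINUTE` is taken from the figure above, while `estimated_cost` and the sample durations are hypothetical.

```ruby
# Per-minute rate quoted above for GPT-4o-mini-transcribe (USD).
RATE_PER_MINUTE = 0.003

# Hypothetical helper: estimates the transcription cost in USD for a list
# of call durations given in seconds.
def estimated_cost(durations_secs)
  total_minutes = durations_secs.sum / 60.0
  (total_minutes * RATE_PER_MINUTE).round(4)
end

estimated_cost([60, 120, 90]) # 4.5 minutes of audio => 0.0135
```

At this rate, the quoted ~$0.005/call corresponds to an average call of roughly 100 seconds.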

Trade-offs vs AssemblyAI TranscriptionService:

No speaker diarization (plain text transcript, no Speaker A/B labels)
No LeMUR analysis (no ai_summary, action_items, call_phases, etc.)
No PII redaction, custom spelling, or sentiment analysis
Good enough for semantic search and keyword discovery on historical calls

The transcript is still useful for:

Embedding generation (semantic search over call content)
Full-text search (tsvector)
Manual review and keyword discovery
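
Keyword discovery on a plain-text transcript could look something like the sketch below. It is illustrative only: `keyword_counts` and its `min_length` parameter are hypothetical and not part of this service.

```ruby
# Hypothetical helper: counts how often each word of at least min_length
# characters appears in a transcript, most frequent first (ties are
# ordered alphabetically).
def keyword_counts(transcript, min_length: 4)
  transcript
    .downcase
    .scan(/[a-z']+/)                          # split into lowercase words
    .select { |word| word.length >= min_length }
    .tally                                    # word => occurrences
    .sort_by { |word, count| [-count, word] }
    .to_h
end

keyword_counts('Refund the refund, please cancel')
# => {"refund"=>2, "cancel"=>1, "please"=>1}
```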

Examples:

Transcribe a single call record

service = CallRecordProcessing::BulkTranscriptionService.new(call_record)
result = service.transcribe
# => { status: :success, word_count: 342, model: "gpt-4o-mini-transcribe", duration_secs: 140 }

Force re-transcription with a specific model

service = CallRecordProcessing::BulkTranscriptionService.new(call_record, model: "whisper-1")
result = service.transcribe(force: true)
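
Tally results across a backfill batch

For a backfill, the per-record results can be tallied by their `:status`. The sketch below makes several assumptions: `backfill_transcripts`, `service_factory`, and `StubService` are hypothetical names, and the `:skipped` status is a guess, since the shape of skip and error results is not shown on this page.

```ruby
# Hypothetical helper: runs a transcription service over many records and
# counts how many results came back with each :status.
def backfill_transcripts(call_records, service_factory:, force: false)
  call_records.each_with_object(Hash.new(0)) do |record, counts|
    result = service_factory.call(record).transcribe(force: force)
    counts[result[:status]] += 1
  end
end

# Stand-in for the real service so the sketch runs on its own.
# The :skipped status is an assumption, not taken from the source.
StubService = Struct.new(:record) do
  def transcribe(force: false)
    record[:has_audio] ? { status: :success } : { status: :skipped }
  end
end

records = [{ has_audio: true }, { has_audio: false }, { has_audio: true }]
counts = backfill_transcripts(records, service_factory: ->(r) { StubService.new(r) })
# counts => {:success=>2, :skipped=>1}
```

In the application, the factory would presumably be `->(record) { CallRecordProcessing::BulkTranscriptionService.new(record) }`.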

Constant Summary

MIN_DURATION_SECONDS =

MIN_DURATION_SECONDS_VOICEMAIL =

DEFAULT_MODEL =

'gpt-4o-mini-transcribe'

MAX_FILE_SIZE =

25.megabytes
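
For reference, ActiveSupport's `25.megabytes` is 25 * 1024 * 1024 bytes. The plain-Ruby sketch below shows the equivalent limit and a strict greater-than check like the one `#transcribe` applies; `MAX_FILE_SIZE_BYTES` and `too_large?` are hypothetical names used only for illustration.

```ruby
# Plain-Ruby equivalent of ActiveSupport's 25.megabytes.
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024 # 26_214_400

# Hypothetical helper: a file exactly at the limit is still accepted,
# because the comparison is strictly greater-than.
def too_large?(size_bytes)
  size_bytes > MAX_FILE_SIZE_BYTES
end

too_large?(26_214_400) # => false
too_large?(26_214_401) # => true
```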

Instance Attribute Summary

#call_record ⇒ Object readonly
Returns the value of attribute call_record.
#model ⇒ Object readonly
Returns the value of attribute model.

Instance Method Summary

#initialize(call_record, model: DEFAULT_MODEL) ⇒ BulkTranscriptionService constructor
A new instance of BulkTranscriptionService.
#transcribe(force: false) ⇒ Object
Transcribes the call record's audio and returns a result hash.

Constructor Details

#initialize(call_record, model: DEFAULT_MODEL) ⇒ `BulkTranscriptionService`

Returns a new instance of BulkTranscriptionService.

# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 40

def initialize(call_record, model: DEFAULT_MODEL)
  @call_record = call_record
  @model = model
end

Instance Attribute Details

#call_record ⇒ `Object` (readonly)

Returns the value of attribute call_record.

# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 38

def call_record
  @call_record
end

#model ⇒ `Object` (readonly)

Returns the value of attribute model.

# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 38

def model
  @model
end

Instance Method Details

#transcribe(force: false) ⇒ `Object`

Transcribes the call record's audio and stores the transcript on the record. Returns early with a skip result when the record is already transcribed (unless force is true), is too short, or has no audio. Otherwise it marks the record as processing, downloads the audio, rejects files larger than MAX_FILE_SIZE, transcribes with RubyLLM.transcribe, saves the transcript, and enqueues EmbeddingWorker. On success it returns a hash with status, word_count, model, and duration_secs. If an exception is raised, it sets transcription_state to :error and returns an error result.

# File 'app/services/call_record_processing/bulk_transcription_service.rb', line 45

def transcribe(force: false)
  # Guard clauses: skip records that need no work or cannot be transcribed.
  return skip_result(:already_transcribed) if already_transcribed? && !force
  return skip_result(:too_short) if too_short?
  return skip_result(:no_audio) unless has_audio?

  begin
    call_record.update!(transcription_state: :processing)

    # Download the audio and enforce the size limit before calling the API.
    temp_file = download_audio
    return error_result(:no_audio_file, 'Could not download audio') unless temp_file
    return error_result(:file_too_large, "#{File.size(temp_file)} bytes exceeds 25MB limit") if File.size(temp_file) > MAX_FILE_SIZE

    transcription = RubyLLM.transcribe(temp_file, model: model, language: 'en')

    transcript_text = transcription.text.to_s.strip
    return error_result(:empty_transcript, 'Transcription returned empty text') if transcript_text.blank?

    # Persist the transcript and mark the record as completed.
    call_record.update!(
      transcript: transcript_text,
      transcription_state: :completed,
      transcribed_at: Time.current
    )

    # Generate embeddings asynchronously for semantic search.
    EmbeddingWorker.perform_async('CallRecord', call_record.id)

    {
      status: :success,
      word_count: transcript_text.split.size,
      model: model,
      duration_secs: call_record.duration_secs
    }
  rescue RubyLLM::Error => e
    # Provider or API failure reported by RubyLLM.
    log_error "RubyLLM error: #{e.message}"
    call_record.update!(transcription_state: :error)
    error_result(:transcription_failed, e.message)
  rescue StandardError => e
    # Any other failure, such as a download or database error.
    log_error "Unexpected error: #{e.message}"
    call_record.update!(transcription_state: :error)
    error_result(:transcription_failed, e.message)
  ensure
    # Runs on every exit path from the begin block, including early returns.
    cleanup_temp_file(temp_file) if temp_file
  end
end
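
The early `return` statements inside the `begin` block still trigger the `ensure` clause, so the temp file is cleaned up on every exit path. The self-contained sketch below illustrates that Ruby behavior under stated assumptions: `with_temp_audio`, `cleanup_log`, and the shape of the error hash are hypothetical, and the file is written locally rather than downloaded.

```ruby
require 'tempfile'

# Hypothetical stand-in for the download / size check / cleanup flow.
# cleanup_log records the path of every temp file that was removed.
def with_temp_audio(payload, max_bytes:, cleanup_log: [])
  temp_file = nil
  begin
    temp_file = Tempfile.new(['audio', '.mp3'])
    temp_file.binmode
    temp_file.write(payload)
    temp_file.flush

    size = File.size(temp_file.path)
    # Early return: the ensure clause below still runs.
    return { status: :error, error: :file_too_large } if size > max_bytes

    { status: :success, bytes: size }
  ensure
    if temp_file
      cleanup_log << temp_file.path
      temp_file.close! # closes and deletes the file
    end
  end
end

log = []
rejected = with_temp_audio('x' * 10, max_bytes: 5, cleanup_log: log)
accepted = with_temp_audio('x' * 3, max_bytes: 5, cleanup_log: log)
```

Both calls leave one entry in `log`, and neither temp file exists afterwards, regardless of whether the method returned early.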