Class: CallRecordProcessing::BulkTranscriptionService
- Inherits: Object
  - Object
  - CallRecordProcessing::BulkTranscriptionService
- Defined in: app/services/call_record_processing/bulk_transcription_service.rb
Overview
Lightweight transcription service for historical call record backfill.
Uses RubyLLM.transcribe (OpenAI Whisper/GPT-4o-mini-transcribe) instead of
AssemblyAI, providing ~60x cost reduction for bulk historical data:
- AssemblyAI: $0.37/minute ($0.06/call with LeMUR)
- GPT-4o-mini-transcribe: $0.003/minute (~$0.005/call)
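The per-call figure follows from the per-minute rate and the call length. A minimal sketch of that arithmetic, assuming billing is linear in duration; the helper name `estimated_transcription_cost` is hypothetical and not part of the service:

```ruby
# Hypothetical helper (not part of BulkTranscriptionService): estimates the
# cost in USD of transcribing one call, assuming cost scales linearly with
# duration. The default rate is the GPT-4o-mini-transcribe figure quoted above.
def estimated_transcription_cost(duration_secs, rate_per_minute: 0.003)
  (duration_secs / 60.0) * rate_per_minute
end

# Hypothetical helper: total estimated cost for a backfill, given a list of
# call durations in seconds.
def estimated_backfill_cost(durations_secs, rate_per_minute: 0.003)
  durations_secs.sum { |secs| estimated_transcription_cost(secs, rate_per_minute: rate_per_minute) }
end

puts format('%.4f USD', estimated_transcription_cost(100))
```

At $0.003/minute, a call of about 100 seconds comes to roughly $0.005, which is consistent with the per-call figure quoted for GPT-4o-mini-transcribe.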
Trade-offs vs AssemblyAI TranscriptionService:
- No speaker diarization (plain text transcript, no Speaker A/B labels)
- No LeMUR analysis (no ai_summary, action_items, call_phases, etc.)
- No PII redaction, custom spelling, or sentiment analysis
- Good enough for semantic search and keyword discovery on historical calls
The transcript is still useful for:
- Embedding generation (semantic search over call content)
- Full-text search (tsvector)
- Manual review and keyword discovery
Constant Summary
- MIN_DURATION_SECONDS = 30
- MIN_DURATION_SECONDS_VOICEMAIL = 5
- DEFAULT_MODEL = 'gpt-4o-mini-transcribe'
- MAX_FILE_SIZE = 25.megabytes
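The private guards that consume these constants are not shown on this page. The following is a sketch of how they might be applied, not the service's actual implementation. The module name `BulkTranscriptionLimits`, the `voicemail:` flag, and the strict `<` comparison are assumptions; only the size check (`> MAX_FILE_SIZE`) mirrors the documented `#transcribe` source.

```ruby
# Hypothetical module for illustration; values are copied from the constant
# summary. 25.megabytes is written out as plain arithmetic so the sketch runs
# without ActiveSupport.
module BulkTranscriptionLimits
  MIN_DURATION_SECONDS = 30
  MIN_DURATION_SECONDS_VOICEMAIL = 5
  MAX_FILE_SIZE = 25 * 1024 * 1024

  # Assumption: voicemails are held to the lower duration threshold, and a
  # missing duration (nil) counts as zero seconds.
  def self.too_short?(duration_secs, voicemail: false)
    minimum = voicemail ? MIN_DURATION_SECONDS_VOICEMAIL : MIN_DURATION_SECONDS
    duration_secs.to_i < minimum
  end

  # Mirrors the comparison in #transcribe: File.size(temp_file) > MAX_FILE_SIZE.
  def self.too_large?(byte_size)
    byte_size > MAX_FILE_SIZE
  end
end
```

Under these assumptions, a 10-second regular call would be skipped while a 10-second voicemail would still be transcribed.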
Instance Attribute Summary
- #call_record ⇒ Object (readonly)
  Returns the value of attribute call_record.
- #model ⇒ Object (readonly)
  Returns the value of attribute model.
Instance Method Summary
- #initialize(call_record, model: DEFAULT_MODEL) ⇒ BulkTranscriptionService (constructor)
  A new instance of BulkTranscriptionService.
- #transcribe(force: false) ⇒ Object
Constructor Details
#initialize(call_record, model: DEFAULT_MODEL) ⇒ BulkTranscriptionService
Returns a new instance of BulkTranscriptionService.
    # File 'app/services/call_record_processing/bulk_transcription_service.rb', line 40
    def initialize(call_record, model: DEFAULT_MODEL)
      @call_record = call_record
      @model = model
    end
Instance Attribute Details
#call_record ⇒ Object (readonly)
Returns the value of attribute call_record.
    # File 'app/services/call_record_processing/bulk_transcription_service.rb', line 38
    def call_record
      @call_record
    end
#model ⇒ Object (readonly)
Returns the value of attribute model.
    # File 'app/services/call_record_processing/bulk_transcription_service.rb', line 38
    def model
      @model
    end
Instance Method Details
#transcribe(force: false) ⇒ Object
    # File 'app/services/call_record_processing/bulk_transcription_service.rb', line 45
    def transcribe(force: false)
      # Skip records that need no work or cannot be transcribed.
      return skip_result(:already_transcribed) if already_transcribed? && !force
      return skip_result(:too_short) if too_short?
      return skip_result(:no_audio) unless has_audio?

      begin
        call_record.update!(transcription_state: :processing)

        temp_file = download_audio
        return error_result(:no_audio_file, 'Could not download audio') unless temp_file
        return error_result(:file_too_large, "#{File.size(temp_file)} bytes exceeds 25MB limit") if File.size(temp_file) > MAX_FILE_SIZE

        transcription = RubyLLM.transcribe(temp_file, model: model, language: 'en')
        transcript_text = transcription.text.to_s.strip
        return error_result(:empty_transcript, 'Transcription returned empty text') if transcript_text.blank?

        call_record.update!(
          transcript: transcript_text,
          transcription_state: :completed,
          transcribed_at: Time.current
        )

        # Queue embedding generation for the new transcript.
        EmbeddingWorker.perform_async('CallRecord', call_record.id)

        {
          status: :success,
          word_count: transcript_text.split.size,
          model: model,
          duration_secs: call_record.duration_secs
        }
      rescue RubyLLM::Error => e
        log_error "RubyLLM error: #{e.message}"
        call_record.update!(transcription_state: :error)
        error_result(:transcription_failed, e.message)
      rescue StandardError => e
        log_error "Unexpected error: #{e.message}"
        call_record.update!(transcription_state: :error)
        error_result(:transcription_failed, e.message)
      ensure
        # Runs on every exit path from the begin block, including early returns.
        cleanup_temp_file(temp_file) if temp_file
      end
    end
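A backfill job would typically call `CallRecordProcessing::BulkTranscriptionService.new(call_record).transcribe` once per record and aggregate the returned hashes. The sketch below shows one way to do that aggregation. Only the `:success` hash shape appears in the source above; the shapes returned by `skip_result` and `error_result` are not shown, so the `:skipped` and `:error` entries are assumptions, and both helper names are hypothetical.

```ruby
# Hypothetical helper: counts results by their :status key.
# Assumes every result hash carries :status, which the source confirms only
# for the success case.
def tally_transcription_results(results)
  results.each_with_object(Hash.new(0)) do |result, counts|
    counts[result[:status]] += 1
  end
end

# Hypothetical helper: total words transcribed across successful results.
def total_transcribed_words(results)
  results
    .select { |result| result[:status] == :success }
    .sum { |result| result[:word_count].to_i }
end

# Example input. The first entry matches the documented success hash; the
# other two use assumed shapes for skipped and failed records.
results = [
  { status: :success, word_count: 412, model: 'gpt-4o-mini-transcribe', duration_secs: 95 },
  { status: :skipped, reason: :too_short },
  { status: :error, error: :file_too_large }
]

counts = tally_transcription_results(results)
puts "succeeded=#{counts[:success]} skipped=#{counts[:skipped]} failed=#{counts[:error]}"
puts "words=#{total_transcribed_words(results)}"
```

Keeping the aggregation separate from the service call makes it straightforward to report progress on a long-running backfill without touching the transcription logic.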