Call Recording System

The Call Recording System captures, transcribes, and analyzes phone calls from two sources running in parallel:

| Source | Recording Type | Speaker Separation | Schedule |
| --- | --- | --- | --- |
| Switchvox | Mono | AI diarization (~90-95% accuracy) | Hourly, 6am-7pm |
| Twilio SIP Trunk | Stereo | Channel-based (100% accuracy) | Hourly, 24/7 |

Both sources flow through the same AI pipeline for transcription, LeMUR analysis, and semantic search embeddings.

┌─────────────────────────────────────────────────────────────────────────────┐
│                              RECORDING IMPORT                               │
├─────────────────────────────────┬───────────────────────────────────────────┤
│                                 │                                           │
│  Switchvox PBX                  │  Twilio SIP Trunk                         │
│       │                         │       │                                   │
│       ▼                         │       ▼                                   │
│  CallRecordImporterWorker       │  TwilioRecordingImportWorker              │
│  (reads from R2)                │  (API polling)                            │
│       │                         │       │                                   │
│       ▼                         │       ▼                                   │
│  SwitchvoxCloudStorage          │  TwilioRecordingImporter                  │
│  ├─ Read from R2 store          │  ├─ Download WAV                          │
│  ├─ Already compressed          │  ├─ Compress to AAC                       │
│  └─ Switchvox account match     │  ├─ Direction-aware matching              │
│       │                         │  └─ Store raw Twilio JSON                 │
│       ▼                         │       │                                   │
│  CallRecord                     │       ▼                                   │
│    recording_source: nil        │  CallRecord                               │
│    audio_channels: 1            │    recording_source: 'twilio'             │
│                                 │    audio_channels: 2                      │
└─────────────────────────────────┴───────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                           TRANSCRIPTION PIPELINE                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  CallRecordTranscriptionWorker                                              │
│       │                                                                     │
│       ▼                                                                     │
│  TranscriptionService                                                       │
│  ├─ Download audio from S3/Dragonfly                                        │
│  ├─ Upload to AssemblyAI                                                    │
│  └─ Submit transcription request                                            │
│       │                                                                     │
│       ├─── Mono Recording ───────────────────┐                              │
│       │      speaker_labels: true            │                              │
│       │      speech_model: 'slam-1'          │                              │
│       │      keyterms_prompt: [...]          │                              │
│       │      speech_understanding: {...}     │                              │
│       │                                      │                              │
│       └─── Stereo Recording ─────────────────┤                              │
│              multichannel: true              │                              │
│              speaker_labels: false           │                              │
│              (default speech model)          │                              │
│                                              │                              │
│                                              ▼                              │
│                                       AssemblyAI API                        │
│                                       ├─ Transcription                      │
│                                       ├─ PII Redaction                      │
│                                       ├─ Sentiment Analysis                 │
│                                       └─ Custom Spelling                    │
│                                              │                              │
│                                              ▼                              │
│                                       Webhook Callback                      │
│                                              │                              │
└──────────────────────────────────────────────┼──────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                                 AI ANALYSIS                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  AssemblyAI LeMUR (via LLM Gateway → Claude)                                │
│  ├─ Summary generation                                                      │
│  ├─ Action item extraction                                                  │
│  ├─ Call phase detection                                                    │
│  ├─ Customer satisfaction inference                                         │
│  ├─ Agent performance scoring                                               │
│  └─ Key topic extraction                                                    │
│       │                                                                     │
│       ▼                                                                     │
│  EmbeddingWorker (OpenAI)                                                   │
│  └─ Generate semantic search embeddings                                     │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Recording Source

| Column | Type | Description |
| --- | --- | --- |
| recording_source | string | 'twilio' or nil (Switchvox) |
| audio_channels | integer | 1 (mono) or 2 (stereo) |
| agent_speaker_label | string | Detected agent speaker (A, B, 1, 2, or name) |

Twilio-Specific

| Column | Type | Description |
| --- | --- | --- |
| twilio_recording_sid | string | Unique Twilio Recording SID (indexed) |
| twilio_call_sid | string | Twilio Call SID |
| twilio_call_details | jsonb | Raw Twilio API response |

Switchvox-Specific

| Column | Type | Description |
| --- | --- | --- |
| switchvox_recorded_call_id | integer | Switchvox recording ID |
| switchvox_from_account_id | integer | Caller's Switchvox account |
| switchvox_to_account_id | integer | Recipient's Switchvox account |

Transcription

| Column | Type | Description |
| --- | --- | --- |
| transcript | text | Full text transcript |
| structured_transcript_json | jsonb | Utterances with timestamps, confidence, sentiment |
| assemblyai_transcript_id | string | For LeMUR analysis reference |
| transcription_state | enum | pending, processing, completed, error, no_audio, too_short |
| transcribed_at | datetime | When transcription completed |

AI Analysis

| Column | Type | Description |
| --- | --- | --- |
| ai_summary | text | LeMUR-generated summary |
| action_items | jsonb | Tasks with responsible party and priority |
| call_phases | jsonb | Segments with timestamps |
| customer_satisfaction | enum | very_satisfied, satisfied, neutral, frustrated, angry |
| agent_performance_score | integer | 0-100 score |
| key_topics | string[] | Main topics discussed |
| lemur_analyzed_at | datetime | When LeMUR analysis completed |

Call Metadata

| Column | Type | Description |
| --- | --- | --- |
| call_direction | enum | inbound, outbound |
| call_outcome | enum | unknown, sale, support, inquiry, voicemail |

The twilio_call_details column provides typed accessors:

call_record.twilio_caller_name # CNAM lookup result
call_record.twilio_direction # 'trunking-originating' or 'trunking-terminating'
call_record.twilio_from # Originating number/SIP
call_record.twilio_to # Destination number/SIP
call_record.twilio_trunk_sid # SIP trunk identifier
call_record.twilio_price # Call cost
call_record.twilio_price_unit # Currency (USD)
call_record.twilio_start_time # DateTime
call_record.twilio_end_time # DateTime
call_record.twilio_recording_channels # 2 for stereo
call_record.twilio_recording_duration # Seconds
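
To make the idea of typed accessors concrete, here is a minimal plain-Ruby sketch of reading typed values out of a raw payload. It is not the model's actual code (the model is described as using jsonb_accessor). The class name TwilioCallDetails and the payload keys direction, start_time, and recording_duration are assumptions made for illustration.

```ruby
require 'time'

# Hypothetical stand-in for the model's typed accessors. The real model is
# described as using jsonb_accessor; the payload keys here are assumptions.
class TwilioCallDetails
  def initialize(raw)
    @raw = raw || {}
  end

  # Returned as stored, e.g. 'trunking-originating'.
  def twilio_direction
    @raw['direction']
  end

  # Parsed into a Time, or nil when the key is missing.
  def twilio_start_time
    value = @raw['start_time']
    value && Time.parse(value)
  end

  # Coerced to an Integer number of seconds, or nil when missing.
  def twilio_recording_duration
    value = @raw['recording_duration']
    value && Integer(value)
  end
end

details = TwilioCallDetails.new(
  'direction' => 'trunking-originating',
  'start_time' => '2024-01-15T14:30:00Z',
  'recording_duration' => '174'
)
puts details.twilio_recording_duration
```

The point of the pattern is that callers never touch the raw JSON directly, so a missing key yields nil instead of an exception.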

Worker: CallRecordImporterWorker, triggered two ways:

  • Real-time (primary): SFTPGo fires its upload action hook as soon as the PBX finishes writing a recording. The hook hits Webhooks::V1::SftpgoController, which enqueues a single-file import with perform_async(wav_key). Recordings appear in the UI within seconds instead of waiting for the next poll. See SFTPGo § Real-time import hook.

  • Hourly poll (backstop): the scheduled import_new_records scan (6am-7pm CT) still runs and catches anything the hook misses, such as a dropped notification, SFTPGo downtime, or a .wav with no accompanying .xml file. Both paths converge on the same CallRecordImporterWorker.perform_async(wav_key) call. The worker's :until_executed lock and the importer's first_or_initialize together make double delivery a no-op, so a recording is never imported twice.

  • Reads .wav recordings from the Cloudflare R2 bucket that the SFTPGo gateway writes to (the Switchvox PBX uploads over SFTP → SFTPGo → R2). The worker no longer connects over SFTP itself. See SFTPGo.

  • Pre-compressed audio files

  • Party matching via Switchvox account IDs → Employee lookup
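
The double-delivery guarantee described above can be illustrated with a small in-memory sketch. It only mimics the find-or-create behaviour that first_or_initialize provides. FakeCallRecordStore, import_recording, and the example key are hypothetical names, and the real importer works against ActiveRecord with a Sidekiq lock in front of it.

```ruby
# In-memory stand-in for the CallRecord table, keyed by the recording's
# storage key. Purely illustrative; the real importer uses ActiveRecord.
class FakeCallRecordStore
  attr_reader :records

  def initialize
    @records = {}
  end

  # Mimics first_or_initialize followed by save: returns the existing record
  # for this key, or creates one. The second element says whether it was new.
  def find_or_create(wav_key)
    existing = @records[wav_key]
    return [existing, false] if existing

    record = { wav_key: wav_key, imported_at: Time.now }
    @records[wav_key] = record
    [record, true]
  end
end

# Both the webhook path and the hourly poll would end up calling this.
def import_recording(store, wav_key)
  _record, created = store.find_or_create(wav_key)
  created ? :imported : :skipped
end

store = FakeCallRecordStore.new
first_result  = import_recording(store, 'recordings/example-call.wav') # webhook delivery
second_result = import_recording(store, 'recordings/example-call.wav') # hourly poll sees it again
puts [first_result, second_result, store.records.size].inspect
```

Because the lookup key is the same on both paths, the second delivery finds the existing record and does nothing.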

Worker: TwilioRecordingImportWorker (hourly 24/7)

  • Polls Twilio API for new recordings
  • Downloads WAV, compresses to AAC (93% size reduction)
  • Direction-aware party matching

Company Main Numbers (excluded from matching):

COMPANY_MAIN_NUMBERS = [
  '+18008755285', # US 800#
  '+18664361444', # Canada toll-free
  '+18475502400'  # Main line
].freeze

Direction-Aware Matching:

| Call Direction | Origin Party | Destination Party |
| --- | --- | --- |
| Inbound (trunking-originating) | Match by caller number | Skip (main line) |
| Outbound (trunking-terminating) | Match by agent DID | Match by destination |

For outbound calls, the agent DID is extracted from the SIP address:

sip:18475502430@warmlyyours.pstn.twilio.com → +18475502430 → Employee match
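
The SIP-to-number step and the direction rule can be sketched in a few lines of plain Ruby. This is an illustration under stated assumptions, not the importer's code: normalize_number and numbers_to_match are hypothetical helper names, the digits in the SIP user part are assumed to already include the country code, and +13125550100 is a made-up customer number.

```ruby
# Company main numbers, copied from the list above; they are never matched.
MAIN_NUMBERS = %w[+18008755285 +18664361444 +18475502400].freeze

# Converts "sip:18475502430@host" (or a bare number) to "+18475502430".
# Assumes the digits already include the country code.
def normalize_number(address)
  return nil if address.nil?

  user = address.sub(/\Asip:/i, '').split('@').first.to_s
  digits = user.gsub(/\D/, '')
  digits.empty? ? nil : "+#{digits}"
end

# Returns the origin/destination numbers worth matching for a call.
def numbers_to_match(direction:, from:, to:)
  origin = normalize_number(from)
  destination = normalize_number(to)
  # Inbound calls land on the main line, so the destination is skipped.
  destination = nil if direction == 'trunking-originating'

  numbers = { origin: origin, destination: destination }
  numbers.transform_values { |number| MAIN_NUMBERS.include?(number) ? nil : number }
end

outbound = numbers_to_match(
  direction: 'trunking-terminating',
  from: 'sip:18475502430@warmlyyours.pstn.twilio.com',
  to: '+13125550100'
)
puts outbound.inspect
```

For the outbound example, the origin resolves to the agent DID +18475502430, which can then be looked up against employees, and the destination is kept because it is not a main number.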

| Feature | Mono (Switchvox) | Stereo (Twilio) |
| --- | --- | --- |
| speaker_labels | true (AI diarization) | false |
| multichannel | false | true |
| speech_model | 'slam-1' | default |
| keyterms_prompt | Company terms | N/A |
| speech_understanding | Agent identification | N/A |
| Speaker detection | Heuristic + LeMUR fallback | By channel + direction |
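
The mono/stereo split in the table can be expressed as one small function that picks request options from the channel count. This is a sketch only: transcription_options is a hypothetical helper, and speech_understanding is left out because its payload shape is not given here.

```ruby
# Builds transcription request options from the channel count, following
# the feature table: stereo relies on channels, mono on AI diarization.
def transcription_options(audio_channels:, keyterms: [])
  if audio_channels >= 2
    # speaker_labels must stay false whenever multichannel is on.
    { multichannel: true, speaker_labels: false }
  else
    options = { multichannel: false, speaker_labels: true, speech_model: 'slam-1' }
    options[:keyterms_prompt] = keyterms unless keyterms.empty?
    options
  end
end

mono_options   = transcription_options(audio_channels: 1, keyterms: ['WarmlyYours'])
stereo_options = transcription_options(audio_channels: 2)
puts mono_options.inspect
puts stereo_options.inspect
```

Keeping the branch in one place makes it harder to send speaker_labels and multichannel together, which is the combination the error table warns about.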

Automatically redacted:

  • banking_information
  • credit_card_cvv, credit_card_expiration, credit_card_number
  • us_social_security_number
  • passport_number
  • password

Redacted text is replaced with the matching entity label, for example [CREDIT_CARD_NUMBER].
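
As a rough sketch, the redaction list above could be turned into request options like this. The parameter names redact_pii, redact_pii_policies, and redact_pii_sub follow AssemblyAI's transcript API as I understand it. Whether the service builds its request exactly this way is an assumption, and PII_POLICIES and pii_redaction_options are hypothetical names.

```ruby
# The policies listed above, in the same order.
PII_POLICIES = %w[
  banking_information
  credit_card_cvv
  credit_card_expiration
  credit_card_number
  us_social_security_number
  passport_number
  password
].freeze

# Options to merge into the transcription request (assumed shape).
def pii_redaction_options
  {
    redact_pii: true,
    redact_pii_policies: PII_POLICIES,
    redact_pii_sub: 'entity_name' # substitute labels such as [CREDIT_CARD_NUMBER]
  }
end

puts pii_redaction_options.inspect
```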

Dynamic corrections for:

  • Company name variations (“Warmly Yours”, “Warm Lee Yours”, etc.)
  • Employee name phonetic variations (auto-generated from active employees)
  • Custom corrections from Settings
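
A minimal sketch of how such corrections might be assembled into a custom_spelling payload follows. It groups misheard variants under one canonical spelling and drops multi-word targets, since the API rejects them (see the error table in the troubleshooting notes). The helper name build_custom_spelling, the target spelling 'WarmlyYours', and the 'floor heat' pair are assumptions for illustration.

```ruby
# Groups misheard variants under their canonical spelling and drops any
# target that is not exactly one word, since the API rejects those.
def build_custom_spelling(corrections)
  corrections
    .reject { |_from, to| to.strip.split(/\s+/).length != 1 }
    .group_by { |_from, to| to.strip }
    .map { |to, pairs| { from: pairs.map(&:first).uniq, to: to } }
end

corrections = [
  ['Warmly Yours', 'WarmlyYours'],
  ['Warm Lee Yours', 'WarmlyYours'],
  ['floor heat', 'floor heating'] # multi-word target: filtered out
]
spelling = build_custom_spelling(corrections)
puts spelling.inspect
```

Filtering before submission avoids a failed request for the whole recording because of one bad entry.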

Using AssemblyAI’s LLM Gateway with Claude:

  1. Summary - 2-4 sentence call summary
  2. Action Items - Tasks with responsible party (agent/customer) and priority
  3. Call Phases - Segments: greeting, problem identification, solution, closing
  4. Customer Satisfaction - Inferred satisfaction level
  5. Agent Performance Score - 0-100 based on professionalism and problem-solving
  6. Key Topics - Main topics discussed
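
LLM output is not guaranteed to be well-formed, so a normalization step before saving is a common safeguard. The sketch below is one way to do it and not the worker's actual code: the JSON key names, the normalize_analysis helper, and the choice to store nil for an unrecognized satisfaction value are all assumptions.

```ruby
require 'json'

SATISFACTION_LEVELS = %w[very_satisfied satisfied neutral frustrated angry].freeze

# Parses the model's JSON reply and coerces each field into the shape the
# schema expects. Returns nil when the reply is not valid JSON.
def normalize_analysis(json_text)
  data = JSON.parse(json_text)
  satisfaction = data['customer_satisfaction'].to_s
  score = data['agent_performance_score']

  {
    ai_summary: data['summary'].to_s.strip,
    action_items: Array(data['action_items']),
    call_phases: Array(data['call_phases']),
    customer_satisfaction: SATISFACTION_LEVELS.include?(satisfaction) ? satisfaction : nil,
    agent_performance_score: score.nil? ? nil : score.to_i.clamp(0, 100),
    key_topics: Array(data['key_topics']).map(&:to_s)
  }
rescue JSON::ParserError
  nil
end

reply = '{"summary": " Customer asked about pricing. ", "agent_performance_score": 140, ' \
        '"customer_satisfaction": "thrilled", "key_topics": ["pricing"]}'
analysis = normalize_analysis(reply)
puts analysis.inspect
```

In this example the out-of-range score is clamped to 100 and the unknown satisfaction value is dropped, so only values the enum and the 0-100 column accept reach the database.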

  • Recording Source Filter: Dropdown to filter by Twilio/Switchvox/All
  • Stereo Badge: Visual indicator for dual-channel recordings
  • Transcription State: Color-coded status badges

Tabs:

  • Overview: Audio player, call metadata, parties, AI summary
  • Transcript: Structured transcript with speaker avatars, timestamps, sentiment badges
  • AI Analysis: Action items, call phases, performance metrics
  • Twilio (if applicable): Recording metadata, raw JSON data

Actions:

  • Re-transcribe: Queue new transcription
  • Re-analyze: Run LeMUR again
  • Swap Speakers: Manual speaker correction
  • Generate Embedding: Create semantic search vector

config/sidekiq_production_schedule.yml

# Switchvox import (6am-7pm hourly)
call_record_importer_worker:
  cron: '0 6-19 * * * America/Chicago'
  class: CallRecordImporterWorker

# Twilio import (all hours)
twilio_recording_import_worker:
  cron: '0 * * * * America/Chicago'
  class: TwilioRecordingImportWorker

# Daily transcription (6 AM)
daily_call_transcription:
  cron: '0 6 * * * America/Chicago'
  class: DailyCallRecordTranscriptionWorker

Runs at 6 AM daily:

  1. Processes all new calls from previous 24 hours
  2. Backfills up to 500 older eligible calls
  3. Uses ai_embeddings queue for controlled throughput
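
The selection logic behind the first two steps can be sketched with plain Ruby objects. The 30-second minimum and the pending/error states come from the eligibility query elsewhere in this document. The Call struct, daily_batch, and the most-recent-first backfill order are assumptions used for illustration.

```ruby
MIN_DURATION_SECS = 30
BACKFILL_LIMIT = 500
DAY = 24 * 60 * 60

Call = Struct.new(:id, :created_at, :duration_secs, :transcription_state, keyword_init: true)

# A call qualifies when it still needs transcription and is long enough.
def eligible?(call)
  %i[pending error].include?(call.transcription_state) &&
    call.duration_secs >= MIN_DURATION_SECS
end

# Everything from the last 24 hours, plus a capped backfill of older calls,
# most recent first.
def daily_batch(calls, now: Time.now, backfill_limit: BACKFILL_LIMIT)
  cutoff = now - DAY
  recent, older = calls.select { |c| eligible?(c) }.partition { |c| c.created_at >= cutoff }
  backfill = older.sort_by(&:created_at).reverse.first(backfill_limit)
  recent + backfill
end

now = Time.utc(2024, 1, 15, 6, 0, 0)
calls = [
  Call.new(id: 1, created_at: now - 3600,     duration_secs: 120, transcription_state: :pending),
  Call.new(id: 2, created_at: now - 3600,     duration_secs: 10,  transcription_state: :pending),
  Call.new(id: 3, created_at: now - 3 * DAY,  duration_secs: 90,  transcription_state: :error),
  Call.new(id: 4, created_at: now - 10 * DAY, duration_secs: 90,  transcription_state: :pending),
  Call.new(id: 5, created_at: now - 2 * DAY,  duration_secs: 90,  transcription_state: :completed)
]
batch_ids = daily_batch(calls, now: now, backfill_limit: 1).map(&:id)
puts batch_ids.inspect
```

With a backfill limit of 1, the batch contains call 1 (recent) and call 3 (the newest older eligible call). Call 2 is too short and call 5 is already transcribed.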
# View statistics
bundle exec rake call_records:stats
# Backfill transcriptions (most recent first)
bundle exec rake call_records:backfill_transcriptions[LIMIT,DAYS_BACK]
# Backfill LeMUR analysis
bundle exec rake call_records:backfill_lemur[LIMIT]
# Process a single call
bundle exec rake call_records:process_one[CALL_RECORD_ID]
# Twilio operations
bundle exec rake call_records:twilio_check
bundle exec rake call_records:twilio_import[LIMIT]

# Import Twilio recordings
importer = CallRecordTwilioRecordingImporter.new
importer.import_new_recordings(limit: 50, since: 24.hours.ago)

# Dry run
importer = CallRecordTwilioRecordingImporter.new(dry_run: true)
importer.import_new_recordings

# Transcribe a single record
CallRecordTranscriptionWorker.perform_async(call_record_id: 123, force: true)

# Re-run LeMUR analysis
CallRecordSummaryWorker.perform_async(call_record_id: 123)

# By source
CallRecord.where(recording_source: 'twilio')
CallRecord.where(recording_source: [nil, 'switchvox'])

# Stereo recordings
CallRecord.where('audio_channels >= 2')

# Transcription state
CallRecord.where(transcription_state: :completed)
CallRecord.where(transcription_state: [:pending, :error])

# Eligible for transcription
CallRecord.joins(:upload)
  .where(transcription_state: [:pending, :error])
  .where('duration_secs >= 30')

# Ransack (for UI)
CallRecord.ransack(recording_source_eq: 'twilio')

Switchvox:

  1. Check SFTP connectivity
  2. Verify Switchvox recording paths
  3. Check CallRecordImporterWorker logs

Twilio:

  1. Check Sidekiq logs for TwilioRecordingImportWorker
  2. Verify Twilio credentials in config/credentials.yml.enc
  3. Check trunk ID matches configured value

Stereo recordings:

  1. Verify call_direction is set correctly
  2. Use “Swap Speakers” button for manual correction

Mono recordings:

  1. Check if heuristic detection found agent greeting
  2. LeMUR should have identified speaker
  3. Use “Swap Speakers” if still incorrect
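
For readers unfamiliar with greeting-based heuristics, here is a minimal sketch of the idea: scan the first few utterances for a phrase an agent would typically open with. The patterns, the utterance shape, and the detect_agent_speaker helper are hypothetical and are not taken from call_records_helper.rb.

```ruby
# Phrases an agent is likely to open with (illustrative, not exhaustive).
GREETING_PATTERNS = [
  /thank you for calling/i,
  /how (can|may) i help/i,
  /this is \w+ (with|from)/i
].freeze

# Returns the speaker label of whoever greets like an agent within the first
# few utterances, or nil so the caller can fall back to LeMUR or a manual swap.
def detect_agent_speaker(utterances, window: 4)
  hit = utterances.first(window).find do |utterance|
    GREETING_PATTERNS.any? { |pattern| pattern.match?(utterance[:text].to_s) }
  end
  hit && hit[:speaker]
end

utterances = [
  { speaker: 'A', text: 'Hello?' },
  { speaker: 'B', text: 'Hi, thank you for calling, this is Sam from support.' }
]
agent = detect_agent_speaker(utterances)
puts agent.inspect
```

A nil result is the useful signal here: it means the heuristic found no greeting, which is when the LeMUR fallback or the Swap Speakers button comes into play.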

| Error | Solution |
| --- | --- |
| "Invalid endpoint schema" | Ensure speaker_labels: false for multichannel |
| "custom_spelling 'to' fields must contain only one word" | Filter multi-word targets |
| 400 on transcription submit | Check all parameters are valid |
grep '\[TwilioClient\]' log/production.log | tail -50

Agent Identification for Inbound Twilio Calls

Twilio records at the trunk level before PBX routing, so we cannot identify which agent answered inbound calls. The to address is the company’s main line.

Workarounds:

  • Use Swap Speakers button for manual correction
  • Future: Correlate with Switchvox call_logs by time/number
  • Future: Create employee DID registry for direct-dial matching

The same call can exist as both a Switchvox (mono) and a Twilio (stereo) recording. The two copies are currently stored as separate records; use the recording_source filter to analyze each source independently.

| Service | Cost | Notes |
| --- | --- | --- |
| AssemblyAI Transcription | ~$0.00025/sec | Average 2.9 min call = ~$0.044 |
| AssemblyAI LeMUR | ~$0.015/call | Claude Sonnet, ~2000 tokens |
| OpenAI Embedding | ~$0.0001/1K tokens | ~$0.0002/call |
| Total per call | ~$0.06 | Conservative estimate |

| Item | Cost | Notes |
| --- | --- | --- |
| Twilio Recording Storage | ~$0.0025/min | Per minute stored |
| S3 Storage (compressed) | Minimal | AAC ~93% smaller than WAV |

| Metric | Value |
| --- | --- |
| Daily new calls | ~124/day |
| Daily processing cost | ~$7.50/day |
| Monthly processing cost | ~$225/month |
| Backfill (2 years) | ~$6,679 |
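
The per-call and daily figures can be reproduced from the unit prices with a few lines of arithmetic. The sketch below uses only the numbers in the tables above and assumes a 30-day month. The backfill figure is not recomputed because the historical call count is not stated.

```ruby
TRANSCRIPTION_PER_SEC = 0.00025
LEMUR_PER_CALL        = 0.015
EMBEDDING_PER_CALL    = 0.0002
AVG_CALL_SECS         = 2.9 * 60 # average call length from the table
DAILY_CALLS           = 124

transcription = TRANSCRIPTION_PER_SEC * AVG_CALL_SECS
per_call      = transcription + LEMUR_PER_CALL + EMBEDDING_PER_CALL
daily         = per_call * DAILY_CALLS
monthly       = daily * 30 # assumes a 30-day month

puts format('Transcription per call: $%.4f', transcription)
puts format('Total per call:         $%.4f', per_call)
puts format('Daily:                  $%.2f', daily)
puts format('Monthly:                $%.2f', monthly)
```

Unrounded, this comes to roughly $0.059 per call, $7.28 per day, and $218 per month. The table's ~$0.06, ~$7.50, and ~$225 are the same figures rounded up, which fits its description as a conservative estimate.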

| File | Purpose |
| --- | --- |
| app/models/call_record.rb | Model with jsonb_accessor, embeddable |
| app/services/call_record_processing/transcription_service.rb | Orchestrates transcription |
| app/services/assemblyai_client.rb | AssemblyAI API client |
| app/services/call_record/twilio_recording_importer.rb | Twilio import service |
| app/services/call_record/switchvox_importer_sftp.rb | Switchvox import service |
| app/services/twilio_client.rb | Twilio API client |
| app/workers/call_record_transcription_worker.rb | Transcription worker |
| app/workers/twilio_recording_import_worker.rb | Twilio import worker |
| app/workers/daily_call_record_transcription_worker.rb | Daily processing |
| app/helpers/call_records_helper.rb | Speaker detection helper |
| app/controllers/call_records_controller.rb | Controller with actions |
| lib/tasks/call_records.rake | Manual rake tasks |