Call Recording System
Overview
The Call Recording System captures, transcribes, and analyzes phone calls from two sources running in parallel:
| Source | Recording Type | Speaker Separation | Schedule |
|---|---|---|---|
| Switchvox | Mono | AI diarization (~90-95% accuracy) | Real-time hook, plus hourly poll 6am-7pm CT |
| Twilio SIP Trunk | Stereo | Channel-based (100% accuracy) | Hourly 24/7 |
Both sources flow through the same AI pipeline for transcription, LeMUR analysis, and semantic search embeddings.
Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ RECORDING IMPORT │
├─────────────────────────────────┬───────────────────────────────────────────┤
│ │ │
│ Switchvox PBX │ Twilio SIP Trunk │
│ │ │ │ │
│ ▼ │ ▼ │
│ CallRecordImporterWorker │ TwilioRecordingImportWorker │
│ (reads from R2) │ (API polling) │
│ │ │ │ │
│ ▼ │ ▼ │
│ SwitchvoxCloudStorage │ TwilioRecordingImporter │
│ ├─ Read from R2 store │ ├─ Download WAV │
│ ├─ Already compressed │ ├─ Compress to AAC │
│ └─ Switchvox account match │ ├─ Direction-aware matching │
│ │ │ └─ Store raw Twilio JSON │
│ ▼ │ │ │
│ CallRecord │ ▼ │
│ recording_source: nil │ CallRecord │
│ audio_channels: 1 │ recording_source: 'twilio' │
│ │ audio_channels: 2 │
└─────────────────────────────────┴───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ TRANSCRIPTION PIPELINE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ CallRecordTranscriptionWorker │
│ │ │
│ ▼ │
│ TranscriptionService │
│ ├─ Download audio from S3/Dragonfly │
│ ├─ Upload to AssemblyAI │
│ └─ Submit transcription request │
│ │ │
│ ├─── Mono Recording ──────────────────┐ │
│ │ speaker_labels: true │ │
│ │ speech_model: 'slam-1' │ │
│ │ keyterms_prompt: [...] │ │
│ │ speech_understanding: {...} │ │
│ │ │ │
│ └─── Stereo Recording ────────────────┤ │
│ multichannel: true │ │
│ speaker_labels: false │ │
│ (default speech model) │ │
│ │ │
│ ▼ │
│ AssemblyAI API │
│ ├─ Transcription │
│ ├─ PII Redaction │
│ ├─ Sentiment Analysis │
│ └─ Custom Spelling │
│ │ │
│ ▼ │
│ Webhook Callback │
│ │ │
└──────────────────────────────────────────────┼──────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ AI ANALYSIS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ AssemblyAI LeMUR (via LLM Gateway → Claude) │
│ ├─ Summary generation │
│ ├─ Action item extraction │
│ ├─ Call phase detection │
│ ├─ Customer satisfaction inference │
│ ├─ Agent performance scoring │
│ └─ Key topic extraction │
│ │ │
│ ▼ │
│ EmbeddingWorker (OpenAI) │
│ └─ Generate semantic search embeddings │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Database Schema
CallRecord Fields
| Column | Type | Description |
|---|---|---|
| Recording Source | | |
| recording_source | string | 'twilio' or nil (Switchvox) |
| audio_channels | integer | 1 (mono) or 2 (stereo) |
| agent_speaker_label | string | Detected agent speaker (A, B, 1, 2, or name) |
| Twilio-Specific | | |
| twilio_recording_sid | string | Unique Twilio Recording SID (indexed) |
| twilio_call_sid | string | Twilio Call SID |
| twilio_call_details | jsonb | Raw Twilio API response |
| Switchvox-Specific | | |
| switchvox_recorded_call_id | integer | Switchvox recording ID |
| switchvox_from_account_id | integer | Caller's Switchvox account |
| switchvox_to_account_id | integer | Recipient's Switchvox account |
| Transcription | | |
| transcript | text | Full text transcript |
| structured_transcript_json | jsonb | Utterances with timestamps, confidence, sentiment |
| assemblyai_transcript_id | string | For LeMUR analysis reference |
| transcription_state | enum | pending, processing, completed, error, no_audio, too_short |
| transcribed_at | datetime | When transcription completed |
| AI Analysis | | |
| ai_summary | text | LeMUR-generated summary |
| action_items | jsonb | Tasks with responsible party and priority |
| call_phases | jsonb | Segments with timestamps |
| customer_satisfaction | enum | very_satisfied, satisfied, neutral, frustrated, angry |
| agent_performance_score | integer | 0-100 score |
| key_topics | string[] | Main topics discussed |
| lemur_analyzed_at | datetime | When LeMUR analysis completed |
| Call Metadata | | |
| call_direction | enum | inbound, outbound |
| call_outcome | enum | unknown, sale, support, inquiry, voicemail |
Twilio JSONB Accessors
The twilio_call_details column provides typed accessors:
call_record.twilio_caller_name # CNAM lookup result
call_record.twilio_direction # 'trunking-originating' or 'trunking-terminating'
call_record.twilio_from # Originating number/SIP
call_record.twilio_to # Destination number/SIP
call_record.twilio_trunk_sid # SIP trunk identifier
call_record.twilio_price # Call cost
call_record.twilio_price_unit # Currency (USD)
call_record.twilio_start_time # DateTime
call_record.twilio_end_time # DateTime
call_record.twilio_recording_channels # 2 for stereo
call_record.twilio_recording_duration # Seconds
Recording Import
Switchvox (Mono)
Worker: CallRecordImporterWorker, triggered two ways:
- Real-time (primary): SFTPGo fires its upload action hook the moment the PBX finishes writing a recording, hitting Webhooks::V1::SftpgoController, which enqueues the single-file import (perform_async(wav_key)). Recordings land in the UI within seconds instead of waiting for the next poll. See SFTPGo § Real-time import hook.
- Hourly poll (backstop): the scheduled import_new_records scan (6am-7pm CT) still runs, catching anything the hook misses (a dropped notification, SFTPGo downtime, an xml-less .wav). The two paths converge on the same CallRecordImporterWorker.perform_async(wav_key) call, so the worker's :until_executed lock plus the importer's first_or_initialize make double-delivery a no-op: a recording is never imported twice.
- Reads .wav recordings from the Cloudflare R2 bucket the SFTPGo gateway writes (the Switchvox PBX uploads over SFTP → SFTPGo → R2). The worker no longer SFTPs anywhere itself. See SFTPGo.
- Pre-compressed audio files
- Party matching via Switchvox account IDs → Employee lookup
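Why double-delivery is harmless can be illustrated with a small in-memory stand-in. This is only a sketch: the real system relies on the Sidekiq :until_executed lock and ActiveRecord's first_or_initialize, and the FakeRecordingStore class and its import method below are hypothetical names invented for the illustration.

```ruby
# Hypothetical in-memory stand-in for the CallRecord table, keyed by the
# recording's R2 object key. Not the real importer.
class FakeRecordingStore
  def initialize
    @records = {}
    @mutex = Mutex.new # plays the role of the worker's uniqueness lock
  end

  # Mimics first_or_initialize followed by save: returns [record, created].
  def import(wav_key)
    @mutex.synchronize do
      existing = @records[wav_key]
      return [existing, false] if existing # second delivery is a no-op

      record = { wav_key: wav_key, imported_at: Time.now }
      @records[wav_key] = record
      [record, true]
    end
  end

  def count
    @records.size
  end
end

store = FakeRecordingStore.new
_, created_by_hook = store.import('recordings/example.wav') # real-time hook
_, created_by_poll = store.import('recordings/example.wav') # hourly backstop
puts "hook created: #{created_by_hook}, poll created: #{created_by_poll}, rows: #{store.count}"
```

Whichever path delivers the key first creates the record; the other finds it and does nothing.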
Twilio SIP Trunk (Stereo)
Worker: TwilioRecordingImportWorker (hourly 24/7)
- Polls Twilio API for new recordings
- Downloads WAV, compresses to AAC (93% size reduction)
- Direction-aware party matching
Party Matching Logic
Company Main Numbers (excluded from matching):
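# A plain array is used because %w[] literals do not support inline comments
# (the comment words would become array elements).
COMPANY_MAIN_NUMBERS = [
  '+18008755285', # US 800#
  '+18664361444', # Canada toll-free
  '+18475502400'  # Main line
].freeze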
Direction-Aware Matching:
| Call Direction | Origin Party | Destination Party |
|---|---|---|
| Inbound (trunking-originating) | Match by caller number | Skip (main line) |
| Outbound (trunking-terminating) | Match by agent DID | Match by destination |
For outbound calls, agent DID is extracted from SIP address:
sip:18475502430@warmlyyours.pstn.twilio.com → +18475502430 → Employee match
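The direction-aware rules and the SIP address extraction can be sketched as follows. This is a minimal illustration, not the importer's actual code: the PartyMatching module, normalize_number, and match_plan are hypothetical names, and the sketch assumes every number already carries its country code.

```ruby
# Hypothetical helper illustrating the matching rules described above.
module PartyMatching
  COMPANY_MAIN_NUMBERS = ['+18008755285', '+18664361444', '+18475502400'].freeze

  module_function

  # Turns "sip:18475502430@host" or "+1 (847) 550-2430" into "+18475502430".
  # Assumes the country code is present; returns nil when there are no digits.
  def normalize_number(address)
    user_part = address.to_s.sub(/\Asips?:/i, '').split('@').first.to_s
    digits = user_part.gsub(/\D/, '')
    digits.empty? ? nil : "+#{digits}"
  end

  # Returns the number to look up for each side of the call; nil means "skip".
  def match_plan(direction:, from:, to:)
    origin = normalize_number(from)
    destination = normalize_number(to)
    plan =
      case direction
      when 'trunking-originating' # inbound: destination is the main line
        { origin: origin, destination: nil }
      when 'trunking-terminating' # outbound: origin is the agent DID
        { origin: origin, destination: destination }
      else
        { origin: nil, destination: nil }
      end
    # Company main numbers never identify a party, so drop them.
    plan.transform_values { |n| COMPANY_MAIN_NUMBERS.include?(n) ? nil : n }
  end
end

puts PartyMatching.normalize_number('sip:18475502430@warmlyyours.pstn.twilio.com')
```

The returned numbers would then feed whatever Employee or customer lookup the importer performs.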
Transcription
Mono vs Stereo Modes
| Feature | Mono (Switchvox) | Stereo (Twilio) |
|---|---|---|
| speaker_labels | true (AI diarization) | false |
| multichannel | false | true |
| speech_model | 'slam-1' | default |
| keyterms_prompt | Company terms | N/A |
| speech_understanding | Agent identification | N/A |
| Speaker detection | Heuristic + LeMUR fallback | By channel + direction |
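The table translates into two request shapes. The sketch below assumes a hypothetical transcription_params helper; the parameter names come from the table, and speech_understanding is left out because its contents are not given here.

```ruby
# Hypothetical request builder; keys mirror the mono/stereo table above.
def transcription_params(audio_channels:, keyterms: [])
  if audio_channels >= 2
    # Stereo: each party has its own channel, so diarization stays off
    # and the default speech model is used.
    { multichannel: true, speaker_labels: false }
  else
    # Mono: AI diarization plus the company keyterms prompt.
    {
      multichannel: false,
      speaker_labels: true,
      speech_model: 'slam-1',
      keyterms_prompt: keyterms
    }
  end
end

p transcription_params(audio_channels: 2)
p transcription_params(audio_channels: 1, keyterms: ['WarmlyYours'])
```

Keeping the two branches mutually exclusive avoids sending multichannel together with speaker_labels, the combination listed under AssemblyAI Errors.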
PII Redaction
Automatically redacted:
- banking_information
- credit_card_cvv, credit_card_expiration, credit_card_number
- us_social_security_number
- passport_number
- password
Redacted text is replaced with the entity name in brackets, for example [CREDIT_CARD_NUMBER].
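As a request fragment, the redaction settings might look like the sketch below. The redact_pii, redact_pii_policies, and redact_pii_sub keys are AssemblyAI transcript parameters; that this system sets redact_pii_sub to 'entity_name' is an assumption inferred from the bracketed placeholder, and pii_redaction_params is a hypothetical helper name.

```ruby
# Policies listed above, in AssemblyAI's policy naming.
PII_POLICIES = %w[
  banking_information
  credit_card_cvv
  credit_card_expiration
  credit_card_number
  us_social_security_number
  passport_number
  password
].freeze

# Hypothetical helper returning the redaction part of a transcript request.
def pii_redaction_params
  {
    redact_pii: true,
    redact_pii_policies: PII_POLICIES,
    redact_pii_sub: 'entity_name' # assumed: yields placeholders like [CREDIT_CARD_NUMBER]
  }
end

p pii_redaction_params
```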
Custom Spelling
Dynamic corrections for:
- Company name variations ("Warmly Yours", "Warm Lee Yours", etc.)
- Employee name phonetic variations (auto-generated from active employees)
- Custom corrections from Settings
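A sketch of how such corrections could be turned into AssemblyAI custom_spelling entries is shown below. The build_custom_spelling helper and the sample corrections are hypothetical, and 'WarmlyYours' as the target spelling is an assumption; the single-word filter reflects the "'to' fields must contain only one word" error listed under AssemblyAI Errors.

```ruby
# Builds custom_spelling entries ({ from: [...], to: '...' }) from a
# { heard_phrase => correct_spelling } map. Hypothetical helper.
def build_custom_spelling(corrections)
  corrections
    .reject { |_heard, correct| correct.strip.match?(/\s/) } # "to" must be a single word
    .group_by { |_heard, correct| correct.strip }
    .map { |correct, pairs| { from: pairs.map(&:first), to: correct } }
end

corrections = {
  'Warmly Yours' => 'WarmlyYours',
  'Warm Lee Yours' => 'WarmlyYours',
  'acme heat' => 'Acme Heat' # illustrative multi-word target, filtered out
}

p build_custom_spelling(corrections)
```

Grouping by target keeps one entry per correct spelling, with every phonetic variation collected in its from list.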
LeMUR Analysis
Using AssemblyAI's LLM Gateway with Claude:
- Summary - 2-4 sentence call summary
- Action Items - Tasks with responsible party (agent/customer) and priority
- Call Phases - Segments: greeting, problem identification, solution, closing
- Customer Satisfaction - Inferred satisfaction level
- Agent Performance Score - 0-100 based on professionalism and problem-solving
- Key Topics - Main topics discussed
UI Features
Call Records Index
- Recording Source Filter: Dropdown to filter by Twilio/Switchvox/All
- Stereo Badge: Visual indicator for dual-channel recordings
- Transcription State: Color-coded status badges
Call Record Show Page
Tabs:
- Overview: Audio player, call metadata, parties, AI summary
- Transcript: Structured transcript with speaker avatars, timestamps, sentiment badges
- AI Analysis: Action items, call phases, performance metrics
- Twilio (if applicable): Recording metadata, raw JSON data
Actions:
- Re-transcribe: Queue new transcription
- Re-analyze: Run LeMUR again
- Swap Speakers: Manual speaker correction
- Generate Embedding: Create semantic search vector
Automated Processing
Scheduler
# config/sidekiq_production_schedule.yml

# Switchvox import (6am-7pm hourly)
call_record_importer_worker:
  cron: '0 6-19 * * * America/Chicago'
  class: CallRecordImporterWorker

# Twilio import (all hours)
twilio_recording_import_worker:
  cron: '0 * * * * America/Chicago'
  class: TwilioRecordingImportWorker

# Daily transcription (6 AM)
daily_call_transcription:
  cron: '0 6 * * * America/Chicago'
  class: DailyCallRecordTranscriptionWorker
Daily Transcription Worker
Runs at 6 AM daily:
- Processes all new calls from previous 24 hours
- Backfills up to 500 older eligible calls
- Uses the ai_embeddings queue for controlled throughput
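The selection rule can be sketched in plain Ruby. This is an illustration under stated assumptions: the Call struct and daily_batch are hypothetical, eligibility borrows the pending/error and 30-second criteria from the "Eligible for transcription" query in Filtering & Queries, and ordering the backfill most recent first follows the rake task comment rather than anything stated about this worker.

```ruby
# Hypothetical stand-in for CallRecord rows.
Call = Struct.new(:id, :created_at, :transcription_state, :duration_secs, keyword_init: true)

BACKFILL_LIMIT = 500
MIN_DURATION_SECS = 30

def eligible?(call)
  %i[pending error].include?(call.transcription_state) &&
    call.duration_secs >= MIN_DURATION_SECS
end

# Returns ids to enqueue: every eligible call from the last 24 hours,
# then up to BACKFILL_LIMIT older eligible calls, most recent first.
def daily_batch(calls, now: Time.now)
  cutoff = now - (24 * 60 * 60)
  recent, older = calls.select { |c| eligible?(c) }
                       .partition { |c| c.created_at >= cutoff }
  backfill = older.sort_by { |c| -c.created_at.to_f }.first(BACKFILL_LIMIT)
  (recent + backfill).map(&:id)
end

now = Time.at(1_700_000_000)
calls = [
  Call.new(id: 1, created_at: now - 3_600, transcription_state: :pending, duration_secs: 60),
  Call.new(id: 2, created_at: now - (3 * 86_400), transcription_state: :error, duration_secs: 45),
  Call.new(id: 3, created_at: now - 7_200, transcription_state: :completed, duration_secs: 90)
]
p daily_batch(calls, now: now)
```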
Manual Operations
Rake Tasks
# View statistics
bundle exec rake call_records:stats
# Backfill transcriptions (most recent first)
bundle exec rake call_records:backfill_transcriptions[LIMIT,DAYS_BACK]
# Backfill LeMUR analysis
bundle exec rake call_records:backfill_lemur[LIMIT]
# Process a single call
bundle exec rake call_records:process_one[CALL_RECORD_ID]
# Twilio operations
bundle exec rake call_records:twilio_check
bundle exec rake call_records:twilio_import[LIMIT]
Ruby Console
# Import Twilio recordings
importer = CallRecord::TwilioRecordingImporter.new
importer.import_new_recordings(limit: 50, since: 24.hours.ago)
# Dry run
importer = CallRecord::TwilioRecordingImporter.new(dry_run: true)
importer.import_new_recordings
# Transcribe a single record
CallRecordTranscriptionWorker.perform_async(call_record_id: 123, force: true)
# Re-run LeMUR analysis
CallRecordSummaryWorker.perform_async(call_record_id: 123)
Filtering & Queries
# By source
CallRecord.where(recording_source: 'twilio')
CallRecord.where(recording_source: [nil, 'switchvox'])
# Stereo recordings
CallRecord.where('audio_channels >= 2')
# Transcription state
CallRecord.where(transcription_state: :completed)
CallRecord.where(transcription_state: [:pending, :error])
# Eligible for transcription
CallRecord.joins(:upload)
.where(transcription_state: [:pending, :error])
.where('duration_secs >= 30')
# Ransack (for UI)
CallRecord.ransack(recording_source_eq: 'twilio')
Troubleshooting
Recordings Not Importing
Switchvox:
- Check SFTP connectivity
- Verify Switchvox recording paths
- Check CallRecordImporterWorker logs
Twilio:
- Check Sidekiq logs for TwilioRecordingImportWorker
- Verify Twilio credentials in config/credentials.yml.enc
- Check trunk ID matches configured value
Wrong Speaker Labels
Stereo recordings:
- Verify call_direction is set correctly
- Use "Swap Speakers" button for manual correction
Mono recordings:
- Check if heuristic detection found agent greeting
- LeMUR should have identified speaker
- Use "Swap Speakers" if still incorrect
AssemblyAI Errors
| Error | Solution |
|---|---|
| "Invalid endpoint schema" | Ensure speaker_labels: false for multichannel |
| "custom_spelling 'to' fields must contain only one word" | Filter multi-word targets |
| 400 on transcription submit | Check all parameters are valid |
Twilio API Errors
grep '\[TwilioClient\]' log/production.log | tail -50
Known Limitations
Agent Identification for Inbound Twilio Calls
Twilio records at the trunk level, before PBX routing, so the recording does not reveal which agent answered an inbound call: the "to" address is always the company's main line.
Workarounds:
- Use Swap Speakers button for manual correction
- Future: Correlate with Switchvox call_logs by time/number
- Future: Create employee DID registry for direct-dial matching
Potential Duplicates
The same call can exist as both a Switchvox (mono) and a Twilio (stereo) recording. The two are currently stored as separate records; use the recording_source filter to analyze each source independently.
Cost Considerations
Per-Call Costs
| Service | Cost | Notes |
|---|---|---|
| AssemblyAI Transcription | ~$0.00025/sec | Average 2.9 min call = ~$0.044 |
| AssemblyAI LeMUR | ~$0.015/call | Claude Sonnet, ~2000 tokens |
| OpenAI Embedding | ~$0.0001/1K tokens | ~$0.0002/call |
| Total per call | ~$0.06 | Conservative estimate |
Storage Costs
| Item | Cost | Notes |
|---|---|---|
| Twilio Recording Storage | ~$0.0025/min | Per minute stored |
| S3 Storage (compressed) | Minimal | AAC ~93% smaller than WAV |
Volume Estimates
| Metric | Value |
|---|---|
| Daily new calls | ~124/day |
| Daily processing cost | ~$7.50/day |
| Monthly processing cost | ~$225/month |
| Backfill (2 years) | ~$6,679 |
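The per-call and daily figures can be reproduced from the unit prices in the tables above. The sketch below is only a worked check of that arithmetic; per_call_cost and daily_cost are hypothetical helper names. The backfill figure depends on the historical call count, which is not stated here, so it is not recomputed.

```ruby
TRANSCRIPTION_PER_SEC = 0.00025 # AssemblyAI transcription, USD per second
LEMUR_PER_CALL = 0.015          # LeMUR analysis, USD per call
EMBEDDING_PER_CALL = 0.0002     # OpenAI embedding, USD per call

def per_call_cost(avg_minutes)
  (TRANSCRIPTION_PER_SEC * avg_minutes * 60) + LEMUR_PER_CALL + EMBEDDING_PER_CALL
end

def daily_cost(calls_per_day, per_call)
  calls_per_day * per_call
end

per_call = per_call_cost(2.9).round(2) # 0.0435 + 0.015 + 0.0002, rounded to cents
puts format('per call: $%.2f', per_call)
puts format('daily:    $%.2f', daily_cost(124, per_call))
puts format('monthly:  $%.2f', 7.50 * 30) # table's rounded daily figure times 30 days
```

At 124 calls per day the computed daily cost is about $7.44, which the table rounds up to ~$7.50 before multiplying out to the monthly estimate.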
Related Files
| File | Purpose |
|---|---|
| app/models/call_record.rb | Model with jsonb_accessor, embeddable |
| app/services/call_record_processing/transcription_service.rb | Orchestrates transcription |
| app/services/assemblyai_client.rb | AssemblyAI API client |
| app/services/call_record/twilio_recording_importer.rb | Twilio import service |
| app/services/call_record/switchvox_importer_sftp.rb | Switchvox import service |
| app/services/twilio_client.rb | Twilio API client |
| app/workers/call_record_transcription_worker.rb | Transcription worker |
| app/workers/twilio_recording_import_worker.rb | Twilio import worker |
| app/workers/daily_call_record_transcription_worker.rb | Daily processing |
| app/helpers/call_records_helper.rb | Speaker detection helper |
| app/controllers/call_records_controller.rb | Controller with actions |
| lib/tasks/call_records.rake | Manual rake tasks |