Call Recording System

Overview

The Call Recording System captures, transcribes, and analyzes phone calls from two sources running in parallel:

Source	Recording Type	Speaker Separation	Schedule
Switchvox	Mono	AI diarization (~90-95% accuracy)	Real-time upload hook, plus hourly poll 6am-7pm CT
Twilio SIP Trunk	Stereo	Channel-based (100% accuracy)	Hourly 24/7

Both sources flow through the same AI pipeline for transcription, LeMUR analysis, and semantic search embeddings.

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                           RECORDING IMPORT                                   │
├─────────────────────────────────┬───────────────────────────────────────────┤
│                                 │                                           │
│  Switchvox PBX                  │  Twilio SIP Trunk                         │
│       │                         │       │                                   │
│       ▼                         │       ▼                                   │
│  CallRecordImporterWorker       │  TwilioRecordingImportWorker              │
│    (reads from R2)              │    (API polling)                          │
│       │                         │       │                                   │
│       ▼                         │       ▼                                   │
│  SwitchvoxCloudStorage          │  TwilioRecordingImporter                  │
│  ├─ Read from R2 store          │  ├─ Download WAV                          │
│  ├─ Already compressed          │  ├─ Compress to AAC                       │
│  └─ Switchvox account match     │  ├─ Direction-aware matching              │
│       │                         │  └─ Store raw Twilio JSON                 │
│       ▼                         │       │                                   │
│  CallRecord                     │       ▼                                   │
│    recording_source: nil        │  CallRecord                               │
│    audio_channels: 1            │    recording_source: 'twilio'             │
│                                 │    audio_channels: 2                      │
└─────────────────────────────────┴───────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         TRANSCRIPTION PIPELINE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  CallRecordTranscriptionWorker                                               │
│       │                                                                      │
│       ▼                                                                      │
│  TranscriptionService                                                        │
│  ├─ Download audio from S3/Dragonfly                                        │
│  ├─ Upload to AssemblyAI                                                    │
│  └─ Submit transcription request                                            │
│       │                                                                      │
│       ├─── Mono Recording ──────────────────┐                               │
│       │    speaker_labels: true              │                               │
│       │    speech_model: 'slam-1'            │                               │
│       │    keyterms_prompt: [...]            │                               │
│       │    speech_understanding: {...}       │                               │
│       │                                      │                               │
│       └─── Stereo Recording ────────────────┤                               │
│            multichannel: true                │                               │
│            speaker_labels: false             │                               │
│            (default speech model)            │                               │
│                                              │                               │
│                                              ▼                               │
│                                     AssemblyAI API                           │
│                                     ├─ Transcription                         │
│                                     ├─ PII Redaction                         │
│                                     ├─ Sentiment Analysis                    │
│                                     └─ Custom Spelling                       │
│                                              │                               │
│                                              ▼                               │
│                                     Webhook Callback                         │
│                                              │                               │
└──────────────────────────────────────────────┼──────────────────────────────┘
                                               │
                                               ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            AI ANALYSIS                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  AssemblyAI LeMUR (via LLM Gateway → Claude)                                │
│  ├─ Summary generation                                                       │
│  ├─ Action item extraction                                                   │
│  ├─ Call phase detection                                                     │
│  ├─ Customer satisfaction inference                                          │
│  ├─ Agent performance scoring                                                │
│  └─ Key topic extraction                                                     │
│       │                                                                      │
│       ▼                                                                      │
│  EmbeddingWorker (OpenAI)                                                   │
│  └─ Generate semantic search embeddings                                     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Database Schema

CallRecord Fields

Column	Type	Description
Recording Source
`recording_source`	string	`'twilio'` or `nil` (Switchvox)
`audio_channels`	integer	1 (mono) or 2 (stereo)
`agent_speaker_label`	string	Detected agent speaker (A, B, 1, 2, or name)
Twilio-Specific
`twilio_recording_sid`	string	Unique Twilio Recording SID (indexed)
`twilio_call_sid`	string	Twilio Call SID
`twilio_call_details`	jsonb	Raw Twilio API response
Switchvox-Specific
`switchvox_recorded_call_id`	integer	Switchvox recording ID
`switchvox_from_account_id`	integer	Caller’s Switchvox account
`switchvox_to_account_id`	integer	Recipient’s Switchvox account
Transcription
`transcript`	text	Full text transcript
`structured_transcript_json`	jsonb	Utterances with timestamps, confidence, sentiment
`assemblyai_transcript_id`	string	For LeMUR analysis reference
`transcription_state`	enum	pending, processing, completed, error, no_audio, too_short
`transcribed_at`	datetime	When transcription completed
AI Analysis
`ai_summary`	text	LeMUR-generated summary
`action_items`	jsonb	Tasks with responsible party and priority
`call_phases`	jsonb	Segments with timestamps
`customer_satisfaction`	enum	very_satisfied, satisfied, neutral, frustrated, angry
`agent_performance_score`	integer	0-100 score
`key_topics`	string[]	Main topics discussed
`lemur_analyzed_at`	datetime	When LeMUR analysis completed
Call Metadata
`call_direction`	enum	inbound, outbound
`call_outcome`	enum	unknown, sale, support, inquiry, voicemail

Twilio JSONB Accessors

The twilio_call_details column provides typed accessors:

call_record.twilio_caller_name       # CNAM lookup result
call_record.twilio_direction         # 'trunking-originating' or 'trunking-terminating'
call_record.twilio_from              # Originating number/SIP
call_record.twilio_to                # Destination number/SIP
call_record.twilio_trunk_sid         # SIP trunk identifier
call_record.twilio_price             # Call cost
call_record.twilio_price_unit        # Currency (USD)
call_record.twilio_start_time        # DateTime
call_record.twilio_end_time          # DateTime
call_record.twilio_recording_channels   # 2 for stereo
call_record.twilio_recording_duration   # Seconds
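
The accessors above read like plain attributes, but each one is a typed view over the raw JSON. A minimal plain-Ruby sketch of that idea follows. The `TwilioCallDetails` class and the payload key names (`direction`, `price`, `start_time`, `recording_channels`) are assumptions for illustration; the real model relies on jsonb_accessor rather than this class.

```ruby
require 'json'
require 'time'

# Hypothetical stand-in: wraps a raw Twilio payload and exposes typed readers.
class TwilioCallDetails
  def initialize(raw)
    @raw = raw.is_a?(String) ? JSON.parse(raw) : raw
  end

  def direction
    @raw['direction']
  end

  # Assumes the price arrives as a string; a missing price stays nil.
  def price
    value = @raw['price']
    value.nil? ? nil : Float(value)
  end

  # Parses the timestamp string into a Time; a missing value stays nil.
  def start_time
    value = @raw['start_time']
    value.nil? ? nil : Time.parse(value)
  end

  # Coerces the channel count to an integer (2 for stereo).
  def recording_channels
    @raw['recording_channels']&.to_i
  end
end
```

Keeping the raw payload intact and coercing on read means a new Twilio field can be exposed later without a migration.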

Recording Import

Switchvox (Mono)

Worker: CallRecordImporterWorker, triggered two ways:

Real-time (primary): SFTPGo fires its upload action hook the moment the PBX finishes writing a recording. The hook hits Webhooks::V1::SftpgoController, which enqueues a single-file import via CallRecordImporterWorker.perform_async(wav_key). Recordings appear in the UI within seconds instead of waiting for the next poll. See SFTPGo § Real-time import hook.
Hourly poll (backstop): the scheduled import_new_records scan (6am-7pm CT) still runs and catches anything the hook misses (a dropped notification, SFTPGo downtime, a .wav with no accompanying XML). Both paths converge on the same CallRecordImporterWorker.perform_async(wav_key) call. The worker’s :until_executed lock and the importer’s first_or_initialize make double delivery a no-op, so a recording is never imported twice.

Reads .wav recordings from the Cloudflare R2 bucket that the SFTPGo gateway writes to (the Switchvox PBX uploads over SFTP → SFTPGo → R2). The worker no longer opens any SFTP connection itself. See SFTPGo.
Audio files arrive pre-compressed
Party matching via Switchvox account IDs → Employee lookup
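
Because the hook and the hourly poll can both deliver the same key, the import has to be idempotent. The sketch below shows the find-or-initialize half of that guarantee with an in-memory store. `FakeCallRecordStore`, `import_recording`, and the sample key are hypothetical names for illustration, and the queue-level :until_executed lock is not modelled here.

```ruby
# In-memory stand-in for the call_records table, keyed by the R2 object key.
class FakeCallRecordStore
  def initialize
    @rows = {}
  end

  # Same idea as first_or_initialize: return the existing row for this key,
  # or a new unsaved one.
  def first_or_initialize(wav_key)
    @rows[wav_key] || { wav_key: wav_key, persisted: false }
  end

  def save(row)
    row[:persisted] = true
    @rows[row[:wav_key]] = row
  end

  def count
    @rows.size
  end
end

# Imports one recording; a second delivery of the same key is skipped.
def import_recording(store, wav_key)
  row = store.first_or_initialize(wav_key)
  return :skipped if row[:persisted]

  store.save(row)
  :imported
end
```

Calling `import_recording` twice with the same key leaves a single row, which is what lets the hook and the poll overlap safely.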

Twilio SIP Trunk (Stereo)

Worker: TwilioRecordingImportWorker (hourly 24/7)

Polls Twilio API for new recordings
Downloads WAV, compresses to AAC (93% size reduction)
Direction-aware party matching

Party Matching Logic

Company Main Numbers (excluded from matching):

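# Note: %w[] treats "#" and the comment words as array elements,
# so a plain array of strings is used to keep the inline comments.
COMPANY_MAIN_NUMBERS = [
  '+18008755285', # US 800#
  '+18664361444', # Canada toll-free
  '+18475502400'  # Main line
].freeze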

Direction-Aware Matching:

Call Direction	Origin Party	Destination Party
Inbound (trunking-originating)	Match by caller number	Skip (main line)
Outbound (trunking-terminating)	Match by agent DID	Match by destination

For outbound calls, agent DID is extracted from SIP address:

sip:18475502430@warmlyyours.pstn.twilio.com → +18475502430 → Employee match
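
The matching rules above can be sketched as two small helpers: one that normalizes a SIP address or bare number to `+digits`, and one that decides which side of the call is worth looking up. The method names and the `origin`/`destination` parameters are hypothetical, and how they map onto Twilio's from/to fields is left to the caller.

```ruby
COMPANY_MAIN_NUMBERS = ['+18008755285', '+18664361444', '+18475502400'].freeze

# Normalizes "sip:18475502430@host" or "+18475502430" to "+18475502430".
# Returns nil when the address has no recognizable number.
def to_e164(address)
  text = address.to_s.strip
  if (m = text.match(/\Asip:\+?(\d+)@/))
    "+#{m[1]}"
  elsif (m = text.match(/\A\+?(\d{7,15})\z/))
    "+#{m[1]}"
  end
end

# Company main numbers never identify a person, so they are not matchable.
def matchable_number(address)
  number = to_e164(address)
  return nil if number.nil? || COMPANY_MAIN_NUMBERS.include?(number)

  number
end

# Returns the numbers to look up for each party, per the direction table.
def party_lookup_numbers(direction:, origin:, destination:)
  case direction
  when 'trunking-originating' # inbound: destination is the main line, skip it
    { origin: matchable_number(origin), destination: nil }
  when 'trunking-terminating' # outbound: agent DID plus dialed number
    { origin: matchable_number(origin), destination: matchable_number(destination) }
  else
    { origin: nil, destination: nil }
  end
end
```

A nil entry means "do not attempt a match", which keeps main-line numbers from being attached to a random employee or customer.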

Transcription

Mono vs Stereo Modes

Feature	Mono (Switchvox)	Stereo (Twilio)
`speaker_labels`	`true` (AI diarization)	`false`
`multichannel`	`false`	`true`
`speech_model`	`'slam-1'`	default
`keyterms_prompt`	Company terms	N/A
`speech_understanding`	Agent identification	N/A
Speaker detection	Heuristic + LeMUR fallback	By channel + direction
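
As a rough illustration of the table above, the request options can be derived from the channel count alone. The `transcription_options` helper is hypothetical and is not the actual TranscriptionService code. The parameter names come from the table; `speech_understanding` is left out because its payload shape is not documented here.

```ruby
# Builds the mode-specific part of a transcription request.
# Stereo uses channel separation; mono falls back to AI diarization.
def transcription_options(audio_url:, audio_channels:, keyterms: [])
  base = { audio_url: audio_url }

  if audio_channels.to_i >= 2
    # speaker_labels must be false whenever multichannel is on
    base.merge(multichannel: true, speaker_labels: false)
  else
    base.merge(
      multichannel: false,
      speaker_labels: true,
      speech_model: 'slam-1',
      keyterms_prompt: keyterms
    )
  end
end
```

Branching in one place keeps the two mutually exclusive flags (`multichannel` and `speaker_labels`) from being enabled together by accident.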

PII Redaction

Automatically redacted:

banking_information
credit_card_cvv, credit_card_expiration, credit_card_number
us_social_security_number
passport_number
password

Redacted text replaced with [CREDIT_CARD_NUMBER], etc.
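
A sketch of how the policy list might be passed to AssemblyAI, plus a small helper for spotting redaction placeholders in a returned transcript. The `redact_pii_sub: 'entity_name'` setting is an assumption inferred from the bracketed placeholder format, and both helper names are hypothetical.

```ruby
REDACTED_PII_POLICIES = %w[
  banking_information
  credit_card_cvv
  credit_card_expiration
  credit_card_number
  us_social_security_number
  passport_number
  password
].freeze

# Request options that turn on redaction for the policies listed above.
def pii_redaction_options
  {
    redact_pii: true,
    redact_pii_policies: REDACTED_PII_POLICIES,
    redact_pii_sub: 'entity_name' # assumption: yields [CREDIT_CARD_NUMBER]-style tags
  }
end

# Lists the redaction placeholders found in a transcript, in order.
# Bracketed tokens that are not one of our policies are ignored.
def redaction_tags(text)
  known = REDACTED_PII_POLICIES.map(&:upcase)
  text.to_s.scan(/\[([A-Z_]+)\]/).flatten.select { |tag| known.include?(tag) }
end
```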

Custom Spelling

Dynamic corrections for:

Company name variations (“Warmly Yours”, “Warm Lee Yours”, etc.)
Employee name phonetic variations (auto-generated from active employees)
Custom corrections from Settings
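
The corrections end up as a list of from/to pairs in the request. A minimal sketch of building that list is below, including the guard that drops multi-word targets, since AssemblyAI rejects a `to` value containing more than one word. The helper name, the `WarmlyYours` target spelling, and the sample employee name are assumptions for illustration.

```ruby
# Turns { "target" => ["variant", ...] } into custom_spelling entries.
# Entries with a blank or multi-word target, or with no variants, are dropped.
def custom_spelling_entries(corrections)
  corrections.filter_map do |to, from_variants|
    target = to.to_s.strip
    variants = Array(from_variants).map { |v| v.to_s.strip }.reject(&:empty?).uniq
    next if target.empty? || target.match?(/\s/) || variants.empty?

    { from: variants, to: target }
  end
end

entries = custom_spelling_entries(
  {
    'WarmlyYours' => ['Warmly Yours', 'Warm Lee Yours'],
    'Jane Doe' => ['Jane Dough'] # hypothetical name; dropped, target has two words
  }
)
```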

LeMUR Analysis

Using AssemblyAI’s LLM Gateway with Claude:

Summary - 2-4 sentence call summary
Action Items - Tasks with responsible party (agent/customer) and priority
Call Phases - Segments: greeting, problem identification, solution, closing
Customer Satisfaction - Inferred satisfaction level
Agent Performance Score - 0-100 based on professionalism and problem-solving
Key Topics - Main topics discussed
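
LLM output is not guaranteed to respect the column constraints, so it is worth normalizing before saving. The sketch below assumes the model is asked to reply with a JSON object whose keys mirror the list above. The key names and the `normalize_analysis` helper are hypothetical.

```ruby
require 'json'

SATISFACTION_LEVELS = %w[very_satisfied satisfied neutral frustrated angry].freeze

# Parses the model's JSON reply and coerces it to the CallRecord columns.
# Returns nil when the reply is not a JSON object.
def normalize_analysis(raw_json)
  data = JSON.parse(raw_json)
  return nil unless data.is_a?(Hash)

  satisfaction = data['customer_satisfaction'].to_s
  score = data['agent_performance_score']

  {
    ai_summary: data['summary'].to_s.strip,
    action_items: Array(data['action_items']),
    call_phases: Array(data['call_phases']),
    # Unknown labels become nil rather than an invalid enum value
    customer_satisfaction: SATISFACTION_LEVELS.include?(satisfaction) ? satisfaction : nil,
    # Scores are rounded and clamped to the documented 0-100 range
    agent_performance_score: score.is_a?(Numeric) ? score.round.clamp(0, 100) : nil,
    key_topics: Array(data['key_topics']).map(&:to_s)
  }
rescue JSON::ParserError
  nil
end
```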

UI Features

Call Records Index

Recording Source Filter: Dropdown to filter by Twilio/Switchvox/All
Stereo Badge: Visual indicator for dual-channel recordings
Transcription State: Color-coded status badges

Call Record Show Page

Tabs:

Overview: Audio player, call metadata, parties, AI summary
Transcript: Structured transcript with speaker avatars, timestamps, sentiment badges
AI Analysis: Action items, call phases, performance metrics
Twilio (if applicable): Recording metadata, raw JSON data

Actions:

Re-transcribe: Queue new transcription
Re-analyze: Run LeMUR again
Swap Speakers: Manual speaker correction
Generate Embedding: Create semantic search vector

Automated Processing

Scheduler

# Switchvox import (6am-7pm hourly)
call_record_importer_worker:
  cron: '0 6-19 * * * America/Chicago'
  class: CallRecordImporterWorker

# Twilio import (all hours)
twilio_recording_import_worker:
  cron: '0 * * * * America/Chicago'
  class: TwilioRecordingImportWorker

# Daily transcription (6 AM)
daily_call_transcription:
  cron: '0 6 * * * America/Chicago'
  class: DailyCallRecordTranscriptionWorker

Daily Transcription Worker

Runs at 6 AM daily:

Processes all new calls from previous 24 hours
Backfills up to 500 older eligible calls
Uses ai_embeddings queue for controlled throughput
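
The selection logic can be pictured as "everything eligible from the last 24 hours, plus a capped slice of older eligible calls". The sketch below uses plain structs instead of ActiveRecord. The eligibility rules (pending or error state, at least 30 seconds) are borrowed from the queries in Filtering & Queries, and the newest-first ordering of the backfill is an assumption.

```ruby
CallStub = Struct.new(:id, :created_at, :transcription_state, :duration_secs, keyword_init: true)

ELIGIBLE_STATES = %i[pending error].freeze
MIN_DURATION_SECS = 30
BACKFILL_LIMIT = 500

# Returns the calls one daily run would enqueue.
def daily_batch(calls, now: Time.now, backfill_limit: BACKFILL_LIMIT)
  eligible = calls.select do |call|
    ELIGIBLE_STATES.include?(call.transcription_state) &&
      call.duration_secs >= MIN_DURATION_SECS
  end

  cutoff = now - (24 * 60 * 60)
  recent, older = eligible.partition { |call| call.created_at >= cutoff }

  # All recent calls, then the newest older calls up to the cap
  recent + older.sort_by(&:created_at).reverse.first(backfill_limit)
end
```

Capping only the older slice keeps the daily run bounded while still guaranteeing that every new call is picked up.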

Manual Operations

Rake Tasks

# View statistics
bundle exec rake call_records:stats

# Backfill transcriptions (most recent first)
bundle exec rake call_records:backfill_transcriptions[LIMIT,DAYS_BACK]

# Backfill LeMUR analysis
bundle exec rake call_records:backfill_lemur[LIMIT]

# Process a single call
bundle exec rake call_records:process_one[CALL_RECORD_ID]

# Twilio operations
bundle exec rake call_records:twilio_check
bundle exec rake call_records:twilio_import[LIMIT]

Ruby Console

# Import Twilio recordings
importer = CallRecord::TwilioRecordingImporter.new
importer.import_new_recordings(limit: 50, since: 24.hours.ago)

# Dry run
importer = CallRecord::TwilioRecordingImporter.new(dry_run: true)
importer.import_new_recordings

# Transcribe a single record
CallRecordTranscriptionWorker.perform_async(call_record_id: 123, force: true)

# Re-run LeMUR analysis
CallRecordSummaryWorker.perform_async(call_record_id: 123)

Filtering & Queries

# By source
CallRecord.where(recording_source: 'twilio')
CallRecord.where(recording_source: [nil, 'switchvox'])

# Stereo recordings
CallRecord.where('audio_channels >= 2')

# Transcription state
CallRecord.where(transcription_state: :completed)
CallRecord.where(transcription_state: [:pending, :error])

# Eligible for transcription
CallRecord.joins(:upload)
          .where(transcription_state: [:pending, :error])
          .where('duration_secs >= 30')

# Ransack (for UI)
CallRecord.ransack(recording_source_eq: 'twilio')

Troubleshooting

Recordings Not Importing

Switchvox:

Check SFTP connectivity between the PBX and SFTPGo, and confirm new .wav files are landing in the R2 bucket
Verify Switchvox recording paths
Check CallRecordImporterWorker logs

Twilio:

Check Sidekiq logs for TwilioRecordingImportWorker
Verify Twilio credentials in config/credentials.yml.enc
Check trunk ID matches configured value

Wrong Speaker Labels

Stereo recordings:

Verify call_direction is set correctly
Use “Swap Speakers” button for manual correction

Mono recordings:

Check if heuristic detection found agent greeting
LeMUR should have identified speaker
Use “Swap Speakers” if still incorrect
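
For mono recordings, the greeting heuristic and the manual swap can be sketched roughly as follows. The greeting patterns, the number of utterances scanned, and both helper names are assumptions for illustration, not the logic in call_records_helper.rb.

```ruby
# Hypothetical phrases an agent is likely to open with.
GREETING_PATTERNS = [
  /thank you for calling/i,
  /how (can|may) i help/i,
  /warmly\s*yours/i
].freeze

# Returns the speaker label of the first early utterance that sounds like an
# agent greeting, or nil so the caller can fall back to LeMUR.
def detect_agent_speaker(utterances, scan_first: 4)
  utterances.first(scan_first).each do |utterance|
    text = utterance[:text].to_s
    return utterance[:speaker] if GREETING_PATTERNS.any? { |pattern| text.match?(pattern) }
  end
  nil
end

# Exchanges two speaker labels across the transcript (the "Swap Speakers" idea).
def swap_speakers(utterances, first_label, second_label)
  utterances.map do |utterance|
    case utterance[:speaker]
    when first_label then utterance.merge(speaker: second_label)
    when second_label then utterance.merge(speaker: first_label)
    else utterance
    end
  end
end
```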

AssemblyAI Errors

Error	Solution
`"Invalid endpoint schema"`	Ensure `speaker_labels: false` for multichannel
`"custom_spelling 'to' fields must contain only one word"`	Filter multi-word targets
`400` on transcription submit	Check all parameters are valid

Twilio API Errors

grep '\[TwilioClient\]' log/production.log | tail -50

Known Limitations

Agent Identification for Inbound Twilio Calls

Twilio records at the trunk level before PBX routing, so we cannot identify which agent answered inbound calls. The to address is the company’s main line.

Workarounds:

Use Swap Speakers button for manual correction
Future: Correlate with Switchvox call_logs by time/number
Future: Create employee DID registry for direct-dial matching

Potential Duplicates

The same call can exist as both a Switchvox (mono) and a Twilio (stereo) recording. The two copies are currently stored as separate records; use the recording_source filter to analyze each source independently.

Cost Considerations

Per-Call Costs

Service	Cost	Notes
AssemblyAI Transcription	~$0.00025/sec	Average 2.9 min call = ~$0.044
AssemblyAI LeMUR	~$0.015/call	Claude Sonnet, ~2000 tokens
OpenAI Embedding	~$0.0001/1K tokens	~$0.0002/call
Total per call	~$0.06	Conservative estimate

Storage Costs

Item	Cost	Notes
Twilio Recording Storage	~$0.0025/min	Per minute stored
S3 Storage (compressed)	Minimal	AAC ~93% smaller than WAV

Volume Estimates

Metric	Value
Daily new calls	~124/day
Daily processing cost	~$7.50/day
Monthly processing cost	~$225/month
Backfill (2 years)	~$6,679
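
The per-call and daily figures can be reproduced from the unit prices in the tables above. The snippet below is a back-of-the-envelope check only. The constant and method names are hypothetical, and the 30-day month is an assumption.

```ruby
TRANSCRIPTION_PER_SEC = 0.00025 # AssemblyAI transcription
LEMUR_PER_CALL        = 0.015   # LeMUR analysis
EMBEDDING_PER_CALL    = 0.0002  # OpenAI embedding

# Cost of fully processing one call of the given length.
def per_call_cost(duration_minutes)
  (duration_minutes * 60 * TRANSCRIPTION_PER_SEC) + LEMUR_PER_CALL + EMBEDDING_PER_CALL
end

average_call = per_call_cost(2.9).round(4) # 0.0587, which the table rounds to ~$0.06
daily        = (124 * 0.06).round(2)       # 7.44, shown above as ~$7.50
monthly      = (daily * 30).round(2)       # 223.2, shown above as ~$225
```

The table values are slightly higher than the computed ones because each step is rounded up, which is consistent with the "conservative estimate" note.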

Key Files

File	Purpose
`app/models/call_record.rb`	Model with jsonb_accessor, embeddable
`app/services/call_record_processing/transcription_service.rb`	Orchestrates transcription
`app/services/assemblyai_client.rb`	AssemblyAI API client
`app/services/call_record/twilio_recording_importer.rb`	Twilio import service
`app/services/call_record/switchvox_importer_sftp.rb`	Switchvox import service
`app/services/twilio_client.rb`	Twilio API client
`app/workers/call_record_transcription_worker.rb`	Transcription worker
`app/workers/twilio_recording_import_worker.rb`	Twilio import worker
`app/workers/daily_call_record_transcription_worker.rb`	Daily processing
`app/helpers/call_records_helper.rb`	Speaker detection helper
`app/controllers/call_records_controller.rb`	Controller with actions
`lib/tasks/call_records.rake`	Manual rake tasks