Call Recording System

Overview

The Call Recording System captures, transcribes, and analyzes phone calls from two sources running in parallel:

Source             Recording Type   Speaker Separation                  Schedule
Switchvox          Mono             AI diarization (~90-95% accuracy)   Real-time hook, plus hourly poll 6am-7pm CT
Twilio SIP Trunk   Stereo           Channel-based (100% accuracy)       Hourly 24/7

Both sources flow through the same AI pipeline for transcription, LeMUR analysis, and semantic search embeddings.

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                           RECORDING IMPORT                                   │
├─────────────────────────────────┬───────────────────────────────────────────┤
│                                 │                                           │
│  Switchvox PBX                  │  Twilio SIP Trunk                         │
│       │                         │       │                                   │
│       ▼                         │       ▼                                   │
│  CallRecordImporterWorker       │  TwilioRecordingImportWorker              │
│    (reads from R2)              │    (API polling)                          │
│       │                         │       │                                   │
│       ▼                         │       ▼                                   │
│  SwitchvoxCloudStorage          │  TwilioRecordingImporter                  │
│  ├─ Read from R2 store          │  ├─ Download WAV                          │
│  ├─ Already compressed          │  ├─ Compress to AAC                       │
│  └─ Switchvox account match     │  ├─ Direction-aware matching              │
│       │                         │  └─ Store raw Twilio JSON                 │
│       ▼                         │       │                                   │
│  CallRecord                     │       ▼                                   │
│    recording_source: nil        │  CallRecord                               │
│    audio_channels: 1            │    recording_source: 'twilio'             │
│                                 │    audio_channels: 2                      │
└─────────────────────────────────┴───────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         TRANSCRIPTION PIPELINE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  CallRecordTranscriptionWorker                                               │
│       │                                                                      │
│       ▼                                                                      │
│  TranscriptionService                                                        │
│  ├─ Download audio from S3/Dragonfly                                        │
│  ├─ Upload to AssemblyAI                                                    │
│  └─ Submit transcription request                                            │
│       │                                                                      │
│       ├─── Mono Recording ──────────────────┐                               │
│       │    speaker_labels: true              │                               │
│       │    speech_model: 'slam-1'            │                               │
│       │    keyterms_prompt: [...]            │                               │
│       │    speech_understanding: {...}       │                               │
│       │                                      │                               │
│       └─── Stereo Recording ────────────────┤                               │
│            multichannel: true                │                               │
│            speaker_labels: false             │                               │
│            (default speech model)            │                               │
│                                              │                               │
│                                              ▼                               │
│                                     AssemblyAI API                           │
│                                     ├─ Transcription                         │
│                                     ├─ PII Redaction                         │
│                                     ├─ Sentiment Analysis                    │
│                                     └─ Custom Spelling                       │
│                                              │                               │
│                                              ▼                               │
│                                     Webhook Callback                         │
│                                              │                               │
└──────────────────────────────────────────────┼──────────────────────────────┘
                                               │
                                               ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            AI ANALYSIS                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  AssemblyAI LeMUR (via LLM Gateway → Claude)                                │
│  ├─ Summary generation                                                       │
│  ├─ Action item extraction                                                   │
│  ├─ Call phase detection                                                     │
│  ├─ Customer satisfaction inference                                          │
│  ├─ Agent performance scoring                                                │
│  └─ Key topic extraction                                                     │
│       │                                                                      │
│       ▼                                                                      │
│  EmbeddingWorker (OpenAI)                                                   │
│  └─ Generate semantic search embeddings                                     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Database Schema

CallRecord Fields

Column                       Type      Description

Recording Source
recording_source             string    'twilio' or nil (Switchvox)
audio_channels               integer   1 (mono) or 2 (stereo)
agent_speaker_label          string    Detected agent speaker (A, B, 1, 2, or name)

Twilio-Specific
twilio_recording_sid         string    Unique Twilio Recording SID (indexed)
twilio_call_sid              string    Twilio Call SID
twilio_call_details          jsonb     Raw Twilio API response

Switchvox-Specific
switchvox_recorded_call_id   integer   Switchvox recording ID
switchvox_from_account_id    integer   Caller's Switchvox account
switchvox_to_account_id      integer   Recipient's Switchvox account

Transcription
transcript                   text      Full text transcript
structured_transcript_json   jsonb     Utterances with timestamps, confidence, sentiment
assemblyai_transcript_id     string    For LeMUR analysis reference
transcription_state          enum      pending, processing, completed, error, no_audio, too_short
transcribed_at               datetime  When transcription completed

AI Analysis
ai_summary                   text      LeMUR-generated summary
action_items                 jsonb     Tasks with responsible party and priority
call_phases                  jsonb     Segments with timestamps
customer_satisfaction        enum      very_satisfied, satisfied, neutral, frustrated, angry
agent_performance_score      integer   0-100 score
key_topics                   string[]  Main topics discussed
lemur_analyzed_at            datetime  When LeMUR analysis completed

Call Metadata
call_direction               enum      inbound, outbound
call_outcome                 enum      unknown, sale, support, inquiry, voicemail

Twilio JSONB Accessors

The CallRecord model exposes typed accessors over the twilio_call_details column (via jsonb_accessor):

call_record.twilio_caller_name       # CNAM lookup result
call_record.twilio_direction         # 'trunking-originating' or 'trunking-terminating'
call_record.twilio_from              # Originating number/SIP
call_record.twilio_to                # Destination number/SIP
call_record.twilio_trunk_sid         # SIP trunk identifier
call_record.twilio_price             # Call cost
call_record.twilio_price_unit        # Currency (USD)
call_record.twilio_start_time        # DateTime
call_record.twilio_end_time          # DateTime
call_record.twilio_recording_channels   # 2 for stereo
call_record.twilio_recording_duration   # Seconds

Recording Import

Switchvox (Mono)

Worker: CallRecordImporterWorker, triggered two ways:

  • Real-time (primary): SFTPGo fires its upload action hook the moment the
    PBX finishes writing a recording, hitting Webhooks::V1::SftpgoController,
    which enqueues the single-file import (perform_async(wav_key)). Recordings
    land in the UI within seconds instead of waiting for the next poll. See
    SFTPGo § Real-time import hook.

  • Hourly poll (backstop): the scheduled import_new_records scan (6am-7pm CT)
    still runs, catching anything the hook misses (a dropped notification, SFTPGo
    downtime, an xml-less .wav).

Both paths converge on the same CallRecordImporterWorker.perform_async(wav_key)
call. The worker's :until_executed lock and the importer's first_or_initialize
make double delivery a no-op, so a recording is never imported twice.

Import behavior:

  • Source: reads .wav recordings from the Cloudflare R2 bucket that the SFTPGo
    gateway writes to (Switchvox PBX → SFTP → SFTPGo → R2). The worker no longer
    opens SFTP connections itself. See SFTPGo.

  • Compression: audio files arrive already compressed.

  • Party matching: Switchvox account IDs → Employee lookup.
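
The reason a recording delivered by both the real-time hook and the hourly poll is imported only once can be illustrated with a minimal, framework-free sketch. RecordingStore and its import method are hypothetical stand-ins for the importer's first_or_initialize behavior, the object key is made up, and the worker's :until_executed lock is not modelled here.

```ruby
# Hypothetical in-memory stand-in for the call_records table, keyed by the
# R2 object key of the .wav file. This is not the real importer.
class RecordingStore
  def initialize
    @records = {}
  end

  # Mimics first_or_initialize followed by save: returns the existing record
  # for the key, or creates one. A second call with the same key is a no-op.
  def import(wav_key)
    @records[wav_key] ||= { wav_key: wav_key, imported_at: Time.now }
  end

  def count
    @records.size
  end
end

store = RecordingStore.new
hook_record = store.import('recordings/example.wav') # real-time hook path
poll_record = store.import('recordings/example.wav') # hourly poll path

puts store.count                     # prints 1
puts hook_record.equal?(poll_record) # prints true (same record object)
```

In the real system the lock additionally prevents two jobs for the same key from running concurrently; the sketch only covers the sequential case.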

Twilio SIP Trunk (Stereo)

Worker: TwilioRecordingImportWorker (hourly 24/7)

  • Polls Twilio API for new recordings
  • Downloads WAV, compresses to AAC (93% size reduction)
  • Direction-aware party matching

Party Matching Logic

Company Main Numbers (excluded from matching):

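# Plain string array: inside %w[], "#" does not start a comment, so inline
# comments there would become array elements.
COMPANY_MAIN_NUMBERS = [
  '+18008755285', # US 800#
  '+18664361444', # Canada toll-free
  '+18475502400'  # Main line
].freeze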

Direction-Aware Matching:

Call Direction                    Origin Party             Destination Party
Inbound (trunking-originating)    Match by caller number   Skip (main line)
Outbound (trunking-terminating)   Match by agent DID       Match by destination

For outbound calls, agent DID is extracted from SIP address:

sip:18475502430@warmlyyours.pstn.twilio.com → +18475502430 → Employee match
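
The matching rules above can be sketched in plain Ruby. The method names (normalize_number, matchable, parties_to_match) are hypothetical and are not taken from the importer. The sketch assumes every number already carries its country code and leaves out the Employee lookup itself.

```ruby
COMPANY_MAIN_NUMBERS = ['+18008755285', '+18664361444', '+18475502400'].freeze

# Extracts an E.164-style number from a SIP URI or a bare number string.
# Returns nil when the address contains no digits.
def normalize_number(address)
  return nil if address.nil?

  user_part = address.sub(/\Asips?:/i, '').split('@').first.to_s
  digits = user_part.gsub(/\D/, '')
  digits.empty? ? nil : "+#{digits}"
end

# Company main numbers never identify an individual party, so they are skipped.
def matchable(number)
  COMPANY_MAIN_NUMBERS.include?(number) ? nil : number
end

# Returns the numbers to look up for each side of the call.
# A nil value means "skip matching for this side".
def parties_to_match(direction:, from:, to:)
  origin = matchable(normalize_number(from))
  destination = matchable(normalize_number(to))

  case direction
  when 'trunking-originating' # inbound: destination is the main line
    { origin: origin, destination: nil }
  when 'trunking-terminating' # outbound: origin is the agent DID
    { origin: origin, destination: destination }
  else
    { origin: nil, destination: nil }
  end
end

outbound = parties_to_match(
  direction: 'trunking-terminating',
  from: 'sip:18475502430@warmlyyours.pstn.twilio.com',
  to: '+13125550100' # fictional destination number
)
puts outbound[:origin]      # prints +18475502430
puts outbound[:destination] # prints +13125550100
```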

Transcription

Mono vs Stereo Modes

Feature                Mono (Switchvox)             Stereo (Twilio)
speaker_labels         true (AI diarization)        false
multichannel           false                        true
speech_model           'slam-1'                     default
keyterms_prompt        Company terms                N/A
speech_understanding   Agent identification         N/A
Speaker detection      Heuristic + LeMUR fallback   By channel + direction
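
As a rough sketch of how the two modes differ at request time, the helper below picks the mode-specific parameters from the channel count. The method name transcription_mode_params is hypothetical, and the real TranscriptionService may assemble the request differently. The contents of keyterms_prompt and speech_understanding are not shown in this document, so the caller passes them in as opaque values.

```ruby
# Builds only the mode-specific part of a transcription request.
def transcription_mode_params(audio_channels:, keyterms: [], speech_understanding: nil)
  if audio_channels >= 2
    # Stereo: each party has its own channel, so diarization stays off.
    { multichannel: true, speaker_labels: false }
  else
    # Mono: rely on AI diarization plus the extra hints listed in the table.
    params = { multichannel: false, speaker_labels: true, speech_model: 'slam-1' }
    params[:keyterms_prompt] = keyterms unless keyterms.empty?
    params[:speech_understanding] = speech_understanding if speech_understanding
    params
  end
end

stereo = transcription_mode_params(audio_channels: 2)
puts stereo[:multichannel]   # prints true
puts stereo[:speaker_labels] # prints false
```

Keeping speaker_labels and multichannel mutually exclusive in one place avoids the "Invalid endpoint schema" error listed under Troubleshooting.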

PII Redaction

Automatically redacted:

  • banking_information
  • credit_card_cvv, credit_card_expiration, credit_card_number
  • us_social_security_number
  • passport_number
  • password

Redacted text is replaced with the entity name in brackets, for example [CREDIT_CARD_NUMBER].
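
A minimal sketch of the corresponding request parameters is shown below. It assumes the request uses AssemblyAI's redact_pii, redact_pii_policies, and redact_pii_sub options. The constant and method names are hypothetical, and the project's actual client code may differ.

```ruby
# Policies listed in this section, using AssemblyAI's policy names.
PII_POLICIES = %w[
  banking_information
  credit_card_cvv
  credit_card_expiration
  credit_card_number
  us_social_security_number
  passport_number
  password
].freeze

# Returns the PII-related portion of a transcription request.
def pii_redaction_params
  {
    redact_pii: true,
    redact_pii_policies: PII_POLICIES,
    # 'entity_name' substitutes placeholders such as [CREDIT_CARD_NUMBER]
    redact_pii_sub: 'entity_name'
  }
end

puts pii_redaction_params[:redact_pii_policies].size # prints 7
```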

Custom Spelling

Dynamic corrections for:

  • Company name variations ("Warmly Yours", "Warm Lee Yours", etc.)
  • Employee name phonetic variations (auto-generated from active employees)
  • Custom corrections from Settings
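
The corrections are sent as custom_spelling entries, each mapping one or more heard variations (from) to a single replacement word (to). The sketch below is illustrative only: build_custom_spelling is a hypothetical helper, and 'WarmlyYours' is an assumed one-word target. It drops multi-word targets because the API rejects them (see AssemblyAI Errors under Troubleshooting).

```ruby
# corrections: a Hash of replacement => list of heard variations.
# Returns an array of { from: [...], to: '...' } entries.
def build_custom_spelling(corrections)
  corrections.filter_map do |to, variations|
    target = to.to_s.strip
    # The API accepts only single-word targets, so skip anything else.
    next if target.empty? || target.match?(/\s/)

    from = Array(variations).map { |v| v.to_s.strip }.reject(&:empty?).uniq
    next if from.empty?

    { from: from, to: target }
  end
end

corrections = {
  'WarmlyYours' => ['Warmly Yours', 'Warm Lee Yours'],
  'Warmly Yours' => ['Warm Lee Yours'] # dropped: the target has two words
}
spelling = build_custom_spelling(corrections)
puts spelling.size       # prints 1
puts spelling.first[:to] # prints WarmlyYours
```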

LeMUR Analysis

Using AssemblyAI's LLM Gateway with Claude:

  1. Summary - 2-4 sentence call summary
  2. Action Items - Tasks with responsible party (agent/customer) and priority
  3. Call Phases - Segments: greeting, problem identification, solution, closing
  4. Customer Satisfaction - Inferred satisfaction level
  5. Agent Performance Score - 0-100 based on professionalism and problem-solving
  6. Key Topics - Main topics discussed
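
Because these six outputs come from an LLM, it can help to normalize them before writing to the CallRecord columns. The sketch below assumes the model returns a JSON object. The JSON key names and the normalize_analysis helper are hypothetical, since the actual prompt and response format are not documented here.

```ruby
require 'json'

SATISFACTION_LEVELS = %w[very_satisfied satisfied neutral frustrated angry].freeze

# Parses a JSON analysis payload and coerces it to the column constraints:
# satisfaction must be a known enum value, the score an integer in 0..100.
def normalize_analysis(raw_json)
  data = JSON.parse(raw_json)
  satisfaction = data['customer_satisfaction'].to_s
  score = data['agent_performance_score']

  {
    ai_summary: data['summary'].to_s.strip,
    action_items: Array(data['action_items']),
    call_phases: Array(data['call_phases']),
    customer_satisfaction: SATISFACTION_LEVELS.include?(satisfaction) ? satisfaction : nil,
    agent_performance_score: score.is_a?(Numeric) ? score.round.clamp(0, 100) : nil,
    key_topics: Array(data['key_topics']).map(&:to_s)
  }
end

raw = '{"summary": "Customer asked about installation.", ' \
      '"customer_satisfaction": "satisfied", "agent_performance_score": 104.6}'
analysis = normalize_analysis(raw)
puts analysis[:customer_satisfaction]   # prints satisfied
puts analysis[:agent_performance_score] # prints 100
```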

UI Features

Call Records Index

  • Recording Source Filter: Dropdown to filter by Twilio/Switchvox/All
  • Stereo Badge: Visual indicator for dual-channel recordings
  • Transcription State: Color-coded status badges

Call Record Show Page

Tabs:

  • Overview: Audio player, call metadata, parties, AI summary
  • Transcript: Structured transcript with speaker avatars, timestamps, sentiment badges
  • AI Analysis: Action items, call phases, performance metrics
  • Twilio (if applicable): Recording metadata, raw JSON data

Actions:

  • Re-transcribe: Queue new transcription
  • Re-analyze: Run LeMUR again
  • Swap Speakers: Manual speaker correction
  • Generate Embedding: Create semantic search vector

Automated Processing

Scheduler

# config/sidekiq_production_schedule.yml

# Switchvox import (6am-7pm hourly)
call_record_importer_worker:
  cron: '0 6-19 * * * America/Chicago'
  class: CallRecordImporterWorker

# Twilio import (all hours)
twilio_recording_import_worker:
  cron: '0 * * * * America/Chicago'
  class: TwilioRecordingImportWorker

# Daily transcription (6 AM)
daily_call_transcription:
  cron: '0 6 * * * America/Chicago'
  class: DailyCallRecordTranscriptionWorker

Daily Transcription Worker

Runs at 6 AM daily:

  1. Processes all new calls from previous 24 hours
  2. Backfills up to 500 older eligible calls
  3. Uses ai_embeddings queue for controlled throughput
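
The selection logic in steps 1 and 2 can be sketched without Rails as follows. The Call struct and select_daily_batch are hypothetical. Eligibility follows the "Eligible for transcription" query in Filtering & Queries (pending or error state, at least 30 seconds), minus the upload join, and the newest-first backfill order is an assumption borrowed from the backfill rake task.

```ruby
Call = Struct.new(:id, :created_at, :transcription_state, :duration_secs, keyword_init: true)

BACKFILL_LIMIT = 500
MIN_DURATION_SECS = 30

def eligible?(call)
  %i[pending error].include?(call.transcription_state) &&
    call.duration_secs >= MIN_DURATION_SECS
end

# Splits eligible calls into those from the last 24 hours and a capped
# backfill of older calls, newest first.
def select_daily_batch(calls, now: Time.now)
  cutoff = now - (24 * 60 * 60)
  recent, older = calls.select { |c| eligible?(c) }
                       .partition { |c| c.created_at >= cutoff }
  backfill = older.sort_by(&:created_at).reverse.first(BACKFILL_LIMIT)
  { recent: recent, backfill: backfill }
end

now = Time.utc(2025, 1, 2, 6, 0, 0)
calls = [
  Call.new(id: 1, created_at: now - 3_600, transcription_state: :pending, duration_secs: 45),
  Call.new(id: 2, created_at: now - 3 * 86_400, transcription_state: :error, duration_secs: 90),
  Call.new(id: 3, created_at: now - 3_600, transcription_state: :pending, duration_secs: 10)
]
batch = select_daily_batch(calls, now: now)
puts batch[:recent].map(&:id).inspect   # prints [1]
puts batch[:backfill].map(&:id).inspect # prints [2]
```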

Manual Operations

Rake Tasks

# View statistics
bundle exec rake call_records:stats

# Backfill transcriptions (most recent first)
bundle exec rake call_records:backfill_transcriptions[LIMIT,DAYS_BACK]

# Backfill LeMUR analysis
bundle exec rake call_records:backfill_lemur[LIMIT]

# Process a single call
bundle exec rake call_records:process_one[CALL_RECORD_ID]

# Twilio operations
bundle exec rake call_records:twilio_check
bundle exec rake call_records:twilio_import[LIMIT]

Ruby Console

# Import Twilio recordings
importer = CallRecord::TwilioRecordingImporter.new
importer.import_new_recordings(limit: 50, since: 24.hours.ago)

# Dry run
importer = CallRecord::TwilioRecordingImporter.new(dry_run: true)
importer.import_new_recordings

# Transcribe a single record
CallRecordTranscriptionWorker.perform_async(call_record_id: 123, force: true)

# Re-run LeMUR analysis
CallRecordSummaryWorker.perform_async(call_record_id: 123)

Filtering & Queries

# By source
CallRecord.where(recording_source: 'twilio')
CallRecord.where(recording_source: [nil, 'switchvox'])

# Stereo recordings
CallRecord.where('audio_channels >= 2')

# Transcription state
CallRecord.where(transcription_state: :completed)
CallRecord.where(transcription_state: [:pending, :error])

# Eligible for transcription
CallRecord.joins(:upload)
          .where(transcription_state: [:pending, :error])
          .where('duration_secs >= 30')

# Ransack (for UI)
CallRecord.ransack(recording_source_eq: 'twilio')

Troubleshooting

Recordings Not Importing

Switchvox:

  1. Check SFTP connectivity between the Switchvox PBX and the SFTPGo gateway
  2. Verify Switchvox recording paths
  3. Check CallRecordImporterWorker logs

Twilio:

  1. Check Sidekiq logs for TwilioRecordingImportWorker
  2. Verify Twilio credentials in config/credentials.yml.enc
  3. Check that the trunk SID matches the configured value

Wrong Speaker Labels

Stereo recordings:

  1. Verify call_direction is set correctly
  2. Use "Swap Speakers" button for manual correction

Mono recordings:

  1. Check whether heuristic detection found the agent's greeting
  2. If it did not, check whether the LeMUR fallback identified the agent speaker
  3. Use "Swap Speakers" if the labels are still incorrect

AssemblyAI Errors

Error                                                      Solution
"Invalid endpoint schema"                                  Ensure speaker_labels: false for multichannel
"custom_spelling 'to' fields must contain only one word"   Filter multi-word targets
400 on transcription submit                                Check all parameters are valid

Twilio API Errors

grep '\[TwilioClient\]' log/production.log | tail -50

Known Limitations

Agent Identification for Inbound Twilio Calls

Twilio records at the trunk level, before PBX routing, so the recording metadata cannot tell us which agent answered an inbound call. The "to" address is the company's main line.

Workarounds:

  • Use Swap Speakers button for manual correction
  • Future: Correlate with Switchvox call_logs by time/number
  • Future: Create employee DID registry for direct-dial matching

Potential Duplicates

The same call can exist as both a Switchvox (mono) and a Twilio (stereo) recording. The two are currently stored as separate records; use the recording_source filter to analyze each source independently.

Cost Considerations

Per-Call Costs

Service                    Cost                 Notes
AssemblyAI Transcription   ~$0.00025/sec        Average 2.9 min call = ~$0.044
AssemblyAI LeMUR           ~$0.015/call         Claude Sonnet, ~2000 tokens
OpenAI Embedding           ~$0.0001/1K tokens   ~$0.0002/call
Total per call             ~$0.06               Conservative estimate

Storage Costs

Item                       Cost           Notes
Twilio Recording Storage   ~$0.0025/min   Per minute stored
S3 Storage (compressed)    Minimal        AAC ~93% smaller than WAV

Volume Estimates

Metric                    Value
Daily new calls           ~124/day
Daily processing cost     ~$7.50/day
Monthly processing cost   ~$225/month
Backfill (2 years)        ~$6,679
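
The per-call and daily figures can be reproduced from the unit prices above. The short calculation below is only a sanity check: it assumes a 30-day month and leaves out the backfill figure, which depends on historical call volume not stated here.

```ruby
TRANSCRIPTION_PER_SEC = 0.00025 # AssemblyAI transcription, USD per second
AVG_CALL_MINUTES = 2.9
LEMUR_PER_CALL = 0.015
EMBEDDING_PER_CALL = 0.0002
CALLS_PER_DAY = 124
DAYS_PER_MONTH = 30 # assumption for the monthly estimate

transcription = TRANSCRIPTION_PER_SEC * AVG_CALL_MINUTES * 60
per_call = transcription + LEMUR_PER_CALL + EMBEDDING_PER_CALL
daily = per_call * CALLS_PER_DAY
monthly = daily * DAYS_PER_MONTH

puts format('Transcription per call: $%.4f', transcription)
puts format('Total per call:         $%.4f', per_call)
puts format('Daily:                  $%.2f', daily)
puts format('Monthly:                $%.2f', monthly)
```

Unrounded, this comes to roughly $0.059 per call, $7.28 per day, and $218 per month, slightly below the rounded, conservative estimates in the tables.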

Related Files

File                                                           Purpose
app/models/call_record.rb                                      Model with jsonb_accessor, embeddable
app/services/call_record_processing/transcription_service.rb   Orchestrates transcription
app/services/assemblyai_client.rb                              AssemblyAI API client
app/services/call_record/twilio_recording_importer.rb          Twilio import service
app/services/call_record/switchvox_importer_sftp.rb            Switchvox import service
app/services/twilio_client.rb                                  Twilio API client
app/workers/call_record_transcription_worker.rb                Transcription worker
app/workers/twilio_recording_import_worker.rb                  Twilio import worker
app/workers/daily_call_record_transcription_worker.rb          Daily processing
app/helpers/call_records_helper.rb                             Speaker detection helper
app/controllers/call_records_controller.rb                     Controller with actions
lib/tasks/call_records.rake                                    Manual rake tasks
