Class: VideoProcessing::TranscriptionService

Inherits:
Object
  • Object
show all
Defined in:
app/services/video_processing/transcription_service.rb

Instance Method Summary collapse

Constructor Details

#initialize(video, options = {}) ⇒ TranscriptionService

Returns a new instance of TranscriptionService.



5
6
7
8
# File 'app/services/video_processing/transcription_service.rb', line 5

# Build a transcription service bound to a single video.
# Options are normalized to symbol keys so callers may pass either
# string- or symbol-keyed hashes.
def initialize(video, options = {})
  @options = options.symbolize_keys
  @video   = video
end

Instance Method Details

#apply_terminology_regex(text) ⇒ Object

Apply simple regex-based terminology fixes



741
742
743
744
# File 'app/services/video_processing/transcription_service.rb', line 741

# Run +text+ through the terminology polisher and return the polished
# result, falling back to the untouched input when the polisher yields
# nothing for it.
def apply_terminology_regex(text)
  polished = TranscriptionPolisherService
             .new(company_terminology)
             .polish_utterances([text])
             .first
  polished || text
end

#auto_detect_no_spoken_words(transcription_data) ⇒ Object

Step 1: Retrieve and overwrite structured and HTML transcription
Automatically detect and mark videos as having no spoken words



508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
# File 'app/services/video_processing/transcription_service.rb', line 508

# Flag the video as containing no spoken words when its transcript is
# effectively empty: blank text, fewer than 10 characters of text, or
# no sentences at all.
#
# @param transcription_data [Hash, nil] raw transcription payload
# @return [true] when the video was marked as having no spoken words
# @return [false] when meaningful speech is present
# @return [nil] when no transcription data was supplied
def auto_detect_no_spoken_words(transcription_data)
  return unless transcription_data

  text = (transcription_data['text'] || '').strip
  sentences = transcription_data['sentences'] || []

  speechless = text.blank? || text.length < 10 || sentences.empty?
  return false unless speechless

  Rails.logger.info "Auto-detecting video as having no spoken words (text: '#{text}', sentences: #{sentences.length})"
  @video.mark_as_no_spoken_words!
  true
end

#count_words(text) ⇒ Object



1117
1118
1119
# File 'app/services/video_processing/transcription_service.rb', line 1117

# Count the whitespace-separated words in +text+.
#
# Uses arg-less String#split, which splits on any run of whitespace and
# ignores leading whitespace; the previous /\s+/ regex produced a
# spurious empty first token for text with leading whitespace, inflating
# the count by one. nil input is treated as zero words instead of
# raising NoMethodError.
#
# @param text [String, nil]
# @return [Integer] number of words
def count_words(text)
  text.to_s.split.count
end

#ensure_transcription_completed!(transcript_id) ⇒ Object

Wait for AssemblyAI transcript to reach 'completed'. Returns:

  • true when completed
  • :error when AssemblyAI reports an error (and records it)
  • false when we time out waiting


1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
# File 'app/services/video_processing/transcription_service.rb', line 1162

# Wait for the AssemblyAI transcript identified by +transcript_id+ to
# reach a terminal state.
#
# Strategy: one cheap status check first (so an already-finished
# transcript returns immediately), then blocking polling with a timeout
# scaled to the video's duration.
#
# @param transcript_id [String] AssemblyAI transcript identifier
# @return [true] when the transcript completed
# @return [:error] when AssemblyAI reported a failure (recorded via
#   #record_transcription_error)
# @return [false] when polling timed out or was interrupted
def ensure_transcription_completed!(transcript_id)
  assemblyai_client = AssemblyaiClient.instance

  # Quick check first — a transient API failure here is non-fatal; we
  # simply fall through to polling.
  begin
    status_result = assemblyai_client.get_transcription(transcript_id)
  rescue StandardError => e
    Rails.logger.warn "Initial status check failed: #{e.message}. Proceeding to poll."
    status_result = nil
  end

  if status_result
    status = status_result['status']
    case status
    when 'completed'
      Rails.logger.info 'AssemblyAI transcript already completed'
      return true
    when 'error'
      message = status_result['error'] || 'Unknown error'
      record_transcription_error(
        source: 'AssemblyAI',
        endpoint: 'status',
        http_status: 200,
        message: message,
        transcript_id: transcript_id
      )
      return :error
    end
    # Any other status (e.g. still processing) falls through to polling.
  end

  # Poll until completion with a generous timeout based on video duration
  base_timeout = 1200 # 20 minutes
  video_duration_minutes = (@video.duration_in_seconds&.to_f&./ 60.0) || 0
  additional_timeout = (video_duration_minutes * 120).to_i # +2 minutes per minute of video
  max_wait_time = base_timeout + additional_timeout

  begin
    assemblyai_client.poll_transcription(transcript_id, max_wait_time)
    true
  rescue StandardError => e
    # If AssemblyAI reports an explicit failure, record it; otherwise just time out.
    # NOTE(review): failure vs. timeout is distinguished by substring matching
    # on the exception message — confirm whether the client raises distinct
    # error classes that could be matched instead.
    if e.message.include?('failed') || e.message.downcase.include?('error')
      record_transcription_error(
        source: 'AssemblyAI',
        endpoint: 'status',
        http_status: nil,
        message: e.message,
        transcript_id: transcript_id
      )
      :error
    else
      Rails.logger.warn "Polling timed out or was interrupted: #{e.message}"
      false
    end
  end
end

#export_and_store_paragraphsObject

Export and store paragraphs



1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
# File 'app/services/video_processing/transcription_service.rb', line 1276

# Export the transcript's paragraphs from AssemblyAI and persist them
# into the video's +transcript+ attribute as plain text.
#
# Retries the export up to 3 times, sleeping 5 seconds after each
# failed attempt.
#
# @return [String, nil] the stored paragraph text, or nil when the
#   video has no AssemblyAI transcript id or the export ultimately fails
def export_and_store_paragraphs
  return nil unless @video.has_assemblyai_transcript_id?

  Rails.logger.info "Exporting paragraphs for video: #{@video.title}"

  retries = 3
  delay = 5 # seconds
  begin
    assemblyai_client = AssemblyaiClient.instance
    paragraphs_data = nil
    retries.times do |i|
      # Get paragraphs (speaker info is included by default when speaker_labels is enabled)
      paragraphs_data = assemblyai_client.export_paragraphs(@video.assemblyai_transcript_id)
      break if paragraphs_data && paragraphs_data['paragraphs'].present?
    rescue StandardError => e
      # Block-level rescue: a failed attempt logs, sleeps, and lets the
      # loop proceed to the next attempt.
      Rails.logger.warn "Attempt #{i + 1} to export paragraphs failed: #{e.message}. Retrying in #{delay} seconds..."
      sleep(delay)
    end

    if paragraphs_data && paragraphs_data['paragraphs'].present?
      # Format paragraphs as clean, readable text
      paragraphs_text = format_paragraphs_as_text(paragraphs_data['paragraphs'])

      Rails.logger.info "Successfully exported paragraphs (#{paragraphs_text.length} characters)"
      @video.update!(transcript: paragraphs_text)
      paragraphs_text
    else
      Rails.logger.error "Failed to export paragraphs - no data after #{retries} attempts."
      nil
    end
  rescue StandardError => e
    # Catches failures outside the retry loop (e.g. the update! call).
    Rails.logger.error "Failed to export paragraphs: #{e.message}"
    nil
  end
end

#export_and_store_sentencesObject

Export and store sentences



1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
# File 'app/services/video_processing/transcription_service.rb', line 1313

# Export the transcript's sentences from AssemblyAI and persist them
# into the video's +transcript+ attribute as timestamped text.
#
# Retries the export up to 3 times, sleeping 5 seconds after each
# failed attempt. Mirrors #export_and_store_paragraphs.
#
# @return [String, nil] the stored sentence text, or nil when the
#   video has no AssemblyAI transcript id or the export ultimately fails
def export_and_store_sentences
  return nil unless @video.has_assemblyai_transcript_id?

  Rails.logger.info "Exporting sentences for video: #{@video.title}"

  retries = 3
  delay = 5 # seconds
  begin
    assemblyai_client = AssemblyaiClient.instance
    sentences_data = nil
    retries.times do |i|
      # Get sentences (speaker info is included by default when speaker_labels is enabled)
      sentences_data = assemblyai_client.export_sentences(@video.assemblyai_transcript_id)
      break if sentences_data && sentences_data['sentences'].present?
    rescue StandardError => e
      # Block-level rescue: a failed attempt logs, sleeps, and lets the
      # loop proceed to the next attempt.
      Rails.logger.warn "Attempt #{i + 1} to export sentences failed: #{e.message}. Retrying in #{delay} seconds..."
      sleep(delay)
    end

    if sentences_data && sentences_data['sentences'].present?
      # Format sentences as clean, readable text
      sentences_text = format_sentences_as_text(sentences_data['sentences'])

      Rails.logger.info "Successfully exported sentences (#{sentences_text.length} characters)"
      @video.update!(transcript: sentences_text)
      sentences_text
    else
      Rails.logger.error "Failed to export sentences - no data after #{retries} attempts."
      nil
    end
  rescue StandardError => e
    # Catches failures outside the retry loop (e.g. the update! call).
    Rails.logger.error "Failed to export sentences: #{e.message}"
    nil
  end
end

#export_sentences(transcript_id) ⇒ Object



346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
# File 'app/services/video_processing/transcription_service.rb', line 346

# Fetch sentence-level transcript data from AssemblyAI and render it as
# readable text via #format_sentences_as_text.
#
# @param transcript_id [String] AssemblyAI transcript identifier
# @return [String, nil] formatted sentence text, or nil when the API
#   yields no sentences or raises
def export_sentences(transcript_id)
  Rails.logger.info "Exporting sentences for transcript: #{transcript_id}"

  sentences_data = AssemblyaiClient.instance.export_sentences(transcript_id)

  unless sentences_data && sentences_data['sentences'].present?
    Rails.logger.warn 'Failed to export sentences - no data'
    return nil
  end

  formatted = format_sentences_as_text(sentences_data['sentences'])
  Rails.logger.info "Successfully exported sentences (#{formatted.length} characters)"
  formatted
rescue StandardError => e
  Rails.logger.error "Failed to export sentences: #{e.message}"
  nil
end

#export_vtt_captions(transcript_id) ⇒ Object



368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
# File 'app/services/video_processing/transcription_service.rb', line 368

# Export WebVTT captions for +transcript_id+ from AssemblyAI and attach
# them to the video as an Upload.
#
# The VTT content is written to a temp file only so Upload.uploadify can
# read it from a path; the temp file is now cleaned up in an ensure
# block, so it is removed even when the upload raises (previously the
# file leaked until GC finalization on failure).
#
# @param transcript_id [String] AssemblyAI transcript identifier
# @return [Upload, nil] the created upload, or nil on failure
def export_vtt_captions(transcript_id)
  Rails.logger.info "Exporting VTT captions for transcript: #{transcript_id}"

  begin
    assemblyai_client = AssemblyaiClient.instance
    # 32: second argument to export_vtt — presumably max chars per
    # caption line; confirm against AssemblyaiClient#export_vtt.
    vtt_content = assemblyai_client.export_vtt(transcript_id, 32)

    if vtt_content.present?
      # Create a temporary file with VTT content
      temp_file = Tempfile.new(['captions', '.vtt'], binmode: true)
      begin
        temp_file.write(vtt_content)
        temp_file.close

        # Store as upload
        upload = Upload.uploadify(
          temp_file.path,
          'captions',
          @video,
          "#{@video.title.parameterize}-captions.vtt"
        )
      ensure
        # Clean up temp file even if uploadify raises
        temp_file.close unless temp_file.closed?
        temp_file.unlink
      end

      Rails.logger.info "Successfully exported VTT captions (#{vtt_content.length} characters)"
      upload
    else
      Rails.logger.warn 'Failed to export VTT captions - empty content'
      nil
    end
  rescue StandardError => e
    Rails.logger.error "Failed to export VTT captions: #{e.message}"
    nil
  end
end

#format_caption_text(text, max_chars_per_line) ⇒ Object

Format caption text with line breaks



1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
# File 'app/services/video_processing/transcription_service.rb', line 1069

# Wrap +text+ onto multiple lines so that no line exceeds
# +max_chars_per_line+ characters, breaking only at whitespace.
#
# Fixes an off-by-one in the previous version: it tested
# (current_line + word).length but then joined with a space, so lines
# could come out one character over the limit.
#
# A single word longer than the limit is emitted on its own line
# (words are never split).
#
# @param text [String] caption text
# @param max_chars_per_line [Integer] maximum characters per line
# @return [String] text with "\n" inserted between wrapped lines
def format_caption_text(text, max_chars_per_line)
  return text if text.length <= max_chars_per_line

  lines = []
  current_line = ''

  text.split(/\s+/).each do |word|
    # Build the candidate including the joining space, so the length
    # test matches what actually gets emitted.
    candidate = current_line.empty? ? word : "#{current_line} #{word}"

    if candidate.length <= max_chars_per_line
      current_line = candidate
    else
      lines << current_line unless current_line.empty?
      current_line = word
    end
  end

  lines << current_line unless current_line.empty?
  lines.join("\n")
end

#format_paragraphs_as_text(paragraphs) ⇒ Object

Format paragraphs as clean text



1350
1351
1352
1353
1354
1355
# File 'app/services/video_processing/transcription_service.rb', line 1350

# Join paragraph texts into a double-newline-separated document,
# trimming surrounding whitespace. Paragraphs lacking a 'text' key
# contribute an empty string.
#
# @param paragraphs [Array<Hash>] paragraph records with a 'text' key
# @return [String]
def format_paragraphs_as_text(paragraphs)
  texts = paragraphs.map { |para| para['text'] || '' }
  texts.join("\n\n").strip
end

#format_sentences_as_text(sentences) ⇒ Object



1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
# File 'app/services/video_processing/transcription_service.rb', line 1103

# Render sentences as text blocks separated by blank lines, prefixing
# each with its "[MM:SS]" start timestamp when the sentence carries a
# 'start' value.
def format_sentences_as_text(sentences)
  rendered = sentences.map do |sentence|
    body = sentence['text'] || ''
    next body unless sentence['start']

    "[#{format_timestamp(sentence['start'])}] #{body}"
  end
  rendered.join("\n\n")
end

#format_timestamp(milliseconds) ⇒ Object



1148
1149
1150
1151
1152
1153
1154
1155
1156
# File 'app/services/video_processing/transcription_service.rb', line 1148

# Render a millisecond offset as zero-padded "MM:SS". nil maps to
# "00:00". Minutes are not wrapped at 60 (there is no hour component),
# so e.g. 3,700,000 ms renders as "61:40".
def format_timestamp(milliseconds)
  return '00:00' if milliseconds.nil?

  elapsed = milliseconds / 1000
  format('%02d:%02d', (elapsed / 60).to_i, (elapsed % 60).to_i)
end

#format_transcript_as_html(structured_data) ⇒ Object

Format transcript as HTML using paragraphs for more natural segmentation



275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
# File 'app/services/video_processing/transcription_service.rb', line 275

# Render the structured transcript as <p> tags separated by blank
# lines, each carrying data-start / data-end / data-confidence
# attributes. The 'utterances' key holds paragraph records (mapped from
# the paragraphs API for compatibility — no speaker info).
def format_transcript_as_html(structured_data)
  Rails.logger.info "Formatting transcript as HTML for video #{@video.id}..."

  records = structured_data['utterances'] || []

  records.map do |record|
    attributes = [
      "data-start=\"#{format_timestamp(record['start'])}\"",
      "data-end=\"#{format_timestamp(record['end'])}\"",
      "data-confidence=\"#{record['confidence']}\""
    ].join(' ')

    "<p #{attributes}>#{record['text']}</p>"
  end.join("\n\n")
end

#format_transcript_data(transcription_data) ⇒ Object



304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
# File 'app/services/video_processing/transcription_service.rb', line 304

# Normalize raw AssemblyAI transcription data into the result hash the
# caller persists on the video.
#
# Side effects: assigns @video.structured_transcript_json, and updates
# @video.transcript — either via #export_and_store_paragraphs or, when
# that fails, with a tag-stripped plain-text fallback.
#
# @param transcription_data [Hash] raw transcription payload
# @return [Hash] keys :html, :transcript, :assemblyai_transcript_id,
#   :seo_content (always nil here — generated by the worker), and
#   :duration_in_seconds
def format_transcript_data(transcription_data)
  Rails.logger.debug('Formatting transcript data')

  # Extract the full text from the transcription data
  full_text = transcription_data['text'] || ''

  Rails.logger.info "Extracted text length: #{full_text.length} characters"

  # NOTE: SEO content generation is now handled by the worker, not this service
  seo_content = nil

  Rails.logger.info 'SEO content generation skipped (handled by worker)'

  # Store the raw structured transcript JSON
  @video.structured_transcript_json = transcription_data

  # Export paragraphs and store in transcript field
  paragraphs_result = export_and_store_paragraphs
  if paragraphs_result
    Rails.logger.info 'Successfully exported and stored paragraphs'
  else
    Rails.logger.warn 'Failed to export paragraphs, using plain text transcript'
    # Fallback to plain text transcript if paragraphs export fails
    # (strips any HTML-like tags from the raw text before storing)
    plain_text = transcription_data['text'].gsub(/<[^>]*>/, '').strip
    @video.update!(transcript: plain_text)
  end

  # VTT captions are now generated dynamically on-demand

  # Generate result with paragraphs
  result = {
    html: @video.transcript, # Use the updated transcript content
    transcript: @video.transcript, # Store paragraphs in transcript field (already updated by export_and_store_paragraphs)
    assemblyai_transcript_id: transcription_data['id'], # Store the transcript ID
    seo_content: seo_content,
    duration_in_seconds: transcription_data['audio_duration']&.to_i
  }

  Rails.logger.info "Final result: #{result.keys}"
  result
end

#format_transcript_to_html_with_speakers(transcription_data) ⇒ Object



1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
# File 'app/services/video_processing/transcription_service.rb', line 1121

# Build an HTML view of the transcript with one <div class="utterance">
# per speaker turn: timestamp range, speaker label, text, and an
# optional confidence span. Falls back to a single <p> of plain text
# when no utterances are present.
def format_transcript_to_html_with_speakers(transcription_data)
  utterances = transcription_data['utterances']
  return "<p>#{transcription_data['text']}</p>" unless utterances.present?

  utterances.each_with_object(+'') do |utterance, html|
    window = "[#{format_timestamp(utterance['start'])} - #{format_timestamp(utterance['end'])}]"
    label = utterance['speaker'] || 'Unknown'

    html << '<div class="utterance">'
    html << "<span class=\"timestamp\">#{window}</span> "
    html << "<span class=\"speaker\">#{label}:</span> "
    html << "<span class=\"text\">#{utterance['text']}</span>"
    html << "<span class=\"confidence\"> (#{utterance['confidence']}%)</span>" if utterance['confidence']
    html << "</div>\n"
  end
end

#format_vtt_timestamp(milliseconds) ⇒ Object

Format timestamp for VTT (HH:MM:SS.mmm)



1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
# File 'app/services/video_processing/transcription_service.rb', line 1091

# Render a millisecond offset as a WebVTT timestamp, "HH:MM:SS.mmm".
# nil maps to the zero timestamp.
def format_vtt_timestamp(milliseconds)
  return '00:00:00.000' if milliseconds.nil?

  seconds_total, frac = milliseconds.to_i.divmod(1000)
  minutes_total, ss = seconds_total.divmod(60)
  hh, mm = minutes_total.divmod(60)

  format('%02d:%02d:%02d.%03d', hh, mm, ss, frac)
end

#generate_html_transcript_from_paragraphs(paragraphs) ⇒ Object

Generate HTML transcript from paragraphs



1060
1061
1062
1063
1064
1065
1066
# File 'app/services/video_processing/transcription_service.rb', line 1060

# Wrap each paragraph's text in a <p> tag, one tag per line. Returns an
# empty string when there are no paragraphs.
def generate_html_transcript_from_paragraphs(paragraphs)
  return '' if paragraphs.blank?

  tags = paragraphs.map { |para| "<p>#{para['text']}</p>" }
  tags.join("\n")
end

#generate_paragraphs_by_chunking(vtt_polished, captions_per_paragraph: 15) ⇒ Object

Fallback: Generate paragraphs by chunking captions into groups
Used when native AssemblyAI paragraphs are not available



1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
# File 'app/services/video_processing/transcription_service.rb', line 1041

# Fallback paragraph builder: slice the polished captions into
# fixed-size groups and merge each group into one paragraph, carrying
# the first caption's start time and the last caption's end time.
# Used when AssemblyAI's native paragraphs are unavailable.
def generate_paragraphs_by_chunking(vtt_polished, captions_per_paragraph: 15)
  Rails.logger.info "Generating paragraphs by chunking (#{captions_per_paragraph} captions per paragraph)"

  return [] if vtt_polished.blank?

  chunks = vtt_polished.each_slice(captions_per_paragraph).map do |group|
    head = group.first
    tail = group.last
    {
      'text' => group.map { |caption| caption['text'] }.join(' ').strip,
      'start' => head['start_time'] || head['start'],
      'end' => tail['end_time'] || tail['end']
    }
  end

  kept = chunks.reject { |chunk| chunk['text'].blank? }
  Rails.logger.info "Generated #{kept.length} paragraphs by chunking"
  kept
end

#generate_paragraphs_from_polished_text(vtt_polished) ⇒ Object

Generate paragraphs from polished VTT text using AssemblyAI's native paragraph API
This uses AssemblyAI's built-in paragraph detection (based on pauses, topic changes, etc.)
and maps the polished captions to those natural paragraph breaks.



976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
# File 'app/services/video_processing/transcription_service.rb', line 976

# Rebuild paragraph structure for the polished captions using
# AssemblyAI's native paragraph boundaries.
#
# Each polished caption is assigned to exactly one native paragraph by
# its start time (or the nearest paragraph when none contains it), and
# the captions' polished text replaces the native paragraph text. Falls
# back to #generate_paragraphs_by_chunking when the native API yields
# nothing or raises.
#
# @param vtt_polished [Array<Hash>] polished captions with timing keys
#   ('start_time'/'start') and 'text'
# @return [Array<Hash>] paragraphs of the form { 'text' => ... }
def generate_paragraphs_from_polished_text(vtt_polished)
  Rails.logger.info 'Generating paragraphs using AssemblyAI native paragraph API'

  begin
    # Step 1: Get native paragraphs from AssemblyAI (they already analyzed the audio for natural breaks)
    assemblyai = AssemblyaiClient.instance
    paragraphs_data = assemblyai.export_paragraphs(@video.assemblyai_transcript_id)

    unless paragraphs_data && paragraphs_data['paragraphs'].present?
      Rails.logger.warn 'No native paragraphs available from AssemblyAI, falling back to simple chunking'
      return generate_paragraphs_by_chunking(vtt_polished)
    end

    native_paragraphs = paragraphs_data['paragraphs']
    Rails.logger.info "Retrieved #{native_paragraphs.length} native paragraphs from AssemblyAI"

    # Step 2: Assign each polished caption to exactly ONE paragraph based on start time
    # This prevents captions from appearing in multiple paragraphs
    caption_to_paragraph = {}
    vtt_polished.each_with_index do |caption, caption_idx|
      caption_start = caption['start_time'] || caption['start']
      next unless caption_start

      # Find the paragraph where this caption's start time falls
      para_idx = native_paragraphs.find_index do |para|
        caption_start >= para['start'] && caption_start < para['end']
      end

      # If no exact match, find the closest paragraph (smallest distance
      # from the caption start to either paragraph boundary)
      if para_idx.nil?
        para_idx = native_paragraphs.each_with_index.min_by do |para, _idx|
          [(para['start'] - caption_start).abs, (para['end'] - caption_start).abs].min
        end&.last
      end

      caption_to_paragraph[caption_idx] = para_idx if para_idx
    end

    # Step 3: Group captions by paragraph and build paragraph text
    polished_paragraphs = native_paragraphs.each_with_index.map do |native_para, para_idx|
      # Get all captions assigned to this paragraph, maintaining order
      assigned_caption_indices = caption_to_paragraph.select { |_cap_idx, p_idx| p_idx == para_idx }.keys.sort
      matching_captions = assigned_caption_indices.map { |idx| vtt_polished[idx] }

      # Combine the polished caption texts into the paragraph
      polished_text = matching_captions.map { |c| c['text'] }.join(' ').strip

      # If no polished text found, use the original native paragraph text
      polished_text = native_para['text'] if polished_text.blank?

      { 'text' => polished_text }
    end.reject { |p| p['text'].blank? }

    Rails.logger.info "Generated #{polished_paragraphs.length} paragraphs using native AssemblyAI structure with polished text"
    polished_paragraphs
  rescue StandardError => e
    Rails.logger.error "Error generating paragraphs with AssemblyAI native API: #{e.message}"
    Rails.logger.error e.backtrace.join("\n")
    # Fall back to simple chunking if native API fails
    generate_paragraphs_by_chunking(vtt_polished)
  end
end

#generate_vtt_content_from_polished_vttObject

Generate VTT content from polished VTT data



419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
# File 'app/services/video_processing/transcription_service.rb', line 419

# Serialize the stored polished captions into a WEBVTT document.
# Captions missing timing or text are skipped; cue numbers count only
# the captions actually emitted. Returns nil when no polished VTT data
# is stored on the video.
def generate_vtt_content_from_polished_vtt
  Rails.logger.info 'Generating VTT content from polished VTT data'

  captions = @video.structured_transcript_json['vtt_polished']
  return nil if captions.blank?

  # VTT header followed by two blank lines (matches prior output shape)
  cues = ['WEBVTT', '', '']
  cue_number = 0

  captions.each do |caption|
    from = caption['start_time']
    to = caption['end_time']
    body = caption['text']
    next unless from && to && body.present?

    cue_number += 1
    cues << cue_number.to_s
    cues << "#{format_vtt_timestamp(from)} --> #{format_vtt_timestamp(to)}"
    cues << body
    cues << ''
  end

  cues.join("\n")
end

#generate_vtt_content_from_structured_transcriptObject

Generate VTT content from structured transcript



405
406
407
408
409
410
411
412
413
414
415
416
# File 'app/services/video_processing/transcription_service.rb', line 405

# Produce VTT caption content for the video, currently only from the
# polished VTT data inside the structured transcript JSON. Returns nil
# when no structured transcript, or no polished data, exists.
def generate_vtt_content_from_structured_transcript
  Rails.logger.info "Generating VTT content from structured transcript for video #{@video.id}"

  structured = @video.structured_transcript_json
  return nil unless structured.present?
  return nil unless structured['vtt_polished'].present?

  generate_vtt_content_from_polished_vtt
end

#get_and_polish_native_paragraphsObject

Get native paragraphs from AssemblyAI and polish them with LLM Gateway
This preserves natural paragraph structure while fixing terminology



637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
# File 'app/services/video_processing/transcription_service.rb', line 637

# Fetch AssemblyAI's native paragraphs for the video's transcript and
# polish their text via the LLM Gateway (terminology fixes only),
# preserving the natural paragraph boundaries. Falls back to chunking
# the original VTT captions when no native paragraphs are available.
#
# @return [Array<Hash>] polished paragraphs (see
#   #polish_paragraphs_with_llm), or chunked paragraphs on fallback
def get_and_polish_native_paragraphs
  Rails.logger.info 'Getting native paragraphs from AssemblyAI and polishing with LLM Gateway'

  assemblyai = AssemblyaiClient.instance
  paragraphs_data = assemblyai.export_paragraphs(@video.assemblyai_transcript_id)

  unless paragraphs_data && paragraphs_data['paragraphs'].present?
    Rails.logger.warn 'No native paragraphs available, falling back to chunking'
    return generate_paragraphs_by_chunking(@video.vtt_original_data)
  end

  native_paragraphs = paragraphs_data['paragraphs']
  Rails.logger.info "Retrieved #{native_paragraphs.length} native paragraphs from AssemblyAI"

  # Polish the paragraphs in batches using AssemblyAI LLM Gateway
  polished_paragraphs = polish_paragraphs_with_llm(native_paragraphs)

  Rails.logger.info "Polished #{polished_paragraphs.length} paragraphs"
  polished_paragraphs
end

#get_existing_transcript_for_seoObject

Method specifically for SEO operations that don't require audio extraction



52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
# File 'app/services/video_processing/transcription_service.rb', line 52

# Return transcript data for SEO operations without triggering audio
# extraction, trying progressively more expensive sources:
# stored structured JSON, then stored transcript text, then a remote
# retrieval from AssemblyAI (which also persists the result via
# @video.update_transcript_data).
#
# @return [Hash, nil] hash with :html, :transcript,
#   :duration_in_seconds (and :structured_data when available), or nil
#   when no transcript data can be obtained
def get_existing_transcript_for_seo
  Rails.logger.info 'Getting existing transcript data for SEO operations...'

  # If we have structured transcript JSON, use that
  if @video.structured_transcript_json.present?
    Rails.logger.info 'Using existing structured transcript JSON'
    return {
      html: @video.transcript,
      transcript: @video.transcript,
      structured_data: @video.structured_transcript_json,
      duration_in_seconds: @video.duration_in_seconds
    }
  end

  # If we have transcript text, use that
  if @video.transcript.present?
    Rails.logger.info 'Using existing transcript text'
    return {
      html: @video.transcript,
      transcript: @video.transcript,
      duration_in_seconds: @video.duration_in_seconds
    }
  end

  # If we have AssemblyAI transcript ID but no data, try to retrieve it
  if @video.can_retrieve_existing_transcript?
    Rails.logger.info 'Attempting to retrieve existing transcript from AssemblyAI for SEO'
    existing_result = retrieve_existing_transcript_from_assemblyai

    if existing_result
      Rails.logger.info 'Successfully retrieved existing transcript for SEO'
      result = format_transcript_data(existing_result)
      @video.update_transcript_data(result)
      return result
    else
      Rails.logger.warn 'Failed to retrieve existing transcript for SEO'
      return nil
    end
  end

  Rails.logger.warn 'No transcript data available for SEO operations'
  nil
end

#get_sentences_from_assemblyai(transcript_id) ⇒ Object

Get sentences from AssemblyAI via the shared client (words nodes stripped to reduce size).



854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
# File 'app/services/video_processing/transcription_service.rb', line 854

# Get sentences for a transcript from AssemblyAI via the shared client,
# stripping the verbose 'words' nodes from each sentence to reduce the
# stored JSON size.
#
# @param transcript_id [String] AssemblyAI transcript identifier
# @return [Hash, nil] sentences payload, or nil on failure (the error
#   is logged and recorded via #record_transcription_error)
def get_sentences_from_assemblyai(transcript_id)
  Rails.logger.info "Getting sentences from AssemblyAI API for transcript: #{transcript_id}"

  begin
    sentences_data = AssemblyaiClient.instance.export_sentences(transcript_id)

    # Remove verbose 'words' nodes from each sentence to reduce JSON size.
    # NOTE(review): a nil client result raises NoMethodError here, which
    # the rescue below converts to a recorded error + nil return.
    if sentences_data['sentences'].present?
      sentences_data['sentences'].each { |s| s.delete('words') }
    end

    Rails.logger.info 'Successfully retrieved sentences from AssemblyAI API'
    Rails.logger.info "Sentences data keys: #{sentences_data.keys}"
    Rails.logger.info "Sentences count: #{sentences_data['sentences']&.length}"
    sentences_data
  rescue StandardError => e
    Rails.logger.error "Error getting sentences from AssemblyAI API: #{e.message}"
    record_transcription_error(
      source: 'AssemblyAI', endpoint: 'sentences', http_status: nil,
      message: e.message, transcript_id: transcript_id
    )
    nil
  end
end

#get_transcript_data(transcript_id) ⇒ Object

Get full transcript data using paragraphs API for more natural segmentation



246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
# File 'app/services/video_processing/transcription_service.rb', line 246

# Retrieve full transcript data for +transcript_id+ using the
# paragraphs API (more natural segmentation than utterances), shaped
# like a standard transcript payload for downstream compatibility.
#
# @param transcript_id [String] AssemblyAI transcript identifier
# @return [Hash] transcript-shaped hash whose 'utterances' key holds
#   the cleaned paragraphs ('words' arrays removed)
def get_transcript_data(transcript_id)
  Rails.logger.info "Retrieving full transcript data using paragraphs API: #{transcript_id}"

  assemblyai_client = AssemblyaiClient.instance
  paragraphs_data = assemblyai_client.export_paragraphs(transcript_id)

  Rails.logger.info "Retrieved paragraphs data with #{paragraphs_data['paragraphs']&.length || 0} paragraphs"
  Rails.logger.info "Words count: #{paragraphs_data['paragraphs']&.sum { |p| p['words']&.length || 0 } || 0}"

  # Structure the data consistently, using paragraphs instead of utterances
  # Exclude 'words' arrays to improve performance when displaying with pretty_json_tag
  cleaned_paragraphs = (paragraphs_data['paragraphs'] || []).map do |paragraph|
    # Remove the 'words' array from each paragraph to reduce JSON size
    paragraph.except('words')
  end

  {
    'id' => transcript_id,
    'status' => 'completed',
    'confidence' => paragraphs_data['confidence'],
    'audio_duration' => paragraphs_data['audio_duration'],
    'utterances' => cleaned_paragraphs, # Map cleaned paragraphs to utterances for compatibility
    'speaker_labels' => false, # Paragraphs API doesn't include speaker information
    'text' => cleaned_paragraphs.map { |p| p['text'] }.join(' ') || '', # NOTE: join never returns nil, so the || '' guard is inert
    'language_code' => 'en_us' # Default language code
  }
end

#get_vtt_from_assemblyai(transcript_id) ⇒ Object

Get VTT content from AssemblyAI via the shared client.



841
842
843
844
845
846
847
848
849
850
851
# File 'app/services/video_processing/transcription_service.rb', line 841

# Fetch the raw VTT export for a transcript from AssemblyAI. On
# failure the error is logged and recorded, and nil is returned.
def get_vtt_from_assemblyai(transcript_id)
  Rails.logger.info "Getting VTT content from AssemblyAI API for transcript: #{transcript_id}"
  AssemblyaiClient.instance.export_vtt(transcript_id)
rescue StandardError => e
  Rails.logger.error "Error getting VTT from AssemblyAI API: #{e.message}"
  record_transcription_error(
    source: 'AssemblyAI',
    endpoint: 'vtt',
    http_status: nil,
    message: e.message,
    transcript_id: transcript_id
  )
  nil
end

#parse_numbered_captions(polished_text, original_captions) ⇒ Object

Parse numbered captions from LLM response back into structured format



946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
# File 'app/services/video_processing/transcription_service.rb', line 946

# Map an LLM response containing "[n] text" entries back onto the
# original captions' timing. Captions whose number is missing from the
# response keep their original text.
#
# @param polished_text [String] LLM output with numbered captions
# @param original_captions [Array<Hash>] captions with 'start_time',
#   'end_time' and 'text'
# @return [Array<Hash>] captions with original timing and polished text
def parse_numbered_captions(polished_text, original_captions)
  # "[n] body" up to the next "[n]" marker or end of string
  pattern = /\[(\d+)\]\s*(.+?)(?=\[\d+\]|\z)/m

  polished_map = polished_text.scan(pattern).each_with_object({}) do |(num, body), map|
    map[num.to_i] = body.strip.gsub(/\n+/, ' ')
  end

  original_captions.map.with_index(1) do |caption, number|
    {
      'start_time' => caption['start_time'],
      'end_time' => caption['end_time'],
      'text' => polished_map[number] || caption['text']
    }
  end
end

#parse_vtt_file(vtt_content) ⇒ Object

Parse AssemblyAI VTT file and extract timing and text



448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
# File 'app/services/video_processing/transcription_service.rb', line 448

# Convert raw WEBVTT content into an Array of caption Hashes of the form
# {'start_time' => ms, 'end_time' => ms, 'text' => String}.
# Lines before the first cue timing line (and the WEBVTT header) are ignored;
# consecutive text lines of one cue are joined with single spaces.
def parse_vtt_file(vtt_content)
  Rails.logger.info 'Parsing VTT file content'

  # Cue timing lines look like "MM:SS.mmm --> MM:SS.mmm" or "HH:MM:SS.mmm --> HH:MM:SS.mmm".
  timing_line = /\d{2}:\d{2}(:\d{2})?\.\d{3}\s+-->\s+\d{2}:\d{2}(:\d{2})?\.\d{3}/
  captions = []
  pending = nil

  vtt_content.each_line do |raw|
    line = raw.strip
    next if line.empty? || line == 'WEBVTT'

    if line.match?(timing_line)
      # A new cue begins; flush the cue we were accumulating.
      captions << pending if pending

      cue_start, cue_end = line.split(' --> ')
      pending = {
        'start_time' => parse_vtt_timestamp(cue_start),
        'end_time' => parse_vtt_timestamp(cue_end),
        'text' => ''
      }
    elsif pending && line.present?
      # Continuation line: append to the current cue's text.
      pending['text'] += (pending['text'].empty? ? '' : ' ') + line
    end
  end

  # Flush the final cue.
  captions << pending if pending

  captions
end

#parse_vtt_timestamp(timestamp) ⇒ Object

Parse VTT timestamp to milliseconds



482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
# File 'app/services/video_processing/transcription_service.rb', line 482

# Convert a VTT timestamp ("HH:MM:SS.mmm" or "MM:SS.mmm") to integer milliseconds.
#
# timestamp - String timestamp from a VTT cue line; nil/blank returns 0.
# Returns the timestamp as milliseconds (Integer).
#
# The fractional part is normalized to exactly three digits: a short fraction
# like "01.5" is read as 500ms (the previous code read it as 5ms) and longer
# fractions are truncated to millisecond precision. A missing fraction is 0.
def parse_vtt_timestamp(timestamp)
  return 0 if timestamp.to_s.strip.empty?

  parts = timestamp.split(':')
  # Three segments mean an hours field is present; otherwise it's MM:SS.mmm.
  hours = parts.length == 3 ? parts.shift.to_i : 0
  minutes = parts[0].to_i
  seconds_str, fraction = parts[1].split('.')
  seconds = seconds_str.to_i
  # Pad/truncate the fraction so ".5" => 500 and ".123456" => 123.
  milliseconds = fraction.to_s.ljust(3, '0')[0, 3].to_i

  (((hours * 3600) + (minutes * 60) + seconds) * 1000) + milliseconds
end

#polish_paragraphs_with_llm(paragraphs) ⇒ Object

Polish paragraphs with AssemblyAI LLM Gateway (terminology fixes only)



659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
# File 'app/services/video_processing/transcription_service.rb', line 659

# Polish AssemblyAI paragraphs with the LLM gateway, in batches of 30.
#
# paragraphs - Array of Hashes with 'text', 'start', 'end' keys (AssemblyAI
#              paragraph format).
# Returns an Array of Hashes with the same keys: polished 'text' plus the
# original 'start'/'end' timing carried over unchanged.
#
# Each paragraph is tagged "[N] ..." so the LLM response can be matched back
# to its source; a paragraph missing from the response keeps its original
# text. If an entire batch fails, that batch falls back to regex-based
# terminology polishing so no paragraph is ever dropped.
def polish_paragraphs_with_llm(paragraphs)
  # Process in batches to stay within token limits
  batch_size = 30
  polished_results = []

  paragraphs.each_slice(batch_size).with_index do |batch, batch_idx|
    Rails.logger.info "Polishing paragraph batch #{batch_idx + 1}/#{(paragraphs.length.to_f / batch_size).ceil}"

    # Format paragraphs with numbers for tracking
    numbered_text = batch.map.with_index { |p, i| "[#{batch_idx * batch_size + i + 1}] #{p['text']}" }.join("\n\n")

    # Build terminology list from single source (company_terminology hash)
    terminology_lines = company_terminology.map { |from, to| "- \"#{from}\" → \"#{to}\"" }.join("\n")

    system_prompt = <<~PROMPT
      You are a transcript editor for WarmlyYours, a radiant heating company.
      Your task is to lightly polish spoken language for written readability while preserving the speaker's voice.

      DO:
      - Fix terminology and proper nouns (see list below)
      - Clean up false starts and filler phrases (e.g., "you know", "um", "like")
      - Fix awkward spoken constructions that don't read well (e.g., "we're going to be having you have myself here" → "I'll be joining you today")
      - Correct obvious grammatical errors from speech-to-text

      DO NOT:
      - Change the meaning or intent
      - Make it sound overly formal or scripted
      - Add information not present in the original
      - Change paragraph boundaries

      Terminology to correct:
      #{terminology_lines}

      Return the paragraphs with the same [number] format. Keep all paragraph breaks intact.
    PROMPT

    begin
      # Extra output budget is reserved for the model's "thinking" tokens on
      # top of the 8000-token response budget.
      thinking_budget = 4096
      response = RubyLLM::Instrumentation.with(feature: 'video_polish') do
        VideoProcessing::PolishAgent.chat
          .with_temperature(0.1)
          .with_params(generationConfig: {
            maxOutputTokens: 8000 + thinking_budget,
            thinkingConfig: { thinkingBudget: thinking_budget }
          })
          .with_instructions(system_prompt)
          .ask(numbered_text)
      end

      polished_text = response.content || ''

      # Parse numbered paragraphs back
      batch.each_with_index do |original_para, i|
        para_num = batch_idx * batch_size + i + 1
        # Extract polished text for this paragraph number
        match = polished_text.match(/\[#{para_num}\]\s*(.+?)(?=\[\d+\]|\z)/m)
        # Fall back to the unpolished text when the LLM skipped a number.
        polished = match ? match[1].strip : original_para['text']
        # Preserve AssemblyAI timing — required for YouTube chapters, SEO timing, etc.
        polished_results << {
          'text' => polished,
          'start' => original_para['start'],
          'end' => original_para['end']
        }
      end
    rescue StandardError => e
      Rails.logger.error "Error polishing batch #{batch_idx + 1}: #{e.message}"
      # Fallback: use original paragraphs with regex polish
      batch.each do |para|
        polished_text = apply_terminology_regex(para['text'])
        polished_results << {
          'text' => polished_text,
          'start' => para['start'],
          'end' => para['end']
        }
      end
    end
  end

  polished_results
end

#polish_transcript_with_company_terminologyObject

Step 2: Polish transcript with company terminology and formatting
Uses AssemblyAI's native paragraphs and polishes them directly (not individual captions)
This preserves the natural paragraph structure while fixing terminology.



603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
# File 'app/services/video_processing/transcription_service.rb', line 603

# Step 2 of the workflow: polish the paragraph transcript with the LLM and
# the VTT captions with the regex pass, regenerate the HTML transcript, and
# persist both onto the video's structured transcript JSON.
# Returns nil (early) when the video has no AssemblyAI transcript ID or no
# original VTT data to work from.
def polish_transcript_with_company_terminology
  Rails.logger.info 'Polishing transcript with company terminology and formatting'

  return unless @video.has_assemblyai_transcript_id?
  return unless @video.vtt_original_data.present?

  # Step 1: AssemblyAI's native paragraphs are already well-formed — polish them directly.
  paragraphs = get_and_polish_native_paragraphs

  # Step 2: subtitles only need the lightweight regex terminology pass.
  vtt_polished = polish_vtt_text_regex(@video.vtt_original_data)

  # Render the display transcript from the polished paragraphs.
  html_transcript = generate_html_transcript_from_paragraphs(paragraphs)

  # Merge polished data into the stored structured transcript JSON.
  current_json = (@video.structured_transcript_json || {}).merge(
    'vtt_polished' => vtt_polished,
    'paragraphs' => paragraphs
  )

  @video.update!(
    structured_transcript_json: current_json,
    transcript: html_transcript
  )

  Rails.logger.info "Polished transcript: #{vtt_polished.length} captions, #{paragraphs.length} paragraphs, HTML transcript saved"
end

#polish_vtt_text(vtt_original) ⇒ Object

Polish VTT captions using the LLM.
Provides context-aware polishing that fixes terminology, grammar, and flow.



881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
# File 'app/services/video_processing/transcription_service.rb', line 881

# Polish VTT captions using the LLM, with Settings-driven prompts and limits.
#
# vtt_original - Array of caption Hashes ('start_time'/'end_time'/'text').
# Returns an Array of polished caption Hashes; on any LLM failure it falls
# back to regex-based polishing so captions are never left unprocessed.
#
# Changes from the original: removed the unused `model` local (it was read
# from LlmDefaults but never passed anywhere) and replaced the redundant
# whole-body begin/end with an idiomatic method-level rescue.
def polish_vtt_text(vtt_original)
  Rails.logger.info 'Polishing VTT text using LLM'

  # Format captions with numbers so the response can be matched back per caption.
  captions_text = vtt_original.map.with_index { |c, i| "[#{i + 1}] #{c['text']}" }.join("\n")

  # Get prompts and limits from Settings
  system_prompt = Setting.video_processing_polish_system_prompt
  user_prompt_template = Setting.video_processing_polish_user_prompt
  max_tokens = Setting.video_processing_llm_max_tokens || 8000
  temperature = Setting.video_processing_llm_temperature || 0.2

  # Substitute placeholder
  user_prompt = user_prompt_template.gsub('{{captions}}', captions_text)

  # Reserve extra output budget for the model's "thinking" tokens.
  thinking_budget = 4096
  response = RubyLLM::Instrumentation.with(feature: 'video_polish') do
    VideoProcessing::PolishAgent.chat
      .with_temperature(temperature)
      .with_params(generationConfig: {
        maxOutputTokens: max_tokens + thinking_budget,
        thinkingConfig: { thinkingBudget: thinking_budget }
      })
      .with_instructions(system_prompt)
      .ask(user_prompt)
  end

  polished_text = response.content || ''

  # Parse the polished captions back into structured format
  polished_captions = parse_numbered_captions(polished_text, vtt_original)

  Rails.logger.info "Polished #{polished_captions.length} captions using LLM"
  polished_captions
rescue StandardError => e
  Rails.logger.error "Error polishing with LLM: #{e.message}, falling back to regex"
  Rails.logger.error e.backtrace.first(5).join("\n")
  # Fallback to regex-based polishing
  polish_vtt_text_regex(vtt_original)
end

#polish_vtt_text_regex(vtt_original) ⇒ Object

Fallback regex-based polishing (legacy)



926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
# File 'app/services/video_processing/transcription_service.rb', line 926

# Regex/terminology-only polishing of VTT captions (legacy path, also used as
# the LLM fallback). Timing is preserved; only caption text is rewritten.
#
# vtt_original - Array of caption Hashes ('start_time'/'end_time'/'text').
# Returns an Array of caption Hashes with polished text.
def polish_vtt_text_regex(vtt_original)
  Rails.logger.info 'Polishing VTT text using regexp-based corrections (fallback)'

  polisher = TranscriptionPolisherService.new(company_terminology)
  polished_texts = polisher.polish_utterances(vtt_original.map { |caption| caption['text'] })

  vtt_original.map.with_index do |caption, idx|
    {
      'start_time' => caption['start_time'],
      'end_time' => caption['end_time'],
      # Keep the original text when the polisher yields nothing for this slot.
      'text' => polished_texts[idx] || caption['text']
    }
  end
end

#poll_transcription(transcript_id, progress_callback = nil) ⇒ Object

Poll for transcription completion with progress callback



226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
# File 'app/services/video_processing/transcription_service.rb', line 226

# Block until the AssemblyAI transcript finishes, with a timeout scaled to the
# video's length: 20 minutes base plus 2 minutes per minute of video.
#
# transcript_id     - AssemblyAI transcript ID to poll.
# progress_callback - optional callable forwarded to the client for updates.
# Returns the completed transcript payload from the client.
def poll_transcription(transcript_id, progress_callback = nil)
  Rails.logger.info "Polling for transcription completion: #{transcript_id}"

  duration_seconds = @video.duration_in_seconds
  video_duration_minutes = duration_seconds ? duration_seconds.to_f / 60.0 : 0
  base_timeout = 1200 # 20 minutes
  additional_timeout = video_duration_minutes * 120 # 2 minutes per minute of video
  max_wait_time = (base_timeout + additional_timeout).to_i

  Rails.logger.info "Using timeout of #{max_wait_time} seconds for video duration of #{video_duration_minutes.round(1)} minutes"

  result = AssemblyaiClient.instance.poll_transcription(transcript_id, max_wait_time, progress_callback)

  Rails.logger.info 'AssemblyAI transcription completed successfully'
  result
end

#process_vtt_from_assemblyai_and_update_structured_transcriptObject

Legacy method - now calls the three-step workflow



823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
# File 'app/services/video_processing/transcription_service.rb', line 823

# Legacy entry point: runs the full three-step workflow — retrieve the
# transcript from AssemblyAI, polish it with company terminology, then
# generate SEO metadata. Returns nil early when the video has no AssemblyAI
# transcript ID.
def process_vtt_from_assemblyai_and_update_structured_transcript
  Rails.logger.info 'Processing VTT from AssemblyAI API and updating structured transcript JSON'

  return unless @video.has_assemblyai_transcript_id?

  # Step 1: Retrieve VTT
  retrieve_and_overwrite_structured_transcript

  # Step 2: Polish transcript
  polish_transcript_with_company_terminology

  # Step 3: Generate metadata (this call was absent, so the step never ran)
  summarize_video_and_update_metadata

  Rails.logger.info 'Completed full VTT processing workflow'
end

#record_transcription_error(source:, endpoint:, http_status:, message:, transcript_id: nil) ⇒ Object

Persist details about the most recent transcription-related error so the UI can surface it



1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
# File 'app/services/video_processing/transcription_service.rb', line 1220

# Persist details about the most recent transcription-related error into
# structured_transcript_json['errors'] so the UI can surface it.
# Never raises — a persistence failure is only logged as a warning.
def record_transcription_error(source:, endpoint:, http_status:, message:, transcript_id: nil)
  json = @video.structured_transcript_json || {}
  json['errors'] ||= {}

  # Well-known endpoints get stable keys; anything else derives one.
  known_keys = { 'sentences' => 'assemblyai_sentences', 'vtt' => 'assemblyai_vtt' }
  key = known_keys.fetch(endpoint) { "#{source.to_s.downcase}_#{endpoint}" }

  # .compact drops nil entries (e.g. http_status/transcript_id when absent).
  json['errors'][key] = {
    'source' => source,
    'endpoint' => endpoint,
    'http_status' => http_status,
    'message' => message,
    'transcript_id' => transcript_id,
    'at' => Time.current.iso8601
  }.compact

  @video.update!(structured_transcript_json: json)
rescue StandardError => e
  Rails.logger.warn "Failed to record transcription error on video #{@video.id}: #{e.message}"
end

#retrieve_and_overwrite_structured_transcriptObject

Downloads the completed transcript with full speaker identification, timestamps, and confidence scores.
Formats as HTML and exports VTT captions for video players.



527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
# File 'app/services/video_processing/transcription_service.rb', line 527

# Step 1 of the workflow: pull the completed transcript (sentences + VTT)
# from AssemblyAI and overwrite the video's structured transcript JSON.
#
# Returns true on success, false when the transcript is unusable (errored,
# timed out, or contains no speech), and nil when prerequisites are missing
# or a fetch fails.
# Side effects: may mark the video as having no spoken words; clears any
# previously stored polished/derived transcript data and recorded errors.
def retrieve_and_overwrite_structured_transcript
  Rails.logger.info 'Retrieving and overwriting structured transcript from AssemblyAI'

  return unless @video.has_assemblyai_transcript_id?

  # Ensure the AssemblyAI transcript has completed before attempting exports
  completion = ensure_transcription_completed!(@video.assemblyai_transcript_id)
  case completion
  when :error
    Rails.logger.warn 'Aborting retrieval: AssemblyAI transcript status is error'
    return false
  when false
    Rails.logger.warn 'Aborting retrieval: Timed out waiting for AssemblyAI transcript to complete'
    return false
  end

  # First, get the sentences to check status and get comprehensive transcript data
  sentences_data = get_sentences_from_assemblyai(@video.assemblyai_transcript_id)
  return unless sentences_data

  # Check if we have sentences data (if we do, transcription is completed)
  if sentences_data['sentences'].nil?
    Rails.logger.warn 'No sentences data found. Transcription may not be completed yet.'
    return false
  end

  # Check if we have 0 sentences (meaning no speech was detected)
  if sentences_data['sentences'].empty?
    Rails.logger.info 'Auto-detecting video as having no spoken words (0 sentences returned from AssemblyAI)'
    @video.mark_as_no_spoken_words!
    return false
  end

  Rails.logger.info 'Transcription completed - proceeding with VTT retrieval'

  # Get VTT content directly from AssemblyAI API
  vtt_content = get_vtt_from_assemblyai(@video.assemblyai_transcript_id)
  return unless vtt_content

  # Parse the VTT content
  vtt_original = parse_vtt_file(vtt_content)

  # Check if the transcript is essentially empty and auto-mark as no spoken words
  if vtt_original.empty? || vtt_original.all? { |caption| caption['text'].strip.blank? }
    Rails.logger.info 'Auto-detecting video as having no spoken words (empty VTT content)'
    @video.mark_as_no_spoken_words!
    return false
  end

  # Get current structured transcript JSON
  current_json = @video.structured_transcript_json || {}

  # Update with original VTT data and sentences data
  current_json['vtt_original'] = vtt_original
  current_json['sentences'] = sentences_data

  # Remove any existing polished data since we're starting fresh
  current_json.delete('vtt_polished')
  current_json.delete('paragraphs')
  current_json.delete('utterances')
  current_json.delete('original_transcript')

  # Clear any old errors since we successfully retrieved the transcript
  current_json.delete('errors')

  # Save the updated structured transcript JSON
  @video.update!(structured_transcript_json: current_json)

  Rails.logger.info "Retrieved and stored #{vtt_original.length} VTT captions from AssemblyAI"
  Rails.logger.info "Retrieved sentences: #{sentences_data['sentences']&.length || 0} sentences"
  true
end

#retrieve_existing_transcript_from_assemblyaiObject

Retrieve existing transcript from AssemblyAI



1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
# File 'app/services/video_processing/transcription_service.rb', line 1253

# Fetch an already-submitted transcript from AssemblyAI, if the video has one
# that can be reused. Returns the transcript payload when its status is
# 'completed'; returns nil otherwise (including on any API error).
def retrieve_existing_transcript_from_assemblyai
  return nil unless @video.can_retrieve_existing_transcript?

  transcript_id = @video.assemblyai_transcript_id
  Rails.logger.info "Retrieving existing transcript from AssemblyAI: #{transcript_id}"

  begin
    response = AssemblyaiClient.instance.get_transcription(transcript_id)

    unless response['status'] == 'completed'
      Rails.logger.warn "Existing transcript not completed: #{response['status']}"
      return nil
    end

    Rails.logger.info 'Successfully retrieved existing transcript'
    response
  rescue StandardError => e
    Rails.logger.error "Failed to retrieve existing transcript: #{e.message}"
    nil
  end
end

#safe_parse_error_message(body) ⇒ Object

Extract a readable error message from an API response body



1245
1246
1247
1248
1249
1250
# File 'app/services/video_processing/transcription_service.rb', line 1245

# Extract a readable error message from an API response body.
# Prefers JSON 'error' then 'message' fields; falls back to the raw body
# string when the body is not parseable JSON (or not a JSON object).
def safe_parse_error_message(body)
  parsed = JSON.parse(body)
  parsed['error'] || parsed['message'] || body.to_s
rescue StandardError
  body.to_s
end

#submit_transcription(use_webhook: false) ⇒ Hash

Submit transcription and return transcript ID (for granular control)

Parameters:

  • use_webhook (Boolean) (defaults to: false)

    If true, uses webhook callback instead of polling (default: false for backwards compat)

Returns:

  • (Hash)

    Result with :transcript_id and :mode



151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
# File 'app/services/video_processing/transcription_service.rb', line 151

# Submit the video's audio to AssemblyAI for transcription.
#
# use_webhook - when true, registers a webhook callback URL with AssemblyAI
#   and creates a pending WebhookLog (used to detect submissions that never
#   receive a callback), then returns a Hash with :transcript_id, :mode,
#   :status. When false (default, backwards compatible) returns just the
#   transcript ID for the caller to poll.
#
# Raises when the Cloudflare audio download is not ready or has no URL.
# Side effects: sets transcription_state to :processing and stores the
# AssemblyAI transcript ID on the video.
def submit_transcription(use_webhook: false)
  Rails.logger.info 'Submitting video to AssemblyAI for transcription...'

  # Check if audio download is ready
  raise 'Audio download is not ready yet. Please enable audio download on Cloudflare and wait for it to process.' unless @video.audio_download_ready?

  # Use the audio download URL for better transcription quality
  media_url = @video.cloudflare_audio_download_url
  raise 'Failed to get Cloudflare audio download URL' if media_url.blank?

  Rails.logger.info "Using Cloudflare audio download URL for AssemblyAI: #{media_url}"

  # Mark as processing
  @video.update!(transcription_state: :processing)

  # Use AssemblyAI client to submit transcription
  assemblyai_client = AssemblyaiClient.instance
  transcription_options = {
    language_code: 'en_us',
    punctuate: true,
    format_text: true,
    speaker_labels: true,
    auto_highlights: false,
    entity_detection: false,
    iab_categories: false,
    auto_chapters: false, # Disabled due to issues
    content_safety: false,
    speech_models: ['universal-3-pro'],
    keyterms_prompt: keyterms_for_assemblyai # Improve accuracy with domain-specific terms
  }

  # Only include speakers_expected if specified
  transcription_options[:speakers_expected] = @options[:speakers_expected] if @options[:speakers_expected]

  # Add webhook URL if using webhook mode
  if use_webhook
    webhook_url = AssemblyaiCallbackTokenService.video_webhook_url(video_id: @video.id)
    transcription_options[:webhook_url] = webhook_url
    Rails.logger.info "[VideoTranscription] Using webhook URL: #{webhook_url.truncate(100)}"
  end

  transcript_id = assemblyai_client.submit_transcription(media_url, transcription_options)

  Rails.logger.info "AssemblyAI transcription submitted with ID: #{transcript_id}"

  # Store the transcript ID
  @video.update!(assemblyai_transcript_id: transcript_id)

  if use_webhook
    # Create a pending WebhookLog entry to track this submission
    # This allows us to detect jobs that never received a callback
    webhook_data = {
      transcript_id: transcript_id,
      submitted_at: Time.current.iso8601,
      video_title: @video.title.truncate(100)
    }

    # Store the user who requested the transcription for notification
    webhook_data[:requested_by_id] = @options[:requested_by_id] if @options[:requested_by_id].present?

    WebhookLog.create_pending!(
      provider: 'assemblyai',
      category: 'transcription_complete',
      resource_type: 'Video',
      resource_id: @video.id,
      data: webhook_data
    )

    { transcript_id: transcript_id, mode: :webhook, status: :submitted }
  else
    transcript_id
  end
end

#summarize_video_and_update_metadataObject

Step 4: Summarize video and update expanded description and metadata
Uses AI to generate SEO-friendly meta title, description, and expanded description.



777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
# File 'app/services/video_processing/transcription_service.rb', line 777

# Step 4: Summarize the video and update SEO metadata (sub-header, meta
# title/description, expanded description) from the polished transcript
# paragraphs, via VideoProcessing::SeoService.
#
# Only fields present in the LLM response are written, so partial
# structured-output recoveries never clear existing SEO content.
# Raises when SEO generation reports failure. Returns nil early when the
# video has no AssemblyAI transcript ID or no structured paragraphs.
#
# NOTE: the method name on the `def` line was missing in the source as
# reviewed; restored from this method's documented public name.
def summarize_video_and_update_metadata
  Rails.logger.info 'Summarizing video and updating metadata'

  return unless @video.has_assemblyai_transcript_id?
  return unless @video.structured_transcript_paragraphs.present?

  # Use the dedicated SEO service
  seo_service = VideoProcessing::SeoService.new(@video)
  seo_content = seo_service.generate_seo_content

  if seo_content['status'] == 'success'
    # Update only fields that are present in the LLM response. This prevents
    # partial structured-output recoveries from clearing existing SEO fields.
    updated_fields = []
    Mobility.with_locale(:en) do
      {
        sub_header: 'sub_header',
        meta_title: 'meta_title',
        meta_description: 'meta_description',
        expanded_description: 'expanded_description'
      }.each do |attribute, key|
        next if seo_content[key].blank?

        @video.public_send("#{attribute}=", seo_content[key])
        updated_fields << attribute
      end

      @video.save! if updated_fields.any?
    end

    if updated_fields.any?
      # Clean up any superfluous en-US and en-CA translations that may have been created
      # These should not exist since we only generate English content for the :en locale
      cleanup_superfluous_english_translations

      Rails.logger.info "Successfully generated SEO metadata fields: #{updated_fields.join(', ')}"
    else
      Rails.logger.warn "SEO generation returned success for video #{@video.id}, but no fields were present to persist"
    end
  else
    Rails.logger.error "Failed to generate SEO metadata: #{seo_content['message']}"
    raise "SEO generation failed: #{seo_content['message']}"
  end
end

#transcribeObject



10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
# File 'app/services/video_processing/transcription_service.rb', line 10

# Public entry point: transcribe @video end-to-end. Reuses existing
# transcript data when possible — either returning the stored transcript or
# re-fetching a completed transcript by its stored AssemblyAI ID — and only
# submits a fresh transcription when neither is available.
def transcribe
  if @video.should_skip_transcription?
    Rails.logger.info 'Skipping transcription - video already has transcript data'

    if @video.can_retrieve_existing_transcript?
      Rails.logger.info 'Attempting to retrieve existing transcript from AssemblyAI'
      existing_result = retrieve_existing_transcript_from_assemblyai

      if existing_result
        Rails.logger.info 'Successfully retrieved existing transcript'
        formatted = format_transcript_data(existing_result)
        @video.update_transcript_data(formatted)
        return formatted
      end

      # Retrieval failed — fall through to a fresh transcription below.
      Rails.logger.info 'Failed to retrieve existing transcript, proceeding with new transcription'
    else
      Rails.logger.info 'Video already has complete transcript data'
      return {
        html: @video.transcript,
        transcript: @video.transcript,
        seo_content: {},
        duration_in_seconds: @video.duration_in_seconds
      }
    end
  end

  # AssemblyAI supports MP4 directly - no audio extraction needed
  raise 'Video must have a Cloudflare UID to be transcribed. Please upload the video to Cloudflare Stream first.' unless @video.cloudflare_uid.present?

  # Refresh Cloudflare data to get latest download status
  @video.refresh_cloudflare_data

  fresh_data = transcribe_audio
  formatted = format_transcript_data(fresh_data)
  @video.update_transcript_data(formatted)
  formatted
end

#transcribe_audioObject



96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
# File 'app/services/video_processing/transcription_service.rb', line 96

# Submit the video's audio to AssemblyAI and block until transcription
# completes (poll-based). Returns the completed transcript payload.
#
# Raises when the Cloudflare audio download cannot be made ready or has no
# URL. Side effect: stores the AssemblyAI transcript ID on the video.
#
# NOTE(review): entity_detection is true here but false in the otherwise
# near-identical option set in #submit_transcription — confirm whether the
# two submission paths are meant to diverge.
def transcribe_audio
  Rails.logger.info 'Starting video transcription with AssemblyAI...'

  # Check if audio download is ready, otherwise ensure it's enabled
  unless @video.audio_download_ready?
    Rails.logger.info 'Audio download not ready, checking if it needs to be enabled'
    @video.ensure_mp4_downloads_enabled

    # If still not ready after enabling, we'll need to wait
    raise 'Audio download is not ready yet. Please wait for Cloudflare to process the audio download, then try again.' unless @video.audio_download_ready?
  end

  # Use the audio download URL for better transcription quality
  media_url = @video.cloudflare_audio_download_url
  raise 'Failed to get Cloudflare audio download URL' if media_url.blank?

  Rails.logger.info "Using Cloudflare audio download URL for AssemblyAI: #{media_url}"

  # Use AssemblyAI client to transcribe with enhanced configuration
  assemblyai_client = AssemblyaiClient.instance
  transcription_options = {
    language_code: 'en_us',
    punctuate: true,
    format_text: true,
    speaker_labels: true,
    auto_highlights: false,
    entity_detection: true,
    iab_categories: false,
    auto_chapters: false, # Disabled due to issues
    content_safety: false,
    speech_models: ['universal-3-pro'],
    keyterms_prompt: keyterms_for_assemblyai # Improve accuracy with domain-specific terms
  }

  # Only include speakers_expected if specified
  transcription_options[:speakers_expected] = @options[:speakers_expected] if @options[:speakers_expected]

  # `result` here is the new transcript's ID, not the transcript payload.
  result = assemblyai_client.submit_transcription(media_url, transcription_options)

  Rails.logger.info "AssemblyAI transcription submitted with ID: #{result}"

  # Store the transcript ID
  @video.update!(assemblyai_transcript_id: result)

  # Poll for completion
  completed_result = assemblyai_client.poll_transcription(result)

  Rails.logger.info 'AssemblyAI transcription completed successfully'

  completed_result
end

#translate_transcript(locales = nil) ⇒ Hash

Step 3: Translate transcript and captions to specified locales
Uses DeepL API to translate VTT captions and plain transcript.

Parameters:

  • locales (Array<String, Symbol>) (defaults to: nil)

    Target locales (default: all supported)

  • progress_callback (Proc)

    Optional callback for progress updates

Returns:

  • (Hash)

    Translation results for each locale



751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
# File 'app/services/video_processing/transcription_service.rb', line 751

# Step 3: Translate the polished transcript and VTT captions into the target
# locales via VideoTranslationService (DeepL-backed).
#
# locales - optional Array of locale keys; defaults to all supported locales.
# An optional block is forwarded to translate_captions for progress updates.
# Returns a Hash with :captions and :transcript results, or nil (early) when
# the video has no polished VTT yet.
def translate_transcript(locales = nil, &)
  Rails.logger.info 'Translating transcript and captions'

  return unless @video.has_polished_vtt?

  translation_service = VideoTranslationService.new(@video)

  # Use provided locales or translate to all supported locales
  target_locales = locales || VideoTranslationService::SUPPORTED_LOCALES.keys

  # Translate VTT captions with progress reporting
  caption_results = translation_service.translate_captions(target_locales, &)

  # Also translate plain transcript if available
  transcript_results = translation_service.translate_transcript(target_locales) if @video.transcript.present?

  # The stock Logger#debug accepts a single message (or block) — the previous
  # two-argument keyword form is not supported by the default Rails logger.
  # target_locales is always set at this point, so no safe navigation needed.
  Rails.logger.debug { "Translation completed for #{target_locales.size} locales" }

  {
    captions: caption_results,
    transcript: transcript_results || {}
  }
end