Class: SeoBatchCollectorWorker

Inherits:
Object
Includes:
Sidekiq::Status::Worker, Sidekiq::Worker
Defined in:
app/workers/seo_batch_collector_worker.rb

Overview

Phase 1 of the two-phase SEO batch pipeline.

Runs nightly AFTER first-party data syncs (Visits, GSC, GA4).
For each page needing analysis:

  1. Syncs keywords from Ahrefs
  2. Builds the AI prompt (gather context, keyword research, related content)
  3. Stores the prompt in a SeoBatchItem record

Zero AI tokens consumed — all work is data collection and prompt construction.
After collection completes, enqueues SeoBatchSubmitWorker for Phase 2.

Tiered freshness — high-traffic pages are refreshed more often:

  • High (100+ combined traffic): every 7 days
  • Moderate (10-99): every 14 days
  • Low (1-9): every 30 days

Usage:
SeoBatchCollectorWorker.perform_async
SeoBatchCollectorWorker.perform_async({ 'limit' => 250 })

Constant Summary

TIER_CONFIG =
{
  high:     { min_traffic: 100, freshness_days: 7 },
  moderate: { min_traffic: 10,  freshness_days: 14 },
  low:      { min_traffic: 1,   freshness_days: 30 }
}.freeze
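
The tier lookup implied by TIER_CONFIG can be sketched as a small standalone helper. `tier_for` and `freshness_days_for` are hypothetical names (not part of the worker), and the constant is copied here so the snippet runs on its own:

```ruby
TIER_CONFIG = {
  high:     { min_traffic: 100, freshness_days: 7 },
  moderate: { min_traffic: 10,  freshness_days: 14 },
  low:      { min_traffic: 1,   freshness_days: 30 }
}.freeze

# Hypothetical helper: Ruby hashes preserve insertion order, so the
# highest tier whose traffic floor the page meets wins.
def tier_for(combined_traffic)
  TIER_CONFIG.find { |_, cfg| combined_traffic >= cfg[:min_traffic] }&.first
end

# Hypothetical helper: how many days a page in this tier stays fresh.
def freshness_days_for(combined_traffic)
  tier = tier_for(combined_traffic)
  tier && TIER_CONFIG[tier][:freshness_days]
end
```

So a page with 150 combined traffic lands in the `:high` tier and is re-analyzed every 7 days, while a zero-traffic page matches no tier at all.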
SKIP_CATEGORIES =

DB-generated and low-traffic page types not worth AI analysis tokens.
Focus the analysis budget on: post, static_page, tech_article, video.

%w[product showcase support post_tag author towel_warmer_filter floor_plan form].freeze

DEFAULT_LIMIT =
500

Instance Method Summary

Instance Method Details

#perform(options = {}) ⇒ Object



# File 'app/workers/seo_batch_collector_worker.rb', line 41

def perform(options = {})
  options = options.with_indifferent_access
  limit = options[:limit] || DEFAULT_LIMIT
  model = options[:model] || Seo::PageAnalysisService::ANALYSIS_MODEL

  # Don't start a new batch if one is already in progress
  if SeoBatchJob.active.exists?
    log_info 'Active batch job already exists — skipping'
    store info_message: 'Skipped: active batch job already running'
    return
  end

  pages = find_pages_needing_analysis(limit)
  if pages.empty?
    log_info 'No pages need analysis'
    store info_message: 'No pages need analysis'
    return
  end

  batch_job = SeoBatchJob.create!(status: 'collecting', model: model)
  stats = { collected: 0, keyword_syncs: 0, errors: [] }

  total pages.size
  log_info "Phase 1: Collecting prompts for #{pages.size} pages (batch job #{batch_job.id})"

  pages.each_with_index do |site_map, index|
    at index + 1, "Collecting #{site_map.path}..."

    begin
      collect_page(batch_job, site_map, model, stats)
      stats[:collected] += 1
    rescue StandardError => e
      log_error "Failed to collect SiteMap #{site_map.id}: #{e.message}"
      stats[:errors] << { site_map_id: site_map.id, path: site_map.path, error: e.message }
    end
  end

  if batch_job.items.pending.none?
    batch_job.mark_failed!('No prompts collected — all pages errored')
    log_error 'No prompts collected — all pages errored'
    store info_message: "Failed: #{stats[:errors].size} errors, 0 prompts"
    return
  end

  batch_job.update!(
    status: 'submitting',
    total_pages: batch_job.items.count,
    metadata: batch_job.metadata.merge('collection_stats' => stats.except(:errors),
                                       'error_count' => stats[:errors].size)
  )

  summary = "Phase 1 complete: #{stats[:collected]} prompts, #{stats[:keyword_syncs]} keyword syncs, #{stats[:errors].size} errors"
  log_info summary
  store info_message: summary

  # Enqueue Phase 2: submit to Batch API
  SeoBatchSubmitWorker.perform_async(batch_job.id)
end
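
`find_pages_needing_analysis` is called above but its body isn't part of this page. The selection it implies (drop SKIP_CATEGORIES, keep pages whose tiered freshness window has lapsed, cap at `limit`) might look like the following plain-Ruby sketch over hashes rather than SiteMap records; the field names `category`, `combined_traffic`, and `last_analyzed_at` are assumptions:

```ruby
require 'date'

TIER_CONFIG = {
  high:     { min_traffic: 100, freshness_days: 7 },
  moderate: { min_traffic: 10,  freshness_days: 14 },
  low:      { min_traffic: 1,   freshness_days: 30 }
}.freeze
SKIP_CATEGORIES = %w[product showcase support post_tag author towel_warmer_filter floor_plan form].freeze

# Hypothetical stand-in for find_pages_needing_analysis, operating on
# plain hashes instead of ActiveRecord; not the worker's actual query.
def pages_needing_analysis(pages, limit, today: Date.today)
  pages.select do |page|
    next false if SKIP_CATEGORIES.include?(page[:category])
    # First tier whose traffic floor the page meets (insertion order: high first).
    tier = TIER_CONFIG.find { |_, cfg| page[:combined_traffic] >= cfg[:min_traffic] }
    next false if tier.nil? # below even the low-tier floor
    last = page[:last_analyzed_at]
    # Never analyzed, or the tier's freshness window has elapsed.
    last.nil? || (today - last) >= tier[1][:freshness_days]
  end.first(limit)
end
```

In the real worker this would presumably be a scoped ActiveRecord query, but the filtering rules are the same.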