Class: SiteMapContentExtractionWorker
- Inherits:
- Object
- Includes:
- Sidekiq::IterableJob, Sidekiq::Worker
- Defined in:
- app/workers/site_map_content_extraction_worker.rb
Overview
Crawls all cacheable pages (all categories except publications and videos) to
refresh extracted content, rendered schema, and the internal link graph.
Uses Sidekiq::IterableJob, which persists the iteration cursor as the job runs,
so a mid-run deploy or worker restart resumes from the last checkpointed page
rather than restarting from the top.
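The resume-from-checkpoint behaviour can be illustrated with a small plain-Ruby model. This is a sketch under stated assumptions, not Sidekiq's implementation: `Checkpoint`, `enumerator_from`, and `run_job` are hypothetical names, the cursor is just an array index, and the model saves the cursor after every item, whereas Sidekiq may checkpoint less often.

```ruby
# Hypothetical checkpoint store, standing in for the job state Sidekiq persists.
Checkpoint = Struct.new(:cursor)

# Yields [item, cursor] pairs, resuming after the saved cursor. The cursor here
# is just the array index; Sidekiq's ActiveRecord enumerator uses its own cursor.
def enumerator_from(items, cursor:)
  start = cursor.nil? ? 0 : cursor + 1
  Enumerator.new do |yielder|
    (start...items.size).each { |i| yielder << [items[i], i] }
  end
end

# Processes items, saving the cursor after each one. `stop_after` simulates a
# deploy interrupting the worker after that many items in this run.
def run_job(items, checkpoint, processed, stop_after: nil)
  done = 0
  enumerator_from(items, cursor: checkpoint.cursor).each do |item, cursor|
    return :interrupted if stop_after && done >= stop_after

    processed << item
    checkpoint.cursor = cursor
    done += 1
  end
  :completed
end

pages = %w[/a /b /c /d /e]
checkpoint = Checkpoint.new(nil)
processed = []

first = run_job(pages, checkpoint, processed, stop_after: 2) # interrupted after /a, /b
second = run_job(pages, checkpoint, processed)               # resumes at /c
```

The second run starts at `/c` because the saved cursor points at `/b`, so in this model no page is processed twice.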
AI-powered SEO analysis (SeoPageAnalysisWorker) is intentionally excluded here
because of its cost; trigger it manually from the CRM, either per page or in
targeted batches.
Triggered by:
- Nightly cron (config/sidekiq_production_schedule.yml)
- SitemapRegeneratedHandler (via Events::SitemapRegenerated)
Instance Method Summary
- #build_enumerator(options = nil, cursor:) ⇒ Object
- #each_iteration(site_map, *_args) ⇒ Object
Instance Method Details
#build_enumerator(options = nil, cursor:) ⇒ Object
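# File 'app/workers/site_map_content_extraction_worker.rb', line 36

def build_enumerator(options = nil, cursor:)
  # Job args arrive JSON-deserialized (string keys), so read them indifferently.
  opts = options.to_h.with_indifferent_access
  locale = opts[:locale]
  category = opts[:category]

  # Optional filters: with no options, every cacheable page is crawled.
  pages = SiteMap.cacheable
  pages = pages.where(locale: locale) if locale.present?
  pages = pages.where(category: category) if category.present?

  # Sidekiq helper that yields each record together with a resumable cursor.
  active_record_records_enumerator(pages, cursor: cursor)
end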
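The optional scoping in `build_enumerator` can be modelled on an in-memory array. Sidekiq serializes job arguments as JSON, so a hash enqueued with symbol keys comes back with string keys; `with_indifferent_access` lets the method read either. In this sketch `Page`, `PAGES`, and `scope_pages` are hypothetical, `transform_keys(&:to_s)` stands in for ActiveSupport's `with_indifferent_access`, a non-blank check stands in for `present?`, and the `SiteMap.cacheable` filter is assumed to have been applied already.

```ruby
require "json"

# Hypothetical stand-in for SiteMap rows (already limited to cacheable pages).
Page = Struct.new(:id, :locale, :category, :path)

PAGES = [
  Page.new(1, "en", "blog", "/en/blog/a"),
  Page.new(2, "en", "services", "/en/services"),
  Page.new(3, "de", "blog", "/de/blog/a")
].freeze

# Rough plain-Ruby analogue of ActiveSupport's `present?` for these values.
def present_value?(value)
  !value.to_s.strip.empty?
end

# Mirrors the optional-filter logic of build_enumerator on an array.
def scope_pages(pages, options = nil)
  opts = options.to_h.transform_keys(&:to_s) # accepts nil, string or symbol keys
  locale = opts["locale"]
  category = opts["category"]

  scoped = pages
  scoped = scoped.select { |p| p.locale == locale } if present_value?(locale)
  scoped = scoped.select { |p| p.category == category } if present_value?(category)
  scoped
end

# Job arguments round-trip through JSON, so symbol keys come back as strings.
enqueued = JSON.parse(JSON.generate([{ locale: "en", category: "blog" }])).first
```

Enqueueing with string keys, for example `SiteMapContentExtractionWorker.perform_async({ "locale" => "de" })`, keeps the arguments JSON-native as Sidekiq expects; enqueueing with no arguments crawls every cacheable page.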
#each_iteration(site_map, *_args) ⇒ Object
# File 'app/workers/site_map_content_extraction_worker.rb', line 48

def each_iteration(site_map, *_args)
  # The crawler takes a relation, so wrap the single page in one.
  results = Cache::SiteCrawler.new.process(
    pages: SiteMap.where(id: site_map.id),
    extract_content: true
  )
  status = results.values.first
  log_info "#{site_map.locale} #{site_map.path} → #{status}"
rescue StandardError => e
  log_error "Failed #{site_map.path}: #{e.message}"
  ErrorReporting.error(e)
  # Re-raise so the job fails and Sidekiq's retry handling applies.
  raise
end
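The failure path in `each_iteration` (log, report, re-raise) can be exercised without Rails by swapping in stubs. `FakeCrawler`, `LOG`, `REPORTED`, and `each_iteration_sketch` are hypothetical; the crawler's return shape (a hash of page to status) is an assumption inferred from `results.values.first`, and the `:ok` status is invented. Re-raising rather than swallowing the error means the job itself fails, so Sidekiq's retry handling applies; for an iterable job, the retry should resume from the saved cursor rather than from the first page.

```ruby
# Hypothetical stand-in for Cache::SiteCrawler. Assumed return shape:
# { page_id => status }, as implied by `results.values.first`.
class FakeCrawler
  def initialize(failing_paths)
    @failing_paths = failing_paths
  end

  def process(pages:, extract_content:)
    pages.to_h do |page|
      # Simulate a network failure for selected paths.
      raise IOError, "timeout" if @failing_paths.include?(page[:path])

      [page[:id], :ok] # :ok is an invented status value
    end
  end
end

LOG = []      # stands in for log_info / log_error
REPORTED = [] # stands in for ErrorReporting.error

# Mirrors the control flow of each_iteration for a single page.
def each_iteration_sketch(site_map, crawler)
  results = crawler.process(pages: [site_map], extract_content: true)
  status = results.values.first
  LOG << "#{site_map[:locale]} #{site_map[:path]} → #{status}"
rescue StandardError => e
  LOG << "Failed #{site_map[:path]}: #{e.message}"
  REPORTED << e
  raise # propagate so the caller (Sidekiq, in the real job) sees the failure
end

crawler = FakeCrawler.new(["/de/broken"])
each_iteration_sketch({ id: 1, locale: "en", path: "/en/ok" }, crawler)

reraised = nil
begin
  each_iteration_sketch({ id: 2, locale: "de", path: "/de/broken" }, crawler)
rescue IOError => e
  reraised = e
end
```

The failing page is logged and reported exactly once, and the same exception object still reaches the caller.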