Class: Retailer::Extractors::Rona

Inherits: Base

Object → Base → Retailer::Extractors::Rona

Defined in: app/services/retailer/extractors/rona.rb

Overview

Rona/RenoDepot data extractor.
Handles both search result pages and product detail pages.

Strategy for Rona (based on Playwright testing):

1. The first check uses the stored URL if one is available (from a previous successful extraction).
2. If no URL is stored, browser automation searches by SKU.
3. Rona's search results are JS-rendered, so the search requires browser_instructions.
4. The canonical URL is extracted from a successful response and stored in catalog_item.url.
5. Future checks use the stored direct URL for faster, more reliable results.

URL pattern: https://www.rona.ca/en/product/{slug}-{article_id}
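The strategy above comes down to one branch: use the stored URL when there is one, otherwise fall back to a browser-driven search. A minimal sketch, assuming a hypothetical `choose_payload` helper (not part of this class), simplified stand-ins for `build_payload` and `search_payload`, and a made-up product URL:

```ruby
# Hypothetical helper illustrating the stored-URL-first strategy.
# DIRECT and SEARCH are simplified stand-ins for Rona.build_payload and
# Rona.search_payload; they do not reproduce the real payloads.
DIRECT = ->(url) { { source: 'universal', url: url, render: 'html' } }
SEARCH = ->(sku) { { source: 'universal', url: 'https://www.rona.ca/en', search_term: sku } }

def choose_payload(stored_url, sku)
  # A missing or blank stored URL means no earlier extraction succeeded.
  if stored_url.nil? || stored_url.strip.empty?
    SEARCH.call(sku)
  else
    DIRECT.call(stored_url)
  end
end

# First check: nothing stored yet, so the search path is taken.
first_check = choose_payload(nil, 'TRT120-3.0X10')

# Later check: the stored URL (illustrative only) is requested directly.
later_check = choose_payload('https://www.rona.ca/en/product/example-slug-12345678', 'TRT120-3.0X10')
```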

Instance Attribute Summary

#canonical_url ⇒ Object (readonly)
Returns the value of attribute canonical_url.

Class Method Summary

.browser_instructions(search_term) ⇒ Array<Hash>
Browser instructions to search on Rona. Generated by Oxypilot based on site analysis.
.build_payload(url:) ⇒ Hash
Build Oxylabs payload for scraping a direct product URL.
.discovery_payload ⇒ Hash
Build payload to discover all WarmlyYours product URLs. Uses a single search and extracts all matching product URLs.
.fetch_all_product_urls(api) ⇒ Array<String>
Fetch all WarmlyYours product URLs from Rona.
.make_url_absolute(url) ⇒ String?
Ensure URL is absolute (Rona sometimes returns relative URLs).
.match_url(article_id, sku, urls) ⇒ String?
Find a matching URL for a catalog item. Primary match: article_id at end of URL (most reliable). Secondary match: SKU patterns in URL slug (fallback).
.parsing_instructions ⇒ Hash
Parsing instructions for Rona product data. Uses XPath to extract structured data from rendered HTML.
.search_payload(sku) ⇒ Hash
Build Oxylabs request payload for searching Rona by SKU. Uses browser_instructions to interact with JS-rendered search.
.seed_catalog_urls(api: Retailer::OxylabsApi.new) ⇒ Hash
Discover and seed URLs for all Rona catalog items without URLs. Uses a single search for "WarmlyYours" and matches URLs to items.
.sku_parts_match?(sku, url) ⇒ Boolean
Check if SKU components match URL patterns.

Instance Method Summary

#discovered_url ⇒ String?
Returns the canonical URL if found (to be saved for future checks).
#extract(check, content) ⇒ Object
Extract data from page HTML, dispatching to search-page or product-page handling and recording the canonical URL when found.
#extract_from_parsed(check, parsed_data) ⇒ Object
Extract from Oxylabs parsed response (when using browser_instructions).

Instance Attribute Details

#canonical_url ⇒ `Object` (readonly)

Returns the value of attribute canonical_url.




# File 'app/services/retailer/extractors/rona.rb', line 16

def canonical_url
  @canonical_url
end

Class Method Details

.browser_instructions(search_term) ⇒ `Array<Hash>`

Browser instructions to search on Rona
Generated by Oxypilot based on site analysis

Parameters:

search_term (String) —
The term to search for

Returns:

(Array<Hash>) —
Browser instruction sequence

# File 'app/services/retailer/extractors/rona.rb', line 77

def self.browser_instructions(search_term)
  [
    # Type into search box
    {
      type: 'input',
      selector: {
        type: 'xpath',
        value: '//div[1]/div[3]/header[1]/div[1]/div[1]/form[1]/label[1]/input[1]'
      },
      value: search_term,
      timeout_s: 5,
      wait_time_s: 1,
      on_error: 'error'
    },
    # Click search button
    {
      type: 'click',
      selector: {
        type: 'xpath',
        value: '//body[1]/div[1]/div[3]/header[1]/div[1]/div[1]/form[1]/button[3]'
      },
      timeout_s: 5,
      wait_time_s: 1,
      on_error: 'error'
    },
    # Wait for results to load
    {
      type: 'wait',
      wait_time_s: 7
    }
  ]
end

.build_payload(url:) ⇒ `Hash`

Build Oxylabs payload for scraping a direct product URL

Parameters:

url (String) —
Full product URL

Returns:

(Hash) —
Oxylabs API payload

# File 'app/services/retailer/extractors/rona.rb', line 25

def self.build_payload(url:)
  {
    source: 'universal',
    url: url,
    geo_location: 'Canada',
    render: 'html'
  }
end

.discovery_payload ⇒ `Hash`

Build payload to discover all WarmlyYours product URLs
Uses a single search and extracts all matching product URLs

Returns:

(Hash) —
Oxylabs API payload

# File 'app/services/retailer/extractors/rona.rb', line 54

def self.discovery_payload
  {
    source: 'universal',
    url: 'https://www.rona.ca/en',
    geo_location: 'Canada',
    render: 'html',
    parse: true,
    browser_instructions: browser_instructions('WarmlyYours'),
    parsing_instructions: {
      all_product_urls: {
        _fns: [
          { _fn: 'xpath', _args: [".//a[contains(@href, '/product/') and contains(@href, 'warmly')]/@href"] }
        ]
      }
    }
  }
end

.fetch_all_product_urls(api) ⇒ `Array<String>`

Fetch all WarmlyYours product URLs from Rona

Parameters:

api (Retailer::OxylabsApi) —
API client instance

Returns:

(Array<String>) —
Array of product URLs

# File 'app/services/retailer/extractors/rona.rb', line 230

def self.fetch_all_product_urls(api)
  result = api.request(discovery_payload)

  unless result.success?
    Rails.logger.error "[Rona] API error: #{result.error}"
    return []
  end

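  # Guard against an empty or malformed response before reading 'content'
  data = result.data&.first
  return [] unless data.is_a?(Hash)

  parsed = data['content']
  return [] unless parsed.is_a?(Hash)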

  urls = parsed['all_product_urls']
  return [] unless urls.is_a?(Array)

  # Deduplicate and make absolute
  urls.uniq.map { |url| make_url_absolute(url) }.compact
end
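The response handling can be exercised without calling Oxylabs by substituting a stub client. The sketch below assumes the result object responds to `success?`, `data`, and `error`, and that `data` is an array whose first element carries the parsed fields under 'content', which is what the method above implies. `FakeResult`, `FakeApi`, and `product_urls_from` are hypothetical names, and the URL-absolutizing step is left out.

```ruby
# Stand-ins for the API client and its result object. The interface
# (success?, data, error) is inferred from fetch_all_product_urls.
FakeResult = Struct.new(:ok, :data, :error) do
  def success?
    ok
  end
end

class FakeApi
  def initialize(result)
    @result = result
  end

  # Ignores the payload and returns the canned result.
  def request(_payload)
    @result
  end
end

# Walks the assumed response shape and returns unique URLs, or [] when
# any level is missing or has an unexpected type.
def product_urls_from(api, payload = {})
  result = api.request(payload)
  return [] unless result.success?

  entry  = result.data&.first
  parsed = entry.is_a?(Hash) ? entry['content'] : nil
  urls   = parsed.is_a?(Hash) ? parsed['all_product_urls'] : nil
  urls.is_a?(Array) ? urls.uniq : []
end

ok = FakeResult.new(
  true,
  [{ 'content' => { 'all_product_urls' => ['/en/product/a-1', '/en/product/a-1', '/en/product/b-2'] } }],
  nil
)
found      = product_urls_from(FakeApi.new(ok))
from_empty = product_urls_from(FakeApi.new(FakeResult.new(true, [], nil)))
from_error = product_urls_from(FakeApi.new(FakeResult.new(false, nil, 'timeout')))
```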

.make_url_absolute(url) ⇒ `String`?

Ensure URL is absolute (Rona sometimes returns relative URLs)

Parameters:

url (String) —
URL to make absolute

Returns:

(String, nil)

# File 'app/services/retailer/extractors/rona.rb', line 292

def self.make_url_absolute(url)
  return nil unless url.present?
  return url if url.start_with?('http')

  "https://www.rona.ca#{'/' unless url.start_with?('/')}#{url}"
end
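A plain-Ruby variant shows the three cases the method distinguishes. This is a sketch: the nil/blank check stands in for ActiveSupport's `present?`, `make_url_absolute_sketch` is a hypothetical name, and the paths are illustrative.

```ruby
# Plain-Ruby variant of make_url_absolute. Assumption: a nil or
# whitespace-only input is treated the way `present?` would treat it.
def make_url_absolute_sketch(url)
  return nil if url.nil? || url.strip.empty?
  return url if url.start_with?('http')

  # Insert a leading slash only when the relative path lacks one.
  "https://www.rona.ca#{'/' unless url.start_with?('/')}#{url}"
end

absolute = make_url_absolute_sketch('https://www.rona.ca/en/product/example-1')
rooted   = make_url_absolute_sketch('/en/product/example-1')
unrooted = make_url_absolute_sketch('en/product/example-1')
blank    = make_url_absolute_sketch('   ')
```

Both relative forms resolve to the same absolute URL. A protocol-relative input such as `//www.rona.ca/...` would be treated as a rooted path and prefixed with the host a second time, so callers may want to handle that case separately if it ever appears.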

.match_url(article_id, sku, urls) ⇒ `String`?

Find a matching URL for a catalog item
Primary match: article_id at end of URL (most reliable)
Secondary match: SKU patterns in URL slug (fallback)

Parameters:

article_id (String) —
The Rona article ID (third_party_part_number)
sku (String) —
The product SKU (e.g., "TRT120-3.0X10")
urls (Array<String>) —
Available URLs

Returns:

(String, nil) —
Matching URL or nil

# File 'app/services/retailer/extractors/rona.rb', line 257

def self.match_url(article_id, sku, urls)
  # Primary: Match by article ID (URLs end with -{article_id})
  if article_id.present?
    matched = urls.find { |url| url.end_with?("-#{article_id}") }
    return matched if matched
  end

  # Secondary: Try matching by SKU in URL slug
  return nil unless sku.present?

  sku_normalized = sku.downcase.gsub(/[^a-z0-9]/, '')

  urls.find do |url|
    url_normalized = url.downcase.gsub(/[^a-z0-9]/, '')
    url_normalized.include?(sku_normalized) || sku_parts_match?(sku, url)
  end
end
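The two tiers can be tried on sample data. The sketch below is a plain-Ruby approximation: it replaces `present?` with an empty-string check and omits the `sku_parts_match?` fallback, so it covers only the article-ID match and the normalized-SKU match. `match_url_sketch`, the URLs, and the article IDs are made up for illustration.

```ruby
# Plain-Ruby approximation of the two-tier lookup in match_url.
def match_url_sketch(article_id, sku, urls)
  # Tier 1: the URL is expected to end with "-<article_id>".
  unless article_id.to_s.empty?
    hit = urls.find { |url| url.end_with?("-#{article_id}") }
    return hit if hit
  end

  return nil if sku.to_s.empty?

  # Tier 2: strip everything but letters and digits from both sides,
  # then look for the SKU inside the URL.
  needle = sku.downcase.gsub(/[^a-z0-9]/, '')
  urls.find { |url| url.downcase.gsub(/[^a-z0-9]/, '').include?(needle) }
end

urls = [
  'https://www.rona.ca/en/product/warmlyyours-floor-heating-mat-11111111',
  'https://www.rona.ca/en/product/warmlyyours-towel-warmer-tws6-grd10bh-22222222'
]

by_id  = match_url_sketch('11111111', 'TRT120-3.0X10', urls) # matched on article ID
by_sku = match_url_sketch(nil, 'TWS6-GRD10BH', urls)         # matched on normalized SKU
none   = match_url_sketch('99999999', 'ABC-1', urls)         # neither tier matches
```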

.parsing_instructions ⇒ `Hash`

Parsing instructions for Rona product data
Uses XPath to extract structured data from rendered HTML

Returns:

(Hash) —
Oxylabs parsing_instructions format

# File 'app/services/retailer/extractors/rona.rb', line 114

def self.parsing_instructions
  {
    current_price: {
      _fns: [
        {
          _fn: 'xpath',
          _args: [
            ".//span[@class='price-box__price__amount']",
            ".//div[contains(@class, 'price-box')]//span[@class='price-box__price__amount']",
            ".//div[contains(@data-cnstrc-item-name, '')]//span[@class='price-box__price__amount']"
          ]
        },
        { _fn: 'xpath', _args: ['normalize-space(.)'] },
        { _fn: 'join', _args: ' ' },
        { _fn: 'amount_from_string' }
      ]
    },
    availability: {
      _fns: [
        {
          _fn: 'xpath',
          _args: [
            ".//span[@class='plp-stock-message-new']",
            ".//div[@class='plp-stock-message']//span[@style]",
            ".//div[contains(@class, 'availability')]//span"
          ]
        },
        { _fn: 'xpath', _args: ['normalize-space(.)'] },
        { _fn: 'join', _args: ' ' }
      ]
    },
    product_url: {
      _fns: [
        {
          _fn: 'xpath_one',
          _args: [
            ".//a[contains(@href, '/product/')]/@href",
            ".//div[contains(@class, 'product-tile')]//a[contains(@data-eventaction, 'Click')]/@href",
            ".//div[contains(@class, 'product-tile')]//a[contains(@data-eventaction, 'Product')]/@href"
          ]
        },
        { _fn: 'regex_search', _args: ['^\\s*(.[\\s\\S]*?)\\s*$', 1] }
      ]
    },
    product_title: {
      _fns: [
        {
          _fn: 'xpath_one',
          _args: [
            ".//div[contains(@class, 'product-tile')]//a[@class='product-tile-link']",
            ".//div[contains(@class, 'product-tile')]//a[contains(@data-eventaction, 'Click')]",
            ".//h2[contains(@class, 'product')]"
          ]
        },
        { _fn: 'xpath', _args: ['normalize-space(.)'] }
      ]
    }
  }
end

.search_payload(sku) ⇒ `Hash`

Build Oxylabs request payload for searching Rona by SKU
Uses browser_instructions to interact with JS-rendered search

Parameters:

sku (String) —
Product SKU to search for (e.g., "TRT120-3.0X10")

Returns:

(Hash) —
Complete Oxylabs API payload

# File 'app/services/retailer/extractors/rona.rb', line 39

def self.search_payload(sku)
  {
    source: 'universal',
    url: 'https://www.rona.ca/en',
    geo_location: 'Canada',
    render: 'html',
    parse: true,
    browser_instructions: browser_instructions(sku),
    parsing_instructions: parsing_instructions
  }
end

.seed_catalog_urls(api: Retailer::OxylabsApi.new) ⇒ `Hash`

Discover and seed URLs for all Rona catalog items without URLs
Uses a single search for "WarmlyYours" and matches URLs to items

Parameters:

api (Retailer::OxylabsApi) (defaults to: Retailer::OxylabsApi.new) —
API client instance

Returns:

(Hash) —
Summary with :total, :found, :not_found, and :errors counts

# File 'app/services/retailer/extractors/rona.rb', line 183

def self.seed_catalog_urls(api: Retailer::OxylabsApi.new)
  rona_items = CatalogItem.where(catalog_id: CatalogConstants::RONA_CANADA)
                          .where(state: 'active')
                          .where(url: [nil, ''])
                          .includes(:item)

  results = { total: rona_items.count, found: 0, not_found: 0, errors: 0 }
  return results if rona_items.empty?

  # Get all WarmlyYours product URLs from Rona (single API call)
  Rails.logger.info '[Rona] Fetching all WarmlyYours URLs...'
  all_urls = fetch_all_product_urls(api)

  if all_urls.empty?
    Rails.logger.error '[Rona] No URLs found!'
    return results
  end

  Rails.logger.info "[Rona] Found #{all_urls.size} WarmlyYours URLs"

  # Match URLs to catalog items using article_id (third_party_part_number)
  rona_items.find_each do |catalog_item|
    article_id = catalog_item.third_party_part_number
    sku = catalog_item.item&.sku

    matched_url = match_url(article_id, sku, all_urls)

    if matched_url.present?
      catalog_item.update!(url: matched_url)
      results[:found] += 1
      Rails.logger.info "[Rona] Matched: #{sku} (#{article_id}) -> #{matched_url.truncate(70)}"
    else
      results[:not_found] += 1
      Rails.logger.debug { "[Rona] No match for: #{sku} (#{article_id})" }
    end
  rescue StandardError => e
    results[:errors] += 1
    Rails.logger.error "[Rona] Error matching #{sku}: #{e.message}"
  end

  Rails.logger.info "[Rona] Complete: #{results[:found]} found, #{results[:not_found]} not found"
  results
end

.sku_parts_match?(sku, url) ⇒ `Boolean`

Check if SKU components match URL patterns

Parameters:

sku (String) —
SKU like "TRT120-3.0X10" or "TWS6-GRD10BH"
url (String) —
URL like ".../warmlyyours-grande-342-in-brushed-stainless-steel-10-bar..."

Returns:

(Boolean)

# File 'app/services/retailer/extractors/rona.rb', line 278

def self.sku_parts_match?(sku, url)
  parts = sku.downcase.split(/[-_.]/)
  url_lower = url.downcase

  # Check if any significant part appears in URL
  significant_parts = parts.select { |p| p.length >= 4 }
  return false if significant_parts.empty?

  significant_parts.any? { |part| url_lower.include?(part) }
end
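Splitting a sample SKU shows which parts count as significant. The sketch repeats the same logic in a standalone method (`sku_parts_match_sketch?` is a hypothetical name) and uses illustrative URLs.

```ruby
# Same logic as sku_parts_match?, runnable on its own.
def sku_parts_match_sketch?(sku, url)
  significant = sku.downcase.split(/[-_.]/).select { |part| part.length >= 4 }
  return false if significant.empty?

  url_lower = url.downcase
  significant.any? { |part| url_lower.include?(part) }
end

# The dot in "3.0X10" is a separator too, so the SKU splits into three parts.
parts = 'TRT120-3.0X10'.downcase.split(/[-_.]/)
# parts is ["trt120", "3", "0x10"]; "3" is dropped as too short.

hit  = sku_parts_match_sketch?('TWS6-GRD10BH', 'https://www.rona.ca/en/product/towel-warmer-tws6-10-bar')
miss = sku_parts_match_sketch?('AB-12', 'https://www.rona.ca/en/product/ab-12')
```

Because a single significant part is enough, a short fragment such as "0x10" could also match an unrelated URL. That is presumably why `match_url` tries the article ID and the fully normalized SKU first and uses this check only as a last resort.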

Instance Method Details

#discovered_url ⇒ `String`?

Returns the canonical URL if found (to be saved for future checks)

Returns:

(String, nil)




# File 'app/services/retailer/extractors/rona.rb', line 343

def discovered_url
  @canonical_url
end

#extract(check, content) ⇒ `Object`

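Extract data from page HTML. Returns early unless the content is valid HTML, then sets the scraper source and CAD currency, records the canonical URL if one is found, and dispatches to search-page or product-page extraction.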

# File 'app/services/retailer/extractors/rona.rb', line 303

def extract(check, content)
  return unless valid_html?(content)

  check.scraper_source = source_name
  check.currency = 'CAD'

  doc = parse_html(content)

  # Try to extract canonical URL for future use
  @canonical_url = extract_canonical_url(doc)

  # Determine page type
  if search_results_page?(content)
    extract_from_search_page(check, doc)
  else
    extract_from_product_page(check, doc, content)
  end
end

#extract_from_parsed(check, parsed_data) ⇒ `Object`

Extract from Oxylabs parsed response (when using browser_instructions)

Parameters:

check (CatalogItemRetailerProbe) —
The probe to populate
parsed_data (Hash) —
Parsed data from Oxylabs

# File 'app/services/retailer/extractors/rona.rb', line 325

def extract_from_parsed(check, parsed_data)
  check.scraper_source = source_name
  check.currency = 'CAD'

  price = parsed_data['current_price']
  check.price = price if price.is_a?(Numeric) && price.positive?

  availability = parsed_data['availability'].to_s
  check.product_available = availability.present? && !availability.downcase.include?('unavailable')

  check.raw_title = parsed_data['product_title']&.truncate(255)

  product_url = parsed_data['product_url']
  @canonical_url = self.class.make_url_absolute(product_url) if product_url.present?
end
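The price and availability rules can be checked against sample parsed hashes. This is a plain-Ruby sketch under stated assumptions: `Check` is a hypothetical stand-in for the probe record, `apply_parsed` is a hypothetical name, the blank check replaces `present?`, and the title is cut with a plain slice rather than ActiveSupport's `truncate`. The sample values are invented.

```ruby
# Hypothetical stand-in for the probe; only the fields used here.
Check = Struct.new(:price, :product_available, :raw_title)

def apply_parsed(check, parsed)
  # Only a positive number is accepted as a price.
  price = parsed['current_price']
  check.price = price if price.is_a?(Numeric) && price.positive?

  # Available means: some stock text exists and it does not say "unavailable".
  availability = parsed['availability'].to_s.strip
  check.product_available = !availability.empty? && !availability.downcase.include?('unavailable')

  title = parsed['product_title']
  check.raw_title = title[0, 255] if title.is_a?(String)
  check
end

in_stock = apply_parsed(Check.new, {
  'current_price' => 129.99,
  'availability' => 'Available online',
  'product_title' => 'Example heating mat'
})
sold_out = apply_parsed(Check.new, { 'current_price' => 0, 'availability' => 'Unavailable' })
unknown  = apply_parsed(Check.new, {})
```

Under this rule any non-empty stock text that lacks the word "unavailable" counts as available, so a phrase like "Not available" would still be recorded as available.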
