Class: Retailer::Extractors::Rona

Inherits:
Base
  • Object
Defined in:
app/services/retailer/extractors/rona.rb

Overview

Rona/RenoDepot data extractor.
Handles both search result pages and product detail pages.

Strategy for Rona (based on Playwright testing):

  1. The first check uses the stored URL if one is available (from a previous successful extraction)
  2. If no URL is stored, browser automation searches by SKU
  3. Rona's search results are JS-rendered, so the search requires browser_instructions
  4. The canonical URL is extracted from a successful response and stored in catalog_item.url
  5. Future checks use the stored direct URL for faster, more reliable results

URL Pattern: https://www.rona.ca/en/product/slug-article_id
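
The strategy above can be sketched in plain Ruby as follows. The plan_request helper and its return shape are hypothetical and exist only for illustration; the extractor itself builds Oxylabs payloads through build_payload and search_payload.

```ruby
# Hypothetical helper (not part of the extractor): decides which kind of
# request to make for a catalog item, following the strategy above.
def plan_request(stored_url:, sku:)
  url = stored_url.to_s.strip
  if url.empty?
    # No stored URL yet: fall back to a browser-automated search by SKU.
    { mode: :search, term: sku }
  else
    # A previous extraction stored the canonical URL: fetch it directly.
    { mode: :direct, url: url }
  end
end

first_check = plan_request(stored_url: nil, sku: 'TRT120-3.0X10')
later_check = plan_request(
  stored_url: 'https://www.rona.ca/en/product/slug-article_id',
  sku: 'TRT120-3.0X10'
)
```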

Constant Summary

RENDER_REQUIRED =

Search results are JS-rendered and browser_instructions assume rendering,
so this must stay true.

true

Instance Attribute Details

#canonical_url ⇒ Object (readonly)

Returns the value of attribute canonical_url.



# File 'app/services/retailer/extractors/rona.rb', line 20

def canonical_url
  @canonical_url
end

Class Method Details

.browser_instructions(search_term) ⇒ Array<Hash>

Browser instructions to search on Rona
Generated by Oxypilot based on site analysis

Parameters:

  • search_term (String)

    The term to search for

Returns:

  • (Array<Hash>)

    Browser instruction sequence



# File 'app/services/retailer/extractors/rona.rb', line 81

def self.browser_instructions(search_term)
  [
    # Type into search box
    {
      type: 'input',
      selector: {
        type: 'xpath',
        value: '//div[1]/div[3]/header[1]/div[1]/div[1]/form[1]/label[1]/input[1]'
      },
      value: search_term,
      timeout_s: 5,
      wait_time_s: 1,
      on_error: 'error'
    },
    # Click search button
    {
      type: 'click',
      selector: {
        type: 'xpath',
        value: '//body[1]/div[1]/div[3]/header[1]/div[1]/div[1]/form[1]/button[3]'
      },
      timeout_s: 5,
      wait_time_s: 1,
      on_error: 'error'
    },
    # Wait for results to load
    {
      type: 'wait',
      wait_time_s: 7
    }
  ]
end
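
When choosing a request timeout, it can help to estimate how long this sequence may take. The sketch below assumes each step can use its full timeout_s plus its wait_time_s and that steps run one after another; the source does not confirm how these fields are applied, and worst_case_seconds is a hypothetical helper.

```ruby
# The instruction sequence shown above, reduced to its timing fields.
instructions = [
  { type: 'input', timeout_s: 5, wait_time_s: 1 },
  { type: 'click', timeout_s: 5, wait_time_s: 1 },
  { type: 'wait', wait_time_s: 7 }
]

# Upper bound, assuming each step may use its full timeout plus its wait.
def worst_case_seconds(steps)
  steps.sum { |step| step.fetch(:timeout_s, 0) + step.fetch(:wait_time_s, 0) }
end

budget = worst_case_seconds(instructions) # 6 + 6 + 7 = 19
```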

.build_payload(url:) ⇒ Hash

Build Oxylabs payload for scraping a direct product URL

Parameters:

  • url (String)

    Full product URL

Returns:

  • (Hash)

    Oxylabs API payload



# File 'app/services/retailer/extractors/rona.rb', line 29

def self.build_payload(url:)
  {
    source: 'universal',
    url: url,
    geo_location: 'Canada',
    render: render_value
  }.compact
end
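
The trailing .compact means a nil render value removes the :render key from the payload instead of sending a null. The sketch below illustrates this; render_value is not shown on this page, so the stand-in here (returning 'html' or nil) is an assumption, as is the render_required keyword.

```ruby
# Stand-in for the class's render_value, which is not shown on this page.
# Assumption: it returns 'html' when rendering is on and nil otherwise.
def render_value(render_required)
  render_required ? 'html' : nil
end

def build_payload(url:, render_required: true)
  {
    source: 'universal',
    url: url,
    geo_location: 'Canada',
    render: render_value(render_required)
  }.compact # drops :render entirely when its value is nil
end

product_url = 'https://www.rona.ca/en/product/slug-article_id'
rendered = build_payload(url: product_url)
plain = build_payload(url: product_url, render_required: false)
```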

.discovery_payload ⇒ Hash

Build payload to discover all WarmlyYours product URLs
Uses a single search and extracts all matching product URLs

Returns:

  • (Hash)

    Oxylabs API payload



# File 'app/services/retailer/extractors/rona.rb', line 58

def self.discovery_payload
  {
    source: 'universal',
    url: 'https://www.rona.ca/en',
    geo_location: 'Canada',
    render: render_value,
    parse: true,
    browser_instructions: browser_instructions('WarmlyYours'),
    parsing_instructions: {
      all_product_urls: {
        _fns: [
          { _fn: 'xpath', _args: [".//a[contains(@href, '/product/') and contains(@href, 'warmly')]/@href"] }
        ]
      }
    }
  }.compact
end

.fetch_all_product_urls(api) ⇒ Array<String>

Fetch all WarmlyYours product URLs from Rona

Parameters:

  • api (Retailer::OxylabsApi)

    API client used to send the discovery request

Returns:

  • (Array<String>)

    Array of product URLs



# File 'app/services/retailer/extractors/rona.rb', line 234

def self.fetch_all_product_urls(api)
  result = api.request(discovery_payload)

  unless result.success?
    Rails.logger.error "[Rona] API error: #{result.error}"
    return []
  end

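  # result.data may be nil or empty, so guard before indexing into the first entry
  data = result.data&.first
  return [] unless data.is_a?(Hash)

  parsed = data['content']
  return [] unless parsed.is_a?(Hash)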

  urls = parsed['all_product_urls']
  return [] unless urls.is_a?(Array)

  # Deduplicate and make absolute
  urls.uniq.filter_map { |url| make_url_absolute(url) }
end
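
The response handling can be exercised without the real API. In the sketch below, FakeResult and product_urls_from are hypothetical stand-ins; the response shape (an array whose first entry holds a 'content' hash) is inferred from the method above, not from Oxylabs documentation.

```ruby
# Hypothetical stand-in for the API result object, which is not shown here.
FakeResult = Struct.new(:data, :error) do
  def success?
    error.nil?
  end
end

# Plain-Ruby version of the extraction steps, guarding each level of the
# response so a missing or unparsed entry yields an empty list.
def product_urls_from(result)
  return [] unless result.success?

  data = result.data&.first
  return [] unless data.is_a?(Hash)

  parsed = data['content']
  return [] unless parsed.is_a?(Hash)

  urls = parsed['all_product_urls']
  urls.is_a?(Array) ? urls.uniq : []
end

ok = FakeResult.new(
  [{ 'content' => { 'all_product_urls' => ['/en/product/a-1', '/en/product/a-1', '/en/product/b-2'] } }],
  nil
)
empty = FakeResult.new([], nil)
failed = FakeResult.new(nil, 'timeout')
unparsed = FakeResult.new([{ 'content' => '<html></html>' }], nil)
```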

.make_url_absolute(url) ⇒ String?

Ensure URL is absolute (Rona sometimes returns relative URLs)

Parameters:

  • url (String)

    URL to make absolute

Returns:

  • (String, nil)


# File 'app/services/retailer/extractors/rona.rb', line 296

def self.make_url_absolute(url)
  return nil if url.blank?
  return url if url.start_with?('http')

  "https://www.rona.ca#{'/' unless url.start_with?('/')}#{url}"
end
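
A plain-Ruby equivalent shows the three cases the method handles. The only change is that blank? (from ActiveSupport) is replaced with an explicit nil/whitespace check so the snippet runs outside Rails.

```ruby
# Plain-Ruby equivalent for illustration: blank? is replaced by an explicit
# nil/whitespace check so the snippet runs without ActiveSupport.
def make_url_absolute(url)
  return nil if url.nil? || url.strip.empty?
  return url if url.start_with?('http')

  # Add a leading slash only when the relative path lacks one.
  "https://www.rona.ca#{'/' unless url.start_with?('/')}#{url}"
end

relative = make_url_absolute('/en/product/slug-article_id')
bare = make_url_absolute('en/product/slug-article_id')
absolute = make_url_absolute('https://www.rona.ca/en/product/slug-article_id')
missing = make_url_absolute('  ')
```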

.match_url(article_id, sku, urls) ⇒ String?

Find a matching URL for a catalog item
Primary match: article_id at end of URL (most reliable)
Secondary match: SKU patterns in URL slug (fallback)

Parameters:

  • article_id (String)

    The Rona article ID (third_party_part_number)

  • sku (String)

    The product SKU (e.g., "TRT120-3.0X10")

  • urls (Array<String>)

    Available URLs

Returns:

  • (String, nil)

    Matching URL or nil



# File 'app/services/retailer/extractors/rona.rb', line 261

def self.match_url(article_id, sku, urls)
  # Primary: Match by article ID (URLs end with -{article_id})
  if article_id.present?
    matched = urls.find { |url| url.end_with?("-#{article_id}") }
    return matched if matched
  end

  # Secondary: Try matching by SKU in URL slug
  return nil if sku.blank?

  sku_normalized = sku.downcase.gsub(/[^a-z0-9]/, '')

  urls.find do |url|
    url_normalized = url.downcase.gsub(/[^a-z0-9]/, '')
    url_normalized.include?(sku_normalized) || sku_parts_match?(sku, url)
  end
end
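
The two-tier matching can be tried out with plain-Ruby copies of the matchers. The URLs and article IDs below are made up for illustration, and present?/blank? are replaced with explicit checks so the example runs without Rails.

```ruby
# Plain-Ruby copy of the SKU-part matcher.
def sku_parts_match?(sku, url)
  parts = sku.downcase.split(/[-_.]/).select { |p| p.length >= 4 }
  parts.any? { |part| url.downcase.include?(part) }
end

# Plain-Ruby copy of the two-tier matcher.
def match_url(article_id, sku, urls)
  # Primary: URLs end with -{article_id}
  unless article_id.to_s.empty?
    matched = urls.find { |url| url.end_with?("-#{article_id}") }
    return matched if matched
  end
  return nil if sku.to_s.empty?

  # Secondary: normalized SKU, or a significant SKU part, inside the URL
  sku_normalized = sku.downcase.gsub(/[^a-z0-9]/, '')
  urls.find do |url|
    url.downcase.gsub(/[^a-z0-9]/, '').include?(sku_normalized) || sku_parts_match?(sku, url)
  end
end

# Hypothetical URLs and IDs, made up for illustration.
urls = [
  'https://www.rona.ca/en/product/warmlyyours-towel-warmer-11111111',
  'https://www.rona.ca/en/product/warmlyyours-floor-mat-trt120-22222222'
]

by_article_id = match_url('11111111', 'TRT120-3.0X10', urls)
by_sku_part = match_url(nil, 'TRT120-3.0X10', urls)
no_match = match_url('99999999', 'ABC', urls)
```

Note that the primary match wins even when the SKU would point at a different URL, which is consistent with the article ID being described as the most reliable signal.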

.parsing_instructions ⇒ Hash

Parsing instructions for Rona product data
Uses XPath to extract structured data from rendered HTML

Returns:

  • (Hash)

    Oxylabs parsing_instructions format



# File 'app/services/retailer/extractors/rona.rb', line 118

def self.parsing_instructions
  {
    current_price: {
      _fns: [
        {
          _fn: 'xpath',
          _args: [
            ".//span[@class='price-box__price__amount']",
            ".//div[contains(@class, 'price-box')]//span[@class='price-box__price__amount']",
            ".//div[contains(@data-cnstrc-item-name, '')]//span[@class='price-box__price__amount']"
          ]
        },
        { _fn: 'xpath', _args: ['normalize-space(.)'] },
        { _fn: 'join', _args: ' ' },
        { _fn: 'amount_from_string' }
      ]
    },
    availability: {
      _fns: [
        {
          _fn: 'xpath',
          _args: [
            ".//span[@class='plp-stock-message-new']",
            ".//div[@class='plp-stock-message']//span[@style]",
            ".//div[contains(@class, 'availability')]//span"
          ]
        },
        { _fn: 'xpath', _args: ['normalize-space(.)'] },
        { _fn: 'join', _args: ' ' }
      ]
    },
    product_url: {
      _fns: [
        {
          _fn: 'xpath_one',
          _args: [
            ".//a[contains(@href, '/product/')]/@href",
            ".//div[contains(@class, 'product-tile')]//a[contains(@data-eventaction, 'Click')]/@href",
            ".//div[contains(@class, 'product-tile')]//a[contains(@data-eventaction, 'Product')]/@href"
          ]
        },
        { _fn: 'regex_search', _args: ['^\\s*(.[\\s\\S]*?)\\s*$', 1] }
      ]
    },
    product_title: {
      _fns: [
        {
          _fn: 'xpath_one',
          _args: [
            ".//div[contains(@class, 'product-tile')]//a[@class='product-tile-link']",
            ".//div[contains(@class, 'product-tile')]//a[contains(@data-eventaction, 'Click')]",
            ".//h2[contains(@class, 'product')]"
          ]
        },
        { _fn: 'xpath', _args: ['normalize-space(.)'] }
      ]
    }
  }
end

.search_payload(sku) ⇒ Hash

Build Oxylabs request payload for searching Rona by SKU
Uses browser_instructions to interact with JS-rendered search

Parameters:

  • sku (String)

    Product SKU to search for (e.g., "TRT120-3.0X10")

Returns:

  • (Hash)

    Complete Oxylabs API payload



# File 'app/services/retailer/extractors/rona.rb', line 43

def self.search_payload(sku)
  {
    source: 'universal',
    url: 'https://www.rona.ca/en',
    geo_location: 'Canada',
    render: render_value,
    parse: true,
    browser_instructions: browser_instructions(sku),
    parsing_instructions: parsing_instructions
  }.compact
end

.seed_catalog_urls(api: Retailer::OxylabsApi.new) ⇒ Hash

Discover and seed URLs for all Rona catalog items without URLs
Uses a single search for "WarmlyYours" and matches URLs to items

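Parameters:

  • api (Retailer::OxylabsApi) (defaults to: Retailer::OxylabsApi.new)

    API client used for the discovery request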

Returns:

  • (Hash)

    Summary with :total, :found, :not_found, and :errors counts



# File 'app/services/retailer/extractors/rona.rb', line 187

def self.seed_catalog_urls(api: Retailer::OxylabsApi.new)
  rona_items = CatalogItem.where(catalog_id: CatalogConstants::RONA_CANADA)
                          .where(state: 'active')
                          .where(url: [nil, ''])
                          .includes(:item)

  results = { total: rona_items.count, found: 0, not_found: 0, errors: 0 }
  return results if rona_items.empty?

  # Get all WarmlyYours product URLs from Rona (single API call)
  Rails.logger.info '[Rona] Fetching all WarmlyYours URLs...'
  all_urls = fetch_all_product_urls(api)

  if all_urls.empty?
    Rails.logger.error '[Rona] No URLs found!'
    return results
  end

  Rails.logger.info "[Rona] Found #{all_urls.size} WarmlyYours URLs"

  # Match URLs to catalog items using article_id (third_party_part_number)
  rona_items.find_each do |catalog_item|
    article_id = catalog_item.third_party_part_number
    sku = catalog_item.item&.sku

    matched_url = match_url(article_id, sku, all_urls)

    if matched_url.present?
      catalog_item.update!(url: matched_url)
      results[:found] += 1
      Rails.logger.info "[Rona] Matched: #{sku} (#{article_id}) -> #{matched_url.truncate(70)}"
    else
      results[:not_found] += 1
      Rails.logger.debug { "[Rona] No match for: #{sku} (#{article_id})" }
    end
  rescue StandardError => e
    results[:errors] += 1
    Rails.logger.error "[Rona] Error matching #{sku}: #{e.message}"
  end

  Rails.logger.info "[Rona] Complete: #{results[:found]} found, #{results[:not_found]} not found, #{results[:errors]} errors"
  results
end

.sku_parts_match?(sku, url) ⇒ Boolean

Check if SKU components match URL patterns

Parameters:

  • sku (String)

    SKU like "TRT120-3.0X10" or "TWS6-GRD10BH"

  • url (String)

    URL like ".../warmlyyours-grande-342-in-brushed-stainless-steel-10-bar..."

Returns:

  • (Boolean)


# File 'app/services/retailer/extractors/rona.rb', line 282

def self.sku_parts_match?(sku, url)
  parts = sku.downcase.split(/[-_.]/)
  url_lower = url.downcase

  # Check if any significant part appears in URL
  significant_parts = parts.select { |p| p.length >= 4 }
  return false if significant_parts.empty?

  significant_parts.any? { |part| url_lower.include?(part) }
end
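
To see which fragments the method actually compares, the splitting step can be run on the two SKUs from the parameter description. The significant_parts helper and the slug below are hypothetical. Because the comparison is a substring test, a short fragment can also appear in an unrelated URL, as the last line shows.

```ruby
# Hypothetical helper isolating the splitting step of sku_parts_match?.
def significant_parts(sku)
  sku.downcase.split(/[-_.]/).select { |part| part.length >= 4 }
end

trt = significant_parts('TRT120-3.0X10') # ["trt120", "0x10"]
tws = significant_parts('TWS6-GRD10BH')  # ["tws6", "grd10bh"]

# Made-up slug for a different product whose dimensions contain "0x10".
other_slug = 'https://www.rona.ca/en/product/some-mat-20x10-33333333'
loose_hit = trt.any? { |part| other_slug.include?(part) }
```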

Instance Method Details

#discovered_url ⇒ String?

Returns the canonical URL if found (to be saved for future checks)

Returns:

  • (String, nil)


# File 'app/services/retailer/extractors/rona.rb', line 347

def discovered_url
  @canonical_url
end

#extract(check, content) ⇒ Object

Extract data from raw HTML content. Records the canonical URL when one is
found, then dispatches to search-page or product-page extraction.



# File 'app/services/retailer/extractors/rona.rb', line 307

def extract(check, content)
  return unless valid_html?(content)

  check.scraper_source = source_name
  check.currency = 'CAD'

  doc = parse_html(content)

  # Try to extract canonical URL for future use
  @canonical_url = extract_canonical_url(doc)

  # Determine page type
  if search_results_page?(content)
    extract_from_search_page(check, doc)
  else
    extract_from_product_page(check, doc, content)
  end
end

#extract_from_parsed(check, parsed_data) ⇒ Object

Extract from Oxylabs parsed response (when using browser_instructions)

Parameters:

  • check

    The check record to populate

  • parsed_data (Hash)

    Parsed fields from the Oxylabs response



# File 'app/services/retailer/extractors/rona.rb', line 329

def extract_from_parsed(check, parsed_data)
  check.scraper_source = source_name
  check.currency = 'CAD'

  price = parsed_data['current_price']
  check.price = price if price.is_a?(Numeric) && price.positive?

  availability = parsed_data['availability'].to_s
  check.product_available = availability.present? && availability.downcase.exclude?('unavailable')

  check.raw_title = parsed_data['product_title']&.truncate(255)

  product_url = parsed_data['product_url']
  @canonical_url = self.class.make_url_absolute(product_url) if product_url.present?
end