Class: Retailer::Extractors::Rona

Inherits:
Base
  • Object
Defined in:
app/services/retailer/extractors/rona.rb

Overview

Rona/RenoDepot data extractor.
Handles both search result pages and product detail pages.

Strategy for Rona (based on Playwright testing):

  1. First check uses stored URL if available (from previous successful extraction)
  2. If no URL stored, use browser automation to search by SKU
  3. Rona's search results are JS-rendered, requiring browser_instructions
  4. Extract canonical URL from successful response and store it in catalog_item.url
  5. Future checks use the stored direct URL for faster, more reliable results

URL Pattern: https://www.rona.ca/en/product/slug-article_id
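
The strategy above reduces to a branch on whether a URL is already stored. A minimal plain-Ruby sketch of that branch follows; `choose_request` is a hypothetical helper that is not defined in rona.rb, and the returned hash only names which class method (`build_payload` or `search_payload`) the strategy would call for.

```ruby
# Hypothetical helper (not part of rona.rb): decides which request the
# strategy calls for, given a catalog item's stored URL and its SKU.
def choose_request(stored_url, sku)
  if stored_url.nil? || stored_url.strip.empty?
    # No stored URL yet: fall back to the browser-driven search by SKU.
    { via: :search_payload, arg: sku }
  else
    # A previous extraction stored the canonical URL: fetch it directly.
    { via: :build_payload, arg: stored_url }
  end
end

p choose_request(nil, 'TRT120-3.0X10')
p choose_request('https://www.rona.ca/en/product/slug-article_id', 'TRT120-3.0X10')
```

A blank string is treated the same as a missing URL here, which matches how seed_catalog_urls selects items with `url: [nil, '']`.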


Instance Attribute Details

#canonical_url ⇒ Object (readonly)

Returns the value of attribute canonical_url.



# File 'app/services/retailer/extractors/rona.rb', line 16

def canonical_url
  @canonical_url
end

Class Method Details

.browser_instructions(search_term) ⇒ Array<Hash>

Browser instructions to search on Rona
Generated by Oxypilot based on site analysis

Parameters:

  • search_term (String)

    The term to search for

Returns:

  • (Array<Hash>)

    Browser instruction sequence



# File 'app/services/retailer/extractors/rona.rb', line 77

def self.browser_instructions(search_term)
  [
    # Type into search box
    {
      type: 'input',
      selector: {
        type: 'xpath',
        value: '//div[1]/div[3]/header[1]/div[1]/div[1]/form[1]/label[1]/input[1]'
      },
      value: search_term,
      timeout_s: 5,
      wait_time_s: 1,
      on_error: 'error'
    },
    # Click search button
    {
      type: 'click',
      selector: {
        type: 'xpath',
        value: '//body[1]/div[1]/div[3]/header[1]/div[1]/div[1]/form[1]/button[3]'
      },
      timeout_s: 5,
      wait_time_s: 1,
      on_error: 'error'
    },
    # Wait for results to load
    {
      type: 'wait',
      wait_time_s: 7
    }
  ]
end

.build_payload(url:) ⇒ Hash

Build Oxylabs payload for scraping a direct product URL

Parameters:

  • url (String)

    Full product URL

Returns:

  • (Hash)

    Oxylabs API payload



# File 'app/services/retailer/extractors/rona.rb', line 25

def self.build_payload(url:)
  {
    source: 'universal',
    url: url,
    geo_location: 'Canada',
    render: 'html'
  }
end

.discovery_payload ⇒ Hash

Build payload to discover all WarmlyYours product URLs
Uses a single search and extracts all matching product URLs

Returns:

  • (Hash)

    Oxylabs API payload



# File 'app/services/retailer/extractors/rona.rb', line 54

def self.discovery_payload
  {
    source: 'universal',
    url: 'https://www.rona.ca/en',
    geo_location: 'Canada',
    render: 'html',
    parse: true,
    browser_instructions: browser_instructions('WarmlyYours'),
    parsing_instructions: {
      all_product_urls: {
        _fns: [
          { _fn: 'xpath', _args: [".//a[contains(@href, '/product/') and contains(@href, 'warmly')]/@href"] }
        ]
      }
    }
  }
end

.fetch_all_product_urls(api) ⇒ Array<String>

Fetch all WarmlyYours product URLs from Rona

Parameters:

  • api (Retailer::OxylabsApi)

    API client used to send the discovery request

Returns:

  • (Array<String>)

    Array of product URLs



# File 'app/services/retailer/extractors/rona.rb', line 230

def self.fetch_all_product_urls(api)
  result = api.request(discovery_payload)

  unless result.success?
    Rails.logger.error "[Rona] API error: #{result.error}"
    return []
  end

  data = result.data&.first
  # Guard against an empty or malformed response before reading 'content'
  parsed = data.is_a?(Hash) ? data['content'] : nil
  return [] unless parsed.is_a?(Hash)

  urls = parsed['all_product_urls']
  return [] unless urls.is_a?(Array)

  # Deduplicate and make absolute
  urls.uniq.map { |url| make_url_absolute(url) }.compact
end
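
The response handling above can be exercised without a live API call. The sketch below is plain Ruby under stated assumptions: `FakeResult` is a hypothetical stand-in for whatever `api.request` returns (assumed to respond to `success?`, `data`, and `error`, as the method body implies), and `product_urls_from` mirrors only the unwrapping and deduplication steps, not the conversion to absolute URLs.

```ruby
# Hypothetical stand-in for the API result object (not part of rona.rb).
FakeResult = Struct.new(:ok, :data, :error) do
  def success?
    ok
  end
end

# Mirrors the unwrapping steps: first result -> 'content' -> 'all_product_urls'.
def product_urls_from(result)
  return [] unless result.success?

  first = result.data&.first
  parsed = first.is_a?(Hash) ? first['content'] : nil
  return [] unless parsed.is_a?(Hash)

  urls = parsed['all_product_urls']
  return [] unless urls.is_a?(Array)

  urls.uniq
end

payload = [{ 'content' => { 'all_product_urls' => ['/en/product/a-1', '/en/product/a-1'] } }]
p product_urls_from(FakeResult.new(true, payload, nil))      # duplicates collapse
p product_urls_from(FakeResult.new(true, [], nil))           # empty data
p product_urls_from(FakeResult.new(false, nil, 'timeout'))   # failed request
```

Every malformed shape falls through to an empty array, so callers such as seed_catalog_urls only need to check `empty?`.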

.make_url_absolute(url) ⇒ String?

Ensure URL is absolute (Rona sometimes returns relative URLs)

Parameters:

  • url (String)

    URL to make absolute

Returns:

  • (String, nil)


# File 'app/services/retailer/extractors/rona.rb', line 292

def self.make_url_absolute(url)
  return nil unless url.present?
  return url if url.start_with?('http')

  "https://www.rona.ca#{'/' unless url.start_with?('/')}#{url}"
end
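
To see the three input shapes this method handles, here is a plain-Ruby mirror that runs without Rails. The name `absolute_rona_url` is hypothetical, and ActiveSupport's `present?` is replaced by an explicit nil/blank check; otherwise the logic follows the listing above.

```ruby
# Plain-Ruby mirror of make_url_absolute (hypothetical name, no Rails needed).
def absolute_rona_url(url)
  return nil if url.nil? || url.strip.empty?
  return url if url.start_with?('http')

  # Insert a leading slash only when the relative path lacks one.
  "https://www.rona.ca#{'/' unless url.start_with?('/')}#{url}"
end

p absolute_rona_url('/en/product/slug-article_id')
p absolute_rona_url('en/product/slug-article_id')
p absolute_rona_url('https://www.rona.ca/en/product/slug-article_id')
p absolute_rona_url('')
```

The first three calls all yield the same absolute URL; the last returns nil, which is why fetch_all_product_urls calls `compact` after mapping.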

.match_url(article_id, sku, urls) ⇒ String?

Find a matching URL for a catalog item
Primary match: article_id at end of URL (most reliable)
Secondary match: SKU patterns in URL slug (fallback)

Parameters:

  • article_id (String)

    The Rona article ID (third_party_part_number)

  • sku (String)

    The product SKU (e.g., "TRT120-3.0X10")

  • urls (Array<String>)

    Available URLs

Returns:

  • (String, nil)

    Matching URL or nil



# File 'app/services/retailer/extractors/rona.rb', line 257

def self.match_url(article_id, sku, urls)
  # Primary: Match by article ID (URLs end with -{article_id})
  if article_id.present?
    matched = urls.find { |url| url.end_with?("-#{article_id}") }
    return matched if matched
  end

  # Secondary: Try matching by SKU in URL slug
  return nil unless sku.present?

  sku_normalized = sku.downcase.gsub(/[^a-z0-9]/, '')

  urls.find do |url|
    url_normalized = url.downcase.gsub(/[^a-z0-9]/, '')
    url_normalized.include?(sku_normalized) || sku_parts_match?(sku, url)
  end
end
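
The two-tier matching can be illustrated with a small worked example. This is a plain-Ruby sketch, not the class itself: `present?` is replaced with `to_s.empty?` checks, and the URLs and article IDs below are made up for illustration.

```ruby
# Plain-Ruby mirrors of sku_parts_match? and match_url (no Rails needed).
def sku_parts_match?(sku, url)
  parts = sku.downcase.split(/[-_.]/).select { |p| p.length >= 4 }
  parts.any? { |part| url.downcase.include?(part) }
end

def match_url(article_id, sku, urls)
  # Primary: URL ends with "-<article_id>".
  unless article_id.to_s.empty?
    matched = urls.find { |url| url.end_with?("-#{article_id}") }
    return matched if matched
  end
  return nil if sku.to_s.empty?

  # Secondary: normalized SKU, or a significant SKU part, appears in the URL.
  sku_normalized = sku.downcase.gsub(/[^a-z0-9]/, '')
  urls.find do |url|
    url.downcase.gsub(/[^a-z0-9]/, '').include?(sku_normalized) || sku_parts_match?(sku, url)
  end
end

# Made-up URLs and article IDs, for illustration only.
urls = [
  'https://www.rona.ca/en/product/warmlyyours-floor-heating-mat-11111111',
  'https://www.rona.ca/en/product/warmlyyours-trt120-cable-22222222'
]

p match_url('22222222', nil, urls)        # primary match on article ID
p match_url(nil, 'TRT120-3.0X10', urls)   # fallback match on the "trt120" part
p match_url('99999999', 'ZZZ', urls)      # neither tier matches
```

The article ID tier is checked first because it is an exact suffix test; the SKU tier only runs when that fails or no article ID is available.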

.parsing_instructions ⇒ Hash

Parsing instructions for Rona product data
Uses XPath to extract structured data from rendered HTML

Returns:

  • (Hash)

    Oxylabs parsing_instructions format



# File 'app/services/retailer/extractors/rona.rb', line 114

def self.parsing_instructions
  {
    current_price: {
      _fns: [
        {
          _fn: 'xpath',
          _args: [
            ".//span[@class='price-box__price__amount']",
            ".//div[contains(@class, 'price-box')]//span[@class='price-box__price__amount']",
            ".//div[contains(@data-cnstrc-item-name, '')]//span[@class='price-box__price__amount']"
          ]
        },
        { _fn: 'xpath', _args: ['normalize-space(.)'] },
        { _fn: 'join', _args: ' ' },
        { _fn: 'amount_from_string' }
      ]
    },
    availability: {
      _fns: [
        {
          _fn: 'xpath',
          _args: [
            ".//span[@class='plp-stock-message-new']",
            ".//div[@class='plp-stock-message']//span[@style]",
            ".//div[contains(@class, 'availability')]//span"
          ]
        },
        { _fn: 'xpath', _args: ['normalize-space(.)'] },
        { _fn: 'join', _args: ' ' }
      ]
    },
    product_url: {
      _fns: [
        {
          _fn: 'xpath_one',
          _args: [
            ".//a[contains(@href, '/product/')]/@href",
            ".//div[contains(@class, 'product-tile')]//a[contains(@data-eventaction, 'Click')]/@href",
            ".//div[contains(@class, 'product-tile')]//a[contains(@data-eventaction, 'Product')]/@href"
          ]
        },
        { _fn: 'regex_search', _args: ['^\\s*(.[\\s\\S]*?)\\s*$', 1] }
      ]
    },
    product_title: {
      _fns: [
        {
          _fn: 'xpath_one',
          _args: [
            ".//div[contains(@class, 'product-tile')]//a[@class='product-tile-link']",
            ".//div[contains(@class, 'product-tile')]//a[contains(@data-eventaction, 'Click')]",
            ".//h2[contains(@class, 'product')]"
          ]
        },
        { _fn: 'xpath', _args: ['normalize-space(.)'] }
      ]
    }
  }
end

.search_payload(sku) ⇒ Hash

Build Oxylabs request payload for searching Rona by SKU
Uses browser_instructions to interact with JS-rendered search

Parameters:

  • sku (String)

    Product SKU to search for (e.g., "TRT120-3.0X10")

Returns:

  • (Hash)

    Complete Oxylabs API payload



# File 'app/services/retailer/extractors/rona.rb', line 39

def self.search_payload(sku)
  {
    source: 'universal',
    url: 'https://www.rona.ca/en',
    geo_location: 'Canada',
    render: 'html',
    parse: true,
    browser_instructions: browser_instructions(sku),
    parsing_instructions: parsing_instructions
  }
end

.seed_catalog_urls(api: Retailer::OxylabsApi.new) ⇒ Hash

Discover and seed URLs for all Rona catalog items without URLs
Uses a single search for "WarmlyYours" and matches URLs to items

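Parameters:

  • api (Retailer::OxylabsApi) (defaults to: Retailer::OxylabsApi.new)

    API client used for the discovery request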

Returns:

  • (Hash)

    Summary with :total, :found, :not_found, and :errors counts



# File 'app/services/retailer/extractors/rona.rb', line 183

def self.seed_catalog_urls(api: Retailer::OxylabsApi.new)
  rona_items = CatalogItem.where(catalog_id: CatalogConstants::RONA_CANADA)
                          .where(state: 'active')
                          .where(url: [nil, ''])
                          .includes(:item)

  results = { total: rona_items.count, found: 0, not_found: 0, errors: 0 }
  return results if rona_items.empty?

  # Get all WarmlyYours product URLs from Rona (single API call)
  Rails.logger.info '[Rona] Fetching all WarmlyYours URLs...'
  all_urls = fetch_all_product_urls(api)

  if all_urls.empty?
    Rails.logger.error '[Rona] No URLs found!'
    return results
  end

  Rails.logger.info "[Rona] Found #{all_urls.size} WarmlyYours URLs"

  # Match URLs to catalog items using article_id (third_party_part_number)
  rona_items.find_each do |catalog_item|
    article_id = catalog_item.third_party_part_number
    sku = catalog_item.item&.sku

    matched_url = match_url(article_id, sku, all_urls)

    if matched_url.present?
      catalog_item.update!(url: matched_url)
      results[:found] += 1
      Rails.logger.info "[Rona] Matched: #{sku} (#{article_id}) -> #{matched_url.truncate(70)}"
    else
      results[:not_found] += 1
      Rails.logger.debug { "[Rona] No match for: #{sku} (#{article_id})" }
    end
  rescue StandardError => e
    results[:errors] += 1
    Rails.logger.error "[Rona] Error matching #{sku}: #{e.message}"
  end

  Rails.logger.info "[Rona] Complete: #{results[:found]} found, #{results[:not_found]} not found, #{results[:errors]} errors"
  results
end

.sku_parts_match?(sku, url) ⇒ Boolean

Check if SKU components match URL patterns

Parameters:

  • sku (String)

    SKU like "TRT120-3.0X10" or "TWS6-GRD10BH"

  • url (String)

    URL like ".../warmlyyours-grande-342-in-brushed-stainless-steel-10-bar..."

Returns:

  • (Boolean)


# File 'app/services/retailer/extractors/rona.rb', line 278

def self.sku_parts_match?(sku, url)
  parts = sku.downcase.split(/[-_.]/)
  url_lower = url.downcase

  # Check if any significant part appears in URL
  significant_parts = parts.select { |p| p.length >= 4 }
  return false if significant_parts.empty?

  significant_parts.any? { |part| url_lower.include?(part) }
end
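
Because a single significant part is enough for a match, this check is deliberately loose. The plain-Ruby sketch below mirrors the method (renamed `parts_match?` to mark it as a stand-alone copy) and probes a few edge cases; the URLs are made up for illustration.

```ruby
# Stand-alone copy of the check: any SKU part of 4+ characters found
# as a literal substring of the URL counts as a match.
def parts_match?(sku, url)
  significant = sku.downcase.split(/[-_.]/).select { |p| p.length >= 4 }
  return false if significant.empty?

  significant.any? { |part| url.downcase.include?(part) }
end

# Made-up URL for illustration only.
url = 'https://www.rona.ca/en/product/warmlyyours-trt120-mat-44444444'

p parts_match?('TRT120-3.0X10', url)   # "trt120" appears in the slug
p parts_match?('TRT120-1.5X20', url)   # a different SKU sharing "trt120" also matches
p parts_match?('AB-12', url)           # no part reaches 4 characters
p parts_match?('TWS6-GRD10BH', url)    # neither "tws6" nor "grd10bh" appears
```

The second call shows the trade-off: SKUs that share a product-line prefix are indistinguishable at this tier, which is one reason match_url treats it as a fallback behind the article ID.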

Instance Method Details

#discovered_url ⇒ String?

Returns the canonical URL if found (to be saved for future checks)

Returns:

  • (String, nil)


# File 'app/services/retailer/extractors/rona.rb', line 343

def discovered_url
  @canonical_url
end

#extract(check, content) ⇒ Object

Extract data from raw HTML content into the given check.
Handles both search result pages and product detail pages, and records the canonical URL when one is found.



# File 'app/services/retailer/extractors/rona.rb', line 303

def extract(check, content)
  return unless valid_html?(content)

  check.scraper_source = source_name
  check.currency = 'CAD'

  doc = parse_html(content)

  # Try to extract canonical URL for future use
  @canonical_url = extract_canonical_url(doc)

  # Determine page type
  if search_results_page?(content)
    extract_from_search_page(check, doc)
  else
    extract_from_product_page(check, doc, content)
  end
end

#extract_from_parsed(check, parsed_data) ⇒ Object

Extract from Oxylabs parsed response (when using browser_instructions)

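Parameters:

  • check (Object)

    Check record to populate

  • parsed_data (Hash)

    Parsed content keyed by the fields defined in parsing_instructions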



# File 'app/services/retailer/extractors/rona.rb', line 325

def extract_from_parsed(check, parsed_data)
  check.scraper_source = source_name
  check.currency = 'CAD'

  price = parsed_data['current_price']
  check.price = price if price.is_a?(Numeric) && price.positive?

  availability = parsed_data['availability'].to_s
  check.product_available = availability.present? && !availability.downcase.include?('unavailable')

  check.raw_title = parsed_data['product_title']&.truncate(255)

  product_url = parsed_data['product_url']
  @canonical_url = self.class.make_url_absolute(product_url) if product_url.present?
end
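
The price and availability rules above are easy to misread, so here is a plain-Ruby sketch that applies them to sample data. Everything outside the listing is an assumption: `SketchCheck` is a hypothetical stand-in for the real check record, `apply_parsed` is a hypothetical name, and `slice(0, 255)` stands in for ActiveSupport's `truncate(255)` (which also appends an ellipsis).

```ruby
# Hypothetical stand-in for the check record (not part of rona.rb).
SketchCheck = Struct.new(:price, :product_available, :raw_title)

# Mirrors the price, availability, and title handling without Rails.
def apply_parsed(check, parsed)
  # Only positive numeric prices are accepted; anything else leaves price unset.
  price = parsed['current_price']
  check.price = price if price.is_a?(Numeric) && price.positive?

  # Available means: some stock message exists and it does not say "unavailable".
  availability = parsed['availability'].to_s
  check.product_available = !availability.strip.empty? &&
                            !availability.downcase.include?('unavailable')

  check.raw_title = parsed['product_title']&.slice(0, 255)
  check
end

# Sample data, made up for illustration.
p apply_parsed(SketchCheck.new, 'current_price' => 129.99, 'availability' => 'In stock', 'product_title' => 'Sample mat')
p apply_parsed(SketchCheck.new, 'current_price' => 0, 'availability' => 'Unavailable online')
p apply_parsed(SketchCheck.new, {})
```

Note that a missing availability message is treated as not available, so an empty parse result never marks a product as in stock.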