Class: Retailer::Extractors::HomeDepot

Inherits:
Base
  • Object
Defined in:
app/services/retailer/extractors/home_depot.rb

Overview

Home Depot data extractor (USA and Canada).
Uses JSON-LD and data-automation selectors.
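
As an illustration of the JSON-LD approach, the sketch below reads a price from a schema.org Product block using only the Ruby standard library. The helper name json_ld_price, the regex-based script lookup, and the sample HTML are assumptions made for this example; the extractor's own extract_json_ld_price helper is not shown on this page and may work differently.

```ruby
require 'json'

# Hypothetical helper (not part of the documented class): returns
# [price, currency] from the first Product JSON-LD block, or nil.
def json_ld_price(html)
  pattern = %r{<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>}mi
  html.scan(pattern).each do |(raw)|
    data = begin
      JSON.parse(raw)
    rescue JSON::ParserError
      next # skip malformed JSON-LD blocks
    end

    items = data.is_a?(Array) ? data : [data]
    items.each do |item|
      next unless item.is_a?(Hash) && item['@type'] == 'Product'

      offer = item['offers']
      offer = offer.first if offer.is_a?(Array)
      next unless offer.is_a?(Hash)

      price = Float(offer['price'].to_s, exception: false)
      return [price, offer['priceCurrency']] if price
    end
  end
  nil
end

# Sample markup invented for the example.
html = <<~HTML
  <html><head>
  <script type="application/ld+json">
  {"@type":"Product","name":"Example Drill","offers":{"price":"199.00","priceCurrency":"USD"}}
  </script>
  </head><body></body></html>
HTML

price, currency = json_ld_price(html)
```

Here price and currency are taken from the offers object of the sample block, and the helper returns nil when no usable Product block is found, which is the case where a selector-based fallback would apply.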

Class Method Summary

Instance Method Summary

Class Method Details

.build_payload(url:, geo_location: nil) ⇒ Hash

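Builds the Oxylabs payload for scraping a Home Depot product page.
Uses the 'universal' source with JS rendering (render: 'html') for reliable results.
Enables redirect following so that the final canonical URL is captured.
The geo_location key is added only when a value is present.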

Parameters:

  • url (String)

    Full product URL

  • geo_location (String, nil) (defaults to: nil)

    ZIP code for delivery location

Returns:

  • (Hash)

    Oxylabs API payload



# File 'app/services/retailer/extractors/home_depot.rb', line 14

def self.build_payload(url:, geo_location: nil)
  payload = {
    source: 'universal',
    url: url,
    render: 'html',
    context: [
      { key: 'follow_redirects', value: true }
    ]
  }
  payload[:geo_location] = geo_location if geo_location.present?
  payload
end
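
To make the resulting payload shape concrete, here is a standalone sketch that mirrors the method above without ActiveSupport. The name build_payload_sketch and the product URL are hypothetical, and a plain blank-string check stands in for present?.

```ruby
# Standalone sketch of the payload shape. A plain-Ruby blank check replaces
# ActiveSupport's #present? so the example runs without Rails.
def build_payload_sketch(url:, geo_location: nil)
  payload = {
    source: 'universal',
    url: url,
    render: 'html',
    context: [{ key: 'follow_redirects', value: true }]
  }
  # Add the key only when a non-blank location was supplied.
  payload[:geo_location] = geo_location unless geo_location.to_s.strip.empty?
  payload
end

# Hypothetical product URL, used only for illustration.
product_url = 'https://www.homedepot.com/p/example/123456789'

with_zip = build_payload_sketch(url: product_url, geo_location: '90210')
without_zip = build_payload_sketch(url: product_url)
```

In this sketch with_zip carries geo_location: '90210', while without_zip has no :geo_location key at all rather than a nil value.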

.search_payload(query:, geo_location: nil) ⇒ Hash

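Builds the Oxylabs payload for a Home Depot search.
URL-encodes the query into a https://www.homedepot.com/s/ search URL and delegates to .build_payload.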

Parameters:

  • query (String)

    Search query

  • geo_location (String, nil) (defaults to: nil)

    ZIP code

Returns:

  • (Hash)

    Oxylabs API payload



# File 'app/services/retailer/extractors/home_depot.rb', line 31

def self.search_payload(query:, geo_location: nil)
  url = "https://www.homedepot.com/s/#{CGI.escape(query)}"
  build_payload(url: url, geo_location: geo_location)
end
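
The search URL depends on how CGI.escape encodes the query, so a short sketch of that step may help. The helper name search_url and the sample queries are assumptions for this example; only the URL prefix and the CGI.escape call come from the method above.

```ruby
require 'cgi'

SEARCH_BASE = 'https://www.homedepot.com/s/'

# Hypothetical helper isolating the URL-building step of search_payload.
def search_url(query)
  "#{SEARCH_BASE}#{CGI.escape(query)}"
end

simple = search_url('cordless drill')
# => "https://www.homedepot.com/s/cordless+drill"

with_slash = search_url('1/2 in. drill bit')
# => "https://www.homedepot.com/s/1%2F2+in.+drill+bit"
```

CGI.escape uses form-style encoding: spaces become + and reserved characters such as / are percent-encoded, so a query containing a slash cannot add an extra path segment to the URL.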

Instance Method Details

#extract(check, content) ⇒ Object

Populates check from the scraped product page HTML: sets the scraper source, the currency (CAD for the Home Depot Canada catalog, USD otherwise), availability, price, and raw title.
The price is read from JSON-LD structured data first, with Home Depot price selectors as a fallback.
Returns early without modifying check when content is not valid HTML.



# File 'app/services/retailer/extractors/home_depot.rb', line 36

def extract(check, content)
  return unless valid_html?(content)

  check.scraper_source = source_name
  check.currency = catalog.id == HOME_DEPOT_CANADA ? 'CAD' : 'USD'

  doc = parse_html(content)

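  # Check availability: true if the add-to-cart element or text is found,
  # or if the page does not contain 'Out of Stock'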
  check.product_available = doc.at_css('[data-automation="add-to-cart"]').present? ||
                            content.include?('Add to Cart') ||
                            !content.include?('Out of Stock')

  # JSON-LD structured data (most reliable)
  extract_json_ld_price(check, doc)

  # Fallback: Home Depot price selectors
  extract_from_selectors(check, doc) if check.price.blank?

  # Extract title
  check.raw_title = extract_title(doc)
end
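
The availability check above is an OR of three conditions, so it can help to see how they combine. The sketch below restates the expression as a standalone predicate; the name available? and the sample page strings are hypothetical, and the CSS selector lookup is replaced by a boolean argument so the example runs without an HTML parser.

```ruby
# Standalone restatement of the availability expression in #extract.
# has_add_to_cart_button stands in for the
# doc.at_css('[data-automation="add-to-cart"]').present? check.
def available?(has_add_to_cart_button:, content:)
  has_add_to_cart_button ||
    content.include?('Add to Cart') ||
    !content.include?('Out of Stock')
end

# Sample inputs invented for the example.
in_stock      = available?(has_add_to_cart_button: true,  content: '<button>Add to Cart</button>')
out_of_stock  = available?(has_add_to_cart_button: false, content: '<span>Out of Stock</span>')
neither_text  = available?(has_add_to_cart_button: false, content: '<p>Page not found</p>')
both_texts    = available?(has_add_to_cart_button: false, content: 'Out of Stock ... Add to Cart')
```

Under this logic only out_of_stock is false. A page that mentions neither phrase (neither_text) and a page that mentions both (both_texts) are treated as available, so product_available is best read as a permissive signal.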