Class: Retailer::Extractors::HomeDepot
- Inherits: Base (Object < Base < Retailer::Extractors::HomeDepot)
- Defined in: app/services/retailer/extractors/home_depot.rb
Overview
Home Depot data extractor (USA and Canada).
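Prices come from JSON-LD structured data, with Home Depot price selectors as a fallback; availability is detected through the data-automation add-to-cart selector.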
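As a rough illustration of the JSON-LD approach mentioned above, the sketch below pulls a product price out of an `application/ld+json` script block using only the standard library. The helper name `json_ld_price` is hypothetical and is not the extractor's own `extract_json_ld_price`, whose body is not shown on this page; the regex-based script matching is a simplification that stands in for a real HTML parser, and the sample markup is invented for the example.

```ruby
require 'json'

# Hypothetical helper for illustration only; the real extractor works on a
# parsed document and its implementation is not documented here.
def json_ld_price(html)
  pattern = %r{<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>}mi

  html.scan(pattern).each do |(body)|
    data = begin
      JSON.parse(body)
    rescue JSON::ParserError
      next # skip malformed JSON-LD blocks
    end

    # A JSON-LD block may hold a single object or an array of objects.
    nodes = data.is_a?(Array) ? data : [data]
    nodes.each do |node|
      next unless node.is_a?(Hash) && node['@type'] == 'Product'

      offers = node['offers']
      offer = offers.is_a?(Array) ? offers.first : offers
      next unless offer.is_a?(Hash) && offer['price']

      # Float(..., exception: false) returns nil for unparseable values.
      value = Float(offer['price'].to_s, exception: false)
      return value if value
    end
  end

  nil
end

# Invented sample markup, assuming a schema.org Product with a single Offer.
sample_html = <<~HTML
  <html><head>
  <script type="application/ld+json">
  {"@type": "Product", "name": "Example Drill",
   "offers": {"@type": "Offer", "price": "249.00", "priceCurrency": "USD"}}
  </script>
  </head></html>
HTML

sample_price = json_ld_price(sample_html)
```

When no usable JSON-LD block is found the helper returns nil, which mirrors the point at which a selector-based fallback would take over.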
Class Method Summary
- .build_payload(url:, geo_location: nil) ⇒ Hash
  Build Oxylabs payload for Home Depot product scraping. Uses 'universal' source with JS rendering for reliable results.
- .search_payload(query:, geo_location: nil) ⇒ Hash
  Build payload for Home Depot search.
Instance Method Summary
- #extract(check, content) ⇒ Object
Class Method Details
.build_payload(url:, geo_location: nil) ⇒ Hash
Builds the Oxylabs payload for Home Depot product scraping.
Uses the 'universal' source with JS rendering for reliable results, and enables redirect following to capture the final canonical URL.
The geo_location key is added only when a value is present.
# File 'app/services/retailer/extractors/home_depot.rb', line 14
def self.build_payload(url:, geo_location: nil)
  payload = {
    source: 'universal',
    url: url,
    render: 'html',
    context: [
      { key: 'follow_redirects', value: true }
    ]
  }
  payload[:geo_location] = geo_location if geo_location.present?
  payload
end
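To make the shape of the returned hash concrete, here is a minimal standalone sketch of the same logic. It is a restatement for illustration rather than the class method itself: ActiveSupport's `present?` is replaced with a plain-Ruby check so the snippet runs without Rails, and both the product URL and the `'United States'` geo value are made-up inputs.

```ruby
# Standalone restatement of build_payload, for illustration only.
def build_payload(url:, geo_location: nil)
  payload = {
    source: 'universal',
    url: url,
    render: 'html',
    context: [{ key: 'follow_redirects', value: true }]
  }
  # Plain-Ruby stand-in for ActiveSupport's `present?`:
  # skip nil, empty, and whitespace-only values.
  blank = geo_location.nil? || geo_location.to_s.strip.empty?
  payload[:geo_location] = geo_location unless blank
  payload
end

# Hypothetical product URL and geo value, used only as sample input.
example_url = 'https://www.homedepot.com/p/example/123456789'

with_geo    = build_payload(url: example_url, geo_location: 'United States')
without_geo = build_payload(url: example_url)

with_geo.key?(:geo_location)     # => true
without_geo.key?(:geo_location)  # => false
```

Leaving the key out entirely, rather than sending a nil value, keeps the request body limited to options that were actually chosen.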
.search_payload(query:, geo_location: nil) ⇒ Hash
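Builds the payload for a Home Depot search. The query is URL-encoded with CGI.escape, appended to https://www.homedepot.com/s/, and the resulting URL is passed to build_payload.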
# File 'app/services/retailer/extractors/home_depot.rb', line 31
def self.search_payload(query:, geo_location: nil)
  url = "https://www.homedepot.com/s/#{CGI.escape(query)}"
  build_payload(url: url, geo_location: geo_location)
end
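The effect of `CGI.escape` on the search URL can be seen in a small standalone sketch. The helper name `search_url` is hypothetical and covers only the URL-building step shown above; the sample queries are invented.

```ruby
require 'cgi'

# Hypothetical helper isolating the URL-building step of search_payload.
def search_url(query)
  "https://www.homedepot.com/s/#{CGI.escape(query)}"
end

drill_url = search_url('cordless drill')
# => "https://www.homedepot.com/s/cordless+drill"

bolts_url = search_url('nuts & bolts')
# => "https://www.homedepot.com/s/nuts+%26+bolts"
```

`CGI.escape` encodes spaces as `+` and reserved characters such as `&` as percent sequences, so a query cannot inject extra path segments or parameters into the URL.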
Instance Method Details
#extract(check, content) ⇒ Object
# File 'app/services/retailer/extractors/home_depot.rb', line 36
def extract(check, content)
  return unless valid_html?(content)

  check.scraper_source = source_name
  check.currency = catalog.id == HOME_DEPOT_CANADA ? 'CAD' : 'USD'

  doc = parse_html(content)

  # Check availability
  check.product_available = doc.at_css('[data-automation="add-to-cart"]').present? ||
                            content.include?('Add to Cart') ||
                            !content.include?('Out of Stock')

  # JSON-LD structured data (most reliable)
  extract_json_ld_price(check, doc)

  # Fallback: Home Depot price selectors
  extract_from_selectors(check, doc) if check.price.blank?

  # Extract title
  check.raw_title = extract_title(doc)
end
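The availability expression in the listing above chains three conditions with `||`, so it can be easier to reason about in isolation. The sketch below restates it as a standalone predicate; `available?` and its `has_button` argument are hypothetical, with `has_button` standing in for the `doc.at_css('[data-automation="add-to-cart"]').present?` check, and the page fragments are invented.

```ruby
# Hypothetical restatement of the availability expression from #extract.
# `has_button` replaces the add-to-cart selector lookup on the parsed document.
def available?(has_button, content)
  has_button ||
    content.include?('Add to Cart') ||
    !content.include?('Out of Stock')
end

sold_out     = available?(false, '<p>Out of Stock</p>')
# => false: no button, no 'Add to Cart' text, and 'Out of Stock' is present

no_signal    = available?(false, '<p>Currently unavailable</p>')
# => true: the literal text 'Out of Stock' does not appear

button_wins  = available?(true, '<p>Out of Stock</p>')
# => true: the add-to-cart button short-circuits the other checks
```

Because the conditions are OR'ed, a product is reported unavailable only when all three signals agree: no add-to-cart element, no 'Add to Cart' text, and the literal string 'Out of Stock' somewhere in the content.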