Class: Retailer::Extractors::HomeDepot
- Inherits: Base
- Inheritance chain: Object, Base, Retailer::Extractors::HomeDepot
- Defined in: app/services/retailer/extractors/home_depot.rb
Overview
Home Depot data extractor (USA and Canada).
Uses JSON-LD and data-automation selectors.
Constant Summary
- RENDER_REQUIRED = true
  SPA-heavy site; price markup is rendered client-side. Keep on.
Class Method Summary
- .build_payload(url:, geo_location: nil) ⇒ Hash
  Builds the Oxylabs payload for Home Depot product scraping. Uses the 'universal' source with JS rendering for reliable results.
- .search_payload(query:, geo_location: nil) ⇒ Hash
  Builds the payload for a Home Depot search.
Instance Method Summary
- #extract(check, content) ⇒ Object
Class Method Details
.build_payload(url:, geo_location: nil) ⇒ Hash
Builds the Oxylabs payload for Home Depot product scraping.
Uses the 'universal' source with JS rendering for reliable results.
Enables redirect following to capture the final canonical URL.
    # File 'app/services/retailer/extractors/home_depot.rb', line 17
    def self.build_payload(url:, geo_location: nil)
      payload = {
        source: 'universal',
        url: url,
        render: render_value,
        context: [
          { key: 'follow_redirects', value: true }
        ]
      }.compact
      payload[:geo_location] = geo_location if geo_location.present?
      payload
    end
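To make the payload shape concrete, here is a minimal standalone sketch that can run outside Rails. It rests on assumptions: `render_value` is defined elsewhere (presumably in Base) and is not shown on this page, so the sketch assumes it yields 'html' when rendering is required and nil otherwise. ActiveSupport's `present?` is replaced with a plain-Ruby check, and every `sketch_` name and the product URL are hypothetical.

```ruby
# Standalone sketch of the payload shape; not the extractor's actual code.

# ASSUMPTION: mirrors RENDER_REQUIRED = true from this class.
def sketch_render_required?
  true
end

# ASSUMPTION: the real render_value helper is not shown on this page.
# Here it returns 'html' when rendering is required, nil otherwise.
def sketch_render_value
  sketch_render_required? ? 'html' : nil
end

def sketch_build_payload(url:, geo_location: nil)
  payload = {
    source: 'universal',
    url: url,
    render: sketch_render_value,
    context: [{ key: 'follow_redirects', value: true }]
  }.compact # drops :render when it is nil

  # Plain-Ruby stand-in for ActiveSupport's `present?`
  blank = geo_location.nil? || geo_location.to_s.strip.empty?
  payload[:geo_location] = geo_location unless blank
  payload
end

# Hypothetical product URL, used only to show the resulting hashes.
p sketch_build_payload(url: 'https://www.homedepot.com/p/example',
                       geo_location: 'United States')
p sketch_build_payload(url: 'https://www.homedepot.com/p/example')
```

The second call shows that `:geo_location` is omitted entirely, rather than sent as nil, when no location is given.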
.search_payload(query:, geo_location: nil) ⇒ Hash
Builds the payload for a Home Depot search.
    # File 'app/services/retailer/extractors/home_depot.rb', line 34
    def self.search_payload(query:, geo_location: nil)
      url = "https://www.homedepot.com/s/#{CGI.escape(query)}"
      build_payload(url: url, geo_location: geo_location)
    end
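The search URL depends on how `CGI.escape` encodes the query. The short sketch below repeats the same interpolation so the encoding is visible. The helper name `sketch_search_url` and the sample queries are hypothetical.

```ruby
require 'cgi'

# Mirrors the URL construction in .search_payload.
def sketch_search_url(query)
  "https://www.homedepot.com/s/#{CGI.escape(query)}"
end

# CGI.escape encodes spaces as '+' and reserved characters as %XX.
puts sketch_search_url('cordless drill')
puts sketch_search_url('2x4 lumber & nails')
```

As written in the source, the host is always www.homedepot.com, regardless of the `geo_location` passed through to `build_payload`.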
Instance Method Details
#extract(check, content) ⇒ Object
    # File 'app/services/retailer/extractors/home_depot.rb', line 39
    def extract(check, content)
      return unless valid_html?(content)

      check.scraper_source = source_name
      check.currency = catalog.id == HOME_DEPOT_CANADA ? 'CAD' : 'USD'

      doc = parse_html(content)

      # Check availability
      check.product_available = doc.at_css('[data-automation="add-to-cart"]').present? ||
                                content.include?('Add to Cart') ||
                                content.exclude?('Out of Stock')

      # JSON-LD structured data (most reliable)
      extract_json_ld_price(check, doc)

      # Fallback: Home Depot price selectors
      extract_from_selectors(check, doc) if check.price.blank?

      # Extract title
      check.raw_title = extract_title(doc)
    end
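The price resolution order in `#extract` (JSON-LD first, selectors only if no price was found) can be sketched as follows. The bodies of `extract_json_ld_price` and `extract_from_selectors` are not shown on this page, so every helper here is a hypothetical stand-in. To stay dependency-free, the sketch scans the raw HTML with regular expressions instead of a parsed document, and the fallback is a crude dollar-amount match rather than real CSS selectors.

```ruby
require 'json'

# Hypothetical stand-in for extract_json_ld_price: read JSON-LD script
# blocks and return the first offer price found, or nil.
def sketch_json_ld_price(html)
  pattern = %r{<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>}mi
  html.scan(pattern).flatten.each do |raw|
    data = begin
      JSON.parse(raw)
    rescue JSON::ParserError
      next # skip malformed blocks
    end
    nodes = data.is_a?(Array) ? data : [data]
    nodes.each do |node|
      next unless node.is_a?(Hash)
      offers = node['offers']
      offers = offers.first if offers.is_a?(Array)
      next unless offers.is_a?(Hash)
      value = Float(offers['price'].to_s, exception: false)
      return value if value
    end
  end
  nil
end

# Hypothetical stand-in for extract_from_selectors: a crude regex match
# on the first dollar amount, NOT the selector logic the class uses.
def sketch_fallback_price(html)
  match = html.match(/\$(\d[\d,]*\.\d{2})/)
  match && match[1].delete(',').to_f
end

# Same ordering as #extract: the fallback runs only when JSON-LD has no price.
def sketch_extract_price(html)
  sketch_json_ld_price(html) || sketch_fallback_price(html)
end

# Made-up sample markup for illustration only.
def sample_with_json_ld
  <<~HTML
    <html><head>
    <script type="application/ld+json">
    {"@type": "Product", "name": "Example Drill",
     "offers": {"@type": "Offer", "price": "129.00", "priceCurrency": "USD"}}
    </script>
    </head><body><span>$999.99</span></body></html>
  HTML
end

p sketch_extract_price(sample_with_json_ld)      # JSON-LD price wins
p sketch_extract_price('<span>$1,249.50</span>') # fallback path
```

In the first sample the structured-data price is used even though a different dollar amount appears in the body, which is the behavior the "most reliable" comment in the source implies.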