Class: Retailer::Extractors::Walmart
- Inherits: Base
- Hierarchy: Retailer::Extractors::Walmart < Base < Object
- Defined in: app/services/retailer/extractors/walmart.rb
Overview
Walmart data extractor (USA and Canada).
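Walmart pages are built with Next.js, and the embedded __NEXT_DATA__ JSON
contains the authoritative buy box information. The extractor reads from there
first, validating that the item ID matches the expected product. If no price is
found, it falls back to JSON-LD, then to schema.org itemprop markup, and finally
to Walmart-specific selectors.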
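
The primary path described above can be sketched as follows. This is a minimal, dependency-free illustration, not the extractor's actual code: the helper names `next_data_from` and `buy_box_price`, the regex-based lookup, and the JSON layout (`props.pageProps.product` with `usItemId` and `price` keys) are all assumptions made for the example. The real page structure and the real `extract_from_next_data` method may differ.

```ruby
require 'json'

# Hypothetical pattern for the script tag that carries the Next.js page data.
# A regex keeps this sketch dependency-free; a real extractor would more
# likely locate the tag with an HTML parser.
NEXT_DATA_PATTERN = %r{<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>}m

# Returns the parsed __NEXT_DATA__ JSON, or nil if it is missing or invalid.
def next_data_from(html)
  match = NEXT_DATA_PATTERN.match(html)
  return nil unless match

  JSON.parse(match[1])
rescue JSON::ParserError
  nil
end

# Returns the price only when the embedded item ID matches the expected one.
# Assumed layout: props -> pageProps -> product -> { usItemId, price }.
def buy_box_price(html, expected_item_id)
  data = next_data_from(html)
  return nil unless data.is_a?(Hash)

  product = data.dig('props', 'pageProps', 'product')
  return nil unless product.is_a?(Hash)
  return nil unless product['usItemId'].to_s == expected_item_id.to_s

  product['price']
end

SAMPLE_HTML = <<~HTML
  <html><body>
  <script id="__NEXT_DATA__" type="application/json">
  {"props":{"pageProps":{"product":{"usItemId":"12345","price":19.97}}}}
  </script>
  </body></html>
HTML

puts buy_box_price(SAMPLE_HTML, '12345').inspect # => 19.97
puts buy_box_price(SAMPLE_HTML, '99999').inspect # => nil (item ID mismatch)
```

Returning nil on an ID mismatch lets the caller move on to the next fallback rather than record a price that belongs to a different product.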
Class Method Summary
- .build_payload(url:) ⇒ Hash
Build Oxylabs payload for Walmart product scraping. Uses 'universal' source with JS rendering.
Instance Method Summary
- #extract(check, content) ⇒ Object
Class Method Details
.build_payload(url:) ⇒ Hash
Build Oxylabs payload for Walmart product scraping.
Uses 'universal' source with JS rendering.
    # File 'app/services/retailer/extractors/walmart.rb', line 15

    def self.build_payload(url:)
      {
        source: 'universal',
        url: url,
        render: 'html'
      }
    end
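
As a rough illustration of how such a payload might be used, the sketch below prepares an HTTP request carrying it. Several things here are assumptions rather than part of this class: the endpoint URL (believed to be the Oxylabs Realtime endpoint, but worth confirming against the Oxylabs documentation), the `build_request` helper, the sample product URL, and the placeholder credentials. The standalone `build_payload` simply mirrors the class method above so the example runs on its own, and the request is built but never sent.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumption: Oxylabs Realtime endpoint; verify before relying on it.
OXYLABS_ENDPOINT = URI('https://realtime.oxylabs.io/v1/queries')

# Mirrors the hash returned by Retailer::Extractors::Walmart.build_payload.
def build_payload(url:)
  { source: 'universal', url: url, render: 'html' }
end

# Hypothetical helper: prepares (but does not send) a JSON POST request
# with HTTP basic authentication.
def build_request(url:, username:, password:)
  request = Net::HTTP::Post.new(OXYLABS_ENDPOINT)
  request.basic_auth(username, password)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(build_payload(url: url))
  request
end

request = build_request(
  url: 'https://www.walmart.com/ip/12345', # sample URL for illustration only
  username: 'USERNAME',
  password: 'PASSWORD'
)
puts request.body
```

Keeping the payload builder separate from the transport means the extractor class only has to declare what it needs (source and rendering mode) without knowing how the request is delivered.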
Instance Method Details
#extract(check, content) ⇒ Object
    # File 'app/services/retailer/extractors/walmart.rb', line 23

    def extract(check, content)
      return unless valid_html?(content)

      check.scraper_source = source_name
      check.currency = catalog.id == WALMART_SELLER_CANADA ? 'CAD' : 'USD'

      doc = parse_html(content)

      # Primary: Extract from __NEXT_DATA__ (buy box data, most reliable)
      extract_from_next_data(check, doc)

      # Fallback: JSON-LD with item ID validation
      extract_walmart_json_ld_price(check, doc) if check.price.blank?

      # Fallback: schema.org itemprop
      extract_from_itemprop(check, doc) if check.price.blank?

      # Fallback: Walmart-specific selectors
      extract_from_selectors(check, doc) if check.price.blank?

      # Extract title if not already set
      check.raw_title ||= extract_title(doc)
    end
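
The method above tries each source in order of trust and stops as soon as a price is set. The same pattern can be sketched in isolation as shown here. Everything in this sketch is hypothetical: the `Check` struct stands in for the real check record, the `doc` hash stands in for a parsed page, and the lambdas replace the real `extract_from_*` methods. It also uses `nil?` where the original uses ActiveSupport's `blank?`, so that it runs without Rails.

```ruby
# Stand-in for the real check record; only the fields used here.
Check = Struct.new(:price, :raw_title)

# Hypothetical strategies, ordered from most to least trusted.
# Each returns a price or nil.
STRATEGIES = [
  ->(doc) { doc[:next_data_price] }, # primary: __NEXT_DATA__
  ->(doc) { doc[:json_ld_price] },   # fallback: JSON-LD
  ->(doc) { doc[:itemprop_price] },  # fallback: schema.org itemprop
  ->(doc) { doc[:selector_price] }   # fallback: site-specific selectors
].freeze

# Runs strategies in order and stops at the first one that yields a price.
def extract_price(check, doc)
  STRATEGIES.each do |strategy|
    break unless check.price.nil?

    check.price = strategy.call(doc)
  end
  check
end

doc = { json_ld_price: 24.99, selector_price: 29.99 }
check = extract_price(Check.new, doc)
puts check.price # => 24.99 (JSON-LD wins over the selector fallback)
```

Ordering the strategies explicitly makes the trust ranking visible in one place, at the cost of hiding the per-step comments that the inline version carries.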