Class: Retailer::Extractors::Walmart

Inherits:
Base
  • Object
Defined in:
app/services/retailer/extractors/walmart.rb

Overview

Walmart data extractor (USA and Canada).

Walmart pages use Next.js, and the __NEXT_DATA__ script contains the
authoritative buy box information. We extract from there first, validating
that the item ID matches our expected product. If that yields no price, we
fall back to JSON-LD, then schema.org itemprop, then Walmart-specific selectors.
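
A minimal sketch of the primary path described above, assuming the page embeds its state in a script tag with id __NEXT_DATA__ (the Next.js convention). The helper name next_data_price, the JSON path to the product, and the usItemId / priceInfo keys are illustrative assumptions, not taken from this class; the real extract_from_next_data may differ.

```ruby
require 'json'

# Hypothetical helper: pull the buy-box price out of __NEXT_DATA__,
# returning nil when the tag is missing, unparsable, or describes a
# different item than the one we expected.
def next_data_price(html, expected_item_id)
  match = html.match(%r{<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>}m)
  return nil unless match

  data = JSON.parse(match[1])
  # Assumed JSON layout; adjust to the payload actually observed.
  product = data.dig('props', 'pageProps', 'initialData', 'data', 'product')
  return nil unless product.is_a?(Hash)
  return nil unless product['usItemId'].to_s == expected_item_id.to_s

  product.dig('priceInfo', 'currentPrice', 'price')
rescue JSON::ParserError
  nil
end

html = <<~HTML
  <html><body>
  <script id="__NEXT_DATA__" type="application/json">
  {"props":{"pageProps":{"initialData":{"data":{"product":
  {"usItemId":"12345","priceInfo":{"currentPrice":{"price":19.98}}}}}}}}
  </script>
  </body></html>
HTML

next_data_price(html, '12345') # price for the matching item
next_data_price(html, '99999') # nil, so the caller can fall back to JSON-LD
```

Validating the item ID before trusting the price guards against pages that embed data for a different product, for example after a redirect.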

Constant Summary

RENDER_REQUIRED =

Walmart embeds buy-box pricing in __NEXT_DATA__, which is server-rendered.
This is a strong candidate to set to false in a follow-up, after manual
confirmation that __NEXT_DATA__ is present without JS execution.

true

Class Method Summary

Instance Method Summary

Class Method Details

.build_payload(url:) ⇒ Hash

Builds the Oxylabs payload for Walmart product scraping.
Uses the 'universal' source with JS rendering.

Parameters:

  • url (String)

    Full product URL

Returns:

  • (Hash)

    Oxylabs API payload



# File 'app/services/retailer/extractors/walmart.rb', line 20

def self.build_payload(url:)
  {
    source: 'universal',
    url: url,
    render: render_value
  }.compact
end
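
The trailing .compact matters when render_value returns nil: the render key is then dropped and the payload requests no rendering. The sketch below is standalone and illustrative. render_value is defined elsewhere (presumably in Base) and is not shown on this page, so the stand-in that maps RENDER_REQUIRED to 'html' or nil is an assumption, as are the class name WalmartPayloadSketch, the render_required keyword, and the placeholder URL.

```ruby
# Standalone stand-in for the extractor; not the real class.
class WalmartPayloadSketch
  RENDER_REQUIRED = true

  # Assumed behaviour of the inherited helper: 'html' asks the
  # scraper to execute JavaScript, nil leaves the key out.
  def self.render_value(required = RENDER_REQUIRED)
    required ? 'html' : nil
  end

  def self.build_payload(url:, render_required: RENDER_REQUIRED)
    {
      source: 'universal',
      url: url,
      render: render_value(render_required)
    }.compact # drops :render when render_value is nil
  end
end

url = 'https://www.walmart.com/ip/12345' # placeholder product URL
with_js = WalmartPayloadSketch.build_payload(url: url)
without_js = WalmartPayloadSketch.build_payload(url: url, render_required: false)

with_js.key?(:render)    # true
without_js.key?(:render) # false
```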

Instance Method Details

#extract(check, content) ⇒ Object

Populates check with the scraper source, currency, price, and raw title parsed from content.
Price sources are tried in order (__NEXT_DATA__, JSON-LD, schema.org itemprop, Walmart-specific selectors), stopping at the first that yields a price.
Returns nil without modifying check when content is not valid HTML.



# File 'app/services/retailer/extractors/walmart.rb', line 28

def extract(check, content)
  return unless valid_html?(content)

  check.scraper_source = source_name
  check.currency = catalog.id == WALMART_SELLER_CANADA ? 'CAD' : 'USD'

  doc = parse_html(content)

  # Primary: Extract from __NEXT_DATA__ (buy box data, most reliable)
  extract_from_next_data(check, doc)

  # Fallback: JSON-LD with item ID validation
  extract_walmart_json_ld_price(check, doc) if check.price.blank?

  # Fallback: schema.org itemprop
  extract_from_itemprop(check, doc) if check.price.blank?

  # Fallback: Walmart-specific selectors
  extract_from_selectors(check, doc) if check.price.blank?

  # Extract title if not already set
  check.raw_title ||= extract_title(doc)
end
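
The fallback order in extract can be read as a first-match-wins chain. A minimal sketch under stated assumptions: the Check struct, the STRATEGIES table, and apply_fallbacks are hypothetical stand-ins for the real check object and extractor methods, and nil? replaces Rails' blank? so the snippet runs without ActiveSupport.

```ruby
# Hypothetical stand-in for the check record.
Check = Struct.new(:price, :source)

# Each strategy returns a price or nil; the order mirrors extract.
STRATEGIES = {
  next_data: ->(_doc) { nil },           # pretend __NEXT_DATA__ was missing
  json_ld:   ->(doc)  { doc[:json_ld] },
  itemprop:  ->(doc)  { doc[:itemprop] },
  selectors: ->(doc)  { doc[:selector] }
}.freeze

def apply_fallbacks(check, doc)
  STRATEGIES.each do |name, strategy|
    break unless check.price.nil? # stop once a price has been found

    check.price = strategy.call(doc)
    check.source = name unless check.price.nil?
  end
  check
end

doc = { json_ld: 24.99, itemprop: 25.49 }
check = apply_fallbacks(Check.new, doc)
check.price  # 24.99, taken from JSON-LD
check.source # :json_ld; itemprop and selectors never ran
```

Ordering the strategies from most to least reliable means a lower-quality source is consulted only when every better one has come up empty.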