Class: Retailer::Extractors::Wayfair

Inherits:
Base
  • Object
Defined in:
app/services/retailer/extractors/wayfair.rb

Overview

Wayfair data extractor.
Uses data-test-id attributes for reliable price extraction.

Wayfair Variant Handling:
When searching by internal SKU (e.g., TCT240-3.7W-749-FS), Wayfair redirects
to the parent product page with URL params like ?redir=SKU&piid=123,456.
The page initially shows the LOWEST variant price, then JavaScript updates
the selection based on URL parameters. We use browser_instructions to wait
for the variant selection to complete before extracting the price.
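The redirect parameters described above can be inspected with the standard library alone. The following is a minimal sketch, not part of the extractor: the helper name wayfair_variant_params and the example URL path are hypothetical, and it assumes only that redir carries the searched SKU and piid carries a comma-separated list of IDs, as in the example above.

```ruby
require 'uri'

# Hypothetical helper (not defined in Retailer::Extractors::Wayfair).
# Reads the redirect parameters from a Wayfair product URL.
def wayfair_variant_params(url)
  query = URI.parse(url).query.to_s
  params = URI.decode_www_form(query).to_h

  {
    sku: params['redir'],                        # SKU that triggered the redirect, if any
    piids: params.fetch('piid', '').split(',')   # comma-separated IDs from ?piid=
  }
end

# Example URL path is made up for illustration.
variant = wayfair_variant_params(
  'https://www.wayfair.com/example/pdp/example.html?redir=TCT240-3.7W-749-FS&piid=123,456'
)
puts variant.inspect
```

A caller could use the presence of redir or piid to decide whether the extra wait for variant selection is needed at all; that decision is an assumption and is not something this class is documented to do.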

Class Method Details

.browser_instructions ⇒ Array<Hash>

Browser instructions to wait for Wayfair's variant selection to complete.
Wayfair uses JavaScript to update pricing based on URL params (redir, piid).
We wait for the price element to stabilize after redirect/variant selection.
Reference: https://github.com/oxylabs/how-to-scrape-wayfair

Returns:

  • (Array<Hash>)

    Oxylabs browser instructions



# File 'app/services/retailer/extractors/wayfair.rb', line 42

def self.browser_instructions
  [
    # Wait for initial page load and price element to appear
    # Primary selector from data-test-id (most reliable)
    {
      type: 'wait_for_element',
      selector: {
        type: 'css',
        value: '[data-test-id="PriceDisplay"]'
      },
      timeout_s: 10
    },
    # Additional wait for variant selection JavaScript to complete
    # Wayfair's redirect/variant selection takes ~2-5 seconds
    { type: 'wait', wait_time_s: 5 }
  ]
end

.build_payload(url:, geo_location: nil) ⇒ Hash

Builds the Oxylabs payload for scraping a Wayfair product page.
Uses the 'universal' source with JavaScript rendering, and passes
browser_instructions so that variant-specific pricing has loaded
before the HTML is returned.
Reference: https://github.com/oxylabs/how-to-scrape-wayfair

Parameters:

  • url (String)

    Full product URL

  • geo_location (String, nil) (defaults to: nil)

    Country for pricing (default: United States)

Returns:

  • (Hash)

    Oxylabs API payload



# File 'app/services/retailer/extractors/wayfair.rb', line 22

def self.build_payload(url:, geo_location: nil)
  {
    source: 'universal',
    url: url,
    render: 'html',
    user_agent_type: 'desktop_safari',
    geo_location: geo_location || 'United States',
    context: [
      { key: 'follow_redirects', value: true }
    ],
    browser_instructions: browser_instructions
  }
end
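
As a rough illustration of how a payload like the one above might be submitted, the sketch below wraps a hash in an authenticated POST request using Net::HTTP. The helper name build_oxylabs_request, the placeholder credentials, and the realtime endpoint URL are assumptions that do not come from this class; check them against the Oxylabs documentation for your account. The request is only built, not sent, so the sketch runs without network access.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumption: Oxylabs realtime endpoint. Verify before use.
OXYLABS_ENDPOINT = URI('https://realtime.oxylabs.io/v1/queries')

# Hypothetical helper: turns a payload hash into a JSON POST with basic auth.
def build_oxylabs_request(payload, username:, password:)
  request = Net::HTTP::Post.new(OXYLABS_ENDPOINT, 'Content-Type' => 'application/json')
  request.basic_auth(username, password)
  request.body = JSON.generate(payload)
  request
end

# Abbreviated payload with the same shape as build_payload's return value.
payload = {
  source: 'universal',
  url: 'https://www.wayfair.com/',
  render: 'html',
  geo_location: 'United States'
}
request = build_oxylabs_request(payload, username: 'USER', password: 'PASS')
puts request.body

# Sending is left commented out on purpose:
# response = Net::HTTP.start(OXYLABS_ENDPOINT.host, OXYLABS_ENDPOINT.port, use_ssl: true) do |http|
#   http.request(request)
# end
```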

Instance Method Details

#extract(check, content) ⇒ Object



# File 'app/services/retailer/extractors/wayfair.rb', line 60

def extract(check, content)
  return unless valid_html?(content)

  check.scraper_source = source_name
  check.currency = catalog.id == WAYFAIR_CANADA ? 'CAD' : 'USD'

  doc = parse_html(content)

  # Check availability
  check.product_available = doc.at_css('[data-test-id="AddToCartButton"]').present? ||
                            !content.include?('Out of Stock')

  # IMPORTANT: Scope price extraction to main product pricing section only.
  # Wayfair pages include sponsored ads with prices - we must ignore those.
  # The main product pricing is in a container with data-name="Pricing"
  pricing_section = find_main_pricing_section(doc)

  # Sale price: data-test-id="StandardPricingPrice-SALE" (when item is on sale)
  # Primary price: data-test-id="StandardPricingPrice-PRIMARY" (when not on sale)
  extract_current_price(check, pricing_section)

  # Original/was price: data-test-id="StandardPricingPrice-PREVIOUS"
  extract_previous_price(check, pricing_section)

  # Fallback: Collect all PriceDisplay elements within pricing section
  extract_fallback_prices(check, pricing_section) if check.price.blank?

  # Fallback: JSON-LD schema.org (page-wide, but structured data)
  extract_json_ld_price(check, doc) if check.price.blank?
end
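
The last fallback in #extract reads schema.org JSON-LD. Its implementation is not shown on this page, so the following is only a sketch of the general technique, using the standard library and a regular expression in place of the HTML parser the class uses. The names json_ld_price, wrap, and parse_json are hypothetical, and the sketch assumes a top-level Product node with an offers entry that has a price.

```ruby
require 'bigdecimal'
require 'json'

# Matches the body of each <script type="application/ld+json"> block.
JSON_LD_SCRIPT = %r{<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>}mi

# Wraps a single value in an array; leaves arrays untouched.
def wrap(value)
  value.is_a?(Array) ? value : [value]
end

# Returns parsed JSON, or nil when the block is malformed.
def parse_json(raw)
  JSON.parse(raw)
rescue JSON::ParserError
  nil
end

# Hypothetical stand-in for a JSON-LD price fallback: returns the first
# Product offer price as a BigDecimal, or nil when none is found.
def json_ld_price(html)
  html.scan(JSON_LD_SCRIPT).flatten.each do |raw|
    wrap(parse_json(raw)).each do |node|
      next unless node.is_a?(Hash) && node['@type'] == 'Product'

      wrap(node['offers']).each do |offer|
        next unless offer.is_a?(Hash) && offer['price']

        return BigDecimal(offer['price'].to_s)
      end
    end
  end
  nil
rescue ArgumentError
  # Raised by BigDecimal when the price string is not numeric.
  nil
end
```

Because structured data describes the main product rather than sponsored listings, a page-wide search like this is less likely to pick up ad prices than a page-wide CSS query, which is presumably why it is kept as the final fallback.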