Class: Retailer::Extractors::Wayfair
Inherits: Base
- Object
- Base
- Retailer::Extractors::Wayfair
Defined in: app/services/retailer/extractors/wayfair.rb
Overview
Wayfair data extractor.
Uses data-test-id attributes for reliable price extraction.
Wayfair Variant Handling:
When searching by internal SKU (e.g., TCT240-3.7W-749-FS), Wayfair redirects
to the parent product page with URL params like ?redir=SKU&piid=123,456.
The page initially shows the LOWEST variant price, then JavaScript updates
the selection based on URL parameters. We use browser_instructions to wait
for the variant selection to complete before extracting the price.
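The redirect parameters described above can be inspected with the standard library alone. The following is a minimal sketch, not part of the extractor: the helper name `wayfair_variant_params` and the example URL path are hypothetical, and only the `redir` and `piid` parameter names come from the description above.

```ruby
require 'uri'

# Hypothetical helper: pull the variant hints out of a Wayfair redirect URL.
# Returns the SKU from `redir` and the comma-separated `piid` values as an array.
def wayfair_variant_params(url)
  query  = URI.parse(url).query.to_s
  params = URI.decode_www_form(query).to_h

  {
    redir: params['redir'],
    piids: params.fetch('piid', '').split(',')
  }
end

# Made-up URL for illustration; the path is an assumption.
url = 'https://www.wayfair.com/furniture/pdp/example.html?redir=TCT240-3.7W-749-FS&piid=123,456'
variant = wayfair_variant_params(url)
puts variant[:redir]         # the SKU the search was made with
puts variant[:piids].inspect # the option ids that select the variant
```

A check like this could help confirm that a redirected URL still carries the variant selection before the price is trusted.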
Class Method Summary
- .browser_instructions ⇒ Array<Hash>
  Browser instructions to wait for Wayfair's variant selection to complete.
- .build_payload(url:, geo_location: nil) ⇒ Hash
  Build Oxylabs payload for Wayfair product scraping. Uses 'universal' source with JS rendering and browser_instructions to wait for variant-specific pricing to load.
Instance Method Summary
Class Method Details
.browser_instructions ⇒ Array<Hash>
Browser instructions to wait for Wayfair's variant selection to complete.
Wayfair uses JavaScript to update pricing based on URL params (redir, piid).
We wait for the price element to stabilize after redirect/variant selection.
Reference: https://github.com/oxylabs/how-to-scrape-wayfair
# File 'app/services/retailer/extractors/wayfair.rb', line 42

def self.browser_instructions
  [
    # Wait for initial page load and price element to appear
    # Primary selector from data-test-id (most reliable)
    {
      type: 'wait_for_element',
      selector: { type: 'css', value: '[data-test-id="PriceDisplay"]' },
      timeout_s: 10
    },
    # Additional wait for variant selection JavaScript to complete
    # Wayfair's redirect/variant selection takes ~2-5 seconds
    { type: 'wait', wait_time_s: 5 }
  ]
end
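Because the instructions combine an element wait with a fixed wait, it can be useful to know the upper bound they add to each scrape. The sketch below is an illustration only: `worst_case_wait_s` is a hypothetical helper, and the instruction list is a copy of the one shown above so the snippet runs on its own.

```ruby
# Copy of the instruction list from .browser_instructions, inlined for the sketch.
instructions = [
  {
    type: 'wait_for_element',
    selector: { type: 'css', value: '[data-test-id="PriceDisplay"]' },
    timeout_s: 10
  },
  { type: 'wait', wait_time_s: 5 }
]

# Hypothetical helper: upper bound, in seconds, spent inside the instructions.
# An element wait contributes its timeout; a fixed wait contributes its duration.
def worst_case_wait_s(instructions)
  instructions.sum { |step| step[:timeout_s] || step[:wait_time_s] || 0 }
end

puts worst_case_wait_s(instructions) # 10 + 5 = 15
```

With the values above, a request can spend up to 15 seconds waiting before the rendered HTML is returned, so any client-side timeout needs to allow for that.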
.build_payload(url:, geo_location: nil) ⇒ Hash
Build Oxylabs payload for Wayfair product scraping.
Uses 'universal' source with JS rendering and browser_instructions
to wait for variant-specific pricing to load.
Reference: https://github.com/oxylabs/how-to-scrape-wayfair
# File 'app/services/retailer/extractors/wayfair.rb', line 22

def self.build_payload(url:, geo_location: nil)
  {
    source: 'universal',
    url: url,
    render: 'html',
    user_agent_type: 'desktop_safari',
    geo_location: geo_location || 'United States',
    context: [
      { key: 'follow_redirects', value: true }
    ],
    browser_instructions: browser_instructions
  }
end
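For orientation, a payload like this is typically sent as a JSON body in an authenticated POST request. The sketch below only builds the request object and does not send it. The endpoint URL, the use of HTTP basic auth, the placeholder credentials, and the helper name `build_oxylabs_request` are assumptions, not taken from this class. The payload is a trimmed copy of what `.build_payload` returns.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: wrap a payload hash in a POST request.
# The default endpoint is an assumption; check the Oxylabs documentation.
def build_oxylabs_request(payload, username:, password:,
                          endpoint: 'https://realtime.oxylabs.io/v1/queries')
  request = Net::HTTP::Post.new(URI.parse(endpoint))
  request.basic_auth(username, password)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(payload)
  request
end

# Trimmed copy of the payload shape shown above (browser_instructions omitted).
payload = {
  source: 'universal',
  url: 'https://www.wayfair.com/',
  render: 'html',
  geo_location: 'United States'
}

# Placeholder credentials for illustration only.
request = build_oxylabs_request(payload, username: 'user', password: 'secret')
puts request.path
puts request.body
```

In the application itself, the payload would come from `Retailer::Extractors::Wayfair.build_payload(url: ...)` rather than a hand-written hash.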
Instance Method Details
#extract(check, content) ⇒ Object
# File 'app/services/retailer/extractors/wayfair.rb', line 60

def extract(check, content)
  return unless valid_html?(content)

  check.scraper_source = source_name
  check.currency = catalog.id == WAYFAIR_CANADA ? 'CAD' : 'USD'

  doc = parse_html(content)

  # Check availability
  check.product_available = doc.at_css('[data-test-id="AddToCartButton"]').present? ||
                            !content.include?('Out of Stock')

  # IMPORTANT: Scope price extraction to main product pricing section only.
  # Wayfair pages include sponsored ads with prices - we must ignore those.
  # The main product pricing is in a container with data-name="Pricing"
  pricing_section = find_main_pricing_section(doc)

  # Sale price: data-test-id="StandardPricingPrice-SALE" (when item is on sale)
  # Primary price: data-test-id="StandardPricingPrice-PRIMARY" (when not on sale)
  extract_current_price(check, pricing_section)

  # Original/was price: data-test-id="StandardPricingPrice-PREVIOUS"
  extract_previous_price(check, pricing_section)

  # Fallback: Collect all PriceDisplay elements within pricing section
  extract_fallback_prices(check, pricing_section) if check.price.blank?

  # Fallback: JSON-LD schema.org (page-wide, but structured data)
  extract_json_ld_price(check, doc) if check.price.blank?
end
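The last fallback reads the price from schema.org JSON-LD. The body of `extract_json_ld_price` is not shown on this page, so the following is only a sketch of the general technique under stated assumptions: `json_ld_price` is a hypothetical stand-in that works on the raw HTML string with a regular expression (so it needs only the standard library), and it assumes a `Product` node with an `offers.price` field.

```ruby
require 'json'

# Hypothetical stand-in for a JSON-LD price lookup.
# Returns the first Product offer price as a Float, or nil if none is found.
def json_ld_price(html)
  pattern = %r{<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>}mi

  html.scan(pattern).each do |(raw)|
    begin
      data = JSON.parse(raw)
    rescue JSON::ParserError
      next # skip malformed blocks and try the next script tag
    end

    nodes = data.is_a?(Array) ? data : [data]
    nodes.each do |node|
      next unless node.is_a?(Hash) && node['@type'] == 'Product'

      offers = node['offers']
      offer  = offers.is_a?(Array) ? offers.first : offers
      next unless offer.is_a?(Hash)

      price = Float(offer['price'].to_s, exception: false)
      return price if price
    end
  end

  nil
end

html = <<~HTML
  <script type="application/ld+json">
    {"@type": "Product", "offers": {"price": "199.99", "priceCurrency": "USD"}}
  </script>
HTML
puts json_ld_price(html).inspect
```

Because JSON-LD describes the page as a whole, a value found this way may not reflect the selected variant, which is consistent with it being tried only after the scoped selectors return nothing.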