Class: Retailer::Extractors::Costco

Inherits:
Base
  • Object
Defined in:
app/services/retailer/extractors/costco.rb

Overview

Costco data extractor (USA and Canada).
Costco Canada's site is a React/MUI single-page app, so pages must be
JavaScript-rendered and requested with the correct geo_location to return
regional pricing.

Constant Summary

GEO_LOCATIONS =

The Oxylabs universal source needs country names (not postal codes) as
geo_location for Costco to serve the correct regional site and pricing.

{
  'CAN' => 'Canada',
  'USA' => 'United States'
}.freeze
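The constant reads naturally as a lookup table from a catalog's ISO3 country code to the Oxylabs country name. The methods shown below infer the country from the URL instead, so the following is only a minimal sketch of how such a lookup might look; `geo_location_for_iso3` is a hypothetical helper, not part of this class.

```ruby
# Same mapping as the class constant, repeated so the sketch runs standalone.
GEO_LOCATIONS = {
  'CAN' => 'Canada',
  'USA' => 'United States'
}.freeze

# Hypothetical helper: resolve an Oxylabs geo_location from an ISO3 code.
# Hash#fetch with a nil default returns nil for unknown codes instead of raising.
def geo_location_for_iso3(iso3)
  GEO_LOCATIONS.fetch(iso3.to_s.upcase, nil)
end

geo_location_for_iso3('CAN') # => "Canada"
geo_location_for_iso3('usa') # => "United States"
geo_location_for_iso3('MEX') # => nil
```

Returning nil for an unknown code fits build_payload, which only sets geo_location when it is present.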

Class Method Summary

Instance Method Summary

Class Method Details

.build_payload(url:, geo_location: nil) ⇒ Hash

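Build the Oxylabs payload for scraping a Costco product page.
Uses the 'universal' source with JS rendering (render: 'html') and
browser_instructions that wait up to 20 s for the React pricing element
([data-testid="single-price-content"]) to hydrate, continuing if it never
appears, then pause a further 3 s. When geo_location is nil it is inferred
from the URL.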

Parameters:

  • url (String)

    Full product URL

  • geo_location (String, nil) (defaults to: nil)

    Country name for the Oxylabs proxy (e.g. 'Canada'); inferred from the URL when nil

Returns:

  • (Hash)

    Oxylabs API payload



# File 'app/services/retailer/extractors/costco.rb', line 22

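def self.build_payload(url:, geo_location: nil)
  # Default the proxy country from the domain (costco.ca => Canada, else United States)
  geo_location ||= geo_location_from_url(url)

  payload = {
    source: 'universal',
    url: url,
    render: 'html', # JS-render the SPA before returning HTML
    browser_instructions: [
      # Wait up to 20 s for the React price block; skip (don't fail) on timeout
      {
        type: 'wait_for_element',
        selector: { type: 'css', value: '[data-testid="single-price-content"]' },
        timeout_s: 20,
        on_error: 'skip'
      },
      # Then wait a further 3 s
      { type: 'wait', wait_time_s: 3 }
    ]
  }
  payload[:geo_location] = geo_location if geo_location.present?
  payload
end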
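To make the payload shape concrete, the sketch below mirrors build_payload in plain Ruby (ActiveSupport's present? replaced by a nil/empty check so it runs outside Rails) and shows one way it might be sent. The transport is an assumption not shown in this class: the sketch builds, but does not send, a JSON POST with basic auth to the Oxylabs Realtime endpoint https://realtime.oxylabs.io/v1/queries. `costco_payload` and `oxylabs_request` are hypothetical names.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Plain-Ruby mirror of build_payload, including the URL-based country default.
def costco_payload(url, geo_location = nil)
  geo_location ||= url.include?('costco.ca') ? 'Canada' : 'United States'
  payload = {
    source: 'universal',
    url: url,
    render: 'html',
    browser_instructions: [
      { type: 'wait_for_element',
        selector: { type: 'css', value: '[data-testid="single-price-content"]' },
        timeout_s: 20, on_error: 'skip' },
      { type: 'wait', wait_time_s: 3 }
    ]
  }
  payload[:geo_location] = geo_location unless geo_location.nil? || geo_location.empty?
  payload
end

# Assumption: payloads are POSTed as JSON with HTTP basic auth to the
# Oxylabs Realtime endpoint. The request object is built but not sent.
def oxylabs_request(payload, user, pass)
  uri = URI('https://realtime.oxylabs.io/v1/queries')
  req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  req.basic_auth(user, pass)
  req.body = JSON.generate(payload)
  req
end

payload = costco_payload('https://www.costco.ca/some-product.product.123.html')
payload[:geo_location] # => "Canada"

req = oxylabs_request(payload, 'USER', 'PASS')
req.path # => "/v1/queries"
```

Sending would then be `Net::HTTP.start(req.uri.host, req.uri.port, use_ssl: true) { |http| http.request(req) }`.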

.geo_location_from_url(url) ⇒ String

Infer the Oxylabs country name from the Costco URL domain: 'Canada' if the
URL contains costco.ca, otherwise 'United States'.



# File 'app/services/retailer/extractors/costco.rb', line 44

def self.geo_location_from_url(url)
  return 'Canada' if url.to_s.include?('costco.ca')

  'United States'
end
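The method above uses a substring match, so a costco.com URL that merely mentions costco.ca in its path or query string would be classified as Canadian. A swapped-in alternative is to compare the parsed host. This is a minimal sketch of that technique; `geo_location_from_host` is a hypothetical name, and it keeps the original's 'United States' default for anything else, including unparseable input.

```ruby
require 'uri'

# Host-based variant: only costco.ca or a subdomain of it maps to Canada.
def geo_location_from_host(url)
  host = URI.parse(url.to_s).host.to_s.downcase
  return 'Canada' if host == 'costco.ca' || host.end_with?('.costco.ca')

  'United States'
rescue URI::InvalidURIError
  'United States'
end

geo_location_from_host('https://www.costco.ca/tv.product.100.html')   # => "Canada"
geo_location_from_host('https://www.costco.com/tv.html?ref=costco.ca') # => "United States"
```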

.search_payload(query:) ⇒ Hash

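Build the Oxylabs payload for a Costco search. The search URL always targets
costco.com, so build_payload infers geo_location as 'United States'.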

Parameters:

  • query (String)

    Search query

Returns:

  • (Hash)

    Oxylabs API payload



# File 'app/services/retailer/extractors/costco.rb', line 53

def self.search_payload(query:)
  url = "https://www.costco.com/CatalogSearch?dept=All&keyword=#{CGI.escape(query)}"
  build_payload(url: url)
end
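The query is form-encoded with CGI.escape before interpolation, which turns spaces into '+' and reserved characters such as '&' into percent escapes, so they cannot break out of the keyword parameter. A small standalone sketch of the same URL template; `costco_search_url` is a hypothetical name.

```ruby
require 'cgi'

# Same template as search_payload; CGI.escape form-encodes the query.
def costco_search_url(query)
  "https://www.costco.com/CatalogSearch?dept=All&keyword=#{CGI.escape(query)}"
end

costco_search_url('paper towels & napkins')
# => "https://www.costco.com/CatalogSearch?dept=All&keyword=paper+towels+%26+napkins"
```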

Instance Method Details

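#extract(check, content) ⇒ Object

Parse the rendered product HTML in content and populate check with the scraper
source, currency (USD for USA catalogs, CAD otherwise), price, regular price,
availability, and raw title. Price extraction tries JSON-LD, then MUI
data-testid selectors, then Costco price selectors, then aria-labels, stopping
at the first that yields a price. Returns early without touching check if
content is not valid HTML.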



# File 'app/services/retailer/extractors/costco.rb', line 58

def extract(check, content)
  return unless valid_html?(content)

  check.scraper_source = source_name
  check.currency = catalog.country_iso3 == 'USA' ? 'USD' : 'CAD'

  doc = parse_html(content)

  # JSON-LD structured data first (handles variant matching by SKU)
  extract_json_ld_price(check, doc)

  # Fallback: MUI data-testid selectors (new Costco Canada React design)
  extract_from_mui_price(check, doc) if check.price.blank?

  # Fallback: Costco price selectors
  extract_from_selectors(check, doc) if check.price.blank?

  # Fallback: aria-label with price
  extract_from_aria_labels(check, doc) if check.price.blank?

  # Extract regular price (before discounts) if there's a sale
  extract_regular_price(check, doc)

  # Availability: prefer MUI/JSON-LD signals over text matching
  check.product_available = check_costco_availability(doc, content)

  # Extract title
  check.raw_title = extract_title(doc)
end
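The price lookup in extract is a priority-ordered fallback chain: each strategy runs only while check.price is still blank. The sketch below expresses that pattern generically, assuming a hypothetical `PriceCheck` struct and stub lambdas in place of the real extractor helpers. It uses nil? where the class uses ActiveSupport's blank?, which also treats empty strings as missing.

```ruby
# Hypothetical stand-in for the check record; only the fields the chain uses.
PriceCheck = Struct.new(:price, :source_step)

# Run steps in priority order, stopping once one of them has set a price,
# mirroring the "if check.price.blank?" guards in #extract.
def run_price_fallbacks(check, steps)
  steps.each do |name, step|
    break unless check.price.nil?

    step.call(check)
    check.source_step = name unless check.price.nil?
  end
  check
end

steps = [
  [:json_ld,   ->(c) {}],                  # e.g. no JSON-LD offer found
  [:mui_price, ->(c) { c.price = 24.99 }],
  [:selectors, ->(c) { c.price = 19.99 }]  # never reached
]

check = run_price_fallbacks(PriceCheck.new, steps)
check.price       # => 24.99
check.source_step # => :mui_price
```

In extract itself, the regular-price, availability, and title steps run unconditionally after this chain.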