Class: Retailer::Extractors::Costco
- Inherits:
- Base
  - Object
    - Base
      - Retailer::Extractors::Costco
- Defined in:
- app/services/retailer/extractors/costco.rb
Overview
Costco data extractor (USA and Canada).
Costco Canada uses a React/MUI SPA that requires JS rendering and the
correct geo_location to serve regional pricing.
Constant Summary
- GEO_LOCATIONS =
Oxylabs universal source needs country names (not postal codes) for
Costco to serve the correct regional site and pricing.
    { 'CAN' => 'Canada', 'USA' => 'United States' }.freeze
Class Method Summary
-
.build_payload(url:, geo_location: nil) ⇒ Hash
Build Oxylabs payload for Costco product scraping.
-
.geo_location_from_url(url) ⇒ Object
Infer country from the Costco URL domain (.ca vs .com).
-
.search_payload(query:) ⇒ Hash
Build payload for Costco search.
Instance Method Summary
-
#extract(check, content) ⇒ Object
Class Method Details
.build_payload(url:, geo_location: nil) ⇒ Hash
Build Oxylabs payload for Costco product scraping.
Uses 'universal' source with JS rendering and browser_instructions
to wait for the React pricing component to hydrate.
    # File 'app/services/retailer/extractors/costco.rb', line 22

    def self.build_payload(url:, geo_location: nil)
      geo_location ||= geo_location_from_url(url)

      payload = {
        source: 'universal',
        url: url,
        render: 'html',
        browser_instructions: [
          {
            type: 'wait_for_element',
            selector: { type: 'css', value: '[data-testid="single-price-content"]' },
            timeout_s: 20,
            on_error: 'skip'
          },
          { type: 'wait', wait_time_s: 3 }
        ]
      }
      payload[:geo_location] = geo_location if geo_location.present?
      payload
    end
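As a rough illustration of the resulting payload, the standalone sketch below mirrors the logic above without Rails. The module name CostcoPayloadSketch, the PRICE_SELECTOR constant, and the product URLs are hypothetical, and a plain-Ruby emptiness check stands in for ActiveSupport's present?.

```ruby
# Standalone sketch; CostcoPayloadSketch and PRICE_SELECTOR are hypothetical names.
module CostcoPayloadSketch
  PRICE_SELECTOR = '[data-testid="single-price-content"]'

  def self.build_payload(url:, geo_location: nil)
    # Fall back to domain-based inference when no geo_location is passed.
    geo_location ||= url.to_s.include?('costco.ca') ? 'Canada' : 'United States'

    payload = {
      source: 'universal',
      url: url,
      render: 'html',
      browser_instructions: [
        { type: 'wait_for_element',
          selector: { type: 'css', value: PRICE_SELECTOR },
          timeout_s: 20,
          on_error: 'skip' },
        { type: 'wait', wait_time_s: 3 }
      ]
    }
    # Plain-Ruby stand-in for geo_location.present? (assumption: no Rails loaded).
    payload[:geo_location] = geo_location unless geo_location.to_s.empty?
    payload
  end
end

# Hypothetical URLs, for illustration only.
ca = CostcoPayloadSketch.build_payload(url: 'https://www.costco.ca/example.product.html')
us = CostcoPayloadSketch.build_payload(url: 'https://www.costco.com/example.product.html',
                                       geo_location: 'Canada')

puts ca[:geo_location] # inferred from the .ca domain
puts us[:geo_location] # an explicit argument takes precedence over the domain
```

An explicit geo_location always wins because the inference only runs when the argument is nil.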
.geo_location_from_url(url) ⇒ Object
Infer country from the Costco URL domain (.ca vs .com).
    # File 'app/services/retailer/extractors/costco.rb', line 44

    def self.geo_location_from_url(url)
      return 'Canada' if url.to_s.include?('costco.ca')

      'United States'
    end
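Because the check above is a substring match on the whole URL, a .com URL that merely mentions costco.ca in its path or query would also be classified as Canadian. One possible stricter variant, shown here only as a sketch, compares the parsed host instead. The method name geo_location_from_host and the sample URLs are hypothetical and not part of the class.

```ruby
require 'uri'

# Hypothetical variant: inspect only the host component of the URL.
def geo_location_from_host(url)
  host = URI.parse(url.to_s).host.to_s.downcase
  return 'Canada' if host == 'costco.ca' || host.end_with?('.costco.ca')

  'United States'
rescue URI::InvalidURIError
  # Unparseable input falls back to the same default as the original method.
  'United States'
end

puts geo_location_from_host('https://www.costco.ca/example.product.html')
puts geo_location_from_host('https://www.costco.com/search?ref=costco.ca')
```

Whether the extra strictness is worth it depends on where the URLs come from; for catalog URLs that are already validated, the simpler substring check may be sufficient.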
.search_payload(query:) ⇒ Hash
Build payload for Costco search.
    # File 'app/services/retailer/extractors/costco.rb', line 53

    def self.search_payload(query:)
      url = "https://www.costco.com/CatalogSearch?dept=All&keyword=#{CGI.escape(query)}"
      build_payload(url: url)
    end
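To show how the query ends up in the search URL, here is a small sketch of the URL construction on its own. The helper name costco_search_url is hypothetical; only CGI.escape and the URL template come from the method above.

```ruby
require 'cgi'

# Hypothetical helper mirroring the URL construction in search_payload.
def costco_search_url(query)
  "https://www.costco.com/CatalogSearch?dept=All&keyword=#{CGI.escape(query)}"
end

puts costco_search_url('kirkland olive oil')
puts costco_search_url('mac & cheese')
```

CGI.escape encodes spaces as + and reserved characters such as & as percent escapes, so the query cannot break out of the keyword parameter. Since the search URL always uses the costco.com domain, the payload built from it gets the 'United States' geo_location.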
Instance Method Details
#extract(check, content) ⇒ Object
    # File 'app/services/retailer/extractors/costco.rb', line 58

    def extract(check, content)
      return unless valid_html?(content)

      check.scraper_source = source_name
      check.currency = catalog.country_iso3 == 'USA' ? 'USD' : 'CAD'

      doc = parse_html(content)

      # JSON-LD structured data first (handles variant matching by SKU)
      extract_json_ld_price(check, doc)

      # Fallback: MUI data-testid selectors (new Costco Canada React design)
      extract_from_mui_price(check, doc) if check.price.blank?

      # Fallback: Costco price selectors
      extract_from_selectors(check, doc) if check.price.blank?

      # Fallback: aria-label with price
      extract_from_aria_labels(check, doc) if check.price.blank?

      # Extract regular price (before discounts) if there's a sale
      extract_regular_price(check, doc)

      # Availability: prefer MUI/JSON-LD signals over text matching
      check.product_available = check_costco_availability(doc, content)

      # Extract title
      check.raw_title = extract_title(doc)
    end
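The price lookup in extract follows a "first strategy that yields a value wins" pattern. The sketch below isolates that pattern under simplifying assumptions: the STRATEGIES table, the first_price method, and the Hash-based document are all hypothetical stand-ins for the private helpers and the parsed HTML, which are not shown on this page.

```ruby
# Hypothetical stand-ins: each strategy takes a "document" (a plain Hash here)
# and returns a price string or nil. Order matches the fallback order in extract.
STRATEGIES = {
  json_ld: ->(doc) { doc[:json_ld_price] },
  mui: ->(doc) { doc[:mui_price] },
  selectors: ->(doc) { doc[:selector_price] },
  aria_label: ->(doc) { doc[:aria_price] }
}.freeze

# Return [strategy_name, price] for the first strategy that yields a
# non-empty value, or nil when every strategy comes up empty.
def first_price(doc)
  STRATEGIES.each do |name, strategy|
    price = strategy.call(doc)
    return [name, price] unless price.nil? || price.to_s.empty?
  end
  nil
end

p first_price({ mui_price: '24.99', aria_price: '26.99' })
p first_price({})
```

Ordering the strategies from most to least structured means the more fragile selectors only run when the structured data is missing.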