Class: Retailer::Extractors::Amazon
- Inherits: Base (Object → Base → Retailer::Extractors::Amazon)
- Defined in: app/services/retailer/extractors/amazon.rb
Overview
Amazon data extractor.
Handles JSON responses from Oxylabs amazon_product API.
Class Method Summary
- .build_payload(asin:, domain: 'com', geo_location: nil) ⇒ Hash
  Build Oxylabs payload for Amazon product scraping. Uses the specialized 'amazon_product' source with parsed JSON output.
- .search_payload(query:, domain: 'com', geo_location: nil, pages: 1) ⇒ Hash
  Build payload for Amazon search.
Instance Method Summary
- #extract(check, content) ⇒ Object
- #validate_product_identity(check, content, catalog_item) ⇒ Boolean
  Override to provide ASIN-specific validation for Amazon JSON responses.
Class Method Details
.build_payload(asin:, domain: 'com', geo_location: nil) ⇒ Hash
Build Oxylabs payload for Amazon product scraping.
Uses the specialized 'amazon_product' source with parsed JSON output.
# File 'app/services/retailer/extractors/amazon.rb', line 14
def self.build_payload(asin:, domain: 'com', geo_location: nil)
  payload = {
    source: 'amazon_product',
    query: asin,
    domain: domain,
    parse: true
  }
  payload[:geo_location] = geo_location if geo_location.present?
  payload
end
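The effect of the optional geo_location argument can be illustrated with a standalone sketch. This is not the class itself: build_product_payload is a hypothetical free function that mirrors the listing above, a plain nil/empty check stands in for ActiveSupport's present?, and the ASIN and geo_location values are placeholders.

```ruby
# Standalone sketch mirroring .build_payload (hypothetical helper name).
# Plain Ruby replaces ActiveSupport's `present?` so it runs without Rails.
def build_product_payload(asin:, domain: 'com', geo_location: nil)
  payload = {
    source: 'amazon_product',
    query: asin,
    domain: domain,
    parse: true
  }
  # Add :geo_location only when a non-blank value was supplied.
  blank = geo_location.nil? || geo_location.to_s.strip.empty?
  payload[:geo_location] = geo_location unless blank
  payload
end

# Placeholder ASIN and location, for illustration only.
default_payload = build_product_payload(asin: 'B000000000')
geo_payload = build_product_payload(asin: 'B000000000', geo_location: '90210')

p default_payload
p geo_payload
```

In this sketch the :geo_location key is absent from the first hash rather than set to nil, which matches the conditional assignment in the original method.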
.search_payload(query:, domain: 'com', geo_location: nil, pages: 1) ⇒ Hash
Build payload for Amazon search.
# File 'app/services/retailer/extractors/amazon.rb', line 31
def self.search_payload(query:, domain: 'com', geo_location: nil, pages: 1)
  payload = {
    source: 'amazon_search',
    query: query,
    domain: domain,
    pages: pages,
    parse: true
  }
  payload[:geo_location] = geo_location if geo_location.present?
  payload
end
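As a rough illustration of what a caller might do with the search payload, the sketch below rebuilds the hash in plain Ruby and serializes it to JSON. build_search_payload is a hypothetical stand-in for the class method, the query string is a placeholder, and how the body is actually sent to Oxylabs is outside what this page documents.

```ruby
require 'json'

# Standalone sketch mirroring .search_payload (hypothetical helper name).
def build_search_payload(query:, domain: 'com', geo_location: nil, pages: 1)
  payload = {
    source: 'amazon_search',
    query: query,
    domain: domain,
    pages: pages,
    parse: true
  }
  # Plain-Ruby stand-in for ActiveSupport's `present?`.
  blank = geo_location.nil? || geo_location.to_s.strip.empty?
  payload[:geo_location] = geo_location unless blank
  payload
end

# Placeholder query; request two result pages instead of the default one.
search = build_search_payload(query: 'usb c cable', pages: 2)
body = JSON.generate(search)
puts body
```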
Instance Method Details
#extract(check, content) ⇒ Object
# File 'app/services/retailer/extractors/amazon.rb', line 43
def extract(check, content)
  check.scraper_source = source_name
  check.currency = currency_for_catalog

  if content.is_a?(Hash) || (content.is_a?(String) && content.strip.start_with?('{'))
    extract_from_json(check, content)
  else
    extract_from_html(check, content)
  end
end
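The branch condition in #extract can be isolated to show which inputs take the JSON path and which fall through to HTML. The sketch below assumes nothing about extract_from_json or extract_from_html; content_kind is a hypothetical helper that only reproduces the test, and the sample inputs are placeholders.

```ruby
# Hypothetical helper reproducing the dispatch test used by #extract:
# a Hash, or a String whose first non-whitespace character is '{', is
# treated as JSON; everything else is treated as HTML.
def content_kind(content)
  json_like = content.is_a?(Hash) ||
              (content.is_a?(String) && content.strip.start_with?('{'))
  json_like ? :json : :html
end

samples = [
  { 'asin' => 'B000000000' },        # already-parsed Hash
  "  {\"asin\": \"B000000000\"}",    # JSON object string with leading whitespace
  '<html><body></body></html>',      # HTML document
  '[{"asin": "B000000000"}]'         # JSON array string
]

samples.each { |sample| puts "#{sample.class}: #{content_kind(sample)}" }
```

One consequence of this test is that a JSON array serialized as a string does not start with '{', so under this condition it would be routed to the HTML branch.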
#validate_product_identity(check, content, catalog_item) ⇒ Boolean
Override to provide ASIN-specific validation for Amazon JSON responses.
Amazon returns parsed JSON with an 'asin' field, which is compared directly against the catalog item's ASIN.
# File 'app/services/retailer/extractors/amazon.rb', line 61
def validate_product_identity(check, content, catalog_item)
  # For Amazon JSON responses, compare ASIN directly
  if content.is_a?(Hash) && content['asin'].present?
    our_asin = catalog_item.amazon_asin
    if our_asin.present? && content['asin'] != our_asin
      check.status = 'product_mismatch'
      check. = "ASIN mismatch: expected #{our_asin}, got #{content['asin']}"
      Rails.logger.warn "[#{source_name}] ASIN mismatch for catalog_item #{catalog_item.id}: #{check.}"
      return false
    end
    return true # ASIN matches or we don't have one to compare
  end

  # For HTML content, use base class identifier search
  super
end
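The three outcomes of the ASIN comparison (match, mismatch, and deferring to the base class) can be sketched outside Rails. Everything here is an illustrative stand-in: Check and CatalogItem are hypothetical Structs, :note is a hypothetical attribute used only to hold the mismatch text, compare_asin returns nil where the real method calls super, and the ASINs are placeholders.

```ruby
# Hypothetical stand-ins for the real check and catalog item objects.
Check = Struct.new(:status, :note)          # :note is an assumed attribute name
CatalogItem = Struct.new(:id, :amazon_asin)

# Plain-Ruby stand-in for ActiveSupport's `blank?`.
def blank?(value)
  value.nil? || value.to_s.strip.empty?
end

# Returns true or false for JSON content carrying an 'asin' field, and nil
# when the decision would be deferred to the base class (the `super` call).
def compare_asin(check, content, catalog_item)
  return nil unless content.is_a?(Hash) && !blank?(content['asin'])

  expected = catalog_item.amazon_asin
  if !blank?(expected) && content['asin'] != expected
    check.status = 'product_mismatch'
    check.note = "ASIN mismatch: expected #{expected}, got #{content['asin']}"
    return false
  end
  true # ASIN matches, or there is no stored ASIN to compare against
end

item = CatalogItem.new(1, 'B000000000')
check = Check.new

p compare_asin(check, { 'asin' => 'B000000000' }, item)   # matching ASIN
p compare_asin(check, { 'asin' => 'B111111111' }, item)   # different ASIN
p check.status
p compare_asin(check, '<html></html>', item)              # non-Hash content
```

Note that a catalog item without a stored ASIN passes validation in this sketch, mirroring the comment in the original method.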