Class: Retailer::Extractors::Amazon

Inherits:
Base
  • Object
Defined in:
app/services/retailer/extractors/amazon.rb

Overview

Amazon data extractor.
Handles JSON responses from Oxylabs amazon_product API.

Class Method Details

.build_payload(asin:, domain: 'com', geo_location: nil) ⇒ Hash

Builds the Oxylabs payload for scraping a single Amazon product.
Uses the specialized 'amazon_product' source and requests parsed JSON output (parse: true).

Parameters:

  • asin (String)

    Amazon Standard Identification Number (ASIN) of the product

  • domain (String) (defaults to: 'com')

    Amazon domain suffix (com, ca, co.uk, etc.)

  • geo_location (String, nil) (defaults to: nil)

    ZIP code or country code; left out of the payload when blank

Returns:

  • (Hash)

    Oxylabs API payload



# File 'app/services/retailer/extractors/amazon.rb', line 14

def self.build_payload(asin:, domain: 'com', geo_location: nil)
  payload = {
    source: 'amazon_product',
    query: asin,
    domain: domain,
    parse: true
  }
  payload[:geo_location] = geo_location if geo_location.present?
  payload
end
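
As a rough illustration of the hash this method returns, the sketch below restates the same logic in plain Ruby so it can run outside Rails. The `PayloadSketch` module and its `blank?` helper are hypothetical stand-ins (the real method relies on ActiveSupport's `present?`), and the ASIN is a placeholder; only the payload keys come from the source above.

```ruby
# Hypothetical stand-in for Retailer::Extractors::Amazon.build_payload,
# written without ActiveSupport so it runs in plain Ruby.
module PayloadSketch
  # Approximates ActiveSupport's `blank?` for nil and strings only (assumption).
  def self.blank?(value)
    value.nil? || value.to_s.strip.empty?
  end

  def self.build_payload(asin:, domain: 'com', geo_location: nil)
    payload = { source: 'amazon_product', query: asin, domain: domain, parse: true }
    # geo_location is only included when a non-blank value is given
    payload[:geo_location] = geo_location unless blank?(geo_location)
    payload
  end
end

# 'B000000000' is a placeholder ASIN, not a real product.
default_payload   = PayloadSketch.build_payload(asin: 'B000000000')
localized_payload = PayloadSketch.build_payload(asin: 'B000000000', domain: 'co.uk', geo_location: 'GB')

p default_payload
p localized_payload
```

With the defaults, the payload carries only `source`, `query`, `domain`, and `parse`; the `geo_location` key appears only in the second call.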

.search_payload(query:, domain: 'com', geo_location: nil, pages: 1) ⇒ Hash

Builds the Oxylabs payload for an Amazon search.
Uses the 'amazon_search' source and requests parsed JSON output (parse: true).

Parameters:

  • query (String)

    Search query

  • domain (String) (defaults to: 'com')

    Amazon domain suffix (com, ca, co.uk, etc.)

  • geo_location (String, nil) (defaults to: nil)

    ZIP code or country code; left out of the payload when blank

  • pages (Integer) (defaults to: 1)

    Number of pages to fetch

Returns:

  • (Hash)

    Oxylabs API payload



# File 'app/services/retailer/extractors/amazon.rb', line 31

def self.search_payload(query:, domain: 'com', geo_location: nil, pages: 1)
  payload = {
    source: 'amazon_search',
    query: query,
    domain: domain,
    pages: pages,
    parse: true
  }
  payload[:geo_location] = geo_location if geo_location.present?
  payload
end
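
To show how the search payload might look once serialized as a JSON request body, here is a minimal sketch. `SearchPayloadSketch` is a hypothetical, Rails-free copy of the method (the `present?` check is approximated in plain Ruby), and the query and ZIP code are made-up sample values. How the payload is actually sent to Oxylabs is not shown in the source and is not assumed here.

```ruby
require 'json'

# Hypothetical stand-in for Retailer::Extractors::Amazon.search_payload.
module SearchPayloadSketch
  def self.search_payload(query:, domain: 'com', geo_location: nil, pages: 1)
    payload = { source: 'amazon_search', query: query, domain: domain, pages: pages, parse: true }
    # Plain-Ruby approximation of `geo_location.present?` (assumption)
    payload[:geo_location] = geo_location unless geo_location.nil? || geo_location.to_s.strip.empty?
    payload
  end
end

# Sample values chosen for illustration only.
search = SearchPayloadSketch.search_payload(query: 'usb-c cable', pages: 2, geo_location: '90210')

# Symbol keys become JSON string keys when serialized.
body = JSON.generate(search)
puts body
```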

Instance Method Details

#extract(check, content) ⇒ Object

Sets the scraper source and currency on the check, then dispatches to JSON extraction when content is a Hash or a String starting with '{', and to HTML extraction otherwise.



# File 'app/services/retailer/extractors/amazon.rb', line 43

def extract(check, content)
  check.scraper_source = source_name
  check.currency = currency_for_catalog

  if content.is_a?(Hash) || (content.is_a?(String) && content.strip.start_with?('{'))
    extract_from_json(check, content)
  else
    extract_from_html(check, content)
  end
end
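
The JSON-versus-HTML decision in #extract can be sketched in isolation. The `json_like?` helper and the sample inputs below are hypothetical; the condition itself is copied from the method above.

```ruby
# Hypothetical helper mirroring the condition in #extract: a Hash, or a String
# whose first non-whitespace character is '{', is routed to JSON extraction.
def json_like?(content)
  content.is_a?(Hash) || (content.is_a?(String) && content.strip.start_with?('{'))
end

# Made-up sample inputs for illustration.
samples = {
  parsed_hash: { 'asin' => 'B000000000' },
  json_string: '  {"asin": "B000000000"}',
  html_string: '<html><body>...</body></html>',
  json_array:  '[{"asin": "B000000000"}]'
}

routes = samples.transform_values { |content| json_like?(content) ? :json : :html }
routes.each { |name, route| puts "#{name} -> #{route}" }
```

One consequence of this check is that a JSON string whose top level is an array starts with '[' and so takes the HTML path.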

#validate_product_identity(check, content, catalog_item) ⇒ Boolean

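Overrides the base implementation to provide ASIN-specific validation for Amazon JSON responses.
When content is a parsed Hash with an 'asin' field, that value is compared directly against the catalog item's amazon_asin. On a mismatch, the check is marked 'product_mismatch', a warning is logged, and the method returns false. Any other content is delegated to the base class identifier search.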

Parameters:

  • check (CatalogItemRetailerProbe)

    The probe record to update

  • content (String, Hash)

    HTML string or parsed JSON data

  • catalog_item (CatalogItem)

    The catalog item being checked

Returns:

  • (Boolean)

    true if validation passed (the ASIN matches or there is none to compare), false on an ASIN mismatch



# File 'app/services/retailer/extractors/amazon.rb', line 61

def validate_product_identity(check, content, catalog_item)
  # For Amazon JSON responses, compare ASIN directly
  if content.is_a?(Hash) && content['asin'].present?
    our_asin = catalog_item.amazon_asin
    if our_asin.present? && content['asin'] != our_asin
      check.status = 'product_mismatch'
      check.error_message = "ASIN mismatch: expected #{our_asin}, got #{content['asin']}"
      Rails.logger.warn "[#{source_name}] ASIN mismatch for catalog_item #{catalog_item.id}: #{check.error_message}"
      return false
    end
    return true # ASIN matches or we don't have one to compare
  end

  # For HTML content, use base class identifier search
  super
end
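
The branching above can be summarized with a small, Rails-free sketch. The `asin_check` function is hypothetical: it returns a symbol instead of mutating a probe record or logging, and it approximates `present?` in plain Ruby. The ASINs are placeholders.

```ruby
# Hypothetical restatement of the ASIN check, for illustration only.
def asin_check(content, expected_asin)
  # Approximates ActiveSupport's `blank?` for nil and strings (assumption)
  blank = ->(value) { value.nil? || value.to_s.strip.empty? }

  # Non-Hash content, or a Hash without an ASIN, is left to the base class
  return :defer_to_base unless content.is_a?(Hash) && !blank.call(content['asin'])
  # No ASIN on our side means there is nothing to compare against
  return :ok if blank.call(expected_asin)

  # Exact string equality, as in the original method
  content['asin'] == expected_asin ? :ok : :product_mismatch
end

results = [
  asin_check({ 'asin' => 'B000000000' }, 'B000000000'),
  asin_check({ 'asin' => 'B000000001' }, 'B000000000'),
  asin_check({ 'asin' => 'B000000001' }, nil),
  asin_check('<html>...</html>', 'B000000000'),
  asin_check({ 'asin' => 'b000000000' }, 'B000000000')
]
p results
```

Because the comparison is plain string equality, values that differ only in letter case count as a mismatch, as the last call shows.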