Retailer Price Monitoring System

Overview

The Retailer Price Monitoring system automatically tracks product prices and availability across third-party retail partners (Home Depot, Costco, Wayfair, Amazon, etc.). It detects MAP (Minimum Advertised Price) violations for vendor/1P catalogs and provides visibility into retailer pricing behavior.

Accessing the Feature

Retailer Probes Tab (Per Item)

Path: Catalog Items → [Select Item] → Retailer Probes tab
URL: https://crm.warmlyyours.me:3000/en-US/catalog_items/{id}?tab=retailer_probes

The tab is visible only for catalog items whose catalog has external_price_check_enabled: true.

MAP Violations Search

Path: Catalogs → [Select Catalog] → "MAP Violations" link
Direct Search: ProductCatalogSearch with the map_violation_eq: true filter

Features

Automatic Price Checks

Daily automated checks via RetailerProbeWorker:

Feature                 Description
Price Extraction        Captures current/sale price and original/was price
Availability Detection  Detects "out of stock" and "unavailable" indicators
URL Validation          Tracks if product pages are accessible
History Tracking        Stores all checks in catalog_item_retailer_probes table
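
A minimal sketch of the availability check described above, assuming it is a case-insensitive phrase match on the page text. The constant and method names are hypothetical, and the real extractors may look at other markers as well.

```ruby
# Hypothetical phrase list taken from the two indicators named in this document.
UNAVAILABLE_PATTERNS = [/out of stock/i, /unavailable/i].freeze

# Returns false as soon as any "not available" phrase appears in the text.
def product_available?(page_text)
  UNAVAILABLE_PATTERNS.none? { |pattern| pattern.match?(page_text.to_s) }
end

product_available?('In stock. Ships today.') # no indicator present
product_available?('Currently unavailable')  # indicator present
```

A plain phrase match can misfire on unrelated text (for example a review that mentions "out of stock"), so a production check would likely scope the match to the buy box or to structured data.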

Manual "Probe Now" Button

Users with :update capability on a catalog item can trigger an immediate price check:

  • Button located on the Retailer Probes tab
  • Uses Oxylabs Realtime API for immediate results
  • Updates retail_price column on success

MAP Violation Detection

MAP rules apply to vendor/1P catalogs (Home Depot, Costco, Wayfair, Lowe's, etc.). The relevant fields:

Field           Description
retailer_type   :marketplace (3P) or :vendor (1P)
map_percentage  Fraction of MSRP; defaults to 0.80 (80% of MSRP, a 20% maximum discount)
map_price       Calculated as msrp * map_percentage
map_violation   true if retail_price < map_price
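
A worked example of the MAP arithmetic, as a standalone sketch. The helper names are hypothetical and are not the application's model methods. Treating a missing retail_price as "no violation" is an assumption.

```ruby
require 'bigdecimal'
require 'bigdecimal/util'

# map_percentage is a fraction of MSRP (0.80 means 80%).
def map_price(msrp, map_percentage = '0.80')
  (msrp.to_d * map_percentage.to_d).round(2)
end

# Assumption: with no retail price on record there is nothing to flag.
def map_violation?(retail_price, msrp, map_percentage = '0.80')
  return false if retail_price.nil?

  retail_price.to_d < map_price(msrp, map_percentage)
end

map_price('199.00')                # 199.00 * 0.80 = 159.20
map_violation?('149.99', '199.00') # 149.99 < 159.20, so a violation
map_violation?('159.20', '199.00') # equal to MAP, so not a violation
```

The comparison is strict, so a retailer advertising exactly the MAP price is compliant.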

Catalog Show Page

For vendor catalogs, the page displays:

  • Retailer Type Badge: "Vendor (1P)" or "Marketplace (3P)"
  • MAP Percentage: e.g., "80% of MSRP"
  • MAP Violations Link: Quick search for active items in violation

Supported Retailers

Retailer              Extractor Class                      Price Extraction
Amazon (all markets)  Retailer::Extractors::Amazon         Parsed JSON from amazon_product
Home Depot US/CA      Retailer::Extractors::HomeDepot      JSON-LD + HTML selectors
Costco CA             Retailer::Extractors::Costco         JSON-LD + aria-labels
Wayfair US/CA/DE      Retailer::Extractors::Wayfair        data-test-id attributes
Walmart US/CA         Retailer::Extractors::Walmart        __NEXT_DATA__ + JSON-LD
Lowe's US/CA          Retailer::Extractors::Lowes          JSON-LD + CSS selectors
Rona/RenoDepot        Retailer::Extractors::Rona           Browser automation + parsing
Build.com (Ferguson)  Retailer::Extractors::BuildCom       JSON-LD + URL discovery
Canadian Tire         Retailer::Extractors::CanadianTire   JSON-LD + CSS selectors
Houzz                 Retailer::Extractors::Houzz          JSON-LD extraction
Best Buy Canada       Retailer::Extractors::BestbuyCanada  JSON-LD extraction
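
Most extractors in the table rely on JSON-LD. The sketch below shows one way a Product offer price can be read from a page, using only the standard library. The method name and regex are hypothetical. The base class described in this document parses with Nokogiri, so this is an illustration of the idea rather than the actual implementation.

```ruby
require 'json'

JSON_LD_SCRIPT = %r{<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>}mi

# Returns the first schema.org Product offer price as a String, or nil.
def json_ld_price(html)
  html.to_s.scan(JSON_LD_SCRIPT).each do |(raw)|
    data = begin
      JSON.parse(raw)
    rescue JSON::ParserError
      next # skip malformed blocks instead of failing the whole check
    end

    nodes = data.is_a?(Array) ? data : [data]
    nodes.each do |node|
      next unless node.is_a?(Hash) && node['@type'] == 'Product'

      offers = node['offers']
      offer = offers.is_a?(Array) ? offers.first : offers
      price = offer.is_a?(Hash) ? offer['price'] : nil
      return price.to_s unless price.nil?
    end
  end
  nil
end

sample = '<script type="application/ld+json">' \
         '{"@type":"Product","offers":{"price":"149.99","priceCurrency":"USD"}}' \
         '</script>'
json_ld_price(sample)
```

The sketch ignores nested @graph containers and AggregateOffer ranges, which some retailers use; those cases would need retailer-specific handling.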

Technical Architecture

Clean Separation of Concerns

Orchestration, retailer-specific extraction, and HTTP transport live in separate layers:

┌─────────────────────────────────────────────────────────────┐
│                     PriceChecker                            │
│              (Orchestration - Fetch + Delegate)             │
│  • check(catalog_item)   • check_catalog(catalog)           │
│  • fetch_product(...)    • process_result(...)              │
└─────────────────────────────────────────────────────────────┘
                              │
                              │ Delegates to
                              ▼
┌─────────────────────────────────────────────────────────────┐
│              Extractors::Factory.for(catalog)               │
│                   (Returns correct extractor)               │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                  Extractors (Per Retailer)                  │
│            (Payload Building + Data Extraction)             │
│                                                             │
│  Class Methods:                Instance Methods:            │
│  • build_payload(url:)         • extract(check, content)    │
│  • search_payload(query:)      • validate_product_identity  │
│  • discovery_payload           • discovered_url             │
└─────────────────────────────────────────────────────────────┘
                              │
                              │ Uses
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     OxylabsApi                              │
│            (Pure HTTP Client - Transport Only)              │
│  • request(payload)      • submit_job(payload)              │
│  • job_status(id)        • job_results(id)                  │
│  • poll_for_results(id)  • submit_jobs_batch(payloads)      │
└─────────────────────────────────────────────────────────────┘

Extractor Factory Pattern

All retailers use the same code path through the factory:

# In PriceChecker#process_result - unified for ALL retailers
extractor = Retailer::Extractors::Factory.for(catalog_item.catalog)
extractor.extract(check, content)

# Validate product identity (prevents false positives from redirects)
unless extractor.validate_product_identity(check, content, catalog_item)
  check.price = nil  # Clear extracted data on mismatch
end

Each extractor builds its own request payload and parses its own response; the API client handles transport only:

# Get the right extractor for a catalog
extractor = Retailer::Extractors::Factory.for(catalog)

# Build payload (class method - knows retailer API specifics)
payload = extractor.class.build_payload(url: product_url)

# Make request (API client - knows HTTP)
api = Retailer::OxylabsApi.new
result = api.request(payload)

# Extract data (instance method - knows HTML structure)
extractor.extract(check, Retailer::OxylabsApi.html_content(result))
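
The lookup behind the factory can be pictured as a registry with a fallback. This is a self-contained sketch with stand-in classes. The real Factory takes a catalog and uses a case statement, so the hash registry and the class names here are assumptions made for illustration. Catalog IDs 1 and 22 are the CatalogConstants values for Home Depot USA and Rona CA listed under Configuration.

```ruby
# Stand-in classes so the sketch runs on its own.
class GenericExtractor; end
class HomeDepotExtractor < GenericExtractor; end
class RonaExtractor < GenericExtractor; end

module ExtractorFactory
  REGISTRY = {
    1 => HomeDepotExtractor, # CatalogConstants::HOME_DEPOT_USA
    22 => RonaExtractor      # CatalogConstants::RONA_CA
  }.freeze

  # Unknown catalogs fall back to the generic extractor.
  def self.for(catalog_id)
    REGISTRY.fetch(catalog_id, GenericExtractor).new
  end
end

ExtractorFactory.for(22).class  # RonaExtractor
ExtractorFactory.for(999).class # GenericExtractor
```

Because callers only ever ask the factory for an extractor, adding a retailer does not touch the orchestration code.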

Oxylabs Integration

Oxylabs is called in one of two ways, depending on the use case:

Method     Use Case                     Behavior
Realtime   Single "Probe Now" requests  Synchronous, blocks until result
Push-Pull  Daily batch processing       Asynchronous; results arrive by webhook callback (polling in development)

Webhook Callback (Production)

POST https://api.warmlyyours.com/v1/oxylabs/results?token=<jwt>

The callback is authenticated with a time-limited JWT (valid for 24 hours):

  • Generated by Retailer::CallbackTokenService
  • Embedded in callback URL when submitting batch jobs
  • Validated by Api::V1::Oxylabs::WebhooksController
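
A stdlib-only sketch of how a 24-hour callback token could be issued and checked. CallbackToken and its methods are hypothetical stand-ins for Retailer::CallbackTokenService, whose actual interface is not shown in this document. The secret is passed in explicitly here, and the purpose claim is invented for the example.

```ruby
require 'openssl'
require 'base64'
require 'json'

module CallbackToken
  TTL_SECONDS = 24 * 60 * 60

  module_function

  # Builds an HS256-signed JWT whose exp claim is 24 hours after `now`.
  def generate(secret, now: Time.now)
    header = encode({ alg: 'HS256', typ: 'JWT' })
    payload = encode({ purpose: 'oxylabs_callback', exp: now.to_i + TTL_SECONDS })
    signing_input = "#{header}.#{payload}"
    "#{signing_input}.#{sign(signing_input, secret)}"
  end

  # True only when the signature matches and the token has not expired.
  def valid?(token, secret, now: Time.now)
    header, payload, signature = token.to_s.split('.')
    return false unless header && payload && signature
    return false unless OpenSSL.secure_compare(sign("#{header}.#{payload}", secret), signature)

    claims = JSON.parse(Base64.urlsafe_decode64(payload))
    claims['exp'].to_i > now.to_i
  rescue JSON::ParserError, ArgumentError
    false
  end

  def encode(hash)
    Base64.urlsafe_encode64(JSON.generate(hash), padding: false)
  end

  def sign(input, secret)
    Base64.urlsafe_encode64(OpenSSL::HMAC.digest('SHA256', secret, input), padding: false)
  end
end

token = CallbackToken.generate('example-secret')
CallbackToken.valid?(token, 'example-secret')
```

In the application a JWT library would normally replace the hand-rolled encoding; the sketch only shows the signature and expiry checks that make the callback URL safe to expose.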

Development vs Production

Environment  Batch Method      Reason
Development  Polling           Local endpoints not reachable from Oxylabs
Production   Webhook callback  More efficient, no polling overhead
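
The development polling path can be sketched as a bounded loop. The function below takes any client that responds to job_status(id) and job_results(id), which are method names from the OxylabsApi box in the architecture diagram. The status strings ('pending', 'done', 'faulted'), the defaults, and the FakeClient class are assumptions for illustration.

```ruby
# Polls until the job finishes, fails, or the attempt budget runs out.
def poll_for_results(client, job_id, interval: 2, max_attempts: 30, sleeper: method(:sleep))
  max_attempts.times do
    case client.job_status(job_id)
    when 'done' then return client.job_results(job_id)
    when 'faulted' then return nil
    end
    sleeper.call(interval)
  end
  nil # gave up; the caller decides whether to retry later
end

# Fake client that reports "pending" twice before finishing.
class FakeClient
  def initialize
    @calls = 0
  end

  def job_status(_id)
    @calls += 1
    @calls < 3 ? 'pending' : 'done'
  end

  def job_results(id)
    { 'job_id' => id, 'content' => '<html></html>' }
  end
end

poll_for_results(FakeClient.new, 'job-1', interval: 0)
```

Injecting the sleeper keeps the loop testable without real delays, and the attempt cap prevents a stuck job from blocking a worker indefinitely.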

Database Schema

catalog_item_retailer_probes

Column             Type       Description
catalog_item_id    bigint     Foreign key to catalog_items
status             string     success, failed, not_found, pending
url                string     URL that was checked
price              decimal    Current/sale price extracted
regular_price      decimal    Original/was price (if on sale)
currency           string     Currency code (USD, CAD)
product_available  boolean    Availability status
page_accessible    boolean    Whether page loaded successfully
error_message      text       Error details if failed
raw_title          string     Product title from page
scraper_source     string     amazon, home_depot, wayfair, etc.
geo_location       string     ZIP code used for check
created_at         timestamp  When check was performed

catalogs (new columns)

Column                        Type     Default  Description
retailer_type                 integer  0        0=marketplace, 1=vendor
map_percentage                decimal  0.80     MAP as percentage of MSRP
external_price_check_enabled  boolean  false    Enable retailer probes

catalog_items

Column        Description
retail_price  Stores last successfully pulled retailer price
url           Stored product URL (discovered or manual)

Services & Classes

Core Services

Service                           Purpose
Retailer::OxylabsApi              Pure HTTP client for Oxylabs API
Retailer::PriceChecker            Single-item price checks (realtime)
Retailer::BatchPriceChecker       Batch processing (push-pull)
Retailer::UrlConstructor          Builds product URLs per retailer
Retailer::CallbackTokenService    JWT token generation/validation
Retailer::WebhookResultProcessor  Processes webhook payloads

Extractor Classes

Extractor                            Key Features
Retailer::Extractors::Base           Shared Nokogiri parsing, JSON-LD extraction, validate_product_identity
Retailer::Extractors::Factory        Returns correct extractor for catalog ID
Retailer::Extractors::Amazon         Handles JSON from amazon_product, ASIN validation override
Retailer::Extractors::HomeDepot      data-automation + JSON-LD
Retailer::Extractors::Costco         aria-labels + JSON-LD
Retailer::Extractors::Wayfair        data-test-id selectors, URL discovery
Retailer::Extractors::Walmart        __NEXT_DATA__ parsing + JSON-LD fallback
Retailer::Extractors::Lowes          JSON-LD + CSS selectors
Retailer::Extractors::Rona           Browser automation + URL discovery
Retailer::Extractors::BuildCom       JSON-LD extraction, search page URL discovery
Retailer::Extractors::CanadianTire   JSON-LD + CSS selectors
Retailer::Extractors::Houzz          JSON-LD extraction
Retailer::Extractors::BestbuyCanada  JSON-LD extraction
Retailer::Extractors::Generic        Fallback for unknown retailers

Product Identity Validation

All extractors inherit validate_product_identity from the Base class. It prevents false positives when a retailer redirects to a different product:

# In Retailer::Extractors::Base
def validate_product_identity(check, content, catalog_item)
  identifiers = collect_product_identifiers(catalog_item)
  # Checks SKU, UPC, third_party_part_number, third_party_sku, parent_sku
  # Returns false if none found in page content or URL
end

Amazon overrides this for direct ASIN comparison:

# In Retailer::Extractors::Amazon
def validate_product_identity(check, content, catalog_item)
  if content.is_a?(Hash) && content['asin'].present?
    return content['asin'] == catalog_item.amazon_asin
  end
  super  # Fall back to base class for HTML content
end
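
The base-class check leaves its matching step as comments. One way that step could work is a case-insensitive substring search over the page content and URL, sketched below. The Item struct, the method name product_identity_matches?, and the sample identifiers are made up for the example, and only three of the identifier fields named above are modelled.

```ruby
# Hypothetical stand-in for a catalog item.
Item = Struct.new(:sku, :upc, :third_party_part_number, keyword_init: true)

# Gathers the non-blank identifiers that could appear on a product page.
def collect_product_identifiers(item)
  [item.sku, item.upc, item.third_party_part_number]
    .compact.map { |value| value.to_s.strip }.reject(&:empty?).uniq
end

# True when at least one identifier appears in the page content or the URL.
def product_identity_matches?(content, url, item)
  haystack = "#{content} #{url}".downcase
  collect_product_identifiers(item).any? { |id| haystack.include?(id.downcase) }
end

item = Item.new(sku: 'ABC-123', upc: '000000000000')
product_identity_matches?('<h1>Floor mat abc-123</h1>', 'https://example.com/p/1', item)
product_identity_matches?('<h1>Some other product</h1>', 'https://example.com/p/2', item)
```

Substring matching can produce false matches for very short identifiers, so a stricter version might require word boundaries or a minimum identifier length.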

Workers

Worker               Schedule    Purpose
RetailerProbeWorker  Daily       Batch checks all enabled catalogs
OxylabsResultWorker  On webhook  Processes individual webhook results

Configuration

Oxylabs credentials are stored in Rails credentials and read through Heatwave::Configuration:

Heatwave::Configuration.fetch(:oxylabs, :api_username)
Heatwave::Configuration.fetch(:oxylabs, :api_password)

Catalog IDs are defined in CatalogConstants:

CatalogConstants::HOME_DEPOT_USA      # => 1
CatalogConstants::AMAZON_SELLER_USA   # => 5
CatalogConstants::RONA_CA             # => 22
# etc.

Usage Examples

Trigger Manual Check

# Via worker (recommended)
RetailerProbeWorker.perform_async(catalog_item_id: 12345)

# Via service directly
checker = Retailer::PriceChecker.new
result = checker.check(CatalogItem.find(12345))

Check Entire Catalog

RetailerProbeWorker.perform_async(catalog_id: 18)

Check All Retailers

RetailerProbeWorker.perform_async

Find MAP Violations

# Via scope
ViewProductCatalog.vendor_catalogs.map_violations

# Via search
search = ProductCatalogSearch.create!(
  query_params: { map_violation_eq: true, catalog_item_state_in: ['active'] }
)

Rona URL Discovery

Rona's search results are rendered with JavaScript, so product URLs have to be discovered before items can be probed:

# Discover and save URLs for all Rona items without URLs
Retailer::Extractors::Rona.seed_catalog_urls

# Or via rake task
bundle exec rake retailer:discover_rona_urls

Security

  • Webhook endpoint requires valid JWT token
  • Tokens expire after 24 hours
  • Tokens are signed with Rails.application.secret_key_base
  • Invalid tokens return 401 Unauthorized

Adding a New Retailer

  1. Create extractor class in app/services/retailer/extractors/:
class Retailer::Extractors::NewRetailer < Retailer::Extractors::Base
  def self.build_payload(url:)
    { source: 'universal', url: url, render: 'html' }
  end

  def extract(check, content)
    return unless valid_html?(content)

    check.scraper_source = source_name
    check.currency = 'USD'  # or determine from catalog

    doc = parse_html(content)

    # Try JSON-LD first (most reliable)
    extract_json_ld_price(check, doc)

    # Fallback: retailer-specific selectors
    if check.price.blank?
      selectors = ['.price', '[data-price]', '[itemprop="price"]']
      extract_price_from_selectors(check, doc, selectors)
    end

    check.product_available = check_availability(content)
    check.raw_title = extract_title(doc)
  end
end
  2. Add to Factory in app/services/retailer/extractors/factory.rb:
def extractor_class(catalog_id)
  case catalog_id
  when NEW_RETAILER_CATALOG_ID
    Retailer::Extractors::NewRetailer
  # ... other cases
  end
end
  3. Add catalog constant in app/models/concerns/catalog_constants.rb if needed

  4. Add to CATALOG_RETAILER_TYPES in Retailer::PriceChecker for fetch method routing

  5. Add URL pattern to Retailer::UrlConstructor if retailer has predictable URL structure

  6. Override validate_product_identity if retailer returns data in a special format (like Amazon's JSON)