Class: Retailer::OxylabsApi

Inherits:
Object
Defined in:
app/services/retailer/oxylabs_api.rb

Overview

Oxylabs Web Scraper API client: a pure HTTP client for the Oxylabs API.
All retailer-specific logic lives in Retailer::Extractors::* classes.

This class handles:

  • Realtime (synchronous) requests
  • Push-Pull (async) batch processing
  • Job polling and result retrieval

Examples:

Basic synchronous request

api = Retailer::OxylabsApi.new
result = api.request(source: 'universal', url: 'https://example.com', render: 'html')

Async batch processing

api = Retailer::OxylabsApi.new
job = api.submit_job(payload)
result = api.poll_for_results(job.job_id)

Defined Under Namespace

Classes: JobResult, Result

Constant Summary

REALTIME_ENDPOINT =

Realtime endpoint (synchronous - waits for result)

'https://realtime.oxylabs.io/v1/queries'

ASYNC_ENDPOINT =

Push-Pull endpoint (asynchronous - submit jobs, retrieve later)

'https://data.oxylabs.io/v1/queries'

ASYNC_RESULTS_ENDPOINT =

Base URL for Push-Pull results; '/job_id/results' is appended per job

'https://data.oxylabs.io/v1/queries'

DEFAULT_TIMEOUT =

Default request timeout, in seconds

60

BATCH_TIMEOUT =

Timeout in seconds for batch submissions

30
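
Because ASYNC_ENDPOINT and ASYNC_RESULTS_ENDPOINT share the same base URL, the difference between a status check and a result fetch lies only in the path suffix. The sketch below mirrors the string interpolation used in #job_status and #job_results. The helper names status_url and results_url and the job ID are hypothetical, used only for illustration.

```ruby
# Constant values copied from this page.
ASYNC_ENDPOINT = 'https://data.oxylabs.io/v1/queries'
ASYNC_RESULTS_ENDPOINT = 'https://data.oxylabs.io/v1/queries'

# Hypothetical helper: the URL that #job_status sends its GET to.
def status_url(job_id)
  "#{ASYNC_ENDPOINT}/#{job_id}"
end

# Hypothetical helper: the URL that #job_results sends its GET to.
def results_url(job_id)
  "#{ASYNC_RESULTS_ENDPOINT}/#{job_id}/results"
end

puts status_url('12345')  # '12345' is a made-up job ID
puts results_url('12345')
```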

Constructor Details

#initialize(options = {}) ⇒ OxylabsApi

Returns a new instance of OxylabsApi.



# File 'app/services/retailer/oxylabs_api.rb', line 48

def initialize(options = {})
  @username = options[:username] || Heatwave::Configuration.fetch(:oxylabs, :api_username)
  @password = options[:password] || Heatwave::Configuration.fetch(:oxylabs, :api_password)
  @timeout = options[:timeout] || DEFAULT_TIMEOUT
  @logger = options[:logger] || Rails.logger
end
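
The constructor resolves each setting with `||`, so any option that is omitted (or passed as nil or false) falls back to the configured value or the default. The following is a minimal sketch of that pattern only. OptionsSketch and FAKE_CONFIG are hypothetical stand-ins for the real class and for Heatwave::Configuration, and the credential strings are made up.

```ruby
# Stand-in reduced to the option-fallback pattern of #initialize.
class OptionsSketch
  DEFAULT_TIMEOUT = 60
  # Hypothetical replacement for Heatwave::Configuration.fetch(:oxylabs, ...).
  FAKE_CONFIG = { api_username: 'config-user', api_password: 'config-pass' }.freeze

  attr_reader :username, :password, :timeout

  def initialize(options = {})
    @username = options[:username] || FAKE_CONFIG.fetch(:api_username)
    @password = options[:password] || FAKE_CONFIG.fetch(:api_password)
    @timeout  = options[:timeout] || DEFAULT_TIMEOUT
  end
end

default_client = OptionsSketch.new
custom_client  = OptionsSketch.new(username: 'override', timeout: 10)

puts default_client.timeout  # falls back to DEFAULT_TIMEOUT
puts custom_client.username  # explicit option wins
puts custom_client.password  # not given, so taken from the config stand-in
```

With the real class, the equivalent call would look like `Retailer::OxylabsApi.new(timeout: 10)`.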

Class Method Details

.html_content(result) ⇒ String?

Extract the HTML content from a result

Parameters:

  • result (Result)

    API result

Returns:

  • (String, nil)

    Content as a String, or nil when the result is not successful


# File 'app/services/retailer/oxylabs_api.rb', line 258

def self.html_content(result)
  return nil unless result.success?

  result.data&.first&.dig('content').to_s
end
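
One detail of the method body is easy to miss: because of the trailing `to_s`, a successful result with no content yields an empty string rather than nil. The sketch below illustrates this with a stand-in for the Result class. HtmlSketchResult is hypothetical and models only the two members the method touches; the method body is lifted out of the class unchanged.

```ruby
# Hypothetical stand-in for Retailer::OxylabsApi::Result (assumed shape).
HtmlSketchResult = Struct.new(:success, :data, keyword_init: true) do
  def success?
    success
  end
end

# Same body as .html_content, defined at top level for illustration.
def html_content(result)
  return nil unless result.success?

  result.data&.first&.dig('content').to_s
end

ok     = HtmlSketchResult.new(success: true, data: [{ 'content' => '<html></html>' }])
empty  = HtmlSketchResult.new(success: true, data: [])
failed = HtmlSketchResult.new(success: false, data: nil)

p html_content(ok)      # the HTML string
p html_content(empty)   # "" (nil.to_s), not nil
p html_content(failed)  # nil
```

Callers that need to distinguish "no content" from "request failed" can therefore check for nil first and an empty string second.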

.parsed_content(result) ⇒ Hash?

Extract parsed content from a result (when parse: true)

Parameters:

  • result (Result)

    API result

Returns:

  • (Hash, nil)

    Parsed content, or nil when the result is not successful or the content is not a Hash


# File 'app/services/retailer/oxylabs_api.rb', line 267

def self.parsed_content(result)
  return nil unless result.success?

  content = result.data&.first&.dig('content')
  content.is_a?(Hash) ? content : nil
end
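
The `is_a?(Hash)` guard means this method returns nil for unparsed (String) content even when the request succeeded. A minimal sketch of the three cases follows. ParsedSketchResult is a hypothetical stand-in for the Result class, and the 'title' key is made up rather than taken from the Oxylabs response format.

```ruby
# Hypothetical stand-in for Retailer::OxylabsApi::Result (assumed shape).
ParsedSketchResult = Struct.new(:success, :data, keyword_init: true) do
  def success?
    success
  end
end

# Same body as .parsed_content, defined at top level for illustration.
def parsed_content(result)
  return nil unless result.success?

  content = result.data&.first&.dig('content')
  content.is_a?(Hash) ? content : nil
end

parsed = ParsedSketchResult.new(success: true, data: [{ 'content' => { 'title' => 'Example' } }])
raw    = ParsedSketchResult.new(success: true, data: [{ 'content' => '<html></html>' }])
failed = ParsedSketchResult.new(success: false, data: nil)

p parsed_content(parsed)  # the Hash
p parsed_content(raw)     # nil, because the content is a String
p parsed_content(failed)  # nil
```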

Instance Method Details

#amazon_product(asin:, domain:, geo_location: nil, parse: false) ⇒ Result

Scrape Amazon product page by ASIN

Parameters:

  • asin (String)

    Amazon ASIN

  • domain (String)

    Amazon domain (e.g., 'amazon.com', 'amazon.ca')

  • geo_location (String, nil) (defaults to: nil)

    ZIP/postal code

  • parse (Boolean) (defaults to: false)

    Whether to use Oxylabs' built-in parser

Returns:

  • (Result)

# File 'app/services/retailer/oxylabs_api.rb', line 210

def amazon_product(asin:, domain:, geo_location: nil, parse: false)
  payload = {
    source: 'amazon_product',
    domain: domain,
    query: asin,
    parse: parse
  }
  payload[:geo_location] = geo_location if geo_location.present?
  request(payload)
end
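
To show what actually reaches #request, here is a sketch of the payload construction on its own. The helper name amazon_product_payload and the ASIN and postal code values are hypothetical. A strip-and-empty check stands in for ActiveSupport's `present?` so the sketch runs without Rails.

```ruby
require 'json'

# Hypothetical free-standing version of the payload built in #amazon_product.
def amazon_product_payload(asin:, domain:, geo_location: nil, parse: false)
  payload = { source: 'amazon_product', domain: domain, query: asin, parse: parse }
  # Approximates `geo_location.present?` for nil and String inputs.
  payload[:geo_location] = geo_location unless geo_location.to_s.strip.empty?
  payload
end

# 'B000000000' and 'M5V 2T6' are placeholder values.
puts amazon_product_payload(asin: 'B000000000', domain: 'amazon.com').to_json
puts amazon_product_payload(asin: 'B000000000', domain: 'amazon.ca',
                            geo_location: 'M5V 2T6', parse: true).to_json
```

The geo_location key is omitted entirely when no location is given, rather than being sent as null.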

#amazon_url(url:, render: true) ⇒ Result

Scrape any Amazon page by URL (search, category, bestsellers, product, etc.)
Uses Oxylabs' dedicated Amazon infrastructure without requiring an ASIN.

Parameters:

  • url (String)

    Full Amazon URL

  • render (Boolean) (defaults to: true)

    Whether to render JavaScript (default: true)

Returns:

  • (Result)

# File 'app/services/retailer/oxylabs_api.rb', line 226

def amazon_url(url:, render: true)
  request(
    source: 'amazon',
    url: url,
    render: render ? 'html' : nil
  )
end
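
The `render ? 'html' : nil` expression keeps the render key in the payload even when rendering is disabled, so it is serialized as a JSON null. The sketch below shows the resulting JSON and, for comparison, what Hash#compact would produce. The helper name amazon_url_payload and the search URL are hypothetical, and whether the API treats a null render value differently from an omitted one is not stated on this page.

```ruby
require 'json'

# Hypothetical helper reproducing the payload that #amazon_url passes to #request.
def amazon_url_payload(url:, render: true)
  { source: 'amazon', url: url, render: render ? 'html' : nil }
end

search_url = 'https://www.amazon.com/s?k=thermostat' # made-up example URL
rendered = amazon_url_payload(url: search_url)
plain    = amazon_url_payload(url: search_url, render: false)

puts rendered.to_json
puts plain.to_json          # render key is present with a null value
puts plain.compact.to_json  # Hash#compact drops nil values before serializing
```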

#costco_product(url:, geo_location: nil) ⇒ Result

Scrape Costco product page

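Parameters:

  • url (String)

    Product URL

  • geo_location (String, nil) (defaults to: nil)

    Optional location passed through to Retailer::Extractors::Costco.build_payload

Returns:

  • (Result)
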
# File 'app/services/retailer/oxylabs_api.rb', line 246

def costco_product(url:, geo_location: nil)
  payload = Retailer::Extractors::Costco.build_payload(url: url, geo_location: geo_location)
  request(payload)
end

#home_depot_product(url:, geo_location: nil) ⇒ Result

Scrape Home Depot product page

Parameters:

  • url (String)

    Product URL

  • geo_location (String, nil) (defaults to: nil)

    ZIP code

Returns:

  • (Result)

# File 'app/services/retailer/oxylabs_api.rb', line 238

def home_depot_product(url:, geo_location: nil)
  payload = Retailer::Extractors::HomeDepot.build_payload(url: url, geo_location: geo_location)
  request(payload)
end

#job_results(job_id) ⇒ Result

Retrieve job results (when status is 'done')

Parameters:

  • job_id (String)

    Job ID from submit_job

Returns:

  • (Result)

    Same format as realtime results



# File 'app/services/retailer/oxylabs_api.rb', line 147

def job_results(job_id)
  @logger.info "[Oxylabs] Retrieving results for job: #{job_id}"

  response = http_client
             .timeout(@timeout)
             .basic_auth(user: @username, pass: @password)
             .get("#{ASYNC_RESULTS_ENDPOINT}/#{job_id}/results")

  handle_response(response)
rescue HTTP::Error => e
  @logger.error "[Oxylabs] Results retrieval error: #{e.message}"
  Result.new(success: false, data: nil, error: e.message, status_code: nil, raw_response: nil)
end

#job_status(job_id) ⇒ JobResult

Check job status

Parameters:

  • job_id (String)

    Job ID from submit_job

Returns:

  • (JobResult)

# File 'app/services/retailer/oxylabs_api.rb', line 132

def job_status(job_id)
  response = http_client
             .timeout(BATCH_TIMEOUT)
             .basic_auth(user: @username, pass: @password)
             .get("#{ASYNC_ENDPOINT}/#{job_id}")

  handle_job_status(response, job_id)
rescue HTTP::Error => e
  @logger.error "[Oxylabs] Job status error: #{e.message}"
  JobResult.new(success: false, job_id: job_id, error: e.message, status: 'error')
end

#poll_for_results(job_id, max_attempts: 30, interval: 2) ⇒ Result

Poll for job completion and retrieve results

Parameters:

  • job_id (String)

    Job ID

  • max_attempts (Integer) (defaults to: 30)

    Maximum polling attempts (default: 30)

  • interval (Integer) (defaults to: 2)

    Seconds between polls (default: 2)

Returns:

  • (Result)

    Final results or error



# File 'app/services/retailer/oxylabs_api.rb', line 166

def poll_for_results(job_id, max_attempts: 30, interval: 2)
  attempts = 0

  loop do
    attempts += 1
    status = job_status(job_id)

    case status.status
    when 'done'
      return job_results(job_id)
    when 'faulted'
      return Result.new(success: false, data: nil, error: 'Job faulted', status_code: nil, raw_response: nil)
    when 'pending'
      return Result.new(success: false, data: nil, error: 'Polling timeout', status_code: nil, raw_response: nil) if attempts >= max_attempts

      sleep(interval)
    else
      return Result.new(success: false, data: nil, error: "Unknown status: #{status.status}", status_code: nil, raw_response: nil)
    end
  end
end
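
The timeout check runs before the sleep, so with the defaults the method sleeps at most 29 times (58 seconds in total) and gives up on the 30th 'pending' status. The sketch below replays the same control flow against a scripted list of statuses. PollingSketch is hypothetical: it records sleep time instead of waiting and returns simple values instead of Result objects.

```ruby
# Hypothetical stand-in that replays scripted statuses instead of calling the API.
class PollingSketch
  attr_reader :slept

  def initialize(statuses)
    @statuses = statuses
    @slept = 0
  end

  # Same branching as #poll_for_results.
  def poll(max_attempts: 30, interval: 2)
    attempts = 0
    loop do
      attempts += 1
      case next_status
      when 'done' then return :results
      when 'faulted' then return 'Job faulted'
      when 'pending'
        return 'Polling timeout' if attempts >= max_attempts

        @slept += interval # the real method calls sleep(interval) here
      else
        return 'Unknown status'
      end
    end
  end

  private

  # Repeats the last scripted status once the script runs out.
  def next_status
    @statuses.size > 1 ? @statuses.shift : @statuses.first
  end
end

quick = PollingSketch.new(%w[pending pending done])
p quick.poll   # :results
p quick.slept  # 4 (two sleeps of 2 seconds)

stuck = PollingSketch.new(%w[pending])
p stuck.poll   # "Polling timeout"
p stuck.slept  # 58 (29 sleeps of 2 seconds)
```

The real worst case is somewhat longer than 58 seconds, since each of the 30 status requests can itself take up to BATCH_TIMEOUT.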

#request(payload) ⇒ Result

Make a synchronous (realtime) request to the Oxylabs API.
Blocks until the result is returned. Best for single "Probe Now" requests.

Parameters:

  • payload (Hash)

    Request payload

Returns:

  • (Result)

# File 'app/services/retailer/oxylabs_api.rb', line 63

def request(payload)
  @logger.info "[Oxylabs] Realtime request: #{payload.except(:context).to_json}"

  response = http_client
             .timeout(@timeout)
             .basic_auth(user: @username, pass: @password)
             .post(REALTIME_ENDPOINT, json: payload)

  handle_response(response)
rescue HTTP::Error => e
  @logger.error "[Oxylabs] HTTP Error: #{e.message}"
  Result.new(success: false, data: nil, error: e.message, status_code: nil, raw_response: nil)
rescue StandardError => e
  @logger.error "[Oxylabs] Error: #{e.class} - #{e.message}"
  Result.new(success: false, data: nil, error: e.message, status_code: nil, raw_response: nil)
end
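
Because both rescue clauses return a Result, callers of #request can branch on `success?` instead of wrapping the call in their own begin/rescue. The sketch below reduces the method to that contract. RequestSketchResult and safe_request are hypothetical; the block stands in for the HTTP call, and the 200 status code on the success path is an assumption, since handle_response is not shown on this page.

```ruby
# Hypothetical stand-in with the same keyword members the source passes to Result.new.
RequestSketchResult = Struct.new(:success, :data, :error, :status_code, :raw_response,
                                 keyword_init: true) do
  def success?
    success
  end
end

# Reduction of #request: run the block, convert any StandardError into a failed result.
def safe_request
  data = yield
  # status_code: 200 is assumed; the real value comes from handle_response.
  RequestSketchResult.new(success: true, data: data, error: nil, status_code: 200, raw_response: nil)
rescue StandardError => e
  RequestSketchResult.new(success: false, data: nil, error: e.message, status_code: nil, raw_response: nil)
end

ok     = safe_request { [{ 'content' => '<html></html>' }] }
failed = safe_request { raise IOError, 'connection reset' }

puts ok.success?      # true
puts failed.success?  # false
puts failed.error     # connection reset
```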

#scrape_url(url:, render: true) ⇒ Result

Scrape any URL with universal source

Parameters:

  • url (String)

    URL to scrape

  • render (Boolean) (defaults to: true)

    Whether to render JavaScript

Returns:

  • (Result)

# File 'app/services/retailer/oxylabs_api.rb', line 196

def scrape_url(url:, render: true)
  request(
    source: 'universal',
    url: url,
    render: render ? 'html' : nil
  )
end

#submit_job(payload, callback_url: nil) ⇒ JobResult

Submit a job asynchronously (Push-Pull method).
Returns immediately with a job_id for later retrieval.
Best for batch processing many items.

Parameters:

  • payload (Hash)

    Request payload (same as realtime)

  • callback_url (String, nil) (defaults to: nil)

    Optional webhook URL for result delivery

Returns:

  • (JobResult)

# File 'app/services/retailer/oxylabs_api.rb', line 91

def submit_job(payload, callback_url: nil)
  # Add callback_url if provided (for webhook delivery instead of polling)
  payload = payload.merge(callback_url: callback_url) if callback_url.present?

  @logger.info "[Oxylabs] Submitting async job: #{payload.except(:context).to_json}"

  response = http_client
             .timeout(BATCH_TIMEOUT)
             .basic_auth(user: @username, pass: @password)
             .post(ASYNC_ENDPOINT, json: payload)

  handle_job_submission(response)
rescue HTTP::Error => e
  @logger.error "[Oxylabs] Async submit error: #{e.message}"
  JobResult.new(success: false, job_id: nil, error: e.message, status: 'error')
rescue StandardError => e
  @logger.error "[Oxylabs] Async error: #{e.class} - #{e.message}"
  JobResult.new(success: false, job_id: nil, error: e.message, status: 'error')
end
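
Since the method uses Hash#merge rather than merge!, the caller's payload is never modified when a callback_url is supplied. The sketch below isolates that first step. The helper name with_callback and the webhook URL are hypothetical, and a strip-and-empty check stands in for ActiveSupport's `present?`.

```ruby
# Hypothetical helper isolating the callback_url handling of #submit_job.
def with_callback(payload, callback_url: nil)
  # Approximates `callback_url.present?` for nil and String inputs.
  return payload if callback_url.to_s.strip.empty?

  payload.merge(callback_url: callback_url)
end

# Frozen to demonstrate that the original hash is left untouched.
original = { source: 'universal', url: 'https://example.com' }.freeze
merged   = with_callback(original, callback_url: 'https://hooks.example.test/oxylabs')

p original.key?(:callback_url)  # false
p merged[:callback_url]         # the webhook URL
p with_callback(original).equal?(original)  # true, same object when no callback is given
```

This makes it safe to reuse one payload hash across several submissions with different callbacks.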

#submit_jobs_batch(payloads) ⇒ Array<JobResult>

Submit multiple jobs in parallel.
Each payload may already contain its own callback_url for per-job tracking.

Parameters:

  • payloads (Array<Hash>)

    Array of request payloads (may include callback_url)

Returns:

  • (Array<JobResult>)

    Array of job results with job_ids



# File 'app/services/retailer/oxylabs_api.rb', line 115

def submit_jobs_batch(payloads)
  @logger.info "[Oxylabs] Submitting #{payloads.size} async jobs"

  # Submit all jobs in parallel using threads
  # callback_url is extracted from each payload if present
  threads = payloads.map do |payload|
    callback_url = payload[:callback_url]
    job_payload = payload.except(:callback_url)
    Thread.new { submit_job(job_payload, callback_url: callback_url) }
  end

  threads.map(&:value)
end
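
The method starts one thread per payload and collects results with Thread#value, which joins each thread in turn, so the returned array follows the order of the input regardless of which submission finishes first. The sketch below demonstrates that ordering. fake_submit_job and submit_batch_sketch are hypothetical stand-ins, the job IDs are fabricated from the URL, and Hash#reject replaces `except` so the sketch also runs on Ruby versions without Hash#except.

```ruby
# Hypothetical stand-in for #submit_job: random latency, fake job ID.
def fake_submit_job(payload, callback_url: nil)
  sleep(rand * 0.05) # simulate variable network latency
  { job_id: "job-for-#{payload[:url]}", callback_url: callback_url }
end

# Same structure as #submit_jobs_batch, using the stand-in above.
def submit_batch_sketch(payloads)
  threads = payloads.map do |payload|
    callback_url = payload[:callback_url]
    job_payload = payload.reject { |key, _| key == :callback_url }
    Thread.new { fake_submit_job(job_payload, callback_url: callback_url) }
  end

  # Thread#value waits for each thread and returns its block's result.
  threads.map(&:value)
end

payloads = [
  { source: 'universal', url: 'a.example' },
  { source: 'universal', url: 'b.example', callback_url: 'https://hooks.example.test/b' },
  { source: 'universal', url: 'c.example' }
]

submit_batch_sketch(payloads).each { |job| p job }
```

Note that nothing here caps the number of threads, so a very large batch starts an equally large number of concurrent submissions.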