Class: Retailer::OxylabsApi
- Inherits: Object
- Defined in: app/services/retailer/oxylabs_api.rb
Overview
Oxylabs Web Scraper API client - Pure HTTP client for Oxylabs API.
All retailer-specific logic lives in Retailer::Extractors::* classes.
This class handles:
- Realtime (synchronous) requests
- Push-Pull (async) batch processing
- Job polling and result retrieval
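A rough sketch of the payload shapes involved, based on the method details below (the URLs here are placeholders):

```ruby
# Realtime: POSTed to REALTIME_ENDPOINT, the call blocks until the result returns.
realtime_payload = { source: 'universal', url: 'https://www.example.com/item', render: 'html' }

# Push-Pull: the same payload goes to ASYNC_ENDPOINT, optionally with a webhook
# target so results are delivered via callback instead of polling.
async_payload = realtime_payload.merge(callback_url: 'https://app.example.com/oxylabs/callback')

puts async_payload.keys.inspect
```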
Defined Under Namespace
Classes: JobResult, Result
Constant Summary collapse
- REALTIME_ENDPOINT = 'https://realtime.oxylabs.io/v1/queries'
  Realtime endpoint (synchronous - waits for result)
- ASYNC_ENDPOINT = 'https://data.oxylabs.io/v1/queries'
  Push-Pull endpoint (asynchronous - submit jobs, retrieve later)
- ASYNC_RESULTS_ENDPOINT = 'https://data.oxylabs.io/v1/queries'
  /job_id/results is appended when retrieving results
- DEFAULT_TIMEOUT = 60
  seconds
- BATCH_TIMEOUT = 30
  seconds for batch submissions
Class Method Summary collapse
- .html_content(result) ⇒ String?
  Extract the HTML content from a result.
- .parsed_content(result) ⇒ Hash?
  Extract parsed content from a result (when parse: true).
Instance Method Summary collapse
- #amazon_product(asin:, domain:, geo_location: nil, parse: false) ⇒ Result
  Scrape an Amazon product page by ASIN.
- #amazon_url(url:, render: true) ⇒ Result
  Scrape any Amazon page by URL (search, category, bestsellers, product, etc.). Uses Oxylabs' dedicated Amazon infrastructure without requiring an ASIN.
- #costco_product(url:, geo_location: nil) ⇒ Result
  Scrape a Costco product page.
- #home_depot_product(url:, geo_location: nil) ⇒ Result
  Scrape a Home Depot product page.
- #initialize(options = {}) ⇒ OxylabsApi constructor
  A new instance of OxylabsApi.
- #job_results(job_id) ⇒ Result
  Retrieve job results (when status is 'done').
- #job_status(job_id) ⇒ JobResult
  Check job status.
- #poll_for_results(job_id, max_attempts: 30, interval: 2) ⇒ Result
  Poll for job completion and retrieve results.
- #request(payload) ⇒ Result
  Make a synchronous (realtime) request to the Oxylabs API. Blocks until the result is returned.
- #scrape_url(url:, render: true) ⇒ Result
  Scrape any URL with the universal source.
- #submit_job(payload, callback_url: nil) ⇒ JobResult
  Submit a job asynchronously (Push-Pull method). Returns immediately with a job_id for later retrieval.
- #submit_jobs_batch(payloads) ⇒ Array<JobResult>
  Submit multiple jobs in parallel. Each payload may already contain its own callback_url for per-job tracking.
Constructor Details
#initialize(options = {}) ⇒ OxylabsApi
Returns a new instance of OxylabsApi.
# File 'app/services/retailer/oxylabs_api.rb', line 48

def initialize(options = {})
  @username = options[:username] || Heatwave::Configuration.fetch(:oxylabs, :api_username)
  @password = options[:password] || Heatwave::Configuration.fetch(:oxylabs, :api_password)
  @timeout  = options[:timeout]  || DEFAULT_TIMEOUT
  @logger   = options[:logger]   || Rails.logger
end
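The constructor prefers explicit options and falls back to configured credentials. A minimal sketch of that `||` fallback, with a plain hash standing in for `Heatwave::Configuration` (the values here are hypothetical):

```ruby
# Stand-in for Heatwave::Configuration.fetch(:oxylabs, ...) values.
config = { api_username: 'configured-user', api_password: 'configured-pass' }

options  = { username: 'explicit-user', timeout: 120 }
username = options[:username] || config[:api_username]  # explicit option wins
password = options[:password] || config[:api_password]  # falls back to config
timeout  = options[:timeout]  || 60                     # DEFAULT_TIMEOUT fallback

puts [username, password, timeout].inspect
```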
Class Method Details
.html_content(result) ⇒ String?
Extract the HTML content from a result
# File 'app/services/retailer/oxylabs_api.rb', line 258

def self.html_content(result)
  return nil unless result.success?

  result.data&.first&.dig('content').to_s
end
.parsed_content(result) ⇒ Hash?
Extract parsed content from a result (when parse: true)
# File 'app/services/retailer/oxylabs_api.rb', line 267

def self.parsed_content(result)
  return nil unless result.success?

  content = result.data&.first&.dig('content')
  content.is_a?(Hash) ? content : nil
end
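A sketch of what these two helpers return for typical results, using a minimal Struct standing in for the client's Result (field names taken from the Result.new calls elsewhere on this page; the real struct may carry more fields):

```ruby
# Minimal stand-in for the client's Result struct.
Result = Struct.new(:success, :data, keyword_init: true) do
  def success?
    success
  end
end

# Without parse: true, 'content' holds raw HTML; with parse: true it is a Hash.
html_result   = Result.new(success: true, data: [{ 'content' => '<html>...</html>' }])
parsed_result = Result.new(success: true, data: [{ 'content' => { 'title' => 'Widget' } }])

# Same extraction logic as the class methods above.
html    = html_result.data&.first&.dig('content').to_s
content = parsed_result.data&.first&.dig('content')
parsed  = content.is_a?(Hash) ? content : nil

puts html
puts parsed['title']
```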
Instance Method Details
#amazon_product(asin:, domain:, geo_location: nil, parse: false) ⇒ Result
Scrape Amazon product page by ASIN
# File 'app/services/retailer/oxylabs_api.rb', line 210

def amazon_product(asin:, domain:, geo_location: nil, parse: false)
  payload = {
    source: 'amazon_product',
    domain: domain,
    query: asin,
    parse: parse
  }
  payload[:geo_location] = geo_location if geo_location.present?

  request(payload)
end
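The payload this builds for a sample call, with placeholder ASIN and geo_location values (`present?` from Rails is approximated with a plain-Ruby check):

```ruby
asin, domain, geo_location, parse = 'B000000000', 'com', '90210', true

payload = { source: 'amazon_product', domain: domain, query: asin, parse: parse }
# geo_location is only included when present, mirroring the method above.
payload[:geo_location] = geo_location if geo_location && !geo_location.empty?

puts payload.inspect
```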
#amazon_url(url:, render: true) ⇒ Result
Scrape any Amazon page by URL (search, category, bestsellers, product, etc.)
Uses Oxylabs' dedicated Amazon infrastructure without requiring an ASIN.
# File 'app/services/retailer/oxylabs_api.rb', line 226

def amazon_url(url:, render: true)
  request(
    source: 'amazon',
    url: url,
    render: render ? 'html' : nil
  )
end
#costco_product(url:, geo_location: nil) ⇒ Result
Scrape Costco product page
# File 'app/services/retailer/oxylabs_api.rb', line 246

def costco_product(url:, geo_location: nil)
  payload = Retailer::Extractors::Costco.build_payload(url: url, geo_location: geo_location)
  request(payload)
end
#home_depot_product(url:, geo_location: nil) ⇒ Result
Scrape Home Depot product page
# File 'app/services/retailer/oxylabs_api.rb', line 238

def home_depot_product(url:, geo_location: nil)
  payload = Retailer::Extractors::HomeDepot.build_payload(url: url, geo_location: geo_location)
  request(payload)
end
#job_results(job_id) ⇒ Result
Retrieve job results (when status is 'done')
# File 'app/services/retailer/oxylabs_api.rb', line 147

def job_results(job_id)
  @logger.info "[Oxylabs] Retrieving results for job: #{job_id}"

  response = http_client
             .timeout(@timeout)
             .basic_auth(user: @username, pass: @password)
             .get("#{ASYNC_RESULTS_ENDPOINT}/#{job_id}/results")

  handle_response(response)
rescue HTTP::Error => e
  @logger.error "[Oxylabs] Results retrieval error: #{e.message}"
  Result.new(success: false, data: nil, error: e.message, status_code: nil, raw_response: nil)
end
#job_status(job_id) ⇒ JobResult
Check job status
# File 'app/services/retailer/oxylabs_api.rb', line 132

def job_status(job_id)
  response = http_client
             .timeout(BATCH_TIMEOUT)
             .basic_auth(user: @username, pass: @password)
             .get("#{ASYNC_ENDPOINT}/#{job_id}")

  handle_job_status(response, job_id)
rescue HTTP::Error => e
  @logger.error "[Oxylabs] Job status error: #{e.message}"
  JobResult.new(success: false, job_id: job_id, error: e.message, status: 'error')
end
#poll_for_results(job_id, max_attempts: 30, interval: 2) ⇒ Result
Poll for job completion and retrieve results
# File 'app/services/retailer/oxylabs_api.rb', line 166

def poll_for_results(job_id, max_attempts: 30, interval: 2)
  attempts = 0

  loop do
    attempts += 1
    status = job_status(job_id)

    case status.status
    when 'done'
      return job_results(job_id)
    when 'faulted'
      return Result.new(success: false, data: nil, error: 'Job faulted', status_code: nil, raw_response: nil)
    when 'pending'
      return Result.new(success: false, data: nil, error: 'Polling timeout', status_code: nil, raw_response: nil) if attempts >= max_attempts

      sleep(interval)
    else
      return Result.new(success: false, data: nil, error: "Unknown status: #{status.status}", status_code: nil, raw_response: nil)
    end
  end
end
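A sketch of the loop's state machine, with a canned status sequence standing in for real job_status calls: 'pending' retries until 'done', 'faulted', or the attempt budget is exhausted.

```ruby
# Each element stands in for one job_status call's status string.
def poll(statuses, max_attempts: 30)
  attempts = 0
  loop do
    attempts += 1
    case statuses.shift
    when 'done'    then return :results    # real code calls job_results here
    when 'faulted' then return :faulted
    when 'pending'
      return :timeout if attempts >= max_attempts
      # sleep(interval) elided in this sketch
    else
      return :unknown
    end
  end
end

puts poll(%w[pending pending done])             # succeeds on the third attempt
puts poll(%w[pending pending], max_attempts: 2) # budget exhausted while pending
```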
#request(payload) ⇒ Result
Make a synchronous (realtime) request to the Oxylabs API
Blocks until result is returned. Best for single "Probe Now" requests.
# File 'app/services/retailer/oxylabs_api.rb', line 63

def request(payload)
  @logger.info "[Oxylabs] Realtime request: #{payload.except(:context).to_json}"

  response = http_client
             .timeout(@timeout)
             .basic_auth(user: @username, pass: @password)
             .post(REALTIME_ENDPOINT, json: payload)

  handle_response(response)
rescue HTTP::Error => e
  @logger.error "[Oxylabs] HTTP Error: #{e.message}"
  Result.new(success: false, data: nil, error: e.message, status_code: nil, raw_response: nil)
rescue StandardError => e
  @logger.error "[Oxylabs] Error: #{e.class} - #{e.message}"
  Result.new(success: false, data: nil, error: e.message, status_code: nil, raw_response: nil)
end
#scrape_url(url:, render: true) ⇒ Result
Scrape any URL with universal source
# File 'app/services/retailer/oxylabs_api.rb', line 196

def scrape_url(url:, render: true)
  request(
    source: 'universal',
    url: url,
    render: render ? 'html' : nil
  )
end
#submit_job(payload, callback_url: nil) ⇒ JobResult
Submit a job asynchronously (Push-Pull method)
Returns immediately with a job_id for later retrieval.
Best for batch processing many items.
# File 'app/services/retailer/oxylabs_api.rb', line 91

def submit_job(payload, callback_url: nil)
  # Add callback_url if provided (for webhook delivery instead of polling)
  payload = payload.merge(callback_url: callback_url) if callback_url.present?

  @logger.info "[Oxylabs] Submitting async job: #{payload.except(:context).to_json}"

  response = http_client
             .timeout(BATCH_TIMEOUT)
             .basic_auth(user: @username, pass: @password)
             .post(ASYNC_ENDPOINT, json: payload)

  handle_job_submission(response)
rescue HTTP::Error => e
  @logger.error "[Oxylabs] Async submit error: #{e.message}"
  JobResult.new(success: false, job_id: nil, error: e.message, status: 'error')
rescue StandardError => e
  @logger.error "[Oxylabs] Async error: #{e.class} - #{e.message}"
  JobResult.new(success: false, job_id: nil, error: e.message, status: 'error')
end
#submit_jobs_batch(payloads) ⇒ Array<JobResult>
Submit multiple jobs in parallel
Each payload may already contain its own callback_url for per-job tracking
# File 'app/services/retailer/oxylabs_api.rb', line 115

def submit_jobs_batch(payloads)
  @logger.info "[Oxylabs] Submitting #{payloads.size} async jobs"

  # Submit all jobs in parallel using threads
  # callback_url is extracted from each payload if present
  threads = payloads.map do |payload|
    callback_url = payload[:callback_url]
    job_payload = payload.except(:callback_url)
    Thread.new { submit_job(job_payload, callback_url: callback_url) }
  end

  threads.map(&:value)
end
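The fan-out can be sketched with plain threads; the string each thread returns stands in for the JobResult that submit_job would produce, and the URLs are placeholders:

```ruby
payloads = [
  { source: 'universal', url: 'https://example.com/a', callback_url: 'https://app.example.com/cb/1' },
  { source: 'universal', url: 'https://example.com/b' }
]

# One thread per payload, mirroring submit_jobs_batch: callback_url is split
# out and passed separately; Thread#value joins each thread and returns its result.
threads = payloads.map do |payload|
  callback_url = payload[:callback_url]
  job_payload  = payload.except(:callback_url)
  Thread.new { "submitted #{job_payload[:url]} (callback: #{callback_url.inspect})" }
end

results = threads.map(&:value)
puts results
```

Thread#value preserves submission order, so results line up with payloads even though the jobs run concurrently.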