Class: Tracking::Tracker
- Inherits: Object
  - Object
  - Tracking::Tracker
- Defined in: app/services/tracking/tracker.rb
Overview
Handles visitor tracking, visit creation, and source attribution.
Determines whether requests should be tracked and manages visit lifecycle.
Defined Under Namespace
Classes: TrackResult
Constant Summary
- VISIT_DURATION = 4.hours
  How long before a visit is considered new.
Class Method Summary
- .bot_request?(request) ⇒ Boolean
- .client_side_bot_signal?(sig) ⇒ Boolean
  Parse the client-side bot signature and determine if it indicates automation.
- .device_detector(user_agent, http_headers = {}) ⇒ Object
- .ensure_utf8(str) ⇒ Object
- .extract_cloudflare_geo(request) ⇒ Object
  Extract geo data from Cloudflare's visitor location headers. These are set by Cloudflare's "Add visitor location headers" Managed Transform: https://developers.cloudflare.com/rules/transform/managed-transforms/reference/#add-visitor-location-headers
- .find_source_from_request(params:, request: nil) ⇒ Source?
  Find source from request params, parsing landing_page URL for tracking params. This is used both for guest creation and visit tracking to ensure consistent source attribution.
- .gdpr_country?(request) ⇒ Boolean
  Is the user's country subject to GDPR policy?
- .geocode_visit(visit) ⇒ Object
- .parse_bot_signature(sig) ⇒ Hash?
  Parse a bot signature string into a hash of signal values.
- .protected_by_gdpr?(request) ⇒ Boolean
  Detect if the user is in a GDPR protected country and if they have requested do not track status.
- .testing_mode? ⇒ Boolean
  Check if we're in explicit testing mode (TRACK_VISITOR=y in development). This bypasses IP checks to allow local testing with simulated geo headers.
- .track_visit(party, request: nil, params: nil) ⇒ Object
- .track_visitor?(request: nil) ⇒ Boolean
- .warmlyyours_ip?(request) ⇒ Boolean
Class Method Details
.bot_request?(request) ⇒ Boolean
# File 'app/services/tracking/tracker.rb', line 331
def bot_request?(request)
  ua = request.user_agent.to_s

  # 1. Empty or suspiciously short user agent (real browsers always identify themselves)
  return true if ua.blank? || ua.length < 20

  # 2. Custom bot strings (internal tools and known crawlers)
  # http.rb is our HTTP library used by cache_warmer.rb
  custom_bots = %w[PrintNodeClient Intercom curl Cloudflare-Healthchecks RSiteAuditor]
  custom_bots << Cache::SiteCrawler::USER_AGENT
  return true if custom_bots.any? { |s| s.in?(ua) }

  # 3. DeviceDetector gem (known bot user-agent database + client hints)
  return true if device_detector(ua, request.headers).bot?

  # 4. Headless browsers and automation frameworks
  return true if ua.match?(/HeadlessChrome|PhantomJS|Selenium|puppeteer|playwright|crawl4ai/i)

  # 5 & 6: Header heuristics, only reliable for top-level navigation requests.
  # XHR/fetch calls legitimately send Accept: */* or application/json and may
  # omit Accept-Language, so we gate these checks on Sec-Fetch-Mode: navigate.
  # Sec-Fetch-Mode is sent by all modern browsers (Chrome 76+, Firefox 90+,
  # Safari 16.4+, Edge 79+); its absence alone is not proof of a bot.
  if request.env['HTTP_SEC_FETCH_MODE'] == 'navigate'
    # 5. Missing Accept-Language header (real browsers always send this on navigation)
    return true if request.env['HTTP_ACCEPT_LANGUAGE'].blank?

    # 6. Non-browser Accept header patterns
    # Real browsers send a rich Accept header with text/html; bots often send only */*
    accept = request.env['HTTP_ACCEPT'].to_s
    return true if accept.present? && !accept.include?('text/html') && !accept.include?('application/xhtml')
  end

  # 7. Client-side bot signature (sent by primeval.js with globals.json)
  # collectBotSignature() in primeval.js probes navigator.webdriver,
  # plugin count, viewport dimensions, etc. and encodes them as a
  # compact string like "w0p5o1l1c1r1t1". We only flag as bot when
  # the signal is definitive (e.g., webdriver=1 or multiple zero signals).
  if request.respond_to?(:params) && (bot_sig = request.params['bot_sig'].presence)
    return true if client_side_bot_signal?(bot_sig)
  end

  false
end
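The header heuristics in steps 5 and 6 can be tried outside Rails. The following is a minimal sketch, assuming a plain Hash stands in for request.env; navigation_header_bot? is a hypothetical helper name and is not part of Tracking::Tracker. It also treats a whitespace-only Accept header as present, which differs slightly from ActiveSupport's present?.

```ruby
# Hypothetical stand-alone version of checks 5 and 6 from bot_request?.
# `env` is a plain Hash standing in for request.env (an assumption).
def navigation_header_bot?(env)
  # Only top-level navigations are checked; XHR/fetch requests are skipped.
  return false unless env['HTTP_SEC_FETCH_MODE'] == 'navigate'

  # 5. A navigation without Accept-Language is flagged.
  return true if env['HTTP_ACCEPT_LANGUAGE'].to_s.strip.empty?

  # 6. A non-empty Accept header without an HTML type is flagged.
  accept = env['HTTP_ACCEPT'].to_s
  !accept.empty? && !accept.include?('text/html') && !accept.include?('application/xhtml')
end

browser = {
  'HTTP_SEC_FETCH_MODE' => 'navigate',
  'HTTP_ACCEPT_LANGUAGE' => 'en-US',
  'HTTP_ACCEPT' => 'text/html,application/xhtml+xml'
}
script = browser.merge('HTTP_ACCEPT' => '*/*')
xhr = { 'HTTP_SEC_FETCH_MODE' => 'cors', 'HTTP_ACCEPT' => '*/*' }

puts navigation_header_bot?(browser) # false
puts navigation_header_bot?(script)  # true
puts navigation_header_bot?(xhr)     # false
```

The xhr case returns false even though it sends Accept: */* and no Accept-Language, which is the reason the checks are gated on Sec-Fetch-Mode.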
.client_side_bot_signal?(sig) ⇒ Boolean
Parse the client-side bot signature and determine if it indicates automation.
The signature format is "w<0|1>p<0-9>o<0|1>l<0|1>c<0|1>r<0|1>t<0|1>" where:
w = navigator.webdriver (1 = Selenium/Puppeteer/Playwright)
p = navigator.plugins.length (0 = headless, capped at 9)
o = outerWidth/Height > 0 (0 = headless viewport)
l = navigator.languages (0 = empty, some bots)
c = window.chrome present (0 = may be absent in headless Chromium)
r = navigator.permissions (0 = absent in some automation)
t = timing resolution (0 = coarse/faked timer)
Returns true only for high-confidence bot signals to avoid false positives:
- webdriver=1 is definitive (only set by automation frameworks)
- 3+ zero signals together strongly indicate headless/automation
# File 'app/services/tracking/tracker.rb', line 393
def client_side_bot_signal?(sig)
  return false unless sig.is_a?(String) && sig.match?(/\Aw\dp\do\dl\dc\dr\dt\d\z/)

  parsed = parse_bot_signature(sig)
  return false unless parsed

  # Definitive: navigator.webdriver is only true under automation
  return true if parsed[:w] == 1

  # Heuristic: multiple missing signals = likely headless/automation
  # Count how many probes returned 0 (excluding :c, which is legitimately
  # 0 on Firefox/Safari; :p is a count, so 0 means no plugins)
  zero_count = 0
  zero_count += 1 if parsed[:o] == 0 # no outer viewport
  zero_count += 1 if parsed[:l] == 0 # no languages
  zero_count += 1 if parsed[:r] == 0 # no permissions API
  zero_count += 1 if parsed[:t] == 0 # coarse timer
  zero_count += 1 if parsed[:p] == 0 # no plugins

  # 3+ zero signals is suspicious enough to flag
  zero_count >= 3
end
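To see how the two rules play out on concrete signatures, here is a self-contained sketch of the same decision logic. The method name automation_signal? and the sample signatures are illustrative assumptions; the listing above remains the reference.

```ruby
# Stand-alone sketch of the decision rule; names here are illustrative.
def automation_signal?(sig)
  format = /\Aw(\d)p(\d)o(\d)l(\d)c(\d)r(\d)t(\d)\z/
  match = sig.is_a?(String) ? format.match(sig) : nil
  return false unless match

  webdriver, plugins, outer, languages, _chrome, permissions, timer =
    match.captures.map(&:to_i)

  # navigator.webdriver set: treated as definitive.
  return true if webdriver == 1

  # Otherwise require three or more zero probes (window.chrome is ignored).
  [plugins, outer, languages, permissions, timer].count(&:zero?) >= 3
end

puts automation_signal?('w0p5o1l1c1r1t1') # false (all probes look normal)
puts automation_signal?('w1p5o1l1c1r1t1') # true  (webdriver flag set)
puts automation_signal?('w0p0o0l0c1r1t1') # true  (three zero probes)
puts automation_signal?('w0p0o0l1c0r1t1') # false (two zero probes; c is not counted)
puts automation_signal?('not-a-signature') # false (format mismatch)
```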
.device_detector(user_agent, http_headers = {}) ⇒ Object
# File 'app/services/tracking/tracker.rb', line 317
def device_detector(user_agent, http_headers = {})
  require 'device_detector'
  # Ensure all header values are strings to prevent nil gsub errors in device_detector
  # Convert ActionDispatch::Http::Headers to a regular hash and transform values
  headers_hash = http_headers.respond_to?(:to_h) ? http_headers.to_h : (http_headers || {})
  safe_headers = headers_hash.transform_values { |v| v.nil? ? '' : v.to_s }
  DeviceDetector.new(ensure_utf8(user_agent.to_s), safe_headers)
end
.ensure_utf8(str) ⇒ Object
# File 'app/services/tracking/tracker.rb', line 461
def ensure_utf8(str)
  return if str.blank?

  str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end
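The encode call treats the input as raw bytes, so its effect is easy to misread. A minimal sketch of the behavior, assuming a nil/empty check in place of Rails' blank? so it runs without ActiveSupport; ensure_utf8_sketch is a hypothetical name:

```ruby
# Stand-alone approximation of ensure_utf8; Rails' `blank?` is replaced
# with a nil/empty check (an assumption made to avoid ActiveSupport).
def ensure_utf8_sketch(str)
  return if str.nil? || str.empty?

  # The source encoding is declared as binary, so every byte >= 0x80 has
  # no mapping to UTF-8; `undef: :replace` with replace: '' drops it.
  str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

clean = ensure_utf8_sketch("Mozilla/5.0 caf\xC3\xA9 \xFF")
puts clean.inspect         # "Mozilla/5.0 caf "
puts clean.encoding        # UTF-8
puts clean.valid_encoding? # true
```

Note that this conversion removes valid multibyte characters as well as invalid bytes, so the result is always ASCII-only text labeled as UTF-8.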
.extract_cloudflare_geo(request) ⇒ Object
Extract geo data from Cloudflare's visitor location headers.
These are set by Cloudflare's "Add visitor location headers" Managed Transform:
https://developers.cloudflare.com/rules/transform/managed-transforms/reference/#add-visitor-location-headers
# File 'app/services/tracking/tracker.rb', line 260
def extract_cloudflare_geo(request)
  return {} unless request

  geo = {}

  # Country code (e.g., "US", "CA", "GB")
  if (country_code = request.env['HTTP_CF_IPCOUNTRY'].presence)
    # Convert country code to full name for consistency with existing data
    country = ISO3166::Country[country_code]
    geo[:country] = country&.common_name || country&.iso_short_name || country_code
  end

  # Region/State (e.g., "California", "Ontario", "Quebec")
  geo[:region] = request.env['HTTP_CF_REGION'].presence

  # Region code (e.g., "CA", "QC", "ON", "TX") - useful for consent logic
  geo[:region_code] = request.env['HTTP_CF_REGION_CODE'].presence

  # City (e.g., "San Francisco", "Toronto")
  geo[:city] = request.env['HTTP_CF_IPCITY'].presence

  # Postal code
  geo[:postal_code] = request.env['HTTP_CF_POSTAL_CODE'].presence

  # Coordinates
  if (lat = request.env['HTTP_CF_IPLATITUDE'].presence)
    geo[:latitude] = lat.to_f
  end
  if (lon = request.env['HTTP_CF_IPLONGITUDE'].presence)
    geo[:longitude] = lon.to_f
  end

  Rails.logger.debug { "[Tracker] Cloudflare geo: #{geo.inspect}" } if geo.present?
  geo
end
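For local testing it can help to see which env keys map to which geo fields. The sketch below uses made-up header values and a plain Hash in place of request.env. It keeps the raw country code instead of resolving the full name through ISO3166::Country, and the presence lambda is a plain-Ruby stand-in for the ActiveSupport method.

```ruby
# Hypothetical sample of the Rack env keys read by extract_cloudflare_geo.
# All values are made up for illustration.
sample_env = {
  'HTTP_CF_IPCOUNTRY' => 'CA',
  'HTTP_CF_REGION' => 'Ontario',
  'HTTP_CF_REGION_CODE' => 'ON',
  'HTTP_CF_IPCITY' => 'Toronto',
  'HTTP_CF_POSTAL_CODE' => '',
  'HTTP_CF_IPLATITUDE' => '43.65',
  'HTTP_CF_IPLONGITUDE' => '-79.38'
}

# Plain-Ruby stand-in for ActiveSupport's `presence`: blank values become nil.
presence = ->(value) { value.to_s.strip.empty? ? nil : value }

geo = {
  country_code: presence.call(sample_env['HTTP_CF_IPCOUNTRY']), # the real method stores a full name under :country
  region: presence.call(sample_env['HTTP_CF_REGION']),
  region_code: presence.call(sample_env['HTTP_CF_REGION_CODE']),
  city: presence.call(sample_env['HTTP_CF_IPCITY']),
  postal_code: presence.call(sample_env['HTTP_CF_POSTAL_CODE'])
}

# Coordinates arrive as strings and are only set when the header is present.
lat = presence.call(sample_env['HTTP_CF_IPLATITUDE'])
lon = presence.call(sample_env['HTTP_CF_IPLONGITUDE'])
geo[:latitude] = lat.to_f if lat
geo[:longitude] = lon.to_f if lon

puts geo.inspect
```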
.find_source_from_request(params:, request: nil) ⇒ Source?
Find source from request params, parsing landing_page URL for tracking params.
This is used both for guest creation and visit tracking to ensure consistent source attribution.
# File 'app/services/tracking/tracker.rb', line 85
def find_source_from_request(params:, request: nil)
  source_lookup_params = params.to_h.with_indifferent_access

  # Parse landing_page URL to extract tracking params (referral_code, utm_*, etc.)
  if source_lookup_params[:landing_page].present?
    begin
      landing_uri = Addressable::URI.parse(source_lookup_params[:landing_page])
      if landing_uri&.query
        landing_params = Rack::Utils.parse_nested_query(landing_uri.query)
        # Merge landing page query params, but don't override existing top-level params
        source_lookup_params = landing_params.with_indifferent_access.merge(source_lookup_params)
      end
    rescue StandardError => e
      Rails.logger.warn "[Tracker.find_source_from_request] Failed to parse landing_page URL: #{e.}"
    end
  end

  # Use params[:referrer] if available, otherwise fall back to request.referer
  referrer = source_lookup_params[:referrer].presence || request&.referer

  Source.find_from_params(source_lookup_params, referrer: referrer)
end
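The merge direction decides which value wins when a key appears both in the landing_page query string and in the top-level params. A minimal sketch of that precedence, assuming Ruby's standard URI library in place of Addressable and Rack; merge_landing_params and the example URL are hypothetical:

```ruby
require 'uri'

# Stand-in for the merge step using only the standard library.
# The real method uses Addressable and Rack; URI is swapped in here.
def merge_landing_params(params)
  merged = params.transform_keys(&:to_s)
  landing_page = merged['landing_page']
  return merged if landing_page.nil? || landing_page.empty?

  query = URI.parse(landing_page).query
  return merged unless query

  landing_params = URI.decode_www_form(query).to_h
  # Landing-page params fill gaps; explicit top-level params win.
  landing_params.merge(merged)
rescue URI::InvalidURIError
  # An unparseable landing page leaves the top-level params untouched.
  merged
end

result = merge_landing_params(
  landing_page: 'https://www.example.com/floor-heating?utm_source=newsletter&referral_code=ABC',
  utm_source: 'partner'
)

puts result['utm_source']    # partner
puts result['referral_code'] # ABC
```

Calling merge on the landing-page hash with the top-level hash as the argument is what gives the top-level values priority.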
.gdpr_country?(request) ⇒ Boolean
Is the user's country subject to GDPR policy?
# File 'app/services/tracking/tracker.rb', line 447
def gdpr_country?(request)
  # Priority: ENV override > Cloudflare header (free, instant) > Geocoder (MaxMind HTTP call)
  country_code = ENV['COUNTRY_CODE'] ||
                 request&.env&.[]('HTTP_CF_IPCOUNTRY') ||
                 begin
                   request.location&.country_code
                 rescue StandardError
                   nil
                 end
  res = ISO3166::Country[country_code]&.gdpr_compliant?
  Rails.logger.info "GDPR country detected: #{country_code}" if res
  res.to_b
end
.geocode_visit(visit) ⇒ Object
# File 'app/services/tracking/tracker.rb', line 296
def geocode_visit(visit)
  # Skip if we have complete geo data from Cloudflare headers
  # We consider "complete" as having coordinates OR having country + region + city
  has_coordinates = visit.latitude.present? && visit.longitude.present?
  has_location_details = visit.country.present? && visit.region.present? && visit.city.present?

  if has_coordinates || has_location_details
    Rails.logger.debug { '[Tracker] Skipping geocoder worker - geo data from Cloudflare headers' }
    return
  end

  # Fallback to async geocoding if Cloudflare headers weren't available or incomplete
  begin
    Rails.logger.debug { "VisitGeocoderWorker.perform_async(#{visit.id}) - fallback geocoding" }
    VisitGeocoderWorker.perform_async(visit.id)
  rescue StandardError => e
    Rails.logger.error "VisitGeocoderWorker.perform_async failed: #{e.}"
    ErrorReporting.error(e, visit_id: visit.id)
  end
end
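The "complete" rule determines whether the fallback worker is enqueued. A small sketch of that predicate, assuming a Struct in place of the Visit model; VisitGeo and geo_complete? are hypothetical names:

```ruby
# Hypothetical stand-in for the geo columns on a visit record.
VisitGeo = Struct.new(:latitude, :longitude, :country, :region, :city, keyword_init: true)

# True when the visit already has enough geo data to skip fallback geocoding.
def geo_complete?(visit)
  filled = ->(value) { !value.to_s.strip.empty? }

  has_coordinates = filled.call(visit.latitude) && filled.call(visit.longitude)
  has_location_details = filled.call(visit.country) && filled.call(visit.region) && filled.call(visit.city)

  has_coordinates || has_location_details
end

coords_only = VisitGeo.new(latitude: 43.65, longitude: -79.38)
details_only = VisitGeo.new(country: 'Canada', region: 'Ontario', city: 'Toronto')
country_only = VisitGeo.new(country: 'Canada')

puts geo_complete?(coords_only)  # true
puts geo_complete?(details_only) # true
puts geo_complete?(country_only) # false (this visit would go to the fallback worker)
```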
.parse_bot_signature(sig) ⇒ Hash?
Parse a bot signature string into a hash of signal values.
Returns nil if the string doesn't match the expected format.
# File 'app/services/tracking/tracker.rb', line 421
def parse_bot_signature(sig)
  match = sig.match(/\Aw(\d)p(\d)o(\d)l(\d)c(\d)r(\d)t(\d)\z/)
  return nil unless match

  {
    w: match[1].to_i,
    p: match[2].to_i,
    o: match[3].to_i,
    l: match[4].to_i,
    c: match[5].to_i,
    r: match[6].to_i,
    t: match[7].to_i
  }
end
.protected_by_gdpr?(request) ⇒ Boolean
Detect whether the user is in a GDPR-protected country and has either requested do-not-track status or not yet given consent.
# File 'app/services/tracking/tracker.rb', line 437
def protected_by_gdpr?(request)
  return false unless request
  return false unless gdpr_country?(request)

  # You are protected if you are gdpr and said not to track you or if you
  # haven't consented (that's why we test for nil cookie)
  # Basically, only if the cookie is false explicitly can we track you
  request.['dnt2'] != 'false'
end
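The consent rule is easiest to check as a truth table. The sketch below assumes the two inputs are passed in directly: whether the country is a GDPR country, and the raw value stored under 'dnt2' (nil when nothing is stored). gdpr_blocks_tracking? is a hypothetical name.

```ruby
# Stand-alone sketch of the consent rule; both arguments are supplied
# by the caller here instead of being read from a request.
def gdpr_blocks_tracking?(gdpr_country, dnt2_value)
  return false unless gdpr_country

  # Only an explicit 'false' (tracking allowed) lifts the protection.
  dnt2_value != 'false'
end

puts gdpr_blocks_tracking?(false, nil)    # false (not a GDPR country)
puts gdpr_blocks_tracking?(true, nil)     # true  (no consent recorded yet)
puts gdpr_blocks_tracking?(true, 'true')  # true  (do-not-track requested)
puts gdpr_blocks_tracking?(true, 'false') # false (explicit consent)
```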
.testing_mode? ⇒ Boolean
Check if we're in explicit testing mode (TRACK_VISITOR=y in development).
This bypasses IP checks to allow local testing with simulated geo headers.
# File 'app/services/tracking/tracker.rb', line 23
def testing_mode?
  Rails.env.development? && ENV['TRACK_VISITOR'].to_b
end
.track_visit(party, request: nil, params: nil) ⇒ Object
# File 'app/services/tracking/tracker.rb', line 114
def track_visit(party, request: nil, params: nil)
  unless track_visitor?(request: request)
    return TrackResult.new(track_visit: false, message: 'Tracking excluded for visitor')
  end

  # Extract referral code from landing uri, regexp method is faster and more bullet proof
  event_time = Time.current
  event_data = {}
  data = params.to_h.symbolize_keys.slice(:landing_page, :screen_width, :screen_height, :referrer)

  # Extract geo data from Cloudflare headers (instant, no worker needed)
  # See: https://developers.cloudflare.com/rules/transform/managed-transforms/reference/#add-visitor-location-headers
  data.merge!(extract_cloudflare_geo(request)) if request

  data[:user_id] = party.id
  data[:visitor_token] = party.uuid || SecureRandom.uuid
  # We can accept the url from a non-xhr request for tracking purpose
  data[:landing_page] ||= request.url unless request.xhr?

  if data[:referrer].present? && (ruri = begin
    Addressable::URI.parse(data[:referrer])
  rescue StandardError
    nil
  end)
    if ruri.host.present?
      if ruri.host.match?(/warmlyyours\.\w{2,3}$/)
        # Cleanup the referring domain if the page comes from ourselves.
        data[:referring_domain] = nil
      else
        # Feed through publicsuffix
        data[:referring_domain] ||= PublicSuffix.domain(ruri.host)
      end
    end
  end

  if data[:landing_page].present? && (uri = begin
    Addressable::URI.parse(data[:landing_page])
  rescue StandardError
    nil
  end)
    event_data[:url] = data[:landing_page]
    event_data[:page] = uri.path
    merged_params = Rack::Utils.parse_nested_query uri.query
    merged_params = (merged_params || {}).with_indifferent_access
    data[:referral_code] = merged_params[:referral_code].presence || merged_params[:rc].presence
    data[:gclid] ||= merged_params[:gclid].presence
    data[:gbraid] ||= merged_params[:gbraid].presence
    data[:wbraid] ||= merged_params[:wbraid].presence
    data[:search_keyword] ||= merged_params[:keyword].presence
  else
    Rails.logger.info '[track_visit] no landing_page data'
    return TrackResult.new(track_visit: false, message: 'landing_page data is missing, cannot track')
  end

  data[:dnt] = protected_by_gdpr?(request)

  if request
    data[:session_id] = request.session&.id
    data[:gclid] ||= request.params[:gclid] || request.session[:gclid]
    data[:gbraid] ||= request.params[:gbraid] || request.session[:gbraid]
    data[:wbraid] ||= request.params[:wbraid] || request.session[:wbraid]
    data[:utm_campaign] ||= merged_params[:utm_campaign]
    data[:utm_medium] ||= merged_params[:utm_medium]
    data[:utm_source] ||= merged_params[:utm_source]
    data[:utm_term] ||= merged_params[:utm_term]
    if merged_params[:_vsrefdom] == 'googlecpc' || data[:gclid].present? || data[:gbraid].present? || data[:wbraid].present?
      data[:utm_medium] ||= 'cpc'
      data[:utm_source] ||= 'googleppc'
      data[:referring_domain] = 'google.com'
    else
      data[:referring_domain] ||= PublicSuffix.domain(merged_params[:_vsrefdom])
    end
    data[:search_keyword] ||= request.params[:keyword].presence
    data[:user_agent] = ensure_utf8(request.user_agent)
    data[:locale] = I18n.locale.to_s
    if (dd = device_detector(data[:user_agent], request.headers)) && dd.known?
      device_type = case dd.device_type
                    when 'smartphone'
                      'Mobile'
                    when 'tv'
                      'TV'
                    else
                      dd.device_type.try(:titleize)
                    end
      data[:browser] = dd.name
      data[:os] = dd.os_name
      data[:device_type] = device_type
    end
  end

  data[:ip] = if Rails.env.development?
                NetworkConstants::REAL_FAKE_IP
              else
                request.remote_ip
              end

  # Find a visit that happened within 4 hours or create a new one
  visit = party.visits.where(Visit[:started_at].gteq(VISIT_DURATION.ago)).first
  if visit
    # create an event
    visit_event = visit.visit_events.new
    visit_event.user_id = party.id
    visit_event.name = '$view' # It's always view so this can be removed at some point, this is for legacy
    visit_event.time = event_time
    visit_event.properties = event_data
    visit_event.save
    # It is possible that a visit was created but a redirection occurred before
    # any of the data was recorded in the visit, so we will update it
    visit.referral_code ||= data[:referral_code]
    visit.gclid ||= data[:gclid]
    visit.utm_medium ||= data[:utm_medium]
    visit.utm_source ||= data[:utm_source]
    visit.utm_campaign ||= data[:utm_campaign]
    visit.utm_term ||= data[:utm_term]
    visit.utm_id ||= data[:utm_id]&.squish
    visit.referring_domain ||= data[:referring_domain]
    visit.search_keyword ||= data[:search_keyword]
    visit.dnt ||= data[:dnt]
    visit.user_agent ||= data[:user_agent]
    visit.browser ||= data[:browser]
    visit.os ||= data[:os]
    visit.device_type ||= data[:device_type]
    visit.ip ||= data[:ip]
    visit.gbraid ||= data[:gbraid]
    visit.save if visit.changed?
    geocode_visit(visit)
    TrackResult.new(track_visit: true, visit: visit, visit_event: visit_event)
  else
    # create a new visit, while our token and uuid are no longer needed
    # let's populate them for now until we can remove from the db
    data[:started_at] = event_time
    data[:visitor_token] = SecureRandom.uuid
    visit = party.visits.create(data)
    geocode_visit(visit)
    TrackResult.new(track_visit: true, visit: visit)
  end
rescue StandardError => e
  msg = "!!! Exception during track visitor: #{e}"
  Rails.logger.error msg
  ErrorReporting.error(e)
  TrackResult.new(track_visit: false, message: msg)
end
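The Google paid-click branch in track_visit mixes defaults (||=) with one unconditional overwrite. The following sketch isolates that branch, assuming a plain Hash for data and a keyword argument for the _vsrefdom landing-page param; apply_google_attribution is a hypothetical name.

```ruby
# Stand-alone sketch of the paid-click branch; returns a new Hash.
def apply_google_attribution(data, vsrefdom: nil)
  click_id_present = %i[gclid gbraid wbraid].any? { |key| !data[key].to_s.empty? }
  return data unless vsrefdom == 'googlecpc' || click_id_present

  data.merge(
    utm_medium: data[:utm_medium] || 'cpc',       # default only
    utm_source: data[:utm_source] || 'googleppc', # default only
    referring_domain: 'google.com'                # always overwritten
  )
end

attributed = apply_google_attribution(
  { gclid: 'abc123', utm_source: 'newsletter', referring_domain: 'example.com' }
)

puts attributed[:utm_medium]       # cpc
puts attributed[:utm_source]       # newsletter
puts attributed[:referring_domain] # google.com
```

An existing utm_source survives, while referring_domain is replaced whenever a click ID or the googlecpc marker is present.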
.track_visitor?(request: nil) ⇒ Boolean
# File 'app/services/tracking/tracker.rb', line 27
def track_visitor?(request: nil)
  # In development, tracking system is disabled unless TRACK_VISITOR=y is set
  # This allows developers to test tracking by enabling it explicitly
  if Rails.env.development? && !ENV['TRACK_VISITOR'].to_b
    Rails.logger.info '[track_visitor] skip because development mode (set TRACK_VISITOR=y to enable)'
    return false
  end

  # Employee masquerade sessions ("Login as this user") must never write
  # to the Visit table; they would pollute funnels and attribution with
  # employee-driven traffic attributed to the customer. CurrentScope.true_account_id
  # is set by ApplicationController#stamp_impersonation_context whenever
  # pretender's `account_impersonated?` is true.
  if CurrentScope.true_account_id.present?
    Rails.logger.info "[track_visitor] skip because masquerade session active (true_account_id=#{CurrentScope.true_account_id})"
    return false
  end

  if request
    # Non www requests are always ignored
    unless request.subdomain&.match?(/^www/)
      Rails.logger.info "[track_visitor] skip because subdomain #{request.subdomain} is not www"
      return false
    end

    # WarmlyYours IPs excluded in production, but allowed in testing mode
    # This allows local testing with simulated Cloudflare headers
    unless testing_mode?
      if warmlyyours_ip?(request)
        Rails.logger.info "[track_visitor] skip because warmlyyours ip detected: #{request.remote_ip}"
        return false
      end
    end

    # Bots excluded (even in testing mode - use a real browser to test)
    if bot_request?(request)
      Rails.logger.info "[track_visitor] skip because bot request detected, user agent: #{request.user_agent}"
      return false
    end

    # GDPR check - in testing mode, we still respect this for accurate testing
    if protected_by_gdpr?(request)
      Rails.logger.info '[track_visitor] skip because gdpr country and gdpr cookie dnt set to true or gdpr cookie is not set'
      return false
    end
  end

  Rails.logger.info '[track_visitor] TESTING MODE enabled (TRACK_VISITOR=y) - tracking allowed' if testing_mode?

  # If none of the above, we track
  true
end
.warmlyyours_ip?(request) ⇒ Boolean
# File 'app/services/tracking/tracker.rb', line 326
def warmlyyours_ip?(request)
  ip = request.remote_ip
  IpDetector.warmlyyours_ip?(ip)
end