Class: Tracking::Tracker

Inherits:
Object
  • Object
Defined in:
app/services/tracking/tracker.rb

Overview

Handles visitor tracking, visit creation, and source attribution.
Determines whether requests should be tracked and manages visit lifecycle.

Defined Under Namespace

Classes: TrackResult

Constant Summary

VISIT_DURATION =

Time window, measured from a visit's start, after which a new visit is created

4.hours
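
To make the window concrete, here is a minimal plain-Ruby sketch of the lookup that VISIT_DURATION drives. The `reusable_visit` helper, the `VISIT_WINDOW_SECONDS` constant, and the hash-based visits are hypothetical stand-ins for the ActiveRecord query in track_visit; only the 4-hour figure comes from the constant above.

```ruby
# Plain-Ruby equivalent of 4.hours (ActiveSupport is not loaded here).
VISIT_WINDOW_SECONDS = 4 * 60 * 60

# Hypothetical helper: returns the first visit that started inside the
# window, or nil when a new visit should be created.
def reusable_visit(visits, now: Time.now)
  cutoff = now - VISIT_WINDOW_SECONDS
  visits.find { |visit| visit[:started_at] >= cutoff }
end

now    = Time.utc(2024, 1, 1, 12, 0, 0)
recent = { id: 1, started_at: now - (3 * 60 * 60) } # 3 hours old: reused
stale  = { id: 2, started_at: now - (5 * 60 * 60) } # 5 hours old: ignored

reusable_visit([stale, recent], now: now) # returns the `recent` hash
reusable_visit([stale], now: now)         # returns nil, so a new visit is needed
```

Because the cutoff is compared against started_at, a visit is reused for at most four hours after it began, regardless of later activity.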

Class Method Details

.bot_request?(request) ⇒ Boolean

Returns:

  • (Boolean)


# File 'app/services/tracking/tracker.rb', line 331

def bot_request?(request)
  ua = request.user_agent.to_s

  # 1. Empty or suspiciously short user agent (real browsers always identify themselves)
  return true if ua.blank? || ua.length < 20

  # 2. Custom bot strings (internal tools and known crawlers)
  #    http.rb is our HTTP library used by cache_warmer.rb
  custom_bots = %w[PrintNodeClient Intercom curl Cloudflare-Healthchecks RSiteAuditor]
  custom_bots << Cache::SiteCrawler::USER_AGENT
  return true if custom_bots.any? { |s| s.in?(ua) }

  # 3. DeviceDetector gem (known bot user-agent database + client hints)
  return true if device_detector(ua, request.headers).bot?

  # 4. Headless browsers and automation frameworks
  return true if ua.match?(/HeadlessChrome|PhantomJS|Selenium|puppeteer|playwright|crawl4ai/i)

  # 5 & 6: Header heuristics — only reliable for top-level navigation requests.
  # XHR/fetch calls legitimately send Accept: */* or application/json and may
  # omit Accept-Language, so we gate these checks on Sec-Fetch-Mode: navigate.
  # Sec-Fetch-Mode is sent by all modern browsers (Chrome 76+, Firefox 90+,
  # Safari 16.4+, Edge 79+); its absence alone is not proof of a bot.
  if request.env['HTTP_SEC_FETCH_MODE'] == 'navigate'
    # 5. Missing Accept-Language header (real browsers always send this on navigation)
    return true if request.env['HTTP_ACCEPT_LANGUAGE'].blank?

    # 6. Non-browser Accept header patterns
    #    Real browsers send a rich Accept header with text/html; bots often send only */*
    accept = request.env['HTTP_ACCEPT'].to_s
    return true if accept.present? && !accept.include?('text/html') && !accept.include?('application/xhtml')
  end

  # 7. Client-side bot signature (sent by primeval.js with globals.json)
  #    collectBotSignature() in primeval.js probes navigator.webdriver,
  #    plugin count, viewport dimensions, etc. and encodes them as a
  #    compact string like "w0p5o1l1c1r1t1". We only flag as bot when
  #    the signal is definitive (e.g., webdriver=1 or multiple zero signals).
  if request.respond_to?(:params) && (bot_sig = request.params['bot_sig'].presence)
    return true if client_side_bot_signal?(bot_sig)
  end

  false
end
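
The gating of checks 5 and 6 on Sec-Fetch-Mode can be exercised in isolation. The sketch below is a hypothetical helper (`suspicious_navigation_headers?` is not part of the class) that takes a plain Rack-style Hash and uses plain Ruby in place of ActiveSupport's `blank?` and `present?`.

```ruby
# Hypothetical helper mirroring checks 5 and 6 of bot_request?.
# `env` is a plain Hash keyed like a Rack env.
def suspicious_navigation_headers?(env)
  # Only top-level navigations are judged; XHR/fetch requests pass through.
  return false unless env['HTTP_SEC_FETCH_MODE'] == 'navigate'

  # Check 5: a navigation without Accept-Language is treated as a bot.
  return true if env['HTTP_ACCEPT_LANGUAGE'].to_s.strip.empty?

  # Check 6: a non-empty Accept header that names neither HTML type.
  accept = env['HTTP_ACCEPT'].to_s
  !accept.strip.empty? && !accept.include?('text/html') && !accept.include?('application/xhtml')
end

browser_navigation = {
  'HTTP_SEC_FETCH_MODE' => 'navigate',
  'HTTP_ACCEPT_LANGUAGE' => 'en-US,en;q=0.9',
  'HTTP_ACCEPT' => 'text/html,application/xhtml+xml,*/*;q=0.8'
}
scripted_navigation = browser_navigation.merge('HTTP_ACCEPT' => '*/*')
fetch_call = { 'HTTP_SEC_FETCH_MODE' => 'cors', 'HTTP_ACCEPT' => 'application/json' }

suspicious_navigation_headers?(browser_navigation)  # false
suspicious_navigation_headers?(scripted_navigation) # true
suspicious_navigation_headers?(fetch_call)          # false, not a navigation
```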

.client_side_bot_signal?(sig) ⇒ Boolean

Parse the client-side bot signature and determine if it indicates automation.

The signature format is "w<0|1>p<0-9>o<0|1>l<0|1>c<0|1>r<0|1>t<0|1>" where:
w = navigator.webdriver (1 = Selenium/Puppeteer/Playwright)
p = navigator.plugins.length (0 = headless, capped at 9)
o = outerWidth/Height > 0 (0 = headless viewport)
l = navigator.languages (0 = empty, some bots)
c = window.chrome present (0 = may be absent in headless Chromium)
r = navigator.permissions (0 = absent in some automation)
t = timing resolution (0 = coarse/faked timer)

Returns true only for high-confidence bot signals to avoid false positives:

  • webdriver=1 is definitive (only set by automation frameworks)
  • 3+ zero signals among o, l, r, t and p together strongly indicate headless/automation (c is ignored)

Parameters:

  • sig (String)

    The compact signature string

Returns:

  • (Boolean)


# File 'app/services/tracking/tracker.rb', line 393

def client_side_bot_signal?(sig)
  return false unless sig.is_a?(String) && sig.match?(/\Aw\dp\do\dl\dc\dr\dt\d\z/)

  parsed = parse_bot_signature(sig)
  return false unless parsed

  # Definitive: navigator.webdriver is only true under automation
  return true if parsed[:w] == 1

  # Heuristic: multiple missing signals = likely headless/automation
  # Count how many probes returned 0 (excluding :c, which is legitimately 0
  # on Firefox/Safari). :p is a count, so 0 means no plugins were reported.
  zero_count = 0
  zero_count += 1 if parsed[:o] == 0 # no outer viewport
  zero_count += 1 if parsed[:l] == 0 # no languages
  zero_count += 1 if parsed[:r] == 0 # no permissions API
  zero_count += 1 if parsed[:t] == 0 # coarse timer
  zero_count += 1 if parsed[:p] == 0 # no plugins

  # 3+ zero signals is suspicious enough to flag
  zero_count >= 3
end
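
For experimenting with signatures outside the app, the decision rule can be condensed into a standalone sketch. `SIGNATURE_FORMAT`, `parse_signature` and `automation_signal?` are hypothetical names; the regex, the webdriver rule, and the set of counted keys follow the method above.

```ruby
SIGNATURE_FORMAT = /\Aw(\d)p(\d)o(\d)l(\d)c(\d)r(\d)t(\d)\z/

# Returns a Hash such as { w: 0, p: 5, ... } or nil for malformed input.
def parse_signature(sig)
  match = SIGNATURE_FORMAT.match(sig.to_s)
  return nil unless match

  %i[w p o l c r t].zip(match.captures.map(&:to_i)).to_h
end

def automation_signal?(sig)
  parsed = parse_signature(sig)
  return false unless parsed
  return true if parsed[:w] == 1 # webdriver flag is definitive

  # :c is left out because window.chrome is absent on Firefox/Safari.
  %i[o l r t p].count { |key| parsed[key].zero? } >= 3
end

automation_signal?('w0p5o1l1c1r1t1') # false, ordinary browser
automation_signal?('w1p5o1l1c1r1t1') # true, webdriver set
automation_signal?('w0p0o0l0c1r1t1') # true, three zero signals (p, o, l)
automation_signal?('w0p0o0l1c0r1t1') # false, only two counted zeros
automation_signal?('not-a-signature') # false, malformed input is never flagged
```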

.device_detector(user_agent, http_headers = {}) ⇒ Object



# File 'app/services/tracking/tracker.rb', line 317

def device_detector(user_agent, http_headers = {})
  require 'device_detector'
  # Ensure all header values are strings to prevent nil gsub errors in device_detector
  # Convert ActionDispatch::Http::Headers to a regular hash and transform values
  headers_hash = http_headers.respond_to?(:to_h) ? http_headers.to_h : (http_headers || {})
  safe_headers = headers_hash.transform_values { |v| v.nil? ? '' : v.to_s }
  DeviceDetector.new(ensure_utf8(user_agent.to_s), safe_headers)
end

.ensure_utf8(str) ⇒ Object



# File 'app/services/tracking/tracker.rb', line 461

def ensure_utf8(str)
  return if str.blank?

  str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end
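
One consequence of transcoding from 'binary' is worth seeing directly: every byte at or above 0x80 is dropped, including bytes that belong to valid UTF-8 characters. The sketch below uses a hypothetical `ascii_safe_utf8` helper with the same encode call; it swaps ActiveSupport's `blank?` for `empty?`, so whitespace-only strings are not treated as blank here.

```ruby
# Hypothetical stand-in for ensure_utf8 without ActiveSupport.
def ascii_safe_utf8(str)
  return nil if str.nil? || str.empty?

  # Reinterpret the bytes as binary, then transcode to UTF-8. Bytes >= 0x80
  # have no mapping from binary to UTF-8, so `undef: :replace` removes them.
  str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

ascii_safe_utf8("Mozilla/5.0 caf\u00E9") # "Mozilla/5.0 caf" (the accented character is dropped)
ascii_safe_utf8("bad\xFFbyte")           # "badbyte"
ascii_safe_utf8('')                      # nil
```

The result is always a valid UTF-8 string containing only ASCII characters, which is sufficient for user-agent matching but lossy for any other text.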

.extract_cloudflare_geo(request) ⇒ Object

Extract geo data from Cloudflare's visitor location headers.
These are set by Cloudflare's "Add visitor location headers" Managed Transform.
See: https://developers.cloudflare.com/rules/transform/managed-transforms/reference/#add-visitor-location-headers



# File 'app/services/tracking/tracker.rb', line 260

def extract_cloudflare_geo(request)
  return {} unless request

  geo = {}

  # Country code (e.g., "US", "CA", "GB")
  if (country_code = request.env['HTTP_CF_IPCOUNTRY'].presence)
    # Convert country code to full name for consistency with existing data
    country = ISO3166::Country[country_code]
    geo[:country] = country&.common_name || country&.iso_short_name || country_code
  end

  # Region/State (e.g., "California", "Ontario", "Quebec")
  geo[:region] = request.env['HTTP_CF_REGION'].presence

  # Region code (e.g., "CA", "QC", "ON", "TX") - useful for consent logic
  geo[:region_code] = request.env['HTTP_CF_REGION_CODE'].presence

  # City (e.g., "San Francisco", "Toronto")
  geo[:city] = request.env['HTTP_CF_IPCITY'].presence

  # Postal code
  geo[:postal_code] = request.env['HTTP_CF_POSTAL_CODE'].presence

  # Coordinates
  if (lat = request.env['HTTP_CF_IPLATITUDE'].presence)
    geo[:latitude] = lat.to_f
  end
  if (lon = request.env['HTTP_CF_IPLONGITUDE'].presence)
    geo[:longitude] = lon.to_f
  end

  Rails.logger.debug { "[Tracker] Cloudflare geo: #{geo.inspect}" } if geo.present?
  geo
end
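
For local testing with simulated headers, it helps to see which env keys map to which geo fields. The following is a simplified sketch: `header_value` and `geo_from_cloudflare_headers` are hypothetical, the country is left as a two-letter code rather than converted through ISO3166, and `header_value` only approximates ActiveSupport's `presence`.

```ruby
# Hypothetical helper: nil for missing or blank header values.
def header_value(env, key)
  value = env[key].to_s.strip
  value.empty? ? nil : value
end

def geo_from_cloudflare_headers(env)
  geo = {
    country_code: header_value(env, 'HTTP_CF_IPCOUNTRY'), # kept as a code in this sketch
    region: header_value(env, 'HTTP_CF_REGION'),
    region_code: header_value(env, 'HTTP_CF_REGION_CODE'),
    city: header_value(env, 'HTTP_CF_IPCITY'),
    postal_code: header_value(env, 'HTTP_CF_POSTAL_CODE')
  }
  lat = header_value(env, 'HTTP_CF_IPLATITUDE')
  lon = header_value(env, 'HTTP_CF_IPLONGITUDE')
  geo[:latitude] = lat.to_f if lat   # coordinates arrive as strings
  geo[:longitude] = lon.to_f if lon
  geo
end

simulated_env = {
  'HTTP_CF_IPCOUNTRY' => 'CA',
  'HTTP_CF_REGION' => 'Ontario',
  'HTTP_CF_REGION_CODE' => 'ON',
  'HTTP_CF_IPCITY' => 'Toronto',
  'HTTP_CF_IPLATITUDE' => '43.6532',
  'HTTP_CF_IPLONGITUDE' => '-79.3832'
}
geo_from_cloudflare_headers(simulated_env)
```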

.find_source_from_request(params:, request: nil) ⇒ Source?

Find the source from request params, parsing the landing_page URL for tracking params.
This is used for both guest creation and visit tracking to ensure consistent source attribution.

Parameters:

  • params (Hash, ActionController::Parameters)

    Request params (may include landing_page, referrer)

  • request (ActionDispatch::Request, nil) (defaults to: nil)

    Optional request object for fallback referrer

Returns:

  • (Source, nil)

    The matched source or nil



# File 'app/services/tracking/tracker.rb', line 85

def find_source_from_request(params:, request: nil)
  source_lookup_params = params.to_h.with_indifferent_access

  # Parse landing_page URL to extract tracking params (referral_code, utm_*, etc.)
  if source_lookup_params[:landing_page].present?
    begin
      landing_uri = Addressable::URI.parse(source_lookup_params[:landing_page])
      if landing_uri&.query
        landing_params = Rack::Utils.parse_nested_query(landing_uri.query)
        # Merge landing page query params, but don't override existing top-level params
        source_lookup_params = landing_params.with_indifferent_access.merge(source_lookup_params)
      end
    rescue StandardError => e
      Rails.logger.warn "[Tracker.find_source_from_request] Failed to parse landing_page URL: #{e.message}"
    end
  end

  # Use params[:referrer] if available, otherwise fall back to request.referer
  referrer = source_lookup_params[:referrer].presence || request&.referer

  Source.find_from_params(source_lookup_params, referrer: referrer)
end
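
The merge order is the subtle part: landing-page query params fill gaps but never override params passed at the top level. A minimal sketch of that precedence, assuming string keys and the standard library's URI in place of Addressable and Rack (`merge_landing_params` is a hypothetical helper):

```ruby
require 'uri'

# Hypothetical helper: merges query params from params['landing_page']
# underneath the explicitly supplied params.
def merge_landing_params(params)
  landing_page = params['landing_page'].to_s
  return params if landing_page.empty?

  query = URI.parse(landing_page).query
  return params unless query

  landing_params = URI.decode_www_form(query).to_h
  landing_params.merge(params) # top-level values win on conflict
rescue URI::InvalidURIError
  params # an unparseable URL leaves the params untouched
end

merged = merge_landing_params({
  'landing_page' => 'https://www.example.com/floor-heating?utm_source=newsletter&rc=ABC123',
  'utm_source' => 'direct'
})
merged['utm_source'] # "direct", the top-level value is kept
merged['rc']         # "ABC123", filled in from the landing page
```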

.gdpr_country?(request) ⇒ Boolean

Returns whether the visitor's country is subject to GDPR.

Returns:

  • (Boolean)


# File 'app/services/tracking/tracker.rb', line 447

def gdpr_country?(request)
  # Priority: ENV override > Cloudflare header (free, instant) > Geocoder (MaxMind HTTP call)
  country_code = ENV['COUNTRY_CODE'] ||
    request&.env&.[]('HTTP_CF_IPCOUNTRY') ||
    begin
      request.location&.country_code
    rescue StandardError
      nil
    end
  res = ISO3166::Country[country_code]&.gdpr_compliant?
  Rails.logger.info "GDPR country detected: #{country_code}" if res
  res.to_b
end
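
The priority chain relies on `||` short-circuiting, so the expensive lookup runs only when both cheaper sources are nil. A small sketch of that behavior, where `resolve_country_code` is a hypothetical helper and `lookup` is a lambda standing in for the Geocoder call:

```ruby
# Hypothetical helper using the same priority as gdpr_country?:
# explicit override, then header, then a lazy lookup.
def resolve_country_code(override:, header:, lookup:)
  override || header || begin
    lookup.call
  rescue StandardError
    nil # a failed lookup is treated as "country unknown"
  end
end

calls = 0
geocoder = lambda do
  calls += 1
  'FR'
end

resolve_country_code(override: nil, header: 'DE', lookup: geocoder)  # "DE", geocoder not called
resolve_country_code(override: 'US', header: 'DE', lookup: geocoder) # "US", override wins
resolve_country_code(override: nil, header: nil, lookup: geocoder)   # "FR", calls is now 1
resolve_country_code(override: nil, header: nil, lookup: -> { raise 'timeout' }) # nil
```

Note that an empty string is truthy in Ruby, so an empty override or header would still stop the chain.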

.geocode_visit(visit) ⇒ Object



# File 'app/services/tracking/tracker.rb', line 296

def geocode_visit(visit)
  # Skip if we have complete geo data from Cloudflare headers
  # We consider "complete" as having coordinates OR having country + region + city
  has_coordinates = visit.latitude.present? && visit.longitude.present?
  has_location_details = visit.country.present? && visit.region.present? && visit.city.present?

  if has_coordinates || has_location_details
    Rails.logger.debug { '[Tracker] Skipping geocoder worker - geo data from Cloudflare headers' }
    return
  end

  # Fallback to async geocoding if Cloudflare headers weren't available or incomplete
  begin
    Rails.logger.debug { "VisitGeocoderWorker.perform_async(#{visit.id}) - fallback geocoding" }
    VisitGeocoderWorker.perform_async(visit.id)
  rescue StandardError => e
    Rails.logger.error "VisitGeocoderWorker.perform_async failed: #{e.message}"
    ErrorReporting.error(e, visit_id: visit.id)
  end
end
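
The "complete" rule (coordinates, or country plus region plus city) can be checked on its own. This sketch uses a hypothetical `VisitGeo` Struct in place of the Visit model and a `filled?` helper as a rough substitute for ActiveSupport's `present?`.

```ruby
# Hypothetical stand-in for the geo columns of a Visit record.
VisitGeo = Struct.new(:latitude, :longitude, :country, :region, :city, keyword_init: true)

def filled?(value)
  !value.nil? && !value.to_s.strip.empty?
end

# True when the fallback geocoder can be skipped.
def geo_complete?(visit)
  has_coordinates = filled?(visit.latitude) && filled?(visit.longitude)
  has_location_details = filled?(visit.country) && filled?(visit.region) && filled?(visit.city)
  has_coordinates || has_location_details
end

geo_complete?(VisitGeo.new(latitude: 41.88, longitude: -87.63))                     # true
geo_complete?(VisitGeo.new(country: 'Canada', region: 'Ontario', city: 'Toronto')) # true
geo_complete?(VisitGeo.new(country: 'Canada'))                                     # false, geocoder runs
```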

.parse_bot_signature(sig) ⇒ Hash?

Parse a bot signature string into a hash of signal values.
Returns nil if the string doesn't match the expected format.

Parameters:

  • sig (String)

    e.g. "w0p5o1l1c1r1t1"

Returns:

  • (Hash, nil)

    e.g. { w: 0, p: 5, o: 1, l: 1, c: 1, r: 1, t: 1 }



# File 'app/services/tracking/tracker.rb', line 421

def parse_bot_signature(sig)
  match = sig.match(/\Aw(\d)p(\d)o(\d)l(\d)c(\d)r(\d)t(\d)\z/)
  return nil unless match

  {
    w: match[1].to_i,
    p: match[2].to_i,
    o: match[3].to_i,
    l: match[4].to_i,
    c: match[5].to_i,
    r: match[6].to_i,
    t: match[7].to_i
  }
end

.protected_by_gdpr?(request) ⇒ Boolean

Returns true when the visitor is in a GDPR-protected country and has not explicitly consented to tracking (the dnt2 cookie is anything other than 'false', including unset).

Returns:

  • (Boolean)


# File 'app/services/tracking/tracker.rb', line 437

def protected_by_gdpr?(request)
  return false unless request
  return false unless gdpr_country?(request)

  # A visitor in a GDPR country is protected unless they have explicitly consented.
  # A missing cookie (nil) counts as no consent, so tracking is allowed only when dnt2 is exactly 'false'.
  request.cookies['dnt2'] != 'false'
end
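
The cookie rule is easiest to read as a truth table. The sketch below isolates it in a hypothetical `gdpr_protected?` helper that takes the country result and the raw dnt2 cookie value as arguments:

```ruby
# Hypothetical helper: same decision as protected_by_gdpr?, with the
# country check and cookie lookup passed in as plain values.
def gdpr_protected?(gdpr_country:, dnt2_cookie:)
  return false unless gdpr_country

  # Only the exact string 'false' counts as consent to tracking.
  dnt2_cookie != 'false'
end

gdpr_protected?(gdpr_country: true,  dnt2_cookie: nil)     # true, no consent recorded yet
gdpr_protected?(gdpr_country: true,  dnt2_cookie: 'true')  # true, visitor opted out
gdpr_protected?(gdpr_country: true,  dnt2_cookie: 'false') # false, visitor consented
gdpr_protected?(gdpr_country: false, dnt2_cookie: nil)     # false, GDPR does not apply
```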

.testing_mode? ⇒ Boolean

Check if we're in explicit testing mode (TRACK_VISITOR=y in development).
This bypasses IP checks to allow local testing with simulated geo headers.

Returns:

  • (Boolean)


# File 'app/services/tracking/tracker.rb', line 23

def testing_mode?
  Rails.env.development? && ENV['TRACK_VISITOR'].to_b
end

.track_visit(party, request: nil, params: nil) ⇒ Object



# File 'app/services/tracking/tracker.rb', line 114

def track_visit(party, request: nil, params: nil)
  unless track_visitor?(request: request)
    return TrackResult.new(track_visit: false, message: 'Tracking excluded for visitor')
  end
  # Extract referral code from landing uri, regexp method is faster and more bullet proof
  event_time = Time.current
  event_data = {}

  data = params.to_h.symbolize_keys.slice(:landing_page, :screen_width, :screen_height, :referrer)

  # Extract geo data from Cloudflare headers (instant, no worker needed)
  # See: https://developers.cloudflare.com/rules/transform/managed-transforms/reference/#add-visitor-location-headers
  data.merge!(extract_cloudflare_geo(request)) if request

  data[:user_id] = party.id
  data[:visitor_token] = party.uuid || SecureRandom.uuid

  # Fall back to the request URL for non-XHR requests; request may be nil
  data[:landing_page] ||= request.url if request && !request.xhr?

  if data[:referrer].present? && (ruri = begin
    Addressable::URI.parse(data[:referrer])
  rescue StandardError
    nil
  end)
    if ruri.host.present?
      if ruri.host.match?(/warmlyyours\.\w{2,3}$/)
        # Cleanup the referring domain if the page comes from ourselves.
        data[:referring_domain] = nil
      else
        # Feed through publicsuffix
        data[:referring_domain] ||= PublicSuffix.domain(ruri.host)
      end
    end
  end

  if data[:landing_page].present? && (uri = begin
    Addressable::URI.parse(data[:landing_page])
  rescue StandardError
    nil
  end)
    event_data[:url] = data[:landing_page]
    event_data[:page] = uri.path
    merged_params = Rack::Utils.parse_nested_query uri.query
    merged_params = (merged_params || {}).with_indifferent_access
    data[:referral_code] = merged_params[:referral_code].presence || merged_params[:rc].presence
    data[:gclid] ||= merged_params[:gclid].presence
    data[:gbraid] ||= merged_params[:gbraid].presence
    data[:wbraid] ||= merged_params[:wbraid].presence
    data[:search_keyword] ||= merged_params[:keyword].presence
  else
    Rails.logger.info '[track_visit] no landing_page data'
    return TrackResult.new(track_visit: false, message: 'landing_page data is missing, cannot track')
  end

  data[:dnt] = protected_by_gdpr?(request)

  if request
    data[:session_id] = request.session&.id
    data[:gclid] ||= request.params[:gclid] || request.session[:gclid]
    data[:gbraid] ||= request.params[:gbraid] || request.session[:gbraid]
    data[:wbraid] ||= request.params[:wbraid] || request.session[:wbraid]
    data[:utm_campaign] ||= merged_params[:utm_campaign]
    data[:utm_medium] ||= merged_params[:utm_medium]
    data[:utm_source] ||= merged_params[:utm_source]
    data[:utm_term] ||= merged_params[:utm_term]

    if merged_params[:_vsrefdom] == 'googlecpc' || data[:gclid].present? || data[:gbraid].present? || data[:wbraid].present?
      data[:utm_medium] ||= 'cpc'
      data[:utm_source] ||= 'googleppc'
      data[:referring_domain] = 'google.com'
    else
      data[:referring_domain] ||= PublicSuffix.domain(merged_params[:_vsrefdom])
    end
    data[:search_keyword] ||= request.params[:keyword].presence
    data[:user_agent] = ensure_utf8(request.user_agent)
    data[:locale] = I18n.locale.to_s
    if (dd = device_detector(data[:user_agent], request.headers)) && dd.known?
      device_type =
        case dd.device_type
        when 'smartphone'
          'Mobile'
        when 'tv'
          'TV'
        else
          dd.device_type.try(:titleize)
        end
      data[:browser] = dd.name
      data[:os] = dd.os_name
      data[:device_type] = device_type
    end
  end
  # request may be nil, so use safe navigation outside development
  data[:ip] = if Rails.env.development?
                NetworkConstants::REAL_FAKE_IP
              else
                request&.remote_ip
              end
  # Find a visit that happened within 4 hours or create a new one
  visit = party.visits.where(Visit[:started_at].gteq(VISIT_DURATION.ago)).first
  if visit
    # create an event
    visit_event = visit.visit_events.new
    visit_event.user_id = party.id
    visit_event.name = '$view' # It's always view so this can be removed at some point, this is for legacy
    visit_event.time = event_time
    visit_event.properties = event_data
    visit_event.save
    # A visit may have been created just before a redirect, before any data was recorded on it, so fill in the missing fields now
    visit.referral_code ||= data[:referral_code]
    visit.gclid ||= data[:gclid]
    visit.utm_medium ||= data[:utm_medium]
    visit.utm_source ||= data[:utm_source]
    visit.utm_campaign ||= data[:utm_campaign]
    visit.utm_term ||= data[:utm_term]
    visit.utm_id ||= data[:utm_id]&.squish
    visit.referring_domain ||= data[:referring_domain]
    visit.search_keyword ||= data[:search_keyword]
    visit.dnt ||= data[:dnt]
    visit.user_agent ||= data[:user_agent]
    visit.browser ||= data[:browser]
    visit.os ||= data[:os]
    visit.device_type ||= data[:device_type]
    visit.ip ||= data[:ip]
    visit.gbraid ||= data[:gbraid]
    visit.save if visit.changed?
    geocode_visit(visit)
    TrackResult.new(track_visit: true, visit: visit, visit_event: visit_event)
  else
    # create a new visit, while our token and uuid are no longer needed
    # let's populate them for now until we can remove from the db
    data[:started_at] = event_time
    data[:visitor_token] = SecureRandom.uuid
    visit = party.visits.create(data)
    geocode_visit(visit)
    TrackResult.new(track_visit: true, visit: visit)
  end
rescue StandardError => e
  msg = "!!! Exception during track visitor: #{e}"
  Rails.logger.error msg
  ErrorReporting.error(e)
  TrackResult.new(track_visit: false, message: msg)
end
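
The Google Ads branch is a compact rule that is easy to misread inside the larger method: a click ID or the googlecpc marker fills in default UTM values but always overwrites the referring domain. A standalone sketch, with `apply_paid_click_defaults` as a hypothetical helper operating on a plain Hash:

```ruby
# Hypothetical helper mirroring the paid-click branch of track_visit.
# `vsrefdom` stands in for the _vsrefdom landing-page param.
def apply_paid_click_defaults(data, vsrefdom: nil)
  has_click_id = %i[gclid gbraid wbraid].any? { |key| !data[key].to_s.empty? }
  return data unless vsrefdom == 'googlecpc' || has_click_id

  data[:utm_medium] ||= 'cpc'             # kept if already set
  data[:utm_source] ||= 'googleppc'       # kept if already set
  data[:referring_domain] = 'google.com'  # always overwritten for paid clicks
  data
end

paid = apply_paid_click_defaults({ gclid: 'abc123', utm_source: 'newsletter', referring_domain: 'example.org' })
paid[:utm_medium]       # "cpc"
paid[:utm_source]       # "newsletter", existing value preserved
paid[:referring_domain] # "google.com"

organic = apply_paid_click_defaults({ referring_domain: 'example.org' })
organic[:referring_domain] # "example.org", untouched
```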

.track_visitor?(request: nil) ⇒ Boolean

Returns:

  • (Boolean)


# File 'app/services/tracking/tracker.rb', line 27

def track_visitor?(request: nil)
  # In development, tracking system is disabled unless TRACK_VISITOR=y is set
  # This allows developers to test tracking by enabling it explicitly
  if Rails.env.development? && !ENV['TRACK_VISITOR'].to_b
    Rails.logger.info '[track_visitor] skip because development mode (set TRACK_VISITOR=y to enable)'
    return false
  end

  # Employee masquerade sessions ("Login as this user") must never write
  # to the Visit table — they would pollute funnels and attribution with
  # employee-driven traffic attributed to the customer. CurrentScope.true_account_id
  # is set by ApplicationController#stamp_impersonation_context whenever
  # pretender's `account_impersonated?` is true.
  if CurrentScope.true_account_id.present?
    Rails.logger.info "[track_visitor] skip because masquerade session active (true_account_id=#{CurrentScope.true_account_id})"
    return false
  end

  if request
    # Non www requests are always ignored
    unless request.subdomain&.match?(/^www/)
      Rails.logger.info "[track_visitor] skip because subdomain #{request.subdomain} is not www"
      return false
    end

    # WarmlyYours IPs excluded in production, but allowed in testing mode
    # This allows local testing with simulated Cloudflare headers
    unless testing_mode?
      if warmlyyours_ip?(request)
        Rails.logger.info "[track_visitor] skip because warmlyyours ip detected: #{request.remote_ip}"
        return false
      end
    end

    # Bots excluded (even in testing mode - use a real browser to test)
    if bot_request?(request)
      Rails.logger.info "[track_visitor] skip because bot request detected, user agent: #{request.user_agent}"
      return false
    end

    # GDPR check - in testing mode, we still respect this for accurate testing
    if protected_by_gdpr?(request)
      Rails.logger.info '[track_visitor] skip because gdpr country and gdpr cookie dnt set to true or gdpr cookie is not set'
      return false
    end
  end

  Rails.logger.info '[track_visitor] TESTING MODE enabled (TRACK_VISITOR=y) - tracking allowed' if testing_mode?

  # If none of the above, we track
  true
end

.warmlyyours_ip?(request) ⇒ Boolean

Returns:

  • (Boolean)


# File 'app/services/tracking/tracker.rb', line 326

def warmlyyours_ip?(request)
  ip = request.remote_ip
  IpDetector.warmlyyours_ip?(ip)
end