Class: ImageFingerprintWorker

Inherits:
Object
Includes:
Sidekiq::Worker, Workers::StatusBroadcastable
Defined in:
app/workers/image_fingerprint_worker.rb

Overview

Background worker for generating perceptual hash fingerprints for images.

Two modes of operation:

  1. No arguments: Find all images missing fingerprints and bulk enqueue
  2. With image_id: Process that single image's fingerprint

Three-tier strategy for fingerprint generation (most efficient first):

  1. Check if pHash is already stored in the asset JSONB field
  2. Fetch pHash from ImageKit's metadata API (fast, no download needed)
  3. Calculate pHash locally using phash-rb gem (requires download)
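
The tier order above can be read as a first-non-nil fallback chain. The following is a minimal sketch under the assumption that each tier is a callable returning a hex pHash string or nil. `first_fingerprint` and the tier names are hypothetical and are not the worker's actual private methods. The stubs stand in for the JSONB read, the ImageKit metadata call, and the phash-rb calculation.

```ruby
# Hypothetical sketch: try each tier in order and stop at the first one that
# yields a fingerprint, so expensive tiers only run when cheaper ones fail.
# Tier names and stubs are illustrative, not the worker's real code.
def first_fingerprint(tiers)
  tiers.each do |name, fetch|
    value = fetch.call
    return [name, value] unless value.nil? || value.to_s.empty?
  end
  nil # no tier produced a fingerprint
end

# Stubbed tiers: nothing stored in the asset JSONB, ImageKit returns a value,
# so the local calculation (which would require a download) never runs.
calls = []
tiers = [
  [:stored,   -> { calls << :stored; nil }],
  [:imagekit, -> { calls << :imagekit; "a1b2c3d4e5f60718" }],
  [:local,    -> { calls << :local; "a1b2c3d4e5f60719" }]
]

source, phash = first_fingerprint(tiers)
puts "#{source}: #{phash}" # prints "imagekit: a1b2c3d4e5f60718"
puts calls.inspect         # prints "[:stored, :imagekit]"
```

Ordering the tiers cheapest first means that the download-and-hash step is reached only when neither stored nor remote metadata is available.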

Both ImageKit and phash-rb use DCT-based pHash (pHash 0.9.6),
so fingerprints are comparable regardless of source.

The fingerprint enables:

  • Exact duplicate detection (hamming distance = 0)
  • Near-duplicate detection (hamming distance <= 15)
  • Robustness to resizing, compression, and minor color changes
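
These bands can be mapped to labels in a few lines. This is a minimal sketch that assumes 64-bit fingerprints and the 15-bit cutoff used by DUPLICATE_THRESHOLD. `classify_distance` and the returned symbols are hypothetical names. A distance of 0 means the fingerprints are identical. It does not prove that the files are byte-identical.

```ruby
# Hypothetical sketch: label a Hamming distance between two 64-bit pHash
# fingerprints. NEAR_DUPLICATE_MAX mirrors DUPLICATE_THRESHOLD (15).
NEAR_DUPLICATE_MAX = 15

def classify_distance(distance)
  raise ArgumentError, "distance must be in 0..64" unless (0..64).cover?(distance)

  case distance
  when 0 then :exact_duplicate                   # identical fingerprints
  when 1..NEAR_DUPLICATE_MAX then :near_duplicate
  else :distinct
  end
end

[0, 7, 15, 16, 40].each do |d|
  puts "#{d} -> #{classify_distance(d)}"
end
# prints:
# 0 -> exact_duplicate
# 7 -> near_duplicate
# 15 -> near_duplicate
# 16 -> distinct
# 40 -> distinct
```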

Examples:

Bulk enqueue all images missing fingerprints (scheduled nightly)

ImageFingerprintWorker.perform_async

Queue single image

ImageFingerprintWorker.perform_async(123)

Queue with force regeneration

ImageFingerprintWorker.perform_async(123, "force" => true)

Constant Summary

DUPLICATE_THRESHOLD = 15

Recommended maximum Hamming distance for treating two fingerprints as duplicates.
15 bits (out of 64) is a commonly used threshold for pHash duplicate detection.

Instance Attribute Summary

Attributes included from Workers::StatusBroadcastable

#broadcast_status_updates

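Class Method Summary

  • .duplicate?(hash1, hash2, threshold: DUPLICATE_THRESHOLD) ⇒ Boolean

    Check if two images are duplicates based on fingerprints.

  • .hamming_distance(hash1, hash2) ⇒ Integer

    Calculate Hamming distance between two fingerprints.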

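Instance Method Summary

  • #perform(image_id = nil, options = {}) ⇒ Object

    Process a single image's fingerprint, or bulk enqueue all images missing fingerprints.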

Methods included from Workers::StatusBroadcastable::Overrides

#at, #store, #total

Class Method Details

.duplicate?(hash1, hash2, threshold: DUPLICATE_THRESHOLD) ⇒ Boolean

Check if two images are duplicates based on fingerprints

Parameters:

  • hash1 (Integer, String)

    First fingerprint

  • hash2 (Integer, String)

    Second fingerprint

  • threshold (Integer) (defaults to: DUPLICATE_THRESHOLD)

    Maximum hamming distance for duplicates

Returns:

  • (Boolean)

    true if the Hamming distance between the fingerprints is less than or equal to threshold


# File 'app/workers/image_fingerprint_worker.rb', line 79

def self.duplicate?(hash1, hash2, threshold: DUPLICATE_THRESHOLD)
  hamming_distance(hash1, hash2) <= threshold
end
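
The same rule can be used to find duplicate pairs in a small set. This is a minimal sketch that assumes fingerprints are keyed by image ID. `hamming` restates .hamming_distance so the snippet runs without Rails, and `duplicate_pairs` is a hypothetical helper. Comparing every pair is O(n^2), so this approach only suits small candidate sets.

```ruby
# Restates .hamming_distance: Strings are parsed as hex, anything else via to_i.
def hamming(a, b)
  x = a.is_a?(String) ? a.to_i(16) : a.to_i
  y = b.is_a?(String) ? b.to_i(16) : b.to_i
  (x ^ y).to_s(2).count("1")
end

# Hypothetical helper: every unordered pair within the threshold, with its distance.
def duplicate_pairs(fingerprints, threshold: 15)
  fingerprints.to_a.combination(2).filter_map do |(id_a, fp_a), (id_b, fp_b)|
    distance = hamming(fp_a, fp_b)
    [id_a, id_b, distance] if distance <= threshold
  end
end

fingerprints = {
  101 => "ffffffff00000000",
  102 => "ffffffff0000000f", # 4 bits differ from 101
  103 => "0000000000000000"  # 32 bits differ from 101
}

duplicate_pairs(fingerprints).each do |a, b, d|
  puts "#{a} ~ #{b} (distance #{d})" # prints "101 ~ 102 (distance 4)"
end
```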

.hamming_distance(hash1, hash2) ⇒ Integer

Calculate Hamming distance between two fingerprints
Accepts hex strings (ImageKit pHash or local pHash) or Integers; any String is parsed as hexadecimal
Lower distance means more similar images

Parameters:

  • hash1 (Integer, String)

    First fingerprint (hex string or integer)

  • hash2 (Integer, String)

    Second fingerprint (hex string or integer)

Returns:

  • (Integer)

    Number of differing bits (0-64 for 64-bit pHash fingerprints)


# File 'app/workers/image_fingerprint_worker.rb', line 66

def self.hamming_distance(hash1, hash2)
  h1 = hash1.is_a?(String) ? hash1.to_i(16) : hash1.to_i
  h2 = hash2.is_a?(String) ? hash2.to_i(16) : hash2.to_i
  (h1 ^ h2).to_s(2).count('1')
end
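
The worked example below traces the arithmetic in this method, using made-up sample values. It shows that a hex string in any letter case, with or without leading zeros, normalizes to the same Integer. Because of that, a fingerprint stored as a string compares cleanly against one held as an Integer. The last line illustrates a pitfall: a decimal number passed as a String is still parsed as hex.

```ruby
a_hex = "00FF00FF00FF00FF"
b_int = 0x00ff00ff00ff00f0

a = a_hex.to_i(16)            # leading zeros and uppercase are harmless
puts a == 0x00ff00ff00ff00ff  # prints "true"

diff = a ^ b_int              # XOR sets a bit wherever the hashes disagree
puts diff.to_s(2)             # prints "1111"
puts diff.to_s(2).count("1")  # prints "4", the Hamming distance

# Pitfall: "255" is read as hex 0x255, not decimal 255.
puts "255".to_i(16)           # prints "597"
```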

Instance Method Details

#perform(image_id = nil, options = {}) ⇒ Object

Parameters:

  • image_id (Integer, nil) (defaults to: nil)

    The Image record ID, or nil to bulk enqueue all images missing fingerprints

  • options (Hash) (defaults to: {})

    Options; string and symbol keys are both accepted (the hash is converted with with_indifferent_access)

Options Hash (options):

  • :force (Boolean)

    Force regeneration even if fingerprint exists

  • :redirect_to (String)

    URL to redirect to after completion

  • :limit (Integer)

    Max images to process (bulk mode only)


# File 'app/workers/image_fingerprint_worker.rb', line 50

def perform(image_id = nil, options = {})
  if image_id.nil?
    bulk_enqueue(options.with_indifferent_access)
  else
    process_image(image_id, options.with_indifferent_access)
  end
end
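
The options hash is normalized with with_indifferent_access because Sidekiq stores job arguments as JSON. Any options hash therefore reaches #perform with string keys, even when it was written with symbol keys. Depending on the Sidekiq version and its strict-argument setting, symbol keys may also be rejected at enqueue time. The sketch below simulates only the JSON round trip, using the standard library. It is an illustration, not Sidekiq's own serialization code.

```ruby
require "json"

# Simulate enqueue -> Redis -> dequeue: arguments are serialized to JSON
# and parsed back, which turns symbol keys into strings.
enqueued_args = [123, { force: true }]
received_args = JSON.parse(JSON.generate(enqueued_args))

image_id, options = received_args
p image_id         # prints 123
p options.keys     # prints ["force"]
p options[:force]  # prints nil: a symbol lookup misses the string key
p options["force"] # prints true
```

In the worker, `options.with_indifferent_access` (from ActiveSupport) makes both `options[:force]` and `options["force"]` return the same value.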