Class: ImageFingerprintWorker
- Inherits:
- Object
- Includes:
- Sidekiq::Worker, Workers::StatusBroadcastable
- Defined in:
- app/workers/image_fingerprint_worker.rb
Overview
Background worker for generating perceptual hash fingerprints for images.
Two modes of operation:
- No arguments: Find all images missing fingerprints and bulk enqueue
- With image_id: Process that single image's fingerprint
Three-tier strategy for fingerprint generation (most efficient first):
- Check if pHash is already stored in the asset JSONB field
- Fetch pHash from ImageKit's metadata API (fast, no download needed)
- Calculate pHash locally using phash-rb gem (requires download)
Both ImageKit and phash-rb use DCT-based pHash (pHash 0.9.6),
so fingerprints are comparable regardless of source.
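The three tiers above can be sketched as a simple fallback chain. This is only an illustration: `resolve_fingerprint`, the `"phash"` key, and the two lambdas are hypothetical stand-ins for the stored JSONB value, the ImageKit metadata lookup, and the local phash-rb calculation, and none of them are taken from the worker's source.

```ruby
# Hypothetical sketch of a three-tier lookup; these names do not come from
# the worker itself.
def resolve_fingerprint(asset, fetch_remote:, compute_local:)
  # Tier 1: reuse a pHash already stored on the asset record.
  stored = asset["phash"]
  return [stored, :stored] if stored && !stored.empty?

  # Tier 2: ask the remote metadata source (no image download needed).
  remote = fetch_remote.call(asset)
  return [remote, :remote] if remote && !remote.empty?

  # Tier 3: fall back to computing the hash locally (requires a download).
  [compute_local.call(asset), :local]
end

asset = { "url" => "https://example.com/cat.jpg" } # placeholder data
fingerprint, source = resolve_fingerprint(
  asset,
  fetch_remote: ->(_asset) { nil },                # simulate a metadata miss
  compute_local: ->(_asset) { "c3a1f0e07b2d9945" } # stand-in for a local pHash
)
puts "#{fingerprint} (#{source})"
```

Returning the tier alongside the fingerprint is a design choice of this sketch; it makes it easy to log how often the cheaper tiers succeed.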
The fingerprint enables:
- Exact duplicate detection (Hamming distance = 0)
- Near-duplicate detection (Hamming distance <= 15)
- Matching that is robust to resizing, compression, and minor color changes
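As a rough illustration of how the two distance bands above might be applied, the sketch below maps a precomputed Hamming distance to a label. The method name `classify_match` and the label symbols are assumptions made for this example; the worker itself only exposes `duplicate?`.

```ruby
EXACT_DISTANCE = 0
NEAR_DUPLICATE_DISTANCE = 15 # mirrors DUPLICATE_THRESHOLD

# Hypothetical helper: turn a Hamming distance into a match category.
def classify_match(distance)
  if distance == EXACT_DISTANCE
    :exact
  elsif distance <= NEAR_DUPLICATE_DISTANCE
    :near_duplicate
  else
    :distinct
  end
end

[0, 9, 15, 16].each { |d| puts "#{d} bits -> #{classify_match(d)}" }
```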
Constant Summary
- DUPLICATE_THRESHOLD = 15
Recommended threshold for duplicate detection (Hamming distance).
15 bits is the standard threshold for pHash duplicate detection.
Instance Attribute Summary
Attributes included from Workers::StatusBroadcastable
Class Method Summary
- .duplicate?(hash1, hash2, threshold: DUPLICATE_THRESHOLD) ⇒ Boolean
Check if two images are duplicates based on fingerprints.
- .hamming_distance(hash1, hash2) ⇒ Integer
Calculate Hamming distance between two fingerprints. Works with hex strings (both ImageKit pHash and local pHash). Lower distance = more similar images.
Instance Method Summary
Methods included from Workers::StatusBroadcastable::Overrides
Class Method Details
.duplicate?(hash1, hash2, threshold: DUPLICATE_THRESHOLD) ⇒ Boolean
Check if two images are duplicates based on fingerprints
# File 'app/workers/image_fingerprint_worker.rb', line 79
def self.duplicate?(hash1, hash2, threshold: DUPLICATE_THRESHOLD)
  hamming_distance(hash1, hash2) <= threshold
end
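To show how the `threshold:` keyword changes the outcome, here is a minimal standalone sketch. `FingerprintCompare` is a hypothetical module that copies the two documented class methods so the example runs without Rails or Sidekiq, and the fingerprints are made-up values that differ in 7 bits.

```ruby
module FingerprintCompare # hypothetical stand-in for the worker class
  DUPLICATE_THRESHOLD = 15

  def self.hamming_distance(hash1, hash2)
    h1 = hash1.is_a?(String) ? hash1.to_i(16) : hash1.to_i
    h2 = hash2.is_a?(String) ? hash2.to_i(16) : hash2.to_i
    (h1 ^ h2).to_s(2).count('1')
  end

  def self.duplicate?(hash1, hash2, threshold: DUPLICATE_THRESHOLD)
    hamming_distance(hash1, hash2) <= threshold
  end
end

original = "8f0e1c3a5b7d9f01" # made-up fingerprint
variant  = "8f0e1c3a5b7d9fff" # last byte differs: 0x01 vs 0xff, 7 bits

puts FingerprintCompare.duplicate?(original, variant)               # default threshold
puts FingerprintCompare.duplicate?(original, variant, threshold: 5) # stricter match
puts FingerprintCompare.duplicate?(original, variant, threshold: 0) # exact match only
```

With the default threshold the pair counts as a duplicate; tightening the threshold below 7 makes the same pair distinct.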
.hamming_distance(hash1, hash2) ⇒ Integer
Calculate Hamming distance between two fingerprints
Works with hex strings (both ImageKit pHash and local pHash)
Lower distance = more similar images
# File 'app/workers/image_fingerprint_worker.rb', line 66
def self.hamming_distance(hash1, hash2)
  h1 = hash1.is_a?(String) ? hash1.to_i(16) : hash1.to_i
  h2 = hash2.is_a?(String) ? hash2.to_i(16) : hash2.to_i
  (h1 ^ h2).to_s(2).count('1')
end
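A small worked example may help make the XOR-and-count step concrete. The lambda below repeats the documented logic so it can run on its own; the two fingerprints are invented values chosen so that only the lowest four bits differ.

```ruby
# Same logic as the documented method, copied into a standalone lambda.
hamming = lambda do |hash1, hash2|
  h1 = hash1.is_a?(String) ? hash1.to_i(16) : hash1.to_i
  h2 = hash2.is_a?(String) ? hash2.to_i(16) : hash2.to_i
  # XOR leaves a 1 wherever the two values disagree; count those bits.
  (h1 ^ h2).to_s(2).count('1')
end

a = "ff00ff00ff00ff00" # made-up 64-bit fingerprint
b = "ff00ff00ff00ff0f" # differs from a in the lowest 4 bits

puts hamming.call(a, a)          # identical fingerprints => 0
puts hamming.call(a, b)          # four differing bits    => 4
puts hamming.call(a, b.to_i(16)) # hex String vs. Integer => 4
```

Because hex strings are converted to integers before comparison, leading zeros in a fingerprint do not affect the result.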
Instance Method Details
#perform(image_id = nil, options = {}) ⇒ Object
# File 'app/workers/image_fingerprint_worker.rb', line 50
def perform(image_id = nil, options = {})
  if image_id.nil?
    bulk_enqueue(options.with_indifferent_access)
  else
    process_image(image_id, options.with_indifferent_access)
  end
end
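The dispatch between the two modes can be illustrated without Sidekiq or Rails. In this sketch, `DispatchSketch` is a hypothetical stand-in that records which branch would run instead of calling the real `bulk_enqueue` or `process_image`, the `"force"` option is invented for the example, and `transform_keys(&:to_s)` is used as a plain-Ruby substitute for `with_indifferent_access`.

```ruby
class DispatchSketch # hypothetical stand-in, not the real worker
  attr_reader :calls

  def initialize
    @calls = []
  end

  def perform(image_id = nil, options = {})
    # Sidekiq serializes job arguments as JSON, so hash keys arrive as
    # strings; normalizing keys here stands in for with_indifferent_access.
    opts = options.transform_keys(&:to_s)
    if image_id.nil?
      @calls << [:bulk_enqueue, opts]             # no id: bulk mode
    else
      @calls << [:process_image, image_id, opts]  # id given: single image
    end
  end
end

worker = DispatchSketch.new
worker.perform                          # bulk mode
worker.perform(42, { "force" => true }) # single-image mode, made-up option
worker.calls.each { |call| p call }
```

In an application, the equivalent jobs would normally be enqueued through Sidekiq's `perform_async` rather than by calling `perform` directly.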