Class: ImageDuplicatePair
- Inherits: ApplicationRecord (ancestry: Object > ActiveRecord::Base > ApplicationRecord > ImageDuplicatePair)
- Defined in: app/models/image_duplicate_pair.rb
Overview
== Schema Information
Table name: image_duplicate_pairs
Database name: primary
id :bigint not null, primary key
hamming_distance :integer not null
status :string default("pending"), not null
created_at :datetime not null
updated_at :datetime not null
image_a_id :bigint not null
image_b_id :bigint not null
Indexes
index_image_duplicate_pairs_on_hamming_distance (hamming_distance)
index_image_duplicate_pairs_on_image_a_id_and_image_b_id (image_a_id,image_b_id) UNIQUE
index_image_duplicate_pairs_on_image_b_id (image_b_id)
index_image_duplicate_pairs_on_status (status)
Constant Summary
- STATUSES = %w[pending reviewed false_positive merged].freeze
  Status values.
Constants included from Schedulable
Schedulable::SIMPLE_FORM_OPTIONS
Instance Attribute Summary
- #hamming_distance ⇒ Object (readonly)
  Validations.
- #image_a_id ⇒ Object (readonly)
- #status ⇒ Object (readonly)
Belongs to
- #image_a ⇒ Image
  Associations.
- #image_b ⇒ Image
Class Method Summary
- .build_clusters(threshold: 10) ⇒ Array<Set<Integer>>
  Group pairs into clusters for display. Returns groups of image IDs linked, directly or transitively, by pending pairs within the threshold.
- .bulk_upsert_or_update(pairs_data) ⇒ Integer
  Bulk upsert pairs: inserts new pairs or updates existing ones in a single query.
- .exact_matches ⇒ ActiveRecord::Relation<ImageDuplicatePair>
  A relation of ImageDuplicatePairs that are exact matches (hamming_distance of 0).
- .false_positives ⇒ ActiveRecord::Relation<ImageDuplicatePair>
  A relation of ImageDuplicatePairs that are false positives.
- .find_or_create_pair(image_1, image_2, distance:) ⇒ ImageDuplicatePair
  Find or create a pair, ensuring consistent ordering (smaller id first).
- .merged ⇒ ActiveRecord::Relation<ImageDuplicatePair>
  A relation of ImageDuplicatePairs that are merged.
- .pending ⇒ ActiveRecord::Relation<ImageDuplicatePair>
  A relation of ImageDuplicatePairs that are pending.
- .recent ⇒ ActiveRecord::Relation<ImageDuplicatePair>
  A relation of ImageDuplicatePairs ordered by created_at, newest first.
- .reviewed ⇒ ActiveRecord::Relation<ImageDuplicatePair>
  A relation of ImageDuplicatePairs that are reviewed.
- .within_threshold(threshold) ⇒ ActiveRecord::Relation<ImageDuplicatePair>
  A relation of ImageDuplicatePairs whose hamming_distance is at most threshold.
Instance Method Summary
- #mark_false_positive! ⇒ Object
  Mark this pair as a false positive (not actually duplicates).
- #mark_merged! ⇒ Object
  Mark this pair as merged (one image was merged into the other).
- #mark_reviewed! ⇒ Object
  Mark this pair as reviewed (not a false positive, just seen).
Methods inherited from ApplicationRecord
ransackable_associations, ransackable_attributes, ransackable_scopes, ransortable_attributes, #to_relation
Methods included from Schedulable
Methods included from Models::AfterCommittable
Methods included from Models::EventPublishable
Instance Attribute Details
#hamming_distance ⇒ Object (readonly)
Validations:
# File 'app/models/image_duplicate_pair.rb', line 29
validates :hamming_distance, presence: true
#image_a_id ⇒ Object (readonly)
# File 'app/models/image_duplicate_pair.rb', line 30
validates :image_a_id, uniqueness: { scope: :image_b_id }
#status ⇒ Object (readonly)
# File 'app/models/image_duplicate_pair.rb', line 35
validates :status, inclusion: { in: STATUSES }
Class Method Details
.build_clusters(threshold: 10) ⇒ Array<Set<Integer>>
Group pairs into clusters for display.
Returns groups of image IDs linked, directly or transitively, by pending pairs whose hamming_distance is at most threshold. Groups with a single image are omitted.
# File 'app/models/image_duplicate_pair.rb', line 111
def self.build_clusters(threshold: 10)
  pairs = within_threshold(threshold).pending.pluck(:image_a_id, :image_b_id)
  return [] if pairs.empty?

  # Union-Find algorithm
  parent = {}

  find_root = lambda do |x|
    parent[x] ||= x
    parent[x] = find_root.call(parent[x]) if parent[x] != x
    parent[x]
  end

  union = lambda do |x, y|
    px = find_root.call(x)
    py = find_root.call(y)
    parent[px] = py if px != py
  end

  pairs.each { |id1, id2| union.call(id1, id2) }

  # Group by root
  groups = Hash.new { |h, k| h[k] = Set.new }
  parent.keys.each do |id|
    root = find_root.call(id)
    groups[root] << id
  end

  groups.values.select { |cluster| cluster.size > 1 }
end
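build_clusters computes connected components with Union-Find. The standalone sketch below runs the same idea on plain [id_a, id_b] arrays instead of a database query. The method name cluster_pairs is hypothetical. Its find step is iterative, with path compression, rather than the recursive lambda above; that variant may matter if a cluster forms a very long chain, since deep recursion can exhaust Ruby's stack.

```ruby
require 'set'

# Hypothetical helper: groups [id_a, id_b] pairs into connected components
# with Union-Find, mirroring the approach used by build_clusters.
def cluster_pairs(pairs)
  parent = {}

  # Iterative find with path compression.
  find_root = lambda do |x|
    parent[x] ||= x
    root = x
    root = parent[root] while parent[root] != root
    # Point every node on the walked path directly at the root.
    while parent[x] != root
      next_node = parent[x]
      parent[x] = root
      x = next_node
    end
    root
  end

  # Union: attach one root under the other when they differ.
  pairs.each do |a, b|
    root_a = find_root.call(a)
    root_b = find_root.call(b)
    parent[root_a] = root_b if root_a != root_b
  end

  # Group every seen id by its root and drop singleton groups.
  groups = Hash.new { |h, k| h[k] = Set.new }
  parent.keys.each { |id| groups[find_root.call(id)] << id }
  groups.values.select { |cluster| cluster.size > 1 }
end

clusters = cluster_pairs([[1, 2], [2, 3], [7, 8]])
p clusters.map { |c| c.to_a.sort }.sort # prints [[1, 2, 3], [7, 8]]
```

Images 1 and 3 land in the same cluster even though no pair links them directly. Clustering is transitive, so members of one cluster are not necessarily within the threshold of each other.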
.bulk_upsert_or_update(pairs_data) ⇒ Integer
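Bulk upsert pairs: inserts new pairs or updates the hamming_distance of existing ones in a single query.
Uses PostgreSQL's ON CONFLICT to handle pairs that already exist. Each element of pairs_data is a hash with :id1, :id2, and :distance keys. Returns the number of records submitted, not the number of rows inserted or changed.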
# File 'app/models/image_duplicate_pair.rb', line 75
def self.bulk_upsert_or_update(pairs_data)
  return 0 if pairs_data.blank?

  now = Time.current

  # Prepare records with canonical ordering (smaller id first)
  records = pairs_data.map do |pair|
    image_a_id, image_b_id = [pair[:id1], pair[:id2]].sort
    {
      image_a_id: image_a_id,
      image_b_id: image_b_id,
      hamming_distance: pair[:distance],
      status: 'pending',
      created_at: now,
      updated_at: now
    }
  end

  # Use upsert_all for efficient bulk insert/update
  # On conflict, update hamming_distance (distance may change if fingerprints recalculated)
  # Note: Rails automatically handles updated_at, so we only specify hamming_distance
  upsert_all(
    records,
    unique_by: %i[image_a_id image_b_id],
    update_only: %i[hamming_distance]
  )

  records.size
end
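You can try the record-preparation step of bulk_upsert_or_update without a database. The plain-Ruby sketch below builds the same record hashes. The name prepare_pair_records is hypothetical. The final de-duplication is an addition of this sketch, not something the documented method does. If one batch contains both (4, 9) and (9, 4), both map to the same unique key, and PostgreSQL rejects an INSERT ... ON CONFLICT DO UPDATE statement that would update the same row twice.

```ruby
# Hypothetical helper mirroring the record-preparation step of
# bulk_upsert_or_update, without touching the database.
def prepare_pair_records(pairs_data, now: Time.now)
  records = pairs_data.map do |pair|
    # Canonical ordering: the smaller id always goes in image_a_id.
    image_a_id, image_b_id = [pair[:id1], pair[:id2]].sort
    {
      image_a_id: image_a_id,
      image_b_id: image_b_id,
      hamming_distance: pair[:distance],
      status: 'pending',
      created_at: now,
      updated_at: now
    }
  end

  # Extra step not in the original: keep only the first record per canonical
  # key, so a single upsert statement never targets the same row twice.
  records.uniq { |r| [r[:image_a_id], r[:image_b_id]] }
end

records = prepare_pair_records([
  { id1: 9, id2: 4, distance: 3 },
  { id1: 4, id2: 9, distance: 5 },
  { id1: 2, id2: 5, distance: 0 }
])
p records.map { |r| [r[:image_a_id], r[:image_b_id], r[:hamming_distance]] }
# prints [[4, 9, 3], [2, 5, 0]]
```

Keeping the first occurrence is an arbitrary choice. A caller that prefers the most recent distance could de-duplicate in reverse order instead.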
.exact_matches ⇒ ActiveRecord::Relation<ImageDuplicatePair>
A relation of ImageDuplicatePairs that are exact matches. Active Record Scope
# File 'app/models/image_duplicate_pair.rb', line 43
scope :exact_matches, -> { where(hamming_distance: 0) }
.false_positives ⇒ ActiveRecord::Relation<ImageDuplicatePair>
A relation of ImageDuplicatePairs that are false positives. Active Record Scope
# File 'app/models/image_duplicate_pair.rb', line 40
scope :false_positives, -> { where(status: 'false_positive') }
.find_or_create_pair(image_1, image_2, distance:) ⇒ ImageDuplicatePair
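Find or create a pair, ensuring consistent ordering (smaller id first). image_1 and image_2 may be Image records or raw ids; distance is used only when a new pair is created.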
# File 'app/models/image_duplicate_pair.rb', line 54
def self.find_or_create_pair(image_1, image_2, distance:)
  id_1 = image_1.is_a?(Image) ? image_1.id : image_1
  id_2 = image_2.is_a?(Image) ? image_2.id : image_2

  # Ensure consistent ordering
  image_a_id, image_b_id = [id_1, id_2].sort

  find_or_create_by!(image_a_id: image_a_id, image_b_id: image_b_id) do |pair|
    pair.hamming_distance = distance
  end
rescue ActiveRecord::RecordNotUnique
  # Race condition - another process created it
  find_by!(image_a_id: image_a_id, image_b_id: image_b_id)
end
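The in-memory sketch below shows how find_or_create_pair normalizes its arguments. FakeImage and PairStore are hypothetical stand-ins for the Image model and the image_duplicate_pairs table. A Hash keyed on the sorted id pair plays the role of the unique index. The sketch has no concurrency, so the RecordNotUnique rescue has no counterpart here.

```ruby
# Hypothetical stand-in for the Image model.
FakeImage = Struct.new(:id)

# Hypothetical stand-in for the table; the Hash key models the unique index
# on (image_a_id, image_b_id).
class PairStore
  Pair = Struct.new(:image_a_id, :image_b_id, :hamming_distance)

  def initialize
    @rows = {}
  end

  # Accepts FakeImage objects or raw integer ids, like the original.
  def find_or_create_pair(image_1, image_2, distance:)
    id_1 = image_1.is_a?(FakeImage) ? image_1.id : image_1
    id_2 = image_2.is_a?(FakeImage) ? image_2.id : image_2
    image_a_id, image_b_id = [id_1, id_2].sort
    # distance only applies when the row is first created, mirroring the
    # block passed to find_or_create_by!.
    @rows[[image_a_id, image_b_id]] ||= Pair.new(image_a_id, image_b_id, distance)
  end

  def size
    @rows.size
  end
end

store = PairStore.new
first_pair  = store.find_or_create_pair(FakeImage.new(42), 7, distance: 4)
second_pair = store.find_or_create_pair(7, FakeImage.new(42), distance: 9)
p first_pair.equal?(second_pair) # prints true
p first_pair.hamming_distance    # prints 4
```

Because the second call finds the existing row, its distance of 9 is ignored. That matches find_or_create_by!, which runs its block only for new records.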
.merged ⇒ ActiveRecord::Relation<ImageDuplicatePair>
A relation of ImageDuplicatePairs that are merged. Active Record Scope
# File 'app/models/image_duplicate_pair.rb', line 41
scope :merged, -> { where(status: 'merged') }
.pending ⇒ ActiveRecord::Relation<ImageDuplicatePair>
A relation of ImageDuplicatePairs that are pending. Active Record Scope
# File 'app/models/image_duplicate_pair.rb', line 38
scope :pending, -> { where(status: 'pending') }
.recent ⇒ ActiveRecord::Relation<ImageDuplicatePair>
A relation of ImageDuplicatePairs that are recent. Active Record Scope
# File 'app/models/image_duplicate_pair.rb', line 45
scope :recent, -> { order(created_at: :desc) }
.reviewed ⇒ ActiveRecord::Relation<ImageDuplicatePair>
A relation of ImageDuplicatePairs that are reviewed. Active Record Scope
# File 'app/models/image_duplicate_pair.rb', line 39
scope :reviewed, -> { where(status: 'reviewed') }
.within_threshold(threshold) ⇒ ActiveRecord::Relation<ImageDuplicatePair>
A relation of ImageDuplicatePairs that are within threshold. Active Record Scope
# File 'app/models/image_duplicate_pair.rb', line 44
scope :within_threshold, ->(threshold) { where(hamming_distance: ..threshold) }
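The model stores only the computed distance. The sketch below assumes, hypothetically, that each image has a non-negative integer perceptual-hash fingerprint, and shows how a Hamming distance between two such fingerprints would be computed. It also shows that the beginless range in within_threshold corresponds to distance <= threshold. The helper names are illustrative only.

```ruby
# Hypothetical: fingerprints are non-negative integers (e.g. 64-bit hashes).
# Negative integers would need masking first, since to_s(2) keeps the sign.
def hamming_distance(fp_a, fp_b)
  # XOR sets a bit wherever the two fingerprints differ; count the set bits.
  (fp_a ^ fp_b).to_s(2).count('1')
end

# Mirrors where(hamming_distance: ..threshold), i.e. hamming_distance <= threshold.
def within_threshold?(distance, threshold)
  (..threshold).cover?(distance)
end

d = hamming_distance(0b1011_0000, 0b1001_0001)
p d                         # prints 2
p within_threshold?(d, 10)  # prints true
p within_threshold?(11, 10) # prints false
p hamming_distance(42, 42)  # prints 0 (an exact match, as in exact_matches)
```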
Instance Method Details
#image_a ⇒ Image
Associations
# File 'app/models/image_duplicate_pair.rb', line 25
belongs_to :image_a, class_name: 'Image'
#image_b ⇒ Image
# File 'app/models/image_duplicate_pair.rb', line 26
belongs_to :image_b, class_name: 'Image'
#mark_false_positive! ⇒ Object
Mark this pair as a false positive (not actually duplicates)
# File 'app/models/image_duplicate_pair.rb', line 148
def mark_false_positive!
  update!(status: 'false_positive')
end
#mark_merged! ⇒ Object
Mark this pair as merged (one image was merged into the other)
# File 'app/models/image_duplicate_pair.rb', line 153
def mark_merged!
  update!(status: 'merged')
end
#mark_reviewed! ⇒ Object
Mark this pair as reviewed (not a false positive, just seen)
# File 'app/models/image_duplicate_pair.rb', line 143
def mark_reviewed!
  update!(status: 'reviewed')
end
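The three mark_*! methods set status directly without checking the current value, so any status can follow any other. A minimal in-memory sketch of that behavior follows. ReviewablePair and update_status! are hypothetical names. The guard only approximates update! combined with the inclusion validation on status, and it raises ArgumentError where the model would raise ActiveRecord::RecordInvalid.

```ruby
# Hypothetical in-memory stand-in for the status handling on ImageDuplicatePair.
class ReviewablePair
  STATUSES = %w[pending reviewed false_positive merged].freeze

  attr_reader :status

  def initialize
    @status = 'pending' # same as the column default
  end

  def mark_reviewed!
    update_status!('reviewed')
  end

  def mark_false_positive!
    update_status!('false_positive')
  end

  def mark_merged!
    update_status!('merged')
  end

  private

  # Approximates update!(status: value) plus `validates :status, inclusion: { in: STATUSES }`.
  def update_status!(value)
    raise ArgumentError, "invalid status: #{value}" unless STATUSES.include?(value)

    @status = value
  end
end

pair = ReviewablePair.new
pair.mark_reviewed!
p pair.status # prints "reviewed"
```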