With the rapid growth of digital images on the Internet and the availability of advanced editing tools, images can be shared, edited, and redistributed with ease. Protecting the rights of original image creators has therefore become increasingly important, motivating the need for reliable image copy detection techniques.
This project presents a robust image copy detection framework that balances robustness and discrimination in image hashing. The system combines deep global features and moment-based local features, followed by efficient large-scale retrieval using FAISS (Facebook AI Similarity Search).
The proposed method is evaluated on the UCID dataset and further validated on the large-scale COCO dataset, showing significant improvements over a baseline research approach.
UCID dataset:
- Used to evaluate the perceptual robustness of the proposed scheme.
- Contains 1,338 original images.
- Each original image is treated as a reference for generating manipulated copies.

COCO dataset:
- Used for large-scale testing and scalability analysis.
- Contains 40,000 images.
Preprocessing is performed using `preprocess_images.py`.
- Images are resized to 224 × 224 using bilinear interpolation.
- Gaussian Low-Pass Filtering (GLF) is applied to smooth images, reduce noise, and remove high-frequency details.
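The internals of `preprocess_images.py` are not shown here; the two steps above can be sketched as follows. This is a minimal version assuming PIL and SciPy, and the smoothing strength `sigma` is a hypothetical parameter, not a value specified in this document.

```python
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

def preprocess(img: Image.Image, sigma: float = 1.0) -> np.ndarray:
    """Resize to 224x224 (bilinear) and apply a Gaussian low-pass filter.

    `sigma` is a hypothetical smoothing strength; the value actually used
    by preprocess_images.py is not specified here.
    """
    resized = img.convert("RGB").resize((224, 224), Image.BILINEAR)
    arr = np.asarray(resized, dtype=np.float32) / 255.0
    # Smooth each colour channel independently (sigma = 0 on the channel axis).
    return gaussian_filter(arr, sigma=(sigma, sigma, 0))
```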
Image manipulations are performed using `manipulate.py` to simulate real-world distortions.
- Speckle Noise (SN): Variance range 0.001 – 0.01
- Salt-and-Pepper Noise (SPN): Density range 0.001 – 0.01
- Gamma Correction (GC)
- Brightness Adjustment (BA)
- Gaussian Low-Pass Filtering (GLF)
- JPEG Compression (JC)
- Watermark Embedding (WE)
- Mirroring
- Rotation
Each manipulated image is treated as a near-duplicate of its original image.
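Two of the listed manipulations can be sketched as below. This is a minimal numpy version, not the code of `manipulate.py`; pixel values are assumed to lie in [0, 1], and the clipping behaviour is an assumption.

```python
import numpy as np

def speckle_noise(img, variance=0.005, rng=None):
    """Speckle noise (SN): multiplicative Gaussian noise, img + img * n."""
    rng = rng or np.random.default_rng()
    noise = rng.normal(0.0, np.sqrt(variance), img.shape)
    return np.clip(img + img * noise, 0.0, 1.0)

def salt_and_pepper(img, density=0.005, rng=None):
    """Salt-and-pepper noise (SPN): force a `density` fraction of pixels to 0 or 1."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    mask = rng.random(img.shape[:2])       # one draw per pixel location
    out[mask < density / 2] = 0.0          # pepper
    out[mask > 1 - density / 2] = 1.0      # salt
    return out
```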
An effective image hashing system must satisfy two key properties:
- Robustness: Slightly modified versions of the same image should produce similar hashes.
- Discrimination: Completely different images should produce very different hashes.
Balancing these two properties is the central challenge addressed in this work.
To study the robustness–discrimination trade-off, a baseline method from an existing research paper was implemented.
- Global Features: VGG16-based deep features (40 features)
- Local Features: Meixner Moments (16 features)
- Total Hash Length: 56 features per image
- Precision: 0.50
- Recall: 0.98
Although recall is high, the low precision indicates poor discrimination and a high false-positive rate.
To improve robustness while maintaining discrimination, a hybrid feature extraction approach is introduced.
Global features are extracted using a convolutional autoencoder:
- Latent space dimension: 256 features
Local features are extracted using multiple moment-based descriptors:
- Meixner Polynomials
- Krawtchouk Moments
- Tchebichef Moments
- Total local features: 2,940
- Global features: 256
- Local features: 2,940
- Total feature length: 3,196
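The final hash is a plain concatenation of the two feature blocks. The sketch below uses placeholder vectors standing in for the real autoencoder and moment outputs:

```python
import numpy as np

def build_hash(global_feats, local_feats):
    """Concatenate 256 global + 2,940 local features into a 3,196-D hash."""
    g = np.asarray(global_feats, dtype=np.float32).ravel()
    loc = np.asarray(local_feats, dtype=np.float32).ravel()
    assert g.size == 256 and loc.size == 2940
    return np.concatenate([g, loc])
```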
Image retrieval is performed using FAISS for efficient near-duplicate detection.
- Feature vectors are L2-normalized
- Cosine similarity (inner product) is used
- IVF-KMeans indexing is applied
- Database descriptors are clustered using k-means
- Each descriptor is assigned to its nearest centroid
- During retrieval, only the closest cluster(s) are searched
- Similarity scores are sorted and filtered using a threshold (≈ 0.65)
- Search time: < 1 second per query
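The actual system uses FAISS's IVF index; the numpy sketch below only illustrates the coarse-quantization idea behind it (k-means clustering, nearest-centroid assignment, probing one cluster at query time). All vectors are assumed L2-normalized, so the inner product equals cosine similarity; `nprobe` and the tiny k-means are illustrative stand-ins for FAISS internals.

```python
import numpy as np

def kmeans(x, k, iters=10, rng=None):
    """Tiny k-means on unit vectors, standing in for FAISS's coarse quantizer."""
    rng = rng or np.random.default_rng(0)
    centroids = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(x @ centroids.T, axis=1)   # cosine on unit vectors
        for j in range(k):
            members = x[assign == j]
            if len(members):
                c = members.mean(axis=0)
                centroids[j] = c / np.linalg.norm(c)
    return centroids

def ivf_search(query, db, centroids, threshold=0.65, nprobe=1):
    """Search only the nprobe clusters nearest to the query, then threshold."""
    assign = np.argmax(db @ centroids.T, axis=1)
    probe = np.argsort(query @ centroids.T)[::-1][:nprobe]
    cand = np.where(np.isin(assign, probe))[0]
    sims = db[cand] @ query                # inner product = cosine similarity
    keep = sims >= threshold
    order = np.argsort(sims[keep])[::-1]   # sort matches by similarity
    return cand[keep][order], sims[keep][order]
```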
Evaluation metrics: Precision and Recall.

Optimal configuration: Autoencoder + Multiple Moments
- Precision: 1.000
- Recall: 0.8616

Optimal configuration: Autoencoder + Meixner + Krawtchouk + Tchebichef
- Precision: 1.000
- Recall: 0.9685
A dataset-level threshold analysis was conducted on the UCID dataset by varying the cosine similarity threshold from 0.50 to 0.95.
- Precision increases with higher thresholds
- Recall decreases slightly at higher thresholds
- Best trade-off achieved at threshold = 0.65
At threshold = 0.65:
- Precision: 1.000
- Recall: 0.8196
- F1-score: 0.9008 (maximum)
- F1-score increases sharply from threshold 0.50 → 0.65
- Peak F1-score occurs at 0.65
- Beyond this point, recall drops while precision remains perfect
This confirms 0.65 as the optimal similarity threshold.
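The threshold sweep above can be reproduced with a few lines of numpy, given pairwise similarity scores and binary ground-truth labels. The arrays in the usage example are synthetic placeholders, not the UCID results.

```python
import numpy as np

def sweep_thresholds(scores, labels,
                     thresholds=np.round(np.arange(0.50, 0.96, 0.05), 2)):
    """Precision / recall / F1 at each cosine-similarity threshold."""
    results = []
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((float(t), precision, recall, f1))
    return results
```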
| Model | Hash Generation Time |
|---|---|
| VGG16 + Meixner (Baseline) | ~11 hours |
| Autoencoder + Multiple Moments | ~10.5 hours |
| Autoencoder + Meixner | ~4 hours |
| Autoencoder + (Meixner + Krawtchouk) | ~5 hours |
| Autoencoder + (Krawtchouk + Tchebichef) | ~5 hours |
The Autoencoder + Meixner configuration provides the best balance between accuracy and computational efficiency.
The proposed image copy detection framework effectively balances robustness and discrimination by combining deep global features with moment-based local features. The use of FAISS with IVF-KMeans enables fast and scalable retrieval, making the system suitable for large-scale real-world applications.