Many aspects of JPEG compression have been successfully employed in the domain of photo forensics. In particular, artifacts introduced by the choice of the rounding operator used to quantize the DCT coefficients can be used to localize tampering and identify specific encoders.
Following the research in [1], this work aims to provide a Python implementation of an expectation maximization (EM) algorithm to localize inconsistencies in these artifacts that arise from a variety of image manipulations. The resulting output map is computed as described in [2].
Based on research by S. Agarwal and H. Farid (see [1]).
This algorithm has been added as a feature to IEViewer [9], which provides a neat graphical interface to an otherwise nonintuitive script; in that version it employs the default settings and does not use ground truth maps. Check it out before cloning this repository if you are only interested in basic usage.
As a Python 3 application, this project has a few basic requirements. The `pip` package installer is recommended, as it installs all of them automatically. The requirements are nonetheless listed below to simplify manual installation.
It is assumed that Python 3 is already installed on the desired system.
- Download the repository and navigate to its folder.
- Install the requirements using `pip` from a terminal: `pip install --upgrade -r requirements.txt`
The following Python packages are required in order to run this program. Listed versions are minimum versions; where `latest` is given, use the latest compatible release.
Package | Version |
---|---|
Python | 3.8 |
argparse | latest |
decimal | latest |
Matplotlib | 3.4.3 |
NumPy | 1.20.3 |
OpenCV | latest |
os | latest |
pandas | 1.3.3 |
Pillow | 8.2.0 |
random | latest |
scikit-learn | 1.0 |
sys | latest |
time | latest |
tqdm | 4.62.3 |
This project has been written and tested using Python 3.8 on a Windows 10 Pro machine.
Run from a terminal specifying the path to the image to be analyzed, as follows:
python3 main.py "path/to/image/image_file.jpg"
Optional arguments:
- `--win_size`: window size in pixels (default: `64`), must be a multiple of 8;
- `--stop_threshold`: expectation-maximization algorithm stop threshold (default: `1e-3`);
- `--prob_r_b_in_c1`: expectation-maximization algorithm probability of r conditioned on b belonging to C1 (default: `0.5`);
- `--interpolate`: interpolate missing pixel values, i.e. NaNs generated from divisions in the EM algorithm, using the function from [3]; otherwise replace them with `0.5` (default: `False`). Warning: slows down the program significantly;
- `--show`: show the resulting output map (default: `True`);
- `--save`: save the resulting output map in the `results` folder (default: `False`);
- `--roc_type`: choose the ROC function to use, between `custom` and `sklearn` (default: `sklearn`);
- `--show_roc_plot`: show the plot of the ROC curve (default: `False`);
- `--save_roc_plot`: save the plot of the ROC curve in the `results` folder (default: `False`);
- `--show_diff_plot`: show the plot of the difference between successive estimates of template c (default: `False`);
- `--save_diff_plot`: save the plot of the difference between successive estimates of template c in the `results` folder (default: `False`);
- `--verbose`: show progress in terminal (default: `True`).
Example call with optional arguments:
python3 main.py "images/my_photo.jpg" --win_size=256 --stop_threshold=1e-2 --save=True
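Two of the options above lend themselves to a short illustration: the `--win_size` constraint (a multiple of 8) and the default handling of NaNs when `--interpolate` is left at `False`. The following is a minimal sketch, not the project's actual code; the function names `check_win_size` and `fill_missing` are hypothetical and were chosen only for this example.

```python
import numpy as np


def check_win_size(win_size: int) -> int:
    """Validate a window size (hypothetical helper, not part of main.py)."""
    # The README states that the window size must be a multiple of 8.
    if win_size <= 0 or win_size % 8 != 0:
        raise ValueError(f"win_size must be a positive multiple of 8, got {win_size}")
    return win_size


def fill_missing(prob_map: np.ndarray, fill_value: float = 0.5) -> np.ndarray:
    """Return a copy of prob_map with NaNs replaced by fill_value.

    Assumption: this mirrors the documented non-interpolating behaviour,
    where missing values are replaced with 0.5.
    """
    return np.where(np.isnan(prob_map), fill_value, prob_map)


# Toy 2x2 output map with two missing values.
prob_map = np.array([[0.9, np.nan], [np.nan, 0.1]])
filled = fill_missing(prob_map)
win_size = check_win_size(64)
```

Replacing NaNs with a constant is cheap, which is consistent with the warning that interpolation slows the program down significantly.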
This script generates manipulated images and their respective ground truth masks from a given directory (`path/to/images/`) in four subdirectories (`path/to/images/manip_jpeg`, `path/to/images/manip_png`, `path/to/images/manip_jpeg/ground_truth` and `path/to/images/manip_png/ground_truth`), as described in [1].
Specifically, for every original image the script generates 80 manipulated images (ground truth masks excluded), one for each:
- manipulation type:
- region size: 512 px, 256 px, 128 px and 64 px;
- JPEG quality: a random quality chosen from each of the ranges [60, 70], [71, 80], [81, 90] and [91, 100];
- save format: PNG and JPEG (OpenCV `imwrite()` function [6]).
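The quality ranges and region sizes listed above can be sketched as follows. This is only an illustration under stated assumptions: the helper names (`pick_qualities`, `random_region_mask`) are hypothetical, and the square, randomly placed region with a white-on-black mask is an assumption about the mask format rather than a description of `manipulation.py`.

```python
import random

import numpy as np

# Values taken from the list above.
REGION_SIZES = (512, 256, 128, 64)
QUALITY_RANGES = ((60, 70), (71, 80), (81, 90), (91, 100))


def pick_qualities(rng: random.Random) -> list:
    """Draw one random JPEG quality from each range (bounds inclusive)."""
    return [rng.randint(low, high) for low, high in QUALITY_RANGES]


def random_region_mask(height: int, width: int, size: int, rng: random.Random) -> np.ndarray:
    """Build a ground truth mask with one size x size square set to 255.

    Assumption: the manipulated region is a square placed uniformly at random.
    """
    if size > height or size > width:
        raise ValueError("region does not fit inside the image")
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[top:top + size, left:left + size] = 255
    return mask


rng = random.Random(0)  # seeded only to make the example reproducible
qualities = pick_qualities(rng)
mask = random_region_mask(1024, 1024, 256, rng)

# With OpenCV, a chosen quality q can be passed when saving a JPEG:
#   cv2.imwrite(path, image, [cv2.IMWRITE_JPEG_QUALITY, q])
```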
The script can be run with:
python3 manipulation.py "path/to/images/"
This script generates the plots shown in figures 6 and 7 of [1] (except for figure 7(d)) using images manipulated as explained in the same paper. It can be used either to analyze images in a given directory and save the results as CSV files in a `results` subfolder, or to create the plots from existing CSV files.
Along with the figures mentioned, this code also creates three additional plots showing the mean ROC curve by dimple strength, assuming that this information is available.
In order to analyze all images and generate CSV result files, the script can be run with:
python3 results.py True --dir_path="path/to/images/"
Optional arguments:
- `--win_size`: window size in pixels (default: `64`). Agarwal & Farid use `64`, `128` and `256` for three different sets of experiments [1].
As mentioned, the script can also be used to create plots from existing results, assuming they have been generated with the previous command and exist as CSV files in the `results` subfolder:
python3 results.py False --res_path="path/to/results/results_file.csv"
Optional arguments:
- `--show_plots`: show the results' plots (default: `True`);
- `--save_plots`: save the results' plots in the `results` folder (default: `True`).
All ROC curves in this project have been calculated with a specially optimized version of the function from [8], in order to get a fixed number of thresholds and easily calculate the average ROC curve.
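To illustrate why a fixed set of thresholds makes averaging straightforward, here is a minimal NumPy sketch. It is not the function from [8] nor the project's implementation; the names `roc_fixed_thresholds`, `mean_roc` and `auc`, and the choice of 101 evenly spaced thresholds in [0, 1], are assumptions made for this example.

```python
import numpy as np


def roc_fixed_thresholds(scores, labels, n_thresholds=101):
    """Compute FPR and TPR at n_thresholds evenly spaced thresholds in [0, 1].

    scores: per-pixel probabilities; labels: per-pixel ground truth (nonzero = manipulated).
    Note: builds an (n_thresholds x n_pixels) array, so it is meant for small inputs.
    """
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    # predictions[i, j] is True when pixel j scores at least thresholds[i].
    predictions = scores[None, :] >= thresholds[:, None]
    tp = (predictions & labels).sum(axis=1)
    fp = (predictions & ~labels).sum(axis=1)
    tpr = tp / max(int(labels.sum()), 1)
    fpr = fp / max(int((~labels).sum()), 1)
    return fpr, tpr


def mean_roc(curves):
    """Average several (fpr, tpr) curves point by point.

    This is valid because every curve shares the same thresholds.
    """
    fprs, tprs = zip(*curves)
    return np.mean(fprs, axis=0), np.mean(tprs, axis=0)


def auc(fpr, tpr):
    """Trapezoidal area under the curve; fpr is non-increasing along the thresholds."""
    return float(-np.sum(np.diff(fpr) * (tpr[:-1] + tpr[1:]) / 2.0))


# Toy example: a perfectly separable set of four pixels.
fpr, tpr = roc_fixed_thresholds([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
avg_fpr, avg_tpr = mean_roc([(fpr, tpr), (fpr, tpr)])
area = auc(fpr, tpr)
```

With variable thresholds, as returned by a data-driven ROC routine, curves from different images generally have different lengths and would need interpolation before they could be averaged.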
This script parses an Amped Authenticate HTML report [7] containing information about the dimple strength of an image dataset, and saves its contents to a CSV file (`results/report.csv`) for easier indexing. Only images containing dimples stronger than 15 with offset [0, 0] are selected.
After the creation of the CSV report, the program can be used to randomly select n images for each of three dimple strength ranges, in order to provide new dataset partitions for further data insight:
- low dimple strength: [15, 30];
- medium dimple strength: [31, 45];
- high dimple strength: >= 45.
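The binning and random selection described above could look like the sketch below. It is a hypothetical illustration, not the script's code: the names `strength_bin` and `sample_per_bin` are invented here, and since the listed medium and high ranges both include 45, the sketch assumes that a strength of exactly 45 counts as medium.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def strength_bin(strength: float) -> Optional[str]:
    """Map a dimple strength to 'low', 'medium' or 'high' (None if below 15).

    Assumption: 45 is treated as medium, and values between 30 and 31 as medium.
    """
    if strength < 15:
        return None
    if strength <= 30:
        return "low"
    if strength <= 45:
        return "medium"
    return "high"


def sample_per_bin(strengths: Dict[str, float], n: int, seed: int = 0) -> Dict[str, List[str]]:
    """Randomly pick up to n image names from each strength bin."""
    bins = defaultdict(list)
    for name, strength in strengths.items():
        label = strength_bin(strength)
        if label is not None:
            bins[label].append(name)
    rng = random.Random(seed)
    # Sort names first so the result depends only on the seed, not on dict order.
    return {label: rng.sample(sorted(names), min(n, len(names)))
            for label, names in bins.items()}


# Hypothetical file names and strengths, for illustration only.
strengths = {"a.jpg": 16, "b.jpg": 29, "c.jpg": 40, "d.jpg": 60, "e.jpg": 5}
selection = sample_per_bin(strengths, n=1)
```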
Note: This is a highly situational script, and as such has not been optimized for command line execution: variables must be inserted manually into the code before execution. It has only been included for completeness' sake.
This project has been successfully tested on the following platforms:
- Windows 10 Pro.
All tests were generated using a dataset kindly provided by ing. Marco Fontani (Amped Software) through prof. Alessandro Piva (Università degli Studi di Firenze).
Average ROC & AUC by manipulation size.
AUC by manipulation type.
AUC by EM algorithm window size.
AUC by JPEG quality.
Low strength dimples average ROC & AUC by manipulation size.
Medium strength dimples average ROC & AUC by manipulation size.
High strength dimples average ROC & AUC by manipulation size.
[1] Shruti Agarwal and Hany Farid. 2020. Photo Forensics From Rounding Artifacts. In Proceedings of the 2020 ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec '20). Association for Computing Machinery, New York, NY, USA, 103–114, DOI:10.1145/3369412.3395059
[2] Shruti Agarwal and Hany Farid. 2017. Photo Forensics from JPEG Dimples. 2017 IEEE Workshop on Information Forensics and Security (WIFS), pp. 1-6, DOI:10.1109/WIFS.2017.8267641
[3] Sam De Meyer, interpolate missing values 2d python, 2021
[4] OpenCV, Inpainting, OpenCV Documentation
[5] Alexandru Telea, An image inpainting technique based on the fast marching method, Journal of graphics tools, 9(1):23–34, 2004, DOI:10.1080/10867651.2004.10487596
[6] OpenCV, imwrite(), OpenCV Documentation
[7] Amped Software, Amped Authenticate, 09.2021
[8] Flavia Giammarino, How to calculate TPR and FPR in Python without using sklearn?, 2020
[9] Paula Mihalcea, IEViewer, 2021
This work is licensed under a Creative Commons “Attribution-NonCommercial-ShareAlike 4.0 International” license. More details are available in the LICENSE file. All rights regarding the theory behind the EM algorithm are reserved to the original paper's authors.