Photo Forensics from Rounding Artifacts: a Python implementation

Author: Paula Mihalcea

Università degli Studi di Firenze

Many aspects of JPEG compression have been successfully employed in the domain of photo forensics. In particular, artifacts introduced by the choice of the rounding operator used to quantize the DCT coefficients can be used to localize tampering and identify specific encoders.

Following the research in [1], this work aims to provide a Python implementation of an expectation maximization (EM) algorithm to localize inconsistencies in these artifacts that arise from a variety of image manipulations. The resulting output map is computed as described in [2].

Based on research by S. Agarwal and H. Farid (see [1]).

GUI

This algorithm has been added as a feature to IEViewer [9], which provides a graphical interface to an otherwise unintuitive script. That version uses the default settings and does not use ground truth maps. Check it out before cloning this repository if you are only interested in basic usage.

Installation

As a Python 3 application, this project has a few basic requirements. The pip package installer is recommended for installing them, since it handles all requirements automatically. The requirements are nonetheless listed below to simplify a manual installation.

It is assumed that Python 3 is already installed on the desired system.

Download the repository and navigate to its folder.
Install the requirements using pip from a terminal:
```
pip install --upgrade -r requirements.txt
```

Requirements

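The following Python packages are required in order to run this program. The versions listed are minimum versions; "latest" means the latest compatible version. Note that argparse, decimal, os, random, sys and time are part of the Python standard library and need no separate installation.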

Package	Version
Python	3.8
argparse	latest
decimal	latest
Matplotlib	3.4.3
NumPy	1.20.3
OpenCV	latest
os	latest
pandas	1.3.3
Pillow	8.2.0
random	latest
scikit-learn	1.0
sys	latest
time	latest
tqdm	4.62.3

Testing

This project has been written and tested using Python 3.8 on a Windows 10 Pro machine.

Usage

Main script

Run from a terminal specifying the path to the image to be analyzed, as follows:

python3 main.py "path/to/image/image_file.jpg"

Optional arguments:

--win_size: window size in pixels (default: 64), must be a multiple of 8;
--stop_threshold: expectation-maximization algorithm stop threshold (default: 1e-3);
--prob_r_b_in_c1: expectation-maximization algorithm probability of r conditioned on b belonging to C₁ (default: 0.5);
--interpolate: interpolate missing pixel values (NaNs produced by divisions in the EM algorithm) using the function from [3]; otherwise they are replaced with 0.5 (default: False). Warning: this slows down the program significantly;
--show: show the resulting output map (default: True);
--save: save the resulting output map in the results folder (default: False);
--roc_type: choose the ROC function to use, between custom and sklearn (default: sklearn);
--show_roc_plot: show the plot of the ROC curve (default: False);
--save_roc_plot: save the plot of the ROC curve in the results folder (default: False);
--show_diff_plot: show the plot of the difference between successive estimates of template c (default: False);
--save_diff_plot: save the plot of the difference between successive estimates of template c in the results folder (default: False);
--verbose: show progress in terminal (default: True).
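
As an illustration of the fallback described for --interpolate, the following minimal sketch replaces NaN entries of a probability map with 0.5 using NumPy. The function name fill_missing and the sample array are hypothetical and are not taken from this repository.

```python
import numpy as np


def fill_missing(prob_map, fill_value=0.5):
    """Return a copy of prob_map with NaN entries replaced by fill_value.

    Hypothetical helper: it mirrors the documented fallback (NaN -> 0.5),
    not the repository's actual code.
    """
    prob_map = np.asarray(prob_map, dtype=float)
    return np.where(np.isnan(prob_map), fill_value, prob_map)


# A 0/0 division is one way a NaN can appear in the map.
with np.errstate(invalid="ignore", divide="ignore"):
    sample = np.array([0.2, 0.0, 0.9]) / np.array([1.0, 0.0, 1.0])

filled = fill_missing(sample)
```

A value of 0.5 marks the pixel as undecided between the two classes, which is presumably why it is the default replacement; interpolation from neighbouring pixels is the slower alternative.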

Example call with optional arguments:

python3 main.py "images/my_photo.jpg" --win_size=256 --stop_threshold=1e-2 --save=True
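
Since boolean options are passed as text (for example --save=True), a parser for such a call could look like the sketch below. It is only an assumption about how these options might be handled with argparse: the helper names str_to_bool and window_size, the positional name img_path, and the subset of options shown are illustrative and are not taken from main.py.

```python
import argparse


def str_to_bool(value):
    """Convert 'True'/'False' style strings to bool (hypothetical helper)."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def window_size(value):
    """Parse the window size and reject values that are not multiples of 8."""
    size = int(value)
    if size <= 0 or size % 8 != 0:
        raise argparse.ArgumentTypeError("win_size must be a positive multiple of 8")
    return size


def build_parser():
    """Build a parser covering a few of the documented options."""
    parser = argparse.ArgumentParser(description="Sketch of the main script options")
    parser.add_argument("img_path", help="path to the image to be analyzed")
    parser.add_argument("--win_size", type=window_size, default=64)
    parser.add_argument("--stop_threshold", type=float, default=1e-3)
    parser.add_argument("--save", type=str_to_bool, default=False)
    return parser


# Parse the same arguments as the example call above.
args = build_parser().parse_args(
    ["images/my_photo.jpg", "--win_size=256", "--stop_threshold=1e-2", "--save=True"]
)
```

Validating win_size at parse time means an invalid value such as 100 is rejected with a usage message before any image is loaded.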

Manipulation script

This script generates manipulated images and their respective ground truth masks from a given directory (path/to/images/) in four subdirectories (path/to/images/manip_jpeg, path/to/images/manip_png, path/to/images/manip_jpeg/ground_truth and path/to/images/manip_png/ground_truth), as described in [1].

Specifically, for every original image the script generates 80 manipulated images (ground truth masks excluded), covering combinations of:

- manipulation type:
  - copy-move;
  - median filter: 3x3 OpenCV median filter;
  - rotation: random rotation of 10 to 80 degrees;
  - content-aware fill: OpenCV inpaint() function [4] with Telea method [5];
- region size: 512 px, 256 px, 128 px and 64 px;
- JPEG quality: a random quality chosen from each of the ranges [60, 70], [71, 80], [81, 90] and [91, 100];
- save format: PNG and JPEG (OpenCV imwrite() function [6]).
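
The count of 80 is consistent with one reading of the list above, in which the four JPEG qualities apply only to the JPEG save format and each PNG is saved once: 4 manipulation types × 4 region sizes × (4 JPEG + 1 PNG) = 80. The sketch below enumerates the combinations under that assumption. The reading itself, the function names, and the labels are illustrative and are not taken from manipulation.py.

```python
import itertools
import random

MANIPULATIONS = ["copy-move", "median filter", "rotation", "content-aware fill"]
REGION_SIZES = [512, 256, 128, 64]  # pixels
JPEG_QUALITY_RANGES = [(60, 70), (71, 80), (81, 90), (91, 100)]


def save_variants(rng):
    """Yield (format, quality) pairs: one JPEG per quality range plus one PNG.

    Assumption: JPEG quality only applies to the JPEG save format.
    """
    for low, high in JPEG_QUALITY_RANGES:
        yield "jpeg", rng.randint(low, high)  # randint is inclusive on both ends
    yield "png", None


def enumerate_variants(seed=0):
    """List every (manipulation, region size, format, quality) combination."""
    rng = random.Random(seed)
    variants = []
    for manipulation, size in itertools.product(MANIPULATIONS, REGION_SIZES):
        for fmt, quality in save_variants(rng):
            variants.append((manipulation, size, fmt, quality))
    return variants


variants = enumerate_variants()
print(len(variants))  # 80
```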

The script can be run with:

python3 manipulation.py "path/to/images/"

Results script

This script generates the plots shown in figures 6 and 7 of [1] (except for figure 7(d)) using images manipulated as explained in the same paper. It can be used to either analyze images in a given directory and save the results as CSV files in a results subfolder, or to create the plots from existing CSV files.

Along with the figures mentioned, this code also creates three additional plots showing the mean ROC curve by dimple strength, assuming that this information is available.

In order to analyze all images and generate CSV result files, the script can be run with:

python3 results.py True --dir_path="path/to/images/"

Optional arguments:

--win_size: window size in pixels (default: 64). Agarwal & Farid use 64, 128 and 256 for three different sets of experiments [1].

As mentioned, the script can also be used to create plots from existing results, assuming they have been generated with the previous command and exist as CSV files in the results subfolder:

python3 results.py False --res_path="path/to/results/results_file.csv"

Optional arguments:

--show_plots: show the result plots (default: True);
--save_plots: save the result plots in the results folder (default: True).

All ROC curves in this project have been calculated with a specially optimized version of the function from [8], in order to get a fixed number of thresholds and easily calculate the average ROC curve.
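
To illustrate why a fixed number of thresholds simplifies averaging, here is a minimal NumPy sketch: when every image's ROC curve is sampled at the same thresholds, the curves can be averaged point by point. The function names, the threshold count, and the toy data are hypothetical; this is neither the function from [8] nor the repository's optimized version.

```python
import numpy as np


def roc_fixed_thresholds(scores, labels, n_thresholds=101):
    """Compute FPR and TPR at n_thresholds evenly spaced thresholds in [0, 1].

    scores: predicted probabilities in [0, 1].
    labels: ground truth, 1 = manipulated, 0 = original.
    """
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    thresholds = np.linspace(0.0, 1.0, n_thresholds)

    # Row i holds the binary predictions obtained with thresholds[i].
    predicted = scores[None, :] >= thresholds[:, None]

    true_pos = (predicted & labels).sum(axis=1)
    false_pos = (predicted & ~labels).sum(axis=1)

    # Guard against inputs with no positive or no negative samples.
    tpr = true_pos / max(labels.sum(), 1)
    fpr = false_pos / max((~labels).sum(), 1)
    return fpr, tpr


def mean_roc(curves):
    """Average several (fpr, tpr) curves that share the same thresholds."""
    fprs, tprs = zip(*curves)
    return np.mean(fprs, axis=0), np.mean(tprs, axis=0)


curve_a = roc_fixed_thresholds([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0], n_thresholds=5)
curve_b = roc_fixed_thresholds([0.6, 0.4, 0.7, 0.2], [1, 1, 0, 0], n_thresholds=5)
mean_fpr, mean_tpr = mean_roc([curve_a, curve_b])
```

Note that this sketch builds a thresholds-by-pixels boolean matrix, so for full-size images it trades memory for simplicity.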

Amped report parsing script

This script parses an Amped Authenticate HTML report [7] containing information about the dimple strength of an image dataset, and saves its contents to a CSV file (results/report.csv) for easier indexing. Only images containing dimples stronger than 15 with offset [0, 0] are selected.

After the creation of the CSV report, the program can be used to randomly select n images for each of three dimple strength ranges, in order to provide new dataset partitions for further analysis:

low dimple strength: [15, 30];
medium dimple strength: [31, 45];
high dimple strength: >= 45.
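
The selection step can be sketched as follows, using only the standard library. The function names and the sample data are hypothetical and are not taken from parse_amped_report.py. Because the listed ranges share the value 45, this sketch assigns 45 to the medium range, which is an assumption.

```python
import random


def strength_range(strength):
    """Map a dimple strength to 'low', 'medium', 'high', or None if below 15.

    Assumption: 45 is assigned to 'medium', since the listed ranges
    share that boundary value.
    """
    if strength < 15:
        return None
    if strength <= 30:
        return "low"
    if strength <= 45:
        return "medium"
    return "high"


def sample_per_range(strengths, n, seed=None):
    """Randomly pick up to n image names for each strength range.

    strengths: dict mapping image name -> dimple strength.
    """
    rng = random.Random(seed)
    groups = {"low": [], "medium": [], "high": []}
    for name, strength in strengths.items():
        label = strength_range(strength)
        if label is not None:
            groups[label].append(name)
    return {
        label: rng.sample(names, min(n, len(names)))
        for label, names in groups.items()
    }


report = {"a.jpg": 12, "b.jpg": 18, "c.jpg": 29, "d.jpg": 40, "e.jpg": 45, "f.jpg": 77}
selection = sample_per_range(report, n=1, seed=0)
```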

Note: This is a highly situational script, and as such has not been optimized for command line execution: variables must be inserted manually into the code before execution. It has only been included for completeness' sake.

Testing

This project has been successfully tested on the following platforms:

Windows 10 Pro.

All tests were generated using a dataset kindly provided by ing. Marco Fontani (Amped Software) through prof. Alessandro Piva (Università degli Studi di Firenze).

Results

Average ROC & AUC by manipulation size.

AUC by manipulation type.

AUC by EM algorithm window size.

AUC by JPEG quality.

Low strength dimples average ROC & AUC by manipulation size.

Medium strength dimples average ROC & AUC by manipulation size.

High strength dimples average ROC & AUC by manipulation size.

Bibliography

[1] Shruti Agarwal and Hany Farid. 2020. Photo Forensics From Rounding Artifacts. In Proceedings of the 2020 ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec '20). Association for Computing Machinery, New York, NY, USA, 103–114, DOI:10.1145/3369412.3395059

[2] Shruti Agarwal and Hany Farid. 2017. Photo Forensics from JPEG Dimples. 2017 IEEE Workshop on Information Forensics and Security (WIFS), pp. 1-6, DOI:10.1109/WIFS.2017.8267641

[3] Sam De Meyer, interpolate missing values 2d python, 2021

[4] OpenCV, Inpainting, OpenCV Documentation

[5] Alexandru Telea, An image inpainting technique based on the fast marching method, Journal of graphics tools, 9(1):23–34, 2004, DOI:10.1080/10867651.2004.10487596

[6] OpenCV, imwrite(), OpenCV Documentation

[7] Amped Software, Amped Authenticate, 09.2021

[8] Flavia Giammarino, How to calculate TPR and FPR in Python without using sklearn?, 2020

[9] Paula Mihalcea, IEViewer, 2021

License

This work is licensed under a Creative Commons “Attribution-NonCommercial-ShareAlike 4.0 International” license. More details are available in the LICENSE file. All rights regarding the theory behind the EM algorithm are reserved to the original paper's authors.
