This project provides tools to compare images using different image metrics and color spaces:
- Compute Image Quality Assessment Metrics: Assess quality with multiple full-reference and no-reference metrics
- Image Difference: Generate thresholded difference images to highlight significant differences between two images
- Heatmap Generation: Generate metric maps that visualize the spatial distribution of metric values across the image
It supports:
- 18 full-reference metrics
- 5 no-reference metrics
- image diffs in 8 different color spaces with flexible thresholding
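As an illustration of what thresholded differencing means (a minimal sketch of the general idea, not necessarily how this tool implements it internally), two images can be compared with OpenCV as follows; the file names are placeholders:

```python
import cv2

# Placeholder paths; use your own reference and distorted images of equal size.
ref = cv2.imread("orig.png")
dist = cv2.imread("compressed.png")

# Per-pixel absolute difference, collapsed to a single grayscale channel.
diff = cv2.absdiff(ref, dist)
gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)

# Keep only pixels whose difference exceeds the threshold (10 matches the tool's default).
_, mask = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)
cv2.imwrite("difference_mask.png", mask)
```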
The full-reference and no-reference metrics are provided by the Python packages listed for each metric in the tables below (piq, pyiqa, ImageHash, and libra itself).
Some example comparison databases are available here: https://lanl.github.io/libra/
- Clone Repository
git clone https://github.com/lanl/libra
- Install Dependencies
pip install opencv-python-headless numpy matplotlib scikit-image torch piq pyiqa ImageHash
Note: some dependencies are not available through conda. We recommend using virtual environments for now.
A command line interface is provided, which is accessible as follows:
python src/app.py -h
There are three ways to use the tool:
- using a JSON file
- using the command line interface
- as a library
The JSON configuration file should contain the following keys:
- reference_image_path (str): Path to the reference image.
- distorted_image_path (str): Path to the distorted image.
- output_directory (str): Path to the output directory where the CSV file and metric maps will be saved.
- output_filename (str, optional): Name of the output CSV file (default: "metrics.csv").
- generate_metrics (bool, optional): Flag to generate metrics (default: False).
- generate_maps (bool, optional): Flag to generate metric maps (default: False).
- generate_image_difference (bool, optional): Flag to generate thresholded difference images (default: False).
- difference_threshold (int, optional): Threshold value for generating thresholded difference images (default: 10).
- metrics (list of str, optional): List of metrics to compute.
- color_spaces (list of str, optional): List of color spaces to use for computing metrics (default: ["RGB"]).
- map_window_size (int, optional): Window size for computing metric maps (default: 161).
- map_step_size (int, optional): Step size for computing metric maps (default: 50).
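For intuition on the last two keys: a metric map is produced by sliding a window of map_window_size pixels across the image in steps of map_step_size pixels and evaluating a metric in each window. The sketch below illustrates the idea with a local MSE map in NumPy; it is a simplified stand-in, not the tool's actual implementation, and assumes two same-sized grayscale float arrays:

```python
import numpy as np

def mse_map(ref, dist, window=161, step=50):
    """Coarse local-MSE map over two same-sized 2D float arrays (illustration only)."""
    h, w = ref.shape
    ys = list(range(0, h - window + 1, step))
    xs = list(range(0, w - window + 1, step))
    out = np.zeros((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            a = ref[y:y + window, x:x + window]
            b = dist[y:y + window, x:x + window]
            out[i, j] = np.mean((a - b) ** 2)  # the metric evaluated on this window
    return out
```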
Here is an example of a JSON configuration, also available in the samples folder:
{
    "reference_image_path": "tests/data/test/orig.png",
    "distorted_image_path": "tests/data/test/compressed.png",
    "output_directory": "test_output",
    "output_filename": "metrics.csv",
    "generate_maps": true,
    "generate_metrics": true,
    "generate_image_difference": true,
    "difference_threshold": 10,
    "metrics": ["PSNR", "SSIM", "VSI", "GMSD", "MSE", "DSS"],
    "color_spaces": ["RGB", "HSV", "LAB"],
    "map_window_size": 161,
    "map_step_size": 50
}
It can be run from the repository's root directory as follows:
python src/app.py -j samples/sample_input.json
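For scripted or batch runs, one option (a sketch that relies only on the documented JSON keys and the -j flag; file names are placeholders) is to generate the configuration from Python and call the CLI with subprocess:

```python
import json
import subprocess

# Build a configuration using the keys documented above.
config = {
    "reference_image_path": "tests/data/test/orig.png",
    "distorted_image_path": "tests/data/test/compressed.png",
    "output_directory": "test_output",
    "generate_metrics": True,
    "metrics": ["PSNR", "SSIM"],
    "color_spaces": ["RGB"],
}

with open("my_config.json", "w") as f:
    json.dump(config, f, indent=4)

# Equivalent to running: python src/app.py -j my_config.json
subprocess.run(["python", "src/app.py", "-j", "my_config.json"], check=True)
```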
The command line interface is useful for quick comparisons between two images. For example:
python src/main.py -r tests/data/test/orig.png -c tests/data/test/compressed.png -m SSIM -p
To use the tool as a library, refer to the example.ipynb notebook in the samples folder.
This example evaluates the visualization quality of an isotropic turbulence dataset subjected to tensor compression with a maximum Peak Signal-to-Noise Ratio (PSNR) of 40. The assessment focuses on how effectively the tensor compression retains the visual fidelity of the turbulence data.
References:
- Dataset: https://klacansky.com/open-scivis-datasets/
- Compression Technique: https://github.com/rballester/tthresh
Figure: reference image and compressed image (PSNR: 40).
The following color spaces are supported:

Color Space | Description |
---|---|
RGB | Standard color space with three primary colors: Red, Green, and Blue. Commonly used in digital images and displays. |
HSV | Stands for Hue, Saturation, and Value. Often used in image processing and computer vision because it separates color. |
HLS | Stands for Hue, Lightness, and Saturation. Similar to HSV but with a different way of representing colors. |
LAB | Consists of three components: Lightness (L*), a* (green to red), and b* (blue to yellow). Mimics human vision. |
XYZ | A linear color space derived from the CIE 1931 color matching functions. Basis for many other color spaces. |
LUV | Similar to LAB but with a different chromaticity component. Used in color difference calculations and image analysis. |
YCbCr | Color space used in video compression. Separates the image into luminance (Y) and chrominance (Cb and Cr) components. |
YUV | Used in analog television and some digital video formats. Separates image into luminance (Y) and chrominance (U and V). |
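Conversions between these color spaces are available in OpenCV (one of the listed dependencies); as a brief sketch independent of this tool's internals, an image could be converted before computing metrics like this:

```python
import cv2

img = cv2.imread("orig.png")  # placeholder path; OpenCV loads images in BGR order

# Convert to a few of the color spaces listed above.
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
ycbcr = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)  # note: OpenCV uses the YCrCb channel order
```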
The following colormaps are available for the generated metric maps:

- AUTUMN
- BONE
- JET
- WINTER
- RAINBOW
- OCEAN
- SUMMER
- SPRING
- COOL
- HSV
- PINK
- HOT
- PARULA
- MAGMA
- INFERNO
- PLASMA
- VIRIDIS
- CIVIDIS
- TWILIGHT
- TWILIGHT_SHIFTED
- TURBO
- DEEPGREEN
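These names match OpenCV's COLORMAP_* constants. As a brief sketch (independent of this tool; the input file is a placeholder), a grayscale metric map can be colorized with one of them:

```python
import cv2

# Placeholder: a single-channel (grayscale) metric map saved as an image.
metric_map = cv2.imread("metric_map.png", cv2.IMREAD_GRAYSCALE)

# Apply one of the colormaps listed above, e.g. VIRIDIS.
heatmap = cv2.applyColorMap(metric_map, cv2.COLORMAP_VIRIDIS)
cv2.imwrite("metric_map_viridis.png", heatmap)
```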
Full-reference metrics:

Metric | Python Package | Description | Value Ranges |
---|---|---|---|
MSE | libra | Measures the average squared difference between the reference and test images. | Range: [0, ∞). Lower MSE indicates higher similarity. |
SSIM | piq | Assesses the structural similarity between images considering luminance, contrast, and structure. | Range: [-1, 1]. Higher values indicate better similarity. |
PSNR | piq | Represents the ratio between the maximum possible power of a signal and the power of corrupting noise. | Range: [0, ∞) dB. Higher values indicate better image quality. |
FSIM | piq | Evaluates image quality based on feature similarity considering phase congruency and gradient magnitude. | Range: [0, 1]. Higher values indicate better feature similarity. |
MS-SSIM | piq | Extension of SSIM that evaluates image quality at multiple scales. | Range: [0, 1]. Higher values indicate better structural similarity. |
VSI | piq | Measures image quality based on visual saliency. | Range: [0, 1]. Higher values indicate better visual similarity. |
SR-SIM | piq | Assesses image quality using spectral residual information. | Range: [0, 1]. Higher values indicate better visual similarity. |
MS-GMSD | piq | Evaluates image quality based on gradient magnitude similarity across multiple scales. | Range: [0, ∞). Lower values indicate higher similarity. |
LPIPS | piq | Uses deep learning models to assess perceptual similarity. | Range: [0, 1]. Lower values indicate higher similarity. |
PieAPP | piq | Deep learning-based metric for perceptual image quality. | Range: [0, 1]. Lower values indicate higher quality. |
DISTS | piq | Combines deep learning features to evaluate image quality based on structure and texture similarity. | Range: [0, 1]. Lower values indicate higher similarity. |
MDSI | piq | Measures image quality based on mean deviation similarity index. | Range: [0, ∞). Lower values indicate better quality. |
DSS | piq | Computes image quality using a detailed similarity structure. | Range: [0, 1]. Higher values indicate better similarity. |
IW-SSIM | piq | Information-weighted SSIM that emphasizes important regions in images. | Range: [0, 1]. Higher values indicate better structural similarity. |
VIFp | piq | Measures image quality based on visual information fidelity. | Range: [0, 1]. Higher values indicate better preservation of information. |
GMSD | piq | Gradient Magnitude Similarity Deviation metric for assessing image quality. | Range: [0, ∞). Lower values indicate higher similarity. |
HaarPSI | piq | Uses Haar wavelet-based perceptual similarity index to evaluate image quality. | Range: [0, 1]. Higher values indicate better perceptual similarity. |
pHash | ImageHash | Generates a compact hash value that represents the perceptual content of an image. | Range: [0, ∞). Higher values indicate worse perceptual similarity. |
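Since the table names the package behind each metric, here is a brief, tool-independent sketch of computing a few of them directly with piq and ImageHash; it assumes two same-sized RGB images and uses placeholder paths:

```python
import numpy as np
import torch
import piq
import imagehash
from PIL import Image

def to_tensor(path):
    """Load an image as a (1, C, H, W) float tensor in [0, 1]."""
    arr = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    return torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0)

ref = to_tensor("orig.png")         # placeholder paths
dist = to_tensor("compressed.png")

print("SSIM:", piq.ssim(ref, dist, data_range=1.0).item())
print("PSNR:", piq.psnr(ref, dist, data_range=1.0).item())

# pHash: Hamming distance between perceptual hashes (higher means less similar).
d = imagehash.phash(Image.open("orig.png")) - imagehash.phash(Image.open("compressed.png"))
print("pHash distance:", d)
```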
No-reference metrics:

Metric | Python Package | Description | Value Ranges |
---|---|---|---|
BRISQUE | pyiqa | Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) uses natural scene statistics to measure image quality. | Range: [0, 100]. Lower values indicate better quality. |
CLIP-IQA | piq | Image quality metric that utilizes the CLIP model to assess the visual quality of images based on their similarity to predefined text prompts. | Range: [0, 1]. Higher values indicate better quality. |
NIQE | pyiqa | Natural Image Quality Evaluator. It assesses image quality based on statistical features derived from natural scene statistics. | Range: [0, 100]. Lower values indicate better quality. |
MUSIQ | pyiqa | Multi-Scale Image Quality. An advanced metric that evaluates image quality across multiple scales to better capture perceptual quality. | Range: [0, 1]. Higher values indicate better quality. |
NIMA | pyiqa | Neural Image Assessment. A deep learning-based model that predicts the aesthetic and technical quality of images. | Range: [0, 10]. Higher values indicate better quality. |
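The pyiqa metrics above can also be computed directly with pyiqa.create_metric; a short sketch (the image path is a placeholder):

```python
import torch
import pyiqa

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# No-reference metrics take only the distorted image.
brisque = pyiqa.create_metric("brisque", device=device)
niqe = pyiqa.create_metric("niqe", device=device)

print("BRISQUE:", float(brisque("compressed.png")))  # lower is better
print("NIQE:", float(niqe("compressed.png")))        # lower is better
```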
This Cinema database shows the results of a small lossy compression study.
Yanni Etchi, Daoce Wang, Pascal Grosset, Terece L. Turton, James Ahrens, and David Rogers. 2025. An Exploration of How Volume Rendering is Impacted by Lossy Data Reduction. In Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC-W '24). IEEE Press, 250–259. https://lanl.github.io/libra/
This material is based upon work supported by:
- the Computational Systems and Software Environments subprogram of Los Alamos National Laboratory’s Advanced Simulation and Computing program (NNSA/DOE) under the Visualization Research & Development initiative
- the U.S. Department of Energy, Office of Science and Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program.