
NIRS4ALL

A comprehensive Python library for Near-Infrared Spectroscopy data analysis

Python 3.11+ | License: CeCILL-2.1 | Code style: ruff



Overview

NIRS4ALL bridges the gap between spectroscopic data and machine learning by providing a unified framework for data loading, preprocessing, model training, and evaluation. Built for researchers and practitioners working with Near-Infrared Spectroscopy data.

[Figure: performance heatmap]

Key Features

  • NIRS-Specific Preprocessing — SNV, MSC, Savitzky-Golay, Norris-Williams, wavelet denoise, OSC/EPO, and 30+ spectral transforms
  • Advanced PLS Models — AOM-PLS, POP-PLS, OPLS, DiPLS, MBPLS, and 15+ PLS variants with automatic operator selection
  • Multi-Backend ML — Seamless integration with scikit-learn, TensorFlow, PyTorch, and JAX
  • Declarative Pipelines — Define complex workflows with simple, readable syntax
  • Parallel Execution — Multi-core pipeline variant execution via joblib
  • Hyperparameter Tuning — Built-in Optuna integration for automated optimization
  • Rich Visualizations — Performance heatmaps, candlestick plots, SHAP explanations
  • Model Deployment — Export trained pipelines as portable .n4a bundles
  • sklearn Compatible — NIRSPipeline wrapper for SHAP, cross-validation, and more
[Figures: performance heatmap, performance distribution, regression scatter plot]
Advanced visualization capabilities for model performance analysis

Installation

Basic Installation

pip install nirs4all

This installs the core library with scikit-learn support. Deep learning frameworks are optional.

With ML Backends

# TensorFlow
pip install nirs4all[tensorflow]

# PyTorch
pip install nirs4all[torch]

# JAX
pip install nirs4all[jax]

# All frameworks
pip install nirs4all[all]

# All frameworks with GPU support
pip install nirs4all[all-gpu]

Conda

Coming soon! We're working with conda-forge to make NIRS4ALL available through conda. In the meantime, use pip install nirs4all or Docker.

# Available soon:
# conda install -c conda-forge nirs4all

Docker

docker pull ghcr.io/gbeurier/nirs4all:latest
docker run -v $(pwd):/workspace ghcr.io/gbeurier/nirs4all python my_script.py

Development Installation

git clone https://github.com/GBeurier/nirs4all.git
cd nirs4all
pip install -e ".[dev]"

Verify Installation

nirs4all --test-install      # Check dependencies
nirs4all --test-integration  # Run integration tests
nirs4all --version           # Check version

Quick Start

Simple API (Recommended)

import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import ShuffleSplit
from sklearn.cross_decomposition import PLSRegression

# Define your pipeline
pipeline = [
    MinMaxScaler(),
    {"y_processing": MinMaxScaler()},
    ShuffleSplit(n_splits=3, test_size=0.25),
    {"model": PLSRegression(n_components=10)}
]

# Train and evaluate
result = nirs4all.run(
    pipeline=pipeline,
    dataset="path/to/your/data",
    name="MyPipeline",
    verbose=1
)

# Access results
print(f"Best RMSE: {result.best_rmse:.4f}")
print(f"Best R²: {result.best_r2:.4f}")

# Export for deployment
result.export("exports/best_model.n4a")

Session for Multiple Runs

import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor

with nirs4all.session(verbose=1, save_artifacts=True) as s:
    # Compare models with shared configuration
    pls_result = nirs4all.run(
        pipeline=[MinMaxScaler(), PLSRegression(n_components=10)],
        dataset="data/wheat.csv",
        name="PLS",
        session=s
    )

    rf_result = nirs4all.run(
        pipeline=[MinMaxScaler(), RandomForestRegressor(n_estimators=100)],
        dataset="data/wheat.csv",
        name="RandomForest",
        session=s
    )

    print(f"PLS: {pls_result.best_rmse:.4f} | RF: {rf_result.best_rmse:.4f}")

sklearn Integration with SHAP

import nirs4all
from nirs4all.sklearn import NIRSPipeline
import shap

# Train with nirs4all
result = nirs4all.run(pipeline, dataset)

# Wrap for sklearn compatibility
pipe = NIRSPipeline.from_result(result)

# Use with SHAP
explainer = shap.Explainer(pipe.predict, X_background)
shap_values = explainer(X_test)
shap.summary_plot(shap_values)

Pipeline Syntax

NIRS4ALL uses a declarative syntax for defining pipelines:

from nirs4all.operators.transforms import SNV, SavitzkyGolay, FirstDerivative

pipeline = [
    # Preprocessing
    MinMaxScaler(),
    SNV(),
    SavitzkyGolay(window_length=11, polyorder=2),

    # Target scaling
    {"y_processing": MinMaxScaler()},

    # Cross-validation
    ShuffleSplit(n_splits=5, test_size=0.2),

    # Models to compare
    {"model": PLSRegression(n_components=10)},
    {"model": RandomForestRegressor(n_estimators=100)},

    # Neural network with training parameters
    {
        "model": nicon,
        "name": "NICON-CNN",
        "train_params": {"epochs": 100, "patience": 20}
    }
]

Advanced Features

# Feature augmentation - generate preprocessing combinations
{
    "feature_augmentation": {
        "_or_": [SNV, FirstDerivative, SavitzkyGolay],
        "size": [1, (1, 2)],
        "count": 5
    }
}

# Hyperparameter optimization
{
    "model": PLSRegression(),
    "finetune_params": {
        "n_trials": 50,
        "model_params": {"n_components": ("int", 1, 30)}
    }
}

# Branching for parallel preprocessing paths
{
    "branch": [
        [SNV(), PLSRegression(n_components=10)],
        [MSC(), RandomForestRegressor()]
    ]
}

# Merge branch outputs (stacking)
{"merge": "predictions"}
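The branch/merge pattern above implements stacking: each branch's predictions become input features for a final meta-model. As an analogy only (plain scikit-learn on toy data, not the nirs4all API), the same idea looks like:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge

# Analogy: each nirs4all branch plays the role of a base estimator;
# merging "predictions" feeds their outputs into a final meta-model.
stack = StackingRegressor(
    estimators=[
        ("ridge", Ridge(alpha=1.0)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=Ridge(),
    cv=5,  # base-estimator predictions are generated out-of-fold
)

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=80)

stack.fit(X, y)
print(round(stack.score(X, y), 3))  # training R² on this toy data
```

The out-of-fold predictions are what make stacking honest: the meta-model never sees a base model's fit on its own training fold.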

Available Transforms

NIRS-Specific Preprocessing

  • SNV / StandardNormalVariate — Standard Normal Variate normalization
  • RNV / RobustStandardNormalVariate — Robust Normal Variate (outlier-resistant)
  • MSC / MultiplicativeScatterCorrection — Multiplicative Scatter Correction
  • SavitzkyGolay — Smoothing and derivative computation
  • FirstDerivative / SecondDerivative — Spectral derivatives
  • NorrisWilliams — Gap derivative with segment smoothing
  • WaveletDenoise — Multi-level wavelet denoising with thresholding
  • OSC — Orthogonal Signal Correction (DOSC)
  • EPO — External Parameter Orthogonalization
  • Detrend — Remove linear/polynomial trends
  • Gaussian — Gaussian smoothing
  • Haar — Haar wavelet decomposition
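SNV, for example, is simple enough to state directly: each spectrum is centered and scaled by its own mean and standard deviation, which removes multiplicative scatter effects between samples. A minimal NumPy sketch of the math (independent of the nirs4all implementation):

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: center and scale each spectrum
    (one row per sample) by its own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

X = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])
Xs = snv(X)
# After SNV, both rows are identical: each has mean 0 and unit std,
# so the 10x multiplicative difference between samples is gone.
```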

Signal Processing

  • Baseline — Baseline correction (ALS, AirPLS, ArPLS, IModPoly, SNIP, etc.)
  • ReflectanceToAbsorbance — Convert reflectance to absorbance using Beer-Lambert
  • ToAbsorbance / FromAbsorbance — Signal type conversion
  • KubelkaMunk — Kubelka-Munk transform
  • Resampler — Wavelength interpolation
  • CARS / MCUVE — Feature selection methods
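The reflectance-to-absorbance conversion follows the Beer-Lambert relation A = log10(1/R). A small NumPy sketch of that formula (not the library's implementation; the eps guard is our own addition):

```python
import numpy as np

def reflectance_to_absorbance(R, eps=1e-10):
    """Beer-Lambert pseudo-absorbance: A = log10(1 / R).
    eps clips reflectance away from zero to avoid log(0)."""
    R = np.asarray(R, dtype=float)
    return np.log10(1.0 / np.clip(R, eps, None))

R = np.array([1.0, 0.1, 0.01])
A = reflectance_to_absorbance(R)
# → [0., 1., 2.]  (each tenfold drop in reflectance adds one absorbance unit)
```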

Built-in NIRS Models

  • AOMPLSRegressor / AOMPLSClassifier — Adaptive Operator-Mixture PLS; auto-selects the best preprocessing
  • POPPLSRegressor / POPPLSClassifier — Per-Operator-Per-component PLS via PRESS
  • PLSDA — PLS Discriminant Analysis
  • OPLS / OPLSDA — Orthogonal PLS
  • MBPLS — Multi-Block PLS
  • DiPLS — Domain-Invariant PLS
  • IKPLS — Improved Kernel PLS
  • FCKPLS — Fractional Convolution Kernel PLS

Splitting Methods

  • KennardStoneSplitter — Kennard-Stone algorithm
  • SPXYSplitter — Sample set Partitioning based on X and Y
  • SPXYFold / SPXYGFold — SPXY-based K-Fold cross-validation (with group support)
  • KMeansSplitter — K-means clustering based split
  • KBinsStratifiedSplitter — Binned stratification for continuous targets
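For reference, the classic Kennard-Stone procedure seeds the selection with the two most distant samples, then greedily adds the sample whose minimum distance to the already-selected set is largest, yielding a calibration set that spans the spectral space. A NumPy sketch of that textbook algorithm (not nirs4all's implementation):

```python
import numpy as np

def kennard_stone(X, n_select):
    """Classic Kennard-Stone selection: start from the two mutually
    most distant samples, then greedily add the sample farthest
    (in min-distance terms) from the selected set."""
    X = np.asarray(X, dtype=float)
    # Full pairwise Euclidean distance matrix.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    while len(selected) < n_select:
        remaining = [k for k in range(len(X)) if k not in selected]
        # Minimum distance from each candidate to the selected set.
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(np.argmax(min_d))])
    return selected

# 1-D toy data: the extremes (0 and 10) are picked first, then 5.
X = np.array([[0.0], [1.0], [5.0], [10.0]])
print(kennard_stone(X, 3))  # → [0, 3, 2]
```

The full distance matrix makes this O(n²) in memory; practical implementations typically compute distances incrementally.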

See Preprocessing Guide for complete reference.


Examples

The examples/ directory is organized by topic:

User Examples (examples/user/)

  • Getting Started — Hello world, basic regression, classification, visualization
  • Data Handling — Multi-source, data loading, metadata
  • Preprocessing — SNV, MSC, derivatives, custom transforms
  • Models — Multi-model, hyperparameter tuning, stacking, PLS variants
  • Cross-Validation — KFold, group splits, nested CV
  • Deployment — Export, prediction, workspace management
  • Explainability — SHAP basics, sklearn integration, feature selection

Reference Examples (examples/reference/)

Complete syntax reference and advanced pipeline patterns.

Run examples:

cd examples
./run.sh              # Run all
./run.sh -i 1         # Run by index
./run.sh -n "U01*"    # Run by pattern

Documentation

  • User Guide — Preprocessing, API migration, augmentation
  • API Reference — Module-level API, sklearn integration, data handling
  • Specifications — Pipeline syntax, config format, metrics
  • Explanations — SHAP, resampling, SNV theory

Full documentation: nirs4all.readthedocs.io


Research Applications

NIRS4ALL has been used in published research:

Houngbo, M. E., et al. (2024). Convolutional neural network allows amylose content prediction in yam (Dioscorea alata L.) flour using near infrared spectroscopy. Journal of the Science of Food and Agriculture, 104(8), 4915-4921.


Citation

If you use NIRS4ALL in your research, please cite:

@software{beurier2025nirs4all,
  author = {Gregory Beurier and Denis Cornet and Lauriane Rouan},
  title = {NIRS4ALL: Open spectroscopy for everyone},
  url = {https://github.com/GBeurier/nirs4all},
  version = {0.7.1},
  year = {2026},
}

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.


License

This project is licensed under the CeCILL-2.1 License — a French free software license compatible with GPL.


Acknowledgments

  • CIRAD for supporting this research
  • The open-source scientific Python community

Made for the spectroscopy community

About

A library for Near-Infrared Spectroscopy prediction and machine learning. A rebuild of the Pinard package.
