MLGuard is a lightweight machine learning experiment management and data validation toolkit designed to help data scientists quickly validate datasets, detect potential data leakage, and benchmark multiple machine learning models.
It provides a simple interface to run experiments and compare models with minimal setup.
When working with machine learning pipelines, developers often face problems such as:
- Undetected data leakage
- Repeated model experimentation scripts
- Difficulty comparing models quickly
- Poor experiment organization
MLGuard helps solve these problems by providing:
- Automated data leakage detection
- Automatic problem type detection
- Built-in experiment manager
- Simple model comparison tools
The leakage detector flags potential target leakage by analyzing correlations between features and the target variable.
The experiment manager runs multiple machine learning models automatically and evaluates their performance.
It then compares model performance using evaluation metrics appropriate to the problem type.
Problem type detection determines whether the task is:
- Regression
- Classification
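One way such a check could work is a simple heuristic on the target values. The sketch below is an illustration only and is not taken from MLGuard's source: the function name `infer_problem_type` and the `max_classes` cutoff are assumptions.

```python
from collections.abc import Iterable


def infer_problem_type(y: Iterable, max_classes: int = 20) -> str:
    """Guess 'classification' or 'regression' from target values.

    Hypothetical heuristic, not MLGuard's actual implementation.
    """
    values = list(y)
    if not values:
        raise ValueError("target is empty")
    # String or boolean targets can only be class labels.
    if any(isinstance(v, (str, bool)) for v in values):
        return "classification"
    # Any fractional (or NaN) value suggests a continuous target.
    if any(not float(v).is_integer() for v in values):
        return "regression"
    # Integer-valued targets: few distinct values look like classes,
    # many distinct values look like counts or a continuous quantity.
    n_unique = len(set(values))
    return "classification" if n_unique <= max_classes else "regression"


print(infer_problem_type([0, 1, 2, 1, 0]))       # classification
print(infer_problem_type([2.5, 3.1, 0.7, 4.2]))  # regression
```

A cutoff like `max_classes` is a judgment call: an integer target with many distinct values (for example, a count) is treated as regression here, which may not suit every dataset.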
Install from PyPI:
    pip install mlguardlabs
Install locally for development:
    pip install -e .
Regression example:
    from sklearn.datasets import fetch_california_housing
    from mlguard import ExperimentManager
    data = fetch_california_housing(as_frame=True)
    X = data.data
    y = data.target
    exp = ExperimentManager()
    exp.fit(X, y)
    print(exp.compare())
Example output:
    Running leakage detection...
    No leakage detected
    Detected problem type: regression
    [('rf', 0.50), ('ridge', 0.74)]
Classification example:
    from sklearn.datasets import load_iris
    from mlguard import ExperimentManager
    data = load_iris(as_frame=True)
    X = data.data
    y = data.target
    exp = ExperimentManager()
    exp.fit(X, y)
    print(exp.compare())
Leakage detection example:
    from mlguard import LeakageDetector
    detector = LeakageDetector()
    detector.fit(X, y)
    print(detector.report())
This prints features that have unusually high correlation with the target.
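To make the idea of a correlation-based leakage check concrete, here is a minimal sketch using pandas. It is not MLGuard's implementation: the function name `high_correlation_features` and the 0.95 threshold are assumptions chosen for illustration, and Pearson correlation only catches linear relationships in numeric columns.

```python
import numpy as np
import pandas as pd


def high_correlation_features(
    X: pd.DataFrame, y: pd.Series, threshold: float = 0.95
) -> dict[str, float]:
    """Return numeric features whose |Pearson r| with y is >= threshold.

    Hypothetical helper; the threshold is an assumed default.
    """
    numeric = X.select_dtypes(include="number")
    # Absolute correlation of every numeric column with the target.
    corr = numeric.corrwith(y).abs().dropna()
    flagged = corr[corr >= threshold].sort_values(ascending=False)
    return flagged.to_dict()


# Synthetic data: one unrelated column and one near-copy of the target.
rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=200), name="target")
X = pd.DataFrame(
    {
        "noise": rng.normal(size=200),
        "leaky": y * 2.0 + rng.normal(scale=0.01, size=200),
    }
)

suspects = high_correlation_features(X, y)
print(suspects)  # only the "leaky" column is flagged
```

A flagged feature is a prompt for manual review rather than proof of leakage, since a legitimately strong predictor can also correlate highly with the target.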
MLGuard runs a lightweight validation and experimentation pipeline:
Dataset
↓
Leakage Detection
↓
Problem Type Detection
↓
Model Experiments
↓
Model Evaluation
↓
Model Comparison
This allows users to quickly answer:
Which model works best for my dataset?
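The experiment, evaluation, and comparison stages can be approximated directly with scikit-learn. The sketch below assumes a single hold-out split, R² for regression, and accuracy for classification; the `compare_models` helper and the `METRICS` mapping are hypothetical and may differ from what MLGuard does internally.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split

# Assumed metric per problem type; MLGuard's actual choices may differ.
METRICS = {"regression": r2_score, "classification": accuracy_score}


def compare_models(models, X, y, problem_type, test_size=0.2, random_state=0):
    """Fit each model on a train split and rank by hold-out score, best first."""
    metric = METRICS[problem_type]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    results = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        score = metric(y_test, model.predict(X_test))
        results.append((name, float(score)))
    # Both metrics are "higher is better", so sort in descending order.
    return sorted(results, key=lambda item: item[1], reverse=True)


# Small synthetic regression problem to keep the example fast.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
models = {
    "ridge": Ridge(alpha=1.0),
    "rf": RandomForestRegressor(n_estimators=50, random_state=0),
}
ranking = compare_models(models, X, y, problem_type="regression")
print(ranking)
```

A single split keeps the example short, but scores from one split can vary noticeably with the random seed on small datasets.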
mlguard/
│
├── experiments
│ └── experiment_manager.py
│
├── inspection
│ └── leakage_detector.py
│
├── metrics
│ ├── regression_metrics.py
│ └── classification_metrics.py
│
├── utils
│ └── problem_type.py
│
├── examples
└── tests
Typical machine learning workflow using MLGuard:
EDA
↓
Feature Engineering
↓
Feature Selection
↓
MLGuard Experiment Manager
↓
Best Model Selection
↓
Hyperparameter Tuning
↓
Final Model
MLGuard helps simplify the experimentation stage.
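The later workflow stages fall outside MLGuard and can be handled with standard scikit-learn tools. As a sketch, assuming the comparison step pointed to ridge regression, the tuning stage might look like the following; the synthetic data and the `alpha` grid are placeholders.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Placeholder data standing in for the engineered feature matrix.
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

# Search a small, illustrative grid of regularization strengths.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    scoring="r2",
    cv=5,
)
search.fit(X, y)

# With the default refit=True, best_estimator_ is retrained on all of X, y.
final_model = search.best_estimator_
print(search.best_params_)
```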
Planned features include:
- Cross-validation experiment engine
- Dataset inspector
- Automatic preprocessing pipelines
- Feature validation tools
- Experiment tracking
Contributions are welcome!
Steps:
- Fork the repository
- Create a feature branch
- Submit a pull request
MIT License © 2026 Tarun M