sklearn-pmml-model

A library to effortlessly import models trained on different platforms and in different programming languages into scikit-learn in Python. First, export your model to PMML (which is widely supported). Next, load the exported PMML file with this library and use the resulting class like any other scikit-learn estimator.

Installation

The easiest way is to use pip:

$ pip install sklearn-pmml-model
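
To confirm that the installation succeeded, something like the following can be run from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this example and is not part of sklearn-pmml-model.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("sklearn-pmml-model")
print(version if version else "sklearn-pmml-model is not installed")
```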

Status

The library currently supports the following models:

Model	Classification	Regression	Categorical features
Decision Trees	✅	✅	✅¹
Random Forests	✅	✅	✅¹
Gradient Boosting	✅	✅	✅¹
Linear Regression	✅	✅	✅³
Ridge	✅²	✅	✅³
Lasso	✅²	✅	✅³
ElasticNet	✅²	✅	✅³
Gaussian Naive Bayes	✅		✅³
Support Vector Machines	✅	✅	✅³
Nearest Neighbors	✅	✅
Neural Networks	✅	✅

¹ Categorical feature support using slightly modified internals, based on scikit-learn#12866.

² These models differ only in training characteristics; the resulting model is of the same form. Classification is supported using PMMLLogisticRegression for regression models and PMMLRidgeClassifier for general regression models.

³ By one-hot encoding categorical features automatically.
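
As a rough illustration of what one-hot encoding a categorical feature means, the sketch below expands a single categorical column into one 0/1 indicator per category. It is written in plain Python for clarity and is not the library's internal code; the function name `one_hot` and the sample data are hypothetical.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], categories: Sequence[str]) -> List[Dict[str, int]]:
    """Expand each categorical value into one 0/1 indicator per category."""
    return [
        {category: int(value == category) for category in categories}
        for value in values
    ]


# Hypothetical categorical feature with three levels.
colors = ["red", "green", "blue", "green"]
encoded = one_hot(colors, categories=["blue", "green", "red"])

for original, row in zip(colors, encoded):
    print(original, row)
```

Each encoded row contains exactly one 1, so a model that only understands numeric inputs can still consume the feature.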

Example

A minimal working example (using this PMML file) is shown below:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from sklearn_pmml_model.ensemble import PMMLForestClassifier
from sklearn_pmml_model.auto_detect import auto_detect_estimator

# Prepare the data
iris = load_iris()
X = pd.DataFrame(iris.data)
X.columns = np.array(iris.feature_names)
y = pd.Series(np.array(iris.target_names)[iris.target])
y.name = "Class"
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.33, random_state=123)

# Specify the model type for the least overhead...
#clf = PMMLForestClassifier(pmml="models/randomForest.pmml")

# ...or simply let the library auto-detect the model type
clf = auto_detect_estimator(pmml="models/randomForest.pmml")

# Use the model as any other scikit-learn model
clf.predict(Xte)
clf.score(Xte, yte)
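
To give an idea of how a model type could be detected from a PMML file, the sketch below looks for the first model element under the PMML root using the standard library. This is an assumption about one possible approach, not a description of how `auto_detect_estimator` is actually implemented; the helper `find_model_tag` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Model element names defined by the PMML standard (not an exhaustive list).
MODEL_TAGS = {
    "TreeModel", "MiningModel", "RegressionModel", "GeneralRegressionModel",
    "NaiveBayesModel", "SupportVectorMachineModel", "NearestNeighborModel",
    "NeuralNetwork",
}


def find_model_tag(pmml_text: str) -> Optional[str]:
    """Return the first top-level model element name found, or None."""
    root = ET.fromstring(pmml_text)
    for child in root:
        # Strip the XML namespace: "{http://...}TreeModel" -> "TreeModel".
        tag = child.tag.rsplit("}", 1)[-1]
        if tag in MODEL_TAGS:
            return tag
    return None


# Tiny hand-written PMML skeleton, for illustration only.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header/>
  <DataDictionary/>
  <TreeModel functionName="classification"/>
</PMML>"""

print(find_model_tag(SAMPLE))
```

Scanning the file costs a little extra time, which is presumably why specifying the estimator class directly has the least overhead.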

More examples can be found in the following packages: tree, ensemble, linear_model, naive_bayes, svm, neighbors and neural_network.

Benchmark

Depending on the data set and model, sklearn-pmml-model is typically between 1 and 10 times faster than competing libraries, because it leverages the optimization and industry-tested robustness of scikit-learn. In the results below, only the linear model on the Breast cancer data set is marginally slower. Source code for this benchmark can be found in the corresponding Jupyter notebook.

Running times (load + predict, in seconds)

		Linear model	Naive Bayes	Decision tree	Random Forest	Gradient boosting
Wine	`PyPMML`	0.013038	0.005674	0.005587	0.032734	0.034649
	`sklearn-pmml-model`	0.00404	0.004059	0.000964	0.030008	0.032949
Breast cancer	`PyPMML`	0.009838	0.01153	0.009367	0.058941	0.031196
	`sklearn-pmml-model`	0.010749	0.008481	0.001106	0.044021	0.013411
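
A combined load + predict timing can be measured along the following lines. This is a minimal sketch and not the code from the benchmark notebook: the helper `timed` is hypothetical, and `load_and_predict` is a placeholder workload standing in for constructing an estimator from a PMML file and calling `predict`.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(func: Callable[[], T]) -> Tuple[T, float]:
    """Call func once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def load_and_predict() -> int:
    # Placeholder workload; in a real benchmark this would construct the
    # estimator from a PMML file and call predict on the test set.
    return sum(range(100_000))


result, seconds = timed(load_and_predict)
print(f"load + predict took {seconds:.6f} s")
```

A single measurement is noisy, so repeating the call several times and reporting the mean or minimum usually gives more stable numbers.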

Improvement

		Linear model	Naive Bayes	Decision tree	Random Forest	Gradient boosting
Wine	Improvement	3.23×	1.40×	5.80×	1.09×	1.05×
Breast cancer	Improvement	0.91×	1.36×	8.47×	1.34×	2.33×
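
The improvement factors appear to be the PyPMML running time divided by the sklearn-pmml-model running time. Under that assumption, the Wine row can be reproduced from the running times listed earlier in this section:

```python
# Running times for the Wine data set, copied from the running times table (seconds).
pypmml = {
    "Linear model": 0.013038,
    "Naive Bayes": 0.005674,
    "Decision tree": 0.005587,
    "Random Forest": 0.032734,
    "Gradient boosting": 0.034649,
}
sklearn_pmml_model = {
    "Linear model": 0.00404,
    "Naive Bayes": 0.004059,
    "Decision tree": 0.000964,
    "Random Forest": 0.030008,
    "Gradient boosting": 0.032949,
}

# Assumed definition: how many times faster sklearn-pmml-model is than PyPMML.
improvement = {
    model: pypmml[model] / sklearn_pmml_model[model] for model in pypmml
}

for model, factor in improvement.items():
    print(f"{model}: {factor:.2f}×")
```

A factor above 1 means sklearn-pmml-model was faster; a factor below 1 means it was slower.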

Benchmark run on: 24 September 2024, 17:19

Development

Prerequisites

Tests can be run using pytest. Grab a local copy of the source:

$ git clone http://github.com/iamDecode/sklearn-pmml-model
$ cd sklearn-pmml-model

create a virtual environment and activate it:

$ python3 -m venv venv
$ source venv/bin/activate

and install the dependencies:

$ pip install -r requirements.txt

The final step is to build the Cython extensions:

$ python setup.py build_ext --inplace

Testing

You can execute the tests with pytest by running:

$ python setup.py pytest

Contributing

Feel free to make a contribution. Please read CONTRIBUTING.md for more details.

License

This project is licensed under the BSD 2-Clause License - see the LICENSE file for details.
