JPMML-SkLearn

Java library and command-line application for converting Scikit-Learn models to PMML.

Features

Supported Estimator and Transformer types:
- Clustering:
  - cluster.KMeans
  - cluster.MiniBatchKMeans
- Matrix Decomposition:
  - decomposition.PCA
  - decomposition.IncrementalPCA
- Discriminant Analysis:
  - discriminant_analysis.LinearDiscriminantAnalysis
- Dummies:
  - dummy.DummyClassifier
  - dummy.DummyRegressor
- Ensemble Methods:
- Feature Extraction:
- Feature Selection:
  - feature_selection.GenericUnivariateSelect (only via sklearn2pmml.SelectorProxy)
  - feature_selection.RFE (only via sklearn2pmml.SelectorProxy)
  - feature_selection.RFECV (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFdr (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFpr (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFromModel (either directly or via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFwe (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectKBest (either directly or via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectPercentile (only via sklearn2pmml.SelectorProxy)
  - feature_selection.VarianceThreshold (only via sklearn2pmml.SelectorProxy)
- Generalized Linear Models:
- Naive Bayes:
  - naive_bayes.GaussianNB
- Nearest Neighbors:
  - neighbors.KNeighborsClassifier
  - neighbors.KNeighborsRegressor
- Pipelines:
  - pipeline.FeatureUnion
  - pipeline.Pipeline
- Neural network models:
  - neural_network.MLPClassifier
  - neural_network.MLPRegressor
- Preprocessing and Normalization:
- Support Vector Machines:
- Decision Trees:

Supported third-party Estimator and Transformer types:
- LightGBM:
  - lightgbm.LGBMClassifier
  - lightgbm.LGBMRegressor
- SkLearn2PMML:
  - sklearn2pmml.EstimatorProxy
  - sklearn2pmml.PMMLPipeline
  - sklearn2pmml.SelectorProxy
  - sklearn2pmml.decoration.CategoricalDomain
  - sklearn2pmml.decoration.ContinuousDomain
  - sklearn2pmml.preprocessing.PMMLLabelBinarizer
  - sklearn2pmml.preprocessing.PMMLLabelEncoder
- Sklearn-Pandas:
  - sklearn_pandas.CategoricalImputer
  - sklearn_pandas.DataFrameMapper
- XGBoost:
  - xgboost.XGBClassifier
  - xgboost.XGBRegressor

Production quality:
- Complete test coverage.
- Fully compliant with the JPMML-Evaluator library.

Prerequisites

The Python side of operations

- Python 2.7, 3.4 or newer.
- scikit-learn 0.16.0 or newer.
- sklearn-pandas 0.0.10 or newer.
- sklearn2pmml 0.14.0 or newer.

Python installation can be validated as follows:

import sklearn, sklearn.externals.joblib, sklearn_pandas, sklearn2pmml

print(sklearn.__version__)
print(sklearn.externals.joblib.__version__)
print(sklearn_pandas.__version__)
print(sklearn2pmml.__version__)
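
The printed versions still have to be compared against the minimums listed above by eye. As a minimal sketch, the comparison could be automated along the following lines. The helper names `parse_version`, `meets_minimum` and `check_installed` are hypothetical and not part of any of the listed packages; the parsing is deliberately simple and ignores suffixes such as `dev0`.

```python
import importlib
import re

# Minimum versions copied from the prerequisites list above.
MINIMUM_VERSIONS = {
    "sklearn": "0.16.0",
    "sklearn_pandas": "0.0.10",
    "sklearn2pmml": "0.14.0",
}

def parse_version(version):
    """Turn a version string such as '0.18.1' into a tuple of ints.

    Only the leading digits of each dot-separated component are used;
    parsing stops at the first component without leading digits.
    """
    parts = []
    for component in version.split("."):
        match = re.match(r"\d+", component)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.16' and '0.16.0' compare as equal.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b

def check_installed(minimums=MINIMUM_VERSIONS):
    """Map each module name to True, False, or None (module not importable)."""
    report = {}
    for name, required in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        report[name] = meets_minimum(module.__version__, required)
    return report
```

Calling `check_installed()` returns a dictionary in which `None` marks a package that could not be imported and `False` marks one that is older than the stated minimum.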

The JPMML-SkLearn side of operations

Java 1.7 or newer.

Installation

Enter the project root directory and build using Apache Maven:

mvn clean install

The build produces an executable uber-JAR file target/converter-executable-1.3-SNAPSHOT.jar.

Usage

A typical workflow can be summarized as follows:

1. Use Python to train a model.
2. Serialize the model in pickle data format to a file in a local filesystem.
3. Use the JPMML-SkLearn command-line converter application to turn the pickle file into a PMML file.

The Python side of operations

Load data into a pandas.DataFrame object:

import pandas

iris_df = pandas.read_csv("Iris.csv")

First, instantiate a sklearn_pandas.DataFrameMapper object, which performs column-wise feature engineering and selection work:

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.decoration import ContinuousDomain

iris_mapper = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()])
])

Second, instantiate any number of Transformer and Selector objects, which perform dataset-wise feature engineering and selection work:

from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

iris_pca = PCA(n_components = 3)
iris_selector = SelectKBest(k = 2)

Third, instantiate an Estimator object:

from sklearn.tree import DecisionTreeClassifier

iris_classifier = DecisionTreeClassifier(min_samples_leaf = 5)

Combine the above objects into a sklearn2pmml.PMMLPipeline object, and run the experiment:

from sklearn2pmml import PMMLPipeline

iris_pipeline = PMMLPipeline([
    ("mapper", iris_mapper),
    ("pca", iris_pca),
    ("selector", iris_selector),
    ("estimator", iris_classifier)
])
iris_pipeline.fit(iris_df, iris_df["Species"])

Store the fitted sklearn2pmml.PMMLPipeline object in pickle data format:

from sklearn.externals import joblib

joblib.dump(iris_pipeline, "pipeline.pkl.z", compress = 9)

Please see the test script file main.py for more classification (binary and multi-class) and regression workflows.

The JPMML-SkLearn side of operations

Converting the pipeline pickle file pipeline.pkl.z to a PMML file pipeline.pmml:

java -jar target/converter-executable-1.3-SNAPSHOT.jar --pkl-input pipeline.pkl.z --pmml-output pipeline.pmml
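
After the conversion it can be useful to check which fields ended up in the PMML file. The following is a minimal sketch that lists the `DataField` elements of a PMML document using only the Python standard library. The function name `list_data_fields` is hypothetical, and `SAMPLE_PMML` is a hand-written fragment for illustration, not actual converter output. Element names are matched without their XML namespace, on the assumption that the namespace URI depends on the PMML schema version that the converter emits.

```python
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def list_data_fields(pmml_source):
    """Return (name, optype, dataType) for every DataField element.

    pmml_source may be a file path or a binary file-like object.
    """
    root = ET.parse(pmml_source).getroot()
    fields = []
    for element in root.iter():
        if local_name(element.tag) == "DataField":
            fields.append((
                element.get("name"),
                element.get("optype"),
                element.get("dataType"),
            ))
    return fields

# Hand-written illustrative fragment; a real pipeline.pmml contains much more.
SAMPLE_PMML = (
    b'<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">'
    b'<DataDictionary>'
    b'<DataField name="Species" optype="categorical" dataType="string"/>'
    b'<DataField name="Sepal.Length" optype="continuous" dataType="double"/>'
    b'</DataDictionary>'
    b'</PMML>'
)

fields = list_data_fields(io.BytesIO(SAMPLE_PMML))
for name, optype, data_type in fields:
    print(name, optype, data_type)
```

For a converted file, the same function can be called with the file path, for example `list_data_fields("pipeline.pmml")`.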

Getting help:

java -jar target/converter-executable-1.3-SNAPSHOT.jar --help
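
When training and conversion are driven from the same Python script, the converter can be launched as a subprocess. The sketch below assumes that `java` is on the `PATH` and that the JAR file sits at the path produced by the build described in the Installation section. The helper names `build_converter_command` and `convert` are hypothetical; only the `--pkl-input` and `--pmml-output` options shown above are used.

```python
import subprocess

# Path taken from the Installation section; adjust it to the actual build output.
CONVERTER_JAR = "target/converter-executable-1.3-SNAPSHOT.jar"

def build_converter_command(pkl_input, pmml_output, jar=CONVERTER_JAR, java="java"):
    """Assemble the argument list for the command-line converter."""
    return [
        java, "-jar", jar,
        "--pkl-input", pkl_input,
        "--pmml-output", pmml_output,
    ]

def convert(pkl_input, pmml_output, jar=CONVERTER_JAR):
    """Run the converter and return the path of the PMML file.

    subprocess.check_call raises CalledProcessError if the Java process
    exits with a non-zero status.
    """
    command = build_converter_command(pkl_input, pmml_output, jar=jar)
    subprocess.check_call(command)
    return pmml_output
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file names contain spaces. A typical call would be `convert("pipeline.pkl.z", "pipeline.pmml")`.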

License

JPMML-SkLearn is licensed under the GNU Affero General Public License (AGPL) version 3.0. Other licenses are available on request.

Additional information

Please contact info@openscoring.io
