Java library and command-line application for converting Scikit-Learn models to PMML.
- Supported Estimator and Transformer types:
- Clustering:
- Matrix Decomposition:
- Discriminant Analysis:
- Dummies:
- Ensemble Methods:
  - ensemble.AdaBoostRegressor
  - ensemble.BaggingClassifier
  - ensemble.BaggingRegressor
  - ensemble.ExtraTreesClassifier
  - ensemble.ExtraTreesRegressor
  - ensemble.GradientBoostingClassifier
  - ensemble.GradientBoostingRegressor
  - ensemble.IsolationForest
  - ensemble.RandomForestClassifier
  - ensemble.RandomForestRegressor
  - ensemble.VotingClassifier
- Feature Extraction:
- Feature Selection:
  - feature_selection.GenericUnivariateSelect (only via sklearn2pmml.SelectorProxy)
  - feature_selection.RFE (only via sklearn2pmml.SelectorProxy)
  - feature_selection.RFECV (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFdr (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFpr (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFromModel (either directly or via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFwe (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectKBest (either directly or via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectPercentile (only via sklearn2pmml.SelectorProxy)
  - feature_selection.VarianceThreshold (only via sklearn2pmml.SelectorProxy)
- Generalized Linear Models:
  - linear_model.ElasticNet
  - linear_model.ElasticNetCV
  - linear_model.Lasso
  - linear_model.LassoCV
  - linear_model.LinearRegression
  - linear_model.LogisticRegression
  - linear_model.LogisticRegressionCV
  - linear_model.Ridge
  - linear_model.RidgeCV
  - linear_model.RidgeClassifier
  - linear_model.RidgeClassifierCV
  - linear_model.SGDClassifier
  - linear_model.SGDRegressor
- Naive Bayes:
- Nearest Neighbors:
- Pipelines:
- Neural network models:
- Preprocessing and Normalization:
  - preprocessing.Binarizer
  - preprocessing.FunctionTransformer
  - preprocessing.Imputer
  - preprocessing.LabelBinarizer
  - preprocessing.LabelEncoder
  - preprocessing.MaxAbsScaler
  - preprocessing.MinMaxScaler
  - preprocessing.OneHotEncoder
  - preprocessing.PolynomialFeatures
  - preprocessing.RobustScaler
  - preprocessing.StandardScaler
- Support Vector Machines:
- Decision Trees:
- Supported third-party Estimator and Transformer types:
- LightGBM:
  - lightgbm.LGBMClassifier
  - lightgbm.LGBMRegressor
- SkLearn2PMML:
  - sklearn2pmml.EstimatorProxy
  - sklearn2pmml.PMMLPipeline
  - sklearn2pmml.SelectorProxy
  - sklearn2pmml.decoration.CategoricalDomain
  - sklearn2pmml.decoration.ContinuousDomain
  - sklearn2pmml.preprocessing.PMMLLabelBinarizer
  - sklearn2pmml.preprocessing.PMMLLabelEncoder
- Sklearn-Pandas:
  - sklearn_pandas.CategoricalImputer
  - sklearn_pandas.DataFrameMapper
- XGBoost:
- Production quality:
- Complete test coverage.
- Fully compliant with the JPMML-Evaluator library.
- Python 2.7, 3.4 or newer.
- scikit-learn 0.16.0 or newer.
- sklearn-pandas 0.0.10 or newer.
- sklearn2pmml 0.14.0 or newer.
Python installation can be validated as follows:
import sklearn, sklearn.externals.joblib, sklearn_pandas, sklearn2pmml
print(sklearn.__version__)
print(sklearn.externals.joblib.__version__)
print(sklearn_pandas.__version__)
print(sklearn2pmml.__version__)
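The printed versions still have to be compared against the minimums by eye. The following is a minimal sketch of automating that comparison, using only the standard library. The helper names (`version_tuple`, `meets_minimum`, `check_installation`) are hypothetical and not part of any of the listed packages, and the version parsing is deliberately simple: it reads leading integers only and ignores suffixes such as `dev0` or `rc1`.

```python
import importlib

# Minimum versions as stated in the prerequisites above.
MINIMUM_VERSIONS = {
    "sklearn": "0.16.0",
    "sklearn_pandas": "0.0.10",
    "sklearn2pmml": "0.14.0",
}

def version_tuple(version):
    """Parse the leading integer components of a version string.

    '0.16.1' becomes (0, 16, 1); '0.20.dev0' becomes (0, 20).
    """
    parts = []
    for token in version.split("."):
        digits = ""
        for char in token:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = version_tuple(installed), version_tuple(minimum)
    # Pad the shorter tuple with zeros so that '0.16' compares equal to '0.16.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

def check_installation(minimums=MINIMUM_VERSIONS):
    """Import each module and report whether its __version__ is sufficient."""
    report = {}
    for module_name, minimum in sorted(minimums.items()):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "not installed"
            continue
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[module_name] = "unknown version"
        elif meets_minimum(installed, minimum):
            report[module_name] = "ok ({0})".format(installed)
        else:
            report[module_name] = "too old ({0} < {1})".format(installed, minimum)
    return report
```

Calling `check_installation()` returns a dictionary keyed by module name, which can be printed or inspected before training a model.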
- Java 1.7 or newer.
Enter the project root directory and build using Apache Maven:
mvn clean install
The build produces an executable uber-JAR file target/converter-executable-1.3-SNAPSHOT.jar.
A typical workflow can be summarized as follows:
- Use Python to train a model.
- Serialize the model in pickle data format to a file in a local filesystem.
- Use the JPMML-SkLearn command-line converter application to turn the pickle file into a PMML file.
Load data into a pandas.DataFrame object:
import pandas
iris_df = pandas.read_csv("Iris.csv")
First, instantiate a sklearn_pandas.DataFrameMapper object, which performs column-wise feature engineering and selection work:
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.decoration import ContinuousDomain
iris_mapper = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()])
])
Second, instantiate any number of Transformer and Selector objects, which perform dataset-wise feature engineering and selection work:
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
iris_pca = PCA(n_components = 3)
iris_selector = SelectKBest(k = 2)
Third, instantiate an Estimator object:
from sklearn.tree import DecisionTreeClassifier
iris_classifier = DecisionTreeClassifier(min_samples_leaf = 5)
Combine the above objects into a sklearn2pmml.PMMLPipeline object, and run the experiment:
from sklearn2pmml import PMMLPipeline
iris_pipeline = PMMLPipeline([
    ("mapper", iris_mapper),
    ("pca", iris_pca),
    ("selector", iris_selector),
    ("estimator", iris_classifier)
])
iris_pipeline.fit(iris_df, iris_df["Species"])
Store the fitted sklearn2pmml.PMMLPipeline object in pickle data format:
from sklearn.externals import joblib
joblib.dump(iris_pipeline, "pipeline.pkl.z", compress = 9)
Please see the test script file main.py for more classification (binary and multi-class) and regression workflows.
Converting the pipeline pickle file pipeline.pkl.z to a PMML file pipeline.pmml:
java -jar target/converter-executable-1.3-SNAPSHOT.jar --pkl-input pipeline.pkl.z --pmml-output pipeline.pmml
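When the conversion should run straight after training, the same command line can be launched from Python. The sketch below assumes that `java` is on the `PATH` and that the uber-JAR has already been built. The function names `build_converter_command` and `convert` are hypothetical helpers for illustration, not part of JPMML-SkLearn or sklearn2pmml. Only the `--pkl-input` and `--pmml-output` options shown above are used.

```python
import subprocess

def build_converter_command(jar_path, pkl_input, pmml_output):
    """Assemble the converter command line as a list of arguments."""
    return [
        "java", "-jar", jar_path,
        "--pkl-input", pkl_input,
        "--pmml-output", pmml_output,
    ]

def convert(jar_path, pkl_input, pmml_output):
    """Run the converter and wait for it to finish.

    Raises subprocess.CalledProcessError if the process exits with a non-zero code.
    """
    subprocess.check_call(build_converter_command(jar_path, pkl_input, pmml_output))

# Example call (not executed here, because it needs Java and the built JAR):
# convert("target/converter-executable-1.3-SNAPSHOT.jar", "pipeline.pkl.z", "pipeline.pmml")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file paths contain spaces.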
Getting help:
java -jar target/converter-executable-1.3-SNAPSHOT.jar --help
JPMML-SkLearn is licensed under the GNU Affero General Public License (AGPL) version 3.0. Other licenses are available on request.
Please contact info@openscoring.io