The FIL backend is designed to accelerate inference for tree-based models. If the model you are trying to deploy is not tree-based, consider using one of Triton's other backends.
The FIL backend supports most XGBoost and LightGBM models via their native serialization formats. It also supports the following model types from scikit-learn and cuML via Treelite's checkpoint serialization format:
- GradientBoostingClassifier
- GradientBoostingRegressor
- IsolationForest
- RandomForestRegressor
- ExtraTreesClassifier
- ExtraTreesRegressor
In addition, the FIL backend can perform inference on tree models from any framework if they are first exported to Treelite's checkpoint serialization format.
The FIL backend currently supports the following serialization formats:
- XGBoost JSON (see the compatibility matrix below for supported versions)
- XGBoost Binary
- LightGBM Text
- Treelite binary checkpoint
The FIL backend does not support direct ingestion of Pickle files. The pickled model must be converted to one of the above formats before it can be used in Triton.
Before version 3.0, Treelite offered no backward compatibility for its checkpoint format, even between minor releases; the version of Treelite used to save a checkpoint had to exactly match the version used by the FIL backend. Starting with version 3.0, Treelite can load checkpoints produced by any Treelite version from 2.7 up to (but not including) the next major release.
XGBoost's JSON format also changes periodically between minor versions, and older versions of Treelite used in the FIL backend may not support those changes.
The compatibility matrix for Treelite and XGBoost with the FIL backend is shown below:
| Triton Version | Supported Treelite Version(s) | Supported XGBoost JSON Version(s) |
|---|---|---|
| 21.08 | 1.3.0 | <1.6 |
| 21.09-21.10 | 2.0.0 | <1.6 |
| 21.11-22.02 | 2.1.0 | <1.6 |
| 22.03-22.06 | 2.3.0 | <1.6 |
| 22.07 | 2.4.0 | <1.7 |
| 22.08-24.02 | 2.4.0; >=3.0.0,<4.0.0 | <1.7 |
| 24.03+ | 3.9.0; >=4.0.0,<5.0.0 | 1.7+ |
The FIL backend does not currently support multi-output regression models.
While the FIL backend can load double-precision models, it performs all computations in single-precision mode. This can lead to slight differences in model output for frameworks like LightGBM which natively use double precision. Support for double-precision execution is planned for an upcoming release.
As of version 21.11, the FIL backend supports models with categorical features (e.g., some XGBoost and LightGBM models). These models can be deployed just like any other, but as with any inference pipeline that includes categorical features, care must be taken to ensure that the categorical encoding used at inference time matches the one used during training. Because the data passed at inference time may not contain every category seen during training, the correct feature mapping cannot be reconstructed from the inference data alone; some record of the complete set of training categories must therefore be kept. With that record, categorical columns can be converted to float32 columns and submitted to Triton like any other input.
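One minimal way to apply such a record, sketched below with pandas: the category list `train_categories` is a hypothetical record saved at training time, and encoding against it at inference time guarantees the same integer codes regardless of which categories actually appear in the batch.

```python
import numpy as np
import pandas as pd

# Hypothetical record of the full category set seen during training,
# in the order used to encode the training data
train_categories = ["red", "green", "blue"]

# Inference-time data: encode against the *training* category list so
# the integer codes match training (unseen categories map to -1)
col = pd.Series(["blue", "red", "green", "blue"])
codes = pd.Categorical(col, categories=train_categories).codes

# Convert to float32, the dtype the FIL backend computes in
features = codes.astype(np.float32)  # -> [2.0, 0.0, 1.0, 2.0]
```

The resulting float32 column can then be concatenated with the model's numeric features and sent to Triton as a single input tensor.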
For a fully-worked example of using a model with categorical features, check out the introductory fraud detection notebook.