For users new to the "Tensor in" & "Tensor out" approach to Deep Learning Inference, getting started with Triton can raise many questions. The goal of this repository is to familiarize users with Triton's features and provide guides and examples to ease migration. For a feature-by-feature explanation, refer to the Triton Inference Server documentation.
Overview Video | Conceptual Guide: Deploying Models |
---|---|
The focus of these examples is to demonstrate deployment of models trained with various frameworks. These are quick demonstrations that assume the user is somewhat familiar with Triton.
PyTorch Model | TensorFlow Model | ONNX Model | TensorRT Accelerated Model | vLLM Model | OpenVINO Model |
---|---|---|---|---|---|
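As a rough illustration of what these quick-deploy guides build toward, the Python sketch below writes the minimal model repository layout Triton loads models from: one directory per model containing a `config.pbtxt` and a numbered version subdirectory. The model name `my_model`, the tensor names `input__0`/`output__0`, their shapes, and the choice of the ONNX Runtime platform are assumptions for illustration only; in practice the tensor names, types, and dims must match your exported model.

```python
import tempfile
from pathlib import Path

# Hypothetical config for a small ONNX model. Tensor names, types, and dims
# are illustrative and must match the model you actually export. Because
# max_batch_size > 0, "dims" exclude the batch dimension.
CONFIG_BODY = """platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 3 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 2 ]
  }
]
"""


def write_model_repository(root: Path, model_name: str, version: int = 1) -> Path:
    """Create <root>/<model_name>/config.pbtxt plus <root>/<model_name>/<version>/.

    Returns the version directory, where the serialized model file belongs.
    """
    model_dir = root / model_name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    # The "name" field should match the model's directory name.
    (model_dir / "config.pbtxt").write_text(f'name: "{model_name}"\n' + CONFIG_BODY)
    return version_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "model_repository"
        version_dir = write_model_repository(root, "my_model")
        # An exported model would be copied to version_dir / "model.onnx".
        for path in sorted(root.rglob("*")):
            print(path.relative_to(root))
```

You would then copy the exported model to `<version_dir>/model.onnx` (the default filename the ONNX Runtime backend looks for) and start the server against the repository root, e.g. `tritonserver --model-repository=/path/to/model_repository`. The framework-specific guides in the table above cover the exact configuration for each backend.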
The table below lists some popular models that are supported in our tutorials.
Example Models | Tutorial Link |
---|---|
Llama-2-7B | TensorRT-LLM Tutorial |
Persimmon-8B | HuggingFace Transformers Tutorial |
Falcon-7B | HuggingFace Transformers Tutorial |
LLaVA-v1.5-7B | TensorRT-LLM Tutorial |
Note: This is not an exhaustive list of what Triton supports, only of what is included in the tutorials.
This repository contains the following resources:
- Conceptual Guide: This guide focuses on building a conceptual understanding of the general challenges faced while building inference infrastructure and how best to tackle them with Triton Inference Server.
- Quick Deploy: A set of guides on deploying a model from your preferred framework to Triton Inference Server. These guides assume a basic understanding of Triton Inference Server; reviewing the getting started material first is recommended for a complete understanding.
- HuggingFace Guide: This guide walks the user through different methods for deploying a HuggingFace model with Triton Inference Server.
- Feature Guides: This folder is meant to house Triton's feature-specific examples.
- Migration Guide: Migrating from an existing solution to Triton Inference Server? Get an understanding of the general architecture that might best fit your use case.
- Agentic Workflow Guide: This guide provides a set of tutorials designed to help you deploy AI agents efficiently using the Triton Inference Server.
The Triton Inference Server GitHub organization contains multiple repositories housing different features of the Triton Inference Server. The following is not a complete description of every repository, only a brief guide to help build an intuitive understanding.
- Server is the main Triton Inference Server repository.
- Client contains the libraries and examples needed to create Triton clients.
- Backend contains the core scripts and utilities to build a new Triton Backend. Any repository containing the word "backend" is either a framework backend or an example for how to create a backend.
- Tools like Model Analyzer and Model Navigator provide tooling to measure performance or to simplify model acceleration.
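To make the "Tensor in" & "Tensor out" idea concrete, here is a hedged sketch of what a client exchanges with Triton over HTTP using the KServe v2 inference protocol: a JSON body of named, typed, shaped input tensors sent to `/v2/models/<model>/infer`, and a response carrying named output tensors. The helpers `build_infer_request` and `parse_infer_response`, the model and tensor names, and the single-FP32-input restriction are hypothetical simplifications; the libraries in the Client repository (for example the `tritonclient` Python package) wrap this protocol for you.

```python
import json
import math
from typing import List, Sequence, Tuple


def build_infer_request(model_name: str, input_name: str, shape: Sequence[int],
                        data: Sequence[float], output_name: str) -> Tuple[str, str]:
    """Build the URL path and JSON body for a KServe v2 HTTP inference request.

    Hypothetical helper: supports one FP32 input and one requested output.
    """
    # Tensor data is sent flattened in row-major order, so the element count
    # must equal the product of the shape.
    expected = math.prod(shape)
    if len(data) != expected:
        raise ValueError(f"expected {expected} elements for shape {list(shape)}, got {len(data)}")
    body = {
        "inputs": [
            {"name": input_name, "shape": list(shape), "datatype": "FP32", "data": list(data)}
        ],
        "outputs": [{"name": output_name}],
    }
    return f"/v2/models/{model_name}/infer", json.dumps(body)


def parse_infer_response(response_json: str, output_name: str) -> Tuple[List[int], list]:
    """Return (shape, flat_data) for the named output tensor in a v2 response."""
    response = json.loads(response_json)
    for tensor in response["outputs"]:
        if tensor["name"] == output_name:
            return tensor["shape"], tensor["data"]
    raise KeyError(f"output {output_name!r} not found in response")


if __name__ == "__main__":
    path, body = build_infer_request("my_model", "input__0", [1, 3], [0.1, 0.2, 0.3], "output__0")
    print("POST", path)
    print(body)
```

Assuming a server running locally on Triton's default HTTP port 8000, the printed body could be sent with `curl -X POST localhost:8000/v2/models/my_model/infer -d '<body>'`, and the JSON reply passed to `parse_infer_response`.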
To request an example, open an issue and describe the details. Want to make a contribution? Open a pull request and tag an Admin.