TorchServe now enables token authorization and disables model API control by default. These security features address the risk of unauthorized API calls and prevent potentially malicious code from being introduced to the model server. Refer to the following documentation for more information: Token Authorization, Model API control
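For example, with token authorization enabled, every API call must carry a bearer token. The snippet below is a hypothetical Python client, not an official recipe: the model name `resnet-18` and the input file are placeholders, and the token value is copied from the `key_file.json` that TorchServe writes at startup.

```python
# Hypothetical inference call with token authorization enabled. Assumes a local
# TorchServe instance serving a model named "resnet-18" (placeholder), and a
# token copied from the key_file.json generated when TorchServe starts.
import requests

INFERENCE_TOKEN = "<token from key_file.json>"  # placeholder, not a real token

with open("kitten.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/predictions/resnet-18",
        headers={"Authorization": f"Bearer {INFERENCE_TOKEN}"},
        data=f,
    )
print(resp.status_code, resp.json())
```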
TorchServe is a performant, flexible, and easy-to-use tool for serving PyTorch eager mode and TorchScript models.
- Serving Quick Start - Basic server usage tutorial
- Model Archive Quick Start - Tutorial that shows you how to package a model archive file.
- Installation - Installation procedures
- Model loading - How to load a model in TorchServe
- Serving Models - Explains how to use TorchServe
- REST API - Specification of the REST API endpoints for TorchServe
- gRPC API - TorchServe supports gRPC APIs for both inference and management calls
- Packaging Model Archive - Explains how to package a model archive file using `model-archiver`
- Inference API - How to check the health of a deployed model and get inferences (see the sketch after this list)
- Management API - How to manage and scale models
- Logging - How to configure logging
- Metrics - How to configure metrics
- Prometheus and Grafana metrics - How to configure the metrics API with Prometheus-formatted metrics in a Grafana dashboard
- Captum Explanations - Built-in support for Captum explanations for both text and images
- Batch inference with TorchServe - How to create and serve a model with batch inference in TorchServe
- Workflows - How to create workflows to compose PyTorch models and Python functions in sequential and parallel pipelines
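As referenced above, here is a minimal sketch of the Inference and Management REST APIs, assuming the default ports (8080 inference, 8081 management, 8082 metrics), token authorization disabled for brevity, and a placeholder `resnet-18.mar` archive in the model store. Note that registering a model over the Management API also requires model API control to be enabled (see the security note at the top).

```python
# Minimal sketch of the REST Inference and Management APIs using default ports.
# Model name and archive file are placeholders for illustration.
import requests

# Health check via the Inference API.
print(requests.get("http://localhost:8080/ping").json())  # {"status": "Healthy"}

# Register a model archive from the model store and start one worker.
requests.post(
    "http://localhost:8081/models",
    params={"url": "resnet-18.mar", "initial_workers": 1},
)

# Scale the model up to four workers, then inspect its status.
requests.put("http://localhost:8081/models/resnet-18", params={"min_worker": 4})
print(requests.get("http://localhost:8081/models/resnet-18").json())

# Prometheus-formatted metrics are exposed on the metrics port.
print(requests.get("http://localhost:8082/metrics").text[:200])
```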
- Image Classifier - This handler takes an image and returns the name of the object in that image (a sample request follows this list)
- Text Classifier - This handler takes a text (string) as input and returns the classification text based on the model vocabulary
- Object Detector - This handler takes an image and returns a list of detected classes and their bounding boxes
- Image Segmenter - This handler takes an image and returns output of shape [CL, H, W], where CL is the number of classes, H the height, and W the width
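To illustrate the default handlers, here is a hedged sample request against a model served with the `image_classifier` handler; the model name and image file are placeholders.

```python
# Hypothetical request to a model served with the default image_classifier
# handler; "resnet-18" and kitten.jpg are placeholders.
import requests

with open("kitten.jpg", "rb") as f:
    preds = requests.post("http://localhost:8080/predictions/resnet-18", data=f).json()

# The image_classifier handler returns the top predicted classes with their
# probabilities, e.g. {"tabby": 0.51, "tiger_cat": 0.29, ...}
for label, prob in preds.items():
    print(f"{label}: {prob:.3f}")
```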
- Deploying LLMs - How to easily deploy LLMs using TorchServe (a streaming sketch follows this list)
- HuggingFace Language Model - This handler takes an input sentence and can return sequence classifications, token classifications or Q&A answers
- Multi Modal Framework - Build and deploy a classifier that combines text, audio and video input data
- Dual Translation Workflow - Example workflow that composes two translation models
- Model Zoo - List of pre-trained model archives ready to be served for inference with TorchServe.
- Examples - Many examples of how to package and deploy models with TorchServe
- Workflow Examples - Examples of how to compose models in a workflow with TorchServe
- Resnet50 HPU compile - An example of how to run the model in compile mode with the HPU device
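For the LLM deployment entry referenced above, a hypothetical streaming client might look like the following; the model name `llama` and the JSON payload shape are assumptions, since the exact input format depends on the handler used.

```python
# Hedged sketch of streaming generation from a deployed LLM. The model name
# "llama" and the request payload are assumptions (handler-dependent).
import requests

resp = requests.post(
    "http://localhost:8080/predictions/llama",
    json={"prompt": "What is TorchServe?", "max_new_tokens": 64},
    stream=True,
)
for chunk in resp.iter_content(chunk_size=None):
    print(chunk.decode("utf-8"), end="", flush=True)
```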
- Advanced configuration - Describes advanced TorchServe configurations.
- A/B test models - A/B test your models for regressions before shipping them to production
- Custom Service - Describes how to develop custom inference services (a handler sketch follows this list)
- Encrypted model serving - S3 server-side model encryption via KMS
- Snapshot serialization - Serialize model artifacts to AWS DynamoDB
- Benchmarking and Profiling - Use JMeter or Apache Bench to benchmark your models and TorchServe itself
- TorchServe on Kubernetes - Demonstrates a TorchServe deployment in Kubernetes using a Helm chart, supported on both Azure Kubernetes Service and Google Kubernetes Engine
- mlflow-torchserve - Deploy MLflow pipeline models into TorchServe
- Kubeflow pipelines - Kubeflow pipelines and Google Vertex AI Managed pipelines
- NVIDIA MPS - Use NVIDIA MPS to optimize multi-worker deployment on a single GPU
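Finally, as a companion to the Custom Service entry above, here is a minimal custom handler sketch built on TorchServe's `BaseHandler`; the tensor preprocessing is illustrative only and depends on your model's input format.

```python
# Minimal custom handler sketch (see Custom Service). It extends TorchServe's
# BaseHandler, which already implements initialize() and inference(); only
# pre/post-processing are overridden here.
import torch
from ts.torch_handler.base_handler import BaseHandler

class MyHandler(BaseHandler):
    def preprocess(self, data):
        # `data` is a list of batched requests; each payload arrives under
        # the "data" or "body" key. The float-tensor conversion below is an
        # illustrative assumption about the model's input.
        rows = [row.get("data") or row.get("body") for row in data]
        return torch.as_tensor(rows, dtype=torch.float32)

    def postprocess(self, inference_output):
        # Must return a list with one entry per request in the batch.
        return inference_output.tolist()
```

A handler like this is passed to `torch-model-archiver` via its `--handler` flag when packaging the model archive.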