Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
My implementation of BiSeNet, including BiSeNetV2.
This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
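A minimal Python client sketch for querying such a deployment, assuming the engine is served as a model named `yolov4` with an FP32 input called `input` and an output called `detections` (all three names are assumptions; check the model metadata your server actually reports):

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server on its default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Preprocessed image batch; shape and tensor names are illustrative only.
image = np.random.rand(1, 3, 608, 608).astype(np.float32)
inp = httpclient.InferInput("input", list(image.shape), "FP32")
inp.set_data_from_numpy(image)

# Run inference and fetch the (assumed) detection output tensor.
result = client.infer(model_name="yolov4", inputs=[inp])
detections = result.as_numpy("detections")
print(detections.shape)
```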
OpenAI-compatible API for the TensorRT-LLM Triton backend
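Because such a proxy speaks the OpenAI wire protocol, the stock `openai` Python client works unchanged; the base URL, port, and model name below are assumptions, not values from the project:

```python
from openai import OpenAI

# Point the standard OpenAI client at the Triton-fronting proxy.
# Host/port and model name are placeholders; use what your server reports.
client = OpenAI(base_url="http://localhost:9000/v1", api_key="not-used")

resp = client.chat.completions.create(
    model="ensemble",  # common TensorRT-LLM Triton model name; may differ
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```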
Deep learning deployment framework: supports TensorFlow/PyTorch/TensorRT/TensorRT-LLM/vLLM and other NN frameworks, with dynamic batching and streaming modes. Dual-language compatible with Python and C++, it offers scalability, extensibility, and high performance, and helps users quickly deploy models and serve them through HTTP/RPC interfaces.
Serving inside PyTorch
ClearML - Model-Serving Orchestration and Repository Solution
The Triton backend for the ONNX Runtime.
Deploy a Stable Diffusion model with ONNX/TensorRT + Triton server
NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT, for both Jetson and x86_64 with a CUDA-capable GPU
Deploy DL/ML inference pipelines with minimal extra code.
Traffic analysis at a roundabout using computer vision
Compare multiple optimization methods on Triton to improve model service performance
Diffusion Model for Voice Conversion
Build a recommender system with PyTorch + Redis + Elasticsearch + Feast + Triton + Flask: vector recall, DeepFM ranking, and a web application.
Tiny configuration for Triton Inference Server
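For reference, a tiny Triton `config.pbtxt` of the kind such a project packages might look like the sketch below; the model name, backend, tensor names, and shapes are all placeholders:

```protobuf
# Minimal, illustrative config.pbtxt; every value here is a placeholder.
name: "my_model"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
# Let Triton group incoming requests into server-side batches.
dynamic_batching {
  max_queue_delay_microseconds: 100
}
```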
Set up CI for DL (CUDA/cuDNN/TensorRT/onnx2trt/onnxruntime/onnxsim/PyTorch/Triton Inference Server/Bazel/Tesseract/PaddleOCR/NVIDIA Docker/MinIO/Supervisord) on AGX or PC from scratch.
Provides an ensemble model to deploy a YOLOv8 ONNX model to Triton
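A sketch of how such an ensemble can be wired in a `config.pbtxt`, assuming hypothetical `preprocess`, `yolov8_onnx`, and `postprocess` member models (all names, dtypes, and shapes are illustrative):

```protobuf
# Illustrative ensemble config; member models must exist and support batching.
name: "yolov8_ensemble"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "raw_image", data_type: TYPE_UINT8, dims: [ -1 ] }
]
output [
  { name: "detections", data_type: TYPE_FP32, dims: [ -1, 6 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "raw_image" value: "raw_image" }
      output_map { key: "preprocessed" value: "preprocessed_image" }
    },
    {
      model_name: "yolov8_onnx"
      model_version: -1
      input_map { key: "images" value: "preprocessed_image" }
      output_map { key: "output0" value: "raw_output" }
    },
    {
      model_name: "postprocess"
      model_version: -1
      input_map { key: "raw_output" value: "raw_output" }
      output_map { key: "detections" value: "detections" }
    }
  ]
}
```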
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch). Includes a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.
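The PyTorch -> ONNX leg of a conversion chain like that typically comes down to `torch.onnx.export`; a minimal sketch with a stand-in module (the module, shapes, file names, and opset are assumptions):

```python
import torch

# Stand-in module; substitute your trained CRAFT (or other) detector here.
model = torch.nn.Conv2d(3, 2, kernel_size=3, padding=1)
model.eval()

dummy = torch.randn(1, 3, 768, 768)  # illustrative input shape
torch.onnx.export(
    model,
    dummy,
    "craft.onnx",
    input_names=["input"],
    output_names=["output"],
    # Mark batch and spatial dims dynamic so one engine serves many sizes.
    dynamic_axes={"input": {0: "batch", 2: "height", 3: "width"},
                  "output": {0: "batch"}},
    opset_version=17,
)
# The ONNX -> TensorRT step is commonly done with trtexec, e.g.:
#   trtexec --onnx=craft.onnx --saveEngine=craft.plan
```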