Welcome to TensorRT-LLM's documentation!
========================================

.. toctree::
   :maxdepth: 1
   :caption: Contents:

   architecture.md
   gpt_runtime.md
   batch_manager.md
   inference_request.md
   gpt_attention.md
   precision.md
   build_from_source.md
   performance.md
   2023-05-19-how-to-debug.md
   2023-05-17-how-to-add-a-new-model.md
   graph-rewriting.md
   memory.md
   workflow.md
   checkpoint.md
   lora.md
   perf_best_practices.md
   performance_analysis.md

Python API
----------

* :doc:`tensorrt_llm.layers <python-api/tensorrt_llm.layers>`
* :doc:`tensorrt_llm.functional <python-api/tensorrt_llm.functional>`
* :doc:`tensorrt_llm.models <python-api/tensorrt_llm.models>`
* :doc:`tensorrt_llm.plugin <python-api/tensorrt_llm.plugin>`
* :doc:`tensorrt_llm.quantization <python-api/tensorrt_llm.quantization>`
* :doc:`tensorrt_llm.runtime <python-api/tensorrt_llm.runtime>`

.. toctree::
   :maxdepth: 2
   :caption: Python API
   :hidden:

   python-api/tensorrt_llm.layers
   python-api/tensorrt_llm.functional
   python-api/tensorrt_llm.models
   python-api/tensorrt_llm.plugin
   python-api/tensorrt_llm.quantization
   python-api/tensorrt_llm.runtime

C++ API
-------

:doc:`cpp/runtime <_cpp_gen/runtime>`

.. toctree::
   :maxdepth: 2
   :caption: C++ API
   :hidden:

   _cpp_gen/runtime

Indices and tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

Blogs
-----

.. toctree::
   :maxdepth: 2
   :caption: Blogs
   :hidden:

   blogs/H100vsA100.md
   blogs/H200launch.md
   blogs/Falcon180B-H200.md
   blogs/quantization-in-TRT-LLM.md
