.. toctree::
   :maxdepth: 1
   :caption: Contents:

   architecture.md
   gpt_runtime.md
   batch_manager.md
   inference_request.md
   gpt_attention.md
   precision.md
   build_from_source.md
   performance.md
   2023-05-19-how-to-debug.md
   2023-05-17-how-to-add-a-new-model.md
   graph-rewriting.md
   memory.md
   workflow.md
   checkpoint.md
   lora.md
   perf_best_practices.md
   performance_analysis.md

- :doc:`tensorrt_llm.layers <python-api/tensorrt_llm.layers>`
- :doc:`tensorrt_llm.functional <python-api/tensorrt_llm.functional>`
- :doc:`tensorrt_llm.models <python-api/tensorrt_llm.models>`
- :doc:`tensorrt_llm.plugin <python-api/tensorrt_llm.plugin>`
- :doc:`tensorrt_llm.quantization <python-api/tensorrt_llm.quantization>`
- :doc:`tensorrt_llm.runtime <python-api/tensorrt_llm.runtime>`

.. toctree::
   :maxdepth: 2
   :caption: Python API
   :hidden:

   python-api/tensorrt_llm.layers
   python-api/tensorrt_llm.functional
   python-api/tensorrt_llm.models
   python-api/tensorrt_llm.plugin
   python-api/tensorrt_llm.quantization
   python-api/tensorrt_llm.runtime

.. toctree::
   :maxdepth: 2
   :caption: C++ API
   :hidden:

   _cpp_gen/runtime

.. toctree::
   :maxdepth: 2
   :caption: Blogs
   :hidden:

   blogs/H100vsA100.md
   blogs/H200launch.md
   blogs/Falcon180B-H200.md
   blogs/quantization-in-TRT-LLM.md