LLMs

The Ryzen AI Software includes support for deploying quantized LLMs on the NPU using an eager execution mode, simplifying the model ingestion process. Instead of compiling and executing as a complete graph, the model is processed on an operator-by-operator basis. Compute-intensive operations, such as GEMM/MATMUL, are dynamically offloaded to the NPU, while the remaining operators are executed on the CPU. Eager mode for LLMs is supported in both PyTorch and the ONNX Runtime.
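The operator-by-operator dispatch described above can be illustrated with a toy sketch. It is not the Ryzen AI implementation: the `OFFLOADED_OPS` set, the `KERNELS` table, and the `run_eager` function are hypothetical names, and the "NPU" path is simulated in plain Python purely to show the routing decision.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]

# Assumption for illustration: only matmul counts as compute-intensive here.
OFFLOADED_OPS = {"matmul"}


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product standing in for an offloaded GEMM kernel."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(a: Matrix) -> Matrix:
    """Element-wise ReLU, representing a lightweight operator kept on the CPU."""
    return [[max(0.0, x) for x in row] for row in a]


# Hypothetical kernel table: operator name -> implementation.
KERNELS: Dict[str, Callable[..., Matrix]] = {"matmul": matmul, "relu": relu}


def run_eager(
    ops: Sequence[Tuple[str, tuple]],
    x: Matrix,
    log: List[Tuple[str, str]],
) -> Matrix:
    """Execute operators one at a time, recording where each one would run."""
    for name, extra_args in ops:
        # The routing decision is made per operator, not for the whole graph.
        device = "npu" if name in OFFLOADED_OPS else "cpu"
        log.append((name, device))
        x = KERNELS[name](x, *extra_args)
    return x


if __name__ == "__main__":
    trace: List[Tuple[str, str]] = []
    identity = [[1.0, 0.0], [0.0, 1.0]]
    result = run_eager([("matmul", (identity,)), ("relu", ())], [[1.0, -2.0]], trace)
    print(result)
    print(trace)
```

Because no graph is compiled ahead of time, each operator is routed individually as it is encountered, which is what makes model ingestion simpler in this mode.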

A general-purpose flow can be found here: LLMs on RyzenAI with PyTorch

  • Applicability: prototyping and early development with a broad set of LLMs
  • Performance: functional support only, not to be used for benchmarking
  • Supported platforms: PHX, HPT, STX (and onwards)
  • Supported frameworks: PyTorch
  • Supported models: Many

A set of performance-optimized models is available upon request on the AMD secure download site: Optimized LLMs on RyzenAI

  • Applicability: benchmarking and deployment of specific LLMs
  • Performance: highly optimized
  • Supported platforms: STX (and onwards)
  • Supported frameworks: PyTorch and ONNX Runtime
  • Supported models: Llama2, Llama3, Qwen1.5
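As a quick way to compare the two flows, the sketch below encodes the platform and framework constraints listed above and filters them for a given target. The `FLOWS` table and the `candidate_flows` function are hypothetical helpers, not part of the Ryzen AI Software, and the "(and onwards)" qualifier is not modeled: only the platforms named explicitly are included.

```python
from typing import Dict, List

# Constraints copied from the two flow descriptions above.
# Assumption: platforms newer than STX are omitted because they are not named.
FLOWS: List[Dict] = [
    {
        "name": "general-purpose",
        "platforms": ("PHX", "HPT", "STX"),
        "frameworks": ("PyTorch",),
        "benchmarking": False,  # functional support only
    },
    {
        "name": "performance-optimized",
        "platforms": ("STX",),
        "frameworks": ("PyTorch", "ONNX Runtime"),
        "benchmarking": True,  # highly optimized
    },
]


def candidate_flows(platform: str, framework: str, need_benchmarking: bool) -> List[str]:
    """Return the names of flows whose listed constraints match the target."""
    matches = []
    for flow in FLOWS:
        if platform not in flow["platforms"]:
            continue
        if framework not in flow["frameworks"]:
            continue
        # A flow marked as not suitable for benchmarking is skipped when
        # benchmarking is required; otherwise either flow qualifies.
        if need_benchmarking and not flow["benchmarking"]:
            continue
        matches.append(flow["name"])
    return matches


if __name__ == "__main__":
    print(candidate_flows("PHX", "PyTorch", need_benchmarking=False))
    print(candidate_flows("STX", "ONNX Runtime", need_benchmarking=True))
```

The supported-model lists are deliberately left out of the table, since the general-purpose flow does not enumerate its models.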

This is an early-access flow and is expected to be upgraded in an upcoming release.

Run LLMs