
Releases: aws-neuron/aws-neuron-sdk

Neuron SDK Release - February 25, 2026


AWS Neuron SDK Release Notes - v2.28.0

Release Date: February 25, 2026


Today we are releasing AWS Neuron SDK 2.28.0. This release enhances Neuron Explorer with system profiling, Tensor Viewer, and Database Viewer for comprehensive performance analysis. NxD Inference adds support for Qwen2/Qwen3 VL vision language models, Flux.1 inpainting capabilities, and Eagle3 speculative decoding. The NKI Library expands with 9 new kernels including RoPE, MoE operations, and experimental kernels for attention and cross entropy. NKI (Beta 2) introduces LNC multi-core support with intra-LNC collectives and new APIs. Kubernetes users gain Neuron DRA Driver support for advanced resource allocation.


Developer Tools and Profiling

Neuron Explorer Enhancements — Added system profiling support with drill-down navigation to device profiles. New Tensor Viewer helps identify memory bottlenecks by displaying tensor names, shapes, sizes, and memory usage. Database Viewer provides an interactive interface for querying profiling data using SQL or natural language. Profile Manager now supports tag-based organization and search. A migration guide from Neuron Profiler/Profiler 2.0 is now available.

nccom-test Improvements — Enhanced data integrity checks use pseudo-random data patterns for better corruption detection. Added support for alltoallv collective operation for benchmarking variable-sized all-to-all communication patterns.
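The variable-sized exchange that alltoallv benchmarks can be summarized in plain Python. This is an illustrative model of the data movement only, not the nccom-test or Neuron Collectives API:

```python
# Illustrative model of the alltoallv pattern: each rank sends a
# differently sized chunk to every other rank and receives one chunk
# from each. Names and structure here are hypothetical.

def alltoallv(send_bufs):
    """send_bufs[i][j] is the chunk rank i sends to rank j.
    Returns recv_bufs, where recv_bufs[j][i] is what rank j received
    from rank i."""
    n = len(send_bufs)
    return [[send_bufs[i][j] for i in range(n)] for j in range(n)]

# Two ranks with uneven chunk sizes:
send = [[[0], [1, 1]],        # rank 0's chunks for ranks 0 and 1
        [[2, 2, 2], [3]]]     # rank 1's chunks for ranks 0 and 1
recv = alltoallv(send)
```

Unlike a plain all-to-all, the per-destination chunk sizes need not match, which is exactly the corruption-prone case the new data-integrity checks target.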


Inference Updates

NxD Inference 0.8.16251 — Added support for vision language models including Qwen2 VL (Qwen2-VL-7B-Instruct) and Qwen3 VL (Qwen3-VL-8B-Thinking) for processing text and image inputs (Beta). Pixtral model support improved with batch size 32 and sequence length 10240 on Trn2 with vLLM V1. The Flux.1 model gains new functionality for inpainting, outpainting, Canny edge detection, and depth-based image generation (Beta).

vLLM Neuron Plugin 0.4.0 — Multi-LoRA serving enhancements enable streaming LoRA adapters via vLLM's load_adapter API with dynamic runtime loading. Users can now run the base model alone when multi-LoRA serving is enabled. Added Eagle3 speculative decoding support for Llama 3.1 8B. Updated to support vLLM v0.13.0 and PyTorch 2.9.
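For context, upstream vLLM's OpenAI-compatible server supports dynamic adapter loading over HTTP. A hedged sketch, assuming the standard /v1/load_lora_adapter endpoint with placeholder adapter name and path; runtime LoRA updating must be enabled on the server, and the plugin's load_adapter API may differ in detail:

```shell
# Hypothetical request: register a LoRA adapter with a running
# vLLM server without restarting it. Name and path are placeholders.
curl -X POST http://localhost:8000/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "my-adapter", "lora_path": "/path/to/adapter"}'
```

Subsequent completion requests can then select the adapter by model name, or omit it to run the base model alone.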


NKI Library

9 New Kernels — The NKI Library expands from 7 to 16 documented kernel APIs. New core kernels include:

  • RoPE — Rotary Position Embedding
  • Router Top-K — Expert selection for MoE
  • MoE CTE — Context Encoding
  • MoE TKG — Token Generation
  • Cumsum — Cumulative sum
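As a reference point for the RoPE kernel above, rotary position embedding rotates pairs of vector elements by position-dependent angles. A minimal plain-Python sketch; the adjacent-pair convention and names are illustrative, and production kernels typically operate on batched half-dimension blocks:

```python
import math

def rope(x, pos, base=10000.0):
    """Reference rotary position embedding for one token vector.
    Adjacent pairs (x[2i], x[2i+1]) are rotated by pos * base**(-2i/d).
    Illustrative only; not the NKI Library kernel signature."""
    d = len(x)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Because each pair undergoes a pure rotation, the embedding preserves vector norms, and relative positions fall out of dot products between rotated queries and keys.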

New experimental kernels include:

  • Attention Block TKG — Fused attention for token generation
  • Cross Entropy — Forward and backward passes
  • Depthwise Conv1D
  • Blockwise MM Backward — For MoE training
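The Cross Entropy kernel's forward and backward passes follow standard softmax cross-entropy math. A plain-Python reference for a single example, illustrative only and not the NKI kernel interface:

```python
import math

def cross_entropy(logits, target):
    """Forward: softmax cross-entropy loss for one example.
    Backward: gradient w.r.t. logits is softmax(logits) - one_hot(target)."""
    m = max(logits)                          # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[target])
    grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return loss, grad
```

The gradient components always sum to zero, a useful sanity check when validating a fused forward/backward kernel against this reference.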

Enhanced Quantization Support — Existing kernels receive FP8 and MX quantization support across QKV, MLP, and Output Projection kernels. QKV kernel adds fused FP8 KV cache quantization and block-based KV cache layout. MLP kernel adds gate/up projection clamping and fp16 support for TKG mode. Attention CTE kernel adds strided Q slicing for context parallelism.

Improved Utilities — TensorView gains rearrange method for dimension reordering and has_dynamic_access for runtime-dependent addressing checks. SbufManager provides hierarchical tree-formatted allocation logging with new query methods for SBUF utilization. New utilities include rmsnorm_mx_quantize_tkg, interleave_copy, LncSubscriptable, and TreeLogger.


Neuron Kernel Interface (NKI)

NKI Beta 2 (0.2.0) — This release includes LNC multi-core support for LNC=2, enabling kernels to leverage multiple NeuronCores within a logical NeuronCore. The compiler now tracks shared_hbm tensors and canonicalizes LNC kernel outputs. Users can declare tensors private to a single NeuronCore using private_hbm memory type.

New nki.collectives Module — Enables collective communication across multiple NeuronCores with operations including:

  • all_reduce
  • all_gather
  • reduce_scatter
  • all_to_all
  • collective_permute variants
  • rank_id
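The semantics of the first three operations can be modeled in plain Python with one list per rank. Function names mirror the list above, but the signatures are illustrative and not the nki.collectives API:

```python
# Toy model of collective semantics across ranks (logical NeuronCores).
# Each entry of per_rank is one rank's local data.

def all_reduce(per_rank):
    """Every rank ends up with the elementwise sum across all ranks."""
    summed = [sum(vals) for vals in zip(*per_rank)]
    return [summed[:] for _ in per_rank]

def all_gather(per_rank):
    """Every rank ends up with the concatenation of all ranks' shards."""
    gathered = [x for shard in per_rank for x in shard]
    return [gathered[:] for _ in per_rank]

def reduce_scatter(per_rank):
    """Ranks compute the elementwise sum, each keeping only its shard."""
    n = len(per_rank)
    summed = [sum(vals) for vals in zip(*per_rank)]
    chunk = len(summed) // n
    return [summed[r * chunk:(r + 1) * chunk] for r in range(n)]
```

Intra-LNC collectives apply these same semantics across the NeuronCores inside one logical NeuronCore, rather than across devices.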

New APIs and Features — New nki.isa APIs include nonzero_with_count for sparse computation and exponential for element-wise operations. New float8_e4m3fn dtype supports FP8 workloads. Language features include no_reorder blocks for instruction ordering control, __call__ special method support, tensor.view method for reshaping, and shared constants as string arguments.

API Improvements — dma_transpose now supports indirect addressing, dma_copy adds the unique_indices parameter, and register_alloc accepts optional tensor arguments for pre-filling. The compiler no longer truncates diagnostic output.


Kubernetes Support

Neuron DRA Driver — Introduced Neuron Dynamic Resource Allocation (DRA) Driver enabling advanced resource allocation using the Kubernetes DRA API for flexible and efficient Neuron device management. The DRA API provides topology-aware scheduling, atomic resource allocation, and per-workload configuration. Neuron Helm Charts now include DRA Driver support.


PyTorch Framework (torch-neuronx)

Transition to Native PyTorch Support — Starting with PyTorch 2.10 support (planned for a future Neuron release), AWS Neuron will transition from PyTorch/XLA to native PyTorch support via TorchNeuron. PyTorch 2.9 is the last version using PyTorch/XLA. Users will need to update their scripts when upgrading to PyTorch 2.10 or later.


For the full component-level release notes, see the Neuron 2.28.0 Component Release Notes.

Neuron SDK Release - January 14, 2026


AWS Neuron SDK Release Notes - v2.27.1

Release Date: January 14, 2026

Release 2.27.1 of the AWS Neuron SDK includes bug fixes applied to the AWS Neuron SDK v2.27.0. See the Neuron SDK v2.27.0 release notes for the full set of changes that shipped with the 2.27.0 release.

What's Changed?

Neuron DLAMIs

  • Support for NKI has been added to all DLAMI virtual environments.

Bug Fixes

NxD Inference

  • Fixed stability issue affecting Llama 4 that may occur when changing model configuration.
  • Removed a debug print statement from the Qwen3-MoE model implementation.

For information about known issues in Neuron DLCs, see the Neuron DLC component release notes.

Neuron SDK Release - December 19, 2025


AWS Neuron SDK 2.27.0 Release Notes

This release adds support for Trainium3 (Trn3) instances. The new NKI Compiler introduces the nki.* namespace with updated APIs and language constructs. The NKI Library provides pre-optimized kernels for common model operations including attention, MLP, and normalization. Neuron Explorer delivers a unified profiling suite with AI-driven optimization recommendations. vLLM V1 integration is now available through the vLLM-Neuron Plugin. Deep Learning Containers and AMIs are updated with vLLM V1, PyTorch 2.9, JAX 0.7, Ubuntu 24.04, and Python 3.12.

In addition to this release, we are introducing new capabilities and features in private beta access (see Private Beta Access section). We are also announcing our transition to PyTorch native support starting with PyTorch 2.10 in Neuron 2.28, plans to simplify NxDI in upcoming releases, and other important updates. See the End of Support and Migration Notices section for more details.

Neuron Kernel Interface (NKI)

NKI Compiler - The new nki.* namespace replaces the legacy neuronxcc.nki.* namespace. Top-level kernel functions now require the @nki.jit annotation. Neuron 2.27 supports both namespaces side by side; the legacy namespace will be removed in Neuron 2.28. A kernel migration guide is available in the documentation.

NKI Library

The NKI Library provides pre-optimized kernels: Attention CTE, Attention TKG, MLP, Output Projection CTE, Output Projection TKG, QKV, and RMSNorm-Quant. Kernels are accessible via the nkilib.* namespace in neuronx-cc or from the GitHub repository.

Developer Tools

Neuron Explorer - A suite of tools designed to support ML engineers throughout their development journey on AWS Trainium. This release features improved performance and user experience for device profiling, with four core viewers to provide insights into model performance:

  • Hierarchy Viewer: Visualizes model structure and component interactions
  • AI Recommendation Viewer: Delivers AI-driven optimization recommendations
  • Source Code Viewer: Links profiling data directly to source code
  • Summary Viewer: Displays high-level performance metrics

Neuron Explorer is available through UI, CLI, and VSCode IDE integration. Existing NTFF files are compatible but require reprocessing for new features.

New tutorials cover profiling NKI kernels, multi-node training jobs, and vLLM inference workloads. The nccom-test tool now includes fine-grained collective communication support.

Inference Updates

vLLM V1 - The vLLM-Neuron Plugin enables vLLM V1 integration for inference workloads. vLLM V0 support ends in Neuron 2.28.

NxD Inference - Model support expands with beta releases of Qwen3 MoE (Qwen3-235B-A22B) for multilingual text and Pixtral (Pixtral-Large-Instruct-2411) for image understanding. Both models use HuggingFace checkpoints and are supported on Trn2 and Trn3 instances.

Neuron Graph Compiler

Default accuracy settings are now optimized for precision. The --auto-cast flag defaults to none (previously matmul), and --enable-mixed-precision-accumulation is enabled by default. FP32 models may see performance impacts; restore the previous behavior with --auto-cast=matmul and --disable-mixed-precision-accumulation. Python 3.10 or higher is now required.
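For example, a compilation command restoring the pre-2.27 defaults might look like the following. The input file, output path, and target are placeholders, and the exact neuronx-cc invocation depends on your framework and workflow:

```shell
# Hypothetical invocation: restore pre-2.27 accuracy defaults.
# model.hlo, model.neff, and the target are placeholders.
neuronx-cc compile model.hlo \
    --framework XLA \
    --target trn2 \
    --auto-cast=matmul \
    --disable-mixed-precision-accumulation \
    --output model.neff
```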

Runtime Improvements

Neuron Runtime Library 2.29 adds support for Trainium3 (Trn3) instances and delivers performance improvements for Collectives Engine overhead, NeuronCore branch overhead, NEFF program startup, and all-gather latency.

Deep Learning AMIs and Containers

Platform Updates - All DLCs are updated to Ubuntu 24.04 and Python 3.12. DLAMIs add Ubuntu 24.04 support for base, single framework, and multi-framework configurations.

Framework Updates:

  • vLLM V1 single framework DLAMI and multi-framework virtual environments
  • PyTorch 2.9 single framework DLAMIs and multi-framework virtual environments (Amazon Linux 2023, Ubuntu 22.04, Ubuntu 24.04)
  • JAX 0.7 single framework DLAMI and multi-framework virtual environments

New Container - The pytorch-inference-vllm-neuronx 0.11.0 DLC provides a complete vLLM inference environment with PyTorch 2.8 and all dependencies.


Read What's New: Neuron 2.27.0 and the Neuron 2.27.0 component release notes for details on specific Neuron component improvements.

Neuron SDK Release - October 29, 2025


Overview
Release 2.26.1 of the AWS Neuron SDK includes bug fixes applied to the AWS Neuron SDK v2.26.0. See the Neuron SDK v2.26.0 release notes for the full set of changes that shipped with the 2.26.0 release.

Bug fixes in this release
Fix: To address an issue with out-of-memory errors in torch-neuronx, this release enables you to use the Neuron Runtime API to apply direct memory allocation.

Resources
For the set of SDK package version changes in 2.26.1, see Release Content.

Neuron SDK Release - September 18, 2025


AWS Neuron SDK 2.26.0 adds support for PyTorch 2.8 and JAX 0.6.2, along with Python 3.11, and introduces inference improvements on Trainium2 (Trn2). This release includes expanded model support, enhanced parallelism features, new Neuron Kernel Interface (NKI) APIs, and improved development tools for optimization and profiling.

Inference Updates
NxD Inference - Model support expands with beta releases of Llama 4 Scout and Maverick variants on Trn2. The FLUX.1-dev image generation models are now available in beta on Trn2 instances.

Expert parallelism is now supported in beta, enabling MoE expert distribution across multiple NeuronCores. This release introduces on-device forward pipeline execution in beta and adds sequence parallelism in MoE routers for model deployment flexibility.

Neuron Kernel Interface (NKI)
New APIs enable additional optimization capabilities:

  • gelu_apprx_sigmoid: GELU activation with sigmoid approximation
  • select_reduce: Selective element copying with maximum reduction
  • sequence_bounds: Sequence bounds computation

API enhancements include:

  • tile_size: Added total_available_sbuf_size field
  • dma_transpose: Added axes parameter for 4D transpose
  • activation: Added gelu_apprx_sigmoid operation

Developer Tools
Neuron Profiler improvements include the ability to select multiple semaphores at once to correlate pending activity with semaphore waits and increments. Additionally, system profile grouping now uses a global NeuronCore ID instead of a process local ID for visibility across distributed workloads. The Profiler also adds warnings for dropped events due to limited buffer space.

The nccom-test utility adds State Buffer support on Trn2 for collective operations, including all-reduce, all-gather, and reduce-scatter. Error reporting now provides messages for invalid all-to-all collective sizes to help developers identify and resolve issues.

Deep Learning AMI and Containers
The Deep Learning AMI now supports PyTorch 2.8 on Amazon Linux 2023 and Ubuntu 22.04. Container updates include PyTorch 2.8.0 and Python 3.11 across all DLCs. The transformers-neuronx environment and package have been removed from PyTorch inference DLAMI/DLC.

Component release highlights
These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.

Neuron SDK Release - July 31, 2025


Neuron 2.25.0 delivers updates across several key areas: inference performance optimizations, expanded model support, enhanced profiling capabilities, improved monitoring and observability tools, framework updates, and refreshed development environments and container offerings. The release includes bug fixes across the SDK components, along with updated tutorials and documentation for new features and model deployments.

Inference Optimizations (NxD Core and NxDI)
Neuron 2.25.0 introduces performance optimizations and new capabilities including:

  • On-device Forward Pipeline, reducing latency by up to 43% in models like Pixtral
  • Context and Data Parallel support for improved batch scaling
  • Chunked Attention for efficient long sequence processing
  • 128k context length support for Llama 70B models
  • Automatic Aliasing (Beta) for faster tensor operations
  • Disaggregated Serving (Beta) showing 20% improvement in ITL/TTST
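Chunked Attention in the list above processes long sequences block by block while keeping softmax normalization exact, using a running maximum and denominator. A toy plain-Python illustration of that rescaling trick; no Neuron APIs are involved, and real kernels apply it to attention scores rather than a bare vector:

```python
import math

def chunked_softmax(xs, chunk=2):
    """Softmax computed chunk by chunk with a running max and running
    denominator, the same rescaling idea chunked/online attention uses
    so long sequences never need the full score row at once."""
    run_max, denom, exps = float("-inf"), 0.0, []
    for start in range(0, len(xs), chunk):
        block = xs[start:start + chunk]
        new_max = max(run_max, max(block))
        scale = math.exp(run_max - new_max)   # rescale previous partial state
        denom = denom * scale + sum(math.exp(x - new_max) for x in block)
        exps = [e * scale for e in exps] + [math.exp(x - new_max) for x in block]
        run_max = new_max
    return [e / denom for e in exps]
```

The output matches a one-shot softmax to floating-point precision, which is why chunking changes memory traffic but not model results.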

Model Support (NxDI)
Neuron 2.25.0 expands model support to include:

  • Qwen3 dense models (0.6B to 32B parameters)
  • Flux.1-dev model for text-to-image generation (Beta)
  • Pixtral-Large-Instruct-2411 for image-to-text generation (Beta)

Profiling Updates
Enhancements to profiling capabilities include:

  • Addition of timestamp sync points to align device execution with CPU events
  • Expanded JSON output providing the same detailed data set used by the Neuron Profiler UI
  • New total active time metric showing accelerator utilization as percentage of total runtime
  • Fixed DMA active time calculation for more accurate measurements

Monitoring and Observability

  • neuron-ls now displays CPU and NUMA node affinity information
  • neuron-ls adds NeuronCore IDs display for each Neuron Device
  • neuron-monitor improves accuracy of device utilization metrics

Framework Updates

  • JAX 0.6.1 support added, maintaining compatibility with versions 0.4.31-0.4.38 and 0.5
  • vLLM support upgraded to version 0.9.x V0

Development Environment Updates
Neuron SDK updated to version 2.25.0 in:

  • Deep Learning AMIs on Ubuntu 22.04 and Amazon Linux 2023
  • Multi-framework DLAMI with environments for both PyTorch and JAX
  • PyTorch 2.7 Single Framework DLAMI
  • JAX 0.6 Single Framework DLAMI

Container Support
Neuron SDK updated to version 2.25.0 in:

  • PyTorch 2.7 Training and Inference DLCs
  • JAX 0.6 Training DLC
  • vLLM 0.9.1 Inference DLC
  • Neuron Device Plugin and Scheduler container images for Kubernetes integration

Neuron SDK Release - June 24, 2025


Neuron version 2.24 introduces new inference capabilities including prefix caching, disaggregated inference (Beta), and context parallelization support (Beta). This release also includes NKI language enhancements and enhanced profiling visualizations for improved debugging and performance analysis. Neuron 2.24 adds support for PyTorch 2.7 and JAX 0.6, updates existing DLAMIs and DLCs, and introduces a new vLLM inference container.

Neuron SDK Release - May 20, 2025


With the Neuron 2.23 release, we move the NxD Inference (NxDI) library out of beta. It is now recommended for all multi-chip inference use-cases. In addition, Neuron has new training capabilities, including Context Parallelism and ORPO, NKI improvements (new operators and ISA features), and new Neuron Profiler debugging and performance analysis optimizations. Finally, Neuron now supports PyTorch 2.6 and JAX 0.5.3.

Inference: NxD Inference (NxDI) moves from beta to GA. NxDI now supports Persistent Cache to reduce compilation times, and optimizes model loading with improved weight sharding performance.

Training: NxD Training (NxDT) added Context Parallelism support (beta) for Llama models, enabling sequence lengths up to 32K. NxDT now supports model alignment via ORPO using DPO-style datasets. NxDT has upgraded support for third-party libraries: PyTorch Lightning 2.5, Transformers 4.48, and NeMo 2.1.

Neuron Kernel Interface (NKI): New support for 32-bit integer nki.language.add and nki.language.multiply on the GPSIMD engine. nki.isa improvements include range_select for Trainium2, fine-grained engine control, and enhanced tensor operations. A new performance tuning API, no_reorder, enables user scheduling of instructions; when combined with allocation, this enables software pipelining. Language consistency has been improved for arithmetic operators (+=, -=, /=, *=) across loop types, PSUM, and SBUF.

Neuron Profiler: Profiling performance has improved, allowing users to view profile results 5x faster on average. New features include timeline-based error tracking and JSON error event reporting, supporting execution and OOB error detection. Additionally, this release improves multiprocess visualization with Perfetto.

Neuron Monitoring: Added Kubernetes context information (pod_name, namespace, and container_name) to the neuron-monitor Prometheus output, enabling resource utilization tracking by pod, namespace, and container.

Neuron DLCs: This release updates containers with PyTorch 2.6 support for inference and training. For JAX DLC, this release adds JAX 0.5.0 training support.

Neuron DLAMIs: This release updates Multi Framework AMIs to include PyTorch 2.6, JAX 0.5, and TensorFlow 2.10, and updates Single Framework AMIs for PyTorch 2.6 and JAX 0.5.

Neuron SDK Release - May 12, 2025


Neuron 2.22.1 release includes a Neuron Driver update that resolves DMA abort errors on Trainium2 devices. These errors were previously occurring in the Neuron Runtime during specific workload executions.

Neuron SDK Release - April 3, 2025


The Neuron 2.22 release includes performance optimizations, enhancements and new capabilities across the Neuron software stack.

For inference workloads, the NxD Inference library now supports the Llama-3.2-11B model and multi-LoRA serving, allowing customers to load and serve multiple LoRA adapters. Flexible quantization features have been added, enabling users to specify which model layers or NxDI modules to quantize. Asynchronous inference mode has also been introduced, improving performance by overlapping input preparation with model execution.

For training, we added LoRA supervised fine-tuning to NxD Training to enable additional model customization and adaptation.

Neuron Kernel Interface (NKI): This release adds new APIs in nki.isa, nki.language, and nki.profile. These enhancements provide customers with greater flexibility and control.

The updated Neuron Runtime includes optimizations for reduced latency and improved device memory footprint. On the tooling side, the Neuron Profiler 2.0 (beta) has added UI enhancements and new event type support.

Neuron DLCs: This release reduces DLC image size by up to 50% and enables faster build times with an updated Dockerfile structure. On the Neuron DLAMI side, new PyTorch 2.5 single framework DLAMIs have been added for Ubuntu 22.04 and Amazon Linux 2023, along with several new virtual environments within the Neuron Multi Framework DLAMIs.