Version: Vitis 2025.2
- Porting of Channelizer to Versal AI Edge Series Gen 2 (AIE-ML v2) leveraging Vitis Libraries
The purpose of this tutorial is to demonstrate how to port an AIE-ML reference design to AIE-ML v2 with minimal code changes. You do this by leveraging Vitis Libraries, which contain templatized DSP IP that lets you target all AI Engine variants.
The tutorial targets the same System Requirements as the original tutorial, and highlights the changes required to build this design.
The following table summarizes the key architectural differences between the two platforms. These differences influence the porting decisions and design optimizations described throughout this tutorial.
| Aspect | Original Tutorial (AIE-ML) | This Tutorial (AIE-ML v2) |
|---|---|---|
| Device | VE2802 (Versal AI Edge) | VE3858 (Versal AI Edge Series Gen 2) |
| Evaluation Board | VEK280 | VEK385 |
| Part Number | xcve2802-vsvh1760-2MP-e-S | xc2ve3858-ssva2112-2MP-e-S |
| Base Platform | vek280_base | vek385_base_reva |
| AI Engine Architecture | AIE-ML | AIE-ML v2 |
| AIE Array Dimensions | 38 columns × 8 rows | 36 columns × 4 rows |
| Boot Framework | Traditional boot flow | EDF with Segmented Configuration |
Note: To reproduce any of the following steps, begin by cloning Vitis_Libraries and setting DSPLIB_ROOT to point to <cloned_repo_path>/dsp.
Porting the design from AIE-ML (VEK280) to AIE-ML v2 (VEK385) required the following changes. Despite the architectural differences, using Vitis Libraries minimized code changes, demonstrating the portability benefits of parameterized IP.
- Device Part Number: Changed `--part` in aie/channelizer/Makefile from `xcve2802-vsvh1760-2MP-e-S` (VEK280) to `xc2ve3858-ssva2112-2MP-e-S` (VEK385).
- Platform: Changed `platform` in the top-level Makefile from `vek280_base` to `vek385_base_reva`.
- Placement Constraints: Modified placement in aie/channelizer/channelizer_app.cpp to account for VEK385's reduced array height.
  - Original (VEK280): 38 columns × 8 rows
  - Ported (VEK385): 36 columns × 4 rows
  - This required rethinking tile placement while maintaining data flow connectivity.
- Embedded Development Framework (EDF): For Versal AI Edge Series Gen 2, AMD tools by default use Segmented Configuration and AMD Embedded Development Framework (EDF).
  - Segmented Configuration: Enables processor boot and DDR memory access before programmable logic (PL) configuration.
  - Primary Boot: OSPI firmware image. An example pre-built image, VEK385 EDF boot firmware Image (OSPI) Image, is available from the AMD Adaptive Computing Support Downloads page.
  - Secondary Boot: Linux built using Yocto, also found on the same page: amd-cortexa78-mali-common_edf-linux-disk-image (SD wic).
  - Runtime Loading: The `pl-aie.pdi` (built in this tutorial) is loaded using `fpgautil` along with the device tree.
  - Execution: Host program points to the `dut.xclbin` for runtime configuration.
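As a rough aid for the placement rework noted in the list above, the following minimal Python sketch compares the two array geometries from the comparison table. It is not part of the tutorial sources. The helper names `fits` and `min_columns`, the zero-based (column, row) indexing, and the sample tile coordinate are assumptions introduced only for illustration; the 48-tile figure is the AI Engine tile count reported for this design. Actual placement is expressed as constraints in aie/channelizer/channelizer_app.cpp.

```python
# Array dimensions (columns, rows) taken from the comparison table.
ARRAYS = {
    "VEK280 (AIE-ML)": (38, 8),
    "VEK385 (AIE-ML v2)": (36, 4),
}

def fits(col, row, dims):
    """Return True when tile (col, row) lies inside an array of size dims.

    Zero-based indexing is an assumption made for this sketch.
    """
    cols, rows = dims
    return 0 <= col < cols and 0 <= row < rows

def min_columns(num_tiles, rows):
    """Fewest columns needed to hold num_tiles if every row is usable."""
    return -(-num_tiles // rows)  # ceiling division

for name, (cols, rows) in ARRAYS.items():
    print(f"{name}: {cols * rows} tiles in total, "
          f"at least {min_columns(48, rows)} columns for 48 compute tiles")

# A hypothetical tile in row 6 is legal on the 8-row array only.
sample_tile = (10, 6)
for name, dims in ARRAYS.items():
    print(f"{name}: tile {sample_tile} fits = {fits(*sample_tile, dims)}")
```

Under these simplifying assumptions, which ignore routing and memory tile constraints, the same 48 tiles span at least 6 columns with 8 rows but at least 12 columns with 4 rows. This is why a placement written for the taller array cannot be reused unchanged.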
The following table shows the system requirements for the polyphase channelizer. The sampling rate is 2 GSPS. The design supports M=4096 channels with each channel supporting 2G / 4096 = 488.28125 KHz of bandwidth. The filterbank used by the channelizer uses K=36 taps per phase, leading to a total of 4096 x 36 = 147456 taps overall.
| Parameter | Value | Units |
|---|---|---|
| Sampling Rate (Fs) | 2 | GSPS |
| # of Channels (M) | 4096 | channels |
| Channel Bandwidth | 488.28125 | KHz |
| # of taps per phase (K) | 36 | n/a |
| Input datatype | cint16 | n/a |
| Output datatype | cint32 | n/a |
| Filterbank coefficient type | int32 | n/a |
| FFT twiddle type | cint16 | n/a |
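The derived values in the table follow directly from the sampling rate, the channel count, and the taps per phase. The short Python sketch below reproduces that arithmetic; the variable names are illustrative only and do not come from the design sources.

```python
# Values from the system requirements table.
fs_hz = 2e9            # sampling rate, 2 GSPS
num_channels = 4096    # M
taps_per_phase = 36    # K

# Each channel receives an equal share of the sampled bandwidth.
channel_bw_khz = fs_hz / num_channels / 1e3

# The filterbank has one K-tap phase per channel.
total_taps = num_channels * taps_per_phase

print(f"Channel bandwidth: {channel_bw_khz} kHz")
print(f"Total filterbank taps: {total_taps}")
```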
The System Partitioning analysis done in the original tutorial is still valid. The main difference is that AIE-ML v2 demonstrates improved compute capability compared to AIE-ML, as evidenced by the following measured throughput increases.
Compile and simulate the design to confirm it works as you expect.
[shell]% cd <path-to-design>/aie/tdm_fir
[shell]% make clean all
[shell]% vitis_analyzer aiesimulator_output/default.aierun_summary
Inspecting vitis_analyzer, observe a resource count of 32 tiles and a throughput of 4096 / 1.5 µs ≈ 2730 MSPS.
Performance Comparison with Original Design:
- Original (AIE-ML on VEK280): 32 tiles, 2230 MSPS throughput
- Ported (AIE-ML v2 on VEK385): 32 tiles, 2730 MSPS throughput (+22% improvement)
- Key Observation: AIE-ML v2 delivers superior performance with the same tile count, demonstrating architectural improvements in the second-generation AI Engine.
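The throughput and improvement figures above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the inputs are the frame size, frame period, and MSPS values quoted in this section.

```python
def throughput_msps(samples, duration_us):
    """Samples per microsecond, which is numerically equal to MSPS."""
    return samples / duration_us

def improvement_pct(new, old):
    """Relative gain of new over old, in percent."""
    return (new / old - 1.0) * 100.0

# TDM FIR on AIE-ML v2: one 4096-sample frame every 1.5 us.
tdm_fir_msps = throughput_msps(4096, 1.5)
print(f"TDM FIR throughput: {tdm_fir_msps:.1f} MSPS")

# Gain over the 2230 MSPS reported for the original AIE-ML design.
print(f"Improvement: {improvement_pct(2730, 2230):.1f}%")
```

The same `improvement_pct` helper applies to any pair of original and ported throughput values reported in this tutorial.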
Compile and simulate the design to confirm it works as you expect.
[shell]% cd <path-to-design>/aie/ifft4096_2d
[shell]% make clean all
[shell]% vitis_analyzer aiesimulator_output/default.aierun_summary
Inspecting vitis_analyzer, note the resource count of 16 AIE-ML v2 tiles.
Achieved throughput for:
- Front 64-point IFFT + point-wise twiddle multiplication = 2731 MSPS
- Back 64-point IFFT = 3436 MSPS
Performance Comparison with Original Design:
| Metric | Original (AIE-ML) | Ported (AIE-ML v2) | Improvement |
|---|---|---|---|
| Front 64-pt IFFT + Twiddle | 2386 MSPS | 2731 MSPS | +14% |
| Back 64-pt IFFT | 2376 MSPS | 3436 MSPS | +45% |
| AI Engine Tiles | 16 | 16 | Same |
| Memory Tiles | 6 | 7 | +1 |
Note: The memory tile count differs by one (seven versus six). This is due to the different design floorplan and the different number of available rows.
- TDM FIR uses 32 AI Engine tiles, leveraging newly introduced Vitis Library packet switching IP.
- The 4k-pt IFFT is implemented using 2D architecture (with Mode 1), with resources split between:
  - 16 AI Engine tiles (compute)
  - Seven memory tiles (front transpose and back transpose)
  - PL (middle transpose)
- From a bandwidth perspective, the design requires two input and four output streams.
- Custom HLS blocks (packet_sender and packet_receiver) interface with AI Engine packet switching IP.
- Output ports of the AI Engine going to the PL can arrive at different times, causing a minor throughput loss. You can compensate by adding FIFOs during the v++ linking step, as described in Specifying Streaming Connections.
The following figure summarizes the AI Engine and PL resources required to implement the design in the VE3858 device on the VEK385 eval board. The design uses 48 AI Engine tiles for compute and seven Memory Tiles for transpose operations. The PL design includes the resources required to implement the DMA Source, Packet Sender/Receiver, Memory Transpose, and DMA Sink kernels.
Compared to the original design on VEK280, this design achieves better performance on VEK385 with the same AI Engine tile count, demonstrating the efficiency gains of AIE-ML v2 architecture.
You can build the polyphase channelizer design from the command line.
IMPORTANT: Before beginning the tutorial, ensure you have completed the following:
- Installed AMD Vitis™ 2025.2 software and set `PLATFORM_REPO_PATHS` to the value `<Vitis_tools>/base_platforms`.
- Created the directory `<path-to-design>/yocto_artifacts` and set the environment variable YOCTO_ARTIFACTS to that path.
- From the Embedded Development Framework (EDF) downloads page, package 25.11:
  - Downloaded amd-cortexa78-mali-common_meta-edf-app-sdk, ran the script, and set the output path to `<path-to-design>/yocto_artifacts/amd-cortexa78-mali-common_meta-edf-app-sdk/sdk`.
  - Downloaded the VEK385 OSPI Image and moved it into `<path-to-design>/yocto_artifacts/`.
  - Downloaded amd-cortexa78-mali-common_edf-linux-disk-image (SD wic), unzipped it, and moved it into `<path-to-design>/yocto_artifacts/`.
  - Downloaded amd-cortexa78-mali-common_vek385_qemu_prebuilt, unzipped it, and moved `amd-cortexa78-mali-common_vek385_qemu_prebuilt` into `<path-to-design>/yocto_artifacts/`.
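Before starting a long build, it can help to confirm that the environment variables named in this tutorial are set. The following Python sketch is one possible pre-flight check and is not part of the tutorial sources. The helper name `missing_vars` is hypothetical, and the check only tests that each variable is non-empty; it does not validate the paths themselves.

```python
import os

# Environment variables mentioned in this tutorial.
REQUIRED_VARS = ("DSPLIB_ROOT", "PLATFORM_REPO_PATHS", "YOCTO_ARTIFACTS")

def missing_vars(env):
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_vars(os.environ)
if missing:
    print("Set these variables before running make:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```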
You can build the channelizer design for hardware emulation using the Makefile as follows:
[shell]% cd <path-to-design>
[shell]% make all TARGET=hw_emu
[shell]% make run_emu -C vitis TARGET=hw_emu
This takes about 90 minutes to run. The build process generates a `package` folder that contains all the files required for hardware emulation. Hardware emulation then launches and runs, producing the outputs shown below. You can add the optional -g flag to the launch_hw_emu.sh command to launch the Vivado waveform GUI and observe the top-level AXI signal ports in the design. Do this by editing the run_emu target in vitis/Makefile.
You can measure throughput by inspecting the traces. The design processes eight transforms, each with 4k samples, in 13.7 µs. Throughput = 8 x 4096 / 13.7 ≈ 2390 MSPS.
You can build the channelizer design for the VEK385 board using the Makefile as follows:
[shell]% cd <path-to-design>
[shell]% make all TARGET=hw
The build process generates all the design specific files needed to run the design on hardware in the package folder.
- Write the EDF boot firmware (OSPI) to the primary boot device following the instructions here. The OSPI image is at `<path-to-design>/yocto_artifacts/edf-ospi-versal-2ve-2vm-vek385-sdt-seg-20251116021631.bin`.
- Write `<path-to-design>/yocto_artifacts/edf-linux-disk-image-amd-cortexa78-mali-common.rootfs-20251116015456.wic` to the SD card using an SD imaging tool (Balena Etcher and Win32DiskImager seem to work well).
- Insert the SD card into the board, boot it, and log in. (The default username is amd-edf, and you are prompted to set a password.)
- Determine the IP address of eth0 on the board using `ip addr show eth0`.
- Copy the package files to the board: `cd <path-to-design>/package; scp * amd-edf@<ip_address>:~/`
- Run the design: `sudo ./embedded_exec.sh`
The following displays on the terminal.
| Platform | Throughput | Improvement | Margin vs. 2000 MSPS Target | Notes |
|---|---|---|---|---|
| VEK280 (AIE-ML) | ~2250 MSPS | Baseline | 12.5% | Original tutorial |
| VEK385 (AIE-ML v2) | ~2390 MSPS | +6.2% | 19.5% | This tutorial - bandwidth-bound by I/O ports |
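The margin column and the end-to-end throughput can be reproduced with the short Python sketch below. It assumes the margin is defined as the relative excess over the 2000 MSPS target, which matches the tabulated values; the function and variable names are illustrative only.

```python
TARGET_MSPS = 2000.0  # required sampling rate (2 GSPS)

def margin_pct(measured_msps, target_msps=TARGET_MSPS):
    """Relative excess of measured throughput over the target, in percent."""
    return (measured_msps - target_msps) / target_msps * 100.0

# End-to-end measurement: eight 4096-sample transforms in 13.7 us.
measured_msps = 8 * 4096 / 13.7
print(f"Measured throughput: {measured_msps:.0f} MSPS")

# Rounded throughput values as reported in the table.
for name, msps in (("VEK280 (AIE-ML)", 2250.0), ("VEK385 (AIE-ML v2)", 2390.0)):
    print(f"{name}: margin {margin_pct(msps):.1f}%")
```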
The following figures give a closer look at how placement differs between the original VEK280 (AIE-ML) tutorial and the VEK385 (AIE-ML v2) design in this tutorial.
VEK280 (AIE-ML) Original Tutorial
VEK385 (AIE-ML v2)
- Vitis Libraries Abstraction: Parameterized DSP IP from Vitis Libraries allowed re-targeting a different AI Engine variant with minimal code changes
- Performance Gains: AIE-ML v2's architectural enhancements increased throughput margin against the target requirement
- Similar AI Engine Tile Count and PL Resources: Achieved target performance with identical AI Engine tile utilization (48 tiles) and similar PL resources
- Array Geometry: Required rethinking placement constraints due to VEK385 having half the rows (4 vs 8). Adapted tile placement for reduced array height while preserving connectivity and performance
- Boot Framework: Usage of EDF/Segmented Configuration adds a learning curve compared to traditional boot flow
GitHub issues are used for tracking requests and bugs. For questions, go to Support.
Copyright © 2023-2026 Advanced Micro Devices, Inc.










