AI Engine Development

See Vitis™ Development Environment on amd.com
See Vitis™ AI Development Environment on amd.com

Porting of Channelizer to Versal AI Edge Series Gen 2 (AIE-ML v2) leveraging Vitis Libraries

Version: Vitis 2025.2


Introduction

The purpose of this tutorial is to demonstrate how to port an AIE-ML reference design to AIE-ML v2 with minimal code changes. You do this by leveraging Vitis Libraries which contain templatized DSP IPs that allow you to target all AI Engine variants.

The tutorial targets the same System Requirements as the original tutorial, and highlights the changes required to build this design.

The following table summarizes the key architectural differences between the two platforms. These differences influence the porting decisions and design optimizations described throughout this tutorial.

Platform Comparison

| Aspect | Original Tutorial (AIE-ML) | This Tutorial (AIE-ML v2) |
|--------|----------------------------|---------------------------|
| Device | VE2802 (Versal AI Edge) | VE3858 (Versal AI Edge Series Gen 2) |
| Evaluation Board | VEK280 | VEK385 |
| Part Number | xcve2802-vsvh1760-2MP-e-S | xc2ve3858-ssva2112-2MP-e-S |
| Base Platform | vek280_base | vek385_base_reva |
| AI Engine Architecture | AIE-ML | AIE-ML v2 |
| AIE Array Dimensions | 38 columns × 8 rows | 36 columns × 4 rows |
| Boot Framework | Traditional boot flow | EDF with Segmented Configuration |

Note: To reproduce any of the following steps, begin by cloning Vitis_Libraries and setting the DSPLIB_ROOT path to point to the <cloned_repo_path>/dsp.
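The note above can be sketched as follows, assuming you clone into the current working directory (the public Vitis_Libraries repository URL is shown; adjust the path if you clone elsewhere):

```shell
# Clone Vitis Libraries and point DSPLIB_ROOT at its dsp folder.
git clone https://github.com/Xilinx/Vitis_Libraries.git
export DSPLIB_ROOT="$(pwd)/Vitis_Libraries/dsp"
```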

Key Design Changes

Porting the design from AIE-ML (VEK280) to AIE-ML v2 (VEK385) required the following changes. Despite the architectural differences, using Vitis Libraries minimized code changes, demonstrating the portability benefits of parameterized IP.

Hardware Configuration Changes

  • Device Part Number: Changed --part in aie/channelizer/Makefile from xcve2802-vsvh1760-2MP-e-S (VEK280) to xc2ve3858-ssva2112-2MP-e-S (VEK385).
  • Platform: Changed platform in the top-level Makefile from vek280_base to vek385_base_reva.

AI Engine Array Changes

  • Placement Constraints: Modified placement in aie/channelizer/channelizer_app.cpp to account for VEK385's reduced array height.
    • Original (VEK280): 38 columns × 8 rows
    • Ported (VEK385): 36 columns × 4 rows
    • This required rethinking tile placement while maintaining data flow connectivity.

Boot and Software Framework Changes

  • Embedded Development Framework (EDF): For Versal AI Edge Series Gen 2, AMD tools by default use Segmented Configuration and AMD Embedded Development Framework (EDF).
    • Segmented Configuration: Enables processor boot and DDR memory access before programmable logic (PL) configuration
    • Primary Boot: OSPI firmware image; an example pre-built VEK385 EDF boot firmware (OSPI) image is available from the AMD Adaptive Computing Support Downloads page
    • Secondary Boot: Linux built using Yocto, also found on the same page: amd-cortexa78-mali-common_edf-linux-disk-image (SD wic)
    • Runtime Loading: The pl-aie.pdi (built in this tutorial) is loaded using fpgautil along with the device tree
    • Execution: Host program points to the dut.xclbin for runtime configuration
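The runtime-loading step above can be sketched as below; on AMD embedded Linux, fpgautil's -b/-o options load a PDI together with a device tree overlay. The .dtbo file name here is an assumption for illustration, not a file this tutorial generates under that name:

```shell
# On the VEK385 Linux target: load the PL/AIE image with its device
# tree overlay (overlay file name is illustrative).
sudo fpgautil -b pl-aie.pdi -o pl-aie.dtbo
```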

Channelizer Requirements

The following table shows the system requirements for the polyphase channelizer. The sampling rate is 2 GSPS. The design supports M=4096 channels with each channel supporting 2G / 4096 = 488.28125 KHz of bandwidth. The filterbank used by the channelizer uses K=36 taps per phase, leading to a total of 4096 x 36 = 147456 taps overall.

| Parameter | Value | Units |
|-----------|-------|-------|
| Sampling Rate (Fs) | 2 | GSPS |
| # of Channels (M) | 4096 | channels |
| Channel Bandwidth | 488.28125 | kHz |
| # of taps per phase (K) | 36 | n/a |
| Input datatype | cint16 | n/a |
| Output datatype | cint32 | n/a |
| Filterbank coefficient type | int32 | n/a |
| FFT twiddle type | cint16 | n/a |
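The derived values above can be sanity-checked with plain arithmetic (nothing beyond awk is assumed):

```shell
# Channel bandwidth: Fs / M = 2 GSPS / 4096 channels, printed in kHz.
awk 'BEGIN { printf "Channel BW = %.5f kHz\n", 2.0e9 / 4096 / 1e3 }'
# Total filterbank taps: M * K.
awk 'BEGIN { printf "Total taps = %d\n", 4096 * 36 }'
```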

System Partitioning

The System Partitioning analysis done in the original tutorial is still valid. The main difference is that AIE-ML v2 demonstrates improved compute capability compared to AIE-ML, as evidenced by the following measured throughput increases.

Filterbank Library Characterization

Compile and simulate the design to confirm it works as you expect.

```
[shell]% cd <path-to-design>/aie/tdm_fir
[shell]% make clean all
[shell]% vitis_analyzer aiesimulator_output/default.aierun_summary
```

Inspecting vitis_analyzer, observe that the resource count dropped to 32 tiles, with a throughput of 4096 samples / 1.5 µs ≈ 2730 MSPS.
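The throughput figure can be reproduced with a quick calculation (4096 samples emitted every 1.5 µs; awk's %d truncates the fraction):

```shell
# 4096 samples every 1.5 us, expressed in MSPS.
awk 'BEGIN { printf "%d MSPS\n", 4096 / 1.5 }'
```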

Performance Comparison with Original Design:

  • Original (AIE-ML on VEK280): 32 tiles, 2230 MSPS throughput
  • Ported (AIE-ML v2 on VEK385): 32 tiles, 2730 MSPS throughput (+22% improvement)
  • Key Observation: AIE-ML v2 delivers superior performance with the same tile count, demonstrating architectural improvements in the second-generation AI Engine.

figure8

figure9

IFFT-2D Library Characterization

Compile and simulate the design to confirm it works as you expect.

```
[shell]% cd <path-to-design>/aie/ifft4096_2d
[shell]% make clean all
[shell]% vitis_analyzer aiesimulator_output/default.aierun_summary
```

Inspecting vitis_analyzer, note the resource count of 16 AIE-ML v2 tiles.

figure12

Achieved throughput for:

  • Front 64-point IFFT + point-wise twiddle multiplication = 2731 MSPS
  • Back 64-point IFFT = 3436 MSPS

figure13

Performance Comparison with Original Design:

| Metric | Original (AIE-ML) | Ported (AIE-ML v2) | Improvement |
|--------|-------------------|--------------------|-------------|
| Front 64-pt IFFT + Twiddle | 2386 MSPS | 2731 MSPS | +14% |
| Back 64-pt IFFT | 2376 MSPS | 3436 MSPS | +45% |
| AI Engine Tiles | 16 | 16 | Same |
| Memory Tiles | 6 | 7 | +1 |

Note: The minor difference in memory tile count is due to the different design floorplan and the different number of available rows.

Design Summary

  • TDM FIR uses 32 AI Engine tiles, leveraging newly introduced Vitis Library packet switching IP.
  • The 4k-pt IFFT is implemented using 2D architecture (with Mode 1), with resources split between:
    • 16 AI Engine tiles (compute)
    • Seven memory tiles (front transpose and back transpose)
    • PL (middle transpose)
  • From a bandwidth perspective, the design requires two input and four output streams.
  • Custom HLS blocks (packet_sender and packet_receiver) interface with AI Engine packet switching IP.
  • Output ports of the AI Engine going to the PL can arrive at different times, causing minor throughput loss. You can compensate by adding FIFOs during the v++ linking step (see Specifying Streaming Connections).
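As a sketch, such a FIFO can be requested in the v++ linker configuration file via a stream_connect entry with a trailing FIFO depth; the kernel and port names below are hypothetical placeholders, not the actual interface names from this design:

```
[connectivity]
# stream_connect=<master_kernel>.<port>:<slave_kernel>.<port>:<fifo_depth>
stream_connect=packet_receiver_1.out:dma_snk_1.in:64
```

The trailing :64 asks the linker to insert a 64-deep FIFO on that streaming connection.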

figure14

Design Resources

The following figure summarizes the AI Engine and PL resources required to implement the design in the VE3858 device on the VEK385 eval board. The design uses 48 AI Engine tiles for compute and seven Memory Tiles for transpose operations. The PL design includes the resources required to implement the DMA Source, Packet Sender/Receiver, Memory Transpose, and DMA Sink kernels.

Compared to the original design on VEK280, this design achieves better performance on VEK385 with the same AI Engine tile count, demonstrating the efficiency gains of AIE-ML v2 architecture.

figure15

Build and Run the Design

You can build the polyphase channelizer design from the command line.

Setup and Initialization

IMPORTANT: Before beginning the tutorial, ensure you have completed the following:

  • Installed AMD Vitis™ 2025.2 software and set PLATFORM_REPO_PATHS to the value <Vitis_tools>/base_platforms.
  • Created directory <path-to-design>/yocto_artifacts and set environment variable YOCTO_ARTIFACTS to that path.
  • From the Embedded Development Framework (EDF) downloads page (package 25.11):
    • Downloaded amd-cortexa78-mali-common_meta-edf-app-sdk, ran the script, and set the output path to <path-to-design>/yocto_artifacts/amd-cortexa78-mali-common_meta-edf-app-sdk/sdk.
    • Downloaded the VEK385 OSPI Image and moved it into <path-to-design>/yocto_artifacts/.
    • Downloaded amd-cortexa78-mali-common_edf-linux-disk-image (SD wic), unzipped it, and moved it into <path-to-design>/yocto_artifacts/.
    • Downloaded amd-cortexa78-mali-common_vek385_qemu_prebuilt, unzipped it, and moved amd-cortexa78-mali-common_vek385_qemu_prebuilt into <path-to-design>/yocto_artifacts/.
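The first two setup steps above can be sketched as shell commands; VITIS_TOOLS and DESIGN are assumed placeholder variables for your Vitis installation and design checkout paths:

```shell
# Environment setup sketch (paths are assumptions; substitute your own).
export PLATFORM_REPO_PATHS="$VITIS_TOOLS/base_platforms"
mkdir -p "$DESIGN/yocto_artifacts"
export YOCTO_ARTIFACTS="$DESIGN/yocto_artifacts"
```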

Hardware Emulation

You can build the channelizer design for hardware emulation using the Makefile as follows:

```
[shell]% cd <path-to-design>
[shell]% make all TARGET=hw_emu
[shell]% make run_emu -C vitis TARGET=hw_emu
```

This takes about 90 minutes to run. The build process generates a package folder containing all the files required for hardware emulation. Hardware emulation then launches and runs, producing the outputs shown below. You can add an optional -g flag to the launch_hw_emu.sh command to launch the Vivado waveform GUI and observe the top-level AXI signal ports in the design; do this by editing the run_emu target in vitis/Makefile.

figure16

You can measure throughput by inspecting the traces. The design processes eight transforms, each with 4k samples, in 13.7 µs. Throughput = 8 × 4096 / 13.7 µs ≈ 2390 MSPS.
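The throughput arithmetic can be reproduced directly (awk's %d truncates, so the printed value lands just under the ~2390 MSPS quoted in the text):

```shell
# Eight 4k-point transforms in 13.7 us, expressed in MSPS.
awk 'BEGIN { printf "%d MSPS\n", 8 * 4096 / 13.7 }'
```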

figure17

Hardware

You can build the channelizer design for the VEK385 board using the Makefile as follows:

```
[shell]% cd <path-to-design>
[shell]% make all TARGET=hw
```

The build process generates all the design specific files needed to run the design on hardware in the package folder.

  1. Write the EDF boot firmware (OSPI) to the primary boot device following the instructions here. The OSPI image is in <path-to-design>/yocto_artifacts/edf-ospi-versal-2ve-2vm-vek385-sdt-seg-20251116021631.bin.
  2. Write <path-to-design>/yocto_artifacts/edf-linux-disk-image-amd-cortexa78-mali-common.rootfs-20251116015456.wic to the sd_card using your favorite SD imaging tool (Balena Etcher and Win32DiskImager seem to work well).
  3. Put the SD card into the board, boot it, and log in. (The default username is amd-edf; you will be prompted to set a password.)
  4. Determine the IP address of eth0 on the board using ip addr show eth0.
  5. cd <path-to-design>/package; scp * amd-edf@<ip_address>:~/
  6. Run the design: sudo ./embedded_exec.sh

The following displays on the terminal.

figure18

Summary and Lessons Learned

Performance Summary

| Platform | Throughput | Improvement | Margin vs. 2000 MSPS Target | Notes |
|----------|------------|-------------|-----------------------------|-------|
| VEK280 (AIE-ML) | ~2250 MSPS | Baseline | 12.5% | Original tutorial |
| VEK385 (AIE-ML v2) | ~2390 MSPS | +6.2% | 19.5% | This tutorial; bandwidth-bound by I/O ports |

AI Engine Placement Summary

The following is a closer look at the placement delta between the original VEK280 (AIE-ML) tutorial and the VEK385 (AIE-ML v2) design shown in this tutorial.

VEK280 (AIE-ML) Original Tutorial

figure19

VEK385 (AIE-ML v2)

figure20

Lessons Learned from Porting AIE-ML to AIE-ML v2

What Worked Well

  1. Vitis Libraries Abstraction: Parameterized DSP IP from Vitis Libraries allowed re-targeting a different AI Engine variant with minimal code changes
  2. Performance Gains: AIE-ML v2's architectural enhancements increased throughput margin against the target requirement
  3. Similar AI Engine Tile Count and PL Resources: Achieved target performance with identical AI Engine tile utilization (48 tiles) and similar PL resources

Key Porting Considerations

  1. Array Geometry: Required rethinking placement constraints due to VEK385 having half the rows (4 vs 8). Adapted tile placement for reduced array height while preserving connectivity and performance
  2. Boot Framework: Usage of EDF/Segmented Configuration adds a learning curve compared to traditional boot flow

Support

GitHub issues are used for tracking requests and bugs. For questions, go to Support.

License

Copyright © 2023-2026 Advanced Micro Devices, Inc.

Terms and Conditions