Real-time object classification demo and DPU performance benchmarking suite for the AMD Xilinx Kria KV260 Vision AI Starter Kit, combining Intel RealSense camera integration with Vitis AI inference and layer-by-layer profiling via vaitrace.
The system connects a RealSense D435 camera to the Kria KV260 board via USB, with the board accessible over Ethernet through a WiFi router from a development laptop.
- Real-time Object Classification — MobileNet V2 inference on live RealSense camera feed
- Dual FPS Monitoring — Overall frame rate and DPU inference speed
- System Monitoring — Power, temperature, CPU utilization, and memory usage
- Top-5 Predictions — Classification results with confidence scores
- Depth Visualization — Color-mapped depth stream (640×480 @ 30fps)
- Synthetic Testing — Uses random images, no camera required
- Vaitrace Instrumentation — Layer-by-layer DPU profiling
- Comprehensive Statistics — Mean, std, min, max, P50, P95, P99
- Reproducible Results — Fixed random seed support
- Warmup Phase — Ensures accurate steady-state measurements
- Automatic Analysis — Runs automatically after `run_vaitrace.sh` completes
- Per-Layer Stats — Mean latency, std dev, CoV (%), min/max, efficiency (GOP/s)
- Timeline Order — Layers sorted by execution order in the model
- CSV Export — Save results with `--csv-out`
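As an illustration of the summary statistics listed above (mean, std, min, max, P50/P95/P99), here is a minimal sketch; the helper name `summarize_latencies` is hypothetical, and the project's actual implementation in `utils/benchmark_dpu.py` may differ:

```python
import statistics

def summarize_latencies(samples_ms):
    """Aggregate per-frame latencies (ms) into summary statistics.

    Hypothetical helper for illustration; not the project's exact code.
    """
    s = sorted(samples_ms)

    def pct(p):
        # Nearest-rank percentile on the sorted samples
        idx = min(len(s) - 1, round(p / 100 * (len(s) - 1)))
        return s[idx]

    return {
        "mean": statistics.mean(s),
        "std": statistics.stdev(s) if len(s) > 1 else 0.0,
        "min": s[0],
        "max": s[-1],
        "p50": pct(50),
        "p95": pct(95),
        "p99": pct(99),
    }
```

Tail percentiles (P95/P99) matter here because DPU latency is usually stable but occasionally spikes; the mean alone hides that.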
| Component | Details |
|---|---|
| AMD Xilinx Kria KV260 | Vision AI Starter Kit |
| Intel RealSense D435 | Connected via USB 3.0 (camera demo only) |
| Network | Ethernet/WiFi for remote deployment |
- PYNQ framework with DPU support
- Python 3.10
- DPU bitstream (`dpu.bit`)
- Vitis AI runtime
```bash
# Sync the project to the board
rsync -avz --exclude=venv --exclude=.git --exclude=__pycache__ ./ ubuntu@192.168.100.8:~/kria-camera-demo/

# Apply vaitrace bug fix (first time only)
ssh ubuntu@192.168.100.8 "cd ~/kria-camera-demo && ./patch_vaitrace.sh"

# Run profiling (500 frames)
ssh ubuntu@192.168.100.8 "cd ~/kria-camera-demo && ./run_vaitrace.sh -n 500"

# Download profiling results
scp ubuntu@192.168.100.8:~/kria-camera-demo/*.csv ./

# Run the camera demo remotely
ssh ubuntu@192.168.100.8 "cd ~/kria-camera-demo && sudo /usr/local/share/pynq-venv/bin/python3 kria-camera-demo.py"
```

```bash
git clone <repository-url>
cd kria-camera-demo
```

Model files are not included in the repo. See Downloading Models below.
- Activate the PYNQ environment:

  ```bash
  source /usr/local/share/pynq-venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip3 install -r requirements.txt
  ```

- Verify the DPU bitstream is present:

  ```bash
  ls dpu.bit
  ```

- Download model files and place them in `models/` (see Downloading Models).
Model files (.xmodel) are not included in this repository due to their size. Download them from the Vitis AI Model Zoo.
| Model | Input Size | Target DPU |
|---|---|---|
| `mobilenet_v2` | 224×224 | DPUCZDX8G_ISA1_B4096 |
| `mobilenet_v1_1_0_224_tf` | 224×224 | DPUCZDX8G_ISA1_B4096 |
| `resnet50` | 224×224 | DPUCZDX8G_ISA1_B4096 |
Download pre-compiled .xmodel files for target DPUCZDX8G_ISA1_B4096 from:
https://github.com/Xilinx/Vitis-AI/tree/master/model_zoo/model-list
Place downloaded files as follows:
```
models/
└── your_model/
    ├── meta.json              # Model metadata
    ├── your_model.xmodel      # Compiled DPU model
    └── your_model.prototxt    # Preprocessing parameters
```
`meta.json` format:

```json
{
  "lib": "libvart-dpu-runner.so",
  "filename": "your_model.xmodel",
  "kernel": ["subgraph_name"],
  "target": "DPUCZDX8G_ISA1_B4096"
}
```

`.prototxt` format:

```
transform_param {
  mean_value: 104.0
  mean_value: 117.0
  mean_value: 123.0
  scale: 0.00392157
  scale: 0.00392157
  scale: 0.00392157
}
```
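Conceptually, the transform above subtracts a per-channel mean and then multiplies by a scale factor. A minimal sketch (assuming BGR channel order, which the 104/117/123 means suggest; not the project's actual preprocessing code):

```python
def apply_transform(pixel_bgr,
                    mean_values=(104.0, 117.0, 123.0),
                    scale=0.00392157):
    # (value - mean) * scale per channel, as in transform_param above
    return tuple((v - m) * scale for v, m in zip(pixel_bgr, mean_values))
```

Note that 0.00392157 ≈ 1/255, so the transform maps 8-bit pixel values into roughly [-0.5, 1.0] after mean subtraction.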
Note: The `kernel` name in `meta.json` must match the subgraph name in the `.xmodel` file. Inspect the model with `xir subgraph <model>.xmodel` on the Kria board.
```bash
cd models/mobilenet_v2
md5sum -c md5sum.txt
```

The synthetic benchmark is the recommended way to profile DPU performance — no camera needed.
```bash
# With vaitrace profiling (recommended)
./run_vaitrace.sh -n 500

# Direct execution (no profiling)
sudo /usr/local/share/pynq-venv/bin/python3 utils/benchmark_dpu.py -n 500
```

| Option | Description | Default |
|---|---|---|
| `-n, --num-frames N` | Number of frames to process | 100 |
| `--warmup N` | Warmup iterations | 10 |
| `--seed N` | Random seed for reproducibility | 42 |
| `-m, --model-dir PATH` | Model directory | `models/mobilenet_v2` |
| `-b, --dpu-bit PATH` | DPU bitstream file | `dpu.bit` |
| `-l, --labels PATH` | Class labels file | `words.txt` |
Real-time camera application with live visualization. The interface shows the classified object, top-5 predictions with confidence scores, DPU inference FPS, and system stats (power, temperature, CPU, memory).
```bash
# Normal mode with GUI
sudo /usr/local/share/pynq-venv/bin/python3 kria-camera-demo.py

# Headless mode for profiling
./run_vaitrace.sh kria-camera-demo.py

# Custom model
sudo /usr/local/share/pynq-venv/bin/python3 kria-camera-demo.py -m models/resnet50
```

| Option | Description | Default |
|---|---|---|
| `-m, --model-dir PATH` | Model directory | `models/mobilenet_v2` |
| `-b, --dpu-bit PATH` | DPU bitstream | `dpu.bit` |
| `-l, --labels PATH` | Class labels | `words.txt` |
| `--headless` | Run without GUI (for profiling) | — |
| `--profile-frames N` | Frames in headless mode | 100 |
Press `q` to quit the GUI.
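For reference, top-5 predictions with confidence scores can be derived from raw classifier outputs with a softmax. A minimal sketch (hypothetical helper; the demo's actual postprocessing in `camera_demo/` may differ):

```python
import math

def top5_predictions(logits, labels):
    # Numerically stable softmax, then the 5 highest-confidence classes
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda t: t[1], reverse=True)
    return ranked[:5]
```

With the 1000-class ImageNet labels in `words.txt`, `logits` would be the DPU's raw output vector for one frame.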
Vaitrace provides layer-by-layer DPU profiling with minimal overhead.
```bash
# 1. Apply vaitrace patch (fixes a KeyError bug — run once)
./patch_vaitrace.sh

# 2. Run profiling
./run_vaitrace.sh -n 500

# 3. Check generated output files
ls -lh *.csv
```

Output files:
| File | Contents |
|---|---|
| `vart_trace.csv` | Layer-by-layer execution trace with timestamps |
| `vitis_ai_profile.csv` | Detailed DPU profiling and utilization metrics |
| `profile_summary.csv` | Aggregated performance summary |
`run_vaitrace.sh` automatically parses `vart_trace.csv` via `utils/analyze_trace.py` and prints per-layer latency and efficiency statistics when profiling finishes. To save those results:

```bash
python3 utils/analyze_trace.py --csv-out layer_stats.csv
```

`utils/plot_ddr_traffic.py` visualizes DDR memory bandwidth from profiling results. See `doc/mem_io_metric.md` for an explanation of the `Mem IO(MB)` and `Mem Bandwidth(MB/s)` columns in `profile_summary.csv`.
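As a sketch of the kind of per-layer aggregation `analyze_trace.py` performs (assuming hypothetical column names `layer` and `duration_ms`; the actual `vart_trace.csv` schema may differ):

```python
import csv
import statistics
from collections import defaultdict

def per_layer_stats(csv_path):
    # Group trace rows by layer name and aggregate latency statistics.
    per_layer = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            per_layer[row["layer"]].append(float(row["duration_ms"]))
    return {
        name: {
            "mean": statistics.mean(d),
            "std": statistics.stdev(d) if len(d) > 1 else 0.0,
            "cov_pct": (statistics.stdev(d) / statistics.mean(d) * 100.0)
                       if len(d) > 1 else 0.0,
            "min": min(d),
            "max": max(d),
        }
        for name, d in per_layer.items()
    }
```

CoV (%) is std/mean expressed as a percentage; a high value for one layer flags jitter worth investigating.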
`run_vaitrace.sh` uses `--fine_grained` for detailed layer profiling and `--va` for VART runtime tracing, with automatic `PYTHONPATH` setup for PYNQ packages.
| Stage | Latency | Throughput |
|---|---|---|
| DPU Inference | ~6ms | 160+ FPS |
| Full Pipeline (camera + pre/postprocessing) | — | 25–35 FPS |
| Preprocessing | ~1–2ms | — |
| Postprocessing | <1ms | — |
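The throughput column follows from the latency column for a serial pipeline: FPS ≈ 1000 / latency_ms, so ~6 ms of DPU latency implies roughly 166 FPS. A trivial sketch:

```python
def fps_from_latency_ms(latency_ms):
    # Single-threaded throughput implied by per-frame latency
    return 1000.0 / latency_ms
```

The full pipeline runs slower than the DPU alone because camera capture and pre/postprocessing on the CPU dominate the frame time.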
- Use the synthetic benchmark to isolate DPU performance from camera I/O
- Use at least 500 frames for stable statistics
- Always apply the vaitrace patch before profiling
- Check P95/P99 latencies for worst-case analysis
```
kria-camera-demo/
├── kria-camera-demo.py          # Real-time camera demo
├── run_vaitrace.sh              # Vaitrace profiling wrapper ⭐
├── patch_vaitrace.sh            # Fix vaitrace KeyError bug ⭐
├── utils/
│   ├── benchmark_dpu.py         # Synthetic DPU benchmark ⭐
│   ├── analyze_trace.py         # Per-layer trace analysis ⭐
│   ├── plot_ddr_traffic.py      # DDR bandwidth visualization
│   └── sync_timestamps.py       # Timestamp synchronization utility
├── camera_demo/
│   ├── kria_camera_demo.py      # Main camera demo class
│   ├── utils.py                 # Utilities
│   ├── preprocessing.py         # Image preprocessing
│   ├── visualization.py         # Display helpers
│   └── platform_monitor.py      # System monitoring
├── models/                      # Model directories (download separately)
│   ├── mobilenet_v2/
│   ├── mobilenet_v1_1_0_224_tf/
│   └── resnet50/
├── doc/
│   ├── platform_setup.png       # Hardware setup diagram
│   ├── mem_io_metric.md         # DDR memory metric explanations
│   └── ddrc_port_assignments.md # DDR controller port reference
├── words.txt                    # ImageNet labels (1000 classes)
├── dpu.bit                      # DPU overlay bitstream
└── requirements.txt             # Python dependencies
```
⭐ = Essential for DPU profiling
Symptom: `KeyError: '_fun_XXXXXX'`
Solution: Run `./patch_vaitrace.sh`. This patches `/usr/bin/xlnx/vaitrace/tracer/function.py` to handle missing function symbols gracefully.
Symptom: `Check failed: fromdata != ((void *) -1)`
Solution: Don't use `XLNX_ENABLE_DUMP=1`. Use vaitrace instead.
Symptom: The RealSense camera is not detected.
Solution: Ensure the camera is connected via USB 3.0. Test with `realsense-viewer`.
Symptom: Permission errors when opening the DPU.
Solution: Run with `sudo` — required for DPU hardware access.
Solution: Activate the PYNQ environment first:

```bash
source /usr/local/share/pynq-venv/bin/activate
# or use the full Python path:
/usr/local/share/pynq-venv/bin/python3
```

Symptom: DPU runner fails to initialize or returns no results.

Solution: Inspect the xmodel to find the correct subgraph name, then update `kernel` in `meta.json`:

```bash
xir subgraph models/your_model/your_model.xmodel
```

Default Kria IP: `192.168.100.8`
Update it in `CLAUDE.md` if your board uses a different address.
When adding a new model:
- Download the compiled `.xmodel` for target DPUCZDX8G_ISA1_B4096
- Create a directory under `models/` with the proper structure
- Add `meta.json`, `.prototxt`, and `md5sum.txt`
- Test with the benchmark:

```bash
./run_vaitrace.sh -m models/new_model -n 100
```
This work was supported by the dAIEDGE Open Call Programme, funded by the European Union's Horizon Europe research and innovation programme.
This project is licensed under the Apache License 2.0.
You may use, reproduce, and distribute this work under the terms of the Apache License, Version 2.0. See the LICENSE file for the full text.

