feat(tapa) decrease knn_chipknn to 8 PEs
1 parent
04e3287
commit bade709
Showing 12 changed files with 2,131 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# Copyright (c) 2024 RapidStream Design Automation, Inc. and contributors. All rights reserved. | ||
# The contributor(s) of this file has/have agreed to the RapidStream Contributor License Agreement. | ||
|
||
ROOT_DIR := $(shell git rev-parse --show-toplevel) | ||
GRP_UTIL := $(ROOT_DIR)/common/util/get_group.py | ||
TEMP_DIR := $(CURDIR)/build | ||
RS_TARGET := $(TEMP_DIR)/dse/candidate_0/exported/impl/vitis_run_hw | ||
TAPA_XO := $(CURDIR)/design/generated/knn.xo | ||
PLATFORM := xilinx_u280_gen3x16_xdma_1_202211_1 | ||
PART := xcu280-fsvh2892-2L-e | ||
RUN_FILE := $(CURDIR)/run.py | ||
|
||
all: $(RS_TARGET) | ||
|
||
$(RS_TARGET):$(TAPA_XO) | ||
rapidstream $(RUN_FILE) | ||
|
||
show_groups: | ||
rapidstream $(GRP_UTIL) -i $(TEMP_DIR)/passes/0-imported.json \ | ||
-o $(TEMP_DIR)/module_types.csv | ||
|
||
|
||
clean: | ||
rm -rf $(TEMP_DIR) *.log | ||
rm -rf .Xil .run | ||
rm -rf *.exe | ||
rm -rf .ipcache |
145 changes: 145 additions & 0 deletions
benchmarks/tapa_flow/knn_chipknn/k2D_float_8PEs/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,145 @@ | ||
<!-- | ||
Copyright (c) 2024 RapidStream Design Automation, Inc. and contributors. All rights reserved. | ||
The contributor(s) of this file has/have agreed to the RapidStream Contributor License Agreement. | ||
--> | ||
|
||
<img src="https://imagedelivery.net/AU8IzMTGgpVmEBfwPILIgw/1b565657-df33-41f9-f29e-0d539743e700/128" width="64px" alt="RapidStream Logo" /> | ||
|
||
# TAPA Flow: K Nearest Neighbour | ||
|
||
## Introduction | ||
|
||
|
||
In this recipe, we demonstrate how to use RapidStream to optimize TAPA projects. The basic steps include: | ||
|
||
- Compile the HLS C++ code into a Vitis-compatible .xo file using TAPA. | ||
- Optimize the .xo file with RapidStream to obtain an optimized .xo file. | ||
- Use Vitis to compile the optimized .xo file into an .xclbin file for FPGA deployment. | ||
|
||
## Tutorial | ||
|
||
### Step 1 (Done): Generate the Xilinx Object File (`.xo`) | ||
|
||
|
||
We use TAPA to generate the `.xo` file. In case you have not installed TAPA, we have already compiled the C++ source to `.xo` for you. The original C++ source files are located in [design/src](design/src), and the generated `.xo` file can be found at [design/generated/knn.xo](design/generated/knn.xo). To compile the C++ source to `.xo`, we use the script [design/run_tapa.sh](design/run_tapa.sh); the detailed commands are shown below. For your convenience, we have also backed up all the metadata generated by TAPA in the [design/generated](design/generated/) directory. | ||
|
||
```bash | ||
WORK_DIR=generated | ||
tapac \ | ||
--work-dir ${WORK_DIR} \ | ||
--top Knn \ | ||
--part-num xcu280-fsvh2892-2L-e \ | ||
--clock-period 3.33 \ | ||
-o ${WORK_DIR}/knn.xo \ | ||
--connectivity config/link_config.ini \ | ||
src/knn.cpp \ | ||
2>&1 | tee tapa.log | ||
``` | ||
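As a quick sanity check on the `--clock-period 3.33` argument above, the period can be converted to a target frequency. The sketch below assumes the value is given in nanoseconds; the helper name `period_ns_to_mhz` is ours and is not part of TAPA or RapidStream.

```python
def period_ns_to_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    if period_ns <= 0:
        raise ValueError("clock period must be positive")
    # 1 / (period in ns) gives GHz, so multiply by 1000 to get MHz.
    return 1000.0 / period_ns


# Value taken from the tapac command above (assumed to be in ns).
TARGET_PERIOD_NS = 3.33
print(f"{period_ns_to_mhz(TARGET_PERIOD_NS):.1f} MHz")  # prints "300.3 MHz"
```

Under that assumption, a 3.33 ns period corresponds to a target of roughly 300 MHz.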
|
||
### Step 2: Use RapidStream to Optimize the `.xo` Design | ||
|
||
The RapidStream flow conducts design space exploration and generates solutions, taking the TAPA-generated `.xo` file as input. | ||
The RapidStream flow for TAPA requires the following key inputs: | ||
|
||
- **Platform**: The Vitis platform (e.g., `xilinx_u280_gen3x16_xdma_1_202211_1`). | ||
- **Device**: The virtual device, defined by calling RapidStream APIs based on the platform (e.g., `get_u280_vitis_device_factory`). | ||
- **.xo file**: The `.xo` file generated by TAPA. | ||
- **Connectivity** (.ini): The configuration file for `v++` ([link_config.ini](design/config/link_config.ini)). | ||
- **top_module_name**: The top module name of the kernel. | ||
- **Clock**: All the clocks and their frequencies. | ||
- **Flatten Module**: Not all modules in a design need to be optimized. The flatten module name identifies the target module that RapidStream will optimize. | ||
|
||
The Python snippet below shows how we create a RapidStream instance and set up its environment. | ||
|
||
```Python | ||
from rapidstream import get_u280_vitis_device_factory, RapidStreamTAPA | ||
import os | ||
CURR_DIR = os.path.dirname(os.path.abspath(__file__)) | ||
INI_PATH = f"{CURR_DIR}/design/config/link_config.ini" | ||
VITIS_PLATFORM = "xilinx_u280_gen3x16_xdma_1_202211_1" | ||
XO_PATH = f"{CURR_DIR}/design/generated/knn.xo" | ||
kernel_name = "Knn" | ||
factory = get_u280_vitis_device_factory(VITIS_PLATFORM) | ||
rs = RapidStreamTAPA(f"{CURR_DIR}/build") | ||
rs.set_virtual_device(factory.generate_virtual_device()) | ||
rs.add_xo_file(XO_PATH) | ||
rs.set_vitis_platform(VITIS_PLATFORM) | ||
rs.set_vitis_connectivity_config(INI_PATH) | ||
rs.set_top_module_name(kernel_name) | ||
rs.add_clock("ap_clk", 3.33) | ||
rs.add_flatten_targets([kernel_name]) | ||
``` | ||
|
||
The HBM AXI port connection is described in [design/config/link_config.ini](design/config/link_config.ini). | ||
|
||
```ini | ||
[connectivity] | ||
sp=Knn_1.in_0:HBM[0] | ||
sp=Knn_1.in_1:HBM[1] | ||
sp=Knn_1.in_2:HBM[2] | ||
sp=Knn_1.in_3:HBM[3] | ||
sp=Knn_1.in_4:HBM[4] | ||
sp=Knn_1.in_5:HBM[5] | ||
sp=Knn_1.in_6:HBM[6] | ||
sp=Knn_1.in_7:HBM[7] | ||
sp=Knn_1.in_8:HBM[8] | ||
sp=Knn_1.in_9:HBM[9] | ||
sp=Knn_1.in_10:HBM[10] | ||
sp=Knn_1.in_11:HBM[11] | ||
sp=Knn_1.in_12:HBM[12] | ||
sp=Knn_1.in_13:HBM[13] | ||
sp=Knn_1.in_14:HBM[14] | ||
sp=Knn_1.final_out_dist:HBM[14] | ||
sp=Knn_1.final_out_id:HBM[14] | ||
``` | ||
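To see at a glance which kernel ports share an HBM bank, the `sp=` lines can be summarized with a few lines of Python. This is only an illustrative sketch: it parses the text by hand (the `sp` key is repeated, which Python's `configparser` rejects in its default strict mode), and the helper names `parse_sp_lines` and `ports_per_bank` are ours, not part of Vitis or RapidStream.

```python
from collections import defaultdict


def parse_sp_lines(text):
    """Return {port: memory} for every 'sp=<port>:<memory>' line."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("sp="):
            continue  # skip section headers, comments, blank lines
        port, _, memory = line[len("sp="):].partition(":")
        mapping[port] = memory
    return mapping


def ports_per_bank(mapping):
    """Group ports by the memory bank they are attached to."""
    banks = defaultdict(list)
    for port, memory in mapping.items():
        banks[memory].append(port)
    return dict(banks)


# A few lines copied from the connectivity file shown above.
SAMPLE = """[connectivity]
sp=Knn_1.in_13:HBM[13]
sp=Knn_1.in_14:HBM[14]
sp=Knn_1.final_out_dist:HBM[14]
sp=Knn_1.final_out_id:HBM[14]
"""

shared = {
    bank: ports
    for bank, ports in ports_per_bank(parse_sp_lines(SAMPLE)).items()
    if len(ports) > 1
}
print(shared)
```

For the sample lines, the only shared bank is `HBM[14]`, which serves `in_14` and both output ports.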
|
||
As a result, it is necessary to assign the kernel ports to the appropriate slots. The Python code below demonstrates this process. For comprehensive linking details, please refer to the [design/config/link_config.ini](design/config/link_config.ini) file. | ||
|
||
```Python | ||
right_slot = "SLOT_X1Y0:SLOT_X1Y0" | ||
left_slot = "SLOT_X0Y0:SLOT_X0Y0" | ||
rs.assign_port_to_region(".*in_.*", left_slot) | ||
rs.assign_port_to_region(".*final_out.*", left_slot) | ||
rs.assign_port_to_region("s_axi_control_.*", left_slot) | ||
rs.assign_port_to_region("ap_clk", left_slot) | ||
rs.assign_port_to_region("ap_rst_n", left_slot) | ||
rs.assign_port_to_region("interrupt", left_slot) | ||
``` | ||
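The first argument of each call above looks like a regular expression over port names. The sketch below mimics that idea with Python's standard `re` module so the patterns can be tried offline. It is an assumption on our part that patterns are matched against the whole port name and that the first matching rule wins; the function `region_for` and the sample port names are hypothetical and do not come from the RapidStream API.

```python
import re

LEFT_SLOT = "SLOT_X0Y0:SLOT_X0Y0"

# Same patterns and order as in the snippet above.
RULES = [
    (r".*in_.*", LEFT_SLOT),
    (r".*final_out.*", LEFT_SLOT),
    (r"s_axi_control_.*", LEFT_SLOT),
    (r"ap_clk", LEFT_SLOT),
    (r"ap_rst_n", LEFT_SLOT),
    (r"interrupt", LEFT_SLOT),
]


def region_for(port, rules=RULES):
    """Return the region of the first rule whose pattern matches the
    whole port name, or None if no rule matches (assumed semantics)."""
    for pattern, region in rules:
        if re.fullmatch(pattern, port):
            return region
    return None


# Hypothetical port names, for illustration only.
for name in ["m_axi_in_0_ARADDR", "s_axi_control_AWVALID", "ap_clk", "unmatched_port"]:
    print(name, "->", region_for(name))
```

A check like this helps spot ports that no pattern covers before launching a long run.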
|
||
For complete details, please refer to the [./run.py](./run.py) file. Launch RapidStream with the command below or with `make all`. | ||
|
||
```bash | ||
rapidstream run.py | ||
``` | ||
|
||
If everything is successful, you should get at least one optimized `.xclbin` file. | ||
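One way to confirm this is to search the build directory for `.xclbin` files after the run. The sketch below uses only the Python standard library; the helper name `find_xclbins` is ours, and the exact location of the outputs inside `build` is not assumed, which is why the search is recursive.

```python
from pathlib import Path


def find_xclbins(build_dir):
    """Return a sorted list of all .xclbin files below build_dir.

    Returns an empty list if the directory does not exist yet.
    """
    root = Path(build_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.xclbin"))


# "build" matches the work directory passed to RapidStreamTAPA above.
found = find_xclbins("build")
if found:
    for path in found:
        print(path)
else:
    print("No .xclbin files found under build/")
```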
|
||
|
||
|
||
### Step 3: Check the Group Module Report | ||
|
||
|
||
RapidStream mandates a clear distinction between communication and computation within user designs. | ||
|
||
- In `Group modules`, users are tasked solely with defining inter-submodule communication. For those familiar with Vivado IP Integrator flow, crafting a Group module mirrors the process of connecting IPs in IPI. RapidStream subsequently integrates appropriate pipeline registers into these Group modules. | ||
|
||
- In `Leaf modules`, users retain the flexibility to implement diverse computational patterns, as RapidStream leaves these Leaf modules unchanged. | ||
|
||
For further details, please consult the [code style](https://docs.rapidstream-da.com/required-coding-style/) section in our Documentation. | ||
|
||
To generate a report on group types, execute the commands below or run `make show_groups`: | ||
|
||
```bash | ||
rapidstream ../../../common/util/get_group.py \ | ||
-i build/passes/0-imported.json \ | ||
-o build/module_types.csv | ||
``` | ||
|
||
The module types for your design can be found in `build/module_types.csv`. Below, we list the four Group modules. In this design, `Knn` serves as a Group module, while the other three modules are added by RapidStream. | ||
|
||
| Module Name | Group Type | | ||
|:--------------------------------:|:--------------:| | ||
| Knn | grouped_module | | ||
|__rs_ap_ctrl_start_ready_pipeline | grouped_module | | ||
|__rs_ff_pipeline | grouped_module | | ||
|__rs_hs_pipeline | grouped_module | |
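
For larger designs, the report can be summarized programmatically. The sketch below counts modules per group type with Python's standard `csv` module. The column names `Module Name` and `Group Type` are an assumption borrowed from the table above; the actual header of `build/module_types.csv` may differ, so adjust `type_column` accordingly.

```python
import csv
import io
from collections import Counter


def count_group_types(csv_text, type_column="Group Type"):
    """Count how many modules fall into each group type.

    type_column is the (assumed) name of the CSV column holding the type.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[type_column].strip() for row in reader)


# Sample content mirroring the table above (the header row is assumed).
SAMPLE_CSV = """Module Name,Group Type
Knn,grouped_module
__rs_ap_ctrl_start_ready_pipeline,grouped_module
__rs_ff_pipeline,grouped_module
__rs_hs_pipeline,grouped_module
"""

counts = count_group_types(SAMPLE_CSV)
print(dict(counts))
```

To run it on the real report, read the file contents first, for example with `open("build/module_types.csv").read()`, and pass the text to `count_group_types`.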
Empty file.
153 changes: 153 additions & 0 deletions
benchmarks/tapa_flow/knn_chipknn/k2D_float_8PEs/design/archive/emconfig.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
{ | ||
"Comment": "This file is auto-generated by the tool. Do not modify", | ||
"Platform": { | ||
"Boards": [ | ||
{ | ||
"Devices": [ | ||
{ | ||
"DdrBanks": [ | ||
{ | ||
"AXI_ARBITRATION_SCHEME": "RD_PRI_REG", | ||
"BURST_LENGTH": "8", | ||
"C0": { | ||
"APP_ADDR_WIDTH": "31", | ||
"APP_DATA_WIDTH": "512", | ||
"ControllerType": "DDR4_SDRAM", | ||
"DDR4_ADDR_WIDTH": "17", | ||
"DDR4_AXI_ADDR_WIDTH": "34", | ||
"DDR4_AXI_DATA_WIDTH": "512", | ||
"DDR4_AXI_ID_WIDTH": "1", | ||
"DDR4_AutoPrecharge": "false", | ||
"DDR4_AxiNarrowBurst": "false", | ||
"DDR4_BANK_GROUP_WIDTH": "2", | ||
"DDR4_BANK_WIDTH": "2", | ||
"DDR4_CL": "0", | ||
"DDR4_COLUMN_WIDTH": "10", | ||
"DDR4_CWL": "0", | ||
"DDR4_Mem_Add_Map": "ROW_COLUMN_BANK_INTLV", | ||
"DDR4_Ordering": "Normal", | ||
"DDR4_RANK_WIDTH": "1", | ||
"DDR4_ROW_WIDTH": "17", | ||
"DDR4_tCK": "833", | ||
"DDR4_tCKE": "0", | ||
"DDR4_tFAW": "16", | ||
"DDR4_tMRD": "2", | ||
"DDR4_tRAS": "39", | ||
"DDR4_tRCD": "17", | ||
"DDR4_tREFI": "9363", | ||
"DDR4_tRFC": "421", | ||
"DDR4_tRP": "17", | ||
"DDR4_tRRD_L": "6", | ||
"DDR4_tRRD_S": "4", | ||
"DDR4_tRTP": "10", | ||
"DDR4_tWR": "19", | ||
"DDR4_tWTR_L": "10", | ||
"DDR4_tWTR_S": "4", | ||
"DDR4_tXPR": "109", | ||
"DDR4_tZQCS": "128", | ||
"DDR4_tZQI": "0", | ||
"DDR4_tZQINIT": "256" | ||
}, | ||
"CAS_LATENCY": "17", | ||
"CAS_WRITE_LATENCY": "12", | ||
"DATA_WIDTH": "72", | ||
"MEMORY_PART": "MTA18ASF2G72PZ-2G3", | ||
"MEM_ADDR_MAP": "ROW_COLUMN_BANK_INTLV", | ||
"Name": "dynamic_region_memory_subsystem_memory_ddr4_mem00", | ||
"Size": "16GB", | ||
"TIMEPERIOD_PS": "833", | ||
"Type": "ddr4" | ||
}, | ||
{ | ||
"AXI_ARBITRATION_SCHEME": "RD_PRI_REG", | ||
"BURST_LENGTH": "8", | ||
"C0": { | ||
"APP_ADDR_WIDTH": "31", | ||
"APP_DATA_WIDTH": "512", | ||
"ControllerType": "DDR4_SDRAM", | ||
"DDR4_ADDR_WIDTH": "17", | ||
"DDR4_AXI_ADDR_WIDTH": "34", | ||
"DDR4_AXI_DATA_WIDTH": "512", | ||
"DDR4_AXI_ID_WIDTH": "1", | ||
"DDR4_AutoPrecharge": "false", | ||
"DDR4_AxiNarrowBurst": "false", | ||
"DDR4_BANK_GROUP_WIDTH": "2", | ||
"DDR4_BANK_WIDTH": "2", | ||
"DDR4_CL": "0", | ||
"DDR4_COLUMN_WIDTH": "10", | ||
"DDR4_CWL": "0", | ||
"DDR4_Mem_Add_Map": "ROW_COLUMN_BANK_INTLV", | ||
"DDR4_Ordering": "Normal", | ||
"DDR4_RANK_WIDTH": "1", | ||
"DDR4_ROW_WIDTH": "17", | ||
"DDR4_tCK": "833", | ||
"DDR4_tCKE": "0", | ||
"DDR4_tFAW": "16", | ||
"DDR4_tMRD": "2", | ||
"DDR4_tRAS": "39", | ||
"DDR4_tRCD": "17", | ||
"DDR4_tREFI": "9363", | ||
"DDR4_tRFC": "421", | ||
"DDR4_tRP": "17", | ||
"DDR4_tRRD_L": "6", | ||
"DDR4_tRRD_S": "4", | ||
"DDR4_tRTP": "10", | ||
"DDR4_tWR": "19", | ||
"DDR4_tWTR_L": "10", | ||
"DDR4_tWTR_S": "4", | ||
"DDR4_tXPR": "109", | ||
"DDR4_tZQCS": "128", | ||
"DDR4_tZQI": "0", | ||
"DDR4_tZQINIT": "256" | ||
}, | ||
"CAS_LATENCY": "17", | ||
"CAS_WRITE_LATENCY": "12", | ||
"DATA_WIDTH": "72", | ||
"MEMORY_PART": "MTA18ASF2G72PZ-2G3", | ||
"MEM_ADDR_MAP": "ROW_COLUMN_BANK_INTLV", | ||
"Name": "dynamic_region_memory_subsystem_memory_ddr4_mem01", | ||
"Size": "16GB", | ||
"TIMEPERIOD_PS": "833", | ||
"Type": "ddr4" | ||
} | ||
], | ||
"FeatureRom": { | ||
"Aurora_Link": "disabled", | ||
"Board_Mgmt": "enabled", | ||
"Board_Scheduler": "enabled", | ||
"Cdma_Base_Address0": "0", | ||
"Cdma_Base_Address1": "0", | ||
"Cdma_Base_Address2": "0", | ||
"Cdma_Base_Address3": "0", | ||
"Cdma_Size": "4", | ||
"Ddr_Channel_Count": "2", | ||
"Ddr_Channel_Size": "16", | ||
"Debug_Type": "0x2", | ||
"Dr_Base_Address": "0", | ||
"Feature_Bitmap": "197133", | ||
"Fpga_Part_Name": "xcu280-fsvh2892-2L-e", | ||
"Ip_Build_Id": "2719198", | ||
"Major_Version": "10", | ||
"Minor_Version": "1", | ||
"Peer_To_Peer": "enabled", | ||
"Prom_Type": "0x0", | ||
"Time_Since_Epoch": "1579649056", | ||
"Unified_Platform": "enabled", | ||
"Uuid": "f2b82d53-372f-45a4-bbe9-3d1c980216da", | ||
"Vbnv_Name": "xilinx_u280_xdma_201920_3", | ||
"Vivado_Build_Id": "2742762" | ||
}, | ||
"Name": "xilinx_u280_xdma_201920_3" | ||
} | ||
], | ||
"NumBoards": "1" | ||
} | ||
], | ||
"ExpandedPR": "false", | ||
"UnifiedPlatform": "true" | ||
}, | ||
"Version": { | ||
"FileVersion": "2.0", | ||
"ToolVersion": "2020.2" | ||
} | ||
} |