Final update for programming-script, added debugging-instructions to the Readme
Maximilian committed Sep 17, 2024
1 parent 608fe64 commit 1ba3880
Showing 5 changed files with 34 additions and 5 deletions.
29 changes: 29 additions & 0 deletions README.md
@@ -164,6 +164,35 @@ It is important to know that latency-measurements are conducted using a ping-pon

#### tcp_iperf

## Coyote v2 Hardware-Debugging
Coyote can be debugged at the hardware level using the AMD ILA / ChipScope cores. This requires interaction with the Vivado GUI, so it is important to know how to access the different project files, include ILA cores and trigger a rebuild of the bitstream:

#### Shell (Static and Dynamic Layer)
Open the Vivado GUI and click `Open Project`. The required file is located within the previously generated hardware-build directory, at `.../<Name of HW-build folder>/test_shell/test.xpr` and should now be selected for opening the shell-project.

###### Creating a new ILA
The `Sources` tab in the GUI can now be used to navigate to any file that is part of the shell, e.g. the networking stacks. There, a new ILA can be placed by including the module template in the source code:
~~~~
ila_<name> inst_ila_<name> (
.clk(nclk),
.probe0(<Signal #1>),
.probe1(<Signal #2>),
...
);
~~~~
It makes sense to annotate (in comments) the bitwidth of each signal, since this information is required for the instantiation of the ILA-IP.
In the next step, select the tab `IP Catalog` from the section `PROJECT MANAGER` on the left side of the GUI, search for `ILA` and select the first item found ("ILA (Integrated Logic Analyzer)"). Then enter the "Component Name" that was previously used for the instantiation of the module in hardware ("ila_<name>"), select the right number of probes and the desired sample data depth. Afterwards, assign the right bitwidth to all probes in the different tabs of the interface. Finally, start an `Out of context per IP` run by clicking `Generate` in the next interface. Once this run has finished, the bitstream generation can be restarted via
~~~~
$ make bitgen
~~~~
in the original build directory, as described before. This build is expected to be considerably faster than the original run. Once it has finished, the new ILA is available for debugging.
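
The GUI steps for configuring the ILA-IP can, in principle, also be scripted from the Vivado Tcl console, in the same style as the `init_ip.tcl` files of the examples. The following is only a minimal sketch under stated assumptions: the name `ila_example`, the probe count and the probe widths are hypothetical placeholders and must match the module instantiated in the source code. The sample data depth is left at its default here.

```tcl
# Sketch only: 'ila_example' and the probe layout are hypothetical placeholders.
# The module name must match the one instantiated in the HDL source
# (ila_example inst_ila_example (...)).
create_ip -name ila -vendor xilinx.com -library ip -version 6.2 -module_name ila_example

# Two probes: probe0 is 1 bit wide, probe1 is 128 bits wide.
set_property -dict [list \
    CONFIG.C_NUM_OF_PROBES {2} \
    CONFIG.C_PROBE0_WIDTH {1} \
    CONFIG.C_PROBE1_WIDTH {128} \
] [get_ips ila_example]

# Generate the output products for the new IP.
generate_target all [get_ips ila_example]
```

Whether a scripted IP is picked up by the Coyote build flow in the same way as a GUI-generated one is not covered by this README, so the result should be checked before relying on it.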

###### Using an ILA for debugging
In the project interface of the GUI, click `Open Hardware Manager` and select "Open target" in the top dialogue. If you're logged into a machine with a locally attached FPGA, select `Auto Connect`; otherwise choose `Open New Target` to connect to a remote machine with an FPGA via the network. Once the connection is established, you can select the specific ILA from the `Hardware` tab on the left side of the hardware manager. This opens a waveform display, where the capture settings and the trigger setup can be configured, so that the data capture can be tailored to the desired experiment or debugging purpose.
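
For repeated captures, the same flow can be approximated from the Vivado Tcl console. This is a hedged sketch rather than part of the Coyote tooling: the remote host name is a hypothetical placeholder, and it assumes that the first detected device is the FPGA and that it contains at least one ILA. Readable probe names additionally require the probes file of the build to be associated with the device.

```tcl
# Sketch only: run in the Vivado Tcl console with the project open.
open_hw_manager

# Local FPGA: connect to the hardware server on this machine.
connect_hw_server
# Remote FPGA (hypothetical host name, assumes a hw_server is running there):
# connect_hw_server -url fpga-host.example.org:3121
open_hw_target

# Assumption: the first detected device is the FPGA of interest.
set dev [lindex [get_hw_devices] 0]
current_hw_device $dev
refresh_hw_device $dev

# Arm the first ILA, wait for the trigger, then display the captured waveform.
set ila [lindex [get_hw_ilas -of_objects $dev] 0]
run_hw_ila $ila
wait_on_hw_ila $ila
display_hw_ila_data [upload_hw_ila_data $ila]
```

Trigger conditions and capture settings are still easiest to set up in the waveform display described above; the sketch simply arms the ILA with whatever configuration is currently active.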

#### Application Layer
The application layer Vivado-project can be opened via `.../<Name of HW-build folder>/test_config_0/user_c0_0/test.xpr`. The subsequent steps for creating and using new ILAs are then identical to what's described above.

## Deploying on the ETHZ HACC-cluster
The ETHZ HACC is a premier cluster for research in systems, architecture and applications (https://github.com/fpgasystems/hacc/tree/main). Its hardware provides an ideal environment for running Coyote-based experiments, since users can book up to 10 servers with U55C accelerator cards connected via a fully switched 100G network. User accounts for this platform can be obtained by following the instructions on the homepage linked above.

2 changes: 1 addition & 1 deletion examples_hw/apps/rdma_perf/init_ip.tcl
@@ -1,2 +1,2 @@
create_ip -name ila -vendor xilinx.com -library ip -version 6.2 -module_name ila_0
set_property -dict [list CONFIG.C_PROBE17_WIDTH {128} CONFIG.C_PROBE14_WIDTH {128} CONFIG.C_NUM_OF_PROBES {20} CONFIG.C_EN_STRG_QUAL {1} CONFIG.C_PROBE19_MU_CNT {2} CONFIG.C_PROBE18_MU_CNT {2} CONFIG.C_PROBE17_MU_CNT {2} CONFIG.C_PROBE16_MU_CNT {2} CONFIG.C_PROBE15_MU_CNT {2} CONFIG.C_PROBE14_MU_CNT {2} CONFIG.C_PROBE13_MU_CNT {2} CONFIG.C_PROBE12_MU_CNT {2} CONFIG.C_PROBE11_MU_CNT {2} CONFIG.C_PROBE10_MU_CNT {2} CONFIG.C_PROBE9_MU_CNT {2} CONFIG.C_PROBE8_MU_CNT {2} CONFIG.C_PROBE7_MU_CNT {2} CONFIG.C_PROBE6_MU_CNT {2} CONFIG.C_PROBE5_MU_CNT {2} CONFIG.C_PROBE4_MU_CNT {2} CONFIG.C_PROBE3_MU_CNT {2} CONFIG.C_PROBE2_MU_CNT {2} CONFIG.C_PROBE1_MU_CNT {2} CONFIG.C_PROBE0_MU_CNT {2} CONFIG.ALL_PROBE_SAME_MU_CNT {2}] [get_ips ila_0]
set_property -dict [list CONFIG.C_PROBE17_WIDTH {128} CONFIG.C_PROBE14_WIDTH {128} CONFIG.C_NUM_OF_PROBES {20} CONFIG.C_EN_STRG_QUAL {1} CONFIG.C_PROBE19_MU_CNT {2} CONFIG.C_PROBE18_MU_CNT {2} CONFIG.C_PROBE17_MU_CNT {2} CONFIG.C_PROBE16_MU_CNT {2} CONFIG.C_PROBE15_MU_CNT {2} CONFIG.C_PROBE14_MU_CNT {2} CONFIG.C_PROBE13_MU_CNT {2} CONFIG.C_PROBE12_MU_CNT {2} CONFIG.C_PROBE11_MU_CNT {2} CONFIG.C_PROBE10_MU_CNT {2} CONFIG.C_PROBE9_MU_CNT {2} CONFIG.C_PROBE8_MU_CNT {2} CONFIG.C_PROBE7_MU_CNT {2} CONFIG.C_PROBE6_MU_CNT {2} CONFIG.C_PROBE5_MU_CNT {2} CONFIG.C_PROBE4_MU_CNT {2} CONFIG.C_PROBE3_MU_CNT {2} CONFIG.C_PROBE2_MU_CNT {2} CONFIG.C_PROBE1_MU_CNT {2} CONFIG.C_PROBE0_MU_CNT {2} CONFIG.ALL_PROBE_SAME_MU_CNT {2} CONTROL.DATA_DEPTH {4096}] [get_ips ila_0]
2 changes: 1 addition & 1 deletion examples_sw/apps/rdma_service/client/main.cpp
@@ -234,7 +234,7 @@ int main(int argc, char *argv[])

// Generate the required output based on the statistical data from the benchmarking tool
std::cout << std::fixed << std::setprecision(2);
std::cout << std::setw(8) << sg.rdma.len << " [bytes], thoughput: "
std::cout << std::setw(8) << sg.rdma.len << " [bytes], throughput: "
<< std::setw(8) << ((1 + oper) * ((1000 * sg.rdma.len ))) / ((bench.getAvg()) / n_reps_thr) << " [MB/s], latency: ";

// Sync - reset the completion counter from the thread, sync-up via ACK-handshakes
4 changes: 2 additions & 2 deletions hw/hdl/network/rdma/roce_stack.sv
@@ -220,7 +220,7 @@ assign rdma_wr_req.ready = m_rdma_wr_req.ready;
// RoCE stack
//

/*

ila_rdma inst_ila_rdma (
.clk(nclk),

@@ -263,7 +263,7 @@ ila_rdma inst_ila_rdma (
.probe36(rdma_rd_req.data), // 128
.probe37(rdma_wr_req.data) // 128
);
*/


metaIntf #(.STYPE(logic[103:0])) m_axis_dbg_0 ();
metaIntf #(.STYPE(logic[103:0])) m_axis_dbg_1 ();
2 changes: 1 addition & 1 deletion program_coyote.sh
@@ -56,7 +56,7 @@ if [ $DRV_INSERT -eq 1 ]; then
echo "***"
echo "** IP_ADDRESS: $DEVICE_1_IP_ADDRESS_HEX_0"
echo "** MAC_ADDRESS: $DEVICE_1_MAC_ADDRESS_0"
sgutil program driver -m $DRV_PATH -p ip_addr=$DEVICE_1_IP_ADDRESS_HEX_0,mac_addr=$DEVICE_1_MAC_ADDRESS_0
sgutil program driver -i $DRV_PATH -p ip_addr=$DEVICE_1_IP_ADDRESS_HEX_0,mac_addr=$DEVICE_1_MAC_ADDRESS_0
# sgutil program driver -m $DRV_PATH
echo "***"
echo "** Driver loaded "
