
How to Write Software

Philip Bedoukian edited this page May 19, 2020 · 33 revisions

This document describes how to port parallel code (OpenMP, DOALL) to the proposed architecture. Each section gives guidelines on how to think about the porting process. If you have a guideline in mind, please add it as a new section.

Pthread Style

Declaring Vector Groups

Using Vector Groups

Control Flow

Unconditional jumps and predication are supported in vector cores. TODO: give example usage.

Optimizing for Memory Hierarchy

Prefetching

Declaring a Frame

Using Frames

The Stack

Mapping Problem Sizes to Vector Groups

Vector groups typically have a characteristic software length that they can perform operations on. For example, in a stencil kernel with a filter dimension of three, a characteristic length would be FILTER_DIM * VECTOR_LEN, where FILTER_DIM is the length of the filter and VECTOR_LEN is the number of cores that are part of the vector group. This vector group would be able to process problem sizes that are multiples of FILTER_DIM * VECTOR_LEN, but would fail for non-multiples.
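As a minimal sketch of the characteristic length described above, the C snippet below computes FILTER_DIM * VECTOR_LEN and checks whether a problem size maps onto the vector group with nothing left over. The value VECTOR_LEN = 4 and the helper names characteristic_len and maps_exactly are assumptions made for illustration; they are not part of the architecture's software interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only: FILTER_DIM matches the stencil example,
 * VECTOR_LEN = 4 is an assumed vector group size. */
#define FILTER_DIM 3
#define VECTOR_LEN 4

/* Characteristic software length of the vector group. */
size_t characteristic_len(void) {
    return (size_t)FILTER_DIM * VECTOR_LEN;
}

/* True when n elements map onto the vector group with no leftover. */
bool maps_exactly(size_t n) {
    size_t chunk = characteristic_len();
    assert(chunk > 0);
    return n % chunk == 0;
}
```

With these assumed values the characteristic length is 12, so a problem of 24 elements maps exactly while a problem of 30 elements does not.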

An obvious but partial solution is to peel off the part of the problem size that doesn't map to the vector groups, e.g., with a mod operation. The question then becomes where to do this unmapped work. Three options are enumerated below.

  1. Do the work sequentially on the host CPU. This might be the right solution for a conventional discrete vector accelerator.
  2. Do the work in parallel using the manycore after the fact.
  3. Same as 2, but try to schedule the manycore work on inactive cores in parallel with the main chunk of work.

The proposed architecture is in a somewhat unique position: we can deconfigure the vector groups and use the more flexible manycore to finish the computation. This is preferred over running sequentially on a host CPU because the manycore version will be much faster. Where we put this work might not be a huge issue, because the unmapped part is likely a small part of the computation. That may not hold for a multidimensional kernel, where you might have overhangs at the end of each row.
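The mod-based split, and the way per-row overhangs add up in a multidimensional kernel, can be sketched in C as follows. This is only an illustration under stated assumptions: the split_t type and the split_work and total_overhang helpers are hypothetical names, and the sketch assumes each row is split independently against the same characteristic length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: how a run of n elements is divided. */
typedef struct {
    size_t mapped;    /* elements handled by the vector group */
    size_t leftover;  /* elements left for the manycore (or host CPU) */
} split_t;

/* Split n elements into a multiple of chunk plus a remainder. */
split_t split_work(size_t n, size_t chunk) {
    assert(chunk > 0);
    split_t s;
    s.leftover = n % chunk;
    s.mapped = n - s.leftover;
    return s;
}

/* Total leftover for a rows x cols kernel when every row is split on
 * its own, so each row contributes its own overhang. */
size_t total_overhang(size_t rows, size_t cols, size_t chunk) {
    return rows * split_work(cols, chunk).leftover;
}
```

For example, with a characteristic length of 12, a single row of 100 elements leaves 4 elements unmapped. A 100 x 100 kernel split row by row leaves 400 unmapped elements (4% of the work), whereas the same 10,000 elements treated as one flat array would leave only 4.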

Reductions

To be determined.

Debugging

Common errors

The following gem5 assertion failure indicates a segfault:

gem5.opt: build/RVSP/cpu/io/iew.cc:258: void IEW::doWriteback(): Assertion 0 failed.

Remote GDB

Sometimes you can attach gdb to the software running on top of gem5, but we haven't gotten this to work with our software yet.

  • Run gem5 normally in one shell (running <software binary, built with -g>)

  • In another shell, start gdb and then enter the remaining commands at the gdb prompt:

      • gdb <software binary, built with -g>

      • set remote Z-packet on

      • target remote 127.0.0.1:7000

      • c
