In this exercise you will learn how to enqueue your first kernel function to run on a device and print Hello World! to the console.
The first thing you must do is create a queue to submit work to. The simplest way to do this is to default-construct it, which lets the runtime choose a device for you.
Once you have a queue, you can submit work to be executed on the device that the queue targets. This is done via a command group.
Define a lambda expression which takes a reference to a handler to represent your command group function and pass it to the submit member function of the queue.
Note that submitting a command group without any commands will result in an error.
Within the command group function, define a SYCL kernel function via the single_task command, which takes only a function object that itself takes no parameters.
Remember to declare a class for your kernel name in the global namespace. While it is possible to declare the class inline in the handler scope, this can produce long kernel names that show up in profiler and debugger output and make that output harder to read. Declaring kernel names outside local scope avoids this.
Also remember to call wait on the event returned from submit to await the completion of the kernel function.
Create a stream object within the command group scope. The parameters to the constructor of the stream class are the total buffer size, the work-item buffer size and the handler.
Then, within the SYCL kernel function, use the stream you constructed to print "Hello World!" using the << operator.
For DPC++:
Using CMake to configure and then build the exercise:
mkdir build
cd build
cmake .. "-GUnix Makefiles" -DSYCL_ACADEMY_USE_DPCPP=ON \
  -DSYCL_ACADEMY_ENABLE_SOLUTIONS=OFF -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
make exercise_2
Alternatively, compile and run directly from the command line:
icpx -fsycl -o sycl-ex-2 -I../External/Catch2/single_include ../Code_Exercises/Exercise_02_Hello_World/source.cpp
./sycl-ex-2
On Intel DevCloud, computational applications are run by submitting jobs to a queue for execution on compute nodes. Some features, such as longer walltime and multi-node computation, are only available through the job queue. Please refer to the guide.
So wrap the binary in a script named job_submission and run:
qsub job_submission
For AdaptiveCpp:
# <target specification> is a list of backends and devices to target, for example
# "omp;generic" compiles for CPUs with the OpenMP backend and GPUs using the generic single-pass compiler.
# The simplest target specification is "omp" which compiles for CPUs using the OpenMP backend.
cmake -DSYCL_ACADEMY_USE_ADAPTIVECPP=ON -DSYCL_ACADEMY_ENABLE_SOLUTIONS=OFF \
  -DSYCL_ACADEMY_INSTALL_ROOT=/insert/path/to/adaptivecpp -DACPP_TARGETS="<target specification>" ..
make exercise_2
Alternatively, without CMake:
cd Code_Exercises/Exercise_02_Hello_World
/path/to/adaptivecpp/bin/acpp -o sycl-ex-2 -I../../External/Catch2/single_include --acpp-targets="<target specification>" source.cpp
./sycl-ex-2