
About SYCL

SYCL has a lot of interesting advantages compared to plain OpenCL or other approaches:

  • SYCL is an open standard from Khronos with a working committee (you can contribute!), and several implementations (commercial or open source) can be expected soon on many platforms, ranging from GPUs, APUs, FPGAs, DSPs... down to plain CPUs;

  • it offers a single-source C++ programming model that takes advantage of the superpowers of modern C++14/C++17, unifying both the host and accelerator sides. For example, it is possible to write generic accelerated functions tersely by using (variadic) templates, metaprogramming and generic variadic lambda expressions. This makes it possible to build templated libraries such as Eigen or TensorFlow seamlessly;

  • SYCL abstracts and leverages the concepts behind OpenCL and provides higher-level concepts such as tasks (or command groups in OpenCL SYCL jargon) that let the runtime take a more task-graph-oriented view of the computations. This allows lazy data transfers between accelerators and host, or the use of platform capabilities such as OpenCL 2 SVM or HSA for sharing data between host and accelerators;

  • the entry cost of the technology is zero since, after all, an existing OpenCL or C++ program is a valid SYCL program;

  • the exit cost is low since it is pure C++ without any extension or #pragma, unlike C++AMP or OpenMP for example. Retargeting the SYCL classes and functions to use other frameworks such as OpenMP 4 or C++AMP is feasible without writing a new compiler;

  • easier debugging:

    • since all memory accesses to array parameters in kernels go through accessors, all the memory bounds checks can be done in them if needed;
    • since there is a pure host mode, the kernel code can also run on the host, be debugged with the usual tools and use any system library (such as <cstdio> or <iostream>...) or data library (for nice data visualization);
    • since the kernel code is C++ code even when run on an accelerator, instrumenting the code by using special array classes or overloading some operators allows deep intrusive debugging or code analysis without changing the algorithmic parts of the code;
  • SYCL is high-level, standard, modern C++ without any extension, which means that you can use your usual compiler, and the host part can at the same time use common extensions such as OpenMP, OpenHMPP or OpenACC, or libraries such as MPI or PGAS Coarray++, and be linked with parts written in other languages (Fortran...). Thus SYCL is already Exascale-ready!

  • even if SYCL hides the OpenCL world by default, it inherits everything from the OpenCL world:

    • same interoperability as the underlying OpenCL platform: Vulkan, OpenGL, DirectX...
    • access to all the underlying basic OpenCL objects behind the SYCL abstraction for interoperability and hard-core optimization;
    • construction of SYCL objects from basic OpenCL objects to add some SYCL parts to an existing OpenCL application;
    • so it provides a continuum from higher-level programming à la C++AMP or OpenMP 4 down to low-level OpenCL, according to the optimization needs: from using simple OpenCL intrinsics or vector operations from the cl::sycl namespace down to providing a real OpenCL kernel to be executed, without requiring all the usual verbose OpenCL host API code.

    This seamless OpenCL integration plus the gradual optimization features are perhaps the most compelling arguments for SYCL, because they allow high-level programming simplicity without giving up hard-core performance when needed;

  • since the SYCL task graph execution model is asynchronous, it can be used as a side benefit to overcome some limitations of the underlying OpenCL implementation. For example, some OpenCL stacks may have only in-order execution queues or even synchronous (blocking) ND-range enqueue, or some weird constrained mapping between the programmer-level OpenCL queue(s) and the hardware queues.

    In this case, a SYCL implementation can work around the problem, relying for example on multiple host CPU threads, multiple thread-local storage (TLS) queues or its own scheduler atop the limited OpenCL stack, to provide computation and communication overlap in a natural, pain-free fashion. This relieves the programmer from reorganizing her application to work around these limitations, which can be quite cumbersome.
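To make the point about writing generic functions with variadic templates and generic lambdas concrete without depending on a particular SYCL implementation, here is a plain C++14 sketch. `host_parallel_for` and `elementwise` are hypothetical names, not the SYCL API: the "kernel launch" simply loops on the host, roughly as a pure host mode could, but the generic kernel body is written once and instantiated for whatever element types and number of inputs it receives.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel launch (NOT the SYCL API): it runs
// the kernel body sequentially on the host for every index in [0, n),
// roughly what a pure host mode could do.
template <typename Kernel>
void host_parallel_for(std::size_t n, Kernel k) {
  for (std::size_t i = 0; i < n; ++i)
    k(i);
}

// Hypothetical generic helper written once with a variadic template:
// applies the element-wise function f to any number of input vectors
// (all indexed by the same i) and stores the result in out.
template <typename Out, typename F, typename... In>
void elementwise(Out &out, F f, const In &... in) {
  host_parallel_for(out.size(), [&](std::size_t i) { out[i] = f(in[i]...); });
}
```

For example, `elementwise(z, [](auto a, auto b) { return a + b; }, x, y);` adds two vectors of any arithmetic type, and with C++17 a generic variadic lambda such as `[](auto... v) { return (v + ...); }` sums any number of inputs with the same helper.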
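The debugging argument that all array accesses in kernels go through accessors, so bounds checks can live in one place, can be illustrated with a small plain C++ wrapper. This is a minimal sketch under stated assumptions: `checked_accessor` is a hypothetical class, not the SYCL accessor, and it simply throws `std::out_of_range` on a bad index when its `Checked` template parameter is true.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical accessor-like wrapper (NOT the SYCL accessor class): every
// element access goes through operator[], which makes it the single place
// where bounds checking can be switched on for debugging.
template <typename T, bool Checked = true>
class checked_accessor {
public:
  explicit checked_accessor(std::vector<T> &data) : data_{data} {}

  // Returns a reference to element i; throws when i is out of range and
  // checking is enabled (with Checked = false the test is compiled away).
  T &operator[](std::size_t i) const {
    if (Checked && i >= data_.size())
      throw std::out_of_range{"accessor index " + std::to_string(i) +
                              " out of range (size " +
                              std::to_string(data_.size()) + ")"};
    return data_[i];
  }

  std::size_t size() const { return data_.size(); }

private:
  std::vector<T> &data_;  // the accessed storage, not owned
};
```

Because the kernel code only sees `acc[i]`, switching between the checked and unchecked variants does not touch the algorithmic part of the code.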
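The task-graph arguments above (lazy transfers, and overlapping work atop in-order or blocking OpenCL queues) rely on the runtime knowing, from the accessors each command group requests, which buffers it reads and writes. The following plain C++ sketch models only that bookkeeping; the `task` structure, `build_graph` and `waves` are hypothetical illustrations, not the data structures of any real SYCL runtime. It derives dependency edges from submission order and groups tasks into waves whose members could be dispatched concurrently, e.g. over several queues or host threads.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a command group: the buffers it reads and writes.
struct task {
  std::string name;
  std::set<std::string> reads;
  std::set<std::string> writes;
};

// True when `later` must wait for `earlier`: read-after-write,
// write-after-read or write-after-write on a common buffer.
bool depends_on(const task &later, const task &earlier) {
  auto intersects = [](const std::set<std::string> &a,
                       const std::set<std::string> &b) {
    for (const auto &x : a)
      if (b.count(x)) return true;
    return false;
  };
  return intersects(later.reads, earlier.writes) ||
         intersects(later.writes, earlier.reads) ||
         intersects(later.writes, earlier.writes);
}

// deps[i] lists the earlier tasks (in submission order) that task i waits for.
std::vector<std::vector<std::size_t>> build_graph(const std::vector<task> &tasks) {
  std::vector<std::vector<std::size_t>> deps(tasks.size());
  for (std::size_t i = 0; i < tasks.size(); ++i)
    for (std::size_t j = 0; j < i; ++j)
      if (depends_on(tasks[i], tasks[j])) deps[i].push_back(j);
  return deps;
}

// Wave index of each task: 1 + the highest wave among its dependencies,
// so tasks in the same wave never depend on each other. Processing in
// order is valid because deps[i] only refers to tasks before i.
std::vector<std::size_t> waves(const std::vector<std::vector<std::size_t>> &deps) {
  std::vector<std::size_t> level(deps.size(), 0);
  for (std::size_t i = 0; i < deps.size(); ++i)
    for (std::size_t j : deps[i])
      if (level[j] + 1 > level[i]) level[i] = level[j] + 1;
  return level;
}
```

Note the write-after-read rule: a task that overwrites a buffer must wait for earlier readers of that buffer even though no data flows between them, while tasks touching disjoint buffers end up in the same wave and can overlap.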

For introductory material on the value of DSELs (domain-specific embedded languages) in this area, see for example these articles:

Some other implementations

Some other known implementations:

Some presentations and publications related to SYCL

In reverse chronological order:

There are also many interesting articles in the publication list from Codeplay.

Related projects