IPython Cookbook, Second Edition This is one of the 100+ free recipes of the IPython Cookbook, Second Edition, by Cyrille Rossant, a guide to numerical computing and data science in the Jupyter Notebook. The ebook and printed book are available for purchase at Packt Publishing.

Text on GitHub with a CC-BY-NC-ND license
Code on GitHub with an MIT license

Chapter 5 : High-Performance Computing

5.7. Releasing the GIL to take advantage of multi-core processors with Cython and OpenMP

As we have seen in this chapter's introduction, CPython's GIL prevents pure Python code from taking advantage of multi-core processors. Cython gives us a way to release the GIL temporarily in a portion of the code. That portion can then be spread over several cores with OpenMP, a shared-memory multithreading API that is supported by most C compilers.

In this recipe, we will see how to parallelize the previous recipe's code on multiple cores.

Getting ready

To enable OpenMP in Cython, you just need to specify some options to the compiler. There is nothing special to install on your computer besides a good C compiler. See the instructions in this chapter's introduction for more details.

The code of this recipe has been written for gcc on Ubuntu. It can be adapted to other systems with minor changes to the %%cython options.
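As a rough illustration of what adapting the options can look like, here is a minimal sketch that builds the `%%cython` line for two common toolchains. The helper name `openmp_cython_magic` is hypothetical. The sketch assumes gcc-style flags everywhere except on Windows, where it assumes the MSVC compiler and its `/openmp` switch. Other setups (for example, Apple's clang on macOS) need different options and are not covered.

```python
import sys


def openmp_cython_magic(platform=sys.platform):
    """Hypothetical helper: return a %%cython line enabling OpenMP.

    Assumes MSVC on Windows and gcc (or a compiler accepting
    gcc-style flags) everywhere else.
    """
    if platform.startswith("win"):
        # MSVC enables OpenMP with /openmp; no linker flag is passed here.
        compile_args, link_args = "/openmp", None
    else:
        # gcc needs -fopenmp at both compile and link time, as in this recipe.
        compile_args, link_args = "-fopenmp", "-fopenmp"

    parts = ["%%cython", f"--compile-args={compile_args}"]
    if link_args:
        parts.append(f"--link-args={link_args}")
    parts.append("--force")
    return " ".join(parts)


print(openmp_cython_magic("linux"))
print(openmp_cython_magic("win32"))
```

The first printed line is the magic command used in this recipe. The second is the assumed MSVC variant.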

How to do it...

Our simple ray tracing engine implementation is "embarrassingly parallel" (see https://en.wikipedia.org/wiki/Embarrassingly_parallel): there is a main loop over all pixels, within which the exact same function is called repeatedly. There is no crosstalk between loop iterations, so no iteration reads or writes data used by another. Therefore, it would be theoretically possible to execute all iterations in parallel.
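The independence of the iterations can be checked on a toy example. The following pure Python sketch uses a hypothetical `shade_pixel()` function as a stand-in for the ray tracer's per-pixel computation and an arbitrary small image size. It computes the columns sequentially, concurrently in a thread pool, and in reverse order, and verifies that the three images are identical. The thread pool only illustrates that the order of execution does not matter. Because of the GIL, it is not expected to make this pure Python code faster.

```python
from concurrent.futures import ThreadPoolExecutor

W, H = 8, 6  # hypothetical image width and height


def shade_pixel(i, j):
    """Stand-in for the per-pixel ray tracing function (hypothetical)."""
    return (i * 31 + j * 17) % 256


def render_column(i):
    # A column depends only on its own index, not on other columns.
    return [shade_pixel(i, j) for j in range(H)]


# Reference result: columns computed one after the other.
sequential = [render_column(i) for i in range(W)]

# Same columns, scheduled across a pool of threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    concurrent = list(pool.map(render_column, range(W)))

# Same columns, computed in reverse order and put back in place.
reversed_order = [render_column(i) for i in reversed(range(W))][::-1]

assert sequential == concurrent == reversed_order
```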

Here, we will execute one loop (over all columns in the image) in parallel with OpenMP.

You will find the entire code on the book's website (ray7 example). We will only show the most important steps here:

  1. We use the following magic command:
%%cython --compile-args=-fopenmp --link-args=-fopenmp --force
  2. We import the prange() function:
from cython.parallel import prange
  3. We add nogil after each function definition to declare that the function can be called without the GIL. We cannot use any Python object or Python function inside a function annotated with nogil. For example:
cdef Vec3 add(Vec3 x, Vec3 y) nogil:
    return vec3(x.x + y.x, x.y + y.y, x.z + y.z)
  4. To run a loop in parallel over the cores with OpenMP, we release the GIL with a with nogil block and use prange() instead of range():
with nogil:
    for i in prange(w):
        # ...

The GIL needs to be released before using any parallel computing feature such as prange().

  5. With these changes, we reach a 3x speedup on a quad-core processor compared to the fastest version of the previous recipe.
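One possible way to read the 3x figure is through Amdahl's law, which the recipe itself does not invoke. It is used here only as a simple model, under the assumption that the run time splits into a serial part and a perfectly parallelizable part. The function names below are hypothetical. Under that assumption, a 3x speedup on four cores would correspond to roughly 89% of the run time being parallelized.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def parallel_fraction_from_speedup(speedup, n_cores):
    """Invert Amdahl's law: parallel fraction implied by a measured speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_cores)


# Figures quoted in this recipe: 3x speedup on a quad-core processor.
p = parallel_fraction_from_speedup(3.0, 4)
print(round(p, 3))  # 0.889

# Under the same model, what the same code would reach on 8 cores.
print(round(amdahl_speedup(p, 8), 2))  # 4.5
```

This is a model-based estimate rather than a measurement. Overheads such as thread scheduling and memory bandwidth are not captured by it.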

How it works...

The GIL has been described in the introduction of this chapter. The nogil keyword has two uses. Appended to a function signature, it declares that the function can be called without the GIL. Used in a with nogil block, it actually releases the GIL for the enclosed code. While the GIL is released, it is not possible to make any Python API calls, meaning that only C variables and C functions (declared with cdef) can be used.

See also

  • Accelerating Python code with Cython
  • Optimizing Cython code by writing less Python and more C
  • Distributing Python code across multiple cores with IPython