OpenMP in Pyclaw #527

weslowrie · 2015-10-16T21:29:30Z

I have updated step3ds.f90 in PyClaw so that it can use OpenMP. Some small changes to classic/solver.py were necessary to read the OpenMP environment variable OMP_NUM_THREADS and to size the work array properly. The call to step3ds() was also changed slightly so that there are no assumed-size arrays (f2py does not like these; nthreads is passed in instead).
Additionally, classic/setup.py was modified to explicitly add the '-fopenmp' f90 compiler flag and the 'gomp' library. I'm not sure this is the best way to handle it, but it does work.

I have not added the OpenMP directives to step3.f90, but that should be mostly trivial at this point.

…lit algorithm. The unsplit algorithm can also be updated. Usage requires setting the environment variable OMP_NUM_THREADS before program execution. In addition the gfortran flag '-fopenmp' and the f2py flag '-lgomp' are required when compiling step3ds.f90. Added F90 and f2py options for OpenMP (-fopenmp, and gomp => -lgomp) in classic/setup.py so that when compiling 'classic3.so' the openmp lines are compiled.

…eads=1

…e in a dimension size in order to trick f2py into not making nthreads an optional argument. Otherwise it used nthreads=shape(qadd,2) as a size. Since qadd is optional, it gets a bad value when qadd is not in the call to step3ds(), which is the default for pyclaw.

mandli · 2015-10-17T16:44:04Z

src/pyclaw/classic/solver.py

-        self.aux2 = np.empty((num_aux,maxm+2*num_ghost,3),order='F')
-        self.aux3 = np.empty((num_aux,maxm+2*num_ghost,3),order='F')
-        mwork = (maxm+2*num_ghost) * (31*num_eqn + num_waves + num_eqn*num_waves)
+        print "nthreads:",self.nthreads


Probably want to remove the print line.

Yes, I agree. It was left from debugging.

mandli · 2015-10-17T16:59:29Z

We should modify the Travis test case so that it tests this code (maybe set OMP_NUM_THREADS=2). I am also not entirely comfortable with reading the environment variable directly rather than leaving it up to the system. I noticed that the number of threads is used to allocate work space, but I do not fully understand why OpenMP cannot do this for us.

weslowrie · 2015-10-17T17:09:32Z

The line with
'int(os.environ.get('OMP_NUM_THREADS',1))'
gets the value set by the environment variable or, if it is not set, defaults to 1.
It would be nice if OpenMP could handle the work array sizing, as well as the extra dimension on aux1, qadd, etc. You may have noticed there is a slight hack to keep f2py from assuming that 'nthreads' is optional.
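
For reference, a minimal sketch of that environment lookup with a little input checking. The helper name `threads_from_env` is hypothetical and not part of the PR; handling of comma-separated values is an added assumption, since OpenMP allows a list in OMP_NUM_THREADS for nested parallelism.

```python
import os


def threads_from_env(default=1):
    """Return OMP_NUM_THREADS as a positive int, or `default` if unset or invalid."""
    raw = os.environ.get("OMP_NUM_THREADS")
    if raw is None:
        return default
    try:
        # OMP_NUM_THREADS may hold a comma-separated list; use the first entry.
        value = int(raw.split(",")[0])
    except ValueError:
        return default
    return value if value > 0 else default
```

Falling back to the default on bad input keeps the work array sizing well defined even if the variable is set to something unexpected.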

mandli · 2015-10-17T18:16:38Z

Yeah, f2py is a bit annoying like that. In any case, if an array enters a parallel region and forks to the threads it should make copies of anything in the stack unless otherwise specified (via a shared directive). That being said, if these threads are being forked and collected often then it might be better to create persistent storage. I wonder if instead of using the environment variable that there is some way to fetch the desired number of threads.

removed debugging print statement

weslowrie · 2015-10-19T15:46:18Z

@mandli

> I wonder if instead of using the environment variable that there is some way to fetch the desired number of threads.

Are you wondering if we can query the system to find an optimal number of threads? Or maybe setting the threads via program input rather than using the environment variable?

We could set input on the python side with something like this:
nthreads = os.environ["OMP_NUM_THREADS"] = "2"
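
A slightly more explicit sketch of the same idea. Note that the chained assignment above leaves `nthreads` as the string "2", so this version keeps an integer for array sizing. The helper name `request_threads` is hypothetical, and whether the OpenMP runtime picks up the value depends on it being set before the runtime initializes, which has not been checked here.

```python
import os


def request_threads(nthreads):
    """Export OMP_NUM_THREADS for this process and return the count as an int."""
    nthreads = int(nthreads)
    if nthreads < 1:
        raise ValueError("nthreads must be at least 1")
    # Environment values must be strings.
    os.environ["OMP_NUM_THREADS"] = str(nthreads)
    return nthreads


nthreads = request_threads(2)
```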

mandli · 2015-10-19T15:49:40Z

There are a number of ways to set the number of threads that OpenMP can use beyond just the environment variable. In the OpenMP API there is a way to query this but I am not sure how to do this in Python.

mandli · 2015-10-24T14:32:29Z

I did a bit of looking into this but could not find a good way to call omp_get_max_threads, which is what I was thinking of. I do have a question though: how would this interact with the process-based parallelism? Is there a way to spawn, say, 16 threads on a node and have the PETSc library handle inter-node communication?
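
One possible route, sketched here with the standard-library ctypes module, is to load the OpenMP runtime directly and call omp_get_max_threads on it. This assumes the runtime is GNU libgomp and that it can be located on the system; the helper name `omp_max_threads` is hypothetical. The value comes from the runtime instance loaded here, which may or may not be the same one the compiled extension uses.

```python
import ctypes
import ctypes.util


def omp_max_threads(library="gomp"):
    """Ask an OpenMP runtime for omp_get_max_threads(); None if unavailable."""
    path = ctypes.util.find_library(library)
    if path is None:
        return None
    try:
        runtime = ctypes.CDLL(path)
        func = runtime.omp_get_max_threads
    except (OSError, AttributeError):
        # Library could not be loaded, or it does not export the symbol.
        return None
    func.restype = ctypes.c_int
    func.argtypes = []
    return int(func())
```

Returning None rather than raising lets the caller fall back to another default when no OpenMP runtime is installed.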

ketch · 2015-10-25T07:47:25Z

This is great -- thanks, @weslowrie ! My main question is: does this cost us anything when running without OpenMP? I see two potential concerns:

Does compiling the code with -fopenmp make anything slower?
Is it at all probable that users will try to compile the code on a system without OpenMP and thus get compilation failure? Or does installation of gcc/gfortran always include OpenMP?

@mandli I guess the first thing to do is just try a PETSc+OpenMP run. @weslowrie have you tried that already by any chance?

weslowrie · 2015-10-26T16:46:12Z

> Does compiling the code with -fopenmp make anything slower?
> Is it at all probable that users will try to compile the code on a system without OpenMP and thus get compilation failure? Or does installation of gcc/gfortran always include OpenMP?

I'm not sure about either of these; both are probably worth some tests. I think it is possible to have gcc without the 'gomp' library, so some systems might not have the proper OpenMP library.

I recently noticed that the self.nthreads = int(os.environ.get('OMP_NUM_THREADS',1)) line is not working as intended when the environment variable is not set. I'll look into this.

> @mandli I guess the first thing to do is just try a PETSc+OpenMP run. @weslowrie have you tried that already by any chance?

I have not tried the MPI+OpenMP combo yet. I might be able to run a test on a machine with multiple nodes to see if it works. I'll let you know if I am able to successfully test this.

weslowrie · 2015-10-26T18:53:10Z

I'm trying to have a default behavior when the openmp env variable OMP_NUM_THREADS is unset. The problem is the line in step3ds():
!$ nthreads = omp_get_num_threads()
sets nthreads to the number of processors present on the system if the env variable is unset. This contradicts the python side, which sets self.nthreads=1 when unset, and the mwork array is sized incorrectly.

I'm not sure how to handle this yet. A child process cannot change the shell env variable, which is what openmp looks for. I also did not see an obvious way to override openmp's decision when omp_get_num_threads() is called. Anyone see an acceptable way to handle this?

mandli · 2015-10-26T19:28:19Z

omp_get_num_threads should return the current number of threads regardless of what OMP_NUM_THREADS is. omp_get_max_threads without OMP_NUM_THREADS defaults to the number of threads defined by the system as the maximum number allowed. This was one of the reasons I was a bit concerned about using OMP_NUM_THREADS in Python. Maybe we should just require that OMP_NUM_THREADS is set or create a simple Fortran subroutine that asks it how many threads are available and call it from python.

ketch · 2015-10-28T12:57:24Z

@weslowrie Some testing is being done now by @aymkhalil -- we're going to see if we can run with MPI+OpenMP on the new supercomputer here, and if it helps over just vanilla MPI.

I'm pretty sure we'll want to merge this in, so could you go ahead and add the OMP modifications to step3.f90 too?

weslowrie · 2015-10-28T19:20:09Z

@ketch Yes no problem. I'll update step3.f90 as well.

We should find a resolution to the case where OMP_NUM_THREADS is unset or there is no openmp on a system. One possible easy solution that @mandli suggested is to just check for the OMP_NUM_THREADS env variable and exit the program if it is unset.
Would we want the default to be a run with the maximum possible number of threads?

Also, in the python setup.py do we need to have a way to skip -fopenmp and gomp compile flags if openmp is not present on the system?
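
One way this could work is to try compiling a tiny OpenMP program at build time and only add the flags if that succeeds. The sketch below is an assumption about how such a check might look, not what the PR does: the function names are hypothetical, and the keyword names `extra_f90_compile_args` and `libraries` are assumed to match the numpy.distutils-style extension options that setup.py would use.

```python
import os
import shutil
import subprocess
import tempfile

# Small free-form Fortran program; the "!$" lines are only compiled with OpenMP.
TEST_PROGRAM = """\
program omp_check
!$ use omp_lib
  integer :: n
  n = 1
!$ n = omp_get_max_threads()
  print *, n
end program omp_check
"""


def compiler_supports_openmp(compiler="gfortran"):
    """Return True if `compiler` builds and links a program with -fopenmp."""
    if shutil.which(compiler) is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "omp_check.f90")
        with open(source, "w") as handle:
            handle.write(TEST_PROGRAM)
        result = subprocess.run(
            [compiler, "-fopenmp", source, "-o", os.path.join(tmp, "omp_check")],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            cwd=tmp,
        )
        return result.returncode == 0


def openmp_build_options(enabled):
    """Extra extension keywords; empty when OpenMP is unavailable."""
    if not enabled:
        return {}
    return {"extra_f90_compile_args": ["-fopenmp"], "libraries": ["gomp"]}
```

A setup script could then pass `**openmp_build_options(compiler_supports_openmp())` when declaring the extension, so a system without OpenMP still builds the serial code.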

donnacalhoun · 2015-10-28T20:19:19Z

Can you just use the

#if defined(_OPENMP)
    #pragma omp ...
#endif

pre-processing macro? This is defined by the OpenMP standard. See

http://stackoverflow.com/questions/1300180/ignore-openmp-on-machine-that-doesnt-have-it

and

http://bisqwit.iki.fi/story/howto/openmp/#Discussion

These are C/C++ macros, but I would imagine that something similar is available in Fortran.

mandli · 2015-10-28T21:08:43Z

It is, but I think @weslowrie needs it in Python, which we have not been able to figure out beyond routing a call through C or Fortran.

…of CPUs when the OMP_NUM_THREADS environment variable is unset. This is consistent with what OpenMP returns for omp_get_num_threads() when the env variable is unset. Updated step3.f90 to use OpenMP, based on what is done in clawpack/classic. This was successfully tested with the Sedov.py test problem in the euler_3d example folder.

…nto pyclaw_openmp

weslowrie · 2015-10-29T06:38:17Z

I have added a modified step3.f90 that includes OpenMP, as done in classic Clawpack. I tested this with the euler_3d Sedov.py example and the regression test in that folder. OpenMP appears to be working properly.
Also, as a possibly temporary fix, I used the Python multiprocessing module to check the number of CPUs. I use this to set the default number of threads, which is consistent with what omp_get_num_threads() returns when the environment variable is unset. This may not work on all systems, and a more rigorous solution probably needs to be implemented.
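
Roughly, the fallback looks like the sketch below. The function name `default_threads` is only illustrative, and the final fallback to 1 when the CPU count cannot be determined is an added assumption.

```python
import multiprocessing
import os


def default_threads():
    """Use OMP_NUM_THREADS if set, otherwise the number of CPUs Python reports."""
    raw = os.environ.get("OMP_NUM_THREADS")
    if raw is not None:
        return int(raw)
    try:
        return multiprocessing.cpu_count()
    except NotImplementedError:
        # cpu_count() can fail on unusual platforms; stay serial in that case.
        return 1
```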

mandli · 2015-10-29T19:04:46Z

Good solution!

mandli · 2015-11-04T15:16:30Z

I may be wrong about this, but if someone did not set OMP_NUM_THREADS=1 and did not compile with the OpenMP library, would they incur a memory footprint penalty of up to the number of cores they have, at least for the aux arrays?

weslowrie · 2015-11-04T18:52:09Z

@mandli Yes, you are correct, and it is a problem. Not only are the aux arrays larger, but the work array, which is sized in _allocate_workspace() in solver.py, is larger as well. This would occur even if they did not compile with the OpenMP library. I'll have to think of a better solution.
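
To make the cost concrete, here is a small sketch of how the work array grows. The per-thread formula is the mwork expression from the solver.py diff earlier in this thread; multiplying it by nthreads is an assumption about how the workspace is sized, and the function name and the example grid values are hypothetical.

```python
def work_array_size(maxm, num_ghost, num_eqn, num_waves, nthreads=1):
    """Number of work-array entries, assuming one private chunk per thread."""
    per_thread = (maxm + 2 * num_ghost) * (
        31 * num_eqn + num_waves + num_eqn * num_waves
    )
    return nthreads * per_thread


# Example values chosen only for illustration.
single = work_array_size(maxm=100, num_ghost=2, num_eqn=5, num_waves=3, nthreads=1)
sixteen = work_array_size(maxm=100, num_ghost=2, num_eqn=5, num_waves=3, nthreads=16)
```

Under this assumption a 16-core default allocates 16 times the serial workspace even when only one thread ever runs.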

How does the PyClaw setup typically deal with custom compiler flags? Should OpenMP be the default, or should we expect the user to set custom flags while compiling/installing PyClaw?

mandli · 2015-11-04T21:25:30Z

The flags are usually in the setup.py files in the relevant src directories.

Interesting question about the default though, should OpenMP be enabled and use all of the cores on a machine if someone did not set OMP_NUM_THREADS? Seems we may want to require OMP_NUM_THREADS to be set to use this but I may be convinced otherwise.

ketch · 2015-11-05T06:01:08Z

@mandli: Don't AMRClaw and GeoClaw use OpenMP now? How are these issues handled there?

…arning that the 'OMP_NUM_THREADS' environment variable is unset. This allows an unaware user to run their codes without any changes, and they will see the warning and can adjust accordingly.

weslowrie · 2015-11-05T17:39:39Z

I made some changes, and I think they give a desirable default behavior. The Python code checks whether the OMP_NUM_THREADS variable is set; if not, it gives a warning and sizes the arrays with the default self.nthreads=1.
The Fortran side also checks whether the environment variable is set; if not, it forces the OpenMP number of threads to 1 with:
!$ call omp_set_num_threads(1)
Otherwise it continues as before and reads the environment variable.

I think this gives a desirable default behavior: an unsuspecting user will only see a warning about the unset env variable and can set it if they want. If the env variable is set, the calculation simply runs with that number of threads. This way we don't have any arrays that are larger than necessary.
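
On the Python side the logic amounts to something like the sketch below. The function name `resolve_threads`, the message text, and the use of the standard `warnings` module are assumptions for illustration; the actual code in solver.py is not reproduced here.

```python
import os
import warnings


def resolve_threads():
    """Return the thread count used for array sizing, warning if unset."""
    raw = os.environ.get("OMP_NUM_THREADS")
    if raw is None:
        warnings.warn(
            "OMP_NUM_THREADS is not set; using 1 OpenMP thread.",
            RuntimeWarning,
        )
        return 1
    return int(raw)
```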

Possible last fix, and I don't know how to do this:
Check if the OpenMP compiler flags and libraries are used in compiling PyClaw and then suppress the OMP_NUM_THREADS env variable warning that was just added to solver.py

mandli · 2015-11-05T22:38:46Z

I cannot remember whether warnings are printed to the console or only to the log file when using the standard PyClaw loggers. Speaking of which, you might want to use the logger attached to the solver instead. You can use it with

self.logger.warning(warning_stuff)

…console and to the log file.

weslowrie · 2015-11-06T17:34:53Z

@mandli I modified it so it uses the PyClaw logger as suggested. It writes the warning to the console and the log file with this method. I did have to move the code below the
super(ClawSolver3D,self).__init__(riemann_solver, claw_package)
line because the logger otherwise had not been instantiated yet.
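
The ordering issue can be shown with a stripped-down sketch. The class names `BaseSolver` and `ThreadedSolver` and the logger name are stand-ins invented for this example, not the real PyClaw classes; the only point is that the base-class constructor has to run before self.logger is used.

```python
import logging
import os


class BaseSolver:
    def __init__(self):
        # In this sketch the base class is what creates the logger.
        self.logger = logging.getLogger("sketch.solver")


class ThreadedSolver(BaseSolver):
    def __init__(self):
        # Must come first: self.logger does not exist before this call.
        super().__init__()
        raw = os.environ.get("OMP_NUM_THREADS")
        if raw is None:
            self.logger.warning("OMP_NUM_THREADS is not set; using 1 OpenMP thread.")
            self.nthreads = 1
        else:
            self.nthreads = int(raw)
```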

mandli · 2015-11-06T18:11:18Z

I would perhaps send the message to a level that is not sent to the console so as to not worry users who are not concerned with such things. This should still send the message to the log file though so astute users can look there.

weslowrie · 2015-11-06T18:36:07Z

@mandli Do you mean setting it to the INFO or DEBUG logger level? I'm not sure why, but if the debug level is used, it is written to neither the console nor the file.

weslowrie · 2015-11-06T19:30:00Z

> @mandli I guess the first thing to do is just try a PETSc+OpenMP run. @weslowrie have you tried that already by any chance?

I tried a PETSc + OpenMP run, and superficially it looks like it is working well. I did not time the runs, but it was a noticeable speedup. I ran on 4 nodes (4 MPI jobs), with 4 OpenMP threads per node. The machine I used has 4 CPUs per node.

mandli · 2015-11-06T21:00:40Z

Yes to the logger level. Which is not being sent to the logfile?

weslowrie · 2015-11-06T21:11:19Z

If I modify the code to:
self.logger.debug(message)
The message is sent to neither the console nor logfile. Probably due to the default level set for the solver logging. I don't think I should modify the solver logging levels here.
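
For what it is worth, this matches how the standard-library logging module behaves in general: a record below the logger's own level is dropped before any handler sees it, while per-handler levels decide where an accepted record goes. The sketch below uses in-memory streams as stand-ins for the console and the log file and does not reflect PyClaw's actual logging configuration, which is not shown in this thread.

```python
import io
import logging

console_stream = io.StringIO()  # stands in for the console
file_stream = io.StringIO()     # stands in for the log file

logger = logging.getLogger("sketch.levels")
logger.setLevel(logging.DEBUG)  # the logger's own level is checked first
logger.propagate = False        # keep the example isolated from the root logger

console = logging.StreamHandler(console_stream)
console.setLevel(logging.WARNING)  # console only shows warnings and above
logfile = logging.StreamHandler(file_stream)
logfile.setLevel(logging.INFO)     # file also records informational messages
logger.addHandler(console)
logger.addHandler(logfile)

logger.info("OMP_NUM_THREADS is not set; using 1 OpenMP thread.")
logger.warning("shown in both places")
```

With a setup like this an INFO message reaches only the file, but that requires control over the handler levels, which is a configuration change beyond this PR.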

mandli · 2015-11-06T21:20:21Z

Huh, I thought that one of the levels got sent only to the file. Must have been wrong. I would avoid changing the logging perhaps.

weslowrie · 2015-11-13T19:23:47Z

@mandli Do you think it is worth investigating other solutions to the warning output/logging? Or just leave this as-is?

mandli · 2015-11-14T03:00:55Z

@weslowrie I think at this point it is fine as-is

…lit algorithm. The unsplit algorithm can also be updated. Usage requires setting the environment variable OMP_NUM_THREADS before program execution. In addition the gfortran flag '-fopenmp' and the f2py flag '-lgomp' are required when compiling step3ds.f90. This is done for a custom version of classic3.so, and might also be necessary for the general version.

…so that when compiling 'classic3.so' the openmp lines are compiled.

…law_openmp Update again.

…hdf5

mandli · 2017-11-25T00:43:54Z

This is already in a PR, the question is whether the default method in that example should also be changed.

…s using hdf5" This reverts commit 0b595de.

…ch from dimensional split to non-dimensional split solver.

…law_openmp

weslowrie added 3 commits October 16, 2015 14:04

Fixed bug when OMP_NUM_THREADS env variable is not set, and uses nthr…

2cffe8b

…eads=1

mandli reviewed Oct 17, 2015

Update solver.py

3bc6ba8

removed debugging print statement

weslowrie added 2 commits October 28, 2015 23:26

Merge branch 'pyclaw_openmp' of https://github.com/weslowrie/pyclaw i…

015ba3d

…nto pyclaw_openmp

Updated implementation to use 1 OpenMP thread by default and give a w…

9a871d6

…arning that the 'OMP_NUM_THREADS' environment variable is unset. This allows an unaware user to run their codes without any changes, and they will see the warning and can adjust accordingly.

small simplification to OMP_NUM_THREADS env variable check.

44230e8

Updated OpenMP warning to use the PyClaw logging. This prints to the …

2d325b4

…console and to the log file.

weslowrie added 8 commits July 5, 2016 15:05

Added F90 and f2py options for OpenMP (-fopenmp, and gomp => -lgomp) …

87f5f16

…so that when compiling 'classic3.so' the openmp lines are compiled.

Fixed bug when OMP_NUM_THREADS env variable is not set, and uses nthr…

00880f6

…eads=1

removed debugging print statement

3dad707

Updated implementation to use 1 OpenMP thread by default and give a w…

ba15781

…arning that the 'OMP_NUM_THREADS' environment variable is unset. This allows an unaware user to run their codes without any changes, and they will see the warning and can adjust accordingly.

small simplification to OMP_NUM_THREADS env variable check.

55c4a3d

Updated OpenMP warning to use the PyClaw logging. This prints to the …

92158db

…console and to the log file.

weslowrie force-pushed the pyclaw_openmp branch from 2d325b4 to 92158db Compare July 5, 2016 22:14

Merge branch 'master' of https://github.com/weslowrie/pyclaw into pyc…

b20bf4f

…law_openmp Update again.

clawpack deleted a comment from coveralls Jul 11, 2017

Various changes to pyclaw. Updates to properly handle restarts using …

0b595de

…hdf5

weslowrie force-pushed the pyclaw_openmp branch from 26f6238 to 0b595de Compare November 27, 2017 17:46

weslowrie added 4 commits November 27, 2017 09:47

Revert "Various changes to pyclaw. Updates to properly handle restart…

cb6a50d

…s using hdf5" This reverts commit 0b595de.

after_step readded to commit, and comments to be able to quickly swit…

7a87fef

…ch from dimensional split to non-dimensional split solver.

Merge branch 'master' of https://github.com/weslowrie/pyclaw into pyc…

851b522

…law_openmp

removed the dimensional/non-dimensional split comments.

f4c60a9
