Parallelize calculations in consequences-v3.10.0.py #58

Closed

Conversation

anthonyfok
Member

Use the Python multiprocessing package to take advantage of multiple CPU cores for processing multiple realizations simultaneously.

This would reduce the total run time of, for example,

    bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o

from 23 hours down to 6 hours on a c5a.24xlarge EC2 instance.
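
A minimal sketch of the approach (illustrative only; process_realization and the realization count are placeholders, not the actual names in consequences-v3.10.0.py):

    # Sketch: fan the per-realization work out over a pool of worker processes.
    # process_realization and N_REALIZATIONS are placeholders for illustration.
    from multiprocessing import Pool, cpu_count

    N_REALIZATIONS = 16

    def process_realization(rlz_id):
        # Read the inputs for this realization, compute its consequences,
        # and write its consequences-rlz-*.csv output (details omitted here).
        return rlz_id

    if __name__ == "__main__":
        with Pool(processes=min(cpu_count(), N_REALIZATIONS)) as pool:
            for done in pool.imap_unordered(process_realization, range(N_REALIZATIONS)):
                print(f"realization {done} done")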

Also, most Flake8 errors/warnings (mostly having to do with spacing) have been fixed.

Fixes #57

anthonyfok added 2 commits May 5, 2022 00:41
Use Python multiprocessing package to take advantage of multiple CPU cores
for processing multiple realizations simultaneously.

This would reduce the total run time of, for example,

    bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o

from 23 hours down to 6 hours on a c5a.24xlarge EC2 instance.

Fixes OpenDRR#57
@anthonyfok
Member Author

Uh oh, I spoke too soon. When run in parallel, we do not always get exactly the same results in the generated consequences-rlz-*.csv files. Sometimes we are lucky and get entirely identical results, but sometimes discrepancies show up somewhat randomly.

For example:

Uh oh, there are discrepancies in the consequences CSV files from the python3 scripts/consequences-v3.10.0.py -1 run (calc_id 36) compared with calc_id 31 from yesterday's run.
Of the 7,795,376 lines across the 16 files, 445 are different.
To see the differences:
cd ~afok/jr ; for i in consequences-rlz-*_36.csv; do echo; echo $i; colordiff -u $i ~/jr/1/${i/36/31}; done | less -R

Meanwhile, the python3 scripts/consequences-v3.10.0.py -2 run (calc_id 35) produced CSV files identical to those from calc_id 31.

Sample difference:

-1667113-COM2-RM1L-PC,1.0,"555,750.0","906,750.0","500,175.0",35.5,0.0,13.8,0,0.2,0.6,0.1,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.2
+1667113-COM2-RM1L-PC,1.0,"555,750.0","906,750.0","500,175.0",35.5,0.0,13.8,0,0.0,0.0,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0

Note the following columns: the 9th (sometimes, though not in this example), the 10th to 12th, and the last two columns. These seem to correspond to:

  • 9th column: collapse_ratio_str
  • 10th column: repair_time
  • 11th column: recovery_time
  • 12th column: interruption_time
  • 2nd last column: debris_brick_wood
  • last column: debris_concrete_steel

All of these involve NumPy np.dot calculations.

Adding the following to run_OQStandard.sh did not seem to help:

export MKL_NUM_THREADS=1
export MPI_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
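
For completeness, the same caps can also be applied from inside each worker process with the threadpoolctl package (a separate pip-installable package), which takes effect even after NumPy has been imported. A minimal, self-contained sketch, not verified against this particular problem:

    # Sketch: force single-threaded OpenBLAS inside each worker via threadpoolctl.
    # dot_checksum is only a stand-in for one realization's np.dot-heavy work.
    import numpy as np
    from multiprocessing import Pool
    from threadpoolctl import threadpool_limits

    def dot_checksum(seed):
        rng = np.random.default_rng(seed)
        a, b = rng.random((500, 500)), rng.random((500, 500))
        with threadpool_limits(limits=1, user_api="blas"):
            return float(np.dot(a, b).sum())

    if __name__ == "__main__":
        with Pool(processes=4) as pool:
            print(pool.map(dot_checksum, range(4)))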

NumPy, apparently installed via pip as part of the OpenQuake install process, is built with OpenBLAS:

$ python3
Python 3.8.10 (default, Nov 26 2021, 20:14:08) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__config__.show()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
>>> 

Note to self: More information may be found in the Slack DM between @jeremyrimando and me on Thu 2022-05-05.

This is beyond my capability, so I am documenting the problem we are seeing in the hope that an expert in Python multiprocessing and NumPy can help resolve it. Many thanks! 🙏

anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this pull request May 11, 2022
Taking advantage of multiple CPU cores, multiple python3 instances are
dispatched simultaneously using "GNU parallel" in run_OQStandard.sh
for consequences calculations.

Using "bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o"
as example, with each realization taking 82 minutes, doing 16 realizations
in parallel instead of in series would save 20.5 hours.  As consequences
calculations are done twice, the total run time is reduced by 41 hours,
from 56 hours down to 15 hours on a c5a.24xlarge EC2 instance.

Supersedes Pull Request OpenDRR#58

Fixes OpenDRR#57
anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this pull request May 11, 2022
Taking advantage of multiple CPU cores, multiple python3 instances are
dispatched simultaneously using "GNU parallel" in run_OQStandard.sh
for consequences calculations.

Using "bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o"
as example, with each realization taking 82 minutes, doing 16 realizations
in parallel instead of in series would save 20.5 hours.  As consequences
calculations are done twice, the total run time is reduced by 41 hours,
from 56 hours down to 15 hours on a c5a.24xlarge EC2 instance.

Unlike Python’s own multiprocessing module, GNU parallel launches multiple
fully independent Python processes with no memory sharing at all, which
avoids the potential for the mysterious calculation discrepancies in
NumPy’s OpenBLAS dot products seen in the superseded Pull Request OpenDRR#58.

Fixes OpenDRR#57
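
For illustration only (the actual change uses GNU parallel inside run_OQStandard.sh, and the script's real command-line arguments may differ), the same no-shared-state idea expressed from Python amounts to launching one completely separate python3 process per realization:

    # Sketch of the independent-process pattern: each realization gets its own
    # python3 interpreter, so no NumPy/OpenBLAS state is shared between them.
    # The per-realization argument passed to the script is an assumption.
    import subprocess

    N_REALIZATIONS = 16  # illustrative count

    procs = [
        subprocess.Popen(["python3", "scripts/consequences-v3.10.0.py", str(rlz)])
        for rlz in range(1, N_REALIZATIONS + 1)
    ]
    for p in procs:
        p.wait()
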
@anthonyfok anthonyfok closed this May 11, 2022
anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this pull request Oct 27, 2023
anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this pull request Oct 31, 2023
anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this pull request Nov 2, 2023
Development

Successfully merging this pull request may close these issues.

Investigate if scripts/consequences-v3.10.0.py could be optimized