Parallelize calculations in consequences-v3.10.0.py #58

Closed

Conversation

anthonyfok
Member

Use the Python multiprocessing package to take advantage of multiple CPU cores for processing multiple realizations simultaneously.

This would reduce the total run time of, for example,

    bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o

from 23 hours down to 6 hours on a c5a.24xlarge EC2 instance.
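
A minimal sketch of the approach (illustrative only; process_realization and the realization count are placeholders, not the actual names in consequences-v3.10.0.py):

    # Sketch: fan the per-realization work out over a pool of worker processes.
    # process_realization and N_REALIZATIONS are placeholders for illustration.
    from multiprocessing import Pool, cpu_count

    N_REALIZATIONS = 16

    def process_realization(rlz_id):
        # Read the inputs for this realization, compute its consequences,
        # and write its consequences-rlz-*.csv output (details omitted here).
        return rlz_id

    if __name__ == "__main__":
        with Pool(processes=min(cpu_count(), N_REALIZATIONS)) as pool:
            for done in pool.imap_unordered(process_realization, range(N_REALIZATIONS)):
                print(f"realization {done} done")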

Also, most Flake8 errors/warnings (mostly having to do with spacing) have been fixed.

Fixes #57

anthonyfok added 2 commits May 5, 2022 00:41
Use Python multiprocessing package to take advantage of multiple CPU cores
for processing multiple realizations simultaneously.

This would reduce the total run time of, for example,

    bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o

from 23 hours down to 6 hours on a c5a.24xlarge EC2 instance.

Fixes OpenDRR#57
@anthonyfok
Member Author

Uh oh, I spoke too soon. When run in parallel, we do not always get exactly the same results in the generated consequences-rlz-*.csv files. Sometimes we are lucky and get entirely identical results, but sometimes discrepancies show up somewhat randomly.

For example:

Uh oh, there are discrepancies in the consequences CSV files from the python3 scripts/consequences-v3.10.0.py -1 run (calc_id 36) compared with calc_id 31 from yesterday's run.
Of the 7,795,376 lines across the 16 files, 445 are different.
To see the differences:
cd ~afok/jr ; for i in consequences-rlz-*_36.csv; do echo; echo $i; colordiff -u $i ~/jr/1/${i/36/31}; done | less -R

Meanwhile, the python3 scripts/consequences-v3.10.0.py -2 run (calc_id 35) produced CSV files identical to those from calc_id 31.

Sample difference:

-1667113-COM2-RM1L-PC,1.0,"555,750.0","906,750.0","500,175.0",35.5,0.0,13.8,0,0.2,0.6,0.1,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.2
+1667113-COM2-RM1L-PC,1.0,"555,750.0","906,750.0","500,175.0",35.5,0.0,13.8,0,0.0,0.0,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0

Note the following columns: the 9th (sometimes, though not in this example), the 10th to 12th, and the last two columns. These seem to correspond to:

  • 9th column: collapse_ratio_str
  • 10th column: repair_time
  • 11th column: recovery_time
  • 12th column: interruption_time
  • 2nd last column: debris_brick_wood
  • last column: debris_concrete_steel

All of these involve NumPy np.dot calculations.

Adding the following to run_OQStandard.sh did not seem to help:

export MKL_NUM_THREADS=1
export MPI_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
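
For completeness, the same caps can also be applied from inside each worker process with the threadpoolctl package (a separate pip-installable package), which takes effect even after NumPy has been imported. A minimal, self-contained sketch, not verified against this particular problem:

    # Sketch: force single-threaded OpenBLAS inside each worker via threadpoolctl.
    # dot_checksum is only a stand-in for one realization's np.dot-heavy work.
    import numpy as np
    from multiprocessing import Pool
    from threadpoolctl import threadpool_limits

    def dot_checksum(seed):
        rng = np.random.default_rng(seed)
        a, b = rng.random((500, 500)), rng.random((500, 500))
        with threadpool_limits(limits=1, user_api="blas"):
            return float(np.dot(a, b).sum())

    if __name__ == "__main__":
        with Pool(processes=4) as pool:
            print(pool.map(dot_checksum, range(4)))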

NumPy, apparently installed via pip as part of the OpenQuake install process, is built with OpenBLAS:

$ python3
Python 3.8.10 (default, Nov 26 2021, 20:14:08) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__config__.show()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
>>> 

Note to self: More information may be found in the Slack DM between @jeremyrimando and me on Thu 2022-05-05.

This is beyond my capability, so I am documenting the problem we are seeing in the hope that an expert in Python multiprocessing and NumPy can help resolve it. Many thanks! 🙏

anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this pull request May 11, 2022
Taking advantage of multiple CPU cores, multiple python3 instances are
dispatched simultaneously using "GNU parallel" in run_OQStandard.sh
for consequences calculations.

Using "bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o"
as example, with each realization taking 82 minutes, doing 16 realizations
in parallel instead of in series would save 20.5 hours.  As consequences
calculations are done twice, the total run time is reduced by 41 hours,
from 56 hours down to 15 hours on a c5a.24xlarge EC2 instance.

Supersedes Pull Request OpenDRR#58

Fixes OpenDRR#57
anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this pull request May 11, 2022
Taking advantage of multiple CPU cores, multiple python3 instances are
dispatched simultaneously using "GNU parallel" in run_OQStandard.sh
for consequences calculations.

Using "bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o"
as example, with each realization taking 82 minutes, doing 16 realizations
in parallel instead of in series would save 20.5 hours.  As consequences
calculations are done twice, the total run time is reduced by 41 hours,
from 56 hours down to 15 hours on a c5a.24xlarge EC2 instance.

Unlike Python’s own multiprocessing module, GNU parallel launches multiple
fully independent Python processes with no memory sharing at all, which
avoids the potential for the mysterious calculation discrepancies in
NumPy’s OpenBLAS dot products seen in the superseded Pull Request OpenDRR#58.

Fixes OpenDRR#57
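
For illustration only (the actual change uses GNU parallel inside run_OQStandard.sh, and the script's real command-line arguments may differ), the same no-shared-state idea expressed from Python amounts to launching one completely separate python3 process per realization:

    # Sketch of the independent-process pattern: each realization gets its own
    # python3 interpreter, so no NumPy/OpenBLAS state is shared between them.
    # The per-realization argument passed to the script is an assumption.
    import subprocess

    N_REALIZATIONS = 16  # illustrative count

    procs = [
        subprocess.Popen(["python3", "scripts/consequences-v3.10.0.py", str(rlz)])
        for rlz in range(1, N_REALIZATIONS + 1)
    ]
    for p in procs:
        p.wait()
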
@anthonyfok anthonyfok closed this May 11, 2022
anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this pull request Oct 27, 2023
anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this pull request Oct 31, 2023
anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this pull request Nov 2, 2023
Development

Successfully merging this pull request may close these issues.

Investigate if scripts/consequences-v3.10.0.py could be optimized