Parallelize calculations in consequences-v3.10.0.py #58
Use Python multiprocessing package to take advantage of multiple CPU cores for processing multiple realizations simultaneously. This would reduce the total run time of, for example, bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o from 23 hours down to 6 hours on a c5a.24xlarge EC2 instance. Fixes OpenDRR#57
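For illustration, here is a minimal sketch of the approach described above: one worker process per realization, dispatched through `multiprocessing.Pool`. The function `process_realization` and its placeholder arithmetic are hypothetical stand-ins, not the code in consequences-v3.10.0.py.

```python
import multiprocessing as mp


def process_realization(rlz_id):
    """Hypothetical stand-in for the per-realization consequences calculation."""
    # Placeholder arithmetic only; the real script does the actual work here.
    total = float(sum(i * rlz_id for i in range(1000)))
    return rlz_id, total


def run_all(realizations, workers=None):
    """Process every realization in a pool of worker processes."""
    with mp.Pool(processes=workers) as pool:
        # pool.map returns results in the same order as the input sequence.
        return dict(pool.map(process_realization, realizations))


if __name__ == "__main__":
    results = run_all(range(16), workers=4)
    print(len(results), "realizations processed")
```

The `if __name__ == "__main__":` guard matters on platforms that start workers with the spawn method, since each worker re-imports the main module.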
Uh oh, I spoke too soon. When run in parallel, the generated consequences-rlt*.csv files do not always match the results of a serial run exactly. Sometimes we are lucky and get entirely identical results, but sometimes discrepancies show up seemingly at random. For example:
Sample difference:
-1667113-COM2-RM1L-PC,1.0,"555,750.0","906,750.0","500,175.0",35.5,0.0,13.8,0,0.2,0.6,0.1,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.2
+1667113-COM2-RM1L-PC,1.0,"555,750.0","906,750.0","500,175.0",35.5,0.0,13.8,0,0.0,0.0,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Note the following columns: the 10th to 12th and the last two (sometimes also the 9th, though not in this example). These seem to correspond to:
They involve NumPy np.dot calculations. Adding the following to run_OQStandard.sh did not seem to help:
NumPy, which appears to have been installed using pip as part of the OpenQuake install process, is apparently built with OpenBLAS:
Note to self: More information may be found in the Slack DM between @jeremyrimando and me on Thu 2022-05-05.
This is beyond my capability, so I am documenting the problem we are seeing in the hope that an expert in Python multiprocessing and NumPy can help resolve it. Many thanks! 🙏
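To pinpoint which columns differ between a serial run and a parallel run, a column-wise comparison along these lines may help. This is only a sketch: the helper name `differing_columns` is made up for illustration, and it assumes both files list the same rows in the same order.

```python
import csv
import sys


def differing_columns(lines_a, lines_b):
    """Count, per 1-based column number, the rows whose values differ.

    Both arguments are iterables of CSV lines (open files or lists of strings).
    Quoted fields such as "555,750.0" are parsed as a single column.
    """
    counts = {}
    for row_a, row_b in zip(csv.reader(lines_a), csv.reader(lines_b)):
        for col, (a, b) in enumerate(zip(row_a, row_b), start=1):
            if a != b:
                counts[col] = counts.get(col, 0) + 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_csv.py serial.csv parallel.csv
    with open(sys.argv[1], newline="") as fa, open(sys.argv[2], newline="") as fb:
        for col, n in sorted(differing_columns(fa, fb).items()):
            print(f"column {col}: {n} differing row(s)")
```

The comparison is on the text of each field, so it reports any difference in the written output, including formatting-only differences.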
Taking advantage of multiple CPU cores, multiple python3 instances are dispatched simultaneously using "GNU parallel" in run_OQStandard.sh for consequences calculations. Using "bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o" as example, with each realization taking 82 minutes, doing 16 realizations in parallel instead of in series would save 20.5 hours. As consequences calculations are done twice, the total run time is reduced by 41 hours, from 56 hours down to 15 hours on a c5a.24xlarge EC2 instance. Supersedes Pull Request OpenDRR#58 Fixes OpenDRR#57
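The change itself uses GNU parallel from a shell script. Purely as an illustration of the same idea, the sketch below launches one fresh Python interpreter per realization from Python, using `subprocess` and a thread pool in place of GNU parallel. The child command is a hypothetical placeholder, not the real consequences invocation.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(realization):
    """Run one realization in its own Python process and return its stdout."""
    # Hypothetical child command; the real script and its arguments go here.
    cmd = [sys.executable, "-c", f"print({realization} * 2)"]
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout.strip()


def run_all(realizations, jobs=4):
    """Keep up to `jobs` independent interpreters running at once."""
    # Threads only wait on the child processes; the work happens in the children.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, realizations))


if __name__ == "__main__":
    print(run_all(range(1, 5), jobs=2))
```

Each child is a separate interpreter with its own memory, so nothing is shared between realizations.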
Taking advantage of multiple CPU cores, multiple python3 instances are dispatched simultaneously using "GNU parallel" in run_OQStandard.sh for consequences calculations. Using "bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o" as example, with each realization taking 82 minutes, doing 16 realizations in parallel instead of in series would save 20.5 hours. As consequences calculations are done twice, the total run time is reduced by 41 hours, from 56 hours down to 15 hours on a c5a.24xlarge EC2 instance. Unlike Python’s own multiprocessing module, GNU parallel launches separate Python processes that do not share any memory at all, which avoids the unexplained calculation discrepancies with NumPy’s OpenBLAS dot products seen in superseded Pull Request OpenDRR#58. Fixes OpenDRR#57
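The run-time figures above can be checked with a few lines of arithmetic. The sketch assumes all 16 realizations run at the same time, so that a parallel pass takes as long as one realization; the variable names are illustrative.

```python
MINUTES_PER_REALIZATION = 82
REALIZATIONS = 16
PASSES = 2  # the consequences calculations are done twice
TOTAL_BEFORE_H = 56  # total run time before the change, from the text

serial_min = MINUTES_PER_REALIZATION * REALIZATIONS  # one pass, in series
parallel_min = MINUTES_PER_REALIZATION  # one pass, all realizations at once

saved_per_pass_h = (serial_min - parallel_min) / 60
total_saved_h = saved_per_pass_h * PASSES
total_after_h = TOTAL_BEFORE_H - total_saved_h

print(saved_per_pass_h, total_saved_h, total_after_h)  # 20.5 41.0 15.0
```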
Use Python multiprocessing package to take advantage of multiple CPU cores for processing multiple realizations simultaneously.
This would reduce the total run time of, for example,
bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o
from 23 hours down to 6 hours on a c5a.24xlarge EC2 instance.
Also, most Flake8 errors/warnings (mostly having to do with spacing) have been fixed.
Fixes #57