Compile activity detected when running Sharrow in production mode #756
It is likely that (re)compiling is being triggered because some DataFrame column data type in production mode differs from the type seen in the compile step. This can happen due to corner cases (e.g. rare instances where no choice is valid and the choice comes back as "null" instead of an integer), or simply from having more observations, if a column is promoted to a larger bit width to prevent an overflow. Solutions can include: (a) run a much larger sample in the compile step so you encounter all these corner cases there instead of in production; (b) just run production again, since all your compiling should be cached now; and/or (c) look forward to a future version of ActivitySim where an explicit data model prevents dtypes from changing unpredictably during a model run.
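A minimal sketch of the first corner case described above (the column name here is hypothetical): when even one row has no valid choice, pandas stores a missing value and silently promotes the whole column from an integer dtype to float64, so the production dtype no longer matches the one sharrow compiled against.

```python
import pandas as pd

# Compile step: every row gets a valid integer choice.
compile_df = pd.DataFrame({"choice": [3, 1, 2]})

# Production run: one rare case has no valid choice, so the column
# contains a missing value and pandas promotes it to float64.
production_df = pd.DataFrame({"choice": [3, 1, None, 2]})

print(compile_df["choice"].dtype)     # int64
print(production_df["choice"].dtype)  # float64 -- a dtype mismatch like this
                                      # would trigger a recompile
```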
Thanks, @jpn--. I tested your solutions. Solution (a) was not easy to work with: even with a 50% sample in test mode, the recompiling still seemed to happen in production, and running a 100% sample in test mode oddly resulted in a memory crash. I tested solution (b), and a subsequent production run no longer showed the recompiling note in the timing log, but that run still took just as long (300+ min). So I am not sure how much (or whether) this recompiling bug is the problem. My tests show that if I take out all the calibration constants (of which we have about 40), the production runtime drops to 38 min. Those constants are defined in the following format, with _COUNTY being a temp variable defined at the top. Do you see any problem with this way of defining the constants?
@aletzdy, is the spec file for this component published somewhere on GitHub where I can see it? If not, can you send it to me? Thanks.
It is similar to the mwcog_example spec, but with some calibration-related updates:
Another piece of potentially relevant info: the current model implementation reads the area type and county matrices as separate OMX files. These pseudo-skim files are created separately with a Python script so that the workplace location model can fetch the county or area type of an alternative destination. I initially suspected this might be the issue, so I merged all the skim files into one and verified that the zarr digital encoding works fine (I checked the created zarr cache), and it all looks good to me. But I am not sure whether there might be a data type issue here.
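One way to rule out a dtype issue in the merged skims is to list each table's dtype and flag any that differ from the expected skim dtype. The sketch below uses NumPy arrays as hypothetical stand-ins for the county and area-type pseudo-skim tables (the names and expected float32 dtype are assumptions); real code would read each matrix from the OMX file and check its `.dtype` the same way.

```python
import numpy as np

# Hypothetical stand-ins for tables in the merged skim file; a real check
# would load each matrix from the OMX file instead.
skims = {
    "COUNTY": np.zeros((3, 3), dtype=np.int8),      # pseudo-skim, integer
    "AREATYPE": np.zeros((3, 3), dtype=np.float32),
    "DIST": np.zeros((3, 3), dtype=np.float64),
}

# Flag any table whose dtype differs from the expected skim dtype (assumed
# float32 here); these are candidates for causing a recompile.
mismatched = {name: mat.dtype for name, mat in skims.items()
              if mat.dtype != np.float32}
print(mismatched)
```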
@jpn-- I wanted to check back on this issue and see if you have any suggestions on how we can fix it. |
Closed by #782 |
Describe the bug
This bug is encountered in the MWCOG model. The compile step of sharrow (running in test mode) concludes successfully, with the sharrow cache folder created. Running the model in sharrow production mode also concludes successfully, but the runtimes of a few model steps (and the overall runtime) are much longer than in the non-sharrow version. Workplace location stands out in particular, with the sharrow version taking 320 min vs. 50 min in non-sharrow mode. The notes column in the production run's timing_log.csv shows compiled for these steps. According to the source code, this indicates a bug that needs to be investigated.
@jpn-- Any suggestions on why this is happening?