Compile activity detected when running Sharrow in production mode #756
It is likely that (re)compiling is being triggered because some DataFrame column data type in production mode differs from the type seen in the compile step. This can happen due to corner cases (e.g. rare instances where no choice is valid and the choice comes back as "null" instead of an integer), or simply from having more observations, if a column is promoted to a larger bit width to prevent an overflow. Solutions can include: (a) run a much larger sample in the compile step so you encounter all these corner cases there instead of in production; (b) just run production again, since all your compiling should be cached now; and/or (c) look forward to a future version of ActivitySim where an explicit data model prevents dtypes from changing unpredictably during a model run.
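A minimal sketch of the first corner case described above (the column name here is hypothetical): when even one row has no valid choice, pandas stores a missing value and silently promotes the whole column from an integer dtype to float64, so the production dtype no longer matches the one sharrow compiled against.

```python
import pandas as pd

# Compile step: every row gets a valid integer choice.
compile_df = pd.DataFrame({"choice": [3, 1, 2]})

# Production run: one rare case has no valid choice, so the column
# contains a missing value and pandas promotes it to float64.
production_df = pd.DataFrame({"choice": [3, 1, None, 2]})

print(compile_df["choice"].dtype)     # int64
print(production_df["choice"].dtype)  # float64 -- a dtype mismatch like this
                                      # would trigger a recompile
```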
Thanks, @jpn--. I tested your solutions. Solution (a) was not easy to work with: even with a 50% sample in test mode, the recompiling still seemed to happen in production, and running a 100% sample in test mode oddly resulted in a memory crash. I tested solution (b), and a subsequent production run no longer showed the recompiling note in the timing log, but that run still took just as long (300+ min). So I am not sure how much (or whether) this recompiling bug is the problem. My tests show that if I take out all the calibration constants (of which we have about 40), the production runtime drops to 38 min. Those constants are defined in the following format, with _COUNTY being a temp variable defined at the top. Do you see any problem with this way of defining the constants?
@aletzdy, is the spec file for this component published somewhere on GitHub where I can see it? If not, can you send it to me? Thanks.
It is similar to the mwcog_example spec, but with some calibration-related updates:
Another piece of potentially relevant info: the current model implementation reads the area type and county matrices as separate OMX files. These pseudo-skim files are created separately with a Python script so that the workplace location model can fetch the county or area type of an alternative destination. I initially suspected this might be the issue, so I merged all the skim files into one and verified that the zarr digital encoding works fine (I checked the created zarr cache), and it all looks good to me. But I am not sure whether there might be a data type issue here.
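One way to rule out a dtype issue in the merged skims is to list each table's dtype and flag any that differ from the expected skim dtype. The sketch below uses NumPy arrays as hypothetical stand-ins for the county and area-type pseudo-skim tables (the names and expected float32 dtype are assumptions); real code would read each matrix from the OMX file and check its `.dtype` the same way.

```python
import numpy as np

# Hypothetical stand-ins for tables in the merged skim file; a real check
# would load each matrix from the OMX file instead.
skims = {
    "COUNTY": np.zeros((3, 3), dtype=np.int8),      # pseudo-skim, integer
    "AREATYPE": np.zeros((3, 3), dtype=np.float32),
    "DIST": np.zeros((3, 3), dtype=np.float64),
}

# Flag any table whose dtype differs from the expected skim dtype (assumed
# float32 here); these are candidates for causing a recompile.
mismatched = {name: mat.dtype for name, mat in skims.items()
              if mat.dtype != np.float32}
print(mismatched)
```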
@jpn-- I wanted to check back on this issue and see if you have any suggestions on how we can fix it. |
Closed by #782 |
Describe the bug
This bug is encountered in the MWCOG model. The compile step of sharrow (running in test mode) concludes successfully, with the sharrow cache folder created. Running the model in sharrow production mode also concludes successfully, but the runtimes of a few model steps (and the overall runtime) are much longer than in the non-sharrow version. Workplace location stands out in particular, with the sharrow version taking 320 min vs. 50 min in non-sharrow mode. The notes column in the production run's timing_log.csv shows compiled for these steps. According to the source code, this indicates a bug that needs to be investigated.
@jpn-- Any suggestions on why this is happening?