Changes to support sharrow on 2-zone model #867

Merged Jul 3, 2024 (48 commits; changes shown are from 29 commits)

Commits
915e8d7  test external regional model examples (jpn--, Apr 23, 2024)
3038256  cache buster (jpn--, Apr 23, 2024)
3bdc1a4  optional variable doc (jpn--, Apr 24, 2024)
b876e9c  fix conda cache dirs (jpn--, Apr 24, 2024)
f578e72  Merge branch 'main' into external-regions (jpn--, Apr 24, 2024)
9c282f0  Merge branch 'main' into external-regions (jpn--, Apr 24, 2024)
e41054f  Merge branch 'main' into external-regions (jpn--, Apr 25, 2024)
6438f59  trip_destination alts preprocessor (dhensle, May 2, 2024)
33097eb  non_hh_veh cat, drop unused cols for alts (dhensle, May 4, 2024)
a61b2d5  blacken (dhensle, May 4, 2024)
4c4a9d9  adding missed alts columns used in xborder model (dhensle, May 4, 2024)
6cb139b  remove unneeded addition to categorical (dhensle, May 6, 2024)
0fa0269  clearer time logging (jpn--, May 2, 2024)
35a57d4  bump required numba to 0.57 for np.nan_to_num (jpn--, May 6, 2024)
5ea7362  sharrow docs (jpn--, May 6, 2024)
ac2468f  use compute_setting in sharrow debugging (jpn--, May 6, 2024)
97ff87b  fix comment (jpn--, May 6, 2024)
26aef51  debug helper values (jpn--, May 6, 2024)
16fc8e2  dtype compute fixes (jpn--, May 6, 2024)
75d1a1d  land_use_columns_orig (jpn--, May 7, 2024)
11a01a7  fix and test orig_land_use with explicit chunking (jpn--, May 7, 2024)
5eb076a  repair (jpn--, May 7, 2024)
0c7221a  add missing test result file (jpn--, May 7, 2024)
b7b7244  omx_ignore_patterns (jpn--, May 8, 2024)
17826ed  revert change to drop size terms (dhensle, May 9, 2024)
96483c0  Merge branch 'main' into trip_dest_alts_preprocess (jpn--, May 9, 2024)
077b8a1  creating separate sample and simulate preprocessors (dhensle, May 9, 2024)
bd03050  Merge branch 'trip_dest_alts_preprocess' of https://github.com/dhensl… (dhensle, May 9, 2024)
b911f1f  bugfix (jpn--, May 13, 2024)
3cabeda  skim_dataset loading without dask (jpn--, May 13, 2024)
0016ea5  require sharrow 2.9 (jpn--, May 13, 2024)
ee5b8cc  Merge branch 'trip_dest_alts_preprocess' into sharrow-fix-all (jpn--, May 13, 2024)
e630edf  wait to close open files (jpn--, May 14, 2024)
2607029  require sharrow 2.9.1 (jpn--, May 14, 2024)
f1ee710  Merge commit '564c4762944d3288097eb4ef761bdbdb0f9d4d9f' into sharrow-… (jpn--, May 14, 2024)
96d4bb6  landuse index sort before sharrow recode check (dhensle, May 20, 2024)
79a1a0a  decode time periods (jpn--, May 21, 2024)
0bfa915  use original tazs where possible (jpn--, May 21, 2024)
79cd6a2  Merge branch 'main' into external-regions (jpn--, May 21, 2024)
d02a709  Merge commit '79cd6a2544c4162469057e99b6ec327f3f362306' into sharrow-… (jpn--, May 22, 2024)
a551bfa  update numba in envs to 0.57 (jpn--, May 22, 2024)
93ed2df  Merge branch 'main' into sharrow-fix-all (jpn--, May 22, 2024)
d98f776  no fastmath in tour mode choice (jpn--, May 23, 2024)
b465dd0  sharrow cache by version (jpn--, Jun 2, 2024)
fe13e93  include sharrow setting in log by defualt (dhensle, Jun 19, 2024)
2f260f2  Merge commit 'bd48d3db3624a20771095cf3252549eb10315375' into sharrow-… (jpn--, Jun 21, 2024)
fcf7295  use dask if required (jpn--, Jun 21, 2024)
c9d4205  store_skims_in_shm setting (jpn--, Jun 21, 2024)
78 changes: 0 additions & 78 deletions .github/workflows/core_tests.yml
@@ -47,19 +47,6 @@ jobs:
       - name: Update environment
         run: |
           mamba env update -n asim-test -f conda-environments/github-actions-tests.yml
-          mamba install --yes \
-            "psutil=5.9.5" \
-            "pydantic=2.6.1" \
-            "pypyr=5.8.0" \
-            "pytables=3.6.1" \
-            "pytest-cov" \
-            "pytest-regressions=2.5.0" \
-            "scikit-learn=1.2.2" \
-            "sharrow>=2.6.0" \
-            "simwrapper=1.8.5" \
-            "xarray=2023.2.0" \
-            "zarr=2.14.2" \
-            "zstandard=0.21.0"
         if: steps.cache.outputs.cache-hit != 'true'

       - name: Install activitysim
@@ -147,19 +134,6 @@ jobs:
       - name: Update environment
         run: |
           mamba env update -n asim-test -f conda-environments/github-actions-tests.yml
-          mamba install --yes \
-            "psutil=5.9.5" \
-            "pydantic=2.6.1" \
-            "pypyr=5.8.0" \
-            "pytables=3.6.1" \
-            "pytest-cov" \
-            "pytest-regressions=2.5.0" \
-            "scikit-learn=1.2.2" \
-            "sharrow>=2.6.0" \
-            "simwrapper=1.8.5" \
-            "xarray=2023.2.0" \
-            "zarr=2.14.2" \
-            "zstandard=0.21.0"
         if: steps.cache.outputs.cache-hit != 'true'

       - name: Install activitysim
@@ -244,19 +218,6 @@ jobs:
       - name: Update environment
         run: |
           mamba env update -n asim-test -f conda-environments/github-actions-tests.yml
-          mamba install --yes \
-            "psutil=5.9.5" \
-            "pydantic=2.6.1" \
-            "pypyr=5.8.0" \
-            "pytables=3.6.1" \
-            "pytest-cov" \
-            "pytest-regressions=2.5.0" \
-            "scikit-learn=1.2.2" \
-            "sharrow>=2.6.0" \
-            "simwrapper=1.8.5" \
-            "xarray=2023.2.0" \
-            "zarr=2.14.2" \
-            "zstandard=0.21.0"
         if: steps.cache.outputs.cache-hit != 'true'

       - name: Install activitysim
@@ -341,19 +302,6 @@ jobs:
       - name: Update environment
         run: |
           mamba env update -n asim-test -f conda-environments/github-actions-tests.yml
-          mamba install --yes \
-            "psutil=5.9.5" \
-            "pydantic=2.6.1" \
-            "pypyr=5.8.0" \
-            "pytables=3.6.1" \
-            "pytest-cov" \
-            "pytest-regressions=2.5.0" \
-            "scikit-learn=1.2.2" \
-            "sharrow>=2.6.0" \
-            "simwrapper=1.8.5" \
-            "xarray=2023.2.0" \
-            "zarr=2.14.2" \
-            "zstandard=0.21.0"
         if: steps.cache.outputs.cache-hit != 'true'

       - name: Install activitysim
@@ -408,19 +356,6 @@ jobs:
       - name: Update environment
         run: |
           mamba env update -n asim-test -f conda-environments/github-actions-tests.yml
-          mamba install --yes \
-            "psutil=5.9.5" \
-            "pydantic=2.6.1" \
-            "pypyr=5.8.0" \
-            "pytables=3.6.1" \
-            "pytest-cov" \
-            "pytest-regressions=2.5.0" \
-            "scikit-learn=1.2.2" \
-            "sharrow>=2.6.0" \
-            "simwrapper=1.8.5" \
-            "xarray=2023.2.0" \
-            "zarr=2.14.2" \
-            "zstandard=0.21.0"
         if: steps.cache.outputs.cache-hit != 'true'

       - name: Install activitysim
@@ -474,19 +409,6 @@ jobs:
       - name: Update environment
         run: |
           mamba env update -n asim-test -f conda-environments/github-actions-tests.yml
-          mamba install --yes \
-            "psutil=5.9.5" \
-            "pydantic=2.6.1" \
-            "pypyr=5.8.0" \
-            "pytables=3.6.1" \
-            "pytest-cov" \
-            "pytest-regressions=2.5.0" \
-            "scikit-learn=1.2.2" \
-            "sharrow>=2.6.0" \
-            "simwrapper=1.8.5" \
-            "xarray=2023.2.0" \
-            "zarr=2.14.2" \
-            "zstandard=0.21.0"
         if: steps.cache.outputs.cache-hit != 'true'

       - name: Install Larch
35 changes: 34 additions & 1 deletion activitysim/abm/models/accessibility.py
@@ -23,10 +23,25 @@ class AccessibilitySettings(PydanticReadable):
CONSTANTS: dict[str, Any] = {}

land_use_columns: list[str] = []
-"""Only include the these columns in the computational tables
+"""Only include these columns in the computational tables.

This setting joins land use columns to the accessibility destinations.

Memory usage is reduced by listing only the minimum columns needed by
the SPEC, and nothing extra.
"""

land_use_columns_orig: list[str] = []
"""Join these land use columns to the origin zones.

This setting joins land use columns to the accessibility origins.
To disambiguate them from the destination land use columns, the added
columns are prefixed with 'landuse_orig_'.

Memory usage is reduced by listing only the minimum columns needed by
the SPEC, and nothing extra.

.. versionadded:: 1.3
"""

SPEC: str = "accessibility.csv"
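The new `land_use_columns_orig` setting amounts to selecting a subset of land use columns and prefixing them for the origin side. A minimal pandas sketch of the equivalent operation (the table and its values are illustrative, not from a real model):

```python
import pandas as pd

# Illustrative zone-level land use table.
land_use = pd.DataFrame(
    {"TOTACRE": [10.0, 20.0, 30.0], "TOTEMP": [100, 200, 300]},
    index=pd.Index([0, 1, 2], name="zone_id"),
)

# Select the columns named in land_use_columns_orig and prefix them so they
# cannot collide with the destination land use columns of the same name.
land_use_columns_orig = ["TOTACRE"]
orig_land_use = land_use[land_use_columns_orig].add_prefix("landuse_orig_")

print(list(orig_land_use.columns))  # ['landuse_orig_TOTACRE']
```

The actual implementation below applies the prefix per-column while building the OD table; `add_prefix` is just a compact way to show the naming convention.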
@@ -55,6 +70,7 @@ def compute_accessibilities_for_zones(
state: workflow.State,
accessibility_df: pd.DataFrame,
land_use_df: pd.DataFrame,
orig_land_use_df: pd.DataFrame | None,
assignment_spec: dict,
constants: dict,
network_los: los.Network_LOS,
@@ -69,6 +85,7 @@
state : workflow.State
accessibility_df : pd.DataFrame
land_use_df : pd.DataFrame
orig_land_use_df : pd.DataFrame | None
assignment_spec : dict
constants : dict
network_los : los.Network_LOS
@@ -101,6 +118,12 @@
logger.debug(f"{trace_label}: tiling land_use_columns into od_data")
for c in land_use_df.columns:
od_data[c] = np.tile(land_use_df[c].to_numpy(), orig_zone_count)
if orig_land_use_df is not None:
logger.debug(f"{trace_label}: repeating orig_land_use_columns into od_data")
for c in orig_land_use_df:
od_data[f"landuse_orig_{c}"] = np.repeat(
orig_land_use_df[c], dest_zone_count
)
logger.debug(f"{trace_label}: converting od_data to DataFrame")
od_df = pd.DataFrame(od_data)
logger.debug(f"{trace_label}: dropping od_data")
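The tile/repeat pairing above is what builds the OD cross-join: rows enumerate destinations fastest, so destination attributes are tiled while origin attributes are repeated. A small standalone sketch of the pattern:

```python
import numpy as np

orig_vals = np.array([10, 20])   # one value per origin zone
dest_vals = np.array([1, 2, 3])  # one value per destination zone

# Row k of the OD table pairs origin k // 3 with destination k % 3, so
# destination attributes cycle (tile) and origin attributes block (repeat).
od_dest = np.tile(dest_vals, len(orig_vals))    # [1 2 3 1 2 3]
od_orig = np.repeat(orig_vals, len(dest_vals))  # [10 10 10 20 20 20]
```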
@@ -233,6 +256,11 @@ def compute_accessibility(
land_use_df = land_use
land_use_df = land_use_df[land_use_columns]

if model_settings.land_use_columns_orig:
orig_land_use_df = land_use[model_settings.land_use_columns_orig]
else:
orig_land_use_df = None

logger.info(
f"Running {trace_label} with {len(accessibility_df.index)} orig zones "
f"{len(land_use_df)} dest zones"
@@ -249,10 +277,15 @@
) in chunk.adaptive_chunked_choosers(
state, accessibility_df, trace_label, explicit_chunk_size=explicit_chunk_size
):
if orig_land_use_df is not None:
orig_land_use_df_chunk = orig_land_use_df.loc[chooser_chunk.index]
else:
orig_land_use_df_chunk = None
accessibilities = compute_accessibilities_for_zones(
state,
chooser_chunk,
land_use_df,
orig_land_use_df_chunk,
assignment_spec,
constants,
network_los,
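The chunk loop above slices the origin land use table with `.loc` on each chooser chunk's index, keeping the two tables row-aligned. A simplified sketch of that pattern (hypothetical data and a hand-rolled split, not the actual adaptive chunking API):

```python
import pandas as pd

# Hypothetical choosers and a companion table sharing the same index.
choosers = pd.DataFrame({"orig": [5, 9, 2, 7]}, index=[11, 12, 13, 14])
companion = pd.DataFrame({"TOTACRE": [1.0, 2.0, 3.0, 4.0]}, index=choosers.index)

# Slice the companion table with .loc on the chunk's index so its rows stay
# aligned with the chooser rows in that chunk, whatever the chunk boundaries.
for chunk in (choosers.iloc[:2], choosers.iloc[2:]):
    companion_chunk = companion.loc[chunk.index]
    assert companion_chunk.index.equals(chunk.index)
```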
13 changes: 7 additions & 6 deletions activitysim/abm/tables/landuse.py
@@ -21,6 +21,13 @@
def land_use(state: workflow.State):
df = read_input_table(state, "land_use")

# try to make life easy for everybody by keeping everything in canonical order
# but as long as coalesce_pipeline doesn't sort tables it coalesces, it might not stay in order
# so even though we do this, anyone downstream who depends on it, should look out for themselves...
if not df.index.is_monotonic_increasing:
logger.info("sorting land_use index")
df = df.sort_index()

sharrow_enabled = state.settings.sharrow
if sharrow_enabled:
err_msg = (
@@ -34,12 +41,6 @@ def land_use(state: workflow.State):
assert df.index[-1] == len(df.index) - 1, err_msg
assert df.index.dtype.kind == "i", err_msg

# try to make life easy for everybody by keeping everything in canonical order
# but as long as coalesce_pipeline doesn't sort tables it coalesces, it might not stay in order
# so even though we do this, anyone downstream who depends on it, should look out for themselves...
if not df.index.is_monotonic_increasing:
df = df.sort_index()

logger.info("loaded land_use %s" % (df.shape,))
buffer = io.StringIO()
df.info(buf=buffer)
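The point of moving the sort ahead of the sharrow recode check is that the check asserts a canonical zero-based integer index, which an unsorted table can fail spuriously. A small sketch of the sort-and-verify logic, mirroring the assertions above:

```python
import pandas as pd

# Out-of-order but zero-based index, as might come out of an unsorted pipeline.
df = pd.DataFrame({"x": [3, 1, 2]}, index=[2, 0, 1])

if not df.index.is_monotonic_increasing:
    df = df.sort_index()  # sort_index returns a new frame; reassign it

# The sharrow recode check then expects a canonical 0..n-1 integer index.
assert df.index[0] == 0 and df.index[-1] == len(df.index) - 1
assert df.index.dtype.kind == "i"
```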
45 changes: 45 additions & 0 deletions activitysim/abm/test/test_agg_accessibility.py
@@ -61,3 +61,48 @@ def test_agg_accessibility_explicit_chunking(state, dataframe_regression):
)
df = state.get_dataframe("accessibility")
dataframe_regression.check(df, basename="simple_agg_accessibility")


@pytest.mark.parametrize("explicit_chunk", [0, 5])
def test_agg_accessibility_orig_land_use(
state, dataframe_regression, tmp_path, explicit_chunk
):
# set top level settings
state.settings.chunk_size = 0
state.settings.sharrow = False
state.settings.chunk_training_mode = "explicit"

# read the accessibility settings and override the explicit chunk size
model_settings = AccessibilitySettings.read_settings_file(
state.filesystem, "accessibility.yaml"
)
model_settings.explicit_chunk = explicit_chunk
model_settings.land_use_columns = ["RETEMPN", "TOTEMP", "TOTACRE"]
model_settings.land_use_columns_orig = ["TOTACRE"]

land_use = state.get_dataframe("land_use")
accessibility = state.get_dataframe("accessibility")

tmp_spec = tmp_path / "tmp-accessibility.csv"
tmp_spec.write_text(
"""Description,Target,Expression
orig_acreage,orig_acreage,df.landuse_orig_TOTACRE
dest_acreage,dest_acreage,df.TOTACRE
"""
)
model_settings.SPEC = str(tmp_spec)

# state.filesystem.get_config_file_path(model_settings.SPEC)

compute_accessibility(
state,
land_use,
accessibility,
state.get("network_los"),
model_settings,
model_settings_file_name="accessibility.yaml",
trace_label="compute_accessibility",
output_table_name="accessibility",
)
df = state.get_dataframe("accessibility")
dataframe_regression.check(df, basename="simple_agg_accessibility_orig_land_use")
@@ -0,0 +1,26 @@
zone_id,orig_acreage,dest_acreage
0,6.2314652154886145,7.3737508868303339
1,6.657368991274053,7.3737508868303339
2,5.909440711629391,7.3737508868303339
3,6.1810513148933497,7.3737508868303339
4,7.1842500057933423,7.3737508868303339
5,6.5875500148247959,7.3737508868303339
6,7.026426808699636,7.3737508868303339
7,7.1514854639047352,7.3737508868303339
8,7.9377317752601089,7.3737508868303339
9,7.5167053007413269,7.3737508868303339
10,7.6138186848086287,7.3737508868303339
11,7.1955623436220684,7.3737508868303339
12,6.4975288537722626,7.3737508868303339
13,6.6411821697405919,7.3737508868303339
14,6.5701824369168911,7.3737508868303339
15,8.034631032923107,7.3737508868303339
16,8.2449906898128429,7.3737508868303339
17,7.8948771916168834,7.3737508868303339
18,8.0507033814702993,7.3737508868303339
19,7.8073066868519945,7.3737508868303339
20,7.5875638951029023,7.3737508868303339
21,7.6932537206062692,7.3737508868303339
22,7.7279755421055585,7.3737508868303339
23,6.8834625864130921,7.3737508868303339
24,6.2653012127377101,7.3737508868303339
12 changes: 12 additions & 0 deletions activitysim/core/configuration/top.py
@@ -585,6 +585,18 @@ class Settings(PydanticBase, extra="allow", validate_assignment=True):
compatible with using :py:attr:`Settings.sharrow`.
"""

omx_ignore_patterns: list[str] = []
"""
List of regex patterns to ignore when reading OMX files.

This is useful if you have tables in your OMX file that you don't want to
read in. For example, if you have both time-of-day values and time-independent
values (e.g., "BIKE_TIME" and "BIKE_TIME__AM"), you can ignore the time-of-day
values by setting this to ["BIKE_TIME__.+"].

[Review comment, Contributor]
Is this required for sharrow? If so, I think we need to put an assert
statement in the skim read for sharrow somewhere that will prevent this from
happening and provide some info to the user about the error.

[Reply, Member Author]
Yes, not for sharrow per se, but the SkimDataset converts all the
by-time-of-day skim variables into three-dimensional arrays (o, d, time). So
without this, in the example the OMX files essentially have two different
sets of data for the same named variable: one two-dimensional BIKE_TIME and
one three-dimensional BIKE_TIME, which coincidentally happens to have no
variance across the temporal dimension. We can't have two variables in the
same namespace with the same name, so one is overwritten.

[Reply, Member Author]
Opened an issue #873 to address this.


.. versionadded:: 1.3
"""

keep_mem_logs: bool = False

pipeline_complib: str = "NOTSET"
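A sketch of how such ignore patterns could filter OMX table names. Whether the internal matching anchors at the start of the name (as `re.match` does) is an assumption here; the behavior shown matches the docstring's BIKE_TIME example:

```python
import re

omx_ignore_patterns = ["BIKE_TIME__.+"]
table_names = ["BIKE_TIME", "BIKE_TIME__AM", "BIKE_TIME__PM", "DISTANCE"]

# Drop any table whose name matches an ignore pattern; plain BIKE_TIME
# survives because the pattern requires the '__' time-of-day suffix.
kept = [
    name
    for name in table_names
    if not any(re.match(pat, name) for pat in omx_ignore_patterns)
]
print(kept)  # ['BIKE_TIME', 'DISTANCE']
```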