Use serial update for each upgrades data frame #186

Merged: 49 commits from postproc_per_upgrade into main, Sep 11, 2024

Conversation

@wenyikuang (Collaborator) commented Jun 18, 2024

Pull request overview

This PR is intended to fix #149.

How do I test that it works?

  • With the cycle3_euss_10k_df_2 dataset (3 upgrades), I confirmed that the results in the output parquets match those produced without this change.
  • With the cycle3_euss_full_350k_combined dataset (32 upgrades), I was able to finish the run on my local laptop.

This pull request makes changes to (select all that apply):

  • Documentation
  • Infrastructure (includes apptainer image, buildstock batch, dependencies, continuous integration tests)
  • Sampling
  • Workflow Measures
  • Upgrade Measures
  • Reporting Measures
  • Postprocessing

Author pull request checklist:

  • Tagged the pull request with the appropriate label (documentation, infrastructure, sampling, workflow measure, upgrade measure, reporting measure, postprocessing) to help categorize changes in the release notes.
  • Added tests for new measures
  • Updated measure .xml(s)
  • Register values added to comstock_column_definitions.csv
  • Both options_lookup.tsv files updated
  • 10k+ test run
  • Change documentation written
  • Measure documentation written
  • ComStock documentation updated
  • Changes reflected in example .yml files
  • Changes reflected in README.md files
  • Added 'See ComStock License' language to first two lines of each code file
  • Implemented corresponding measure tests and added the indexing path in test/measure_tests.txt and/or test/resource_measure_tests.txt
  • All new and existing tests pass the CI

Review Checklist

This will not be exhaustively relevant to every PR.

  • Perform a code review on GitHub
  • All related changes have been implemented: data and method additions, changes, tests
  • If fixing a defect, verify by running develop branch and reproducing defect, then running PR and reproducing fix
  • Reviewed change documentation
  • Ensured code files contain License reference
  • Results differences are reasonable
  • Make sure newly added measures have tests and are indexed properly
  • CI status: all tests pass

ComStock Licensing Language - Add to Beginning of Each Code File

# ComStock™, Copyright (c) 2023 Alliance for Sustainable Energy, LLC. All rights reserved.
# See top level LICENSE.txt file for license terms.

TODO:

  • Test with the downloadable dataset and verify that the results match.
  • Clean up the comments and write documentation.

@wenyikuang added the postprocessing (PR improves or adds postprocessing content) and Pull Request - Ready for CI labels Jun 18, 2024
@wenyikuang wenyikuang changed the title Use serial update for each upgrades data frame [WIP] Use serial update for each upgrades data frame Jun 18, 2024
@wenyikuang wenyikuang changed the title [WIP] Use serial update for each upgrades data frame Use serial update for each upgrades data frame Jul 17, 2024
assert isinstance(self.monthly_data, pl.LazyFrame)

# self.data = pl.concat(annual_dfs_to_concat, join='inner', ignore_index=True)
common_columns = set(annual_dfs_to_concat[0].columns)
Collaborator (PR author) commented:

That's my approach to implement the join='inner' , find all the shared columns and select them out.
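
A minimal sketch of that approach, assuming annual_dfs_to_concat is a non-empty list of polars LazyFrames (names other than those in the snippet above are illustrative, not the actual ComStock code):

import polars as pl
from functools import reduce

# Emulate join='inner': keep only the columns shared by every upgrade frame,
# then concatenate the frames vertically.
common_columns = reduce(
    lambda cols, df: cols & set(df.columns),
    annual_dfs_to_concat[1:],
    set(annual_dfs_to_concat[0].columns),
)
ordered_columns = [c for c in annual_dfs_to_concat[0].columns if c in common_columns]
self.data = pl.concat(
    [df.select(ordered_columns) for df in annual_dfs_to_concat],
    how='vertical',
)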

@@ -2571,7 +2643,7 @@ def export_data_and_enumeration_dictionary(self):
     col_enums = []
     if col_def['data_type'] == 'string':
         str_enums = []
-        for enum in self.data.select(col).unique().to_series().to_list():
+        for enum in self.data.columns:
Member commented:

The original code isn't looping through column names, it's getting the unique set of values across all rows within the column.
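
For illustration, the two expressions return different things (a sketch assuming self.data is an eager pl.DataFrame at this point, as the original line implies):

# Original behaviour: the distinct values found in one string column, across all rows.
str_values = self.data.select(col).unique().to_series().to_list()

# Changed line: the names of every column in the frame, which is not equivalent.
col_names = self.data.columns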

@asparke2 (Member) left a comment:

@wenyikuang let's have a discussion about this, I'm not convinced that the scaling weights are working as expected.


self.add_metadata_index_col(upgradIdcount)
self.get_comstock_unscaled_monthly_energy_consumption()
self.add_weighted_energy_savings_columns()
Member commented:

This method requires the self.BLDG_WEIGHT column to exist...but that column is not added until self.add_national_scaling_weights() is called. How is the code working now?


color_map = {'Baseline': self.COLOR_COMSTOCK_BEFORE, upgrade_name: self.COLOR_COMSTOCK_AFTER}
df_upgrade = df_upgrade.collect().to_pandas()
Member commented:

Can we really collect the dataframe at this point without first downselecting to only the columns actually used in the comparison plots? This needs to be tested with a full-sized dataset to make sure the memory usage is reasonable. Otherwise, it seems like it should be kept as a pl.LazyFrame and only collected inside the plotting functions.
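
A sketch of the suggested pattern; the function name and column names here are placeholders, not the actual ComStock plotting code:

import polars as pl

def plot_energy_by_building_type(df_upgrade: pl.LazyFrame, color_map: dict):
    # Materialize only the columns this plot actually uses; keep everything else lazy.
    plot_df = (
        df_upgrade
        .select(['in.comstock_building_type', 'calc.weighted.total_energy_consumption..tbtu'])
        .collect()
        .to_pandas()
    )
    # ... build the comparison plot from plot_df using color_map ...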

@@ -0,0 +1,23 @@
FUNC __main__.main 227.7188 1717631807.5411 254.3594 1717631808.2755 0
Member commented:

Delete this file; it looks like it was accidentally committed from memory profiling.

Collaborator (PR author) replied:

Sure.

@rHorsey (Collaborator) commented Sep 8, 2024

This now also includes sampling v2. I'm pulling in changes directly here.

@rHorsey (Collaborator) commented Sep 8, 2024

@ChristopherCaradonna This is the branch to test the 10k with! I haven't merged main into this, however, so I'm unsure whether there are conflicts, especially with the options lookup.

@rHorsey (Collaborator) commented Sep 11, 2024

Big (and somewhat breaking) merge to speed work towards Nov EUSS Release with new sampling and much better postprocessing. Huge thanks @wenyikuang !!!

@rHorsey rHorsey merged commit b2fb2a3 into main Sep 11, 2024
0 of 3 checks passed
@rHorsey rHorsey deleted the postproc_per_upgrade branch September 11, 2024 01:02
Labels: postprocessing (PR improves or adds postprocessing content), Pull Request - Ready for CI
Projects: None yet
4 participants