Commit
added supply components
H3yfinn committed Jun 7, 2024
1 parent 3ba7096 commit dd2c528
Showing 14 changed files with 795 additions and 41 deletions.
22 changes: 22 additions & 0 deletions README.md
@@ -24,6 +24,28 @@ The onus is on the modellers to ensure that their data is in the correct format,
### To do List:
- consider whether we want the 19_total fuel and also subtotals of fuels within 09_total_transformation. These totals create confusing values because they sum negatives (input_fuel) and positives (output_fuel), e.g. -natural_gas + lng


### Incorporating supply components repo into the EBT system:

*Note that the supply components repo also contains scripts to project pipeline transport demand and transformation own use, but from here on these will be referred to by their purpose or collectively as 'supply components'. The supply component of the supply components repo will be referred to by its purpose, as 'minor supply components'.*

For the short term we have decided to include the supply components repo in the EBT system. The supply components are simple scripts that don't need to change much, and keeping them in the same system as the EBTs means they don't have to be run manually. The code has been designed to be easily run separately using the output from the EBT system, so it is easy to split them out again if necessary. Some slight changes to the supply components methodology were needed, although these only involved being more specific about the data used for calculations.

These components will be run after the merging_results() function, to simulate the process of running the EBT system, giving the merged results to the modellers, and then having the modellers run the supply components using those merged results (if the merged results don't contain the necessary data, the supply components will simply calculate values using 0's). *Note that this means it is important the EBT operator has all the correct data (e.g. all demand data for pipelines and trans_own_use_addon) in the data\modelled_data\ECONOMY_ID folder before running the supply components functions, because they will not be notified if this isn't the case.*

At the end of running the supply components functions, the results are saved into the data/modelled_data folder, simulating the modeller running the supply components process, saving the results into the integration folder, and the EBT system operator then moving those results into the modelled_data folder. In the case of pipeline transport and transformation own use, the EBT operator would previously have needed to rerun the EBT as soon as this data was put into integration; now, the process simply runs the merging_results() function again after the supply components functions to update the data automatically. This makes the supply components functions very unobtrusive, and even shortens the previous process!
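
A minimal sketch of this run order (mirroring main.py):

```python
# After the first merge, the three supply component functions are run on the
# merged results; they save their outputs into data/modelled_data, so a
# second merging_results() call picks them up automatically.
final_energy_df = D.merging_results(model_df_clean_wide, SINGLE_ECONOMY_ID)
supply_component_repo_functions.pipeline_transport(SINGLE_ECONOMY_ID, final_energy_df)
supply_component_repo_functions.trans_own_use_addon(SINGLE_ECONOMY_ID, final_energy_df)
supply_component_repo_functions.minor_supply_components(SINGLE_ECONOMY_ID, final_energy_df)
final_energy_df = D.merging_results(model_df_clean_wide, SINGLE_ECONOMY_ID)
```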

The supply components repo contains 3 scripts:

1. pipeline transport:
This takes in data from the EBT system and separates historical data from projection data. It then calculates the ratio of gas consumption to total consumption in the pipeline sector and uses this ratio to calculate the energy consumption (of gas, petroleum products and electricity) in the pipeline sector for the projection years. This is done after demand is modelled, because the energy used for pipeline transport is a function of the demand for gas (see the sketch after this list).

2. transformation, own use and nonspecified (trans_own_use_addon()):
Much like the pipeline_transport function, this function takes in demand data from the EBT system and separates historical data from projection data. It then calculates the energy used for other-transformation, own use and nonspecified for the projection years. This is also only done after demand is modelled.

3. minor supply components:
This script takes in the transformation and demand data and calculates the energy used for some minor supply components (e.g. biofuel supply). This is done after all transformation is modelled, at the same time as the supply modelling. *This could cause minor confusion for supply or transformation modellers if they mistakenly think they need to use any of these outputs in their own modelling. Although this doesn't seem likely, it is something to be aware of.*
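
As a rough illustration of the ratio approach described in the first script above, here is a minimal sketch of a pipeline transport projection, assuming long-format data with `sectors`, `fuels`, `year` and `value` columns. The function name, column names and the `pipeline_transport` sector label are illustrative assumptions, not the repo's actual API:

```python
import pandas as pd

def project_pipeline_energy(ebt_df: pd.DataFrame, base_year: int) -> pd.DataFrame:
    """Illustrative sketch only; the real logic lives in
    supply_component_repo_functions.pipeline_transport()."""
    # Separate historical data from projection data.
    hist = ebt_df[ebt_df['year'] <= base_year]
    proj = ebt_df[ebt_df['year'] > base_year]

    # Base-year consumption in the pipeline sector, by fuel.
    pipe_base = hist[(hist['sectors'] == 'pipeline_transport')
                     & (hist['year'] == base_year)]
    pipe_by_fuel = pipe_base.groupby('fuels')['value'].sum()
    pipe_total = pipe_by_fuel.sum()

    # Ratio of each fuel (gas, petroleum products, electricity) to total
    # consumption within the sector; guard against divide by zero.
    fuel_mix = pipe_by_fuel / pipe_total if pipe_total != 0 else pipe_by_fuel * 0

    # Pipeline energy use is treated as a function of gas demand, so scale
    # the base-year sector total by the growth in projected gas consumption.
    gas_base = hist[(hist['fuels'] == 'gas')
                    & (hist['year'] == base_year)]['value'].sum()
    gas_proj = proj[proj['fuels'] == 'gas'].groupby('year')['value'].sum()
    growth = gas_proj / gas_base if gas_base != 0 else gas_proj * 0

    # One column per pipeline fuel, indexed by projection year.
    return pd.DataFrame({fuel: growth * pipe_total * share
                         for fuel, share in fuel_mix.items()})
```

The trans_own_use_addon() script follows the same separate-then-project pattern on demand data, just for different sectors.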

## Using Conda

### Creating the Conda environment
File renamed without changes.
Empty file added data/modelled_data/.gitkeep
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
10 changes: 5 additions & 5 deletions workflow/scripts/D_merging_results.py
@@ -32,13 +32,13 @@ def merging_results(original_layout_df, SINGLE_ECONOMY_ID, previous_merged_df_fi

if (isinstance(SINGLE_ECONOMY_ID, str)):
# Define the path pattern for the results data files
results_data_path = 'data/demand_results_data/'+SINGLE_ECONOMY_ID+'/*'
results_data_path = 'data/modelled_data/'+SINGLE_ECONOMY_ID+'/*'
print(results_data_path)
else:
print("Not implemented yet.")

# Define the path pattern for the results data files
#results_data_path = 'data/demand_results_data/*'
#results_data_path = 'data/modelled_data/*'
# Get a list of all matching results data file paths
results_data_files = [f for f in glob.glob(results_data_path) if os.path.isfile(f)]
# Check if results_data_files is empty
@@ -119,7 +119,6 @@ def merging_results(original_layout_df, SINGLE_ECONOMY_ID, previous_merged_df_fi
basename = os.path.basename(file)
filtered_results_df['origin'] = basename.split('.')[0]
#the origin col is used because some data will come from two different results files, yet have the same sector and fuels columns but different levels of detail. This means that after we remove subtotals and then try to recreate them in calculate_subtotals, we might end up with duplicate rows. So we need to be able to identify that these rows came from different origin files so the duplicates can be removed by being summed together.

filtered_results_df_subtotals_labelled = merging_functions.label_subtotals(filtered_results_df, shared_categories + ['origin'])
# Combine the results_df with all the other results_dfs we have read so far
concatted_results_df = pd.concat([concatted_results_df, filtered_results_df_subtotals_labelled])
@@ -135,7 +134,7 @@ def merging_results(original_layout_df, SINGLE_ECONOMY_ID, previous_merged_df_fi
concatted_results_df = merging_functions.calculate_subtotals(concatted_results_df, shared_categories + ['origin'], DATAFRAME_ORIGIN='results')
# concatted_results_df.to_csv('data/temp/error_checking/concatted_results_df.csv')
##############################

###NOW WE HAVE THE concatted RESULTS DF, WITH SUBTOTALS CALCULATED. WE NEED TO MERGE IT WITH THE LAYOUT FILE TO IDENTIFY ANY STRUCTURAL ISSUES####
layout_df = layout_df[layout_df['economy'].isin(economies)].copy()
#drop years in range(OUTLOOK_BASE_YEAR, OUTLOOK_BASE_YEAR+1) as we don't need them. This will help to speed up the process.
@@ -145,6 +144,7 @@ def merging_results(original_layout_df, SINGLE_ECONOMY_ID, previous_merged_df_fi
layout_df_subtotals_recalculated = merging_functions.calculate_subtotals(layout_df, shared_categories, DATAFRAME_ORIGIN='layout')

##############################

trimmed_layout_df, missing_sectors_df = merging_functions.trim_layout_before_merging_with_results(layout_df_subtotals_recalculated,concatted_results_df)
trimmed_concatted_results_df = merging_functions.trim_results_before_merging_with_layout(concatted_results_df, shared_categories)
#rename subtotal columns before merging:
@@ -155,7 +155,6 @@ def merging_results(original_layout_df, SINGLE_ECONOMY_ID, previous_merged_df_fi
merged_df = pd.merge(trimmed_layout_df, trimmed_concatted_results_df, on=shared_categories, how="outer", indicator=True)

results_layout_df = merging_functions.format_merged_layout_results_df(merged_df, shared_categories, trimmed_layout_df, trimmed_concatted_results_df,missing_sectors_df)

# results_layout_df.to_csv('results_layout_df_before_drop.csv')

#add subtotals to shared_categories now its in all the dfs
@@ -203,6 +202,7 @@ def merging_results(original_layout_df, SINGLE_ECONOMY_ID, previous_merged_df_fi
fuel_aggregates_df = merging_functions.calculate_fuel_aggregates(sector_aggregates_df, results_layout_df, shared_categories)

final_df = merging_functions.create_final_energy_df(sector_aggregates_df, fuel_aggregates_df,results_layout_df, shared_categories)

#now check for issues with the new aggregates and subtotals by using the layout file as the reference
merging_functions.check_for_issues_by_comparing_to_layout_df(final_df, shared_categories_w_subtotals, new_aggregate_sectors, layout_df, REMOVE_LABELLED_SUBTOTALS=False)
#######################################
43 changes: 17 additions & 26 deletions workflow/scripts/main.py
@@ -8,7 +8,9 @@
import F_incorporate_capacity as F
import utility_functions as utils
import merging_functions
import supply_component_repo_functions
from datetime import datetime
import pandas as pd

def main(ONLY_RUN_UP_TO_MERGING=False, SINGLE_ECONOMY_ID = utils.SINGLE_ECONOMY_ID_VAR):
"""
@@ -43,6 +45,14 @@ def main(ONLY_RUN_UP_TO_MERGING=False, SINGLE_ECONOMY_ID = utils.SINGLE_ECONOMY_
if (isinstance(SINGLE_ECONOMY_ID, str)) and not (ONLY_RUN_UP_TO_MERGING):#if we aren't using a single economy we don't need to merge
# Merge the results
final_energy_df = D.merging_results(model_df_clean_wide, SINGLE_ECONOMY_ID)
print('\n ################################################# \nRunning supply component repo functions and merging_results right afterwards: \n')
supply_component_repo_functions.pipeline_transport(SINGLE_ECONOMY_ID, final_energy_df)
supply_component_repo_functions.trans_own_use_addon(SINGLE_ECONOMY_ID, final_energy_df)
supply_component_repo_functions.minor_supply_components(SINGLE_ECONOMY_ID, final_energy_df)
old_final_energy_df = final_energy_df.copy()
final_energy_df = D.merging_results(model_df_clean_wide, SINGLE_ECONOMY_ID)
# utils.compare_values_in_final_energy_dfs(old_final_energy_df, final_energy_df)
print('Done running supply component repo functions and merging_results \n################################################\n')

#calc emissions:
emissions_df = E.calculate_emissions(final_energy_df,SINGLE_ECONOMY_ID)
@@ -54,34 +64,15 @@ def main(ONLY_RUN_UP_TO_MERGING=False, SINGLE_ECONOMY_ID = utils.SINGLE_ECONOMY_
# Return the final DataFrame
return final_energy_df, emissions_df, capacity_df, model_df_clean_wide

def run_main_up_to_merging_for_every_economy(LOCAL_FILE_PATH, MOVE_OLD_FILES_TO_ARCHIVE=False):
"""
This is really just meant for moving every economy's model_df_clean_wide df into {LOCAL_FILE_PATH}\Modelling\Integration\{ECONOMY_ID}\00_LayoutTemplate so the modellers can use it as a starting point for their modelling.
it will remove the original files from the folder and move them to an archive folder in the same directory using the function utils.move_files_to_archive_for_economy(LOCAL_FILE_PATH, economy) if MOVE_OLD_FILES_TO_ARCHIVE is True
"""
file_date_id = datetime.now().strftime('%Y%m%d')
for economy in utils.ALL_ECONOMY_IDS:

if MOVE_OLD_FILES_TO_ARCHIVE:
utils.move_files_to_archive_for_economy(LOCAL_FILE_PATH, economy)
final_energy_df, emissions_df, capacity_df, model_df_clean_wide = main(ONLY_RUN_UP_TO_MERGING = True, SINGLE_ECONOMY_ID=economy)
model_df_clean_wide.to_csv(f'{LOCAL_FILE_PATH}/Integration/{economy}/00_LayoutTemplate/model_df_wide_{economy}_{file_date_id}.csv', index=False)

reference_df = model_df_clean_wide[model_df_clean_wide['scenarios'] == 'reference'].copy().reset_index(drop = True)
target_df = model_df_clean_wide[model_df_clean_wide['scenarios'] == 'target'].copy().reset_index(drop = True)

reference_df.to_csv(f'{LOCAL_FILE_PATH}/Integration/{economy}/00_LayoutTemplate/model_df_wide_ref_{economy}_{file_date_id}.csv', index=False)
target_df.to_csv(f'{LOCAL_FILE_PATH}/Integration/{economy}/00_LayoutTemplate/model_df_wide_tgt_{economy}_{file_date_id}.csv', index=False)
print('Done run_main_up_to_merging_for_every_economy for ' + economy)


#%%
# Run the main function and store the result
final_energy_df, emissions_df, capacity_df, model_df_clean_wide = main()
if __name__ == "__main__":#this will allow us to import main into other scripts without running the code below
final_energy_df, emissions_df, capacity_df, model_df_clean_wide = main()
# test(SINGLE_ECONOMY_ID='20_USA')
#C:/Users/finbar.maunsell/OneDrive - APERC/outlook 9th
# run_main_up_to_merging_for_every_economy(LOCAL_FILE_PATH= r'C:/Users/finbar.maunsell/OneDrive - APERC/outlook 9th', MOVE_OLD_FILES_TO_ARCHIVE=True)
# utils.run_main_up_to_merging_for_every_economy(LOCAL_FILE_PATH= r'C:/Users/finbar.maunsell/OneDrive - APERC/outlook 9th', MOVE_OLD_FILES_TO_ARCHIVE=True)

# run_main_up_to_merging_for_every_economy(LOCAL_FILE_PATH= r'C:/Users/hyuga.kasai/APERC/Outlook-9th - Modelling', MOVE_OLD_FILES_TO_ARCHIVE=True)
# utils.run_main_up_to_merging_for_every_economy(LOCAL_FILE_PATH= r'C:/Users/hyuga.kasai/APERC/Outlook-9th - Modelling', MOVE_OLD_FILES_TO_ARCHIVE=True)

#%%
#%%
# %%
22 changes: 16 additions & 6 deletions workflow/scripts/merging_functions.py
@@ -47,6 +47,7 @@ def label_subtotals_for_sub_col(df, sub_col):

#############################
#if more than one value is non-zero/nan for this group, then it could be a subtotal; if not, it's a definite non-subtotal since it's the most specific data we have for this group.

value_mask = (abs(df['value'])> 0)

# Group by all columns except 'value' and sub_col and check how many values are >0 or <0 for that group
@@ -307,6 +308,13 @@ def calculate_subtotal_for_columns(melted_df, cols_to_sum):
# breakpoint()
###################
#make final_df wide
#check for duplicates
duplicates = final_df[final_df.duplicated(subset=shared_categories+['year'], keep=False)]
#if there are duplicates then save them to a csv so we can check them later and throw an error.
if duplicates.shape[0] > 0:
duplicates.to_csv('data/temp/error_checking/duplicates_in_subtotaled_df.csv', index=False)
breakpoint()
raise Exception("There are duplicates in the subtotaled DataFrame.")
final_df_wide = final_df.pivot(index=shared_categories+['is_subtotal'], columns='year', values='value').reset_index()
###################
try:
@@ -594,13 +602,11 @@ def calculate_sector_aggregates(df, sectors, aggregate_sector, shared_categories
for col in numeric_cols:
value1 = row[col]
value2 = corresponding_row[col].values[0]

# Check if both values are not NaN or zero, and if difference exceeds tolerance
if not (np.isnan(value1) or np.isnan(value2) or value1 == 0 or value2 == 0):
if np.abs(value1 - value2) > 100000:
# Save the differing rows
differences = differences.append(row)
differences = differences.append(corresponding_row)
differences = pd.concat([differences, pd.DataFrame([row]), corresponding_row])

# Remove duplicates if any
differences = differences.drop_duplicates()
@@ -1373,7 +1379,8 @@ def process_sheet(sheet_name, excel_file, economy, OUTLOOK_BASE_YEAR, OUTLOOK_LA
'subfuels': mapped_values['subfuels'],
**{str(year): row[year] for year in range(OUTLOOK_BASE_YEAR + 1, OUTLOOK_LAST_YEAR + 1)}
}
transformed_data = transformed_data.append(new_row, ignore_index=True)
transformed_data = pd.concat([transformed_data, pd.DataFrame([new_row])], ignore_index=True)
# transformed_data = transformed_data.append(new_row, ignore_index=True)

sheet_data = pd.concat([sheet_data, transformed_data])

@@ -1460,7 +1467,8 @@ def split_subfuels(csv_file, layout_df, shared_categories, OUTLOOK_BASE_YEAR, OU
proportion_dict = {}
for _, row in summed.iterrows():
if row['subfuels'] != 'x':
proportion = row['value'] / total_values.iloc[0]
# proportion = row['value'] / total_values.iloc[0]
proportion = row['value'] / total_values.iloc[0] if total_values.iloc[0] != 0 else 0
proportion_dict[row['subfuels']] = proportion

# Create new rows in df using the proportions
@@ -1472,7 +1480,9 @@ def split_subfuels(csv_file, layout_df, shared_categories, OUTLOOK_BASE_YEAR, OU
new_row['subfuels'] = subfuel
for year in range(OUTLOOK_BASE_YEAR, OUTLOOK_LAST_YEAR+1):
new_row[str(year)] = new_row[str(year)] * proportion
df = df.append(new_row, ignore_index=True)
# Append the new row to df
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
# df = df.append(new_row, ignore_index=True)
# Drop the total rows (with 'x' in 'subfuels') for the current fuel type
df = df.drop(df[(df['fuels'] == fuel) & (df['subfuels'] == 'x')].index)
