Commit
added supply components
H3yfinn committed Jun 7, 2024
1 parent 3ba7096 commit dd2c528
Showing 14 changed files with 795 additions and 41 deletions.
22 changes: 22 additions & 0 deletions README.md
@@ -24,6 +24,28 @@ The onus is on the modellers to ensure that their data is in the correct format,
### To do List:
- consider whether we want the 19_total fuel and also subtotals of fuels within 09_total_transformation. These totals create confusing values because they sum negatives (input_fuel) and positives (output_fuel), e.g. -natural_gas + lng


### Incorporating supply components repo into the EBT system:

*Note that the supply components repo also contains scripts to project pipeline transport demand and transformation own use, but from here on these will be referred to by their purpose or collectively as 'supply components'. The supply component of the supply components repo will be referred to by its purpose, as 'minor supply components'.*

For the short term we have decided to include the supply components repo in the EBT system. The supply components are simple scripts that don't need to change much, and keeping them in the same system as the EBTs means they don't have to be run manually. The code has been designed to be easily run separately using the output from the EBT system, so it is easy to split them out again if necessary. Some slight changes to the supply components methodology were needed, although these only involved being more specific about the data used for calculations.

These components will be run after the merging_results() function, to simulate the process of running the EBT system, giving the merged results to the modellers, and then having the modellers run the supply components using those merged results (if the merged results don't contain the necessary data, the supply components will simply calculate values using 0's). *Note that this means it is important the EBT operator has all the correct data (e.g. all demand data for pipelines and trans_own_use_addon) in the data\modelled_data\ECONOMY_ID folder before running the supply components functions, because they will not be notified if this isn't the case.*

At the end of running the supply components functions, the results are saved into the data/modelled_data folder, simulating the modeller running the supply components process, saving the results into the integration folder, and the EBT system operator then moving those results into the modelled_data folder. In the case of pipeline transport and transformation own use, the EBT operator would previously have needed to rerun the EBT as soon as this data was put into integration; now, the process simply runs the merging_results() function again after the supply components functions to update the data automatically. This makes the supply components functions very unobtrusive, and even shortens the previous process!
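
A minimal sketch of this run order (mirroring main.py):

```python
# After the first merge, the three supply component functions are run on the
# merged results; they save their outputs into data/modelled_data, so a
# second merging_results() call picks them up automatically.
final_energy_df = D.merging_results(model_df_clean_wide, SINGLE_ECONOMY_ID)
supply_component_repo_functions.pipeline_transport(SINGLE_ECONOMY_ID, final_energy_df)
supply_component_repo_functions.trans_own_use_addon(SINGLE_ECONOMY_ID, final_energy_df)
supply_component_repo_functions.minor_supply_components(SINGLE_ECONOMY_ID, final_energy_df)
final_energy_df = D.merging_results(model_df_clean_wide, SINGLE_ECONOMY_ID)
```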

The supply components repo contains 3 scripts:

1. pipeline transport:
This takes in data from the EBT system and separates historical data from projection data. It then calculates the ratio of gas consumption to total consumption in the pipeline sector and uses this ratio to calculate the energy consumption (of gas, petroleum products and electricity) in the pipeline sector for the projection years. This is done after demand is modelled, because the energy used for pipeline transport is a function of the demand for gas (see the sketch after this list).

2. transformation, own use and nonspecified (trans_own_use_addon()):
Much like the pipeline_transport function, this function takes in demand data from the EBT system and separates historical data from projection data. It then calculates the energy used for other-transformation, own use and nonspecified for the projection years. This is also only done after demand is modelled.

3. minor supply components:
This script takes in the transformation and demand data and calculates the energy used for some minor supply components (e.g. biofuel supply). This is done after all transformation is modelled, at the same time as the supply modelling. *This could cause minor confusion for supply or transformation modellers if they mistakenly think they need to use any of these outputs in their own modelling. Although this doesn't seem likely, it is something to be aware of.*
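
As a rough illustration of the ratio approach described in the first script above, here is a minimal sketch of a pipeline transport projection, assuming long-format data with `sectors`, `fuels`, `year` and `value` columns. The function name, column names and the `pipeline_transport` sector label are illustrative assumptions, not the repo's actual API:

```python
import pandas as pd

def project_pipeline_energy(ebt_df: pd.DataFrame, base_year: int) -> pd.DataFrame:
    """Illustrative sketch only; the real logic lives in
    supply_component_repo_functions.pipeline_transport()."""
    # Separate historical data from projection data.
    hist = ebt_df[ebt_df['year'] <= base_year]
    proj = ebt_df[ebt_df['year'] > base_year]

    # Base-year consumption in the pipeline sector, by fuel.
    pipe_base = hist[(hist['sectors'] == 'pipeline_transport')
                     & (hist['year'] == base_year)]
    pipe_by_fuel = pipe_base.groupby('fuels')['value'].sum()
    pipe_total = pipe_by_fuel.sum()

    # Ratio of each fuel (gas, petroleum products, electricity) to total
    # consumption within the sector; guard against divide by zero.
    fuel_mix = pipe_by_fuel / pipe_total if pipe_total != 0 else pipe_by_fuel * 0

    # Pipeline energy use is treated as a function of gas demand, so scale
    # the base-year sector total by the growth in projected gas consumption.
    gas_base = hist[(hist['fuels'] == 'gas')
                    & (hist['year'] == base_year)]['value'].sum()
    gas_proj = proj[proj['fuels'] == 'gas'].groupby('year')['value'].sum()
    growth = gas_proj / gas_base if gas_base != 0 else gas_proj * 0

    # One column per pipeline fuel, indexed by projection year.
    return pd.DataFrame({fuel: growth * pipe_total * share
                         for fuel, share in fuel_mix.items()})
```

The trans_own_use_addon() script follows the same separate-then-project pattern on demand data, just for different sectors.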

## Using Conda

### Creating the Conda environment
File renamed without changes.
Empty file added data/modelled_data/.gitkeep
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
10 changes: 5 additions & 5 deletions workflow/scripts/D_merging_results.py
@@ -32,13 +32,13 @@ def merging_results(original_layout_df, SINGLE_ECONOMY_ID, previous_merged_df_fi

if (isinstance(SINGLE_ECONOMY_ID, str)):
# Define the path pattern for the results data files
results_data_path = 'data/demand_results_data/'+SINGLE_ECONOMY_ID+'/*'
results_data_path = 'data/modelled_data/'+SINGLE_ECONOMY_ID+'/*'
print(results_data_path)
else:
print("Not implemented yet.")

# Define the path pattern for the results data files
#results_data_path = 'data/demand_results_data/*'
#results_data_path = 'data/modelled_data/*'
# Get a list of all matching results data file paths
results_data_files = [f for f in glob.glob(results_data_path) if os.path.isfile(f)]
# Check if results_data_files is empty
@@ -119,7 +119,6 @@ def merging_results(original_layout_df, SINGLE_ECONOMY_ID, previous_merged_df_fi
basename = os.path.basename(file)
filtered_results_df['origin'] = basename.split('.')[0]
#the origin col is used because some data will come from two different results files, yet have the same sector and fuels columns but different levels of detail. This means that after we remove subtotals and then try to recreate them in calculate_subtotals, we might end up with duplicate rows. So we need to be able to identify that these rows came from different origin files so the duplicates can be removed by being summed together.

filtered_results_df_subtotals_labelled = merging_functions.label_subtotals(filtered_results_df, shared_categories + ['origin'])
# Combine the results_df with all the other results_dfs we have read so far
concatted_results_df = pd.concat([concatted_results_df, filtered_results_df_subtotals_labelled])
@@ -135,7 +134,7 @@ def merging_results(original_layout_df, SINGLE_ECONOMY_ID, previous_merged_df_fi
concatted_results_df = merging_functions.calculate_subtotals(concatted_results_df, shared_categories + ['origin'], DATAFRAME_ORIGIN='results')
# concatted_results_df.to_csv('data/temp/error_checking/concatted_results_df.csv')
##############################

###NOW WE HAVE THE concatted RESULTS DF, WITH SUBTOTALS CALCULATED. WE NEED TO MERGE IT WITH THE LAYOUT FILE TO IDENTIFY ANY STRUCTURAL ISSUES####
layout_df = layout_df[layout_df['economy'].isin(economies)].copy()
#drop years in range(OUTLOOK_BASE_YEAR, OUTLOOK_BASE_YEAR+1) as we don't need them. This will help to speed up the process.
@@ -145,6 +144,7 @@ def merging_results(original_layout_df, SINGLE_ECONOMY_ID, previous_merged_df_fi
layout_df_subtotals_recalculated = merging_functions.calculate_subtotals(layout_df, shared_categories, DATAFRAME_ORIGIN='layout')

##############################

trimmed_layout_df, missing_sectors_df = merging_functions.trim_layout_before_merging_with_results(layout_df_subtotals_recalculated,concatted_results_df)
trimmed_concatted_results_df = merging_functions.trim_results_before_merging_with_layout(concatted_results_df, shared_categories)
#rename subtotal columns before merging:
@@ -155,7 +155,6 @@ def merging_results(original_layout_df, SINGLE_ECONOMY_ID, previous_merged_df_fi
merged_df = pd.merge(trimmed_layout_df, trimmed_concatted_results_df, on=shared_categories, how="outer", indicator=True)

results_layout_df = merging_functions.format_merged_layout_results_df(merged_df, shared_categories, trimmed_layout_df, trimmed_concatted_results_df,missing_sectors_df)

# results_layout_df.to_csv('results_layout_df_before_drop.csv')

#add subtotals to shared_categories now its in all the dfs
@@ -203,6 +202,7 @@ def merging_results(original_layout_df, SINGLE_ECONOMY_ID, previous_merged_df_fi
fuel_aggregates_df = merging_functions.calculate_fuel_aggregates(sector_aggregates_df, results_layout_df, shared_categories)

final_df = merging_functions.create_final_energy_df(sector_aggregates_df, fuel_aggregates_df,results_layout_df, shared_categories)

#now check for issues with the new aggregates and subtotals by using the layout file as the reference
merging_functions.check_for_issues_by_comparing_to_layout_df(final_df, shared_categories_w_subtotals, new_aggregate_sectors, layout_df, REMOVE_LABELLED_SUBTOTALS=False)
#######################################
43 changes: 17 additions & 26 deletions workflow/scripts/main.py
@@ -8,7 +8,9 @@
import F_incorporate_capacity as F
import utility_functions as utils
import merging_functions
import supply_component_repo_functions
from datetime import datetime
import pandas as pd

def main(ONLY_RUN_UP_TO_MERGING=False, SINGLE_ECONOMY_ID = utils.SINGLE_ECONOMY_ID_VAR):
"""
@@ -43,6 +45,14 @@ def main(ONLY_RUN_UP_TO_MERGING=False, SINGLE_ECONOMY_ID = utils.SINGLE_ECONOMY_
if (isinstance(SINGLE_ECONOMY_ID, str)) and not (ONLY_RUN_UP_TO_MERGING):#if we aren't using a single economy we don't need to merge
# Merge the results
final_energy_df = D.merging_results(model_df_clean_wide, SINGLE_ECONOMY_ID)
print('\n ################################################# \nRunning supply component repo functions and merging_results right afterwards: \n')
supply_component_repo_functions.pipeline_transport(SINGLE_ECONOMY_ID, final_energy_df)
supply_component_repo_functions.trans_own_use_addon(SINGLE_ECONOMY_ID, final_energy_df)
supply_component_repo_functions.minor_supply_components(SINGLE_ECONOMY_ID, final_energy_df)
old_final_energy_df = final_energy_df.copy()
final_energy_df = D.merging_results(model_df_clean_wide, SINGLE_ECONOMY_ID)
# utils.compare_values_in_final_energy_dfs(old_final_energy_df, final_energy_df)
print('Done running supply component repo functions and merging_results \n################################################\n')

#calc emissions:
emissions_df = E.calculate_emissions(final_energy_df,SINGLE_ECONOMY_ID)
@@ -54,34 +64,15 @@ def main(ONLY_RUN_UP_TO_MERGING=False, SINGLE_ECONOMY_ID = utils.SINGLE_ECONOMY_
# Return the final DataFrame
return final_energy_df, emissions_df, capacity_df, model_df_clean_wide

def run_main_up_to_merging_for_every_economy(LOCAL_FILE_PATH, MOVE_OLD_FILES_TO_ARCHIVE=False):
"""
This is really just meant for moving every economy's model_df_clean_wide df into {LOCAL_FILE_PATH}\Modelling\Integration\{ECONOMY_ID}\00_LayoutTemplate so the modellers can use it as a starting point for their modelling.
it will remove the original files from the folder and move them to an archive folder in the same directory using the function utils.move_files_to_archive_for_economy(LOCAL_FILE_PATH, economy) if MOVE_OLD_FILES_TO_ARCHIVE is True
"""
file_date_id = datetime.now().strftime('%Y%m%d')
for economy in utils.ALL_ECONOMY_IDS:

if MOVE_OLD_FILES_TO_ARCHIVE:
utils.move_files_to_archive_for_economy(LOCAL_FILE_PATH, economy)
final_energy_df, emissions_df, capacity_df, model_df_clean_wide = main(ONLY_RUN_UP_TO_MERGING = True, SINGLE_ECONOMY_ID=economy)
model_df_clean_wide.to_csv(f'{LOCAL_FILE_PATH}/Integration/{economy}/00_LayoutTemplate/model_df_wide_{economy}_{file_date_id}.csv', index=False)

reference_df = model_df_clean_wide[model_df_clean_wide['scenarios'] == 'reference'].copy().reset_index(drop = True)
target_df = model_df_clean_wide[model_df_clean_wide['scenarios'] == 'target'].copy().reset_index(drop = True)

reference_df.to_csv(f'{LOCAL_FILE_PATH}/Integration/{economy}/00_LayoutTemplate/model_df_wide_ref_{economy}_{file_date_id}.csv', index=False)
target_df.to_csv(f'{LOCAL_FILE_PATH}/Integration/{economy}/00_LayoutTemplate/model_df_wide_tgt_{economy}_{file_date_id}.csv', index=False)
print('Done run_main_up_to_merging_for_every_economy for ' + economy)


#%%
# Run the main function and store the result
final_energy_df, emissions_df, capacity_df, model_df_clean_wide = main()
if __name__ == "__main__":#this will allow us to import main into other scripts without running the code below
final_energy_df, emissions_df, capacity_df, model_df_clean_wide = main()
# test(SINGLE_ECONOMY_ID='20_USA')
#C:/Users/finbar.maunsell/OneDrive - APERC/outlook 9th
# run_main_up_to_merging_for_every_economy(LOCAL_FILE_PATH= r'C:/Users/finbar.maunsell/OneDrive - APERC/outlook 9th', MOVE_OLD_FILES_TO_ARCHIVE=True)
# utils.run_main_up_to_merging_for_every_economy(LOCAL_FILE_PATH= r'C:/Users/finbar.maunsell/OneDrive - APERC/outlook 9th', MOVE_OLD_FILES_TO_ARCHIVE=True)

# run_main_up_to_merging_for_every_economy(LOCAL_FILE_PATH= r'C:/Users/hyuga.kasai/APERC/Outlook-9th - Modelling', MOVE_OLD_FILES_TO_ARCHIVE=True)
# utils.run_main_up_to_merging_for_every_economy(LOCAL_FILE_PATH= r'C:/Users/hyuga.kasai/APERC/Outlook-9th - Modelling', MOVE_OLD_FILES_TO_ARCHIVE=True)

#%%
#%%
# %%
22 changes: 16 additions & 6 deletions workflow/scripts/merging_functions.py
@@ -47,6 +47,7 @@ def label_subtotals_for_sub_col(df, sub_col):

#############################
#if more than one value is non-zero/nan for this group, then it could be a subtotal; if not, it's a definite non-subtotal since it's the most specific data we have for this group.

value_mask = (abs(df['value'])> 0)

# Group by all columns except 'value' and sub_col and check how many values are >0 or <0 for that group
@@ -307,6 +308,13 @@ def calculate_subtotal_for_columns(melted_df, cols_to_sum):
# breakpoint()
###################
#make final_df wide
#check for duplicates
duplicates = final_df[final_df.duplicated(subset=shared_categories+['year'], keep=False)]
#if there are duplicates then save them to a csv so we can check them later and throw an error.
if duplicates.shape[0] > 0:
duplicates.to_csv('data/temp/error_checking/duplicates_in_subtotaled_df.csv', index=False)
breakpoint()
raise Exception("There are duplicates in the subtotaled DataFrame.")
final_df_wide = final_df.pivot(index=shared_categories+['is_subtotal'], columns='year', values='value').reset_index()
###################
try:
@@ -594,13 +602,11 @@ def calculate_sector_aggregates(df, sectors, aggregate_sector, shared_categories
for col in numeric_cols:
value1 = row[col]
value2 = corresponding_row[col].values[0]

# Check if both values are not NaN or zero, and if difference exceeds tolerance
if not (np.isnan(value1) or np.isnan(value2) or value1 == 0 or value2 == 0):
if np.abs(value1 - value2) > 100000:
# Save the differing rows
differences = differences.append(row)
differences = differences.append(corresponding_row)
differences = pd.concat([differences, pd.DataFrame([row]), corresponding_row])

# Remove duplicates if any
differences = differences.drop_duplicates()
@@ -1373,7 +1379,8 @@ def process_sheet(sheet_name, excel_file, economy, OUTLOOK_BASE_YEAR, OUTLOOK_LA
'subfuels': mapped_values['subfuels'],
**{str(year): row[year] for year in range(OUTLOOK_BASE_YEAR + 1, OUTLOOK_LAST_YEAR + 1)}
}
transformed_data = transformed_data.append(new_row, ignore_index=True)
transformed_data = pd.concat([transformed_data, pd.DataFrame([new_row])], ignore_index=True)
# transformed_data = transformed_data.append(new_row, ignore_index=True)

sheet_data = pd.concat([sheet_data, transformed_data])

@@ -1460,7 +1467,8 @@ def split_subfuels(csv_file, layout_df, shared_categories, OUTLOOK_BASE_YEAR, OU
proportion_dict = {}
for _, row in summed.iterrows():
if row['subfuels'] != 'x':
proportion = row['value'] / total_values.iloc[0]
# proportion = row['value'] / total_values.iloc[0]
proportion = row['value'] / total_values.iloc[0] if total_values.iloc[0] != 0 else 0
proportion_dict[row['subfuels']] = proportion

# Create new rows in df using the proportions
@@ -1472,7 +1480,9 @@ def split_subfuels(csv_file, layout_df, shared_categories, OUTLOOK_BASE_YEAR, OU
new_row['subfuels'] = subfuel
for year in range(OUTLOOK_BASE_YEAR, OUTLOOK_LAST_YEAR+1):
new_row[str(year)] = new_row[str(year)] * proportion
df = df.append(new_row, ignore_index=True)
# Append the new row to df
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
# df = df.append(new_row, ignore_index=True)
# Drop the total rows (with 'x' in 'subfuels') for the current fuel type
df = df.drop(df[(df['fuels'] == fuel) & (df['subfuels'] == 'x')].index)
