Releases: singularity-energy/open-grid-emissions
v0.6.0
v0.6.0 of OGE includes new data for 2023, a major methodological update, and various other enhancements and bug fixes.
2023 Data Release and Early Release Capability
OGE now includes data for 2023, based on the final release data from EIA and EPA.
In addition, OGE can now ingest "Early Release" data from the EIA, which is typically available several months before the final data is released each autumn.
Aggregating Subplant data rather than Plant data
OGE includes data both at the "plant" level, as well as at the "fleet" and region level (see our documentation for more on these aggregations). While all emissions calculations were happening at the generator or "subplant" level, we had previously aggregated subplant data to the plant level, and then used the aggregated plant-level data to further aggregate to the fleet and region level. While this made these latter aggregations more computationally feasible, this could result in some irregularities and inconsistencies in the fleet and region data when a plant burned multiple fuels. Instead, we now use subplant data as the basis of all fleet-level aggregations as well.
Consider the example of the now-retired Meramec plant (ID 2104) in Missouri, which had 2 natural gas steam turbines and 2 conventional coal boilers:
- In 2022, its final year of operation, this plant burned slightly more natural gas than coal (by heat content), so it was categorized as a natural gas plant.
- Previously, since we were determining fleets based on plant data, the emissions from the entire plant (including the 2 coal generators) would have been aggregated into the natural gas fleet. However, this means that the average emissions for the natural gas fleet in this region would include some coal emissions, and thus be higher than typical natural gas fleet emissions.
- Now that we use subplants as the basis for the fleet aggregations, the two natural gas generators at Meramec are aggregated to the natural gas fleet, and the two coal generators at Meramec are aggregated to the coal fleet.
- This new approach more closely matches, in our understanding, how balancing authorities generally aggregate fleet data, using generators as the basis for these aggregations rather than plants.
In addition to affecting the fleet totals, this change also affects the hourly profile imputation process, since the residual hourly profiles will now be determined based on the updated fleet definitions.
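The difference between the two aggregation approaches can be illustrated with a minimal sketch. The record layout, the function names, and all numeric values below are assumptions made up for illustration (they are not Meramec data and not the OGE implementation); only the plant ID and the two-gas, two-coal configuration come from the example above.

```python
from collections import defaultdict

# Hypothetical subplant records:
# (plant_id, subplant_id, fuel, heat_input_mmbtu, co2_lb). Values are invented.
SUBPLANTS = [
    (2104, 1, "natural_gas", 600.0, 70_000.0),
    (2104, 2, "natural_gas", 500.0, 58_000.0),
    (2104, 3, "coal", 550.0, 113_000.0),
    (2104, 4, "coal", 450.0, 92_000.0),
]


def fleet_co2_by_plant(subplants):
    """Previous approach: every subplant joins the fleet of its plant's primary fuel."""
    heat_by_plant_fuel = defaultdict(float)
    for plant_id, _, fuel, heat, _ in subplants:
        heat_by_plant_fuel[(plant_id, fuel)] += heat
    # The plant's primary fuel is the fuel with the largest heat input.
    primary_fuel = {}
    for (plant_id, fuel), heat in heat_by_plant_fuel.items():
        best = primary_fuel.get(plant_id)
        if best is None or heat > heat_by_plant_fuel[(plant_id, best)]:
            primary_fuel[plant_id] = fuel
    totals = defaultdict(float)
    for plant_id, _, _, _, co2 in subplants:
        totals[primary_fuel[plant_id]] += co2
    return dict(totals)


def fleet_co2_by_subplant(subplants):
    """New approach: each subplant joins the fleet of its own fuel."""
    totals = defaultdict(float)
    for _, _, fuel, _, co2 in subplants:
        totals[fuel] += co2
    return dict(totals)


print(fleet_co2_by_plant(SUBPLANTS))     # all CO2 lands in the natural gas fleet
print(fleet_co2_by_subplant(SUBPLANTS))  # CO2 is split between gas and coal fleets
```

In this toy data the plant burns slightly more gas than coal, so the plant-based aggregation puts the coal emissions into the natural gas fleet, while the subplant-based aggregation keeps them separate.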
Now that subplant data is being used more extensively through the pipeline, OGE also contains two new data outputs:
- Subplant-level results data at the annual and monthly resolutions (in addition to the existing plant-level output data)
- Subplant-specific attributes table that lists the primary fuel, nameplate capacity, and primary prime mover for each subplant.
For more details on these changes, see: #395
Expanded and enhanced EPA-EIA crosswalking
The subplant-level aggregation revealed a number of previously-uncaught issues with our existing mapping between EPA plant/unit IDs and EIA plant/generator IDs:
- The EPA-EIA mapping is not static over time: the relationship between an EPA ID and EIA ID can change from one year to the next, sometimes changing multiple times over the nearly 20-year historical period covered by OGE. In fact, some mappings change one year, and then change back to the original mapping several years later! To address this, OGE now includes a "start year" and "end year" for each mapping, and only uses the mapping that is valid for the current year.
- The existing power sector data crosswalk published by the EPA is missing a number of newer mappings (since 2018), as well as many mappings for earlier years in the 2000s. We were able to expand these mappings using data that already exists in CAMPD's facility database
Ultimately, this update includes about 350 new mappings between EPA and EIA IDs. Without these mappings, the generation and emissions from a subplant could be double-counted if the unit reports data to both the EPA and EIA, since these would have been previously identified as separate subplants.
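As a rough illustration of a year-bounded crosswalk, the sketch below looks up the mapping that is valid for a given data year. The field names `start_year` and `end_year`, the inclusive treatment of the end year, the lookup function, and all IDs are assumptions for illustration; the actual OGE table layout may differ.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    plant_id_epa: int
    unit_id_epa: str
    plant_id_eia: int
    generator_id: str
    start_year: int
    end_year: int  # assumed to be inclusive


# Hypothetical crosswalk: the mapping changes in 2013 and changes back in 2018.
CROSSWALK = [
    Mapping(9999, "1", 9999, "GEN1", 2005, 2012),
    Mapping(9999, "1", 9999, "GEN2", 2013, 2017),
    Mapping(9999, "1", 9999, "GEN1", 2018, 2023),
]


def lookup_generator(crosswalk, plant_id_epa, unit_id_epa, year) -> Optional[str]:
    """Return the EIA generator ID mapped to an EPA unit in the given year, if any."""
    for m in crosswalk:
        if (
            m.plant_id_epa == plant_id_epa
            and m.unit_id_epa == unit_id_epa
            and m.start_year <= year <= m.end_year
        ):
            return m.generator_id
    return None


print(lookup_generator(CROSSWALK, 9999, "1", 2015))
```

Returning `None` for years outside every range makes a missing mapping explicit rather than silently reusing a mapping from another year.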
We also found that, across various EPA datasets, unit IDs with leading zeros (e.g., "001") were inconsistently having those zeros removed, sometimes resulting in incomplete matches between datasets. To address this, we now strip all leading zeros from EPA unit IDs to ensure consistent mapping.
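A minimal sketch of this kind of ID normalization is shown below. The function name is hypothetical, and the handling of an ID made up entirely of zeros (kept as "0" rather than becoming an empty string) is an assumption, not something stated in these notes.

```python
def normalize_unit_id(unit_id: str) -> str:
    """Strip surrounding whitespace and leading zeros from a unit ID."""
    stripped = unit_id.strip().lstrip("0")
    # Assumption: an all-zero ID is kept as "0" instead of an empty string.
    return stripped if stripped else "0"


for raw in ["001", "1", "0A", "10"]:
    print(raw, "->", normalize_unit_id(raw))
```

Applying the same normalization to both sides of a join is what makes "001" and "1" match consistently.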
Data usability enhancements
In the annual, plant-level results file, we now include plant attributes (such as name, location, capacity, fuel, etc) to make these files easier to use and filter in Excel rather than needing to work with them programmatically.
Other Improvements
The subplant-level aggregations also revealed that a number of subplants only include steam output data from CEMS, but no generation data. Examining these units revealed that these boilers may only be used for steam production (for district steam systems for example) and not power production, so these are once again being dropped from the dataset until we can get further clarification from EPA on how to interpret this data.
What's Changed
- Updates code to zip files for upload by @grgmiller in #388
- Update EPA/EIA crosswalk of plant 55641 by @grgmiller in #389
- Enable running OGE pipeline with Early Release data and PUDL nightly builds by @grgmiller in #390
- Update data export notebook and small bug fixes by @grgmiller in #391
- Update year to 2023 by @grgmiller in #393
- Update reference tables by @rouille in #394
- Aggregate fleet data by subplant, not plant by @grgmiller in #395
- Remove extra comma in energy source groups csv file by @rouille in #397
- Expand EPA-EIA crosswalk manual table and create primary fuel manual table by @rouille in #398
- Clean up warnings for 2023 data pipeline by @grgmiller in #400
- Refactor function writing power sector data by @rouille in #401
- Strip leading zeros from CEMS `emission_unit_id_epa` by @grgmiller in #402
- Expand eia-epa crosswalk by @rouille in #403
- Final Cleanup by @grgmiller in #407
- v0.6.0 by @grgmiller in #408
Full Changelog: v0.5.0...v0.6.0
v0.5.0
OGE v0.5.0 is a new major release that expands the dataset's historical coverage back to 2005, and includes other methodological enhancements that improve data quality in all years.
In addition to the new data, users should expect changes to the existing 2019-2022 data: NOx and SO2 totals may change for some plants, net generation totals may change for some plants, data may change for CHP plants (see the "methodological updates" section for more details)
Input data changes
- Updates to use the most recent data version of PUDL (v2024.5.0). This includes a re-release of the 2022 EIA-923 data, which may change some of the 2022 results.
- Updates reference tables including the `energy_source_groups` file, the `utility_name_ba_code_map` file (#374), `epa_eia_crosswalk_manual` (#372), and `emission_factors_for_co2_ch4_n2o` (#377)
Output data changes
- Expands historical coverage of OGE to include monthly and annual data for 2005-2018 (#295 and #362)
- All output files (those in the `outputs/` directory) are now saved as compressed `.csv.zip` files instead of `.csv` files. This reduces the disk space of the outputs folder from approximately 16GB to 2.5GB. (#366)
- Expands the data in the plant_static_attributes table to include location data (lat/long, address) and nameplate capacity (#364, #382, #385); commercial operation dates and retirement dates (#367). We also screen for and correct erroneous lat/long data (#368)
- Fixes a bug where the "total" values in the `outputs/annual_generation_averages_by_fuel` file were not being calculated correctly
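For users who previously opened the plain `.csv` outputs, the compressed files can still be read programmatically. The sketch below uses only the Python standard library and builds a tiny stand-in archive so that it runs without any OGE data; the file name, column names, and values are hypothetical.

```python
import csv
import io
import tempfile
import zipfile
from pathlib import Path


def read_zipped_csv(zip_path):
    """Return the rows of the first CSV member of a .csv.zip archive as dicts."""
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a small stand-in archive (hypothetical name and columns).
tmp_dir = Path(tempfile.mkdtemp())
zip_path = tmp_dir / "example_output.csv.zip"
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("example_output.csv", "plant_id_eia,co2_mass_lb\n1,100.0\n2,250.5\n")

rows = read_zipped_csv(zip_path)
print(rows)
```

If pandas is available, `pandas.read_csv` should also be able to open such a file directly, since its default `compression="infer"` setting recognizes the `.zip` extension for single-file archives.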
Methodological updates
- When calculating the electric allocation factors for combined heat and power (CHP) plants, we previously were calculating this at the generator level, which was introducing bugs for certain combined cycle units when fuel and generation are reported for different generators at the same subplant. We now calculate this factor at the subplant level (#363)
- Fixes several bugs with the gross-to-net generation conversions where anomalous fleet-average ratios were being introduced, and default factors were not being mapped to certain generators. Also fixed a bug where GTN ratios were being calculated where there was missing gross generation or net generation data. (#370, #375, #383)
- Updates uncontrolled NOx and SO2 factors to align assumptions with those used by the EIA Electric Power Annual, and to fix a bug where we were adjusting the SO2 values for fluidized bed boilers, even though the control efficiencies are already incorporated into the uncontrolled emission factors (#373). In addition, because fuel sulfur content data is not available pre-2008, we use sulfur content values averaged from 2008-2012 to backfill the missing data. When calculating backstop values for missing values in any year, we now use state-specific values (rather than national-average) to reflect differences in the sulfur contents of fuels being delivered in specific parts of the country (#376)
Other minor fixes
- Remove the option to run the EIA-923 allocation at the plant level. This was an artifact that was no longer used (#361)
- Clean up function typehints and continue converting docstrings to Google format
- Updates where files are stored and accessed from in s3 (#384)
Pull Requests in this update
- Expand historical coverage pre-2019 by @grgmiller in #295
- Remove add_subplant_id optional argument by @rouille in #361
- Add 2005, 2006 and 2007 years by @rouille in #362
- Calculate electric_allocation_factor by subplant by @grgmiller in #363
- Compress OGE Outputs by @grgmiller in #366
- Add geographical information to the plant static attributes data frame by @rouille in #364
- Add operating and retirement dates to plant static attributes by @rouille in #367
- Update to use most recent version of pudl by @grgmiller in #369
- Fix issues with anomalous gross to net conversions by @grgmiller in #370
- Fix and add information to plant static attributes by @rouille in #368
- Fix function calculating averages of the fuel types by @rouille in #371
- Update manual epa eia crosswalk reference table by @rouille in #372
- Update Uncontrolled NOx and SO2 factors by @grgmiller in #373
- Update Energy Source Codes and Utility Name Map by @grgmiller in #374
- Correct Gross to Net Generation Bugs by @grgmiller in #375
- update co2 factors based on manual energy source group updates by @grgmiller in #377
- Add geopy to pyproject dependencies by @grgmiller in #378
- Add backstop sulfur content percentage for years 2005, 2006 and 2007 by @rouille in #376
- Compare plants coordinates from PUDL and EIA-860 by @rouille in #379
- Update warning message about validated years by @rouille in #381
- Discard non-operational generators when calculating plant capacity by @rouille in #382
- Revert removal of GTN shift factors by @grgmiller in #383
- Update to 0.5.0 and change s3 directory by @grgmiller in #384
- Fix missing capacity in plant static attributes by @grgmiller in #385
- Update documentation by @rouille in #380
- Historical coverage feature / v0.5.0 by @grgmiller in #386
- Update Citation by @grgmiller in #387
Full Changelog: v0.4.0...v0.5.0
v0.4.0
This minor release improves current validation checks, adds new validation checks, enforces static sub-plant id across years and allows users to access any Global Warming Potential value via the IPCC assessment report name where it is published.
Update sub-plant crosswalk table
As discovered in #351, the `subplant_id` assigned to each (`plant_id_eia`, `generator_id`) does not remain static across each year of OGE data. This is an issue if trying to use `subplant_id` as a primary key to compare data across multiple years.
This PR updates the process of creating sub-plant IDs to try to enforce static sub-plant IDs. The changes in this PR enforce static sub-plant IDs within a single data release version of OGE, although the sub-plant IDs may still change from version to version. (#353)
Validation Checks
- For all warnings about plant-level data, adds the balancing area that the flagged plant belongs to, to help identify BAs where data quality is affected. (#348)
- The validation check detecting mismatches between input and allocated EIA-923 data is now done at the plant and energy source level (#350)
- Functions for detecting anomalies in timeseries data have been added to the code base, and we now identify where gross generation, fuel consumption, and CO2 emission timeseries in the reported CEMS data may be anomalous based on a global extreme filter. (#349)
New feature
The function for calculating CO2-equivalent values now allows for the user to specify which IPCC Assessment Report to use for calculating GWP-adjusted CO2-equivalent values. (#352)
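The idea can be sketched as follows. This is not the OGE function: the function name, its signature, and the lookup table are assumptions for illustration. The 100-year GWP values shown are the commonly cited ones from AR4 and AR5 (without climate-carbon feedbacks) and should be verified against the IPCC reports before use.

```python
# 100-year GWP values as commonly cited from the IPCC assessment reports.
# Illustrative only; verify against the source reports before relying on them.
GWP_100YR = {
    "AR4": {"co2": 1.0, "ch4": 25.0, "n2o": 298.0},
    "AR5": {"co2": 1.0, "ch4": 28.0, "n2o": 265.0},
}


def co2_equivalent(co2_lb, ch4_lb, n2o_lb, assessment_report="AR5"):
    """Return GWP-weighted CO2-equivalent mass for the chosen assessment report."""
    try:
        gwp = GWP_100YR[assessment_report]
    except KeyError:
        raise ValueError(f"Unknown assessment report: {assessment_report!r}") from None
    return co2_lb * gwp["co2"] + ch4_lb * gwp["ch4"] + n2o_lb * gwp["n2o"]


print(co2_equivalent(1000.0, 1.0, 1.0, "AR4"))
print(co2_equivalent(1000.0, 1.0, 1.0, "AR5"))
```

Keying the table by report name means the same emissions data can be re-weighted under a different report without touching the calculation itself.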
v0.3.3
This patch release addresses two issues that were preventing some users from being able to run the pipeline and use the OGE package:
- Updates the instructions for using conda to manage the oge code environment and updates the environment.yml file that specifies the conda environment. This had fallen out of date with the pipfile environment files in recent releases. (#345)
- Fixes an issue where the use of backslashes instead of forward slashes in `oge.filepaths` was causing errors when attempting to load OGE files from the s3 bucket. (#346)
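One way this class of bug can arise is when object-store paths are built with the operating system's path-joining rules, which use backslashes on Windows. The sketch below contrasts Windows-style joining (`ntpath`) with POSIX-style joining (`posixpath`). The bucket name and helper function are hypothetical, and this is not necessarily how `oge.filepaths` was fixed.

```python
import ntpath
import posixpath

S3_ROOT = "s3://example-bucket/open_grid_emissions_data"  # hypothetical bucket


def s3_path(*parts):
    """Join path parts with forward slashes regardless of the operating system."""
    return posixpath.join(S3_ROOT, *parts)


# What os.path.join would produce on Windows (ntpath is its Windows backend):
windows_style = ntpath.join(S3_ROOT, "outputs", "2022")
portable = s3_path("outputs", "2022")

print(windows_style)  # contains backslashes, which s3 keys do not use as separators
print(portable)
```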
This release does not affect any of the outputs. Thus, there will be no new data release that accompanies this code patch release. The most up to date version of the data is still 0.3.0.
v0.3.2
This patch release of OGE fixes an issue where the python version specified in `pyproject.toml` was incompatible with the version of python used in the rest of the package, preventing OGE from being installed in other projects (#344)
This release does not affect any of the outputs. Thus, there will be no new data release that accompanies this code patch release. The most up to date version of the data is still 0.3.0.
v0.3.1
This patch release of OGE makes several updates to OGE's code infrastructure, dependencies, documentation, and file downloads, but does not affect any of the outputs. Thus, there will be no new data release that accompanies this code patch release. The most up to date version of the data is still 0.3.0.
Accessing OGE outputs and results through the cloud (#338)
- In v0.3.0 we packaged OGE, allowing other projects to import OGE code directly. However, in order to load and use any of the downloads, outputs, or results files, it would still be necessary to run the data pipeline locally to make those files available.
- This release allows these files to now be read directly from an AWS s3 bucket, eliminating the need for the pipeline to be run locally when importing OGE into another project.
- Instructions for how to set the s3 bucket as the default data store are now included in the readme
- We also fixed a bug where a log file was being created whenever an OGE function was called from another project. Now, a log file should only be created whenever the main data pipeline is run (#340)
Updates eGRID downloads to include eGRID2022 (#337)
- Although eGRID is not used as an input to the OGE data pipeline, these files are downloaded and included in the data store, as the eGRID data can be loaded and explored via several functions in OGE.
- This release includes the newly-published eGRID2022 file in the set of downloaded files
- This release also standardizes the downloaded eGRID file names to use consistent capitalization across years.
More transparent conversion factors and constants (#339)
- In past versions of OGE, some of the standard conversion factors and assumed values were spread across multiple files.
- This release moves all of these factors and assumed values (if not already included in any of the `reference_tables`) to a centralized location in `constants.py` so that they can be easily reviewed.
- Moving these factors also helped avoid the potential for circular imports between the modules.
Miscellaneous
- Updates several package dependencies in the pipfile to address security updates (#341)
- Updates small errors in README file
v0.3.0
Updates PUDL dependency (#318)
- Updates pudl dependency from v2022.11.30 to v2023.12.01, which includes a number of updates to the database structure and naming conventions (see pudl release notes)
- Changes source of PUDL database download to AWS rather than Zenodo, providing faster access to PUDL data releases
- PUDL’s CEMS database now includes data from AK, HI, and PR, which should improve hourly emissions data coverage for plants in AK and HI
- A cleaned and standardized version of the EPA-EIA power sector data crosswalk is now included in the pudl database, meaning we no longer have to manually load and standardize this data
- Emissions control equipment data from EIA-860 is now included in the pudl database, meaning we no longer need to manually load and standardize this data
- Leading zeros removed from boiler_ids, which should improve mapping between boiler tables
- The EIA-923 generation and fuel allocation process is now fully integrated into PUDL
- Fixes an issue where certain plants in NY state were being assigned the wrong BA code.
Adds 2022 data (#322)
- Integrates Final release input data from the EIA and EPA for 2022
- Adds 2022 OGE outputs
Manual reference table update (#322)
- Most reference tables did not require updating
- NOX and SO2 emissions factors: added new factors for boiler configurations that had not previously been included in the table.
- Balancing Areas: Added retirement dates for the CFE (July 2018), GLHB (September 2022), GRIF (November 2023) balancing areas
- Added new EPA-EIA plant and unit crosswalks based on 2022 data
- Added several new mappings between utilities and balancing areas
Infrastructure Updates
- Updates Python dependency from 3.10 to 3.11
- Refactors and packages OGE codebase so that functions, reference tables, and data from OGE can be imported into other projects. This package will go live on PyPi soon. (#323)
- Re-organizes location of data files. The `data/manual` files have been renamed to `reference_tables` and moved to `src/oge`, while all downloads, output files, and result files will now be saved in the user’s home directory in a folder called `open_grid_emissions_data` (#324)
- Adds support for pipenv environment management in addition to conda (#313)
- Changes PUDL and gridemissions dependencies to forks within the singularity-energy organization, rather than forked versions that lived in individual authors’ github accounts.
- Moves documentation from separately-maintained repo into the OGE repo (#303)
- Changes code formatting from `black` to `ruff` and adds formatting checks that must pass before merging code (#317)
Other bug/data quality fixes
- Ensures that the EPA-EIA power sector data crosswalk is as complete as possible by combining the pudl-standardized PSDC, plant code mappings from eGRID, and our own manual crosswalking.
- Add handling for negative fuel consumption reported in EIA-923
- Stop dropping missing and zero values to help ensure complete timeseries
- Previously, we had dropped data from CEMS that reflected units that only reported steam generation but no electricity generation. Based on an updated understanding of this data, we no longer drop this data from OGE.
- Fixes bug in EIA-923 generation and fuel allocation process that was resulting in certain reported fuel consumption data being dropped for plants that retire mid-year
- Updates manual timestamp corrections to EIA-930 data for 2022 and on CAISO data (#300), 2021 and on TEPC data (#322)
Adds new data validation checks
- Flags when different plant primary fuel identification methods result in different primary fuel assignments: Exports the primary_fuel_table with all intermediate columns to outputs to help with validation. Adds a new validation check to flag when the plant primary fuel assigned by the pipeline does not match the capacity-based primary fuel assignment. (#296)
- Flags when subplants only contain a single combined cycle component: Combined cycle generators contain a steam part (CA) and turbine part (CT) that are linked together. Thus, our subplant groups that contain one part of a combined cycle plant should, in theory, always contain the other part as well. This PR adds a test that checks that both parts exist in a subplant if one exists. Besides CT and CA prime movers, there are also CS prime movers, which represent a "single shaft" combined cycle unit where the steam and turbine parts share a single generator. These prime movers are allowed to be by themselves in a subplant, as are CC prime movers, which represent a "total unit." This PR adds a prime_mover_code column to the subplant crosswalk table to help validate this. (#297)
- Checks for complete monthly data within a single year: Checks that 12 monthly “report_date”s exist for each plant/subplant, and also checks that the number of missing monthly datapoints matches the number of missing datapoints in the input data from CEMS and EIA-923.
- Checks for complete hourly timestamps within a single year or single month: If the period is a 'year', checks that the length of the timeseries is 8760 (for a non-leap year) or 8784 (for a leap year). If the period is a 'month', checks that the length of the timeseries is equal to the length of the complete date_range between the earliest and latest timestamp in a month. (#299)
- Exports a new output table that identifies whether input data (and non-zero input data) exists for each plant in EIA-923 and/or CEMS.
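The yearly version of the hourly completeness check described above can be sketched with the standard library. The function names are hypothetical, and the sketch assumes naive timestamps with no time zone or daylight saving adjustments; it is an illustration of the 8760/8784 rule, not the check as implemented in OGE.

```python
import calendar
from datetime import datetime, timedelta


def expected_hours_in_year(year):
    """8784 hours in a leap year, 8760 otherwise."""
    return 8784 if calendar.isleap(year) else 8760


def is_complete_year(timestamps, year):
    """True if the timestamps cover every hour of the year exactly once."""
    unique = set(timestamps)
    return (
        len(timestamps) == len(unique) == expected_hours_in_year(year)
        and all(t.year == year for t in unique)
    )


def hourly_range(start, end):
    """All hourly timestamps from start (inclusive) to end (exclusive)."""
    hours = int((end - start).total_seconds() // 3600)
    return [start + timedelta(hours=h) for h in range(hours)]


full_2020 = hourly_range(datetime(2020, 1, 1), datetime(2021, 1, 1))
print(len(full_2020), is_complete_year(full_2020, 2020))
print(is_complete_year(full_2020[:-1], 2020))  # one hour missing
```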
v0.2.2
This release primarily fixes a bug that affected the quality of the CO2 emissions data for multiple regions in the Southeastern U.S., namely AEC, SOCO, and TVA. This bug resulted in substantial (>1%) errors in the emission totals and rates for these regions. This bug also affected the CO2 data for a handful of individual plants in MISO, PJM, ERCO, CPLE, SWPP, DUK, and NYIS.
This release includes multiple improvements:
- Fixes a bug that was assigning CO2 data from CEMS to the wrong rows when attempting to fill missing CO2 data (#280)
- Updates the handling of command-line arguments when running the pipeline (#288)
- Whenever net generation in a period is zero, the calculated generated emission rate was previously missing due to dividing by zero. In this release, we now apply a zero emission rate to these periods. For all other periods where emissions or generation data is actually missing, the generated emission rate will still be missing (#290)
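The zero-generation rule in the last item above can be expressed as a small function. The name and signature are hypothetical, and `None` stands in for a missing value; this is a sketch of the stated behavior rather than the pipeline code.

```python
from typing import Optional


def generated_emission_rate(
    emissions_lb: Optional[float], net_generation_mwh: Optional[float]
) -> Optional[float]:
    """Emission rate in lb/MWh; zero when generation is zero, missing when inputs are."""
    if emissions_lb is None or net_generation_mwh is None:
        return None  # data is actually missing, so the rate stays missing
    if net_generation_mwh == 0:
        return 0.0  # avoid dividing by zero; report a zero rate instead
    return emissions_lb / net_generation_mwh


print(generated_emission_rate(2000.0, 2.0))
print(generated_emission_rate(500.0, 0.0))
print(generated_emission_rate(None, 10.0))
```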
Validation improvements:
- Raises a warning if the allocated net generation or fuel consumption output by the EIA-923 generator allocation process differs from the input data by more than 0.1% (#278)
- Adds logging to the data pipeline, instead of using print statements. This also fixes a bug that was preventing logging messages from pudl from showing when running the pipeline. This allows us to save an output of all warnings to help validate the results. (#285)
- Expands the coverage of multiple existing validation checks to make them more comprehensive (#287)
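A tolerance check of the kind described in the first item above, combined with logging instead of print statements, might look like the following sketch. The logger name, function name, and handling of a zero input total are assumptions for illustration.

```python
import logging

logger = logging.getLogger("oge_example")  # hypothetical logger name


def check_allocation(plant_id, input_total, allocated_total, tolerance=0.001):
    """Return True if the allocated total is within 0.1% of the input total."""
    if input_total == 0:
        # Assumption: with a zero input, only a zero allocation counts as a match.
        matches = allocated_total == 0
    else:
        relative_diff = abs(allocated_total - input_total) / abs(input_total)
        matches = relative_diff <= tolerance
    if not matches:
        logger.warning(
            "Plant %s: allocated total %s differs from input total %s by more than %.1f%%",
            plant_id, allocated_total, input_total, tolerance * 100,
        )
    return matches


print(check_allocation(1, 1000.0, 1000.5))  # 0.05% difference
print(check_allocation(1, 1000.0, 1002.0))  # 0.2% difference, logs a warning
```

Because the warning goes through `logging`, it can be routed to a file handler and reviewed after a pipeline run.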
v0.2.1
This release primarily addresses an issue identified in #271, in which our data pipeline was dropping a substantial amount of data due to mismatches in the reported energy source codes used in EIA-860 and EIA-923, and our failure to validate the outputs of the allocation process more carefully. While fixing this issue, we also came across several other issues that were causing anomalous emission factor outputs.
Summary of changes
- Previously, some generation and fuel data reported in EIA Form 923 was being dropped from OGE due to inconsistent energy source codes being used for certain plants between the EIA-923 input data and EIA-860 input data. This was resulting in incorrect emissions and generation totals for certain plants, as well as incorrect primary fuel categories being assigned to these plants.
- This release also includes several updates to the method for identifying the primary fuel type of each plant, which fixes a bug that was causing certain nuclear plants to be identified as a non-nuclear fuel type due to missing fuel consumption data in EIA-923.
- This release also includes updates to our methodology for converting gross generation data reported in CEMS to net generation. These updates include more stringent standards for which conversion factors are used for each plant, and more robust backstop conversion factors. This update will result in more net generation being reported for certain plants, and more realistic plant-level emission intensity values.
- We have also added the newly-released eGRID2021 dataset to the list of downloaded files so that 2021 OGE values can be easily compared to 2021 eGRID values using our validation notebooks included in the repository.
Detailed changes
Fixes the generation and fuel allocation process
- Fix bugs in pudl allocate_net_gen module, as described in this PR (catalyst-cooperative/pudl#2235):
  - Adds a new function `add_missing_energy_source_codes_to_gens()` that adds energy_source_codes that appear in the `gf` table but not `gens` to `gens`.
    - In some cases, non-zero fuel consumption and net generation are reported in the EIA-923 generation and fuel table under an energy_source_code that is not associated with that plant-prime mover in the gens table, which would cause these data to get dropped when these two tables are merged. To fix this, for each plant-pm, this function identifies such esc, and adds them to the `gens_at_freq` table as new energy_source_code columns.
    - The sub-function `identify_missing_gf_escs_in_gens()` identifies when there are fuels reported in the gf table for that plant-pm that are not listed in the gens table for that plant-pm.
  - Adds the `MISSING_SENTINEL` value to the `net_generation_mwh_g_tbl` column. For some reason, this column had been commented out, which was leading to NaNs appearing in the data when dividing by this column when the value was zero. I un-commented this line.
  - In `allocate_net_gen_by_gen_esc()`, we no longer allow `frac_from_g_tbl` to be greater than 100%. This was previously happening when the mwh reported in the g table were greater than the mwh reported in the gf table. However, values greater than 100% were causing `frac_missing_from_g_tbl` to become negative, which was resulting in nonsensical allocations. We implement the same cap on `frac_from_bf_tbl` in `allocate_fuel_by_gen_esc()`.
  - In `allocate_fuel_by_gen_esc()`, when calculating `frac_cap`, the code had been dividing `capacity_mw` by `capacity_mw_unit_fuel`. However, this was resulting in some nonsensical allocations because fuel is being allocated by PM-fuel, not by unit. We changed this to divide by `capacity_mw_pm_fuel` instead. This is consistent with how `frac_cap` is calculated in the `allocate_net_gen_by_gen_esc()` function.
  - Rename `adjust_energy_source_codes()` to `adjust_msw_energy_source_codes()` to more precisely describe what the function does
- Adds new entries to the manual emissions factor tables for NOx and SO2 to represent fuel-boiler combinations that had previously been getting dropped from the data due to this bug.
- Changes the pudl version that we use in our environment from `catalyst-cooperative/pudl@main` to `grgmiller/pudl@oge_release`. This will give us more control over performing fixes like this in the future.
- Adds a new validation check to the EIA-923 data cleaning process to verify that for each plant, the total allocated fuel and generation matches the total fuel and generation reported in the input generation and fuel table (basically that the allocation process is not dropping or inflating the data).
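The effect of capping `frac_from_g_tbl` at 100%, as described in the list above, can be shown with a small standalone sketch. This is not the pudl code: the function below is hypothetical, operates on two scalar totals rather than dataframes, and rejects a non-positive gf total as a simplifying assumption.

```python
def allocation_fractions(mwh_g_tbl, mwh_gf_tbl):
    """Return (frac_from_g_tbl, frac_missing_from_g_tbl) with the first capped at 1."""
    if mwh_gf_tbl <= 0:
        # Simplifying assumption for this sketch; the real module handles more cases.
        raise ValueError("mwh_gf_tbl must be positive in this sketch")
    frac_from_g_tbl = min(mwh_g_tbl / mwh_gf_tbl, 1.0)
    frac_missing_from_g_tbl = 1.0 - frac_from_g_tbl
    return frac_from_g_tbl, frac_missing_from_g_tbl


print(allocation_fractions(50.0, 100.0))   # half reported in the g table
print(allocation_fractions(120.0, 100.0))  # g table exceeds gf table, capped
```

Without the cap, the second call would give a fraction of 1.2 and a missing fraction of -0.2, which is the kind of negative value that led to nonsensical allocations.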
Updates plant primary fuel identification
- When assigning the plant primary fuel based on the most consumed fuel, we were previously assigning this based on the fuel with the highest `fuel_consumed_mmbtu`. However, we should be using `fuel_consumed_for_electricity_mmbtu` since we want to assign the primary fuel used for electricity generation.
- Sometimes nuclear generators report 0 fuel consumption in EIA-923. Since we were assigning a plant's primary fuel first based on fuel consumption, this meant that sometimes if a nuclear plant had a backup fossil generator, the plant was being assigned the fuel code of that backup generator. To fix this, we now assign the primary fuel of any plant that contains a nuclear unit based on the nameplate capacity of the unit.
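The two rules above can be combined into a short sketch. The record layout and function name are hypothetical, and the reading of "based on nameplate capacity" as "the fuel with the largest total capacity at the plant" is an assumption; `NUC` is the EIA energy source code for nuclear.

```python
from collections import defaultdict

NUCLEAR = "NUC"  # EIA energy source code for nuclear


def plant_primary_fuel(generators):
    """Pick a plant's primary fuel from a list of generator records (dicts).

    Plants with any nuclear generator are ranked by capacity_mw (assumed
    interpretation); all others by fuel_consumed_for_electricity_mmbtu.
    """
    has_nuclear = any(g["energy_source_code"] == NUCLEAR for g in generators)
    key = "capacity_mw" if has_nuclear else "fuel_consumed_for_electricity_mmbtu"
    totals = defaultdict(float)
    for g in generators:
        totals[g["energy_source_code"]] += g[key]
    return max(totals, key=totals.get)


# Hypothetical nuclear plant reporting zero fuel consumption, with a diesel backup.
nuclear_plant = [
    {"energy_source_code": "NUC", "capacity_mw": 1000.0, "fuel_consumed_for_electricity_mmbtu": 0.0},
    {"energy_source_code": "DFO", "capacity_mw": 5.0, "fuel_consumed_for_electricity_mmbtu": 100.0},
]
print(plant_primary_fuel(nuclear_plant))
```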
Updates to gross to net generation conversions
- Previously, when converting CEMS gross generation to net generation, we had filtered out any ratios that were greater than 1.5 or less than 0.2. However, these values were somewhat arbitrary and turning out to be too wide of a range. For example, when there was a large discrepancy between CEMS gross generation and EIA-923 net generation, we were scaling the CEMS generation to match, even though the fuel consumption and emissions reported in CEMS also disagreed and were not being scaled. This was leading to instances where a plant was using CEMS CO2 totals but EIA-923 net generation totals, resulting in the plant having abnormally high emission rates. To be consistent, if we are going to use CEMS data at all, we want to make sure that the net generation values are reasonable given the reported net generation. After analyzing three years (2019-2021) of annual gross to net ratios, both at the plant and subplant levels, it appears that generally the interquartile range of GTN ratios is between 0.75 and 1.00, with an upper bound around 1.25. Thus, we are now using 0.75 as the lower bound for filtering out ratios, and 1.25 as the upper bound.
- Previously, the backstop gross to net generation approach if all other conversion factors were not available was to assume that gross generation equaled net generation (i.e. a GTN ratio of 1). However, as identified in #177, the EIA has default gross to net conversion factors for each prime mover that they use. This PR introduces these PM-specific gross to net conversion factors as the default backstop option now. As noted in the issue, there are still improvements that need to be made before #177 can be closed, but this is a step in the right direction.
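A simplified sketch of the filtering and backstop logic described above follows. The 0.75 and 1.25 bounds come from the text; everything else is an assumption: the function name, the inclusive treatment of the bounds, the collapse of the conversion hierarchy into a single fallback step, and especially the default ratios, which are made-up placeholders and not the EIA's published factors.

```python
GTN_LOWER_BOUND = 0.75
GTN_UPPER_BOUND = 1.25

# Placeholder values for illustration only; these are NOT the EIA default factors.
PLACEHOLDER_DEFAULT_GTN_BY_PM = {"ST": 0.9, "GT": 0.98}


def select_gtn_ratio(calculated_ratio, prime_mover, default_ratio_by_pm):
    """Use the calculated gross-to-net ratio if plausible, else the PM default."""
    if (
        calculated_ratio is not None
        and GTN_LOWER_BOUND <= calculated_ratio <= GTN_UPPER_BOUND
    ):
        return calculated_ratio
    return default_ratio_by_pm[prime_mover]


print(select_gtn_ratio(0.95, "ST", PLACEHOLDER_DEFAULT_GTN_BY_PM))  # kept
print(select_gtn_ratio(1.6, "ST", PLACEHOLDER_DEFAULT_GTN_BY_PM))   # filtered out
print(select_gtn_ratio(None, "GT", PLACEHOLDER_DEFAULT_GTN_BY_PM))  # no ratio available
```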
Flags potentially anomalous generated emission factors
- When outputting annual plant level data, we add a new validation check that calculates a generated co2 rate, and flags any rates that appear to be anomalous, so that we can manually inspect these plants to see if there are any unexpected results. For this test, we define anomalous values two ways. On the high end, the check flags the plant if the co2 rate is higher than 15,000 lb/MWh. On the low end, the check flags any plant whose rate is lower than 10 lb/MWh but higher than 0 lb/MWh.
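The two thresholds translate directly into a small check. The function name, return labels, and the choice to skip plants with missing or non-positive generation are assumptions for illustration.

```python
HIGH_CO2_RATE_LB_PER_MWH = 15_000.0
LOW_CO2_RATE_LB_PER_MWH = 10.0


def flag_co2_rate(co2_lb, net_generation_mwh):
    """Return "high", "low", or None for a plant's annual generated CO2 rate."""
    if co2_lb is None or net_generation_mwh is None or net_generation_mwh <= 0:
        return None  # assumption: no rate is evaluated without positive generation
    rate = co2_lb / net_generation_mwh
    if rate > HIGH_CO2_RATE_LB_PER_MWH:
        return "high"
    if 0 < rate < LOW_CO2_RATE_LB_PER_MWH:
        return "low"
    return None


print(flag_co2_rate(2_000_000.0, 100.0))  # 20,000 lb/MWh
print(flag_co2_rate(500.0, 100.0))        # 5 lb/MWh
print(flag_co2_rate(90_000.0, 100.0))     # 900 lb/MWh
```

A rate of exactly zero is not flagged, which keeps zero-emission plants out of the manual review list.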
Add eGRID2021 to the list of downloads for validation
- This release adds eGRID 2021 data to our list of downloads, and updates the list of non-grid connected plants based on additions to the list in eGRID 2021.
Fixes a bug that was leading to incorrect balancing authority assignments
- Fixes an issue where a plant with no reported ba_code was getting filled with the incorrect code based on the ba_name.
- We have also identified a new known issue that is not fixed in this release: In comparing our 2021 data to the eGRID 2021 data, we found that there are some plants that EIA-860 identifies as being in ISNE that are getting assigned to NYIS by pudl, and thus are categorized in a different BA than they are in eGRID. All of these plants seem to be physically located in the state of New York, but are listed with an ISNE BA code. Also, all of these plants are pretty small. See: catalyst-cooperative/pudl#2255. We will work to address this with the pudl team.
v0.2.0
Release Notes
2021 Data Release
- This release includes new data for 2021, and updates the year 2019 and 2020 data.
Hourly data for all individual plants
- The plant data results now include hourly data for every individual plant in the U.S., including those plants for which we impute the hourly generation profile. Previously, we had aggregated the imputed data to the fleet level. Details #246
Updates OGE pipeline to work with PUDL v2022.11.30
- Updates the pipeline to work with v2022.11.30 of PUDL, which introduced many breaking changes. See #258
- The new PUDL release now performs many of the CEMS data cleaning steps that we previously performed in the OGE pipeline, so these data cleaning steps have been removed
Fixes a bug that was overestimating NOx and SO2 emissions for some plants
- Some NOx and SO2 control data in EIA-923 is missing control ID numbers, which was causing this data to be dropped, so OGE was treating emissions from these generators as uncontrolled. This update fixes that issue. See #255
Patches bugs with consumed emissions calculation
- Updates environment to fix bug that was leading to random missing values in consumed calculation on some operating systems.
- When calculating consumed emissions, we calculate demand in a BA by subtracting interchange from generation. For certain BAs, however, this approach results in negative emission factors being calculated, so for those BAs we directly use reported demand from EIA-930. We updated the list of BAs for which we apply this approach, which is now differentiated by year so that we are only performing this patch where strictly necessary.
- Makes the approach for identifying imputed values in the EIA-930 data cleaned by gridemissions consistent across the hourly shaping step and the consumed emissions step of the data pipeline.
- Updates the manual cleaning of EIA-930 data to remove OVEC corrections, and add a timestamp offset for SC prior to 12-31-2020
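The year-specific demand patch described in the consumed emissions bullets above can be illustrated with a minimal sketch. The BA codes, the lookup table, and the function name are hypothetical; the sketch also assumes interchange is expressed as net exports (positive when the BA exports), which the note does not state.

```python
# Hypothetical year-specific list of BAs that should use reported EIA-930 demand.
BAS_USING_REPORTED_DEMAND = {
    2020: {"BA_X"},
    2021: {"BA_X", "BA_Y"},
}


def hourly_demand(ba_code, year, generation_mwh, interchange_mwh, reported_demand_mwh):
    """Return demand for one BA-hour, patching only the BAs listed for that year."""
    if ba_code in BAS_USING_REPORTED_DEMAND.get(year, set()):
        # Patch: use reported demand for BAs where the default yields negative EFs.
        return reported_demand_mwh
    # Default: demand = generation - interchange (net exports assumed positive).
    return generation_mwh - interchange_mwh
```

Keying the list by year keeps the patch from being applied in years where the default calculation already behaves well.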
Validation
- Add a validation check for missing values in all results files
Output files
- Export a cleaned version of the unit-level CEMS dataset to outputs (outputs/cems_cleaned.csv). Previously we only exported a version after aggregation to subplants and gross-to-net-generation conversion. This original file was renamed outputs/cems_subplant.csv.
- Add an option to export metric files or not when outputting data
Balancing authority updates
- Anchorage Municipal Light and Power retired on October 30, 2020
- Electric Energy, Inc (EEI) changed to a generation-only BA
- Map Pacific Gas and Electric utility to CISO
Emissions Calculation Updates
- When imputing missing emissions data in CEMS, we now calculate fuel-weighted emission factors for each subplant-month, which are used for imputing missing emissions values. These factors are based on the total consumption of each type of fuel that is reported to be consumed in each subplant-month in EIA-923. The process for imputing missing emissions is now:
- If a unit has non-missing emissions data for other hours in the same month, calculate a unit-month specific EF from the CEMS-reported fuel consumption and emissions
- For all remaining missing values, use the subplant and month-specific weighted average emission factor from subplant_emission_factors calculated from the EIA-923 data
- For any remaining missing values, calculate emissions based on the subplant primary fuel and fuel consumption
- For any remaining missing values, calculate emissions based on the fuel type assigned in the power sector data crosswalk.
- Previously, when assigning a fuel type to each emissions_unit_id_epa, we prioritized using the fuel type reported in the power sector data crosswalk. We now identify the primary fuel type for each subplant using EIA-923 fuel consumption data, and use this to assign a fuel type to each CEMS unit. The fuel type reported in the power sector data crosswalk is now used as a last resort.
- Add a NOx emission factor for CS prime movers with OG fuel consumption
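The four-step imputation order listed above amounts to a priority fallback: use the most specific emission factor (EF) that is available. The sketch below shows that pattern only. The function names, method labels, and the convention of passing `None` for an unavailable EF are assumptions for illustration, not the OGE code.

```python
from typing import Optional, Tuple


def choose_emission_factor(
    unit_month_ef: Optional[float],
    subplant_month_ef: Optional[float],
    primary_fuel_ef: Optional[float],
    crosswalk_fuel_ef: Optional[float],
) -> Tuple[Optional[str], Optional[float]]:
    """Return (method, EF) for the first available factor, in priority order."""
    candidates = [
        ("unit_month", unit_month_ef),          # from CEMS data in the same month
        ("subplant_month", subplant_month_ef),  # fuel-weighted EF from EIA-923
        ("primary_fuel", primary_fuel_ef),      # EF of the subplant primary fuel
        ("crosswalk_fuel", crosswalk_fuel_ef),  # fuel type from the crosswalk
    ]
    for method, ef in candidates:
        if ef is not None:
            return method, ef
    return None, None


def impute_emissions(fuel_consumed_mmbtu: float, *efs: Optional[float]) -> Optional[float]:
    """Impute emissions as fuel consumption times the chosen EF, if any exists."""
    _, ef = choose_emission_factor(*efs)
    return None if ef is None else fuel_consumed_mmbtu * ef
```

Returning the method label alongside the factor makes it possible to record which step produced each imputed value.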
Subplant identification
- Manually assign all units at plant 1391 to a single subplant
EPA-EIA Crosswalk
- Update to use v0.3 of the power sector data crosswalk
- The crosswalk is now integrated as a table into PUDL, so we will use the pudl cleaned version in the pipeline
- Add manual unit to generator crosswalks for plants 60925, 60910, 63259
- Identified plant 59073 (Cove Point LNG Terminal) as a non-grid connected plant
Notebooks
- Add notebook to explore the reported fuel heat content for each fuel (notebooks/manual_data/export_fuel_heat_content.ipynb)
- Update notebook used to identify uncorrected time lags in raw EIA-930 data
Known issues
- The gross to net conversion for two plants (plant 55799 in 2019, and plant 57865 in 2021) is likely incorrect due to data inconsistencies between reported gross generation in CEMS and reported net generation in EIA-923 for these plants.
What's Changed
- Integrate Power Sector Data Crosswalk v0.3 by @grgmiller in #244
- Fix missing NOx and SO2 identifiers by @grgmiller in #256
- Add hourly data for all individual plants by @grgmiller in #246
- Gailin/evaluate 2021 lags by @gailin-p in #264
- Patch negative efs by @grgmiller in #265
- Fix consumed calc by @gailin-p in #266
- Update OGE for to work with PUDL v.2022.11.30 and integrate 2021 data by @grgmiller in #259
- Release 0.2.0 by @grgmiller in #267
Full Changelog: v0.1.2...v0.2.0