To answer this question, we examine:
- historical electrical generation from PUDL
- forecasts from the managing operator PJM
- the existing mix of power plants and Illinois government policies
- a brief survey of recent data center construction
Based on the analysis, we can make four key observations:
- Past electrical generation since 2002 in the Northern Illinois ComEd zone shows a steady, modest rise.
- Hyperscaler data center growth in the Midwest is not an illusion [16]. States and consumers are feeling the price increases [18].
- PJM forecasts for 2025 and beyond do NOT show an expected "AI bump" and are relatively flat, suggesting a "wait and see" approach from operators.
- The flat prediction is probably wrong. ComEd might rely on nuclear to offset high demand in the immediate future (perhaps 2-5 years), but operators and governments should fast-track new renewable plants now.
The rapid growth of energy-intensive AI/ML technologies dominates the current energy discourse. Google Trends reveals a fivefold increase in searches for "data center" over the past five years. Media coverage highlights concerns of soaring energy bills and the possibility of grid instability [7][10][11][12]. For Illinois, maintaining a robust and resilient power grid is essential to sustaining investment and reinforcing its position as a critical national infrastructure hub.
Illinois offers a particularly instructive case study in data center expansion:
- The state ranks among the top five global data center markets, with Chicago tied for the third-largest hub in the U.S. (222 sites in Q3 2025) [15].
- Data centers in Illinois consumed approximately 5.4% of statewide electricity as of early 2025, significantly higher than national averages [1].
- Illinois is the nation’s leading producer of nuclear power, generating nearly one-eighth of total U.S. nuclear electricity [2].
These dynamics underscore the importance of examining the past, present, and future of Illinois’ electricity system as it prepares for continued data center growth.
PAST:
Historical records from PUDL (pronounced puddle) provide decades of monthly generation data at national and state levels. Developed by Catalyst Cooperative, PUDL consolidates, cleans, and standardizes widely used public datasets. This long-term record allows direct analysis of generation capacity and mix in Illinois since the early 2000s.
PRESENT:
Indicators suggest that pressure on the Illinois grid is already materializing. Analysts cite rising consumer costs, water usage concerns, and capacity constraints [10][11][12]. Forecasts by Synapse point to a near-term 8% price increase. Additionally, policymakers in PJM states are increasingly vocal about the financial impacts of data center demand on electricity markets [7]. The breadth and frequency of these warnings suggest that current risks, while unevenly distributed, are both tangible and escalating.
FUTURE:
Illinois is part of the PJM interconnection, a multi-state regional transmission operator. PJM does not generate power; rather, it functions as the traffic control system for electricity, balancing supply and demand to prevent outages and price volatility. Within the COMED (Commonwealth Edison) zone in Northern Illinois, PJM plays a critical role in integrating future data center growth. Looking ahead, transparency will increase with the passage of SB2181, which requires data centers to disclose site-level energy and water consumption starting in 2026 [5]. Coupled with PJM’s multi-year demand forecasts, these data sources provide a foundation for assessing Illinois’ readiness under different growth scenarios.
- The Public Utility Data Liberation (PUDL) Project — open-source U.S. electricity sector data
- PJM — regional coordinator of electricity in all or parts of 13 states and D.C.
- DATACENTERS.com — searchable directory of U.S. data centers
- Nominatim — open-source geocoding API service
| Technology | Usage |
|---|---|
| Python | Expecting Visual Basic? All workflows and tools were chosen to be Python-centric |
| DuckDB | Fast, portable local database optimized for analytical workflows in data science |
| Selenium | Open-source automation framework for web browsers |
| dbt | Open-source command-line tool for modeling transformations and objects in SQL |
| Prefect | Pythonic open-source framework for orchestrating, scheduling, and monitoring data workflows |
| Grafana | Open-source platform for creating interactive dashboards and visualizing metrics, logs, and other data |
Honestly, there wasn't much need for the "T" in ETL or much integration. It's not that kind of DE project :)
The team at Catalyst have already done the work of reconciling EIA data using Dagster instead of Prefect and hosting it on S3 in parquet format. They're also using dbt in an interesting way to track invocations.
PJM's data access leaves much to be desired. They have robust trading-centric web apps--perhaps they have internal APIs that members can access?
If I had to track power generation for PJM, I would try to unify ComEd SCADA output using a more industry-standard tool like Apache Airflow. Ironically, there would probably be some RAG in the pipeline. It will be interesting to see what reporting standards emerge with Illinois SB2181.
- Python: I used v3.13.7; anything from 3.10+ will probably work.
- Git: 2.51 was used for this project. 2+ should work.
- Grafana: installed via Homebrew: `brew install grafana`, then `brew services start grafana`
- uv: (sort of optional, but highly recommended) `brew install uv` or `pipx install uv`
1. Clone the repo:
git clone git@github.com:rbhughes/power_puddle.git
2. Install project dependencies
uv sync (or install libs in pyproject.toml if you prefer pip)
3. Setup dbt dependencies
(within the project's dbt/ folder, cd dbt)
uv run dbt deps
4. Run Prefect flows
If you don't care about the 150+ data center locations on the map (they're not part of the analysis) or you want to avoid Selenium, comment out the data center scraping step in flows.etl_flows before running.
uv run --project . python -m flows.etl_flows
5. Build dbt models
(again, from within dbt/)
uv run dbt run
To selectively (re)run a model after modifying it:
uv run dbt run --select int_actual_vs_forecast
6. Start the Flask API for Grafana
The port is set to 5500 by default. Change it in api/puddle_api.py.
uv run --project . python -m api.puddle_api
Test a route in your browser:
http://localhost:5500/api/us-monthly-generation
7. Import visualizations into Grafana
I'll leave this as an exercise for the reader. It is likely that some incompatibility in the JSON formats will have crept in for open source Grafana, but exports of the visualizations and dashboards in this repo are in grafana/dashboards.
- The pipeline integrates four distinct data sources: PUDL parquet files from S3 for historical power generation data, PJM Excel files for demand forecasts, web-scraped data center information from DATACENTERS.com, and geocoding services from Nominatim.
- Prefect orchestrates the entire workflow, coordinating data collection, web scraping with Selenium, and Python processing scripts. This demonstrates modern data engineering practices with proper workflow management.
- DuckDB serves as the central analytical database, chosen for its native parquet support and simplicity for analytical workloads. Raw data is staged in the data/ directory before processing.
- The dbt transformation layer implements a clean staging-intermediate-marts architecture, handling the complex task of merging nuclear generation data (reported separately by EIA) with other fuel sources, then creating dimensional models for analysis.
- A Flask API provides six endpoints that serve JSON data to Grafana dashboards, enabling interactive visualization of power plant locations, generation trends, and forecast accuracy analysis.
Most steps are orchestrated with Prefect. See the Prefect flows for technical details. DuckDB was chosen for its native parquet support and simplicity.
PostgreSQL was a close runner-up, but my dev environment was configured for an older version that I didn't want to break.
1. Collect PJM data
The PJM site has basic search and download features. The forecast files required are available as Excel files (.xls or .xlsx) with inconsistent naming. Download the relevant files to the data directory.
Note:
`<year>-load-report-data.xlsx` files contain forecast data for the publication year and several years ahead. These are used for COMED forecasts. The `*stage-1-resources-by-zone*` files, once considered for linking PJM and PUDL plant names, were too unreliable for use.
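Because the downloaded files are named inconsistently, it can help to pick out the load reports by pattern before parsing them. The following is a minimal sketch, not the project's actual code: the helper names `forecast_year` and `find_load_reports` are hypothetical, and the pattern assumes the `<year>-load-report-data` naming described in the note, with either an `.xls` or `.xlsx` extension.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Matches names like "2023-load-report-data.xlsx" (or .xls), per the note above.
LOAD_REPORT_RE = re.compile(r"^(\d{4})-load-report-data\.xlsx?$", re.IGNORECASE)


def forecast_year(filename: str) -> Optional[int]:
    """Return the publication year encoded in a load-report file name, else None."""
    match = LOAD_REPORT_RE.match(Path(filename).name)
    return int(match.group(1)) if match else None


def find_load_reports(data_dir: str = "data") -> Dict[int, Path]:
    """Map publication year -> path for every load report found in data_dir."""
    reports = {}
    for path in sorted(Path(data_dir).glob("*load-report-data.xls*")):
        year = forecast_year(path.name)
        if year is not None:
            reports[year] = path
    return reports
```

Files that do not match, such as the stage-1 resource files, are simply skipped.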
2. Initialize PUDL tables
Load the parquet files from the PUDL distribution’s S3 site into your local DuckDB database. These contain power plant locations and monthly historical generation data for all states. Nuclear generation data is managed separately and merged later.
| PUDL | DuckDB |
|---|---|
| core_eia__entity_plants | puddle.main.plants |
| core_eia__monthly_generation_fuel_nuclear | puddle.main.monthly_gen_nuc |
| core_eia__monthly_generation_fuel | puddle.main.monthly_gen |
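The mapping above can be turned into load statements mechanically. This is a sketch under stated assumptions rather than the repo's implementation: the helper names are hypothetical, `base_url` is a placeholder for wherever the PUDL parquet distribution lives, and it assumes each file is named `<table>.parquet`. `read_parquet` is DuckDB's own table function.

```python
# PUDL table -> local DuckDB table, taken from the mapping table above.
PUDL_TO_DUCKDB = {
    "core_eia__entity_plants": "plants",
    "core_eia__monthly_generation_fuel_nuclear": "monthly_gen_nuc",
    "core_eia__monthly_generation_fuel": "monthly_gen",
}


def build_load_statements(base_url: str) -> list:
    """Build one CREATE TABLE ... AS SELECT statement per PUDL parquet file."""
    base = base_url.rstrip("/")
    # Assumption: each parquet file is named after its PUDL table.
    return [
        f"CREATE OR REPLACE TABLE main.{target} AS "
        f"SELECT * FROM read_parquet('{base}/{source}.parquet')"
        for source, target in PUDL_TO_DUCKDB.items()
    ]


def load_pudl_tables(con, base_url: str) -> None:
    """Run the statements on any connection object exposing .execute(sql)."""
    for statement in build_load_statements(base_url):
        con.execute(statement)
```

With the `duckdb` package installed, `con` would typically come from `duckdb.connect(...)` pointed at the local database file. Reading directly from a remote location relies on DuckDB's httpfs extension.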
3. Collect Data Center info
Scrape data center names and street addresses from datacenters.com using Selenium and BeautifulSoup. The site imposes rate limits and has tricky pagination. After obtaining addresses, use Nominatim for geocoding (lat/lon).
These locations are not used for primary analysis but highlight that nearly all 154 are in Northern IL (within the COMED zone), while Southern IL coverage is sparse.
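The geocoding step can be sketched with the standard library alone. This is an illustration, not necessarily how the project calls Nominatim: the function names are hypothetical, and it assumes the public `search` endpoint with a free-form `q` query. The public service expects an identifying User-Agent and roughly one request per second, hence the pause.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Optional, Tuple

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_geocode_url(address: str) -> str:
    """Build a Nominatim search URL that asks for at most one JSON result."""
    query = urllib.parse.urlencode({"q": address, "format": "json", "limit": 1})
    return f"{NOMINATIM_SEARCH}?{query}"


def parse_first_hit(results: list) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) from the first result, or None when nothing matched."""
    if not results:
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(
    address: str, user_agent: str, pause_s: float = 1.0
) -> Optional[Tuple[float, float]]:
    """Geocode one address, then pause to stay within the public rate limit."""
    request = urllib.request.Request(
        build_geocode_url(address), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.load(response)
    time.sleep(pause_s)
    return parse_first_hit(results)
```

Addresses that return no match come back as `None`, which makes it easy to log them for manual review.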
4. Run dbt workflows
- Staging: Merge monthly generation from nuclear and other sources (solar, natural gas, oil, etc.), standardizing the `fuel_type_code_pudl` values as `fuel_category`.
- Intermediate: Perform a union of `monthly_gen` and `monthly_gen_nuc` since nuclear MWh is reported separately by the US EIA.
- Marts:

| model | purpose |
|---|---|
| dim_data_center.sql | list from datacenters.com + lat/lon |
| dim_fuel.sql | standardize fuel_type_code_pudl values |
| dim_plant.sql | power plant metadata from PUDL |
| fact_generation.sql | group generation by plant |
| mart_actual_vs_forecast.sql | MWh from PUDL vs. forecasts from PJM |
| mart_monthly_generation_summary.sql | all US (and IL) historical PUDL MWh |
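The dbt models themselves are SQL, but the idea behind the intermediate union can be shown in a few lines of Python using the standard library's `sqlite3` as a stand-in for DuckDB. Everything here is a toy: the column names and the handful of rows are made up for illustration and are not the PUDL schema or real generation figures.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Column names and rows are illustrative assumptions, not the PUDL schema.
    CREATE TABLE monthly_gen (plant_id INTEGER, report_date TEXT,
                              fuel_category TEXT, net_generation_mwh REAL);
    CREATE TABLE monthly_gen_nuc (plant_id INTEGER, report_date TEXT,
                                  net_generation_mwh REAL);
    INSERT INTO monthly_gen VALUES (1, '2024-01-01', 'gas', 100.0),
                                   (2, '2024-01-01', 'wind', 40.0);
    INSERT INTO monthly_gen_nuc VALUES (3, '2024-01-01', 500.0),
                                       (3, '2024-02-01', 480.0);
    """
)

# In this toy setup the nuclear table has no fuel column, so a constant
# category is added before stacking both tables with UNION ALL.
UNION_SQL = """
    SELECT plant_id, report_date, fuel_category, net_generation_mwh
    FROM monthly_gen
    UNION ALL
    SELECT plant_id, report_date, 'nuclear' AS fuel_category, net_generation_mwh
    FROM monthly_gen_nuc
"""

# Aggregate the stacked rows by fuel category.
totals = dict(
    con.execute(
        f"SELECT fuel_category, SUM(net_generation_mwh) FROM ({UNION_SQL}) "
        "GROUP BY fuel_category"
    ).fetchall()
)
print(totals)
```

`UNION ALL` is used rather than `UNION` so that identical rows are never silently deduplicated.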
5. Flask/Blueprint API
Grafana is used for visualization and analysis. It does not natively connect to DuckDB but can consume JSON from APIs. Endpoints reference the dbt models and are served via a local Flask app, which suits iterative development and avoids handling intermediate CSV files or a PostgreSQL setup.
Defined routes:
| route | description |
|---|---|
| /il-data-centers | data center names, lat/lon |
| /il-power-plants | power plants with net MWh generation |
| /us-monthly-generation | US monthly generation time-series |
| /il-monthly-generation | IL monthly generation time-series |
| /actual-vs-forecast | PUDL generation vs PJM forecasts |
| /actual-vs-forecast-mase | Mean Absolute Scaled Error scores |
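Before wiring up Grafana, a quick client-side check that every route answers with JSON can save some debugging. The sketch below uses only the standard library. The helper names are hypothetical, and it assumes all six routes share the `/api` prefix and port 5500 shown in the install steps.

```python
import json
import urllib.request

# Assumption: every route sits under the /api prefix on the default port 5500.
BASE_URL = "http://localhost:5500/api"

ROUTES = [
    "il-data-centers",
    "il-power-plants",
    "us-monthly-generation",
    "il-monthly-generation",
    "actual-vs-forecast",
    "actual-vs-forecast-mase",
]


def route_url(route: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and a route name, tolerating stray slashes."""
    return f"{base_url.rstrip('/')}/{route.lstrip('/')}"


def fetch_json(route: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """GET one route and decode its JSON body (the Flask app must be running)."""
    with urllib.request.urlopen(route_url(route, base_url), timeout=timeout) as resp:
        return json.load(resp)


def smoke_check(base_url: str = BASE_URL) -> dict:
    """Return {route: True/False} for whether each route answered with JSON."""
    status = {}
    for route in ROUTES:
        try:
            fetch_json(route, base_url)
            status[route] = True
        except (OSError, ValueError):  # connection/HTTP errors, invalid JSON
            status[route] = False
    return status
```

Calling `smoke_check()` with the API running should report `True` for each route; any `False` points at the endpoint to inspect first.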
6. Define Grafana Visualizations
Grafana is run locally via Homebrew. Visualizations use the Flask API routes described above.
The distinction between the ComEd region in Northern Illinois and the southern half of the state is significant. Large fossil-fuel plants (red) are concentrated mostly in the south, while nuclear power (purple) dominates in the north. Data centers (yellow) are highly clustered near Chicago.
This time-series chart displays US monthly generation since 2002. Key trends include:
- The gradual decline of coal
- The rise and dominance of natural gas
- General stability of nuclear power
- Steady but slower growth in renewables, primarily wind and solar
Northern Illinois's robust nuclear capacity sets it apart from other US regions. While this chart does include southern Illinois fossil-fuel plants, the magnitude of nuclear's impact is obvious. The state's nuclear moratorium has recently been updated, paving the way for small nuclear plant deployment [14].
Note the relatively modest contribution of natural gas in Illinois compared to the national picture. While Illinois is a major natural gas consumer with extensive pipeline infrastructure, future energy policy is expected to emphasize wind and solar over new gas-fired plants [7].
Analyzing the PUDL and PJM datasets side-by-side reveals surprising results. The PUDL data (green) shows actual MWh generated for all ComEd counties in Northern Illinois. PJM data (purple) reflects the latest published forecasts for upcoming years.
Forecasts from 2016–2018 appeared to project modest growth. In contrast, more recent forecasts from 2019–2023 are noticeably flat, lacking any anticipated surge in demand. Despite widespread warnings about looming grid failures and price shocks attributed to data centers, the latest PJM forecasts still project minimal or no increase in future demand. The linear trend looks nearly flat. PJM itself acknowledges possible underestimation of hyperscale data center impacts in the ComEd zone [17]. The absence of even modest projected growth is puzzling.
Note: 2023 forecast is from PJM; the 2024 Excel file was corrupted and could not be included.
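One way to put a number on "nearly flat" is an ordinary least-squares slope over the forecast years, expressed as a percentage of the mean. This is a generic sketch, not the calculation behind the chart; the function names are hypothetical and the inputs would be the forecast years and their MWh values.

```python
from statistics import fmean


def linear_trend(years, values):
    """Ordinary least-squares fit; returns (slope per year, intercept)."""
    if len(years) != len(values) or len(years) < 2:
        raise ValueError("need at least two (year, value) pairs of equal length")
    x_bar, y_bar = fmean(years), fmean(values)
    sxx = sum((x - x_bar) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def annual_growth_pct(years, values):
    """Slope expressed as a percentage of the mean value, per year."""
    slope, _ = linear_trend(years, values)
    return 100.0 * slope / fmean(values)
```

A result close to zero from `annual_growth_pct` corresponds to the flat forecasts described above, while a projected surge would show up as a clearly positive percentage.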
I attempted to calculate a Mean Absolute Scaled Error to see if their forecast accuracy had measurably changed over the past few years, and it does suggest increasing error.
**MASE (Mean Absolute Scaled Error)** is a forecast accuracy metric:

- MASE 0.7: model's error is 70% of the naïve error (good).
- MASE 2.2: model's error is more than double the naïve forecast (poor).
- Lower scores are better; scores below 1 are ideal.

MASE is scale-independent. Changes in magnitude, actuals, or variance affect both the error numerator (average absolute forecast error) and denominator (average absolute naïve error) proportionally.
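For readers who want to reproduce the metric, here is a minimal sketch of MASE as described: mean absolute forecast error divided by the mean absolute error of a naïve forecast. It is not necessarily the exact calculation behind the `/actual-vs-forecast-mase` route. In particular, it scales by naïve errors computed over the same actual series, whereas the textbook definition scales by naïve errors from a training period. The `season` parameter is an added convenience.

```python
from statistics import fmean


def mase(actual, forecast, season: int = 1) -> float:
    """Mean Absolute Scaled Error of `forecast` against `actual`.

    Numerator: mean absolute forecast error.
    Denominator: mean absolute error of a naive forecast that repeats the
    value observed `season` steps earlier (season=1 means "same as last period").
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) <= season:
        raise ValueError("need more observations than the season length")
    forecast_mae = fmean(abs(a - f) for a, f in zip(actual, forecast))
    naive_mae = fmean(
        abs(actual[t] - actual[t - season]) for t in range(season, len(actual))
    )
    if naive_mae == 0:
        raise ValueError("naive error is zero; MASE is undefined")
    return forecast_mae / naive_mae
```

For monthly data, `season=12` compares each month with the same month a year earlier, which is often a fairer naïve baseline than the previous month.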
Caution
For complete accuracy, this comparison would require plant-by-plant PUDL data mapped precisely to PJM’s forecast input list for the ComEd zone. That data is not publicly available. Attempts to fuzzy-match plant names across PJM and PUDL were inconsistent. Other complicating factors include interconnections, occasional non-PJM supply at times of peak demand, new generator additions/retirements, and scheduled outages. The chosen method uses all Northern Illinois counties assigned to ComEd as the “actual” filter, with PJM forecasts left unadjusted.
1. Brighlio: Illinois Data Centers Offer High Security, Efficiency
   Summary: Overview of major Chicago-area data centers, highlighting their sustainable design, high security, and industry certifications for reliable and efficient operations.
2. EIA: One-Page Illinois Energy Data Snapshot
   Summary: Comprehensive state energy profile including generation mix, electricity prices, top industries, and historical trends from the U.S. Energy Information Administration.
3. IL General Assembly: Nuclear Reactors Law Lifts Small Plant Ban
   Summary: Details 2023 Illinois law partially lifting the ban on new nuclear plant construction, specifically permitting small modular reactors under 300 MW.
4. Illinois Policy: Nuclear Growth Potential Despite Moratorium
   Summary: Argues that Illinois can harness significant economic benefits from expanding nuclear energy if policy restrictions are further relaxed.
5. IL SB2181: New Data Center Energy Reporting Requirement
   Summary: Illinois legislation mandating state agencies report on large data center power usage for improved grid planning.
6. PJM: Board Fast-Tracks ‘Critical Issue’ Process for Data Centers
   Summary: PJM’s board launches an accelerated rulemaking track to address dramatic growth in data center electricity demand, targeting new reliability and interconnection standards.
7. Reuters: Governors Demand Greater Say in PJM Grid
   Summary: More than a quarter of U.S. state governors seek more control over PJM, citing soaring electricity prices fueled by surging AI data center demand.
8. SiteSelection: Illinois Remains a Top Data Center Location
   Summary: Analysis of Illinois’s national standing as a leading U.S. data center destination, with factors including incentives, infrastructure, workforce, and market activity.
9. Small Nuclear Reactors Now Allowed in Illinois
   Summary: Law permitting new small modular nuclear reactors, ending Illinois’ total ban on new nuclear sites.
10. Synapse: Illinois Data Center Growth Risks (Slide Deck)
    Summary: Slides quantifying how rapid new data center load will boost Illinois and PJM grid demand, pushing total loads and requiring major new energy infrastructure by 2040.
11. Synapse: Illinois Data Center Load Growth Risks (Fact Sheet)
    Summary: Brief describing a projected 30% increase in ComEd’s grid load from data centers by 2040, driving up residential bills 8.3% and boosting CO2 emissions by 64% in the region.
12. Synapse: PJM Data Center Growth Raises Regional Bills
    Summary: PJM-wide analysis finds large anticipated increases in load, peak demand, and emissions from rapid data center buildout, raising residential electricity bills.
13. Synapse: State of Illinois Energy System—2025 Landscape
    Summary: Deep-dive report on Illinois energy trends, policies, decarbonization goals, and forecasted demand scenarios, including data center and electrification load impacts.
14. TGS: Oil Revitalization and Data Center Expansion in Illinois Basin
    Summary: Outlines how massive regional oilfield redevelopment projects and hyperscale data center construction are reshaping the Illinois Basin’s economic and energy future, with integrated geoscience datasets supporting both.
15. Brighlio: Illinois Rises as Colocation and Hosting Hub
    Summary: Summarizes key players and facility features drawing major cloud and tech clients to Illinois’s data center market.
16. Whitecase: Large Pipeline of Hyperscaler project in PJM
    Summary: Concrete growth through 2030 is now fully backed by utility/grid agreements, not projections or speculation.
17. PJM: Long Term Load Forecasts
    Summary: Slow large-load reporting, model lag, and the difficulty of distinguishing “hyperscale” facilities from standard commercial uses make modelling forecasts difficult.
18. UtilityDive: PA Governor threatens to leave PJM
    Summary: Republican and Democratic governors of PJM Interconnection states on Monday threatened to pull out of the grid operator’s markets unless states are given a role in governing the organization.






