This repo walks through a complete exploration of the Open Power System Data “Time Series” package. You’ll see how I handled real-world energy data, cleaned and transformed it, ran some deeper analytics, and then turned everything into four polished charts.
Grabbing the Data
I start by downloading the latest OPSD zip file, checking its integrity, and unpacking it into data/. No manual clicks: everything runs from the notebook.
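The download step could look roughly like the sketch below. It assumes integrity is checked by comparing a SHA-256 digest and by running the archive's built-in CRC test; the actual notebook may do this differently. The function names and the `expected_sha256` parameter are illustrative, and the URL is the one listed in the data-source notes at the end of this README.

```python
import hashlib
import urllib.request
import zipfile
from pathlib import Path

# URL taken from this README's data-source list.
OPSD_URL = (
    "https://data.open-power-system-data.org/time_series/"
    "2020-10-06/opsd-time_series-2020-10-06.zip"
)


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(zip_path, dest_dir, expected_sha256=None):
    """Check the archive, then unpack it into dest_dir.

    expected_sha256 is a hypothetical parameter: pass a known digest
    if you have one, otherwise only the zip CRC check runs.
    """
    if expected_sha256 is not None and sha256_of(zip_path) != expected_sha256:
        raise ValueError("checksum mismatch for " + str(zip_path))
    with zipfile.ZipFile(zip_path) as zf:
        bad_member = zf.testzip()  # None means every member passed its CRC
        if bad_member is not None:
            raise ValueError("corrupt member in archive: " + bad_member)
        zf.extractall(dest_dir)
    return sorted(p.name for p in Path(dest_dir).iterdir())


def download_opsd(dest_dir="data", url=OPSD_URL):
    """Download the zip into dest_dir and unpack it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    zip_path = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, zip_path)
    return verify_and_extract(zip_path, dest)
```

Calling `download_opsd()` would fetch the archive into data/ and return the names of everything found there afterwards.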
Tidying and Validating
Next, I align timestamps (including daylight-saving quirks), forward-fill any gaps under six hours, flag longer gaps, and clip implausible values (like negative loads or huge spikes).
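One subtlety in the gap rule: a plain `ffill(limit=5)` would also fill the first five hours of a longer gap. The sketch below measures each run of missing values first and only fills the short ones. It assumes hourly rows, so "under six hours" becomes "at most five consecutive missing rows"; the function names and the `upper` cap are placeholders, and timestamp alignment is not covered here.

```python
import numpy as np
import pandas as pd


def fill_short_gaps(series, max_gap=5):
    """Forward-fill NaN runs of at most max_gap rows; flag longer runs.

    With hourly rows, max_gap=5 means "gaps under six hours".
    Returns (filled_series, long_gap_flag).
    """
    is_na = series.isna()
    # Give every run of consecutive equal is_na values its own id.
    run_id = (is_na != is_na.shift()).cumsum()
    # Length of the NaN run each row belongs to (0 for observed rows).
    run_len = is_na.astype(int).groupby(run_id).transform("sum")
    long_gap = is_na & (run_len > max_gap)
    # ffill alone would also fill long gaps, so mask those rows back out.
    filled = series.ffill().where(~long_gap)
    return filled, long_gap


def clip_implausible(series, lower=0.0, upper=None):
    """Clip values outside [lower, upper]; upper is a caller-supplied cap."""
    return series.clip(lower=lower, upper=upper)
```

The returned `long_gap` flag can be kept as its own column so later steps can exclude or inspect those hours.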
Building New Features
To get more insight, I tag weekends and holidays, calculate rolling averages (24 h and 7 d), and compute each hour’s “renewable share”: (wind + solar) / load.
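A minimal sketch of these features, assuming an hourly, gap-free, timezone-naive DatetimeIndex and columns called `load`, `wind` and `solar` (illustrative names, not the real OPSD headers). The holiday calendar is left to the caller because this README does not say which one is used.

```python
import pandas as pd


def add_features(df, holidays=()):
    """Add calendar flags, rolling means, and renewable share.

    holidays is an iterable of dates supplied by the caller.
    """
    out = df.copy()
    out["is_weekend"] = out.index.dayofweek >= 5  # Saturday=5, Sunday=6
    holiday_days = pd.to_datetime(list(holidays))
    out["is_holiday"] = out.index.normalize().isin(holiday_days)
    # Row-count windows: 24 rows = 24 h and 168 rows = 7 d on hourly data.
    out["load_ma_24h"] = out["load"].rolling(24, min_periods=1).mean()
    out["load_ma_7d"] = out["load"].rolling(24 * 7, min_periods=1).mean()
    # Leave the share undefined (NaN) where load is zero or negative.
    positive_load = out["load"].where(out["load"] > 0)
    out["renewable_share"] = (out["wind"] + out["solar"]) / positive_load
    return out
```

With `min_periods=1` the rolling means start from the first row; a stricter setup would require a full window before reporting a value.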
Digging into the Numbers
Here’s where the fun begins. I pull together country-by-year summaries, plot hourly profiles and heatmaps, trace how base-load has drifted since 2015, and chart renewables’ growing slice of the pie.
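For one country's frame, the summary tables behind these plots can be built with a couple of group-bys. This is only a sketch: it assumes a DatetimeIndex and a `load` column (an illustrative name), and the helper names are made up for this example.

```python
import pandas as pd


def yearly_summary(df):
    """Mean, max and min load per calendar year."""
    return df["load"].groupby(df.index.year).agg(["mean", "max", "min"])


def hour_by_month_profile(df):
    """Mean load for each (hour of day, month) cell, ready for a heatmap."""
    grouped = df["load"].groupby([df.index.hour, df.index.month]).mean()
    table = grouped.unstack()  # rows: hour of day, columns: month
    table.index.name = "hour"
    table.columns.name = "month"
    return table
```

Running `yearly_summary` once per country and stacking the results would give a country-by-year table.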
Advanced Analysis & Reporting
In one script (src/advanced_analytics.py), I crunch all these features and write a single JSON report (notebooks/output/reports/comprehensive_energy_report.json) that captures every metric and insight.
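Writing the report itself is the easy part. The sketch below shows one way to dump a nested dict of metrics to disk; `write_report` is a hypothetical helper and the real schema of the report is not documented here.

```python
import json
from pathlib import Path


def write_report(metrics, path):
    """Write a nested dict of metrics to path as pretty-printed JSON."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create output dirs if missing
    with path.open("w", encoding="utf-8") as fh:
        json.dump(metrics, fh, indent=2, sort_keys=True)
    return path
```

Values must be plain Python types, so NumPy integers or arrays coming out of pandas would need converting (for example with `int()` or `.tolist()`) before they go into the dict.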
Final Portfolio-Ready Charts
The final notebook reads that JSON and produces four publication-style PNGs in notebooks/output/figures/: no interactive widgets, just clean static images with captions.
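A static, captioned PNG can be produced with matplotlib's non-interactive Agg backend, roughly as sketched here. The helper name, figure size and caption placement are my own choices for illustration; the real charts are more elaborate.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render to files only, no GUI backend needed
import matplotlib.pyplot as plt


def save_bar_chart(values, title, caption, out_path):
    """Save a bar chart of a {label: number} dict with a caption below it."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.bar(list(values.keys()), list(values.values()))
    ax.set_title(title)
    fig.text(0.5, 0.01, caption, ha="center", fontsize=9)
    fig.tight_layout(rect=(0, 0.05, 1, 1))  # leave room for the caption
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, dpi=200)
    plt.close(fig)  # free the figure so repeated calls do not pile up
    return out_path
```

The `values` dict would come from whichever section of the JSON report a given chart summarizes.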
- Grid Flexibility (Duck Curve): See how ramp-up requirements vary by country—critical for folks planning new storage or demand-response programs.
- Renewable Intermittency: Quantify solar’s predictability versus its volatility.
- Demand Forecasting: A Random Forest model reaches an R² of about 0.94 using just the previous hour’s load and day-of-week features.
- Behavior Clusters: K-Means identifies distinct daily patterns (e.g., winter workdays vs. summer weekends), which can guide targeted grid interventions.
opsd_timeseries_analysis/
├── data/                               # raw OPSD files (git-ignored)
├── notebooks/
│   ├── output/
│   │   ├── figures/                    # final PNG charts
│   │   └── reports/
│   │       └── comprehensive_energy_report.json
│   ├── 01_download.ipynb               # grabs and unzips the data
│   ├── 02_clean_explore.ipynb          # cleaning, feature engineering, EDA
│   ├── 04_advanced_insights.ipynb      # runs analytics script, writes JSON
│   └── 05_final_visualizations.ipynb   # loads JSON, saves final PNGs
├── src/
│   ├── opsd_utils.py                   # data-loading & helper functions
│   └── advanced_analytics.py           # builds the JSON report
├── environment.yml                     # conda setup
└── README.md                           # you’re reading it now
Getting Started
Clone it
git clone https://github.com/Ollenmire/opsd_timeseries_analysis.git
cd opsd_timeseries_analysis
Set up your environment
conda env create -f environment.yml
conda activate opsd-timeseries-analysis
Run the notebooks in order
- 01_download.ipynb → fetch data
- 02_clean_explore.ipynb → clean & EDA
- 04_advanced_insights.ipynb → generate JSON
- 05_final_visualizations.ipynb → export final charts
All the text commentary and interpretations land in the JSON report and are called out in each notebook; the static PNGs live in notebooks/output/figures/.
Running the full analysis pipeline writes four standalone charts to notebooks/output/figures/. Here’s what each one tells us:
Duck Curve Ramp Rate Trends

The familiar midday “dip-and-rise” in net load (the duck curve) is getting steeper over time. As solar capacity grows, evening ramp-up rates have climbed significantly, driving home the urgent need for faster-responding reserves or storage to keep the lights on.
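To make "ramp rate" concrete, here is a small sketch that takes net load as load minus wind and solar, then finds the largest hour-to-hour rise per day. It assumes hourly rows and illustrative column names (`load`, `wind`, `solar`), and it looks at the steepest rise at any hour rather than restricting to the evening, so it simplifies whatever the analytics script actually computes.

```python
import pandas as pd


def daily_max_ramp(df):
    """Largest hour-to-hour rise in net load for each day."""
    net_load = df["load"] - df["wind"] - df["solar"]
    ramp = net_load.diff()  # change versus the previous hour
    return ramp.groupby(ramp.index.normalize()).max()


def yearly_mean_max_ramp(df):
    """Average of the daily maximum ramp per year, to show the trend."""
    daily = daily_max_ramp(df)
    return daily.groupby(daily.index.year).mean()
```

A rising series from `yearly_mean_max_ramp` is what a steepening duck curve would look like in this simplified view.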
Renewable Intermittency Analysis

Wind and solar outputs swing unpredictably hour to hour. Some countries (e.g. DE, ES) show much higher volatility and more frequent extreme ramp events than others, demonstrating exactly why grid operators lean on batteries or backup plants when the wind stops or a cloud passes.
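One simple way to put numbers on this, sketched below, is to treat volatility as the standard deviation of hourly changes and to count an "extreme ramp" whenever a change exceeds three times that value. Both definitions and the `extreme_sigma` threshold are assumptions for this example, not necessarily the ones behind the chart.

```python
import pandas as pd


def intermittency_stats(generation, extreme_sigma=3.0):
    """Summarise hour-to-hour swings in a generation series."""
    change = generation.diff().dropna()  # hourly change, first row dropped
    volatility = change.std()
    # An extreme event is a change larger than extreme_sigma * volatility.
    extreme = change.abs() > extreme_sigma * volatility
    return {
        "volatility": float(volatility),
        "extreme_events": int(extreme.sum()),
        "extreme_share": float(extreme.mean()),
    }
```

Comparing the returned dicts across countries gives a like-for-like view, provided each series is first scaled (for example by installed capacity) so that large systems do not look volatile just because they are large.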
ML Demand Forecasting Results for Germany (DE)

A Random Forest model predicts short-term demand with R²≈0.94 using just lag features and day-of-week. Most of the scatter in the “Actual vs. Predicted” plot happens around peak hours, suggesting those are the toughest times to forecast, and where better real-time data could pay dividends.
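A stripped-down version of this setup, using scikit-learn, might look like the following. It assumes an hourly load Series with a DatetimeIndex, uses only a one-hour lag plus day-of-week, and splits chronologically so the test period comes after the training period. The hyperparameters and the split fraction are my guesses, so the score it produces will not necessarily match the figure above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score


def fit_load_forecaster(load, test_fraction=0.2, random_state=0):
    """Fit a Random Forest on lag-1 load and day-of-week; return (model, r2)."""
    frame = load.to_frame("load")
    frame["lag_1h"] = frame["load"].shift(1)  # previous hour's load
    frame["dayofweek"] = frame.index.dayofweek
    frame = frame.dropna()  # the first row has no lag value
    # Chronological split: no shuffling, so no future data leaks into training.
    split = int(len(frame) * (1 - test_fraction))
    train, test = frame.iloc[:split], frame.iloc[split:]
    features = ["lag_1h", "dayofweek"]
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    model.fit(train[features], train["load"])
    r2 = r2_score(test["load"], model.predict(test[features]))
    return model, r2
```

Plotting `test["load"]` against the predictions, colored by hour of day, is one way to check whether the errors really do cluster around peak hours.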
Load Profile Clustering for Germany (DE)

Daily consumption naturally splits into five clusters—think “winter workday,” “summer weekend,” etc. Each cluster has a distinct peak hour, valley hour, and daily range. Utilities can leverage these segments to tailor demand-response or tariff programs to the right customer groups.
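The clustering idea can be sketched as follows: reshape the hourly series into one 24-value row per day, standardize, run K-Means, then describe each cluster by the peak hour, valley hour and range of its mean profile. The function name, the scaling step and `n_init=10` are assumptions for illustration; only the use of K-Means with five clusters comes from the analysis above.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_daily_profiles(load, n_clusters=5, random_state=0):
    """Cluster days by their 24-hour load shape.

    Returns (labels, summary): one cluster label per day, plus each
    cluster's peak hour, valley hour and daily range from its mean profile.
    """
    # One row per day, one column per hour of the day.
    by_day_hour = load.groupby([load.index.normalize(), load.index.hour]).mean()
    daily = by_day_hour.unstack().dropna()  # keep only complete days
    scaled = StandardScaler().fit_transform(daily.to_numpy())
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = pd.Series(km.fit_predict(scaled), index=daily.index, name="cluster")
    centers = daily.groupby(labels).mean()  # mean profile per cluster
    summary = pd.DataFrame({
        "peak_hour": centers.idxmax(axis=1),
        "valley_hour": centers.idxmin(axis=1),
        "daily_range": centers.max(axis=1) - centers.min(axis=1),
    })
    return labels, summary
```

Cross-tabulating `labels` against month and weekday is a quick way to see whether a cluster really corresponds to something like "winter workday".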
Run it yourself: all code lives in the notebooks/ folder. If you want to verify any of these insights, or drill into another country or time window, just follow the Getting Started steps above and let the notebooks do the rest.
- Source: OPSD Time Series
- Download URL: https://data.open-power-system-data.org/time_series/2020-10-06/opsd-time_series-2020-10-06.zip
- License: CC BY 4.0
- Coverage: 37 European countries, hourly resolution from 2015 onward