OPSD Time Series Analysis

This repo walks through a complete exploration of the Open Power System Data “Time Series” package. You’ll see how I handled real-world energy data, cleaned and transformed it, ran some deeper analytics, and then turned everything into four polished charts.

What We’re Doing

Grabbing the Data
I start by downloading the latest OPSD zip file, checking its integrity, and unpacking it into data/. No manual clicks—everything runs from the notebook.
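Here’s a minimal sketch of that download step. It assumes “integrity” means an optional SHA-256 comparison plus the CRC check that `zipfile` runs on every archive member. The helper names (`fetch_and_extract`, `verify_and_extract`) and the `expected_sha256` parameter are hypothetical and not taken from `opsd_utils.py`. The URL is the one listed under Data Details.

```python
import hashlib
import urllib.request
import zipfile
from pathlib import Path
from typing import Optional

OPSD_URL = ("https://data.open-power-system-data.org/time_series/"
            "2020-10-06/opsd-time_series-2020-10-06.zip")

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large archive never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_and_extract(zip_path: Path, dest: Path,
                       expected_sha256: Optional[str] = None) -> list:
    """Check the archive, then unpack it into dest (hypothetical helper)."""
    if expected_sha256 is not None and sha256_of(zip_path) != expected_sha256:
        raise ValueError(f"checksum mismatch for {zip_path}")
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()  # name of the first member with a bad CRC, or None
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        zf.extractall(dest)
        return zf.namelist()

def fetch_and_extract(url: str = OPSD_URL, data_dir: str = "data",
                      expected_sha256: Optional[str] = None) -> list:
    """Download the archive into data/ (skipped if already there) and unpack it."""
    dest = Path(data_dir)
    dest.mkdir(parents=True, exist_ok=True)
    zip_path = dest / url.rsplit("/", 1)[-1]
    if not zip_path.exists():
        urllib.request.urlretrieve(url, zip_path)
    return verify_and_extract(zip_path, dest, expected_sha256)
```

Calling `fetch_and_extract()` from the first notebook cell would reproduce the “no manual clicks” behaviour. Skipping the download when the zip already exists keeps reruns cheap.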
Tidying and Validating
Next, I align timestamps (including daylight-saving quirks), forward-fill any gaps shorter than six hours, flag longer gaps, and clip physically impossible or implausible values (such as negative loads or extreme spikes).
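The cleaning rules above can be sketched in pandas as follows. This assumes the raw frame has a `utc_timestamp` column, which the OPSD hourly CSV provides. Working in UTC sidesteps the repeated and missing local hours at DST switches. Note that a plain `ffill(limit=5)` would still partially fill long gaps, so the sketch measures each NaN run and only fills the short ones. The spike cap (3× the 99th percentile) is an illustrative assumption, not the notebook’s threshold.

```python
import pandas as pd

def to_hourly_utc(df: pd.DataFrame, ts_col: str = "utc_timestamp") -> pd.DataFrame:
    """Index on UTC (no DST jumps), drop duplicate stamps, and reindex to a strict
    hourly grid so that missing hours appear as NaN rows."""
    out = df.copy()
    out.index = pd.DatetimeIndex(pd.to_datetime(out.pop(ts_col), utc=True))
    out = out[~out.index.duplicated(keep="first")].sort_index()
    return out.asfreq("h")

def fill_short_gaps(s: pd.Series, limit_hours: int = 6):
    """Forward-fill NaN runs shorter than limit_hours; return (filled, long_gap_flag)."""
    is_na = s.isna()
    run_id = is_na.ne(is_na.shift(fill_value=False)).cumsum()  # new id at each NaN/value switch
    run_len = is_na.groupby(run_id).transform("sum")            # NaN-run length (0 on valid rows)
    long_gap = is_na & (run_len >= limit_hours)
    filled = s.ffill().mask(long_gap)                           # undo the fill inside long gaps
    return filled, long_gap

def clip_implausible(s: pd.Series, spike_factor: float = 3.0) -> pd.Series:
    """Clip negatives to 0 and cap spikes at spike_factor x the 99th percentile
    (the factor is an assumption chosen for illustration)."""
    upper = spike_factor * s.quantile(0.99)
    return s.clip(lower=0.0, upper=upper)
```

Returning the `long_gap` mask alongside the filled series keeps the flagged hours available for later filtering instead of silently hiding them.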
Building New Features
To get more insight, I tag weekends and holidays, calculate rolling averages (24 h and 7 d), and compute each hour’s “renewable share” (wind + solar divided by load).
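A minimal sketch of those features for one country, assuming a UTC-indexed hourly frame. Holidays are passed in as a collection of `datetime.date` objects, for example from a national calendar. The function name, column arguments and the `Europe/Berlin` default are hypothetical. Calendar flags are computed in local time so that a holiday covers the local day rather than the UTC day.

```python
import datetime as dt
from typing import Iterable

import pandas as pd

def add_features(df: pd.DataFrame, load_col: str, wind_col: str, solar_col: str,
                 holidays: Iterable[dt.date] = (),
                 local_tz: str = "Europe/Berlin") -> pd.DataFrame:
    """Calendar flags, rolling means and renewable share for a UTC-indexed hourly frame."""
    out = df.copy()
    local = out.index.tz_convert(local_tz)              # weekends/holidays follow local time
    out["is_weekend"] = local.dayofweek >= 5             # Saturday=5, Sunday=6
    out["is_holiday"] = pd.Series(local.date, index=out.index).isin(list(holidays))
    out["load_roll_24h"] = out[load_col].rolling("24h").mean()  # time-based windows
    out["load_roll_7d"] = out[load_col].rolling("7D").mean()
    load = out[load_col].where(out[load_col] > 0)        # zero/negative load -> NaN share
    out["renewable_share"] = (out[wind_col] + out[solar_col]) / load
    return out
```

Time-based windows (`"24h"`, `"7D"`) rather than fixed row counts keep the averages correct even where flagged gaps were left as NaN.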
Digging into the Numbers
Here’s where the fun begins. I pull together country-by-year summaries, plot hourly profiles and heatmaps, trace how base-load has drifted since 2015, and chart renewables’ growing slice of the pie.
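A sketch of the country-by-year table and the profile/heatmap inputs, assuming one UTC-indexed hourly load Series per country. “Base load” is approximated here as the yearly average of daily minimum load. That definition, the helper names and the example column name are assumptions, not necessarily what the notebook uses.

```python
import pandas as pd

def yearly_summary(load: pd.Series) -> pd.DataFrame:
    """Mean and peak load per year, plus a base-load proxy (mean of daily minima)."""
    load = load.dropna()
    summary = load.groupby(load.index.year).agg(["mean", "max"])
    daily_min = load.resample("D").min()
    summary["base_load"] = daily_min.groupby(daily_min.index.year).mean()
    summary.index.name = "year"
    return summary

def country_year_table(df: pd.DataFrame, load_cols: dict) -> pd.DataFrame:
    """Stack yearly summaries for several countries,
    e.g. load_cols={"DE": "DE_load_actual_entsoe_transparency"} (assumed column name)."""
    return pd.concat({c: yearly_summary(df[col]) for c, col in load_cols.items()},
                     names=["country"])

def hourly_profile(load: pd.Series) -> pd.Series:
    """Average load for each hour of the day (0-23)."""
    return load.groupby(load.index.hour).mean()

def hour_by_month_heatmap(load: pd.Series) -> pd.DataFrame:
    """24 x 12 table of mean load (rows: hour of day, columns: month) for a heatmap."""
    return load.groupby([load.index.hour, load.index.month]).mean().unstack()
```

Plotting `base_load` per country across years traces the base-load drift, and the heatmap table can go straight into `imshow` or a seaborn heatmap.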
Advanced Analysis & Reporting
In one script (src/advanced_analytics.py), I crunch all these features and spit out a single JSON report (notebooks/output/reports/comprehensive_energy_report.json) that captures every metric and insight.
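One practical detail when writing such a report: the standard `json` module can’t serialise NumPy integers, booleans or arrays, so a `default=` hook is needed. This is a minimal sketch. The function names and any report keys are hypothetical, and the real report structure lives in `src/advanced_analytics.py`.

```python
import json
from pathlib import Path

import numpy as np

def _to_builtin(obj):
    """Convert NumPy scalars/arrays that json.dump can't handle into plain Python."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"not JSON serialisable: {type(obj).__name__}")

def write_report(report: dict, path) -> Path:
    """Write the nested metrics dict as indented JSON, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, default=_to_builtin)
    return path
```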
Final Portfolio-Ready Charts
The final notebook reads that JSON and produces four publication-style PNGs in notebooks/output/figures/: no interactive widgets, just clean static images with captions.
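A sketch of that JSON-to-PNG step with Matplotlib’s non-interactive `Agg` backend, which renders straight to files. The report key, its assumed `{country: value}` shape and the `save_bar_chart` helper are hypothetical. The four real charts are more elaborate than a single bar plot.

```python
import json
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # file-only backend: no windows or widgets
import matplotlib.pyplot as plt

def save_bar_chart(report_path, key: str, out_png, title: str, caption: str) -> Path:
    """Read one metric from the JSON report and save it as a captioned static PNG."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    values = report[key]                        # assumed shape: {"DE": ..., "ES": ...}
    fig, ax = plt.subplots(figsize=(8, 4.5), dpi=150)
    ax.bar(list(values.keys()), list(values.values()), color="tab:blue")
    ax.set_title(title)
    fig.text(0.01, 0.01, caption, fontsize=8, ha="left", va="bottom")
    fig.tight_layout(rect=(0, 0.05, 1, 1))      # leave room at the bottom for the caption
    out_png = Path(out_png)
    out_png.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_png)
    plt.close(fig)                              # free memory when generating many figures
    return out_png
```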

Why It Matters

Grid Flexibility (Duck Curve): See how ramp-up requirements vary by country—critical for folks planning new storage or demand-response programs.
Renewable Intermittency: Quantify solar’s predictability versus its volatility.
Demand Forecasting: A Random Forest model nails an R² over 0.94 using just last hour’s load and weekday features.
Behavior Clusters: K-Means identifies distinct daily patterns (e.g., winter workdays vs. summer weekends), which can guide targeted grid interventions.

Repo Layout

opsd_timeseries_analysis/
├── data/                              # raw OPSD files (git-ignored)
├── notebooks/
│   ├── output/
│   │   ├── figures/                   # final PNG charts
│   │   └── reports/
│   │       └── comprehensive_energy_report.json
│   ├── 01_download.ipynb              # grabs and unzips the data
│   ├── 02_clean_explore.ipynb         # cleaning, feature engineering, EDA
│   ├── 04_advanced_insights.ipynb     # runs analytics script, writes JSON
│   └── 05_final_visualizations.ipynb  # loads JSON, saves final PNGs
├── src/
│   ├── opsd_utils.py                  # data-loading & helper functions
│   └── advanced_analytics.py          # builds the JSON report
├── environment.yml                    # conda setup
└── README.md                          # you’re reading it now

Getting Started

Clone it

git clone https://github.com/Ollenmire/opsd_timeseries_analysis.git
cd opsd_timeseries_analysis

Set up your environment

conda env create -f environment.yml
conda activate opsd-timeseries-analysis

Run the notebooks in order
- 01_download.ipynb → fetch data
- 02_clean_explore.ipynb → clean & EDA
- 04_advanced_insights.ipynb → generate JSON
- 05_final_visualizations.ipynb → export final charts

All the text commentary and interpretations land in the JSON report and are called out in each notebook; the static PNGs live in notebooks/output/figures/.

Key Findings

After running the full analysis pipeline, four standalone charts landed in notebooks/output/figures/. Here’s what each one tells us:

Duck Curve Ramp Rate Trends

The familiar midday “dip-and-rise” in net load (the duck curve) is getting steeper over time. As solar capacity grows, evening ramp-up rates have climbed significantly—driving home the urgent need for faster-responding reserves or storage to keep the lights on.
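One way to quantify that steepening is sketched below, under stated assumptions. Net load is taken as load minus wind minus solar. The daily ramp is the largest rise in net load over any 3-hour window, and the trend is the yearly mean of those daily maxima. The 3-hour window and the function names are illustrative choices, not necessarily the metric used in `advanced_analytics.py`.

```python
import pandas as pd

def net_load(load: pd.Series, wind: pd.Series, solar: pd.Series) -> pd.Series:
    """Demand left for dispatchable plants after wind and solar."""
    return load - wind - solar

def daily_max_ramp(net: pd.Series, hours: int = 3) -> pd.Series:
    """Largest upward change in net load over `hours` consecutive hours, per day."""
    ramp = net.diff(hours)          # net(t) - net(t - hours); assumes an hourly index
    return ramp.resample("D").max()

def ramp_trend(net: pd.Series, hours: int = 3) -> pd.Series:
    """Mean daily maximum ramp per year; a rising series means a steeper duck neck."""
    daily = daily_max_ramp(net, hours)
    return daily.groupby(daily.index.year).mean()
```

Running `ramp_trend` per country gives one row per year that can be compared across countries, in line with the cross-country view described under Why It Matters.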
Renewable Intermittency Analysis

Wind and solar output swings unpredictably from hour to hour. Some countries (e.g., DE, ES) show much higher volatility and more frequent extreme ramp events than others, which is exactly why grid operators lean on batteries or backup plants when the wind drops or a cloud passes.
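A minimal sketch of volatility and extreme-ramp metrics for one generation series. Two choices here are assumptions, not the report’s definitions. First, hourly changes are normalised by the series’ maximum output as a stand-in for installed capacity. Second, an “extreme ramp” is any hourly change larger than 10% of that maximum.

```python
import pandas as pd

def intermittency_stats(gen: pd.Series, extreme_share: float = 0.10) -> dict:
    """Hour-to-hour variability of a wind or solar generation series."""
    peak = gen.max()
    delta = gen.diff().dropna() / peak             # hourly change as a share of peak output
    extreme = delta.abs() > extreme_share
    return {
        "volatility": float(delta.std()),          # std of normalised hourly changes
        "extreme_ramp_events": int(extreme.sum()),
        "extreme_ramp_rate": float(extreme.mean()),  # fraction of hours with an extreme ramp
    }
```

Building `pd.DataFrame({c: intermittency_stats(df[col]) for c, col in cols.items()}).T` then gives one comparable row per country.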
ML Demand Forecasting Results for Germany (DE)

A Random Forest model nails short-term demand with R² ≈ 0.94 using just lag features and day-of-week. Most of the scatter in the “Actual vs. Predicted” plot occurs around peak hours, which suggests those are the toughest times to forecast and the place where better real-time data could pay dividends.
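A sketch of that kind of forecaster with scikit-learn, assuming an hourly load Series. It uses last hour’s load and day of week as described above, plus a same-hour-yesterday lag. That extra lag, the 80/20 chronological split and the hyperparameters are assumptions, so this won’t reproduce the reported score exactly.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

FEATURES = ["lag_1h", "lag_24h", "day_of_week"]

def make_lag_features(load: pd.Series) -> pd.DataFrame:
    """Lagged load plus day of week; the first 24 rows lack a full lag and are dropped."""
    data = pd.DataFrame({
        "lag_1h": load.shift(1),        # last hour's load
        "lag_24h": load.shift(24),      # same hour yesterday (assumed extra feature)
        "day_of_week": load.index.dayofweek,
        "target": load,
    }, index=load.index)
    return data.dropna()

def fit_forecaster(load: pd.Series, test_frac: float = 0.2, seed: int = 0):
    """Train on the earlier part of the series and score R^2 on the later part."""
    data = make_lag_features(load)
    split = int(len(data) * (1 - test_frac))   # chronological split, no shuffling
    train, test = data.iloc[:split], data.iloc[split:]
    model = RandomForestRegressor(n_estimators=100, random_state=seed, n_jobs=-1)
    model.fit(train[FEATURES], train["target"])
    return model, r2_score(test["target"], model.predict(test[FEATURES]))
```

Splitting chronologically rather than randomly matters here. With shuffled rows, neighbouring hours leak between train and test and inflate R².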
Load Profile Clustering for Germany (DE)

Daily consumption naturally splits into five clusters—think “winter workday,” “summer weekend,” etc. Each cluster has a distinct peak hour, valley hour, and daily range. Utilities can leverage these segments to tailor demand-response or tariff programs to the right customer groups.
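A sketch of the clustering step, assuming one hourly load Series per country. Each day becomes a 24-value profile. K-Means groups the profiles, and each cluster’s mean profile yields its peak hour, valley hour and daily range. Clustering raw (unscaled) profiles keeps level differences such as winter vs. summer. Whether the notebook scales the profiles first is not stated, so treat that choice and the helper names as assumptions.

```python
import pandas as pd
from sklearn.cluster import KMeans

def daily_profiles(load: pd.Series) -> pd.DataFrame:
    """Reshape an hourly series into one row per day with 24 hour-of-day columns."""
    frame = pd.DataFrame({
        "day": load.index.normalize(),
        "hour": load.index.hour,
        "load": load.to_numpy(),
    })
    # Days with missing hours end up with NaN cells and are dropped.
    return frame.pivot_table(index="day", columns="hour", values="load").dropna()

def cluster_days(load: pd.Series, n_clusters: int = 5, seed: int = 0) -> pd.DataFrame:
    """K-Means on raw daily profiles, summarised per cluster."""
    profiles = daily_profiles(load)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = pd.Series(km.fit_predict(profiles.to_numpy()), index=profiles.index, name="cluster")
    centroids = profiles.groupby(labels).mean()    # mean 24-hour shape per cluster
    return pd.DataFrame({
        "n_days": labels.value_counts().sort_index(),
        "peak_hour": centroids.idxmax(axis=1),
        "valley_hour": centroids.idxmin(axis=1),
        "daily_range": centroids.max(axis=1) - centroids.min(axis=1),
    })
```

Joining `labels` back to the weekend and holiday flags is one way to put names like “winter workday” on each cluster.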

Run it yourself: the code lives in the notebooks/ and src/ folders. If you want to verify any of these insights, or drill into another country or time window, just follow the Getting Started steps above and let the notebooks do the rest.

Data Details

Source: OPSD Time Series
Download URL: https://data.open-power-system-data.org/time_series/2020-10-06/opsd-time_series-2020-10-06.zip
License: CC BY 4.0
Coverage: 37 European countries, hourly resolution from 2015 onward
