Powers the analytics backend of intermittent.energy, providing deep insights into global electricity markets and renewable energy integration.
A high-performance Ruby application for collecting, processing, and analyzing real-time power grid data worldwide. Built on PostgreSQL with TimescaleDB for lightning-fast time series processing.
Follow @IntermittentNRG for updates.
🌎 Americas
- 🇺🇸 ⚡ CAISO (California Independent System Operator)
- 🇺🇸 🔋 EIA (U.S. Energy Information Administration)
- 🇺🇸 💡 ERCOT (Electric Reliability Council of Texas) (unused)
- 🇺🇸 🔌 NYISO (New York Independent System Operator) (unused)
- 🇨🇦 ⚡ AESO (Alberta Electric System Operator)
- 🇨🇦 🔋 HYDRO-QUÉBEC (WIP, lacks history)
- 🇨🇦 💡 IESO (Independent Electricity System Operator - Ontario)
- 🇨🇦 🔌 NS Power (Nova Scotia Power) (WIP, only reports generation in %)
- 🇧🇷 🔋 ONS (National System Operator - Brazil)
- 🇦🇷 💡 CAMMESA (Wholesale Electricity Market Administrator - Argentina)
🌍 Europe
- 🇪🇺 ⚡ ENTSOE (European Network of Transmission System Operators for Electricity)
- 🇬🇧 🔋 ELEXON (GB Electricity Market)
- 🇬🇧 💡 National Grid ESO (Great Britain)
- 🇪🇸 ⚡ REE (Red Eléctrica de España)
🌏 Asia-Pacific
- 🇦🇺 ⚡ AEMO (Australian Energy Market Operator)
  - 🔌 AEMO NEM - National Electricity Market
  - 🔋 AEMO WEM - Western Australia
  - 💡 AEMO NEM Archive - Historical NEM data
- 🇦🇺 🔌 OpenNEM (Australian National Electricity Market Data)
- 🇹🇼 ⚡ Taipower (Taiwan Power Company)
- 🇯🇵 🔋 Tohoku (Tohoku Electric Power Company - Japan)
🌍 Africa
- 🇿🇦 ⚡ ESKOM (South Africa)
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests.
- Ruby
- PostgreSQL with TimescaleDB extension
- Environment variables for API access and configuration:
  - AESO_TOKEN - Alberta Electric System Operator API token
  - AESO_QUEUE_URL - AWS SQS queue URL for AESO data processing
  - EIA_TOKEN - U.S. Energy Information Administration API token
  - ELEXON_TOKEN - GB Electricity Market API token
  - ENTSOE_TOKEN - ENTSOE API token
  - ENTSOE_USER - ENTSOE SFTP username
  - ENTSOE_PASSWORD - ENTSOE SFTP password
  - ERCOT_PROXY_API_KEY - ERCOT proxy API key
  - ES_URL - Elasticsearch URL for logging (optional)
  - RAILS_ENV - Environment name (development/test/production)
  - TAIPOWER_QUEUE_URL - AWS SQS queue URL for Taipower data processing
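As a rough illustration of how the variables listed above could be checked before a collection run, here is a minimal sketch. The constant `REQUIRED_ENV_VARS` and the helper `missing_env_vars` are hypothetical names, not part of the project. Treating every variable except `ES_URL` as required is also an assumption, since a given collector likely needs only its own credentials.

```ruby
# Hypothetical list built from the variables documented above.
# ES_URL is documented as optional, so it is left out.
REQUIRED_ENV_VARS = %w[
  AESO_TOKEN AESO_QUEUE_URL EIA_TOKEN ELEXON_TOKEN ENTSOE_TOKEN
  ENTSOE_USER ENTSOE_PASSWORD ERCOT_PROXY_API_KEY RAILS_ENV TAIPOWER_QUEUE_URL
].freeze

# Returns the names of required variables that are unset or blank in `env`.
# `env` can be ENV itself or any Hash with string keys.
def missing_env_vars(env = ENV, required = REQUIRED_ENV_VARS)
  required.select { |name| env[name].to_s.strip.empty? }
end
```

A script could call `missing_env_vars` at startup and abort with a readable message when the returned list is not empty.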
Database configuration should be specified in db/config.yml. The system uses ActiveRecord with TimescaleDB for time-series data storage.
The project uses RSpec for testing. Tests can be run with:
bundle exec rspec
Test coverage is tracked using SimpleCov with Cobertura formatter.
- Ruby with YJIT enabled
- PostgreSQL with TimescaleDB extension for time-series data
- Semantic Logger for structured logging
- AWS SQS for queue processing
- Elasticsearch for logging (optional)
- Various HTTP clients (Faraday, HTTParty) for API access
- Fast parsers (FastJsonparser, FastestCSV) for efficient data processing
The system includes several key components:
- Data collectors for each market operator
- Time zone handling for different regions using TZInfo
- Data validation and normalization
- Bulk data import capabilities
- Historical data processing
- Real-time data updates
- SFTP support for certain providers (e.g., ENTSOE)
- AWS SQS integration for queue-based processing
- Automatic data deduplication
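Two of the components above, time zone handling and deduplication, can be sketched with the Ruby standard library alone. This is only an illustration under stated assumptions: the `Reading` struct and both helper methods are hypothetical, the project itself uses TZInfo rather than plain `Time`, and its actual deduplication rule is not described in this README. Here a reading is assumed to be unique per area and timestamp, with the most recent value winning.

```ruby
require "time"

# Hypothetical record shape; the project's real row format is not shown here.
Reading = Struct.new(:area_id, :time, :value)

# Parses an ISO 8601 timestamp (with offset) and normalizes it to UTC,
# so that equal instants compare equal regardless of the source's zone.
def normalize_time(iso8601)
  Time.iso8601(iso8601).utc
end

# Keeps one reading per (area_id, time); a later reading replaces an earlier one.
def deduplicate(readings)
  readings.each_with_object({}) { |r, seen| seen[[r.area_id, r.time]] = r }.values
end
```

For example, readings stamped `2024-01-01T10:00:00+01:00` and `2024-01-01T09:00:00Z` for the same area describe the same instant, so only the later of the two survives `deduplicate`.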
Data can be processed into various formats through the Out2 module:
- Generation data by fuel type
- Unit capacity and details
- Transmission flows
- Price data
- Load/demand data
- Unit-level generation data
- Rooftop solar data (where available)
When contributing:
- Write tests for new functionality (RSpec with VCR for HTTP mocking)
- Follow existing code patterns for new market integrations
- Use the provided CLI mixins for consistent command-line interfaces
- Handle time zones appropriately using the TZ constants
- Use the provided logging infrastructure (SemanticLogger)
- Leverage the fast parsers (FastJsonparser, FastestCSV) for data processing
- Follow the established data validation patterns
To set up the database, run:
rake db:setup
The system provides various Rake tasks for data collection:
# ENTSOE data collection
rake entsoe:all # Run all ENTSOE tasks
rake entsoe:generation # Collect generation data
rake entsoe:load # Collect load data
rake entsoe:price # Collect price data
rake entsoe:transmission # Collect transmission data
# ELEXON data collection
rake elexon:all # Run all ELEXON tasks
rake elexon:fuelinst # Collect fuel mix data
rake elexon:load # Collect load data
rake elexon:unit # Collect unit data
# Other market operators
rake aemo:all # Run all AEMO tasks
rake nordpool:all # Run all Nordpool tasks
Production tasks:
rake -j4 all # Runs all collection tasks
The same command is used in the Jenkinsfile and in GitHub Actions. The -j4 flag allows Rake to run up to four tasks in parallel.
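For readers unfamiliar with parallel Rake runs, the sketch below shows one way an aggregate task can fan out to per-operator tasks. It assumes the aggregate task is declared with `multitask`, which this README does not confirm. The task bodies are placeholders, and the `COMPLETED` queue is a hypothetical helper that only records which tasks ran.

```ruby
require "rake"
include Rake::DSL

# Thread-safe collector used only to show which placeholder tasks ran.
COMPLETED = Queue.new

# Placeholder tasks standing in for the real collection tasks.
namespace :entsoe do
  task(:all) { COMPLETED << "entsoe:all" }
end

namespace :elexon do
  task(:all) { COMPLETED << "elexon:all" }
end

namespace :aemo do
  task(:all) { COMPLETED << "aemo:all" }
end

# multitask runs its prerequisites concurrently;
# `rake -j4 all` would cap the number of worker threads at four.
multitask all: %w[entsoe:all elexon:all aemo:all]
```

Because prerequisites of a `multitask` run in threads, anything they share (like the queue here) needs to be thread-safe.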
The system uses a TimescaleDB-powered PostgreSQL database with the following structure:
erDiagram
Areas {
smallint id PK
string code
string name
string source
string internal_id
string type
string region
string electricitymaps_id
}
ProductionTypes {
smallint id PK
string name
string name2
boolean enabled
}
AreasProductionTypes {
int id PK
int area_id FK
int production_type_id FK
}
Units {
int id PK
int area_id FK
int production_type_id FK
int areas_production_type_id FK
string internal_id
string code
}
GenerationUnits {
int unit_id PK,FK
timestamptz time PK
int value
}
GenerationData {
int area_id FK
int production_type_id FK
int areas_production_type_id FK
timestamptz time PK
int value
}
Load {
int area_id PK,FK
timestamptz time PK
int value
}
Transmission {
int areas_area_id FK
timestamptz time PK
int value
}
AreasAreas {
smallint id PK
smallint from_area_id FK
smallint to_area_id FK
}
Prices {
int area_id PK,FK
timestamptz time PK
decimal value
}
DataFiles {
string path PK
string source PK
timestamp updated_at
}
Areas ||--o{ Units : "has"
Areas ||--o{ AreasProductionTypes : "has"
Areas ||--o{ Load : "has"
Areas ||--o{ Prices : "has"
Areas ||--o{ AreasAreas : "from"
Areas ||--o{ AreasAreas : "to"
ProductionTypes ||--o{ Units : "type"
ProductionTypes ||--o{ AreasProductionTypes : "type"
AreasProductionTypes ||--o{ GenerationData : "generates"
Units ||--o{ GenerationUnits : "generates"
AreasAreas ||--o{ Transmission : "connects"
Key features of the schema:
- Time-Series Tables
  - generation_data - Core time-series data for generation
  - generation - View for compatibility and simpler querying
  - load, transmission, prices - Other core time-series data
  - All optimized with TimescaleDB hypertables
  - All time columns use timestamptz (timestamp with timezone) type
  - Compressed for efficient storage
- Reference Tables
  - areas - Geographic/market regions
  - production_types - Types of power generation
  - areas_production_types - Links areas with their production types
  - units - Individual generation units (supports both direct area/production_type references and areas_production_type_id)
- Tracking Tables
  - data_files - Import tracking and deduplication
- Key Features
  - Composite primary keys for time-series data
  - Foreign key constraints for data integrity
  - Specialized indices for time-range queries
  - Compression policies for historical data
- Join Tables
  - areas_areas - Area interconnections
  - areas_production_types - Area-specific production types
One lesson learned: the transmission table previously had from_area_id and to_area_id columns. Moving these to a separate areas_areas table reduces table size, improves indexing, and simplifies query plans. Similarly, generation_data now uses areas_production_type_id instead of separate area and production type columns, though the generation view maintains the old interface for compatibility.
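The effect of moving the area pair into areas_areas can be illustrated with a small in-memory sketch. Everything here is hypothetical: `AreaPairRegistry` only mimics the role of the areas_areas table, `OLD_ROWS` is made-up sample data, and the project's real models and migrations are not shown in this README.

```ruby
# Hypothetical stand-in for the areas_areas table: one id per directed area pair.
class AreaPairRegistry
  def initialize
    @ids = {}
  end

  # Returns the existing id for the pair, or assigns the next free one.
  def id_for(from_area_id, to_area_id)
    @ids[[from_area_id, to_area_id]] ||= @ids.size + 1
  end
end

# Made-up rows in the old shape: both area ids repeated on every measurement.
OLD_ROWS = [
  { from_area_id: 1, to_area_id: 2, time: "2024-01-01T00:00:00Z", value: 500 },
  { from_area_id: 1, to_area_id: 2, time: "2024-01-01T01:00:00Z", value: 520 },
  { from_area_id: 2, to_area_id: 1, time: "2024-01-01T00:00:00Z", value: 80 }
].freeze

# Rewrites rows into the narrower shape: one areas_area_id plus time and value.
def normalize_transmission(rows, registry)
  rows.map do |row|
    {
      areas_area_id: registry.id_for(row[:from_area_id], row[:to_area_id]),
      time: row[:time],
      value: row[:value]
    }
  end
end
```

Each time-series row then carries a single small key instead of two, and the pair itself is stored once in the join table.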