noaa_weather_hourly
cleans historical Local Climatological Data (LCD) weather files from the National Oceanic and Atmospheric Administration (NOAA). It uses a simple command line interface to generate observed hourly (and other frequency) .CSV files containing the following output columns:
- Date
- AltimeterSetting
- DewPointTemperature
- DryBulbTemperature
- Precipitation
- PressureChange
- RelativeHumidity
- StationPressure
- Sunrise
- Sunset
- Visibility
- WetBulbTemperature
- WindDirection
- WindGustSpeed
- WindSpeed
- No source data
This is a Python script that requires a local Python installation. The following method uses pipx for installation, which makes the 'noaa_weather_hourly' command available from any directory on the computer.
Does this seem like too much work? Download a NOAA CSV file and try to use it. As described in Process Details, there can be numerous issues that complicate the use of a raw source file in spreadsheet analysis.
- Obtain the code through git OR download
  a. git clone https://github.com/emskiphoto/noaa_weather_hourly.git at a terminal prompt (requires a git installation)
  b. Simple download…
- Install pipx
  Windows:
  py -m pip install --user pipx
  py -m pipx ensurepath
  Unix/macOS:
  python3 -m pip install --user pipx
  python3 -m pipx ensurepath
- Install noaa_weather_hourly
  pipx install noaa_weather_hourly
Open a terminal prompt ('PowerShell' in Windows). Navigate down into a subfolder with cd folder_name and back up to the parent folder with cd ..
Process the version 1 LCD file ".\data\3876540.csv" that is included in the installation:
$ noaa_weather_hourly -filename ".\data\3876540.csv"
Automatically select the newest files in the current directory (based on last date modified) and group all files with the same weather station ID into a single output:
$ noaa_weather_hourly
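The automatic file selection described above can be approximated with the Python standard library. This is a minimal sketch, not the tool's own code: it assumes each LCD CSV has a 'STATION' column whose first data row identifies the weather station, and the function names read_station_id and newest_station_group are hypothetical.

```python
from __future__ import annotations

import csv
from collections import defaultdict
from pathlib import Path


def read_station_id(path: Path) -> str | None:
    """Return the 'STATION' value of the first data row (assumed LCD column name)."""
    with path.open(newline="", encoding="utf-8-sig") as f:
        first_row = next(csv.DictReader(f), None)
    if first_row is None:
        return None
    return first_row.get("STATION") or None


def newest_station_group(directory: str | Path) -> list[Path]:
    """Group *.csv files by station ID and return the group containing the newest file."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in Path(directory).glob("*.csv"):
        station = read_station_id(path)
        if station:  # CSVs without a STATION value (e.g. earlier outputs) are skipped
            groups[station].append(path)
    if not groups:
        return []
    # Choose the station whose most recently modified file is the newest overall.
    newest = max(groups.values(), key=lambda files: max(p.stat().st_mtime for p in files))
    # Oldest to newest, so a later merge step could let newer files win overlapping dates.
    return sorted(newest, key=lambda p: p.stat().st_mtime)
```

Keying on the station ID rather than the file name matters because LCD v1 download names (like 3876540.csv) are order numbers, not station identifiers.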
The default output contains Hourly frequency data. Any of the following data frequencies can be output using the -frequency argument:
'H': 'Hourly', 'T': 'Minutely', 'D': 'Daily', 'W': 'Weekly', 'M': 'Monthly', 'Q': 'Quarterly', 'Y': 'Yearly'
$ noaa_weather_hourly [-frequency FREQUENCY] filename
For example, a 15-minute frequency output:
$ noaa_weather_hourly -frequency '15T' '<path_to_LCD_CSV_file>'
Or a daily frequency output using the most recent file(s):
$ noaa_weather_hourly -frequency 'D'
Prefix a base frequency code with an integer to generate other frequencies. For example, '15T' generates a 15-minute frequency file and '3H' generates a '3-Hourly' frequency file.
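The multiplier-plus-code convention can be illustrated with a small parser. describe_frequency is a hypothetical helper, not part of noaa_weather_hourly; the codes resemble pandas offset aliases, but this sketch only splits and names the string and does not show how the tool itself resamples.

```python
import re

# Base frequency codes and names, as listed in this README.
FREQUENCY_NAMES = {
    "H": "Hourly",
    "T": "Minutely",
    "D": "Daily",
    "W": "Weekly",
    "M": "Monthly",
    "Q": "Quarterly",
    "Y": "Yearly",
}


def describe_frequency(freq: str) -> str:
    """Split an optional integer multiplier from a base code, e.g. '3H' -> '3-Hourly'."""
    match = re.fullmatch(r"(\d*)([A-Z])", freq.strip())
    if match is None or match.group(2) not in FREQUENCY_NAMES:
        raise ValueError(f"unrecognized frequency: {freq!r}")
    multiplier, code = match.groups()
    if multiplier and int(multiplier) == 0:
        raise ValueError("frequency multiplier must be at least 1")
    name = FREQUENCY_NAMES[code]
    # A bare code (or an explicit multiplier of 1) is just the base frequency.
    if multiplier in ("", "1"):
        return name
    return f"{multiplier}-{name}"
```

For example, describe_frequency('15T') returns '15-Minutely' and describe_frequency('D') returns 'Daily'.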
noaa_weather_hourly takes a raw NOAA Local Climatological Data (LCD) .csv-format file as input. Download file(s) for a specific location and date range from NOAA as described below. NOAA changed the download process and interface in 2024 to use AWS buckets for storage. As of December 2024, both the new and old methods work. No account or API key is required, just an email address to receive a download link.
It is recommended to store downloaded files in separate folders by location.
Original download method (LCD Version 1):
- Go to NOAA Data Tools: Local Climatological Data
- Find the desired Weather Station and select 'Add to Cart'
- Click on 'cart (Free items)'
- Select the Output Format 'LCD CSV'
- Select the Date Range. Consider adding an extra week before and after the needed date range to support interpolation of missing values.
- Under "Enter Email Address", enter the address where the link to the LCD CSV download should be delivered.
- Click "Submit Order".
- Check the email inbox for a "Climate Data Online request 1234567 complete" message and download the LCD CSV file to a local folder using the "Download" link. Do not change the name(s) of the LCD file(s).
New download method (LCD Version 2):
- Go to Local Climatological Data (LCD), Version 2 (LCDv2)
- What ?: Select columns to be included in the file by clicking on 'Show List'.
- Beware that selecting columns that are not available for a given location will result in that location being excluded entirely from the search results.
- It's recommended to select only the columns corresponding to the output columns listed above.
- Where ?: Input weather station location. A list of all available annual weather files for all matching locations will be displayed. Use the 'When' inputs to filter this list.
- When ?: (optional) For a single calendar year, select any date in that year. For multiple calendar years click 'Select Date Range' and input start and end dates of the range.
- (Recommended) Review list of matching files and click "Download" for each file.
- (Alternatively) Merge multiple years of data into a single large file by clicking "+ Select" on multiple files. Then select "Output Format" csv, click "Configure and Add" and then "Add Order to Cart". Click "Proceed to Cart", provide an email address and click "Submit". Check the email inbox for a "Climate Data Online request 1234567 complete" message and download the LCD CSV file to any local folder using the "Download" link. Do not change the name(s) of the LCD file(s).
NOAA LCD files can contain more than 100 types of meteorological observations, but noaa_weather_hourly processes only the output columns listed above. This is a standalone process that does not access any external (internet) resources and operates only in the directory it is initiated in, unless a -filename in another directory is provided.
noaa_weather_hourly makes the source LCD file ready to use by resolving the following data formatting and quality issues:
- Locates the most recent LCD v1 or v2 file in the current directory (or uses the optional file specified in -filename) and creates a copy of the source file(s), leaving the source file(s) unmodified
- Extracts ID data and gathers additional station details
- Determines whether input files are LCD v1 or v2
- Merges multiple source files having the same station ID and resolves overlapping date ranges
- Formats 'Sunrise' and 'Sunset' times
- Removes recurring daily timestamps that contain more null values than allowed by the 'pct_null_timestamp_max' parameter
- Displays the percentage of null values in the source data on screen
- Resamples and/or interpolates values per the input '-frequency' value
  - Most columns are expected to have numeric values for every timestamp. The maximum number of contiguous missing values to be interpolated is 24. The 'max_records_to_interpolate' default can be overridden on the command line; for example, noaa_weather_hourly -max_records_to_interpolate 12 would limit interpolation to no more than 12 missing values in a row.
  - Some columns are expected to have null values at some times, and these null values are preserved in the output (e.g., 'Precipitation', 'WindGustSpeed').
- Saves a single .CSV file to the same location as the source LCD file(s), overwriting any existing file with the same name
- Output file is named "{STATION_NAME} {start_YYYY-MM-DD} to {end_YYYY-MM-DD} {frequency}.csv" (e.g., "CHICAGO O'HARE INTERNATIONAL 2020-01-01 to 2023-12-31 H.csv")
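Two of the steps above, removing sparse recurring timestamps and limiting interpolation to short gaps, can be sketched with the standard library. This is an illustrative interpretation, not the tool's code: the function names are hypothetical, 'pct_null_timestamp_max' is modeled as a percentage threshold applied to each recurring HH:MM time of day with an assumed default of 50, and interpolation is linear by record position, which assumes evenly spaced records.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


def drop_sparse_timestamps(rows: list[dict], columns: list[str],
                           pct_null_max: float = 50.0) -> list[dict]:
    """Drop rows whose time of day (HH:MM) is null in more than pct_null_max percent of values.

    Each row is a dict with a datetime under "DATE" and float-or-None values under `columns`.
    """
    if not columns:
        return list(rows)
    totals: dict[str, int] = defaultdict(int)
    nulls: dict[str, int] = defaultdict(int)
    for row in rows:
        key = row["DATE"].strftime("%H:%M")
        for col in columns:
            totals[key] += 1
            nulls[key] += (row[col] is None)
    keep = {k for k, n in totals.items() if 100.0 * nulls[k] / n <= pct_null_max}
    return [row for row in rows if row["DATE"].strftime("%H:%M") in keep]


def interpolate_short_gaps(values: list[Optional[float]],
                           max_records: int = 24) -> list[Optional[float]]:
    """Linearly fill runs of None no longer than max_records that have values on both sides.

    Longer runs, and runs at the start or end of the series, are left as None.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        gap = i - start  # missing values occupy filled[start:i]
        if start > 0 and i < n and gap <= max_records:
            left, right = filled[start - 1], filled[i]
            for k in range(gap):
                filled[start + k] = left + (right - left) * (k + 1) / (gap + 1)
    return filled
```

Columns expected to contain legitimate nulls, such as 'Precipitation' and 'WindGustSpeed', would simply be left out of both steps so their nulls pass through unchanged. With max_records=12, a run of 13 missing hourly values stays empty rather than being bridged by a long straight line.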
noaa_weather_hourly includes a processed 'isd-history.csv' file containing the details of ~11,600 active stations and ID cross-references (ICAO, FAA, WMO, WBAN) provided by the Historical Observing Metadata Repository (HOMR). This data is used only to provide weather station location details, which are not included in the LCD CSV file. The data source is updated regularly, but the copy included with this script is not. If updates are needed, consider running 'ISD History Station Table.py' to update 'data/isd-history.csv'.
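A station lookup against this kind of table can be sketched as a simple CSV scan. Assumptions, all unverified against the tool: the LCD STATION value is the 11-character concatenation of the USAF and WBAN IDs (e.g., '72530094846'), and the file uses the column names of NOAA's raw isd-history.csv ('USAF', 'WBAN', 'STATION NAME', 'STATE', 'ICAO', 'LAT', 'LON'). The processed copy bundled with noaa_weather_hourly may use different columns, and lookup_station is a hypothetical name.

```python
from __future__ import annotations

import csv
from pathlib import Path


def lookup_station(isd_history_path: str | Path, station_id: str) -> dict | None:
    """Find details for an 11-character USAF+WBAN station ID in an isd-history-style CSV.

    Column names are assumed to follow NOAA's raw isd-history.csv.
    """
    usaf, wban = station_id[:6], station_id[6:]
    with Path(isd_history_path).open(newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            if row["USAF"] == usaf and row["WBAN"] == wban:
                return {
                    "name": row["STATION NAME"],
                    "state": row["STATE"],
                    "icao": row["ICAO"],
                    # Blank coordinates are returned as None instead of raising.
                    "lat": float(row["LAT"]) if row["LAT"] else None,
                    "lon": float(row["LON"]) if row["LON"] else None,
                }
    return None
```

A linear scan is adequate for a table of this size; repeated lookups could build a dictionary keyed on USAF+WBAN once instead.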
- NOAA LCD data covers locations primarily in the United States of America.
- NOAA LCD data is not certified for use in litigation.
- noaa_weather_hourly is not an API
- The noaa_weather_hourly Python module does not (currently) integrate with other Python tools
- Does not validate
- Does not visualize
- Processes only 'Hourly...' columns and 'Sunrise', 'Sunset' & 'DATE'
- Does not modify values from source
- Does not filter or smooth apparently noisy source data
- Does not process or convert categorical data like 'HourlySkyConditions'
- Intended for data frequencies between yearly and 15-minutely. Will accept intervals as short as one minute ('T'), but the output file may be excessively large.
- Does not compare to other references
- Uses only units of measure used by LCD (no unit conversion)
- No Forecast
- diyepw - Amanda D. Smith, Benjamin Stürmer, Travis Thurber, & Chris R. Vernon. (2021). diyepw: A Python package for EnergyPlus Weather (EPW) files generation. Zenodo. https://doi.org/10.5281/zenodo.5258122
- pycli - Python command-line interface reference
- Degree Days.net - Excellent reference for weather-related energy engineering analysis
- http://weather.whiteboxtechnologies.com/hist
- https://openweathermap.org/history
- https://docs.synopticdata.com/services/weather-data-api
- https://mesowest.utah.edu/
- https://registry.opendata.aws/noaa-isd/
- https://www.ncei.noaa.gov/products/land-based-station/integrated-surface-database
- https://github.com/celikfatih/noaa-weather
- https://github.com/cagledw/climate_analyzer
- https://pypi.org/project/diyepw/
- https://github.com/GClunies/noaa_coops
- https://github.com/DevinRShaw/simple_noaa
- https://github.com/awslabs/amazon-asdi/tree/main/examples/noaa-isd
- https://pypi.org/project/climate-analyzer
Weather-dependent models, such as building energy models, generally use typical weather values to estimate a given metric for a system. For example, typical weather values would be used to estimate the annual energy consumption of a building cooling system. Inevitably, the actual weather the system experiences in the real world differs from the weather used to create the model. If the model is sensitive to weather, the differences between typical and actual weather will cause cumulative metrics to diverge. If the amplitude of these differences is larger than the cumulative impact of individual system elements (e.g., the cooling system), it may not be possible to compare modeled and actual performance.
The use cases below often require reporting and analysis that is easy to access, distribute, understand, and review. Therefore, energy engineers often must use spreadsheet software like MS Excel, Google Sheets, etc. noaa_weather_hourly was developed to support spreadsheet analysis, where a single missing value can break the calculation.
Many building design choices must be made before a physical building exists, and once equipment is installed, changing design choices becomes costly. The installed performance will likely persist as-is for a decade or more until equipment needs to be replaced. The building owners and the environment will feel the impact of early-stage design choices every year until equipment is changed. Because the life-cycle cost of operating the equipment may be several times larger than the initial purchase price, it is cost-effective to develop energy models that optimize operational cost and inform system design choices.
Incentives and financing of high-efficiency solutions are ubiquitous. When large financial incentives are in play, many providers must measure and verify (M&V) the true impact of specific system elements (energy conservation measures) to ensure that incentives are returning the intended results.
Energy Service Companies (ESCOs) and other efficiency solution providers often guarantee specific performance improvements in contracts (e.g., a 10% reduction in annual electricity cost due to cooling system replacement). If the solution does not achieve the estimated savings, the ESCO is liable for the financial difference in operating cost and possibly more. Observed performance improvements are determined by comparing a baseline model to observed performance.
In both cases, the estimation of performance improvements is only valid if the impact of weather variance between the baseline model and observed reality is accounted for.
The way to align the results of a model built with one set of weather values with actual results produced under a different set of weather values is to weather-normalize the results. This normalization is only possible if both the typical and actual weather values are available. Once the modeled and actual results are aligned (i.e., the influence of weather variance is removed), the difference between expected and actual performance can be evaluated in detail.
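As a simplified illustration of weather normalization (not a feature of noaa_weather_hourly), the sketch below computes cooling degree-days from an hourly DryBulbTemperature series and rescales the weather-dependent part of a measured energy total to typical-year conditions. The 65 degrees F base temperature, the known base load, and strict proportionality between cooling energy and degree-days are all assumptions; M&V practice usually fits regression models rather than applying a single ratio.

```python
from __future__ import annotations


def cooling_degree_days(hourly_temps_f: list[float], base_f: float = 65.0) -> float:
    """Cooling degree-days from hourly dry-bulb temperatures (degrees F).

    Each hour contributes its excess over the base temperature; dividing the
    degree-hours by 24 converts them to degree-days.
    """
    return sum(max(t - base_f, 0.0) for t in hourly_temps_f) / 24.0


def weather_normalize(actual_kwh: float, actual_cdd: float, typical_cdd: float,
                      base_load_kwh: float = 0.0) -> float:
    """Rescale the weather-dependent share of actual energy use to typical-year CDD.

    Simple ratio method: energy above the (assumed known) base load is treated
    as proportional to cooling degree-days.
    """
    if actual_cdd <= 0:
        raise ValueError("actual_cdd must be positive")
    weather_kwh = actual_kwh - base_load_kwh
    return base_load_kwh + weather_kwh * typical_cdd / actual_cdd
```

For example, weather_normalize(1200.0, 100.0, 80.0, base_load_kwh=200.0) returns 1000.0: the 1,000 kWh above the 200 kWh base load is scaled by 80/100 to 800 kWh. A single missing hourly temperature would make cooling_degree_days fail or undercount, which is why gap-free hourly output matters for this kind of calculation.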
Obtaining good, usable data that is already in the public domain is not necessarily easy or free of cost. noaa_weather_hourly was created to provide convenient, free access to limited volumes of hourly observed weather data published by NOAA as a ready-to-use .CSV file.
There are numerous subscription or purchase-based sources of historical weather, and many offer API access. These sources may be preferable when many locations are needed and/or the data need to be updated frequently.