CAELUS: Classification Algorithm for the Evaluation of the cLoUdiness Situations

A Python implementation of the CAELUS sky classification algorithm, described in:

Ruiz-Arias, J. A., and Gueymard, C. A. (2023) CAELUS: classification of sky conditions from 1-min time series of global solar irradiance using variability indices and dynamic thresholds. Solar Energy, 263, 111895 doi: 10.1016/j.solener.2023.111895 (open access)

CAELUS classifies sky conditions into up to six classes:

OVERCAST
THICK CLOUDS
SCATTER CLOUDS
THIN CLOUDS
CLOUDLESS
CLOUD ENHANCEMENT

To do so, it uses 1-min time series of solar position, global horizontal irradiance, and global horizontal irradiance under hypothetical cloudless and cloudless-and-clean-and-dry atmospheres. It works for solar zenith angles up to 85$^{\circ}$.

The package also provides easy access to the data set that was used to develop, validate and benchmark the algorithm.

Important

The names of the sky conditions only indicate the situations expected within each class. They do not mean that, for instance, all situations detected as THICK_CLOUDS actually contain only thick clouds. Among other reasons, what counts as a "thick" cloud is highly subjective, and many situations contain such "thick" clouds mixed with other cloud types. The same reasoning applies to all other sky conditions.

Installation

python3 -m pip install git+https://github.com/jararias/caelus

Or to install in a virtual environment:

python3 -m venv venv-caelus
source venv-caelus/bin/activate
python3 -m pip install git+https://github.com/jararias/caelus

To leave the virtual environment, type deactivate. To enter it again, go to the directory where the virtual environment was created and run source venv-caelus/bin/activate.

Classifying data

The classification process is simple. It can be run directly on a CSV file containing the appropriate data, using the caelus script that is installed with the classification library. Type the following in your terminal to get usage information (make sure you are inside the virtual environment, that is, after running source venv-caelus/bin/activate):

caelus --help

The sky classification can also be run from within a Python script:

import caelus
sky_type = caelus.classify(data)

where data is a Pandas DataFrame with the following 1-min time-series variables (columns):

longitude: the site's longitude, in degrees
sza: the solar zenith angle, in degrees
eth: the extraterrestrial solar irradiance, in W/m$^2$
ghi: the global horizontal irradiance, in W/m$^2$
ghics: the clear-sky global horizontal irradiance, in W/m$^2$
ghicda: the cloudless-and-clean-and-dry-sky global horizontal irradiance, in W/m$^2$

The dataframe's index must be a Pandas DatetimeIndex in coordinated universal time (UTC).
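A minimal sketch of preparing such a DataFrame with pandas. prepare_caelus_input is a hypothetical helper, not part of caelus: it keeps only the six columns listed above and normalizes the index to UTC. Whether caelus expects a timezone-aware or a naive UTC index is not stated here, so the sketch assumes a naive UTC index (like the times_utc column in the Load data example), and naive input timestamps are assumed to already be in UTC.

```python
import pandas as pd

# The six input variables listed above for caelus.classify
REQUIRED_COLUMNS = ["longitude", "sza", "eth", "ghi", "ghics", "ghicda"]


def prepare_caelus_input(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with only the required columns and a naive UTC index.

    Hypothetical helper: timezone-aware indexes are converted to UTC;
    naive indexes are ASSUMED to already be in UTC and are left unchanged.
    """
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("the index must be a pandas DatetimeIndex")
    out = df[REQUIRED_COLUMNS].copy()
    if out.index.tz is not None:
        # Convert to UTC, then drop the timezone information
        out.index = out.index.tz_convert("UTC").tz_localize(None)
    return out.sort_index()
```

For example, sky_type = caelus.classify(prepare_caelus_input(raw)), where raw is your own DataFrame.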

Important

It is important to keep data gaps to a minimum, because the sky-type classification algorithm relies heavily on variability indicators computed over a centered moving window. Data gaps prevent a proper evaluation of these indicators, and the classification performance may deteriorate.
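Before classifying, it can help to check how complete the 1-min record is. The sketch below is a hypothetical helper (gap_report is not part of caelus) that compares the index against a regular 1-min grid spanning the same period; it only inspects timestamps, so rows that exist but contain NaN values must be checked separately.

```python
import pandas as pd


def gap_report(index: pd.DatetimeIndex, freq: str = "1min") -> dict:
    """Summarize missing time steps relative to a regular grid.

    The grid starts at the first timestamp, so offsets such as hh:mm:30
    are preserved. Rows holding NaN values are NOT detected here.
    """
    full = pd.date_range(index.min(), index.max(), freq=freq)
    present = full.isin(index)
    n_missing = int((~present).sum())
    # Length of the longest run of consecutive missing steps
    longest = run = 0
    for ok in present:
        run = 0 if ok else run + 1
        longest = max(longest, run)
    return {
        "expected": len(full),
        "missing": n_missing,
        "missing_fraction": n_missing / len(full),
        "longest_gap_steps": longest,
    }
```

Long gaps matter more than scattered single missing steps, because a long gap can empty the moving window entirely.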

Note

Until an internal approach to evaluate ghicda is developed, ghicda must be provided externally. This can be done using a clear-sky model with null aerosols and water vapor. It can be the same clear-sky model used to evaluate ghics (e.g., SPARTA).

Note

Logging in caelus is managed with loguru. If you want to show logging messages, just do:

from loguru import logger
logger.enable('caelus')

The output is an integer Pandas Series with the same index as the input. In particular, caelus identifies the different sky classes with labels from 2 through 7 (1 is reserved for UNKNOWN situations; e.g., sza > 85deg). The correspondence between the integer labels and the actual sky conditions is defined in the SkyType enumeration, as you can see by running the following code snippet:

import caelus

for n in range(1, 8):
    print(n, caelus.skytype.SkyType(n))
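To turn the integer output into readable class names, the labels can be mapped through the enumeration. This is a minimal sketch assuming SkyType behaves like a standard Python Enum (callable with an integer and exposing .name); sky_type_names is a hypothetical helper, and DemoSky is a stand-in used only for illustration, not the real caelus mapping.

```python
from enum import IntEnum

import pandas as pd


def sky_type_names(sky_type: pd.Series, enum_cls) -> pd.Series:
    """Map integer sky-type labels to enum member names.

    Assumes enum_cls(n) returns a standard Enum member, as in
    caelus.skytype.SkyType(n).
    """
    return sky_type.map(lambda n: enum_cls(int(n)).name)


# Hypothetical stand-in used only to demonstrate the helper; it is NOT the
# real caelus mapping. In practice, pass caelus.skytype.SkyType instead.
class DemoSky(IntEnum):
    UNKNOWN = 1
    DEMO_CLASS = 2
```

With caelus installed, names = sky_type_names(sky_type, caelus.skytype.SkyType) would give one class name per time step.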

Load data

To evaluate the algorithm, caelus can also access the individual site-and-year data files that were used to develop it and that are available at zenodo.org. For instance, to load the data taken during 2014 at the BSRN station in Carpentras, France, one can do the following:

import caelus

# `car` is the BSRN acronym for the Carpentras station.
# You can explore all the available sites and years, directly
# in the data repository, or in the paper
data = caelus.data.load('car', year=2014)

In this case, data is a DataFrame containing the variables enumerated above plus dif and sky_type, where sky_type is the classification performed by caelus for each data instance. For instance, the first 5 data rows with sza < 85deg and no NaNs are:

times_utc	longitude	sza	eth	ghi	dif	ghics	ghicda	sky_type
2014-01-01 08:00:30	5.059	84.2043	142.14	94	54	62.37	100.96	4
2014-01-01 08:01:30	5.059	84.0672	145.49	97	54	64.36	103.65	4
2014-01-01 08:02:30	5.059	83.9305	148.83	99	55	66.37	106.33	4
2014-01-01 08:03:30	5.059	83.7942	152.16	102	56	68.38	109.02	4
2014-01-01 08:04:30	5.059	83.6583	155.48	104	56	70.41	111.71	4
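Rows like these can be selected with a boolean mask. A minimal pandas sketch follows; valid_rows is a hypothetical helper, and the 85deg threshold and "no NaNs" criterion are taken from the sentence above.

```python
import pandas as pd


def valid_rows(data: pd.DataFrame, max_sza: float = 85.0) -> pd.DataFrame:
    """Rows with solar zenith angle below max_sza and no missing values."""
    mask = (data["sza"] < max_sza) & data.notna().all(axis=1)
    return data.loc[mask]
```

For the loaded Carpentras data, valid_rows(data).head(5) should select the first five rows meeting that criterion.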

Note

Although dif is in the dataframe, it is not used by caelus.

When data for a site and year are accessed for the first time, loading takes a while because the data are first downloaded to a local database. By default, the local database is <HOME>/CAELUS-DATA, where <HOME> is the user's home directory. However, the user may choose a different location by setting the environment variable CAELUS_DATA_DIR to the desired location. Once downloaded, the data are stored in the file <site_name>/<site_name>_bsrn_<year>.zip (e.g., car/car_bsrn_2014.zip) relative to the local database. Subsequent data requests involving this file are faster because the file has already been downloaded.
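The storage convention described above can be mirrored in a few lines, for example to check whether a site-year file is already cached. local_data_file is a hypothetical helper written only from that description; it is not a caelus function. Whether caelus reads CAELUS_DATA_DIR at import time or at load time is not stated here, so setting it in the shell before starting Python is the safest option.

```python
import os
from pathlib import Path


def local_data_file(site: str, year: int) -> Path:
    """Expected location of a downloaded site-year file.

    Uses CAELUS_DATA_DIR if set, otherwise <HOME>/CAELUS-DATA, followed by
    <site>/<site>_bsrn_<year>.zip, as described in this README.
    """
    root = Path(os.environ.get("CAELUS_DATA_DIR", Path.home() / "CAELUS-DATA"))
    return root / site / f"{site}_bsrn_{year}.zip"
```

For instance, local_data_file("car", 2014).exists() indicates whether the Carpentras 2014 file has already been downloaded.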

Comparing results

One would expect the sky_type column included in the data DataFrame to be identical to the sky_type Series obtained by running caelus.classify on the same data. However, there are a few time steps with slightly different sky types, mostly at sunrise and sunset, as you can see by running:

print(data.loc[data.sky_type != sky_type])

that yields:

times_utc	longitude	sza	eth	ghi	dif	ghics	ghicda	sky_type
2014-05-27 04:40:30	5.059	84.8696	118.56	28	28	46.31	82.53	4
2014-05-27 04:41:30	5.059	84.7065	122.32	28	28	48.37	85.48	4
2014-05-27 04:42:30	5.059	84.5431	126.08	28	28	50.45	88.44	4
2014-05-27 04:43:30	5.059	84.3796	129.85	28	27	52.57	91.42	4
2014-05-27 04:44:30	5.059	84.2158	133.62	29	28	54.71	94.42	4
2014-05-27 04:45:30	5.059	84.0518	137.39	35	29	56.88	97.44	4
2014-05-27 04:46:30	5.059	83.8875	141.17	50	30	59.08	100.47	4
2014-05-27 04:47:30	5.059	83.7231	144.96	69	31	61.3	103.52	4
2014-05-27 04:48:30	5.059	83.5585	148.74	81	31	63.55	106.59	4
2014-05-29 04:47:30	5.059	83.5347	149.19	83	36	69.17	106.98	4
2014-12-18 08:01:30	5.059	83.4353	160.7	31	26	77.49	116.11	2

The reason for this mismatch is that the precision of the data was slightly reduced to decrease the volume of data in the repository, but this was done after the sky_type column in data had been calculated. The reduced precision is still reasonable. For instance, GHI and DIF are archived with two significant digits. Nonetheless, it was enough to induce the very few discrepancies shown above, even though both versions of sky_type were obtained with this same code. In practice, however, the mismatch is negligible: it affects only 11 time steps out of the 229,783 time steps that have sza < 85deg and no NaNs (i.e., only 0.005%).
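The percentage can be checked directly: 11 / 229,783 × 100 ≈ 0.0048%, which rounds to the 0.005% quoted above. The sketch below generalizes that count to any pair of label Series sharing the same index. mismatch_summary is a hypothetical helper, and the validity mask is an assumption mirroring the "sza < 85deg and no NaNs" criterion used in the text.

```python
import pandas as pd


def mismatch_summary(reference: pd.Series, computed: pd.Series,
                     valid: pd.Series) -> tuple[int, int, float]:
    """Count label disagreements among valid time steps.

    reference, computed and valid must share the same index; valid is a
    boolean Series. Returns (n_mismatch, n_valid, percentage).
    """
    ref, comp = reference[valid], computed[valid]
    n_mismatch = int((ref != comp).sum())
    n_valid = int(valid.sum())
    return n_mismatch, n_valid, 100.0 * n_mismatch / n_valid
```

With the Carpentras data, one could build valid = (data.sza < 85) & data.notna().all(axis=1) and call mismatch_summary(data.sky_type, sky_type, valid).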

Diagnostic plots

caelus also provides basic functions to make some diagnostic plots:

caelus.diagnostics.histogram(sky_type)
caelus.diagnostics.pie_chart(sky_type)
caelus.diagnostics.density_ktk(data, sky_type)
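If a numeric summary is preferred over a plot, the share of each class can be computed with plain pandas. This is a minimal sketch, not how caelus.diagnostics computes its figures; sky_type_frequencies is a hypothetical helper that assumes the integer labels described earlier, with 1 meaning UNKNOWN.

```python
import pandas as pd


def sky_type_frequencies(sky_type: pd.Series, exclude_unknown: bool = True) -> pd.Series:
    """Relative frequency (%) of each integer sky-type label, sorted by label.

    Label 1 (UNKNOWN) is dropped by default so the shares refer only to
    classified time steps.
    """
    s = sky_type.dropna()
    if exclude_unknown:
        s = s[s != 1]
    return s.value_counts(normalize=True).sort_index() * 100.0
```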
