Releases: holukas/diive
v0.74.1 | 23 Apr 2024
This update adds the first notebooks (and tests) for outlier detection methods. Only two tests are included so far, and
both are relatively simple, but the two notebooks already show in principle how outlier removal is handled. An
important aspect is that the single outlier methods in diive do not remove outliers by default. Instead, they create a
flag that marks where the outliers are located. This flag can then be used to remove the flagged data points.
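The flag-then-remove pattern can be sketched with plain pandas. This is a minimal illustration under assumptions, not diive's own implementation: the helper name flag_outside_limits, the boolean flag (True = outlier) and the example limits are all hypothetical choices for the sketch.

```python
import pandas as pd


def flag_outside_limits(series: pd.Series, minval: float, maxval: float) -> pd.Series:
    """Hypothetical helper: return True where a value lies outside [minval, maxval].

    The input series is not modified; missing values (NaN) are never flagged.
    """
    return (series < minval) | (series > maxval)


index = pd.date_range("2024-04-01", periods=6, freq="30min")
flux = pd.Series([1.2, -3.5, 85.0, 2.1, -60.0, 0.4], index=index)

# Step 1: detect outliers and store the result as a flag (data stay untouched)
flag = flag_outside_limits(flux, minval=-50, maxval=50)

# Step 2: use the flag to remove the flagged points (they become NaN)
flux_clean = flux.where(~flag)
```

Keeping detection and removal as separate steps means the flags of several tests can be combined before any data point is removed.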
This update also adds a small function that creates artificial spikes in time series data, which is useful for testing
outlier detection methods.
More outlier removal notebooks will be added in the future, including a notebook that shows how to combine results from
multiple outlier tests into a single overall outlier flag.
New features
- Added: new function to add impulse noise to time series (diive.pkgs.createvar.noise.impulse)
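Because the positions of artificial spikes are known, they provide ground truth for checking an outlier test. A rough sketch of the idea follows; it is not the code behind diive.pkgs.createvar.noise.impulse, and the name add_spikes and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def add_spikes(series: pd.Series, n_spikes: int, amplitude: float, seed: int = 42):
    """Hypothetical helper: add positive or negative spikes of fixed amplitude at
    randomly chosen positions. Returns the noisy copy and the spike timestamps."""
    rng = np.random.default_rng(seed)
    noisy = series.astype(float).copy()
    # Draw distinct positions so that no record receives two spikes
    positions = rng.choice(len(series), size=n_spikes, replace=False)
    signs = rng.choice([-1.0, 1.0], size=n_spikes)
    noisy.iloc[positions] = noisy.iloc[positions].to_numpy() + signs * amplitude
    return noisy, series.index[positions]


index = pd.date_range("2024-04-01", periods=100, freq="30min")
clean = pd.Series(np.zeros(100), index=index)
noisy, spike_times = add_spikes(clean, n_spikes=5, amplitude=10.0)
```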
Notebooks
- Added: new notebook for outlier detection: absolute limits, separately for daytime and nighttime data (notebooks/OutlierDetection/AbsoluteLimitsDaytimeNighttime.ipynb)
- Added: new notebook for outlier detection: absolute limits (notebooks/OutlierDetection/AbsoluteLimits.ipynb)
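Applying separate absolute limits to daytime and nighttime records can be sketched as follows. The clock-hour rule for daytime is a stand-in assumption; in practice a proper daytime/nighttime flag (for example from diive's DaytimeNighttimeFlag) would be used. The helper name and the limits are hypothetical.

```python
import pandas as pd


def flag_limits_day_night(series, is_daytime, day_limits, night_limits):
    """Hypothetical helper: flag values outside separate (min, max) limits
    for daytime and nighttime records. True marks an outlier."""
    day_min, day_max = day_limits
    night_min, night_max = night_limits
    day_flag = is_daytime & ((series < day_min) | (series > day_max))
    night_flag = ~is_daytime & ((series < night_min) | (series > night_max))
    return day_flag | night_flag


index = pd.date_range("2024-06-01", periods=24, freq="60min")
# Crude clock-based daytime flag (06:00 to 17:59), only for illustration
is_daytime = pd.Series((index.hour >= 6) & (index.hour < 18), index=index)

flux = pd.Series(0.0, index=index)
flux.iloc[12] = -40.0  # within the daytime limits
flux.iloc[2] = -40.0   # below the nighttime minimum
flux.iloc[13] = 30.0   # above the daytime maximum

flag = flag_limits_day_night(flux, is_daytime, day_limits=(-50, 20), night_limits=(-5, 20))
```

The same value (-40) is accepted at noon but flagged at 02:00, which is the point of using separate limits.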
Tests
- Added: test case for outlier detection: absolute limits, separately for daytime and nighttime data (tests.test_outlierdetection.TestOutlierDetection.test_absolute_limits)
- Added: test case for outlier detection: absolute limits (tests.test_outlierdetection.TestOutlierDetection.test_absolute_limits)
What's Changed
- Outlier notebooks by @holukas in #95
- Update README.md by @inkenbrandt in #86
- Update pyproject.toml by @inkenbrandt in #85
Full Changelog: v0.74.0...v0.74.1
v0.74.0 | 21 Apr 2024
Additions
- Added: new function to remove rows that do not have timestamp info (NaT) (diive.core.times.times.remove_rows_nat and diive.core.times.times.TimestampSanitizer)
- Added: new settings VARNAMES_ROW and VARUNITS_ROW in filetypes YAML files, which allow better and more specific configuration when reading data files (diive/configs/filetypes)
- Added: many (small) example data files for various filetypes, e.g. ETH-RECORD-TOA5-CSVGZ-20HZ
- Added: new optional check in TimestampSanitizer that compares the detected time resolution of a time series with the nominal (expected) time resolution. The check runs automatically when reading files with ReadFileType, in which case the FREQUENCY from the filetype configs is used as the nominal time resolution. (diive.core.times.times.TimestampSanitizer, diive.core.io.filereader.ReadFileType)
- Added: application of TimestampSanitizer after inserting a timestamp and setting it as index with function insert_timestamp; this makes sure the freq/freqstr info is available for the new timestamp index (diive.core.times.times.insert_timestamp)
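Dropping rows without a valid timestamp and comparing the detected time resolution with a nominal one can be sketched in plain pandas. This approximates the behavior described for remove_rows_nat and TimestampSanitizer under assumptions; the helper names are hypothetical and not diive's API.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def drop_invalid_timestamps(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: parse the index as datetime and drop rows that cannot be parsed."""
    out = df.copy()
    # Unparseable entries become NaT instead of raising an error
    out.index = pd.to_datetime(out.index, errors="coerce")
    return out[out.index.notna()].sort_index()


def matches_nominal_freq(index: pd.DatetimeIndex, nominal: str) -> bool:
    """Hypothetical helper: compare the inferred time resolution with the expected one."""
    inferred = pd.infer_freq(index)  # needs at least 3 timestamps; None if irregular
    if inferred is None:
        return False
    # Comparing offsets instead of strings treats aliases such as '30T' and '30min' as equal
    return to_offset(inferred) == to_offset(nominal)


raw = pd.DataFrame(
    {"TA": [12.1, 12.3, 99.9, 12.4, 12.2]},
    index=["2024-04-21 00:00", "2024-04-21 00:30", "not a timestamp",
           "2024-04-21 01:00", "2024-04-21 01:30"],
)
clean = drop_invalid_timestamps(raw)
```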
Notebooks
- General: Ran all notebook examples to make sure they work with this version of diive
- Added: new notebook for reading an EddyPro fluxnet output file with DataFileReader parameters (notebooks/ReadFiles/Read_single_EddyPro_fluxnet_output_file_with_DataFileReader.ipynb)
- Added: new notebook for reading an EddyPro fluxnet output file with ReadFileType and the pre-defined filetype EDDYPRO-FLUXNET-CSV-30MIN (notebooks/ReadFiles/Read_single_EddyPro_fluxnet_output_file_with_ReadFileType.ipynb)
- Added: new notebook for reading multiple EddyPro fluxnet output files with MultiDataFileReader and the pre-defined filetype EDDYPRO-FLUXNET-CSV-30MIN (notebooks/ReadFiles/Read_multiple_EddyPro_fluxnet_output_files_with_MultiDataFileReader.ipynb)
Changes
- Renamed: function get_len_header to parse_header (diive.core.dfun.frames.parse_header)
- Renamed: exampledata files (diive/configs/exampledata)
- Renamed: filetypes YAML files to always include the file extension in the file name (diive/configs/filetypes)
- Reduced: file size for most example data files
Tests
- Added: various test cases for loading filetypes (tests/test_loaddata.py)
- Added: test case for loading and merging multiple files (tests.test_loaddata.TestLoadFiletypes.test_load_exampledata_multiple_EDDYPRO_FLUXNET_CSV_30MIN)
- Added: test case for reading EddyPro fluxnet output file with DataFileReader parameters (tests.test_loaddata.TestLoadFiletypes.test_load_exampledata_EDDYPRO_FLUXNET_CSV_30MIN_datafilereader_parameters)
- Added: test case for resampling series to 30MIN time resolution (tests.test_time.TestTime.test_resampling_to_30MIN)
- Added: test case for inserting timestamp with a different convention (middle, start, end) (tests.test_time.TestTime.test_insert_timestamp)
- Added: test case for inserting timestamp as index (tests.test_time.TestTime.test_insert_timestamp_as_index)
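The start, middle and end timestamp conventions differ only by a fixed shift within each averaging period. A sketch, assuming the input timestamps mark the end of each period and the period length is known; the helper name is hypothetical and this is not diive's insert_timestamp.

```python
import pandas as pd


def end_to_start_and_middle(end_index: pd.DatetimeIndex, period: pd.Timedelta):
    """Hypothetical helper: derive start and middle timestamps from period-end timestamps."""
    start = end_index - period       # start of each averaging period
    middle = end_index - period / 2  # middle of each averaging period
    return start, middle


end = pd.date_range("2024-04-21 00:30", periods=3, freq="30min")
start, middle = end_to_start_and_middle(end, pd.Timedelta("30min"))
```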
Bugfixes
- Fixed: bug in class DetectFrequency when the inferred frequency is None (diive.core.times.times.DetectFrequency)
- Fixed: bug in class DetectFrequency where pd.Timedelta() would crash if the input frequency does not contain a number. Timedelta does not accept e.g. the frequency string 'min' for minutely time resolution, even though e.g. pd.infer_freq() outputs 'min' for data in 1-minute time resolution. Timedelta requires a number, in this case '1min'. Results from infer_freq() are now checked for whether they contain a number and, if not, 1 is added at the beginning of the frequency string. (diive.core.times.times.DetectFrequency)
- Fixed: bug in notebook WindDirectionOffset, related to frequency detection during heatmap plotting
- Fixed: bug in TimestampSanitizer where the script would crash if the timestamp contained an element that could not be converted to datetime, e.g., when a string is mixed in with the regular timestamps. Data rows with invalid timestamps are now parsed as NaT by using errors='coerce' in pd.to_datetime(data.index, errors='coerce'). (diive.core.times.times.convert_timestamp_to_datetime and diive.core.times.times.TimestampSanitizer)
- Fixed: bug when plotting heatmap (diive.core.plotting.heatmap_datetime.HeatmapDateTime)
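The fix for frequency strings without a number can be sketched as follows. The helper name is hypothetical, and the check (is any digit present?) follows the description above rather than diive's actual code.

```python
import pandas as pd


def with_leading_number(freqstr: str) -> str:
    """Hypothetical helper: prepend '1' if a frequency string has no number, e.g. 'min' -> '1min'."""
    if any(char.isdigit() for char in freqstr):
        return freqstr
    return f"1{freqstr}"


# pd.Timedelta('min') is rejected because it has no number; '1min' is accepted
step = pd.Timedelta(with_leading_number("min"))
```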
What's Changed
- Update read csv and notebooks by @holukas in #93
- Added new and updated test cases by @holukas in #94
Full Changelog: v0.73.0...v0.74.0
v0.73.0 | 17 Apr 2024
New features
- Added new function trim_frame that allows trimming the start and end of a dataframe based on available records of a variable (diive.core.dfun.frames.trim_frame)
- Added new option to export borderless heatmaps (diive.core.plotting.heatmap_base.HeatmapBase.export_borderless_heatmap)
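Trimming a dataframe to the period covered by one variable can be sketched with pandas' first_valid_index() and last_valid_index(). This is an assumption about the general approach, not necessarily how trim_frame works; the helper name is hypothetical.

```python
import numpy as np
import pandas as pd


def trim_to_records(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: drop leading and trailing rows where `col` has no data."""
    first = df[col].first_valid_index()
    last = df[col].last_valid_index()
    if first is None:  # column is entirely empty
        return df.iloc[0:0]
    return df.loc[first:last]


index = pd.date_range("2024-04-17", periods=6, freq="30min")
df = pd.DataFrame({"NEE": [np.nan, np.nan, 1.0, np.nan, 2.0, np.nan],
                   "TA": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]}, index=index)
trimmed = trim_to_records(df, "NEE")
```

Only the leading and trailing gaps are removed; the gap inside the covered period is kept.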
Additions
- Added more info in comments of class WindRotation2D (diive.pkgs.echires.windrotation.WindRotation2D)
- Added example data for EddyPro full_output files (diive.configs.exampledata.load_exampledata_eddypro_full_output_CSV_30MIN)
- Added code in an attempt to harmonize frequency detection from data: in class DetectFrequency, the detected frequency strings are now converted from Timedelta (pandas) to offset (pandas) to .freqstr. This yields the frequency string as seen by (the current version of) pandas. The idea is to harmonize between different representations, e.g. T or min for minutes. Currently, pandas does not seem to be consistent in its representation of minutes, using T in .infer_freq() but min for Timedelta (see here). (diive.core.times.times.DetectFrequency)
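Converting a Timedelta to an offset and reading its freqstr (the same pandas route named as the replacement for the removed timedelta_to_string) might look like this. The wrapper name is hypothetical.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def harmonized_freqstr(freq) -> str:
    """Hypothetical helper: return pandas' own frequency string for a Timedelta or alias."""
    # to_offset accepts frequency strings as well as (pandas) Timedelta objects
    return to_offset(freq).freqstr


from_timedelta = harmonized_freqstr(pd.Timedelta(minutes=30))
from_string = harmonized_freqstr("30min")
# Both give the same string: '30min' on pandas 2.2+, '30T' on older versions
```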
Changes
- Updated class DataFileReader to comply with new pandas kwargs when using .read_csv() (diive.core.io.filereader.DataFileReader._parse_file)
- Environment: updated pandas to v2.2.2 and pyarrow to v15.0.2
- Updated date offsets in config filetypes to be compliant with pandas version 2.2+ (see here and here), e.g., 30T was changed to 30min. This seems to work without raising a warning; however, if the frequency is inferred from available data, the resulting frequency string shows e.g. 30T, i.e. it still uses T for minutes instead of min. (diive/configs/filetypes)
- Changed variable names in WindRotation2D to be in line with the variable names given in the paper by Wilczak et al. (2001) https://doi.org/10.1023/A:1018966204465
Removals
- Removed function timedelta_to_string because this can be done with pandas to_offset().freqstr
- Removed function generate_freq_str (unused)
Tests
- Added test case for reading EddyPro full_output files (tests.test_loaddata.TestLoadFiletypes.test_load_exampledata_eddypro_full_output_CSV_30MIN)
- Updated test for frequency detection (tests.test_timestamps.TestTime.test_detect_freq)
What's Changed
Full Changelog: v0.72.1...v0.73.0
v0.72.1 | 26 Mar 2024
- pyproject.toml now uses the inequality syntax >= instead of the caret syntax ^ because caret version capping is too restrictive and prevents compatibility in conda installations. See #74
- Added badges in README.md
- Smaller diive logo in README.md
What's Changed
- Update pyproject.toml by @inkenbrandt in #74
- Minor updates by @holukas in #77
Full Changelog: v0.72.0...v0.72.1
v0.72.0 | 25 Mar 2024
New feature
- Added new heatmap plotting class HeatmapYearMonth that allows plotting a variable in year/month classes (diive.core.plotting.heatmap_datetime.HeatmapYearMonth)
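The data behind a year/month heatmap is essentially a table with one row per year and one column per month. A sketch of that aggregation step with pandas, assuming monthly means; the helper name is hypothetical, and the plotting itself (e.g. with matplotlib's pcolormesh) is left out.

```python
import pandas as pd


def year_month_grid(series: pd.Series, agg: str = "mean") -> pd.DataFrame:
    """Hypothetical helper: aggregate a time series into a year x month table."""
    grouped = series.groupby([series.index.year, series.index.month]).agg(agg)
    grid = grouped.unstack()  # rows: years, columns: months
    grid.index.name = "year"
    grid.columns.name = "month"
    return grid


index = pd.date_range("2022-01-01", "2023-12-31", freq="D")
series = pd.Series(index.month.astype(float), index=index)
grid = year_month_grid(series)
```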
Changes
- Refactored code for class HeatmapDateTime (diive.core.plotting.heatmap_datetime.HeatmapDateTime)
- Added new base class HeatmapBase for heatmap plots. Currently used by HeatmapYearMonth and HeatmapDateTime (diive.core.plotting.heatmap_base.HeatmapBase)
Notebooks
- Added new notebook for HeatmapDateTime (notebooks/Plotting/HeatmapDateTime.ipynb)
- Added new notebook for HeatmapYearMonth (notebooks/Plotting/HeatmapYearMonth.ipynb)
Bugfixes
- Fixed bug in HeatmapDateTime where the last record of each day was not shown
What's Changed
Full Changelog: v0.71.6...v0.72.0
v0.71.6 | 23 Mar 2024
Notebooks
- Added new notebook for Percentiles (notebooks/Analyses/Percentiles.ipynb)
- Added new notebook for LinearInterpolation (notebooks/GapFilling/LinearInterpolation.ipynb)
- Added new notebook for calculating z-aggregates in quantiles (classes) of x and y (notebooks/Analyses/CalculateZaggregatesInQuantileClassesOfXY.ipynb)
- Updated notebook for DaytimeNighttimeFlag (notebooks/CalculateVariable/DaytimeNighttimeFlag.ipynb)
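Aggregating a variable z in quantile classes of x and y can be sketched with pd.qcut and a two-key groupby. The helper name, the number of classes and the use of the mean are assumptions for illustration, and the data are synthetic.

```python
import numpy as np
import pandas as pd


def z_in_xy_quantile_classes(df, x, y, z, n_classes=4, agg="mean"):
    """Hypothetical helper: aggregate z in quantile classes of x (rows) and y (columns)."""
    x_class = pd.qcut(df[x], q=n_classes, labels=False)  # class 0 = lowest quantile
    y_class = pd.qcut(df[y], q=n_classes, labels=False)
    return df[z].groupby([x_class, y_class]).agg(agg).unstack()


rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.uniform(0, 1, 400), "y": rng.uniform(0, 1, 400)})
df["z"] = df["x"] + df["y"]
table = z_in_xy_quantile_classes(df, "x", "y", "z")
```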
What's Changed
Full Changelog: v0.71.5...v0.71.6
v0.71.5 | 22 Mar 2024
Changes
- Updated notebook for SortingBinsMethod (diive.pkgs.analyses.decoupling.SortingBinsMethod)
Plot showing vapor pressure deficit (y) in 10 classes of short-wave incoming radiation (x), separately for 5 classes of
air temperature (z). All values shown are medians of the respective variable. The shaded error bars show the
interquartile range of the respective class. The plot was generated using the class SortingBinsMethod.
v0.71.4 | 20 Mar 2024
Changes
- Refactored class LongtermAnomaliesYear (diive.core.plotting.bar.LongtermAnomaliesYear)
Notebooks
- Added new notebook for LongtermAnomaliesYear (notebooks/Plotting/LongTermAnomalies.ipynb)
What's Changed
Full Changelog: v0.71.3...v0.71.4
v0.71.3 | 19 Mar 2024
Changes
- Refactored class SortingBinsMethod: allows investigating binned aggregates of a variable z in binned classes of x and y. All bins now show medians and interquartile ranges. (diive.pkgs.analyses.decoupling.SortingBinsMethod)
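Medians and interquartile ranges in bins, as in the SortingBinsMethod plot caption for v0.71.5, can be approximated with quantile bins and grouped quartiles. A sketch with synthetic data; the helper name, column names and bin counts are assumptions, and this is not the SortingBinsMethod code.

```python
import numpy as np
import pandas as pd


def binned_medians_iqr(df, x, y, z, n_xbins=10, n_zbins=5):
    """Hypothetical helper: median and interquartile range of y in quantile bins of x,
    computed separately for each quantile bin of z."""
    x_bin = pd.qcut(df[x], q=n_xbins, labels=False).rename("x_bin")
    z_bin = pd.qcut(df[z], q=n_zbins, labels=False).rename("z_bin")
    grouped = df.groupby([z_bin, x_bin])
    return pd.DataFrame({
        "x_median": grouped[x].median(),
        "y_median": grouped[y].median(),
        "y_q25": grouped[y].quantile(0.25),  # lower edge of the interquartile range
        "y_q75": grouped[y].quantile(0.75),  # upper edge of the interquartile range
    })


rng = np.random.default_rng(7)
n = 2000
df = pd.DataFrame({"SW_IN": rng.uniform(0, 1000, n), "TA": rng.uniform(-5, 30, n)})
df["VPD"] = 0.01 * df["SW_IN"] + 0.5 * df["TA"] + rng.normal(0, 1, n)
result = binned_medians_iqr(df, x="SW_IN", y="VPD", z="TA")
```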
Notebooks
- Added new notebook for SortingBinsMethod
Bugfixes
- Added absolute links to example notebooks in README.md
Other
- From now on, diive is officially published on PyPI
What's Changed
Full Changelog: v0.71.2...v0.71.3
v0.71.2 | 18 Mar 2024
Notebooks
- Added new notebook for daily_correlation function (notebooks/Analyses/DailyCorrelation.ipynb)
- Added new notebook for Histogram class (notebooks/Analyses/Histogram.ipynb)
Bugfixes & changes
- Daily correlations are now returned with a daily (1d) timestamp index (diive.pkgs.analyses.correlation.daily_correlation)
- Updated README
- Environment: Added ruff to dev dependencies for linting
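Returning daily correlations with a daily timestamp index can be sketched as one Pearson correlation per calendar day. The helper is hypothetical and does not reproduce the signature of diive's daily_correlation.

```python
import numpy as np
import pandas as pd


def daily_corr(df: pd.DataFrame, x: str, y: str) -> pd.Series:
    """Hypothetical helper: Pearson correlation between x and y for each calendar day."""
    per_day = {day: group[x].corr(group[y])
               for day, group in df.groupby(df.index.normalize())}
    result = pd.Series(list(per_day.values()),
                       index=pd.DatetimeIndex(list(per_day.keys())),
                       name=f"corr_{x}_{y}")
    # asfreq('D') gives the result a regular daily index (missing days become NaN)
    return result.asfreq("D")


index = pd.date_range("2024-03-18", periods=96, freq="30min")  # two days
a = np.arange(96, dtype=float)
b = np.where(index < "2024-03-19", 2 * a + 1, -a)
df = pd.DataFrame({"a": a, "b": b}, index=index)
corr = daily_corr(df, "a", "b")
```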
What's Changed
Full Changelog: v0.71.1...v0.71.2