Releases: holukas/diive
v0.82.1
v0.82.1 | 22 Sep 2024
Notebooks
- Added notebook showing an example for LongTermGapFillingRandomForestTS (notebooks/GapFilling/LongTermRandomForestGapFilling.ipynb)
- Added notebook example for MeasurementOffset (notebooks/Corrections/MeasurementOffset.ipynb)
Tests
- Added unittest for LongTermGapFillingRandomForestTS (tests.test_gapfilling.TestGapFilling.test_gapfilling_longterm_randomforest)
- Added unittest for WindDirOffset (tests.test_corrections.TestCorrections.test_winddiroffset)
- Added unittest for DaytimeNighttimeFlag (tests.test_createvar.TestCreateVar.test_daytime_nighttime_flag)
- Added unittest for calc_vpd_from_ta_rh (tests.test_createvar.TestCreateVar.test_calc_vpd)
- Added unittest for percentiles101 (tests.test_analyses.TestAnalyses.test_percentiles)
- Added unittest for GapFinder (tests.test_analyses.TestAnalyses.test_gapfinder)
- Added unittest for SortingBinsMethod (tests.test_analyses.TestAnalyses.test_sorting_bins_method)
- Added unittest for daily_correlation (tests.test_analyses.TestAnalyses.test_daily_correlation)
- Added unittest for QuantileXYAggZ (tests.test_analyses.TestCreateVar.test_quantilexyaggz)
- 49/49 unittests ran successfully
Bugfixes
- Fixed bug that caused results from long-term gap-filling to be inconsistent despite using a fixed random state. I
found the following: when reducing features across years, removing duplicate features from the list of found
features created a list in which the order of elements changed with each run. This in turn produced slightly different
gap-filling results each time the long-term gap-filling was executed. The issue occurred with Python 3.9.19.
  - Here is a simplified example, where input_list is a list of elements with some duplicate elements:
  - Running output_list = list(set(input_list)) generates output_list where the elements would have a different output order each run. The elements were otherwise the same, only their order changed.
  - To keep the order of elements consistent it was necessary to call output_list.sort().
  - (diive.pkgs.gapfilling.longterm.LongTermGapFillingBase.reduce_features_across_years)
- Corrected wind direction could be 360°, but will now be 0° (diive.pkgs.corrections.winddiroffset.WindDirOffset._correct_degrees)
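The two bugfixes above can be illustrated with a minimal sketch. It is not the code used in diive: the function names unique_sorted and wrap_degrees are hypothetical, and the modulo operation is only one plausible way to map 360° to 0°. The order of a set of strings can differ between interpreter runs because Python randomizes string hashes per process unless PYTHONHASHSEED is fixed, so sorting after removing duplicates gives a reproducible order.

```python
def unique_sorted(features):
    """Remove duplicates, then sort so the order is the same on every run."""
    return sorted(set(features))


def wrap_degrees(direction):
    """Wrap a wind direction into the range [0, 360), so that 360 becomes 0."""
    return direction % 360


# Feature names are made up for illustration.
input_list = ["TA", "SW_IN", "VPD", "TA", "SW_IN"]
output_list = unique_sorted(input_list)
print(output_list)  # ['SW_IN', 'TA', 'VPD']

# A measured direction of 350 degrees corrected by an offset of +10 degrees.
print(wrap_degrees(350 + 10))  # 0
```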
What's Changed
Full Changelog: v0.82.0...v0.82.1
v0.82.0
v0.82.0 | 19 Sep 2024
Long-term gap-filling
It is now possible to gap-fill multi-year datasets using the class LongTermGapFillingRandomForestTS. In this approach,
data from neighboring years are pooled together before training the random forest model for gap-filling a specific year.
This is especially useful for long-term, multi-year datasets where environmental conditions and drivers might change
over years and decades.
Why random forest? Because it performed well and to me it looks like the first choice for gap-filling ecosystem fluxes,
at least at the moment.
Long-term gap-filling using random forest is now also built into the flux processing chain (Level-4.1). This makes it
possible to quickly gap-fill the different USTAR scenarios and to create some useful plots (I hope). See the flux
processing chain notebook for what this looks like.
In a future update it will be possible to either directly switch to XGBoost for gap-filling, or to use it (and other
machine-learning models) in combination with random forest in the flux processing chain.
Example
Here is an example for a dataset containing CO2 flux (NEE) measurements from 2005 to 2023:
- for gap-filling the year 2005, the model is trained on data from 2005, 2006 and 2007 (2005 has no previous year)
- for gap-filling the year 2006, the model is trained on data from 2005, 2006 and 2007 (same model as for 2005)
- for gap-filling the year 2007, the model is trained on data from 2006, 2007 and 2008
- ...
- for gap-filling the year 2012, the model is trained on data from 2011, 2012 and 2013
- for gap-filling the year 2013, the model is trained on data from 2012, 2013 and 2014
- for gap-filling the year 2014, the model is trained on data from 2013, 2014 and 2015
- ...
- for gap-filling the year 2021, the model is trained on data from 2020, 2021 and 2022
- for gap-filling the year 2022, the model is trained on data from 2021, 2022 and 2023 (same model as for 2023)
- for gap-filling the year 2023, the model is trained on data from 2021, 2022 and 2023 (2023 has no next year)
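The pooling of years listed above can be sketched as a small helper. This is an illustration only, assuming a window of three consecutive years that is shifted inwards at the edges of the dataset; the function name training_years is hypothetical and is not part of diive.

```python
def training_years(year, first_year, last_year):
    """Return the years pooled for training the model that gap-fills `year`.

    Assumption: three consecutive years are used, and the window is shifted
    inwards for the first and last year so it stays inside the dataset.
    """
    n_years = last_year - first_year + 1
    if n_years <= 3:
        # Not enough years to shift a window, use everything available.
        return list(range(first_year, last_year + 1))
    start = min(max(year - 1, first_year), last_year - 2)
    return [start, start + 1, start + 2]


for year in (2005, 2006, 2007, 2022, 2023):
    print(year, training_years(year, first_year=2005, last_year=2023))
```

With this rule, 2005 and 2006 share one training pool, and so do 2022 and 2023, which matches the list above.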
New features
- Added new method for long-term (multiple years) gap-filling using random forest to flux processing chain (diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level41_gapfilling_longterm)
- Added new class for long-term (multiple years) gap-filling using random forest (diive.pkgs.gapfilling.longterm.LongTermGapFillingRandomForestTS)
- Added class for plotting cumulative sums across all data, for multiple columns (diive.core.plotting.cumulative.Cumulative)
- Added class to detect a constant offset between two measurements (diive.pkgs.corrections.measurementoffset.MeasurementOffset)
Changes
- Creating lagged variants creates gaps, which then lead to incomplete features in machine learning models. Now, gaps
are filled using simple forward and backward filling, limited to the number of values defined in lag. For example,
if variable TA is lagged by -2 values, this creates two missing values for this variant at the start of the time series,
which are then gap-filled using the simple backwards fill with limit=2. (diive.core.dfun.frames.lagged_variants)
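The gap-filling of lagged variants can be sketched with pandas as follows. This is not the implementation of lagged_variants: the helper name lagged_variant, the column naming, and the assumption that a negative lag corresponds to a positive pandas shift are all illustrative.

```python
import pandas as pd


def lagged_variant(series, lag):
    """Return a lagged copy of `series` with the gaps from shifting filled.

    Assumption: lag=-2 means "use the value from two records earlier",
    which is a positive shift in pandas.
    """
    if lag == 0:
        return series.copy()
    limit = abs(lag)
    shifted = series.shift(-lag)
    # Fill the gaps at the start (backward fill) or end (forward fill),
    # limited to the number of records defined by the lag.
    filled = shifted.bfill(limit=limit).ffill(limit=limit)
    return filled.rename(f"{series.name}_lag{lag}")


ta = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="TA")
print(lagged_variant(ta, lag=-2).tolist())  # [10.0, 10.0, 10.0, 11.0, 12.0]
```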
Notebooks
- Updated flux processing chain notebook to include long-term gap-filling using random forest (notebooks/FluxProcessingChain/FluxProcessingChain.ipynb)
- Added new notebook for plotting cumulative sums across all data, for multiple columns (notebooks/Plotting/Cumulative.ipynb)
Tests
- Unittest for flux processing chain now includes many more methods (tests.test_fluxprocessingchain.TestFluxProcessingChain.test_fluxprocessingchain)
- 39/39 unittests ran successfully
Bugfixes
- Fixed deprecation warning in (diive.core.ml.common.prediction_scores_regr)
What's Changed
Full Changelog: v0.81.0...v0.82.0
v0.81.0
v0.81.0 | 11 Sep 2024
Expanding Flux Processing Capabilities
This update brings advancements for post-processing eddy covariance data in the context of the FluxProcessingChain.
The goal is to offer a complete chain for post-processing ecosystem flux data, specifically designed to work seamlessly
with the standardized _fluxnet output file from the widely-used EddyPro software.
Now, diive offers the option for USTAR filtering based on known constant thresholds across the entire dataset (similar
to the CUT scenarios in FLUXNET data). While seasonal (DJF, MAM, JJA, SON) thresholds are calculated internally,
applying them on a seasonal basis or using variable thresholds per year (like FLUXNET's VUT scenarios) isn't yet
implemented.
With this update, the FluxProcessingChain class can handle various data processing steps:
- Level-2: Quality flag expansion
- Level-3.1: Storage correction
- Level-3.2: Outlier removal
- Level-3.3: (new) USTAR filtering (with constant thresholds for now)
- (upcoming) Level-4.1: long-term gap-filling using random forest and XGBoost
- For info about the different flux levels, see the Swiss FluxNet flux processing chain
New features
- Added class to apply multiple known constant USTAR (friction velocity) thresholds, creating flags that indicate time
periods characterized by low turbulence for multiple USTAR scenarios. The constant thresholds must be known
beforehand, e.g., from an earlier USTAR detection run, or from results from FLUXNET (diive.pkgs.flux.ustarthreshold.FlagMultipleConstantUstarThresholds)
- Added class to apply a single known constant USTAR threshold (diive.pkgs.flux.ustarthreshold.FlagSingleConstantUstarThreshold)
- Added FlagMultipleConstantUstarThresholds to the flux processing chain (diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level33_constant_ustar)
- Added USTAR detection algorithm based on Papale et al., 2006 (diive.pkgs.flux.ustarthreshold.UstarDetectionMPT)
- Added function to analyze high-quality ecosystem fluxes that helps in understanding the range of highest-quality data (diive.pkgs.flux.hqflux.analyze_highest_quality_flux)
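The idea behind constant USTAR thresholds can be sketched in a few lines. This is not the API of FlagMultipleConstantUstarThresholds: the function names, the scenario labels, the threshold values and the flag values (0 for accepted, 2 for rejected) are assumptions made for illustration.

```python
def flag_low_turbulence(ustar, threshold, flag_bad=2, flag_good=0):
    """Flag records where USTAR is below a known constant threshold."""
    return [flag_bad if u < threshold else flag_good for u in ustar]


def flag_multiple_thresholds(ustar, thresholds):
    """Create one flag series per USTAR scenario."""
    return {
        scenario: flag_low_turbulence(ustar, threshold)
        for scenario, threshold in thresholds.items()
    }


# Made-up USTAR values (m s-1) and made-up thresholds for three scenarios.
ustar = [0.02, 0.08, 0.15, 0.30]
thresholds = {"CUT_16": 0.05, "CUT_50": 0.10, "CUT_84": 0.20}

for scenario, flags in flag_multiple_thresholds(ustar, thresholds).items():
    print(scenario, flags)
```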
Additions
- LocalSD outlier detection can now use a constant SD:
  - Added parameter to use standard deviation across all data (constant) instead of the rolling SD to calculate the upper and lower limits that define outliers in the median rolling window (diive.pkgs.outlierdetection.localsd.LocalSD)
  - Added to step-wise outlier detection (diive.pkgs.outlierdetection.stepwiseoutlierdetection.StepwiseOutlierDetection.flag_outliers_localsd_test)
  - Added to meteoscreening from database (diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.flag_outliers_localsd_test)
  - Added to flux processing chain (diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level32_flag_outliers_localsd_test)
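A rough sketch of the difference between the rolling SD and the constant SD, using only the standard library. It is an illustration under assumptions, not the LocalSD implementation: the function name, the parameter names, the centered window and the use of the sample standard deviation are all hypothetical choices.

```python
import statistics


def localsd_flags(values, window=5, n_sd=2.5, constant_sd=False):
    """Flag values outside rolling median +/- n_sd * SD.

    With constant_sd=True the SD is calculated once across all data,
    otherwise it is calculated within each rolling window.
    """
    half = window // 2
    global_sd = statistics.stdev(values)
    flags = []
    for i, value in enumerate(values):
        neighbours = values[max(0, i - half): i + half + 1]
        centre = statistics.median(neighbours)
        if constant_sd:
            sd = global_sd
        else:
            sd = statistics.stdev(neighbours) if len(neighbours) > 1 else 0.0
        flags.append(abs(value - centre) > n_sd * sd)
    return flags


# Made-up series with one spike at index 4.
values = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8, 10.1, 10.2]
print(localsd_flags(values, constant_sd=False))
print(localsd_flags(values, constant_sd=True))
```

In this made-up series the spike inflates the SD of its own rolling window, so the rolling variant does not flag it with these settings, while the constant SD does.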
Changes
- Replaced .plot_date() from the Matplotlib library with .plot() due to deprecation
Notebooks
- Added notebook for plotting cumulative sums per year (notebooks/Plotting/CumulativesPerYear.ipynb)
- Added notebook for removing outliers based on the z-score in rolling time window (notebooks/OutlierDetection/zScoreRolling.ipynb)
Bugfixes
- Fixed bug when saving a pandas Series to parquet (diive.core.io.files.save_parquet)
- Fixed bug when plotting doy_mean_cumulative: no longer crashes when years defined in parameter excl_years_from_reference are not in dataset (diive.core.times.times.doy_mean_cumulative)
- Fixed deprecation warning when plotting in bokeh (interactive plots)
Tests
- Added unittest for LocalSD using constant SD (tests.test_outlierdetection.TestOutlierDetection.test_localsd_with_constantsd)
- Added unittest for rolling z-score outlier removal (tests.test_outlierdetection.TestOutlierDetection.test_zscore_rolling)
- Improved check if figure and axis were created in (tests.test_plots.TestPlots.test_histogram)
- 39/39 unittests ran successfully
Environment
- Added new package scikit-optimize
- Added new package category_encoders
What's Changed
Full Changelog: v0.80.0...v0.81.0
v0.80.0
v0.80.0 | 28 Aug 2024
Additions
- Added outlier tests to step-wise meteoscreening from database: Hampel, HampelDaytimeNighttime and TrimLow (diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb)
- Added parameter to control whether or not to output the middle timestamp when loading parquet files with load_parquet(). By default, output_middle_timestamp=True. (diive.core.io.files.load_parquet)
Environment
- Re-created environment and created new lock file
- Currently using Python 3.9.19
Notebooks
- Added new notebook for creating a flag that indicates missing values (notebooks/OutlierDetection/MissingValues.ipynb)
- Updated notebook for meteoscreening from database (notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase.ipynb)
- Updated notebook for loading and saving parquet files (notebooks/Formats/LoadSaveParquetFile.ipynb)
Tests
- Added unittest for flagging missing values (tests.test_outlierdetection.TestOutlierDetection.test_missing_values)
- 37/37 unittests ran successfully
Bugfixes
- Fixed links in README, needed absolute links to notebooks
- Fixed issue with return list in (diive.pkgs.analyses.histogram.Histogram.peakbins)
What's Changed
Full Changelog: v0.79.1...v0.80.0
v0.79.1
v0.79.1 | 26 Aug 2024
Additions
- Added new function to apply quality flags to certain time periods only (diive.pkgs.qaqc.flags.restrict_application)
- Added option to restrict the application of the angle-of-attack flag to certain time periods (diive.pkgs.fluxprocessingchain.level2_qualityflags.FluxQualityFlagsEddyPro.angle_of_attack_test)
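Restricting a flag to certain time periods could look roughly like the sketch below. It does not reproduce the signature of restrict_application: the function name restrict_flag, its parameters, the inclusive period limits and the flag values (0 for accepted, 2 for rejected) are assumptions for illustration.

```python
from datetime import datetime


def restrict_flag(timestamps, flags, periods, flag_good=0):
    """Keep flag values inside `periods` and reset them to `flag_good` elsewhere.

    `periods` is a list of (start, end) tuples, both limits inclusive.
    """
    restricted = []
    for timestamp, flag in zip(timestamps, flags):
        inside = any(start <= timestamp <= end for start, end in periods)
        restricted.append(flag if inside else flag_good)
    return restricted


# Made-up example: the flag rejects all three records, but it should only
# be applied between May and July 2022.
timestamps = [datetime(2022, 1, 1), datetime(2022, 6, 1), datetime(2023, 1, 1)]
flags = [2, 2, 2]
periods = [(datetime(2022, 5, 1), datetime(2022, 7, 1))]
print(restrict_flag(timestamps, flags, periods))  # [0, 2, 0]
```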
Changes
- Test options in FluxProcessingChain are now always passed as dict. This has the advantage that, in addition to
running the test by setting the dict key apply to True, various other test settings can be passed, for example the new
parameter application dates for the angle-of-attack flag. (diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain)
Tests
- Added unittest for Flux Processing Chain up to Level-2 (tests.test_fluxprocessingchain.TestFluxProcessingChain.test_fluxprocessingchain_level2)
- 36/36 unittests ran successfully
What's Changed
Full Changelog: v0.79.0...v0.79.1
v0.79.0
v0.79.0 | 22 Aug 2024
This version introduces a histogram plot that has the option to display z-score as vertical lines superimposed on the
distribution, which helps in assessing z-score settings used by some outlier removal functions.
Histogram plot of half-hourly air temperature measurements at the ICOS Class 1 ecosystem
station Davos between 2013 and 2022, displayed in
20 equally-spaced bins. The dashed vertical lines show the z-score and the corresponding value calculated based on the
time series. The bin with most counts is highlighted orange.
New features
- Added new class HistogramPlot for plotting histograms, based on the Matplotlib implementation (diive.core.plotting.histogram.HistogramPlot)
- Added function to calculate the value for a specific z-score, e.g., based on a time series it calculates the value where z-score = 3 etc. (diive.core.funcs.funcs.val_from_zscore)
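The calculation behind the dashed lines in the histogram can be sketched as the inverse of the z-score formula, value = mean + z-score * SD. The function name value_from_zscore and the use of the sample standard deviation are assumptions; val_from_zscore in diive may differ in its details.

```python
import statistics


def value_from_zscore(values, zscore):
    """Return the value that corresponds to `zscore` for the given series."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return mean + zscore * sd


# Made-up series.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
for z in (-3, 0, 3):
    print(z, round(value_from_zscore(series, z), 3))
```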
Additions
- Added histogram plots to FlagBase, histograms are now shown for all outlier methods (diive.core.base.flagbase.FlagBase.defaultplot)
- Added daytime/nighttime histogram plots to (diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime)
- Added daytime/nighttime histogram plots to (diive.pkgs.outlierdetection.zscore.zScoreDaytimeNighttime)
- Added daytime/nighttime histogram plots to (diive.pkgs.outlierdetection.lof.LocalOutlierFactorDaytimeNighttime)
- Added daytime/nighttime histogram plots to (diive.pkgs.outlierdetection.absolutelimits.AbsoluteLimitsDaytimeNighttime)
- Added option to calculate the z-score with sign instead of absolute (diive.core.funcs.funcs.zscore)
Changes
- Improved daytime/nighttime outlier plot used by various outlier removal classes (diive.core.base.flagbase.FlagBase.plot_outlier_daytime_nighttime)
Notebooks
- Added notebook for plotting histograms (notebooks/Plotting/Histogram.ipynb)
- Added notebook for manual removal of data points (notebooks/OutlierDetection/ManualRemoval.ipynb)
- Added notebook for outlier detection using local outlier factor, separately during daytime and nighttime (notebooks/OutlierDetection/LocalOutlierFactorDaytimeNighttime.ipynb)
- Updated notebook (notebooks/OutlierDetection/HampelDaytimeNighttime.ipynb)
- Updated notebook (notebooks/OutlierDetection/AbsoluteLimitsDaytimeNighttime.ipynb)
- Updated notebook (notebooks/OutlierDetection/zScoreDaytimeNighttime.ipynb)
- Updated notebook (notebooks/OutlierDetection/LocalOutlierFactorAllData.ipynb)
Tests
- Added unittest for plotting histograms (tests.test_plots.TestPlots.test_histogram)
- Added unittest for calculating histograms (without plotting) (tests.test_analyses.TestCreateVar.test_histogram)
What's Changed
Full Changelog: v0.78.1.1...v0.79.0
v0.78.1.1
v0.78.1
v0.78.1 | 19 Aug 2024
Changes
- Added option to set different n_sigma for daytime and nighttime data in HampelDaytimeNighttime (diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime)
- Updated flag_outliers_hampel_dtnt_test in step-wise outlier detection
- Updated level32_flag_outliers_hampel_dtnt_test in flux processing chain
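A Hampel filter with separate n_sigma settings for daytime and nighttime can be sketched as below. This is a simplified illustration and not the HampelDaytimeNighttime class: the function names, the parameter names n_sigma_dt and n_sigma_nt, the centered window, and the choice to filter the daytime and nighttime subsets independently are assumptions.

```python
import statistics

MAD_TO_SD = 1.4826  # scales the MAD to the SD for normally distributed data


def hampel_flags(values, window=5, n_sigma=3.0):
    """Flag values deviating more than n_sigma scaled MADs from the rolling median."""
    half = window // 2
    flags = []
    for i, value in enumerate(values):
        neighbours = values[max(0, i - half): i + half + 1]
        median = statistics.median(neighbours)
        mad = statistics.median([abs(v - median) for v in neighbours])
        flags.append(abs(value - median) > n_sigma * MAD_TO_SD * mad)
    return flags


def hampel_flags_dtnt(values, is_daytime, window=5, n_sigma_dt=4.0, n_sigma_nt=2.0):
    """Run the Hampel filter separately on daytime and nighttime records."""
    flags = [False] * len(values)
    for daytime, n_sigma in ((True, n_sigma_dt), (False, n_sigma_nt)):
        index = [i for i, d in enumerate(is_daytime) if d == daytime]
        subset_flags = hampel_flags([values[i] for i in index], window, n_sigma)
        for i, flag in zip(index, subset_flags):
            flags[i] = flag
    return flags


# Made-up series: six daytime records followed by six nighttime records,
# each with one spike.
values = [20.0, 21.0, 20.5, 35.0, 20.8, 21.2, 5.0, 5.2, 4.9, 9.0, 5.1, 5.0]
is_daytime = [True] * 6 + [False] * 6
print(hampel_flags_dtnt(values, is_daytime))
```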
Notebooks
- Updated notebook HampelDaytimeNighttime
- Updated notebook FluxProcessingChain
Tests
- Updated unittest test_hampel_filter_daytime_nighttime
What's Changed
Full Changelog: v0.78.0...v0.78.1
v0.78.0
v0.78.0 | 18 Aug 2024
New features
- Added new class for outlier removal, based on the rolling z-score. It can also be used in step-wise outlier detection and during meteoscreening from the database. (diive.pkgs.outlierdetection.zscore.zScoreRolling, diive.pkgs.outlierdetection.stepwiseoutlierdetection.StepwiseOutlierDetection, diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb)
- Added Hampel filter for outlier removal (diive.pkgs.outlierdetection.hampel.Hampel)
- Added Hampel filter (separate daytime, nighttime) for outlier removal (diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime)
- Added function to plot daytime and nighttime outliers during outlier tests (diive.core.plotting.outlier_dtnt.outlier_daytime_nighttime)
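Outlier detection based on the rolling z-score can be sketched with the standard library as follows. It is an illustration, not the zScoreRolling class: the function name, the centered window, the population standard deviation and the default settings are assumptions.

```python
import statistics


def rolling_zscore_flags(values, window=11, threshold=2.5):
    """Flag values whose absolute z-score within a centered window exceeds `threshold`."""
    half = window // 2
    flags = []
    for i, value in enumerate(values):
        neighbours = values[max(0, i - half): i + half + 1]
        mean = statistics.fmean(neighbours)
        sd = statistics.pstdev(neighbours)
        zscore = abs(value - mean) / sd if sd > 0 else 0.0
        flags.append(zscore > threshold)
    return flags


# Made-up series with one spike at index 5.
values = [10.0, 10.1, 9.9, 10.0, 10.2, 30.0, 10.1, 9.8, 10.0, 10.1, 9.9]
print(rolling_zscore_flags(values))
```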
Changes
- Flux processing chain:
  - Several changes to the flux processing chain to make sure it can also work with data files not directly output by EddyPro. The class FluxProcessingChain can now handle files that have a different format than the two EddyPro output files EDDYPRO-FLUXNET-CSV-30MIN and EDDYPRO-FULL-OUTPUT-CSV-30MIN. See following notes.
  - Removed option to process EddyPro _full_output_ files, since it is an older format and its variables do not follow FLUXNET conventions.
  - Removed keyword filetype in class FluxProcessingChain. It is now assumed that the variable names follow the FLUXNET convention. Variables used in FLUXNET are listed here (diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain)
  - When detecting the base variable from which a flux variable was calculated, the variables defined for filetype EDDYPRO-FLUXNET-CSV-30MIN are now assumed by default. (diive.pkgs.flux.common.detect_basevar)
  - Renamed function that detects the base variable that was used to calculate the respective flux (diive.pkgs.flux.common.detect_fluxbasevar)
  - Renamed gas in functions related to completeness tests to fluxbasevar to better reflect that the completeness test does not necessarily require a gas (e.g. T_SONIC is used to calculate the completeness for sensible heat flux) (flag_fluxbasevar_completeness_eddypro_test)
- Removing the radiation offset now uses 0.001 (W m-2) instead of 50 as the threshold value to flag nighttime values for the correction (diive.pkgs.corrections.offsetcorrection.remove_radiation_zero_offset)
- The database tag for meteo data screened with diive is now meteoscreening_diive (diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.resample)
- During noise generation, function now uses the absolute values of the min/max of a series to calculate minimum noise and maximum noise (diive.pkgs.createvar.noise.add_impulse_noise)
Notebooks
- Added new notebook for outlier detection using class zScore (notebooks/OutlierDetection/zScore.ipynb)
- Added new notebook for outlier detection using class zScoreDaytimeNighttime (notebooks/OutlierDetection/zScoreDaytimeNighttime.ipynb)
- Added new notebook for outlier removal using trimming (notebooks/OutlierDetection/TrimLow.ipynb)
- Updated notebook (notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase_v7.0.ipynb)
- When uploading screened meteo data to the database using the notebook StepwiseMeteoScreeningFromDatabase, variables
with the same name, measurement and data version as the screened variable(s) are now deleted from the database before
the new data are uploaded. Implemented in the Python package dbc-influxdb to avoid duplicates in the database. Such
duplicates can occur when one of the tags of an otherwise identical variable changed, e.g., when one of the tags of
the originally uploaded data was wrong and needed correction. The database InfluxDB stores a new time series
alongside the previous time series when one of the tags is different in an otherwise identical time series.
Tests
- Added test case for Hampel filter (tests.test_outlierdetection.TestOutlierDetection.test_hampel_filter)
- Added test case for HampelDaytimeNighttime filter (tests.test_outlierdetection.TestOutlierDetection.test_hampel_filter_daytime_nighttime)
- Added test case for zScore (tests.test_outlierdetection.TestOutlierDetection.test_zscore)
- Added test case for TrimLow (tests.test_outlierdetection.TestOutlierDetection.test_trim_low_nt)
- Added test case for zScoreDaytimeNighttime (tests.test_outlierdetection.TestOutlierDetection.test_zscore_daytime_nighttime)
- 33/33 unittests ran successfully
Environment
- Added package sktime, a unified framework for machine learning with time series.
What's Changed
Full Changelog: v0.77.0...v0.78.0
v0.77.0
v0.77.0 | 11 Jun 2024
Additions
- Plotting cumulatives with CumulativeYear now also shows the cumulative for the reference, i.e. for the mean over the reference years (diive.core.plotting.cumulative.CumulativeYear)
- Plotting DielCycle now accepts ylim parameter (diive.core.plotting.dielcycle.DielCycle)
- Added long-term dataset for local testing purposes (internal only) (diive.configs.exampledata.load_exampledata_parquet_long)
- Added several classes in preparation for long-term gap-filling for a future update
Changes
- Several updates and changes to the base class for regressor decision trees (diive.core.ml.common.MlRegressorGapFillingBase):
  - The data are now split into training set and test set at the very start of regressor setup. This test set is used to evaluate models on unseen data. The default split is 80% training and 20% test data.
  - Plotting (scores, importances etc.) is now generally separated from the method where they are calculated.
  - the same random_state is now used for all processing steps
  - refactored code
  - beautified console output
- When correcting for relative humidity values above 100%, the maximum of the corrected time series is now set to 100, after the (daily) offset was removed (diive.pkgs.corrections.offsetcorrection.remove_relativehumidity_offset)
- During feature reduction in machine learning regressors, features with permutation importance < 0 are now always removed (diive.core.ml.common.MlRegressorGapFillingBase._remove_rejected_features)
- Changed default parameters for quick random forest gap-filling (diive.pkgs.gapfilling.randomforest_ts.QuickFillRFTS)
- I tried to improve the console output (clarity) for several functions and methods
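Two of the changes above, the reproducible 80/20 split and the removal of features with negative permutation importance, can be sketched with the standard library. This is a stand-in for illustration and not the code in MlRegressorGapFillingBase: the function names, the seeded shuffle and the made-up importance values are assumptions.

```python
import random


def train_test_split_indices(n_records, test_size=0.2, random_state=42):
    """Split record indices into training and test sets, reproducibly."""
    rng = random.Random(random_state)  # same seed, same split on every run
    indices = list(range(n_records))
    rng.shuffle(indices)
    n_test = round(n_records * test_size)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def drop_negative_importance(importances):
    """Keep only features whose permutation importance is not below zero."""
    return [name for name, importance in importances.items() if importance >= 0]


train, test = train_test_split_indices(100)
print(len(train), len(test))  # 80 20

# Made-up permutation importances.
importances = {"TA": 0.41, "VPD": 0.12, "SW_IN": 0.0, "NOISE": -0.02}
print(drop_negative_importance(importances))  # ['TA', 'VPD', 'SW_IN']
```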
Environment
- Added package dtreeviz to visualize decision trees
Notebooks
- Updated notebook (notebooks/GapFilling/RandomForestGapFilling.ipynb)
- Updated notebook (notebooks/GapFilling/LinearInterpolation.ipynb)
- Updated notebook (notebooks/GapFilling/XGBoostGapFillingExtensive.ipynb)
- Updated notebook (notebooks/GapFilling/XGBoostGapFillingMinimal.ipynb)
- Updated notebook (notebooks/GapFilling/RandomForestParamOptimization.ipynb)
- Updated notebook (notebooks/GapFilling/QuickRandomForestGapFilling.ipynb)
Tests
- Updated and fixed test case (tests.test_outlierdetection.TestOutlierDetection.test_zscore_increments)
- Updated and fixed test case (tests.test_gapfilling.TestGapFilling.test_gapfilling_randomforest)
What's Changed
Full Changelog: v0.76.2...v0.77.0