Tigramite 5.1
jakobrunge released this on 23 Feb 18:16
Tigramite 5.1 contains many new features and improvements, hence the change in the main version number. The major new features are:
- CausalEffects class that covers causal effect estimation from given graphical models
- LPCMCI method for constraint-based causal discovery on stationary time series with latent confounders
- New "multiple datasets" mode that allows learning one causal graph jointly from multiple datasets, see tutorials
- A sliding window function for all causal discovery methods
- Speedups of CMIknn and GPDC conditional independence tests
- Parallelized script for PCMCIplus
A GUI for basic functionality is available at https://github.com/stbachinger/TigramiteGui.
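The sliding window function listed above runs a causal discovery method separately on successive windows of the time series and then summarizes the per-window results. The sketch below only illustrates that idea under stated assumptions: run_on_sliding_windows, window_length, window_step and the link-frequency summary are hypothetical stand-ins, not the signature or output format of Tigramite's run_sliding_window_of, and method is any callable that returns a boolean array of detected links for one window.

```python
import numpy as np


def run_on_sliding_windows(data, window_length, window_step, method):
    """Hypothetical sketch: apply `method` to each window of `data`.

    data:   array of shape (T, N), T time steps of N variables.
    method: callable mapping a (window_length, N) array to a boolean
            array of detected links (a stand-in for a discovery run).
    Returns the window start indices, the per-window results, and the
    fraction of windows in which each link was detected.
    """
    data = np.asarray(data)
    T = data.shape[0]
    if not 0 < window_length <= T or window_step < 1:
        raise ValueError("need 0 < window_length <= T and window_step >= 1")

    # Window starts advance by window_step; the last window must fit in the data.
    starts = list(range(0, T - window_length + 1, window_step))
    results = [
        np.asarray(method(data[s:s + window_length]), dtype=bool) for s in starts
    ]
    # Summary: fraction of windows in which each link was detected.
    link_frequency = np.mean(results, axis=0)
    return starts, results, link_frequency
```

For example, with T = 10 samples, window_length = 4 and window_step = 3, the windows start at indices 0, 3 and 6; overlapping windows (window_step < window_length) trade independence between windows for a smoother summary.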
Installation
- Option 1: Install via "pip install tigramite"
- Option 2: Pull the new version from the master branch and install via "python setup.py install"
- Take a look at the tutorials to see all functionality
- The conda environment file environment_py3.yml installs only the necessary barebone packages. Install the optional packages according to your needs; see setup.py for the optional packages and the recommended version numbers.
Breaking Changes
- var_process() and structural_causal_process() are now in a new toymodels module
- For run_pcmci() and variants: p_matrix and val_matrix are now symmetrized for contemporaneous links, and orientation information is encoded in the new graph array (consistent with PCMCI+ and LPCMCI).
- New parameter alpha_level in run_pcmci() and variants enables returning 'graph', which is constructed by thresholding the p_matrix while taking selected_links into account.
- q_matrix is no longer returned; the FDR correction now modifies p_matrix directly.
- Removed link_matrix from plotting.py; it is replaced by graph.
- Due to API changes, Python>=3.7 and matplotlib>=3.4 are now required.
- The distance correlation function in GPDC, which came from another package, was erroneous and has been replaced by the implementation in a different package (dcor). Hence, GPDC test statistic values will differ, while the p-values (via a shuffle test) should stay roughly the same.
- In CMIknn, n_jobs was replaced by workers due to API changes in scipy.
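For contemporaneous (lag-zero) links, p_matrix[i, j, 0] and p_matrix[j, i, 0] describe the same undirected test, which is why they are now symmetrized before a graph is derived by thresholding at alpha_level. The sketch below illustrates both steps under assumptions not taken from Tigramite's code: entry [i, j, tau] is read as the link from variable i at time t - tau to variable j at time t; the symmetrized value is the larger (more conservative) of the two p-values, together with its val_matrix entry; "significant" means p <= alpha_level; and significant lagged links are marked "-->" while contemporaneous ones are left unoriented as "o-o". The function name symmetrize_and_threshold is hypothetical, and selected_links is ignored for brevity.

```python
import numpy as np


def symmetrize_and_threshold(p_matrix, val_matrix, alpha_level):
    """Hypothetical sketch: symmetrize lag-0 entries and build a graph array.

    Arrays have shape (N, N, tau_max + 1); entry [i, j, tau] is taken to be
    the link X^i_{t-tau} -> X^j_t. Inputs are copied, not modified.
    """
    p = np.array(p_matrix, dtype=float)
    val = np.array(val_matrix, dtype=float)
    n = p.shape[0]

    # Lag-0 pairs (i, j) and (j, i) both get the values of whichever
    # direction has the larger, i.e. more conservative, p-value.
    for i in range(n):
        for j in range(i + 1, n):
            a, b = (i, j) if p[i, j, 0] >= p[j, i, 0] else (j, i)
            p[i, j, 0] = p[j, i, 0] = p[a, b, 0]
            val[i, j, 0] = val[j, i, 0] = val[a, b, 0]

    significant = p <= alpha_level
    graph = np.full(p.shape, "", dtype="<U3")
    # Lagged links have a time order, so they are oriented forward in time.
    graph[:, :, 1:][significant[:, :, 1:]] = "-->"
    # Contemporaneous links stay unoriented; self-links at lag 0 are skipped.
    contemp = significant[:, :, 0] & ~np.eye(n, dtype=bool)
    graph[:, :, 0][contemp] = "o-o"
    return p, val, graph
```

Taking the larger p-value keeps a contemporaneous link only if the test is significant in both directions, which is one reasonable convention; Tigramite's own rule may differ.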
New Features
- New class CausalEffects (causal_effects.py), which covers causal effect estimation and mediation given various graph types. The class can automatically select adjustment sets and estimate linear or nonlinear causal effects, including conditional effects. Please see the tutorial for all functionality.
- New class LPCMCI for constraint-based causal discovery on stationary time series with latent confounders. NOTE: This method is still EXPERIMENTAL since the default settings for the hyperparameters are still being fine-tuned. We release it to invite feedback and will add a tutorial and unit tests soon. We advise reading the paper first in order to interpret the resulting graphs correctly.
- New "multiple datasets" mode that accepts multiple time series arrays as input to the DataFrame and learns a single causal graph across all datasets. This can be useful for datasets collected at different locations or from different subjects. Furthermore, it may help to overcome non-stationarity, as illustrated in the respective tutorial.
- New function "run_sliding_window_of" that runs all PCMCI functions on sliding windows and generates summary results.
- Removed the cython dependency, which caused many installation problems. This affects the CMIknn and distance correlation functions, as well as the ordinal pattern generator. These methods now run with numba, with similar or better runtime performance.
- CMIknn's _get_nearest_neighbors() now runs entirely with scipy's cKDTree and numba and can use cKDTree's parallelization through the workers parameter. Further, get_restricted_permutation() now also runs with numba. Both lead to significant speedups.
- Faster GPDC using GPyTorch (CPU and GPU models). It can handle more than 50k samples using a multi-GPU model (with LBFGS). According to our numerical tests, it should be orders of magnitude faster than GPDC using sklearn.
- Faster "ordinal_patt_array" function.
- Symmetrization of p-val matrices based on selected_links.
- False Discovery Rate (q_matrix): get_corrected_pvalues now accounts for selected_links, tau_max and tau_min.
- Function return_significant_links() now also handles "graph" array as needed for PCMCI+ and LPCMCI.
- User-configurable random seed (with a default) added to all conditional independence tests to make results deterministic.
- get_conditional_entropy: returns the nearest-neighbor conditional entropy estimate of H(X|Y).
- GUI covering basic functionality is available on https://github.com/stbachinger/TigramiteGui
- Several more changes under the hood; take a look at the new tutorials to see all functionality.
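The FDR correction now changes p_matrix directly and accounts for selected_links, tau_max and tau_min. The sketch below uses the Benjamini-Hochberg procedure as the example correction; that choice, the function name fdr_correct_pmatrix, and the boolean tested mask (a simplified stand-in for selected_links, which Tigramite specifies in its own format) are assumptions for illustration, not get_corrected_pvalues itself. Lag-zero self-links are excluded, and every remaining entry counts as one hypothesis, so symmetrized contemporaneous pairs count twice here.

```python
import numpy as np


def fdr_correct_pmatrix(p_matrix, tau_min=0, tau_max=None, tested=None):
    """Hypothetical sketch: Benjamini-Hochberg adjustment of p_matrix entries.

    p_matrix has shape (N, N, tau_max + 1). Only entries with lag in
    [tau_min, tau_max] (and, if given, where the boolean array `tested`
    is True) are adjusted; all other entries are returned unchanged.
    """
    p = np.array(p_matrix, dtype=float)
    n, _, n_lags = p.shape
    if tau_max is None:
        tau_max = n_lags - 1

    # Boolean mask of the hypotheses that enter the correction.
    mask = np.zeros(p.shape, dtype=bool)
    mask[:, :, tau_min:tau_max + 1] = True
    mask[np.arange(n), np.arange(n), 0] = False  # no lag-0 self-links
    if tested is not None:
        mask &= np.asarray(tested, dtype=bool)

    pvals = p[mask]
    m = pvals.size
    if m == 0:
        return p

    # Benjamini-Hochberg: scale the k-th smallest p-value by m / k ...
    order = np.argsort(pvals)
    scaled = pvals[order] * m / np.arange(1, m + 1)
    # ... then enforce monotonicity from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    p[mask] = adjusted
    return p
```

Restricting the mask to the tested lags matters because the number of hypotheses m enters every adjusted value: raising tau_min from 0 to 1 in a two-variable example with tau_max = 1 shrinks m from 6 to 4 and lowers the adjusted p-values of the remaining links.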
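CMIknn and the nearest-neighbor conditional entropy estimate both rest on k-nearest-neighbor searches, which now run through scipy's cKDTree and its workers parameter. The sketch below shows one standard way to obtain such an estimate of H(X|Y): the Kozachenko-Leonenko entropy estimator in the maximum norm, H ≈ ψ(n) - ψ(k) + (d/n) Σ log(2 ε_i), with ε_i the distance from sample i to its k-th neighbor, combined as H(X|Y) = H(X, Y) - H(Y). This is an illustrative stand-in, not necessarily the estimator behind get_conditional_entropy; the function names and the default k = 5 are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def knn_entropy(x, k=5, workers=1):
    """Kozachenko-Leonenko entropy estimate (in nats) with the max norm.

    x: samples of shape (n,) or (n, d). workers is passed to cKDTree.query
    (-1 uses all CPU threads).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    tree = cKDTree(x)
    # Ask for k + 1 neighbors because each point's nearest neighbor is itself.
    dist, _ = tree.query(x, k=k + 1, p=np.inf, workers=workers)
    eps = dist[:, -1]  # max-norm distance to the k-th neighbor
    # A max-norm ball of radius eps in d dimensions has volume (2 * eps) ** d.
    return digamma(n) - digamma(k) + d * np.mean(np.log(2.0 * eps))


def knn_conditional_entropy(x, y, k=5, workers=1):
    """Estimate H(X|Y) as H(X, Y) - H(Y) from paired samples x and y."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    return knn_entropy(np.hstack([x, y]), k, workers) - knn_entropy(y, k, workers)
```

For Gaussian test data the result can be checked against the closed form: if X = Y + 0.5 * noise with standard normal Y and noise, then H(X|Y) = 0.5 * log(2πe * 0.25).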
Bug Fixes
- Adapted run_pcmci_parallel.py script which uses mpi4py
- Import warnings for optional packages.
- Symmetrizing p-val matrices when the val matrix entries for links at lag 0 are the same.
- The docstring for mask_type was wrong: the default is None, which means that the mask will NOT be used. See also the tutorial on masking and missing values.
- Small fixes in pcmci.py and plotting.py
Improvements
- Comprehensive test suite expanded
- Take a look at the new tutorials to see all functionality