Equations to define causal graph including causal mechanism #1066

bhatt-priyadutt · 2023-11-07T19:03:12Z

This feature enables users to write a custom equation for each node and get back a causal model with causal mechanisms assigned.
It is mainly intended for cases where the functional relationship between nodes is known, and lets the user specify it in equation form, as demonstrated below:

X = empirical()
Y = 12*exp(X) + halfnorm()
Z = 3*Y + empirical()

List of supported functions for specifying parent-child relationships - here
List of supported functions for specifying noise models - here
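
One way to read the equations above is as a description of the graph structure: a name that appears on a right-hand side without being called as a function is a parent of the node on the left. The sketch below illustrates that reading using only the standard library. `parse_equations` and `to_edges` are hypothetical helpers written for this example, not functions from this PR or from DoWhy, and the sketch does not build the causal mechanisms or noise models themselves.

```python
import re

# Hypothetical illustration only: none of these names come from DoWhy or this PR.
EQUATIONS = """
X = empirical()
Y = 12*exp(X) + halfnorm()
Z = 3*Y + empirical()
"""

# An identifier that is NOT followed by "(" is treated as a node reference.
# Identifiers followed by "(" (exp, halfnorm, empirical, ...) are function calls.
_NODE_REF = re.compile(r"\b([A-Za-z_]\w*)\b(?!\s*\()")


def parse_equations(text):
    """Map each node to (right-hand side, sorted list of parent nodes)."""
    parsed = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first "=" only, so the right-hand side stays intact.
        node, rhs = (part.strip() for part in line.split("=", 1))
        parents = sorted(set(_NODE_REF.findall(rhs)) - {node})
        parsed[node] = (rhs, parents)
    return parsed


def to_edges(parsed):
    """List the (parent, child) pairs implied by the equations."""
    return [(p, child) for child, (_, parents) in parsed.items() for p in parents]


model = parse_equations(EQUATIONS)
edges = to_edges(model)
print(edges)  # [('X', 'Y'), ('Y', 'Z')]
```

An edge list in this form could then be handed to a graph library, for example `networkx.DiGraph(edges)`, before mechanisms are assigned to each node.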

bhatt-priyadutt · 2023-11-07T19:04:07Z

Some tasks are still remaining, such as writing test cases, packaging the newly used libraries, and handling disconnected nodes.

This change aims at providing a better overview of the notebooks by displaying them as separate cards instead of a card carousel.

Other changes:
- Introductory examples and Real world-inspired examples are now more prominent, with individual images and a grid layout of 2 per row.
- All other examples are now in a grid layout with 3 examples per row.
- Clear outputs of some notebooks.
- Fix issue with rendering counterfactual example notebook.

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

The build sometimes randomly fails due to a timeout issue in the unit tests of the unit change methods of the GCM module. While this only happens in the GitHub builds, it is most likely due to the parallelization of the underlying RandomForestRegressors being fitted. Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

This module adds a new method for evaluating a fitted GCM. Here, we evaluate the performance of causal mechanisms, the underlying modeling assumptions (if possible), the goodness of the generated joint distribution and the graph structure. This utilizes some of the existing methods, but also introduces new ones. This further adds a new user guide and notebook entries demonstrating the usage. Introducing the module required some changes in other modules and implementations, which are mostly fixes and improvements. Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Bumps [actions/github-script](https://github.com/actions/github-script) from 6 to 7. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](actions/github-script@v6...v7) --- updated-dependencies: - dependency-name: actions/github-script dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>

Before, the scorer was not able to handle numpy object types directly. However, GCM often uses the object dtype to ensure support of mixing categorical and float values. This fixes the handling of object dtypes by explicitly converting them to floats first. Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

If the confidence intervals are misspecified, e.g., greater lower bound than upper bound, the method threw an error before. This, however, can sometimes happen due to precision errors in some algorithms and lead to random build fails. This change fixes the issue and ignores invalid intervals accordingly. Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Before, the Support Vector Classifier did not produce probabilities, which are required for different algorithms in the GCM module. This changes the 'probability' parameter to True. Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

…nML estimators (py-why#1061) * auto identify the effect modifier columns Signed-off-by: Amit Sharma <amit_sharma@live.com> * fixed formatting errors Signed-off-by: Amit Sharma <amit_sharma@live.com> --------- Signed-off-by: Amit Sharma <amit_sharma@live.com>

…y-why#943) * Deprecate CausalGraph The effect estimation API is now based on a functional API that expects a networkx graph as input. - The graph should now be defined via a networkx graph. Most identification methods now expect an additional "observed_nodes" parameter accordingly. - CausalModel and CausalGraph still exist and should be compatible with the old API. --------- Signed-off-by: Patrick Bloebaum <bloebp@amazon.com> Signed-off-by: Amit Sharma <amit_sharma@live.com> Co-authored-by: Amit Sharma <amit_sharma@live.com>

- Slightly update and revise existing GCM notebooks
- Moving mediation analysis, direct arrow strength and ICC to their own "Quantify Causal Influence" section
- Adding brief overview to describe differences between the quantification methods
- Change navigation image to reflect newest changes
- Adding related notebooks links to some of the causal task entries
- Adding a direct arrow strength example to the ICC notebook
- Adding a brief overview of the available root cause analysis and explanation methods
- Smaller revision of other GCM entries, such as the basic example
- Smaller typos and missing reference fixes

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

…ed_conditional_estimates is True (py-why#1092) * fixed bug where CATE is not returned by lr Signed-off-by: Amit Sharma <amit_sharma@live.com> * added test Signed-off-by: Amit Sharma <amit_sharma@live.com> * formatted file Signed-off-by: Amit Sharma <amit_sharma@live.com> --------- Signed-off-by: Amit Sharma <amit_sharma@live.com>

) * fixed frontdoor bug and added tests Signed-off-by: Amit Sharma <amit_sharma@live.com> * updated docstring Signed-off-by: Amit Sharma <amit_sharma@live.com> * reformatted file Signed-off-by: Amit Sharma <amit_sharma@live.com> --------- Signed-off-by: Amit Sharma <amit_sharma@live.com>

* linked to up-to-date list of estimators Signed-off-by: Amit Sharma <amit_sharma@live.com> * updated docs Signed-off-by: Amit Sharma <amit_sharma@live.com> * using absolute paths Signed-off-by: Amit Sharma <amit_sharma@live.com> --------- Signed-off-by: Amit Sharma <amit_sharma@live.com>

…hy#1091) * removed deepiv and updated flaky test Signed-off-by: Amit Sharma <amit_sharma@live.com> * black reformatting Signed-off-by: Amit Sharma <amit_sharma@live.com> * removed all outputs from nb Signed-off-by: Amit Sharma <amit_sharma@live.com> --------- Signed-off-by: Amit Sharma <amit_sharma@live.com>

It now does not raise a division by zero error anymore. Other changes: - Add new parameter indicating whether the method requires data for all nodes in the graph or also allows a subset of data. - If no tests were performed, the summary now returns "Cannot be evaluated". Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

bhatt-priyadutt added 2 commits November 7, 2023 15:55

added initial graph to equation logic

b94a3ec

corrected some logic and refactored, modularized

9b1d042

bhatt-priyadutt marked this pull request as draft November 7, 2023 19:04

amit-sharma and others added 26 commits November 10, 2023 10:08

Fix frontdoor estimation bug (py-why#1060)

395d1fa

* fixed frontdoor bug Signed-off-by: Amit Sharma <amit_sharma@live.com> * fixed formatting issues Signed-off-by: Amit Sharma <amit_sharma@live.com> --------- Signed-off-by: Amit Sharma <amit_sharma@live.com>

Add new method to estimate KL divergence using classifier

eb88735

This should work better with multivariate data and mixed data types. However, it is generally slower than the knn approach. Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Fix issue with auto assignment with imbalanced classes

b9ae10b

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Fix issue with linear regressor with fixed parameters

57656af

Before, when creating a linear regressor with fixed parameters, these parameters are overridden when fit to data. Now, the parameters remain fixed. Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

fixed fit error

3f8bf25

Remove 'experimental' disclaimer from GCM modules

cf14caa

Also slightly change citation hint. Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

fixed spacing error

4d83d89

Remove deprecated feature.py module from GCM

601c2ae

These methods are now available in the feature_relevance.py module. Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Fix issue in KL estimation using knn

b2e75a7

Before, the method threw an error when all samples were equal. However, in these cases, it should rather return a KL divergence of 0. Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

fixed some errors and reformatted and added some validation logic and…

18b0f9b

… added a test case to start

Fix handling of NaN values in MedianCDFQuantileScorer

ced5d72

NaN values are now correctly counted when estimating the anomaly score. Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Add GCM online shop example notebook

0b8f418

This is an updated and slightly modified version of the blog post: https://aws.amazon.com/blogs/opensource/root-cause-analysis-with-dowhy-an-open-source-python-library-for-causal-machine-learning/ Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Add new example notebook demonstrating the use of the ICC method in GCM

bd4f95f

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Extend GCM model evaluation by additional metrics

7c015b7

In addition to CRPS and depending on the node data type, it now also reports the MSE, NMSE, R2 and F1 score. Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

added sanitize and some other logics to cater to disconnected root no…

132cf95

…de and so on.

added mode checks and validation and removed compile func

ae9ccd9

added handling for undefined nodes and node redundancy

af61e42

bloebp and others added 29 commits December 1, 2023 06:58

Update

bddbb3a

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Update

031d49e

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Fix documentation box

3475b19

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Add pywhy reference

591e9d9

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Make example sections more prominent

f455c56

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

Change readme notebook links to compiled HTML versions

918efc6

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>

updated extract parent nodes logic

357c092

added unknown mech logic support and did some refactoring

5ccc808

added more test cases

da8141e

added initial graph to equation logic

ebc8dca

corrected some logic and refactored, modularized

3375122

fixed fit error

9e63ed3

fixed spacing error

8aa873d

fixed some errors and reformatted and added some validation logic and…

7b543d7

… added a test case to start

added sanitize and some other logics to cater to disconnected root no…

4edb3a4

…de and so on.

added mode checks and validation and removed compile func

01bc536

added handling for undefined nodes and node redundancy

e38bdd5

updated extract parent nodes logic

1458a05

added unknown mech logic support and did some refactoring

516c14c

added more test cases

4213703

Merge remote-tracking branch 'origin/equations-to-define-causal-graph…

dee6b9a

…' into equations-to-define-causal-graph # Conflicts: # dowhy/gcm/__init__.py # dowhy/gcm/causal_models.py

removed comment

619d894

cmit

7190e30

bhatt-priyadutt closed this Dec 4, 2023
