GraphEM improvements #4

Open
4 of 8 tasks
CommonClimate opened this issue Jul 15, 2022 · 0 comments
Assignees
Labels
enhancement New feature or request


CommonClimate commented Jul 15, 2022

Now that we've confirmed that the cfr implementation of GraphEM can run on non-pathological cases, it needs to be upgraded to the next level:

Cross-Validation

The choice of regression model in GraphEM (the graph) is still very unsatisfactory. Whether the parameter is the cutoff radius of a neighborhood graph or the target sparsities of a graphical LASSO ("glasso") graph, the only way to set it now is by trial and error, which is unscientific, error-prone, and, frankly, a little embarrassing. We can do a lot better than that with cross-validation.

  • implement k-fold, block-style cross-validation over the instrumental period (this might entail retooling verif_stats to look at more than the average MSE over the field, for instance). Use the 1-sigma (one-standard-error) rule to select the graph, with k=5 by default.
  • cross-check with pre-instrumental data in pseudoproxy experiments
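
A minimal sketch of the two ingredients named above, assuming NumPy only. The function names `block_kfold_indices` and `one_sigma_choice` are hypothetical and not part of cfr. The "1-sigma rule" is read here as the usual one-standard-error rule, and candidates are assumed to be ordered from sparsest to densest graph.

```python
import numpy as np

def block_kfold_indices(n_time, k=5):
    """Yield (train, test) index arrays, where each test set is one
    contiguous block of time steps (block-style k-fold)."""
    blocks = np.array_split(np.arange(n_time), k)
    for i, test in enumerate(blocks):
        train = np.concatenate([b for j, b in enumerate(blocks) if j != i])
        yield train, test

def one_sigma_choice(fold_errors):
    """Pick a candidate graph from per-fold validation errors.

    fold_errors has shape (n_candidates, k), with candidates ordered from
    sparsest to densest (an assumption of this sketch). Returns the index
    of the sparsest candidate whose mean error lies within one standard
    error of the lowest mean error.
    """
    fold_errors = np.asarray(fold_errors, dtype=float)
    mean = fold_errors.mean(axis=1)
    se = fold_errors.std(axis=1, ddof=1) / np.sqrt(fold_errors.shape[1])
    best = np.argmin(mean)
    threshold = mean[best] + se[best]
    return int(np.flatnonzero(mean <= threshold)[0])
```

In use, each candidate graph (a cutoff radius or a sparsity level) would be fitted on `train` and scored on `test` for every fold, and the resulting error table passed to `one_sigma_choice`. Contiguous blocks are used rather than shuffled folds because neighboring time steps are not independent.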

glasso capabilities

Neighborhood graphs are a quick and dirty way to get a reconstruction, but they underuse the available information. If enough data are available for calibration, glasso can do much better at extracting structure and capturing spatial dependencies. However, glasso is in need of the following updates:

  • define reasonable sparsity levels for cross-validation
  • allow for hybrid graphs, where the climate field graph is obtained through glasso but the proxy-field graph can be neighborhood-based
  • compare gains to neighborhood graphs on standard test cases
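
The hybrid graph idea can be sketched as follows, assuming NumPy only. `hybrid_adjacency` is a hypothetical helper, not the cfr API. It takes a field precision matrix that has already been estimated by glasso (not shown here), uses its nonzero pattern for the field-field block, and uses a great-circle distance cutoff for the proxy-field block. Leaving the proxy-proxy block diagonal is an assumption of this sketch.

```python
import numpy as np

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def hybrid_adjacency(field_precision, field_lat, field_lon,
                     proxy_lat, proxy_lon, cutoff_km, tol=1e-8):
    """Boolean adjacency matrix ordered as [field points, proxies]."""
    field_lat, field_lon = np.asarray(field_lat), np.asarray(field_lon)
    proxy_lat, proxy_lon = np.asarray(proxy_lat), np.asarray(proxy_lon)
    nf, n_proxy = len(field_lat), len(proxy_lat)
    adj = np.zeros((nf + n_proxy, nf + n_proxy), dtype=bool)

    # Field-field block: support of the glasso precision matrix.
    adj[:nf, :nf] = np.abs(np.asarray(field_precision)) > tol

    # Proxy-field block: connect each proxy to field points within the cutoff.
    dist = great_circle_km(proxy_lat[:, None], proxy_lon[:, None],
                           field_lat[None, :], field_lon[None, :])
    near = dist <= cutoff_km
    adj[nf:, :nf] = near
    adj[:nf, nf:] = near.T

    # Proxy-proxy block: diagonal only (assumption of this sketch).
    np.fill_diagonal(adj, True)
    return adj
```

Setting the field-field block from a distance cutoff instead recovers a pure neighborhood graph, which makes the two graph types straightforward to compare on the same test case.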

temperature assumption

As in #2, this code was written with the assumption that temperature is the only field of interest. The math stays the same, so changing the nomenclature won't change any numerical behavior, but I will still try to:

  • replace all mentions of "temperature" by "field"

Next level (optional)

  • explore the use of skggm, which uses the scikit-learn API to fit Gaussian graphical models (GGMs).
  • explore the family-wise error rate method to help choose a sensible graph for the climate part of the covariance matrix, and maybe the climate-proxy part as well (need Dominique's input).
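
For orientation only, here is one simple way to control the family-wise error rate when selecting edges: a Bonferroni correction applied to Fisher z-tests of pairwise correlations. This is not necessarily the method meant above. It tests marginal rather than partial correlations and it assumes independent samples, which autocorrelated climate series violate. The function name `bonferroni_correlation_graph` is hypothetical.

```python
import math
import numpy as np

def bonferroni_correlation_graph(X, alpha=0.05):
    """X has shape (n_samples, n_vars). Keep edge (i, j) when the sample
    correlation differs significantly from zero at family-wise level alpha."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    corr = np.corrcoef(X, rowvar=False)
    n_tests = p * (p - 1) // 2          # one test per unordered pair
    adj = np.eye(p, dtype=bool)
    for i in range(p):
        for j in range(i + 1, p):
            # Clip to keep arctanh finite for near-perfect correlations.
            r = float(np.clip(corr[i, j], -0.999999, 0.999999))
            # Fisher z-statistic, approximately N(0, 1) under zero correlation.
            z = math.atanh(r) * math.sqrt(n - 3)
            p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
            if p_value <= alpha / n_tests:
                adj[i, j] = adj[j, i] = True
    return adj
```

Bonferroni is conservative when the number of pairs is large, so a graph selected this way will tend to be sparse.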
@CommonClimate CommonClimate added the enhancement New feature or request label Jul 15, 2022
@CommonClimate CommonClimate self-assigned this Jul 15, 2022