Plan to depend on refactored arviz #7499

Open
OriolAbril opened this issue Sep 11, 2024 · 0 comments

Description

With PyMC being the main user of ArviZ, I would like to coordinate regarding the ongoing refactor on the arviz side as it has a lot of breaking changes.

General idea

Split ArviZ into multiple smaller subpackages, so it is no longer a huge monolithic block but something more modular. Each of these smaller libraries (arviz-base, arviz-stats and arviz-plots) has as dependencies only the minimal set strictly needed. Anything that extends functionality, or that can be done via different alternatives (like the plotting backend or the idata I/O engine), is an optional dependency.

We still plan to have an arviz package which would install all 3 of them (it is unclear whether it would also install some "default" optional dependencies to feel closer to what it is now) and which exposes the functions from all 3 libraries through a common namespace. But for people running a model in the cloud, for example, it might be best to install pymc and arviz-base only, save the output as zarr or netcdf and download it. Then, locally or on a smaller machine, run convergence checks and analyze the results.

Module/library highlight of breaking changes

arviz-base

Uses DataTree instead of InferenceData. This will probably be the main pain point but also a source of nice new features.
New features: more I/O backends and support for nested hierarchies. Potential pain points: idata[group] will be a DataTree instead of a Dataset even if there are no nested groups. DataTree is new so it will probably have some rough edges for a while, and the custom methods like .map or .extend won't exist anymore (there are things like merge, map_over_subtree...).

A bit more flexible in general, especially when it comes to groups: no more warnings for "unrecognized" groups and things like that.

Small ask for help: DataTree supports nested groups, but I don't have an example of this, nor am I sure how nested groups should behave.

arviz-stats

Very unclear as of now; it is the last module to be worked on. For now it mostly has what we need for arviz-plots to work.

arviz-plots

The main focus on this end has actually been easing development and maintenance, but thanks to the refactor it is also more flexible when it comes to faceting/aesthetic mappings, and plotting backend support is more homogeneous: instead of nice matplotlib and barely working bokeh, there is now support for matplotlib, bokeh and plotly.

Several plots have been renamed, such as plot_posterior -> plot_dist and plot_trace -> plot_trace_dist (plot_trace continues to exist but now plots only the traces). All plots return a new class defined in arviz-plots called PlotCollection, which contains the figure, axes and artist objects, in matplotlib lingo.

This is the most advanced of the 3 libraries in my opinion and it is ready to use, so it would be great to get people to test it out. My recommendation is to install arviz-plots from GitHub along with pymc+arviz; then you can pass arviz.InferenceData to arviz-plots functions. Useful docs: example gallery of updated plots (showing all 3 backends) and main intro notebook.
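
One way downstream code could keep track of the renames is a small lookup table. The sketch below is hypothetical: `RENAMED_PLOTS` and `suggest_new_name` are not part of ArviZ or PyMC, and the table only holds the two renames mentioned above, not the full list.

```python
import warnings

# Only the renames named in this issue; the real list is longer.
# Note plot_trace still exists in arviz-plots but only draws the traces,
# so the closest match to the old behavior is plot_trace_dist.
RENAMED_PLOTS = {
    "plot_posterior": "plot_dist",
    "plot_trace": "plot_trace_dist",
}


def suggest_new_name(old_name: str) -> str:
    """Return the arviz-plots name closest in behavior to a legacy plot name."""
    new_name = RENAMED_PLOTS.get(old_name, old_name)
    if new_name != old_name:
        warnings.warn(
            f"{old_name} corresponds to {new_name} in arviz-plots",
            FutureWarning,
            stacklevel=2,
        )
    return new_name
```

Names that are not in the table are returned unchanged and without a warning.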


Regarding PyMC itself: what would you like PyMC to depend on? And how would you like PyMC to behave?

For me, continuing to depend on arviz (provided it only installs the 3 arviz-xyz libraries, numpy, scipy and xarray) would probably be best, so functionality continues to be the same, convergence checks continue to be run by default, and stats and plots can continue to be exposed if desired (even if plotting won't work unless at least one of the plotting backends is installed).

And how would you coordinate updates in pymc to account for the breaking changes that will happen? Keep in mind the timeline on arviz-xyz is still unclear, so I don't think this is anything urgent, and there is a lot of room to do things however we want on this end.
