Deterministics not resampled in posterior predictive #5302
Comments
Does this also happen without InferenceData?
Yes, the same thing happens with MultiTrace objects. I updated the gist notebook: https://gist.github.com/ericmjl/e82b40805805d9feeaff9f39182ff32c
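For reference, a quick sketch of how that check can be reproduced with a `MultiTrace` (this assumes the model `m` from the minimal example in the next comment; `return_inferencedata=False` makes `pm.sample` return a `MultiTrace`):

```python
# Sketch: same check, but with a MultiTrace instead of InferenceData
with m:
    multitrace = pm.sample(return_inferencedata=False)
    pm.set_data({'x': np.ones(3)})
    ppc = pm.sample_posterior_predictive(multitrace)
```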
Here is a minimal example:

```python
import numpy as np
import pymc as pm

with pm.Model() as m:
    x = pm.Data('x', np.ones(5))
    y = pm.Data('y', np.ones(5))
    b = pm.Normal('b')
    # mu = b*x  # This works fine
    mu = pm.Deterministic('mu', b*x)
    like = pm.Normal('like', mu, observed=y)
    trace = pm.sample()
    pm.set_data({'x': np.ones(3)})
    ppc = pm.sample_posterior_predictive(trace)
    assert ppc.posterior_predictive['like'].shape == (4, 1000, 3)  # Fails
```
I think this has to do with the fact that `Deterministic`s are not resampled in `sample_posterior_predictive`. This also fails (now for a good reason, as we wouldn't know what the posterior of the new `mu` should be):

```python
import numpy as np
import pymc as pm

with pm.Model() as m:
    x = pm.Data('x', np.ones(5))
    y = pm.Data('y', np.ones(5))
    mu = pm.Normal('mu', x)
    like = pm.Normal('like', mu, observed=y)
    trace = pm.sample()
    pm.set_data({'x': np.ones(3)})
    ppc = pm.sample_posterior_predictive(trace)
    assert ppc.posterior_predictive['like'].shape == (4, 1000, 3)  # Fails
```
Hello. When trying to reproduce the minimal example above, I noticed that this is more than a shape problem: `mu` is not re-computed when `sample_posterior_predictive` is called. Indeed, if we consider the following variant of it:
we don't have the shape issue anymore (and the assertion passes), but the predicted value is around [1, 2, 3] instead of the expected [4, 5, 6]. This suggests a workaround, which is to force the computation of `mu` by adding `'mu'` to the `var_names` parameter of `sample_posterior_predictive` (i.e. `var_names=['mu', 'like']`). I checked and it works. I think the reason why `mu` is not computed in `sample_posterior_predictive` is related to the following piece of code in this method, which adds `mu` to the list of inputs of the aesara function used to make the predictions (as long as it is not explicitly specified as a variable to sample), because it is neither a constant nor a shared variable:
I think it is because of this that the computations rely on the old values of `mu` rather than on new ones computed from the new values of `x`. I am no expert in PyMC or aesara, so I am not 100% sure of myself, but an argument in favor of this is that if we run the minimal example with a modified version of `sample_posterior_predictive` that calls the method `compile_pymc` without the parameter
Not very conclusive, but I hope it helps. Regards,
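For concreteness, a minimal sketch of that `var_names` workaround, reusing the model `m` and `trace` from the first example in this thread (the expected shape assumes the default 4 chains and 1000 draws, as in that example):

```python
# Sketch of the workaround described above: listing 'mu' in var_names forces the
# Deterministic to be recomputed from the current value of the shared data 'x'.
with m:
    pm.set_data({'x': np.ones(3)})
    ppc = pm.sample_posterior_predictive(trace, var_names=['mu', 'like'])

print(ppc.posterior_predictive['like'].shape)  # expected: (4, 1000, 3)
```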
People seem to be finding this issue repeatedly. I think we could do the following:
I don't think point 2 is as critical as 1, but I am afraid people will eventually try it and not realize it doesn't and shouldn't work.
I think this would be necessary (not sure it is possible though). Deterministics could be expensive, and if this were done there wouldn't be a way to prevent resampling for deterministics that are expensive and don't change. Even if they are in between a mutable data and an observed, we might want to sample straight away to perform posterior predictive checks (without resampling the deterministic) and after that update the data to generate some predictions. Not directly related to the issue itself, but I think the abuse of Deterministics is a separate question.
I would err on the side of assuming most people want such Deterministics to be resampled. It should be quite feasible to walk the graph to check that MutableData dependency. Such variables would be automatically resampled if they were not part of a Deterministic, and I doubt many people are using Deterministics as a means to prevent resampling of costly variables. At least that idea had never occurred to me. One can always make a more specialized posterior_predictive function, and hopefully our API is not terribly difficult if that's required. I agree with your other point regarding abuse of Deterministics, but that's a separate question.
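As a rough illustration of that graph walk (a hypothetical sketch, not PyMC's actual implementation; the helper name `depends_on_mutable_data` is made up for this example), aesara's `ancestors` helper can be used to test whether a shared data variable feeds into a Deterministic:

```python
from aesara.graph.basic import ancestors

def depends_on_mutable_data(deterministic, data_vars):
    """Return True if any of the given shared data variables is upstream of `deterministic`."""
    data_vars = set(data_vars)
    # ancestors() yields every variable that contributes to the given outputs
    return any(var in data_vars for var in ancestors([deterministic]))

# With the model from the first example in this thread, `mu` depends on the
# mutable data `x`, so it would be flagged for resampling:
# depends_on_mutable_data(m['mu'], [m['x']])  # -> True
```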
I was bitten by this today. We should also add that if the Deterministic has dims that have changed, it should be resampled as well.
My guess is you are seeing a side-effect of #6876. I imagine you are building a distinct model with constant Data / coords for posterior predictive?
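For context, the pattern being asked about might look roughly like the hypothetical sketch below: a separate prediction model that holds the new predictors as `ConstantData` and reuses the posterior from the original fit. The variable names and sizes here are illustrative, not taken from the original report.

```python
# Hypothetical sketch: a distinct model built only for posterior prediction,
# with constant data for the new predictors and the same free-variable names
# so the draws in `trace` can be reused.
with pm.Model() as pred_model:
    x_new = pm.ConstantData('x', np.ones(3))
    b = pm.Normal('b')
    mu = pm.Deterministic('mu', b * x_new)
    like = pm.Normal('like', mu, observed=np.zeros(3))
    ppc = pm.sample_posterior_predictive(trace, var_names=['mu', 'like'])
```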
Description of your problem
I noticed that when I want to use `pm.set_data()`, if my model has a `pm.Deterministic` transformation that gets passed into the likelihood, I will get shape issues.
Without the use of `pm.Deterministic` in my model, the shapes of posterior predictive samples, when I pass in new data of length 1, are of shape `(chain, draw, 1)`.
With the use of `pm.Deterministic` in my model, the shapes of posterior predictive samples, when I pass in new data, are of shape `(chain, draw, n)`, where `n` == "length of data that I used to fit the model".
I believe this is a bug with `pm.Deterministic`s interacting with `pm.set_data`, but I'm not quite sure how to pinpoint the problem.
Please provide a minimal, self-contained, and reproducible example.
I found it much easier to provide a reprex as a Jupyter notebook available here: https://gist.github.com/ericmjl/e82b40805805d9feeaff9f39182ff32c
Please provide the full traceback.
No traceback necessary.
Please provide any additional information below.
Versions and main components