How does `swyft` handle simulations that fail in some regime? #103

erinaldi · 2022-02-22T07:32:27Z


Hello,

I have been using swyft (and sbi) for simulation-based inference in different physical systems (mostly out of my own curiosity, and to investigate the connection with parameter inference for ODEs and PDEs).

I would like to ask a question that might be more general than how I formulated it in the title of this discussion.
Let me try to introduce the problem with an example:

- A simulator models the dynamics of a system and outputs a vector quantity at several time steps.
- The dynamics have a start time t₀ and a final time t₁.
- The observational data available from the system, which I want to use as my x₀ when computing the posterior, always cover a fixed range [t₀, t₂], where t₂ can differ from, and be larger than, t₁.
- The simulator may stop at t₁ < t₂ depending on the parameter values of the dynamical model (so it depends on the parameters sampled from the prior).
- The simulator stopping is not necessarily a failure of the model, but it can be, depending on the parameter values.

In swyft it is possible to output nan from the simulator in cases where it fails, or when we do not want those parameters to be considered in inference (sbi does the same).
It is possible to have the simulator output data with different shapes (e.g. t₁-t₀ values or t₂-t₀ values) at each step. However, I don't think this would be ideal for training, and "padding" the remaining values (e.g. the t₂-t₁ missing values) with nan might result in those samples being discarded entirely during training. "Padding" with other values might work in some cases, but I wonder whether swyft has a built-in mechanism for this.
I would also like to understand whether it makes sense at all, in the context of simulation-based inference (or Bayesian inference in general), for different samples to have output shapes that differ from the shape of the observational data used for the posterior.
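To make the concern concrete, here is a minimal sketch of the row-wise filtering that NaN-based rejection of failed simulations roughly amounts to, assuming samples are stacked into NumPy arrays. `drop_invalid` is a hypothetical helper, not a swyft or sbi function. If the t₁ to t₂ tail of an early-stopping run is padded with nan, the whole row is dropped, so its valid early time steps never reach training either.

```python
import numpy as np

def drop_invalid(theta, x):
    """Keep only the (theta, x) pairs whose simulated output contains no NaN.

    theta: array of shape (n_samples, n_params)
    x:     array of shape (n_samples, n_times)
    """
    valid = ~np.isnan(x).any(axis=1)  # a single NaN invalidates the whole row
    return theta[valid], x[valid]

# Two simulations cover [t0, t2]; the middle one stopped early at t1 and was NaN-padded.
theta = np.array([[0.5], [1.5], [0.7]])
x = np.array([
    [1.0, 0.8, 0.6, 0.4],
    [1.0, 0.5, np.nan, np.nan],  # partial run: its first two valid steps are lost too
    [1.0, 0.9, 0.8, 0.7],
])
theta_kept, x_kept = drop_invalid(theta, x)
```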

Thank you,
Enrico

cranmer · 2022-02-22T13:11:54Z


Hi Enrico,

Only some partial answers and questions here...

First, regarding the shape of the data. You wrote:

> It is possible to have the simulator output data with different shapes (e.g. t₁-t₀ values or t₂-t₀ values) at each step.

I'm not 100% sure how to interpret that. I'm guessing you mean that you have a sequence of vectors {x(t_i)}, where x(t) is the output vector of the simulator at time t, and by different shape you mean that the index i can run over different ranges, so the sequence can have different lengths. Is that right?

If so, I don't see any problem in principle with the simulator producing sequences of different lengths. In one way of thinking about it, all that data has the same "shape" in that it lives in the same abstract space, similar to a Poisson point process that emits a variable number of events, where each event might be vector-valued. But if this is the case, the likelihood you are trying to model is going to be more challenging, and you will need a NN model that can handle data of this shape. I don't think you want to treat this data as living in R^N and pad it; you want to think of it as a sequence and probably use an autoregressive model.

If the termination time of the simulator is physical / meaningful, then that time is an important part of the data. In that case you might imagine the output data being {(t_i, x(t_i))} pairs, where the last entry is your t_2. It's like the end of a sentence: modeling p(stop) and 1-p(stop) are important parts of the implicit likelihood. If you are randomly sampling some time between t_0 and t_2 to act as your observation x_0 = x(t=t_obs), then that is also part of the model, and you want to make sure that the distribution for when the dynamical system is sampled (e.g. t_obs ~ p_obs(t_0, t_2)) is a reasonable model for the actual data you will be fitting.
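The "end of a sentence" picture can be made concrete with a toy autoregressive likelihood in which stopping is an explicit event. This is a minimal sketch under hypothetical assumptions: the process stops with a constant probability `p_stop` at each step and otherwise emits an independent Gaussian value; a real model would condition both on the history and on the parameters.

```python
import math

def sequence_log_likelihood(xs, p_stop, mu, sigma):
    """Log-likelihood of a variable-length sequence under a toy stop/continue model.

    Hypothetical model: before each emission the process continues with
    probability 1 - p_stop, then emits x ~ Normal(mu, sigma); after the last
    element it stops with probability p_stop.
    """
    def log_normal(x):
        return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

    ll = 0.0
    for x in xs:
        ll += math.log(1.0 - p_stop) + log_normal(x)  # "continue", then emit x
    ll += math.log(p_stop)  # the termination itself is part of the data
    return ll

# Each extra element costs a log(1 - p_stop) factor plus its emission density.
short_ll = sequence_log_likelihood([0.0], p_stop=0.2, mu=0.0, sigma=1.0)
long_ll = sequence_log_likelihood([0.0, 0.0, 0.0], p_stop=0.2, mu=0.0, sigma=1.0)
```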

Finally, the nan question. If there are parts of the parameter space where the simulator fails and you want to regard these as inadmissible / unallowed regions of parameter space (as opposed to some avoidable computational problem), then I think what you want to do is regard the likelihood as 0 (or -log likelihood = inf) so that the posterior there is 0. That's not the same as having the simulator output a nan in the data space. I'm not sure if swyft offers any way to signal this, but it does seem like a nice feature to add. As a hack, you could have the model output some dummy data that is very different from any possible observation. Then I would guess that the model would learn a dummy density that assigns 0 likelihood to the observed data.
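A minimal sketch of that hack, assuming a toy exponential-decay simulator in which parameters with a non-positive rate count as inadmissible; the model, the admissibility rule and the `SENTINEL` value are all hypothetical. The sentinel only needs to lie far outside the range of any real observation.

```python
import numpy as np

SENTINEL = 1e3  # hypothetical: far outside any plausible observed value

def simulator(theta, t_grid):
    """Toy simulator x(t) = A * exp(-k t) on a fixed observation grid.

    Parameters with k <= 0 are treated as inadmissible (hypothetical rule):
    instead of NaN, return a constant dummy signal that no real observation
    resembles, so a trained estimator should assign such theta ~0 likelihood.
    """
    amplitude, rate = theta
    if rate <= 0:
        return np.full(t_grid.shape, SENTINEL)
    return amplitude * np.exp(-rate * t_grid)

t_grid = np.linspace(0.0, 1.0, 5)
x_ok = simulator((2.0, 1.0), t_grid)
x_bad = simulator((2.0, -1.0), t_grid)
```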

It's not totally clear to me if t_2 is deterministic (depending on parameters) or not. I would guess so if it's a dynamical system.
If t_2 is deterministic and t_2 < t_obs, then you also have the situation of 0 likelihood. If your simulated data includes t_obs, then the model should be able to learn that the distribution has a cutoff, but that kind of discontinuity might be better handled with some special approach (for example, use a NN to predict t_2(parameters) and add `if t_obs > t_2_predicted(parameters): return 0`).
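In log space, that special-casing could look like the sketch below. `log_ratio` stands for whatever learned log-likelihood or log-likelihood-ratio estimate is available, and `predict_t_end` for the NN (or, where one exists, an analytic formula) predicting the termination time; both are hypothetical callables, and the toy functions in the usage lines are placeholders.

```python
import math

def log_ratio_with_cutoff(theta, t_obs, log_ratio, predict_t_end):
    """Apply a hard cutoff on top of a learned (log) likelihood estimate.

    If the observation time lies beyond the predicted termination time of the
    simulator, the likelihood is exactly zero, i.e. the log value is -inf.
    """
    if t_obs > predict_t_end(theta):
        return -math.inf
    return log_ratio(theta)

# Placeholder components for illustration only.
learned = lambda theta: -0.5 * theta ** 2
t_end = lambda theta: 1.0 / theta  # hypothetical termination time t_end(theta)
```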

1 reply

erinaldi Feb 23, 2022
Author

@cranmer just answering in this thread about the questions you had (but also check my complete answer below in the page).

My simulator outputs x(t), where t runs over a sequence of values that can differ between runs (the end point specifically).
The observed data, on the other hand, cover a fixed range of t.

Regarding your last paragraph, yes, t_1 is deterministic (t_2 is my data, so it’s fixed). I can actually analytically compute it given the parameters of the model. In fact, the model does not completely fail after t_1, it just becomes “less faithful”. Do I want to give it zero likelihood? That is one choice, but maybe I could still give it some “weight” … difficult to model though.

cweniger · 2022-02-23T05:12:57Z

Maintainer

Hi Enrico,

I will add a few more swyft-specific aspects to the discussion:

Simulator output with different shapes is something that can be handled with simulation-based inference or Bayesian inference in general. In that case, the shape of the output (for instance the number of bins in a time sequence, or the number of observed gamma-ray photons during some observational period) is then itself a random variable. Observing an event with a specific shape would carry information about the underlying model parameters.

In your case, however, I'm not sure you are in that situation. What matters is the shape of the observational data, not the shape of the physics simulator output. If I understand correctly, in your example your observational data would always cover [t_0, t_2], no matter whether t_1 < t_2 or t_1 > t_2. For values t_1 < t < t_2 you then simply have to simulate what the detector would measure if the event was shorter than your observational period. If you simulated SN light curves, for instance, I guess that would mean zero padding. In the case where other complementary observations would provide additional information about t_1, you could feed these observations into swyft in addition to the time-sequence data, but the details will depend on the specifics of your situation. In that case you would have multiple observational states, which in swyft means that the dictionary-valued output of the simulator would separately provide the time-sequence data as well as the estimator for t_1.
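A minimal sketch of both ideas, assuming a source that emits a constant signal until it switches off at t_1, a detector that then simply reads zero, and a hypothetical complementary measurement of t_1 with Gaussian error. The function names and dictionary keys are illustrative; swyft's actual simulator interface differs between versions, so check the documentation for the one you use.

```python
import numpy as np

def detector_view(t_grid, t1, signal):
    """What the instrument records over the fixed window [t0, t2]:
    the source signal up to t1, zero afterwards (zero padding)."""
    return np.where(t_grid <= t1, signal(t_grid), 0.0)

def simulator(theta, t_grid, rng):
    """Hypothetical simulator returning several observables as a dictionary."""
    amplitude, t1 = theta
    x = detector_view(t_grid, t1, lambda t: amplitude * np.ones_like(t))
    t1_estimate = t1 + rng.normal(0.0, 0.1)  # hypothetical extra measurement of t1
    return {"x": x, "t1_est": np.array([t1_estimate])}

rng = np.random.default_rng(0)
obs = simulator((2.0, 0.5), np.linspace(0.0, 1.0, 5), rng)
```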

In cases where the observational data itself varies in shape, e.g. where some of the time bins are missing and you know deterministically for a given observation that they are missing, you could provide the zero-padded time sequence and, additionally, a boolean vector that indicates which bins are measured and which are not. You then need an appropriate network structure (in swyft, in the Head network) to handle this information and compress it into feature vectors. Off the top of my head I'm not sure which network structures are state-of-the-art for situations with missing data, but I suspect this is a relatively common situation and there must be standard solutions by now.
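One common way to present such data to a network, sketched here under the assumption that missing bins arrive as NaN in a fixed-length array: fill them with zeros and stack the boolean mask as a second input channel, so the head network can tell a measured zero from a missing bin. `to_masked_input` and `masked_mean` are hypothetical helpers; the masked mean merely stands in for a pooling step a learned head might perform.

```python
import numpy as np

def to_masked_input(x):
    """Turn a NaN-marked series into a 2-channel array: (zero-filled values, mask)."""
    mask = ~np.isnan(x)                  # True where the bin was measured
    x_filled = np.where(mask, x, 0.0)    # zero padding for missing bins
    return np.stack([x_filled, mask.astype(float)], axis=0)  # shape (2, n_bins)

def masked_mean(x_filled, mask):
    """Average over measured bins only, so padding zeros do not bias the result."""
    n = mask.sum()
    return float(x_filled.sum() / n) if n > 0 else 0.0

inp = to_masked_input(np.array([1.0, np.nan, 3.0]))
```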

Implementing custom "head networks" for handling such situations is possible in swyft right now, but can be a bit tricky. Please ask if you have questions! You can look for examples with CustomHeadNetwork in the code and example notebooks. We are working on updates for swyft that will make this kind of customization more straightforward.

Best,
Christoph

2 replies

erinaldi Feb 23, 2022
Author

> In the case where other complementary observations would provide additional information about t_1, you could feed these observations into swyft in addition to the time sequence data, but details will depend on the specifics of your situation. In that case you would have multiple observational states, which in swyft means the dictionary-valued output of the simulator would provide separately the time-sequence data as well as the estimator for t_1.

@cweniger could you elaborate on this?
When you have multiple keys in the output dictionary, how are they interpreted by swyft?

cweniger Feb 26, 2022
Maintainer

@erinaldi The multiple keys are provided as input to the head network. In the head network this has to be then compressed down to some feature vector. In this way, the impact of multiple observables on the posterior can be taken into account.
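As an illustration of what "compressed down to some feature vector" means, here is a hand-written stand-in for a head network that maps a dictionary of observables to one flat feature vector. In swyft this mapping would be a trainable network; the fixed chunk-averaging below and the keys `"x"` and `"t1_est"` are assumptions made only to show the data flow.

```python
import numpy as np

def toy_head(obs, n_chunks=4):
    """Compress a dict of observables into a single 1-D feature vector.

    The long time series is reduced to n_chunks chunk means, and the scalar
    t1 estimate is passed through unchanged; both are then concatenated.
    """
    x_features = np.array([chunk.mean() for chunk in np.array_split(obs["x"], n_chunks)])
    return np.concatenate([x_features, obs["t1_est"]])

features = toy_head({"x": np.arange(8.0), "t1_est": np.array([0.5])})
```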

erinaldi · 2022-02-23T08:46:17Z

Author

Thanks Kyle and Christoph,

despite my very vague and general question, you both touched upon very important issues that I was considering.

Let me make some remarks which would make the problem above more specific and I will also try to refer to each of your answers.

My original idea was to simply assign zero likelihood to all the model parameters resulting in a t_1 shorter than t_2, as mentioned by @cranmer, and actually my t_2 is given by observation data while t_1 is deterministically determined by other parameters of the simulator.

This is a rather common choice if you think of the simulator as the entire generative model for the data: to compare generated and observed data, they need to have the same shape (I think this is a requirement).
As mentioned by @cweniger my observational data is always of fixed shape, with initial time and final time given by the instruments used to observe the system. (I think you can pretty much guess what I’m talking about but I’ll remain vague and general as much as possible for now.)

With observation data on a fixed t_i sequence (no sampling of t_obs in this case, @cranmer) and a simulator that may fail to cover the entire time sequence for physical reasons, I have done simulation-based inference with sbi where the “failing” simulations return nan values.
These “special” values can be used to train a classifier to identify automatically which regions of parameter space should have zero likelihood (effectively the classifier is used to restrict the prior and only sample from regions of the parameter space where the simulator does not fail to cover the entire time range). This is a feature included in the sbi package and it is the reason why I was mentioning introducing nan in the simulator output.
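When the valid region is known analytically (as turns out to be the case here, since t_1 can be computed from the parameters), the learned classifier can be replaced by an exact predicate, and the prior restricted by simple rejection sampling. A minimal sketch in which the toy relation t_1 = 1/k, the window end t_2 = 1 and all function names are hypothetical; note that this is plain rejection sampling, not sbi's classifier-based restriction.

```python
import numpy as np

def sample_restricted_prior(n, sample_prior, is_valid, rng, max_rounds=100):
    """Draw n parameter vectors from the prior, keeping only valid ones."""
    kept, n_kept = [], 0
    for _ in range(max_rounds):
        theta = sample_prior(n, rng)
        theta = theta[is_valid(theta)]
        kept.append(theta)
        n_kept += len(theta)
        if n_kept >= n:
            return np.concatenate(kept)[:n]
    raise RuntimeError("valid region too small: increase max_rounds")

T2 = 1.0  # hypothetical end of the observation window

def prior(n, rng):
    # Hypothetical prior: column 0 is a rate k, column 1 an amplitude.
    return rng.uniform(0.1, 2.0, size=(n, 2))

def covers_window(theta):
    # Hypothetical analytic termination time t1 = 1 / k; require t1 >= t2.
    return 1.0 / theta[:, 0] >= T2

samples = sample_restricted_prior(500, prior, covers_window, np.random.default_rng(1))
```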

However, this will “throw away” the entire set of parameters, while sometimes I still would like to check what data the simulator outputs and how it compares to the observations, even if only partially.
Regarding this situation, if I understand correctly, @cweniger mentioned that I would still need the simulator to output “something” for the “missing times in the sequence”: this is a possibility, where my simulator is only a partial model of the data when t_1 < t_2, and some other simulator should be used for t_1 to t_2 data points…if I had that model.
Do you know what would be the way to treat this “missing” simulated data in a Bayesian inference flow? I could model the output as random variables for those times larger than t_1 but it would really complicate the entire procedure.

Let me briefly comment on the Head network. Those can be really useful to compress information and automatically extract summary statistics, even in complicated cases like missing data. A priori, there is no “optimal” network structure, so one can craft something and proceed by trial and error, I guess.

One more thing for @cweniger: observations come with error bars due to instrumental precision and other modeling effects on the experimental side (common in astrophysics and particle physics). Am I correct in understanding that swyft handles this with the simhook in the simulator? The posterior does not know anything about the error bars on the observational data, is that right?

Thank you very much. I really appreciate all the work you are putting into developing these tools for simulation-based inference!

3 replies

cweniger Feb 26, 2022
Maintainer

@erinaldi I still don't understand your actual setup and need more information to provide a useful answer. The scenario I had in mind was that you simulate the time-dependent emission signal from some astrophysical source, which just switches off at t_1. Apparently this is not the case, since you mention above that for t > t_1 your simulations just become less reliable. Is that correct? The challenge is then that you don't know, for a given observation, what that critical value of t_1 is, since it itself depends on the unknown parameters.

I think in that case good approaches are either (a) figure out the minimal t_1 over your entire parameter space of interest, and always neglect data after that minimal t_1, or (b) try to come up with some model for the systematic uncertainties at times larger than t_1, and somehow build this into the training strategy. This might, however, be quite difficult, since you need to come up with a probabilistic model for your systematic uncertainties.
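Option (a) amounts to cutting the observation and every simulation to a common window. A minimal sketch, assuming t_1 can be evaluated for any parameter vector (hypothetically t_1 = 1/k on a prior box k in [0.1, 2.0], so the minimum sits at the largest k). If the minimum is instead estimated from prior samples, it can overshoot the true minimum, so a safety margin would be advisable.

```python
import numpy as np

def truncate_to_window(t_grid, x, t1_min):
    """Keep only time bins with t <= t1_min, where every simulation is trusted.
    Works for a single series (n_times,) or a batch (..., n_times)."""
    keep = t_grid <= t1_min
    return t_grid[keep], x[..., keep]

# Hypothetical prior box k in [0.1, 2.0] and t1 = 1 / k, so t1 is smallest at k = 2.0.
k_max = 2.0
t1_min = 1.0 / k_max

t_grid = np.linspace(0.0, 1.0, 5)
x_batch = np.arange(10.0).reshape(2, 5)
t_kept, x_kept = truncate_to_window(t_grid, x_batch, t1_min)
```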

About your other questions: The simhook can be used to re-simulate noise, so that you add different noise realizations to your training data in each training round. That helps to increase the training data variance. In general, Gaussian or Poissonian measurement errors etc would be included here.
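A minimal sketch of the idea, with the noiseless simulations stored once and a fresh Gaussian noise realization drawn each time a sample is used. The function name, its signature and the `"x"` key are hypothetical; how a hook like this is registered, and which arguments it receives, depends on the swyft version, so consult its documentation rather than this sketch.

```python
import numpy as np

def gaussian_noise_hook(obs, rng, sigma=0.1):
    """Return a copy of obs with a new Gaussian noise realization added to "x".
    The stored noiseless simulation is left untouched."""
    noisy = dict(obs)
    noisy["x"] = obs["x"] + rng.normal(0.0, sigma, size=obs["x"].shape)
    return noisy

rng = np.random.default_rng(42)
clean = {"x": np.zeros(100)}
epoch_1 = gaussian_noise_hook(clean, rng)
epoch_2 = gaussian_noise_hook(clean, rng)  # different noise, same underlying simulation
```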

erinaldi Feb 27, 2022
Author

Your interpretation is correct, my t_1 from the simulator may, or may not, be the true end of the physical process.
The ways we are going to address this problem are actually very similar to what you mention in (a) and (b). We had some internal discussions, and that is what we will attempt to do in swyft.
We are limited to the particular simulator we have, and when that becomes unreliable, we can choose to (a) make sure we always consider ONLY data for t_2<=t_1 or (b) add a new “model” for t>t_1 with some uncertainty (which, like you said, could be difficult…).

Thanks re: simhook. Systematic error could be included too, right?

cweniger Feb 27, 2022
Maintainer

Yes, the systematic error would be there as well. There is no practical difference between systematic and statistical errors in this context, except that systematic errors typically have a complex correlation structure. If you want to explore the impact of different systematic uncertainty models for your option (b), then it definitely makes sense to put these systematics into the simhook, since this allows you to explore the effect on posteriors without actually rerunning the simulation code.
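For option (b), a systematic-uncertainty model with correlated errors could live in the same kind of hook. A minimal sketch in which the "less faithful after t_1" behaviour is modelled, purely as an assumption, by a Gaussian random-walk drift that starts at t_1; the cumulative sum makes errors in neighbouring bins strongly correlated, unlike independent statistical noise. Swapping this function for another systematics model changes the training data without rerunning the simulator.

```python
import numpy as np

def random_walk_systematics(x, t_grid, t1, rng, step_scale=0.05):
    """Add a hypothetical correlated systematic error beyond t1.

    Bins with t <= t1 are unchanged; after t1 the error performs a Gaussian
    random walk, so its variance grows with the number of bins past t1 and
    neighbouring bins are correlated.
    """
    steps = rng.normal(0.0, step_scale, size=t_grid.shape) * (t_grid > t1)
    return x + np.cumsum(steps)

t_grid = np.linspace(0.0, 2.0, 9)
x_sim = np.ones_like(t_grid)
x_sys = random_walk_systematics(x_sim, t_grid, t1=1.0, rng=np.random.default_rng(3))
```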
