Possibility of training a noiseless DSPP / DGP model? #1695
Replies: 3 comments 3 replies
-
In general:

```python
likelihood.initialize(noise=1e-6)
likelihood.raw_noise.requires_grad_(False)
```
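Why this works: pinning the noise to a tiny fixed jitter makes the posterior mean at the training inputs equal K (K + 1e-6 I)^{-1} y, which is numerically indistinguishable from y. A plain-NumPy sketch of that algebra (not GPyTorch code; the kernel, lengthscale, and data here are illustrative assumptions):

```python
import numpy as np

def rbf(a, b, lengthscale=0.1):
    """Squared-exponential kernel matrix between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(0, 1, 10)   # training inputs
y = np.sin(2 * np.pi * x)   # noiseless training targets

noise = 1e-6                # the small, fixed noise from the suggestion above
K = rbf(x, x)
alpha = np.linalg.solve(K + noise * np.eye(len(x)), y)
mean_at_train = K @ alpha   # GP posterior mean evaluated at the training inputs

# The mean recovers the observations almost exactly: the model interpolates.
max_err = np.max(np.abs(mean_at_train - y))
```

With noise this small, `max_err` is dominated by the jitter and stays far below the scale of the data, which is exactly the "interpolation" behavior being asked about.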
-
Hi all -- I have still been unable to achieve interpolation with the DSPP model. Please let me know if you have any guidance. Thanks again!
-
Hi @gpleiss and GPyTorch team, I still have not achieved this behavior with the DGP / DSPP models. Today I compared fitting an exact GP with no additional noise term against fitting an SVGP with no additional noise term, and these are the fits obtained: The approximate GP fit uses the code from the tutorial, except that it uses the entire (small) training set as inducing points. I am wondering whether this suggests that something inherent to the approximation causes the smoothing effect and prevents fitting the function without noise. Thanks again for any input!
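One relevant data point: when the inducing points Z are exactly the training inputs X, the Nystrom term K_xz K_zz^{-1} K_zx that SVGP uses reproduces K_xx exactly, so the inducing-point approximation by itself cannot cause the smoothing; any residual smoothing would have to come from the variational distribution, the ELBO optimization, or a nonzero likelihood noise. A quick numerical check of that identity (plain NumPy, not GPyTorch; kernel and data are illustrative assumptions):

```python
import numpy as np

def rbf(a, b, lengthscale=0.1):
    """Squared-exponential kernel matrix between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(0, 1, 10)   # training inputs
z = x.copy()                # inducing points = the entire training set

K_xx = rbf(x, x)
K_xz = rbf(x, z)
K_zz = rbf(z, z)

# Nystrom approximation of K_xx: Q_xx = K_xz K_zz^{-1} K_zx.
# With z == x this reduces to K K^{-1} K = K, i.e. it is exact.
Q_xx = K_xz @ np.linalg.solve(K_zz, K_xz.T)

nystrom_err = np.max(np.abs(Q_xx - K_xx))  # machine-precision small
```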
-
Hello,
I have been curious about fitting GPs and Deep GPs / Deep Sigma Point Processes (DSPPs) without an additional noise term in the likelihood. I am looking for the behavior shown in Figures 4 and 7 of this paper (https://arxiv.org/pdf/1905.03350.pdf), which I will also paste below:
The exact GP and the Deep GP in these figures both interpolate through all the training points exactly, given the observations are noiseless. I want to replicate this behavior in GPyTorch.
Following the Simple GP Regression tutorial, I am able to achieve this behavior by replacing the GaussianLikelihood with a FixedNoiseGaussianLikelihood and specifying the observation noise to be 0. (I am also curious if there is a preferred way to accomplish this!)
After this, I followed the DSPP tutorial, and after changing the likelihood to a FixedNoiseGaussianLikelihood with zero noise as before, I am still unable to achieve the "interpolation" behavior. The fit still appears to consider the observations to be noisy. Here is a link to a plot demonstrating the DSPP model fit on some sampled data from the same function as in the linked paper:
The model was specified just as in the tutorial, except with the following specifications:
```python
batch_size = 10              # Size of minibatch
milestones = [20, 150, 300]  # Epochs at which we will lower the learning rate by a factor of 0.1
num_inducing_pts = 20        # Number of inducing points in each hidden layer
num_epochs = 200             # Number of epochs to train for
initial_lr = 0.01            # Initial learning rate
hidden_dim = 10              # Number of GPs (i.e., the width) in the hidden layer
num_quadrature_sites = 8     # Number of quadrature sites (see paper; 5-10 generally works well)
```
I have tried a number of different settings and each time the model seems to converge, but never achieves the interpolation behavior.
Thank you for any input on this!