
[Bug] Is there a problem calculating the covariance matrix of different samples in the batch? #2584

Open
a504140398 opened this issue Sep 11, 2024 · 2 comments

@a504140398

My data has shape [B, N, D]: the first dimension is the batch size, the second is the sequence length within a sample, and the third is the feature channel.
Before feeding it into the approximate Gaussian process, I flatten the first two dimensions into [B*N, D]. The output of my Gaussian process is [B*N, T], where T is the number of tasks. Is there a problem with this? I have two questions:

  1. The covariance matrix between all points in a mini-batch gets computed, but in fact there is no relationship between my samples, so there is no need to compute the covariance between points from different samples.
  2. Because of this, I can only loop over the batch dimension and feed the samples one by one through a single Gaussian process. Is this reasonable? (Both options are sketched below.)
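To make the two options concrete, here is a minimal sketch; `gp` is a placeholder for any GP model that maps `[M, D]` inputs to an `[M, T]` posterior:

```python
import torch

B, N, D = 8, 50, 4            # batch size, sequence length, feature channels
x = torch.randn(B, N, D)

# Option 1: flatten batch and sequence dims into one axis. The GP then builds
# a single (B*N) x (B*N) covariance, which also couples points that come from
# different samples.
x_flat = x.reshape(B * N, D)            # [B*N, D]
# out_flat = gp(x_flat)                 # posterior over all B*N points jointly

# Option 2: loop over the batch so each sample gets its own N x N covariance.
# outs = [gp(x[b]) for b in range(B)]   # B independent posteriors, N points each
```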
@a504140398 a504140398 added the bug label Sep 11, 2024
@gpleiss
Member

gpleiss commented Sep 11, 2024

Are you using a multitask model? And why are you flattening the data? There should be no need to flatten the data; GPyTorch can compute batches of covariance matrices.

A contained reproducible code example would also be helpful.
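(For reference: GPyTorch kernels broadcast over leading batch dimensions, so a [B, N, D] input yields a batch of B separate N x N covariance matrices rather than one [BN, BN] matrix. A minimal sketch, with placeholder sizes:)

```python
import torch
import gpytorch

B, N, D = 8, 50, 4
x = torch.randn(B, N, D)

kernel = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
covar = kernel(x)          # lazy covariance with batch shape [B]
print(covar.shape)         # torch.Size([8, 50, 50]): one N x N block per sample
```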

@a504140398
Author

a504140398 commented Sep 12, 2024

> Are you using a multitask model? And why are you flattening the data? There should be no need to flatten the data; GPyTorch can compute batches of covariance matrices.
>
> A contained reproducible code example would also be helpful.
I did use a multitask model. The reason I flattened the input data is that without flattening I get a dimensionality error.

This is my code for defining the GP. I didn't find any examples with unflattened data in the tutorials; all I saw were two-dimensional inputs such as [B, D] fed into the Gaussian process, never [B, N, D].

```python
import torch
import gpytorch
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import RBFKernel, ScaleKernel
from gpytorch.means import ConstantMean, LinearMean
from gpytorch.models import ApproximateGP
from gpytorch.variational import CholeskyVariationalDistribution


class MultitaskDeepGP(ApproximateGP):
    def __init__(self, input_dims, output_dims, num_inducing=32, linear_mean=True, num_latents=30):
        # One set of inducing points per latent GP: [num_latents, num_inducing, input_dims]
        inducing_points = torch.randn(num_latents, num_inducing, input_dims)
        batch_shape = torch.Size([num_latents])
        variational_distribution = CholeskyVariationalDistribution(
            num_inducing_points=num_inducing,
            batch_shape=batch_shape,
        )
        # Mix num_latents independent latent GPs into output_dims tasks (LMC)
        variational_strategy = gpytorch.variational.LMCVariationalStrategy(
            gpytorch.variational.VariationalStrategy(
                self, inducing_points, variational_distribution, learn_inducing_locations=True
            ),
            num_tasks=output_dims,
            num_latents=num_latents,
            latent_dim=-1,
        )
        super().__init__(variational_strategy)
        base_mean = ConstantMean() if not linear_mean else LinearMean(input_dims)
        self.mean_module = base_mean
        base_covar = ScaleKernel(
            RBFKernel(batch_shape=batch_shape),
            batch_shape=batch_shape,
        )
        self.covar_module = base_covar

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return MultivariateNormal(mean_x, covar_x)
```

If I input data of shape [B, N, D], it reports the error below:

[screenshot of the error traceback: QQ_1726219761065]
Could you please give me some suggestions for modifications? Thanks
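(One direction that may be worth trying, as a sketch only and not verified against this exact model: keep the data as [B, N, D] but insert a singleton dimension so the data batch can broadcast against the `num_latents` batch of the inducing points. Here `model` is assumed to be an instance of the `MultitaskDeepGP` above:)

```python
# x has shape [B, N, D]; the inducing points have batch shape [num_latents].
# Adding a singleton dim lets the two batch shapes broadcast to [B, num_latents]:
output = model(x.unsqueeze(-3))    # x.unsqueeze(-3) has shape [B, 1, N, D]
# If the shapes broadcast as expected, `output` is a MultitaskMultivariateNormal
# with batch_shape [B] and event shape [N, output_dims], i.e. a separate
# N x N covariance per sample instead of one (B*N) x (B*N) matrix.
```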
