🐛 Bug
The multi-device example "Exact GP Regression with Multiple GPUs" currently produces a runtime error on a standard 8× V100 node:

```
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
```
To reproduce
Run the Exact GP Regression with Multiple GPUs notebook; it fails during the call to `train`.
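For reference, the failing setup boils down to the sketch below. This is my condensation of the notebook, not a verbatim copy: the synthetic data and the inlined training step stand in for the notebook's dataset and its `train` helper.

```python
import torch
import gpytorch

n_devices = torch.cuda.device_count()
output_device = torch.device("cuda:0")

# Synthetic stand-in data; the notebook loads a real dataset instead.
train_x = torch.linspace(0, 1, 10000, device=output_device).unsqueeze(-1)
train_y = torch.sin(train_x * 20).squeeze(-1).to(output_device)

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, n_devices):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        base_covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        # MultiDeviceKernel splits the kernel matrix across the GPUs and
        # gathers the blocks back on output_device.
        self.covar_module = gpytorch.kernels.MultiDeviceKernel(
            base_covar_module, device_ids=range(n_devices), output_device=output_device
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood().to(output_device)
model = ExactGPModel(train_x, train_y, likelihood, n_devices).to(output_device)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

model.train()
likelihood.train()
with gpytorch.settings.max_preconditioner_size(100):
    output = model(train_x)
    loss = -mll(output, train_y)  # <- the RuntimeError below is raised here
```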
**Stack trace/error message**
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[7], line 1
----> 1 model, likelihood = train(train_x, train_y,
      2                           n_devices=n_devices, output_device=output_device,
      3                           preconditioner_size=100,
      4                           n_training_iter=20)

Cell In[6], line 42
     39     loss = -mll(output, train_y)
     40     return loss
---> 42 loss = closure()
     43 loss.backward()
     45 for i in range(n_training_iter):

Cell In[6], line 39
     37 optimizer.zero_grad()
     38 output = model(train_x)
---> 39 loss = -mll(output, train_y)
     40 return loss

File ~/micromamba/envs/newenv/lib/python3.12/site-packages/gpytorch/module.py:31, in Module.__call__(self, *inputs, **kwargs)
     30 def __call__(self, *inputs, **kwargs) -> Union[Tensor, Distribution, LinearOperator]:
---> 31     outputs = self.forward(*inputs, **kwargs)
     32     if isinstance(outputs, list):
     33         return [_validate_module_outputs(output) for output in outputs]

File ~/micromamba/envs/newenv/lib/python3.12/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py:82, in ExactMarginalLogLikelihood.forward(self, function_dist, target, *params, **kwargs)
     79     raise ValueError("NaN observation policy 'fill' is not supported by ExactMarginalLogLikelihood!")
     81 # Get the log prob of the marginal distribution
---> 82 res = output.log_prob(target)
     83 res = self._add_other_terms(res, params)
     85 # Scale by the amount of data we have

File ~/micromamba/envs/newenv/lib/python3.12/site-packages/gpytorch/distributions/multivariate_normal.py:193, in MultivariateNormal.log_prob(self, value)
    191 # Get log determininant and first part of quadratic form
    192 covar = covar.evaluate_kernel()
--> 193 inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)
    195 res = -0.5 * sum([inv_quad, logdet, diff.size(-1) * math.log(2 * math.pi)])
    196 return res

File ~/micromamba/envs/newenv/lib/python3.12/site-packages/linear_operator/operators/_linear_operator.py:1709, in LinearOperator.inv_quad_logdet(self, inv_quad_rhs, logdet, reduce_inv_quad)
   1707 will_need_cholesky = False
   1708 if will_need_cholesky:
-> 1709     cholesky = CholLinearOperator(TriangularLinearOperator(self.cholesky()))
   1710     return cholesky.inv_quad_logdet(
   1711         inv_quad_rhs=inv_quad_rhs,
   1712         logdet=logdet,
   1713         reduce_inv_quad=reduce_inv_quad,
   1714     )
   1716 # Short circuit to inv_quad function if we're not computing logdet

File ~/micromamba/envs/newenv/lib/python3.12/site-packages/linear_operator/operators/_linear_operator.py:1311, in LinearOperator.cholesky(self, upper)
   1301 @_implements(torch.linalg.cholesky)
   1302 def cholesky(
   1303     self: Float[LinearOperator, "*batch N N"], upper: bool = False
   1304 ) -> Float[LinearOperator, "*batch N N"]:  # returns TriangularLinearOperator
   1305     """
   1306     Cholesky-factorizes the LinearOperator.
   1307
   1308     :param upper: Upper triangular or lower triangular factor (default: False).
   1309     :return: Cholesky factor (lower or upper triangular)
   1310     """
-> 1311 chol = self._cholesky(upper=False)
   1312 if upper:
   1313     chol = chol._transpose_nonbatch()

File ~/micromamba/envs/newenv/lib/python3.12/site-packages/linear_operator/utils/memoize.py:59, in _cached.<locals>.g(self, *args, **kwargs)
     57 kwargs_pkl = pickle.dumps(kwargs)
     58 if not _is_in_cache(self, cache_name, *args, kwargs_pkl=kwargs_pkl):
---> 59     return _add_to_cache(self, cache_name, method(self, *args, **kwargs), *args, kwargs_pkl=kwargs_pkl)
     60 return _get_from_cache(self, cache_name, *args, kwargs_pkl=kwargs_pkl)

File ~/micromamba/envs/newenv/lib/python3.12/site-packages/linear_operator/operators/_linear_operator.py:521, in LinearOperator._cholesky(self, upper)
    518 if any(isinstance(sub_mat, KeOpsLinearOperator) for sub_mat in evaluated_kern_mat._args):
    519     raise RuntimeError("Cannot run Cholesky with KeOps: it will either be really slow or not work.")
--> 521 evaluated_mat = evaluated_kern_mat.to_dense()
    523 # if the tensor is a scalar, we can just take the square root
    524 if evaluated_mat.size(-1) == 1:

File ~/micromamba/envs/newenv/lib/python3.12/site-packages/linear_operator/utils/memoize.py:59, in _cached.<locals>.g(self, *args, **kwargs)
     57 kwargs_pkl = pickle.dumps(kwargs)
     58 if not _is_in_cache(self, cache_name, *args, kwargs_pkl=kwargs_pkl):
---> 59     return _add_to_cache(self, cache_name, method(self, *args, **kwargs), *args, kwargs_pkl=kwargs_pkl)
     60 return _get_from_cache(self, cache_name, *args, kwargs_pkl=kwargs_pkl)

File ~/micromamba/envs/newenv/lib/python3.12/site-packages/linear_operator/operators/sum_linear_operator.py:81, in SumLinearOperator.to_dense(self)
     79 @cached
     80 def to_dense(self: Float[LinearOperator, "*batch M N"]) -> Float[Tensor, "*batch M N"]:
---> 81     return (sum(linear_op.to_dense() for linear_op in self.linear_ops)).contiguous()

File ~/micromamba/envs/newenv/lib/python3.12/site-packages/linear_operator/operators/sum_linear_operator.py:81, in <genexpr>(.0)
     79 @cached
     80 def to_dense(self: Float[LinearOperator, "*batch M N"]) -> Float[Tensor, "*batch M N"]:
---> 81     return (sum(linear_op.to_dense() for linear_op in self.linear_ops)).contiguous()

File ~/micromamba/envs/newenv/lib/python3.12/site-packages/linear_operator/operators/cat_linear_operator.py:384, in CatLinearOperator.to_dense(self)
    383 def to_dense(self: Float[LinearOperator, "*batch M N"]) -> Float[Tensor, "*batch M N"]:
--> 384     return torch.cat([to_dense(L) for L in self.linear_ops], dim=self.cat_dim)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)
```
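The final frame suggests the failure reduces to a plain `torch.cat` over blocks left on different devices: `CatLinearOperator.to_dense()` concatenates the per-GPU kernel pieces without first moving them to a common device. A tiny sketch (my illustration, not code from the notebook; assumes at least two visible GPUs) reproduces the same `RuntimeError`:

```python
import torch

# Two blocks on different devices, mirroring what CatLinearOperator holds
a = torch.zeros(2, 2, device="cuda:0")
b = torch.zeros(2, 2, device="cuda:1")

torch.cat([a, b], dim=0)               # raises the RuntimeError above
torch.cat([a, b.to(a.device)], dim=0)  # works once the blocks share a device
```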
Expected Behavior
The example should run successfully on multiple GPUs.
System information
- GPyTorch version: 1.12
- PyTorch version: 2.4.0+cu121
- OS: Ubuntu 22.04.3 LTS
- Hardware: 8× Tesla V100 (16 GB) GPU node provisioned via Lambda Labs, with 92 CPU cores, 460.1 GB RAM, and a 6.5 TB SSD
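For completeness, the versions above can be confirmed from the same environment, e.g.:

```python
import torch
import gpytorch

print(gpytorch.__version__)       # 1.12
print(torch.__version__)          # 2.4.0+cu121
print(torch.cuda.device_count())  # 8 on this node
```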
Additional context
Here is the output of `nvidia-smi`, in case GPUs or drivers are relevant:

```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-SXM2-16GB On | 00000000:00:04.0 Off | 0 |
| N/A 39C P0 55W / 300W | 1033MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2-16GB On | 00000000:00:05.0 Off | 0 |
| N/A 40C P0 59W / 300W | 1061MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2-16GB On | 00000000:00:06.0 Off | 0 |
| N/A 44C P0 62W / 300W | 965MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2-16GB On | 00000000:00:07.0 Off | 0 |
| N/A 38C P0 55W / 300W | 969MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2-16GB On | 00000000:00:08.0 Off | 0 |
| N/A 40C P0 55W / 300W | 1041MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2-16GB On | 00000000:00:09.0 Off | 0 |
| N/A 43C P0 55W / 300W | 1057MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2-16GB On | 00000000:00:0A.0 Off | 0 |
| N/A 43C P0 56W / 300W | 1009MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2-16GB On | 00000000:00:0B.0 Off | 0 |
| N/A 41C P0 59W / 300W | 985MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 6876 C ...u/micromamba/envs/newenv/bin/python 1030MiB |
| 1 N/A N/A 6876 C ...u/micromamba/envs/newenv/bin/python 1058MiB |
| 2 N/A N/A 6876 C ...u/micromamba/envs/newenv/bin/python 962MiB |
| 3 N/A N/A 6876 C ...u/micromamba/envs/newenv/bin/python 966MiB |
| 4 N/A N/A 6876 C ...u/micromamba/envs/newenv/bin/python 1038MiB |
| 5 N/A N/A 6876 C ...u/micromamba/envs/newenv/bin/python 1054MiB |
| 6 N/A N/A 6876 C ...u/micromamba/envs/newenv/bin/python 1006MiB |
| 7 N/A N/A 6876 C ...u/micromamba/envs/newenv/bin/python 982MiB |
+---------------------------------------------------------------------------------------+
```