Commit

Bugs fixed for multi-GPU support
JaggedTensor slicing and indexing moved to CUDA kernels
Small updates to basic_concepts docs and README

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
swahtz committed Sep 2, 2024
1 parent a108bd6 commit 3ba3e63
Showing 20 changed files with 1,062 additions and 378 deletions.
4 changes: 2 additions & 2 deletions fvdb/README.md
@@ -19,7 +19,7 @@ Lastly, our [documentation](docs) provides deeper details on the concepts as wel

## Installing *f*VDB

fVDB is provided as an installable python package from *[todo: insert package distributor]*. We provide pre-built packages of the latest *f*VDB version for the following dependent library configurations:
fVDB is provided as an installable python package from conda. We provide pre-built packages of the latest *f*VDB version for the following dependent library configurations:

| PyTorch | Python | CUDA |
| -------------- | ---------- | ------- |
@@ -34,7 +34,7 @@ fVDB is provided as an installable python package from *[todo: insert package di
Use the following command to install `fvdb` into your environment.

```bash
conda install -c jswartz fvdb
conda install [TBD]
```

If you intend to use our learning material such as the [notebooks](notebooks) or [examples](examples), we recommend you start from the `fvdb_learn` conda environment which contains all the dependencies needed to run the learning material as well as build *f*VDB from source. To create this environment, run the following commands from the root of this repository:
16 changes: 5 additions & 11 deletions fvdb/ci/Dockerfile.runner
@@ -3,12 +3,9 @@ ARG CUDNN_VERSION=8

FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu20.04

ENV PATH /usr/local/cuda/bin:$PATH
ENV LD_LIBRARY_PATH /usr/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib:${LD_LIBRARY_PATH}

# # nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,graphics
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics

RUN echo "Acquire { https::Verify-Peer false }" > /etc/apt/apt.conf.d/99verify-peer.conf \
&& if [ -f /etc/apt/sources.list.d/cuda.list ]; then \
@@ -26,8 +23,6 @@ RUN echo "Acquire { https::Verify-Peer false }" > /etc/apt/apt.conf.d/99verify-p
git \
unzip \
gfortran \
libopenblas-dev \
liblapack-dev \
ssh \
rsync \
iputils-ping \
@@ -37,15 +32,14 @@ RUN echo "Acquire { https::Verify-Peer false }" > /etc/apt/apt.conf.d/99verify-p
WORKDIR /tmp
RUN mkdir actions-runner && \
cd actions-runner && \
curl -o actions-runner-linux-x64-2.316.0.tar.gz -L https://github.com/actions/runner/releases/download/v2.316.0/actions-runner-linux-x64-2.316.0.tar.gz && \
tar xzf ./actions-runner-linux-x64-2.316.0.tar.gz && \
curl -o actions-runner-linux-x64-2.319.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.319.1/actions-runner-linux-x64-2.319.1.tar.gz && \
tar xzf ./actions-runner-linux-x64-2.319.1.tar.gz && \
DEBIAN_FRONTEND=noninteractive ./bin/installdependencies.sh && \
rm actions-runner-linux-x64-2.316.0.tar.gz
rm actions-runner-linux-x64-2.319.1.tar.gz

# used for cross-compilation in docker build
ENV FORCE_CUDA=1
ENV RUNNER_ALLOW_RUNASROOT=1
ENV TORCH_CUDA_ARCH_LIST "6.1;7.0;7.5;8.0;8.6+PTX"

# Install AWS CLI
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
32 changes: 16 additions & 16 deletions fvdb/docs/tutorials/basic_concepts.md
@@ -22,11 +22,11 @@ Every operation in fVDB is built upon this kind of query (e.g. Sparse Convolutio

Each grid in a `GridBatch` can have a different number of voxels (e.g., in the minibatch of four cars above, each car has a different number of voxels). This means that, unlike the dense case, fVDB needs to handle parallel operations over ***jagged batches***, i.e., batches containing different numbers of elements.

To handle jagged batches, fVDB provides a `JaggedTensor` class. Conceptually, a `JaggedTensor` is a list of tensors with shapes $[N_0, *], [N_1, *], \ldots, [N_B, *]$ where $B$ is the number of elements in the batch, $N_i$ is the number of elements in the $i^\text{th}$ batch item and $*$ is an arbitrary numer of additional dimensions that all match between the tensors. The figure below illustrates such a list of tensors pictorially.
To handle jagged batches, fVDB provides a `JaggedTensor` class. Conceptually, a `JaggedTensor` is a list of tensors with shapes $[N_0, *], [N_1, *], \ldots, [N_{B-1}, *]$ where $B$ is the number of elements in the batch, $N_i$ is the number of elements in the $i^\text{th}$ batch item and $*$ is an arbitrary number of additional dimensions that all match between the tensors. The figure below illustrates such a list of tensors pictorially.

![jaggedtensor1.png](../imgs/fig/jaggedtensor1.png)

In practice, `JaggedTensor`s are represented in memory by concatenating each tensor in the list into a single `jdata` (for Jagged Data) tensor of shape $[N_0 + N_1 + \ldots + N_B, *]$. Additionally, each `JaggedTensor` stores an additional `jidx` tensor (for Jagged Indexes) of shape $[N_0 + N_1 + \ldots + N_B]$ containing one int per element in the jagged tensor. `jidx[i]` is the batch index of the $i^\text{th}$ element of `jdata`. Finally, a `JaggedTensor` contains a `joffsets` tensor (for Jagged Offsets) of shape $[B, 2]$ which indicates the start and end positions of the $i^\text{th}$ tensor in the batch.
In practice, `JaggedTensor`s are represented in memory by concatenating each tensor in the list into a single `jdata` (for Jagged Data) tensor of shape $[N_0 + N_1 + \ldots + N_{B-1}, *]$. Additionally, each `JaggedTensor` stores an additional `jidx` tensor (for Jagged Indexes) of shape $[N_0 + N_1 + \ldots + N_{B-1}]$ containing one int per element in the jagged tensor. `jidx[i]` is the batch index of the $i^\text{th}$ element of `jdata`. Finally, a `JaggedTensor` contains a `joffsets` tensor (for Jagged Offsets) of shape $[B, 2]$ which indicates the start and end positions of the $i^\text{th}$ tensor in the batch.

![jaggedtensor4.png](../imgs/fig/jaggedtensor4.png)
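
The memory layout described above can be sketched without fVDB at all. This is a minimal illustration using plain Python lists, not the library's implementation: `jagged_layout` is a hypothetical helper, and treating each `joffsets` row as a half-open `(start, end)` range is an assumption, since the text only says "start and end positions".

```python
def jagged_layout(batch):
    """Flatten a list of per-item row lists into (jdata, jidx, joffsets).

    batch[i] holds the N_i rows belonging to batch item i.
    """
    jdata, jidx, joffsets = [], [], []
    for batch_index, rows in enumerate(batch):
        start = len(jdata)
        jdata.extend(rows)                       # concatenate all rows
        jidx.extend([batch_index] * len(rows))   # one batch index per row
        joffsets.append((start, len(jdata)))     # assumed half-open [start, end)
    return jdata, jidx, joffsets


# Three batch items with 2, 1 and 3 points respectively.
batch = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[2.0, 2.0, 2.0]],
    [[3.0, 0.0, 0.0], [3.0, 1.0, 0.0], [3.0, 2.0, 0.0]],
]
jdata, jidx, joffsets = jagged_layout(batch)
print(len(jdata))   # 6
print(jidx)         # [0, 0, 1, 2, 2, 2]
print(joffsets)     # [(0, 2), (2, 3), (3, 6)]
```

Slicing `jdata[start:end]` with a row of `joffsets` recovers the original rows of that batch item, which is what makes per-item indexing cheap on the concatenated storage.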

@@ -36,6 +36,8 @@ Similarly, each `GridBatch` also has `jidx` and `joffsets` corresponding to the

To illustrate the use of `GridBatch` and `JaggedTensor`, consider a simple example where we build a grid from a point cloud, splat some values onto the voxels of that grid, and then sample them again using a different set of points.

First, we construct a minibatch of grids using the input points. These input points have corresponding color attributes.

```python
import fvdb
import torch
@@ -48,7 +50,7 @@ pts2, clrs2 = pcu.load_mesh_vn("points2.ply")
pts1, clrs1 = torch.from_numpy(pts1).cuda(), torch.from_numpy(clrs1).cuda()
pts2, clrs2 = torch.from_numpy(pts2).cuda(), torch.from_numpy(clrs2).cuda()

# JaggedTensors of points and normals
# Creating JaggedTensors: one for points and one for colors
points = fvdb.JaggedTensor([pts1, pts2])
colors = fvdb.JaggedTensor([clrs1, clrs2])

@@ -60,29 +62,27 @@ print(points[0].jdata.shape)
print(points[1].jdata.shape)
```

![We construct a minibatch of grids using the input points. These input points have corresponding color attributes](../imgs/fig/screenshot_000000.png.trim.png)
![Minibatch of grids constructed from the input points. These input points have corresponding color attributes.](../imgs/fig/screenshot_000000.png.trim.png)

We construct a minibatch of grids using the input points. These input points have corresponding color attributes
Next, we splat the colors at the points to the constructed grid, yielding per-voxel colors.

```python
# Splat the normals into the grid with trilinear interpolation
# vox_normals is a JaggedTensor of per-voxel normas
# Splat the colors into the grid with trilinear interpolation
# vox_colors is a JaggedTensor of per-voxel colors
vox_colors = grid.splat_trilinear(points, colors)
```

![We then splat the colors at the points to the constructed grid, yielding per-voxel colors.](../imgs/fig/screenshot_000006.png.trim.png)
![Colors splat at the input points to grid, yielding per-voxel colors.](../imgs/fig/screenshot_000006.png.trim.png)

We then splat the colors at the points to the constructed grid, yielding per-voxel colors.
Finally, we generate a new set of noisy points and sample the grid to recover colors at those new samples.

```python
# Now let's generate some random points and sample the grid at those points
samples = fvdb.JaggedTensor([torch.rand(10_000, 3), torch.rand(11_000, 3)]).cuda()
sample_points = fvdb.JaggedTensor([torch.rand(10_000, 3), torch.rand(11_000, 3)]).cuda()

# sampled_normals is a JaggedTensor with the same shape as samples with
# one normal sampled from the grid at each point in samples
sampled_normals = grid.sample_trilinear(samples)
# sampled_colors is a JaggedTensor with the same shape as sample_points with
# one color sampled from the grid at each point
sampled_colors = grid.sample_trilinear(sample_points, vox_colors)
```

![We now generate a new set of noisy points and sample the grid colors to recover colors at those new samples.](../imgs/fig/screenshot_000004.png.trim.png)

We now generate a new set of noisy points and sample the grid colors to recover colors at those new samples.
![Colors resampled at random locations from the grid.](../imgs/fig/screenshot_000004.png.trim.png)
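
The splat-then-sample round trip above can be illustrated with a one-dimensional analogue. This is a sketch under stated assumptions, not fVDB's implementation: `splat_linear` and `sample_linear` are hypothetical helpers, the grid is a dense Python list rather than a sparse `GridBatch`, and accumulated weights are not normalized (whether `splat_trilinear` normalizes is not stated in the text above). Trilinear interpolation applies the same two-neighbor weighting along each of the three axes.

```python
import math


def splat_linear(points, values, num_voxels):
    """Distribute each value onto its two neighboring voxels by linear weights.

    Assumes 0 <= p <= num_voxels - 1 for every point p.
    """
    grid = [0.0] * num_voxels
    for p, v in zip(points, values):
        i0 = math.floor(p)
        w = p - i0                      # fractional distance to the left voxel
        grid[i0] += (1.0 - w) * v
        if w > 0.0:                     # skip the right neighbor when p sits on a voxel
            grid[i0 + 1] += w * v
    return grid


def sample_linear(grid, p):
    """Linearly interpolate the grid at position p (same range assumption)."""
    i0 = math.floor(p)
    w = p - i0
    right = grid[i0 + 1] if w > 0.0 else 0.0
    return (1.0 - w) * grid[i0] + w * right


grid = splat_linear([0.25], [4.0], num_voxels=2)
print(grid)                       # [3.0, 1.0]
print(sample_linear(grid, 0.25))  # 2.5
```

Sampling at the original point returns 2.5 rather than the splatted 4.0, because splatting spreads the value across neighbors and sampling blends them again; this is why resampled attributes look smoothed compared with the inputs.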
8 changes: 7 additions & 1 deletion fvdb/fvdb/nn/modules.py
@@ -2,14 +2,15 @@
# SPDX-License-Identifier: MPL-2.0
#
import math
from typing import Optional, Union, List, Sequence
from typing import List, Optional, Sequence, Union

import torch
import torch.nn as nn
from torch.profiler import record_function

import fvdb
from fvdb import GridBatch, JaggedTensor

from .vdbtensor import VDBTensor


@@ -267,6 +268,11 @@ def _dispatch_conv(self, in_feature, in_grid, in_kmap, out_grid):

backend = self.backend

if self.allow_tf32 and self.weight.is_cuda:
assert (
torch.cuda.get_device_capability()[0] >= 8
), "TF32 requires GPU with compute capability >= 8.0. Please set fvdb.nn.SparseConv3d.allow_tf32 = False."

if backend == "cutlass" and (
(not self.weight.is_cuda) or (self.in_channels, self.out_channels) not in self.CUTLASS_SUPPORTED_CHANNELS
):
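
The TF32 guard added in this hunk can be sketched in isolation. This is a minimal sketch, not the fVDB code: `tf32_supported` and `require_tf32` are hypothetical names, and the capability tuple is passed in explicitly so the logic can be exercised without a GPU. With PyTorch the tuple would come from `torch.cuda.get_device_capability`, which accepts an optional device argument; querying the weight's own device rather than the current device is an assumption about what a multi-GPU setup would want.

```python
def tf32_supported(capability):
    """Return True if a (major, minor) compute capability is at least 8.0."""
    major, _minor = capability
    return major >= 8


def require_tf32(allow_tf32, is_cuda, capability):
    """Raise if TF32 is requested for a CUDA tensor on a GPU older than 8.0."""
    if allow_tf32 and is_cuda and not tf32_supported(capability):
        raise RuntimeError(
            "TF32 requires a GPU with compute capability >= 8.0; "
            f"got {capability[0]}.{capability[1]}. Disable allow_tf32."
        )


# With PyTorch, the tuple could be obtained as, for example:
#   capability = torch.cuda.get_device_capability(weight.device)
require_tf32(allow_tf32=True, is_cuda=True, capability=(8, 6))   # passes
require_tf32(allow_tf32=True, is_cuda=False, capability=(7, 5))  # CPU tensor: check skipped
```

Raising an exception instead of using `assert` is a design choice in this sketch: assertions are stripped when Python runs with `-O`, so an explicit raise keeps the guard active in optimized runs.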
2 changes: 2 additions & 0 deletions fvdb/setup.py
@@ -265,6 +265,8 @@ def download_and_install_cudnn():
"--extended-lambda",
"--diag-suppress=186",
"-diag-suppress=3189",
"-Xfatbin",
"-compress-all",
]
user_nvcc_flags = os.getenv("NVCC_FLAGS", "").split()
nvcc_flags += user_nvcc_flags