Bugs fixed for multi-GPU support

JaggedTensor slicing and indexing moved to CUDA kernels
Small updates to basic_concepts docs and README

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
AcademySoftwareFoundation · Sep 2, 2024 · 3ba3e63
1 parent a108bd6
commit 3ba3e63

Showing 20 changed files with 1,062 additions and 378 deletions.
diff --git a/fvdb/README.md b/fvdb/README.md
@@ -19,7 +19,7 @@ Lastly, our [documentation](docs) provides deeper details on the concepts as wel
 
 ## Installing *f*VDB
 
-fVDB is provided as an installable python package from *[todo: insert package distributor]*.  We provide pre-built packages of the latest *f*VDB version for the following dependent library configurations:
+fVDB is provided as an installable python package from conda.  We provide pre-built packages of the latest *f*VDB version for the following dependent library configurations:
 
 |   PyTorch      | Python     | CUDA |
 | -------------- | ---------- | ------- |
@@ -34,7 +34,7 @@ fVDB is provided as an installable python package from *[todo: insert package di
 Use the following command to install `fvdb` into your environment.
 
 ```bash
-conda install -c jswartz fvdb
+conda install [TBD]
 ```
 
 If you intend to use our learning material such as the [notebooks](notebooks) or [examples](examples), we recommend you start from the `fvdb_learn` conda environment which contains all the dependencies needed to run the learning material as well as build *f*VDB from source. To create this environment, run the following commands from the root of this repository:

diff --git a/fvdb/ci/Dockerfile.runner b/fvdb/ci/Dockerfile.runner
@@ -3,12 +3,9 @@ ARG CUDNN_VERSION=8
 
 FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu20.04
 
-ENV PATH /usr/local/cuda/bin:$PATH
-ENV LD_LIBRARY_PATH /usr/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib:${LD_LIBRARY_PATH}
-
 # # nvidia-container-runtime
-ENV NVIDIA_VISIBLE_DEVICES all
-ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,graphics
+ENV NVIDIA_VISIBLE_DEVICES=all
+ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics
 
 RUN echo "Acquire { https::Verify-Peer false }" > /etc/apt/apt.conf.d/99verify-peer.conf \
     && if [ -f /etc/apt/sources.list.d/cuda.list ]; then \
@@ -26,8 +23,6 @@ RUN echo "Acquire { https::Verify-Peer false }" > /etc/apt/apt.conf.d/99verify-p
          git \
          unzip \
          gfortran \
-         libopenblas-dev \
-         liblapack-dev \
          ssh \
          rsync \
          iputils-ping \
@@ -37,15 +32,14 @@ RUN echo "Acquire { https::Verify-Peer false }" > /etc/apt/apt.conf.d/99verify-p
 WORKDIR /tmp
 RUN mkdir actions-runner && \
     cd actions-runner && \
-    curl -o actions-runner-linux-x64-2.316.0.tar.gz -L https://github.com/actions/runner/releases/download/v2.316.0/actions-runner-linux-x64-2.316.0.tar.gz && \
-    tar xzf ./actions-runner-linux-x64-2.316.0.tar.gz && \
+    curl -o actions-runner-linux-x64-2.319.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.319.1/actions-runner-linux-x64-2.319.1.tar.gz && \
+    tar xzf ./actions-runner-linux-x64-2.319.1.tar.gz && \
     DEBIAN_FRONTEND=noninteractive ./bin/installdependencies.sh && \
-    rm actions-runner-linux-x64-2.316.0.tar.gz
+    rm actions-runner-linux-x64-2.319.1.tar.gz
 
 # used for cross-compilation in docker build
 ENV FORCE_CUDA=1
 ENV RUNNER_ALLOW_RUNASROOT=1
-ENV TORCH_CUDA_ARCH_LIST "6.1;7.0;7.5;8.0;8.6+PTX"
 
 # Install AWS CLI
 RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \

diff --git a/fvdb/docs/tutorials/basic_concepts.md b/fvdb/docs/tutorials/basic_concepts.md
@@ -22,11 +22,11 @@ Every operation in fVDB is built upon this kind of query (e.g. Sparse Convolutio
 
 Each grid in a `GridBatch` can have a different number of voxels (****e.g.**** in the mini batch of four cars above, each car has a different number of voxels). This means that unlike the dense case, fVDB needs to handle parallel operations over ***jagged batches***. I.e. batches containing different numbers of elements.
 
-To handle jagged batches, fVDB provides a `JaggedTensor` class. Conceptually, a `JaggedTensor` is a list of tensors with shapes $[N_0, *], [N_1, *], \ldots, [N_B, *]$ where $B$ is the number of elements in the batch, $N_i$ is the number of elements in the $i^\text{th}$ batch item and $*$ is an arbitrary numer of additional dimensions that all match between the tensors. The figure below illustrates such a list of tensors pictorially.
+To handle jagged batches, fVDB provides a `JaggedTensor` class. Conceptually, a `JaggedTensor` is a list of tensors with shapes $[N_0, *], [N_1, *], \ldots, [N_{B-1}, *]$ where $B$ is the number of elements in the batch, $N_i$ is the number of elements in the $i^\text{th}$ batch item and $*$ is an arbitrary numer of additional dimensions that all match between the tensors. The figure below illustrates such a list of tensors pictorially.
 
 ![jaggedtensor1.png](../imgs/fig/jaggedtensor1.png)
 
-In practice, `JaggedTensor`s are represented in memory by concatenating each tensor in the list into a single `jdata` (for Jagged Data) tensor of shape $[N_0 + N_1 + \ldots + N_B, *]$. Additionally, each `JaggedTensor` stores an additional `jidx` tensor (for Jagged Indexes) of shape $[N_0 + N_1 + \ldots + N_B]$ containing one int per element in the jagged tensor. `jidx[i]` is the batch index of the $i^\text{th}$ element of `jdata`. Finally, a `JaggedTensor` contains a `joffsets` tensor (for Jagged Offsets) of shape $[B, 2]$ which indicates the start and end positions of the $i^\text{th}$ tensor in the batch.
+In practice, `JaggedTensor`s are represented in memory by concatenating each tensor in the list into a single `jdata` (for Jagged Data) tensor of shape $[N_0 + N_1 + \ldots + N_{B-1}, *]$. Additionally, each `JaggedTensor` stores an additional `jidx` tensor (for Jagged Indexes) of shape $[N_0 + N_1 + \ldots + N_{B-1}]$ containing one int per element in the jagged tensor. `jidx[i]` is the batch index of the $i^\text{th}$ element of `jdata`. Finally, a `JaggedTensor` contains a `joffsets` tensor (for Jagged Offsets) of shape $[B, 2]$ which indicates the start and end positions of the $i^\text{th}$ tensor in the batch.
 
 ![jaggedtensor4.png](../imgs/fig/jaggedtensor4.png)
 
@@ -36,6 +36,8 @@ Similarly, each `GridBatch` also has `jidx` and `joffsets` corresponding to the
 
 To illustrate the use of `GridBatch`and `JaggedTensor`, consider a simple example where we build a grid from a point cloud, splat some values onto the voxels of that grid, and then sample them again using a different set of points.
 
+First, we construct a minibatch of grids using the input points. These input points have corresponding color attributes.
+
 ```python
 import fvdb
 import torch
@@ -48,7 +50,7 @@ pts2, clrs2 = pcu.load_mesh_vn("points2.ply")
 pts1, clrs1 = torch.from_numpy(pts1).cuda(), torch.from_numpy(clrs1).cuda()
 pts2, clrs2 = torch.from_numpy(pts2).cuda(), torch.from_numpy(clrs2).cuda()
 
-# JaggedTensors of points and normals
+# Creating JaggedTensors: one for points and one for colors
 points = fvdb.JaggedTensor([pts1, pts2])
 colors = fvdb.JaggedTensor([clrs1, clrs2])
 
@@ -60,29 +62,27 @@ print(points[0].jdata.shape)
 print(points[1].jdata.shape)
 ```
 
-![We construct a minibatch of grids using the input points. These input points have corresponding color attributes](../imgs/fig/screenshot_000000.png.trim.png)
+![Minibatch of grids constructed from the input points. These input points have corresponding color attributes.](../imgs/fig/screenshot_000000.png.trim.png)
 
-We construct a minibatch of grids using the input points. These input points have corresponding color attributes
+Next, we splat the colors at the points to the constructed grid, yielding per-voxel colors.
 
 ```python
-# Splat the normals into the grid with trilinear interpolation
-# vox_normals is a JaggedTensor of per-voxel normas
+# Splat the colors into the grid with trilinear interpolation
+# vox_colors is a JaggedTensor of per-voxel normas
 vox_colors = grid.splat_trilinear(points, colors)
 ```
 
-![We then splat the colors at the points to the constructed grid, yielding per-voxel colors.](../imgs/fig/screenshot_000006.png.trim.png)
+![Colors splat at the input points to grid, yielding per-voxel colors.](../imgs/fig/screenshot_000006.png.trim.png)
 
-We then splat the colors at the points to the constructed grid, yielding per-voxel colors.
+Finally, we generate a new set of noisy points and sample the grid to recover colors at those new samples.
 
 ```python
 # Now let's generate some random points and sample the grid at those points
-samples = fvdb.JaggedTensor([torch.rand(10_000, 3), torch.rand(11_000, 3)]).cuda()
+sample_points = fvdb.JaggedTensor([torch.rand(10_000, 3), torch.rand(11_000, 3)]).cuda()
 
-# sampled_normals is a JaggedTensor with the same shape as samples with
-# one normal sampled from the grid at each point in samples
-sampled_normals = grid.sample_trilinear(samples)
+# sampled_colors is a JaggedTensor with the same shape as sample_points with
+# one color sampled from the grid at each point
+sampled_colors = grid.sample_trilinear(sample_points, vox_colors)
 ```
 
-![We now generate a new set of noisy points and sample the grid colors to recover colors at those new samples.](../imgs/fig/screenshot_000004.png.trim.png)
-
-We now generate a new set of noisy points and sample the grid colors to recover colors at those new samples.
+![Colors resampled at random locations from the grid.](../imgs/fig/screenshot_000004.png.trim.png)
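
The `jdata` / `jidx` / `joffsets` layout described in the `basic_concepts.md` hunk above can be illustrated with a minimal pure-Python sketch. This is not fVDB code: `pack_jagged` and `unpack_item` are hypothetical helpers, plain lists stand in for tensors, and the sketch assumes each `joffsets` row is a `(start, end)` pair with an exclusive end.

```python
from typing import List, Sequence, Tuple


def pack_jagged(
    batch: Sequence[Sequence[float]],
) -> Tuple[List[float], List[int], List[Tuple[int, int]]]:
    """Pack variable-length sequences into (jdata, jidx, joffsets).

    Hypothetical helper for illustration only; it mirrors the layout
    described in the docs, not the fVDB implementation.
    """
    jdata: List[float] = []               # all elements, concatenated
    jidx: List[int] = []                  # batch index of each element of jdata
    joffsets: List[Tuple[int, int]] = []  # (start, end) per batch item, end exclusive (assumed)
    for batch_index, item in enumerate(batch):
        start = len(jdata)
        jdata.extend(item)
        jidx.extend([batch_index] * len(item))
        joffsets.append((start, len(jdata)))
    return jdata, jidx, joffsets


def unpack_item(
    jdata: Sequence[float], joffsets: Sequence[Tuple[int, int]], i: int
) -> List[float]:
    """Recover the i-th batch item from the packed representation."""
    start, end = joffsets[i]
    return list(jdata[start:end])


# A batch of B = 3 items with N_0 = 3, N_1 = 1, N_2 = 2 elements.
batch = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
jdata, jidx, joffsets = pack_jagged(batch)
```

With this layout, `len(jdata)` equals $N_0 + N_1 + \ldots + N_{B-1}$, which is why the corrected subscript in the hunk stops at $B-1$ rather than $B$.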
diff --git a/fvdb/fvdb/nn/modules.py b/fvdb/fvdb/nn/modules.py
@@ -2,14 +2,15 @@
 # SPDX-License-Identifier: MPL-2.0
 #
 import math
-from typing import Optional, Union, List, Sequence
+from typing import List, Optional, Sequence, Union
 
 import torch
 import torch.nn as nn
 from torch.profiler import record_function
 
 import fvdb
 from fvdb import GridBatch, JaggedTensor
+
 from .vdbtensor import VDBTensor
 
 
@@ -267,6 +268,11 @@ def _dispatch_conv(self, in_feature, in_grid, in_kmap, out_grid):
 
         backend = self.backend
 
+        if self.allow_tf32 and self.weight.is_cuda:
+            assert (
+                torch.cuda.get_device_capability()[0] >= 8
+            ), "TF32 requires GPU with compute capability >= 8.0. Please set fvdb.nn.SparseConv3d.allow_tf32 = False."
+
         if backend == "cutlass" and (
             (not self.weight.is_cuda) or (self.in_channels, self.out_channels) not in self.CUTLASS_SUPPORTED_CHANNELS
         ):
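
The assert added in this hunk calls `torch.cuda.get_device_capability()` with no argument, which queries the current CUDA device. In a multi-GPU setup the weight may live on a different device, so a variant that takes the capability of the weight's own device might be preferable. The sketch below is an assumption about how such a check could look, not the code in this commit: `tf32_supported` and `check_tf32` are hypothetical names, and the capability tuple is passed in explicitly so the example runs without a GPU.

```python
from typing import Tuple

# Minimum major compute capability, taken from the assert in the diff.
TF32_MIN_MAJOR = 8


def tf32_supported(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability meets the TF32 threshold."""
    major, _minor = capability
    return major >= TF32_MIN_MAJOR


def check_tf32(allow_tf32: bool, weight_is_cuda: bool, capability: Tuple[int, int]) -> None:
    """Hypothetical check mirroring the assert, with the capability supplied by the caller.

    With PyTorch, the caller could obtain the tuple for the device that
    actually holds the weight via
    torch.cuda.get_device_capability(weight.device).
    """
    if allow_tf32 and weight_is_cuda and not tf32_supported(capability):
        raise RuntimeError(
            "TF32 requires GPU with compute capability >= 8.0. "
            "Please set fvdb.nn.SparseConv3d.allow_tf32 = False."
        )


# Capability (8, 6) passes; CPU weights are never checked.
check_tf32(allow_tf32=True, weight_is_cuda=True, capability=(8, 6))
check_tf32(allow_tf32=True, weight_is_cuda=False, capability=(7, 5))
```

Raising an exception instead of using `assert` is a design choice of this sketch: asserts are stripped when Python runs with `-O`, whereas an explicit raise always fires.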

diff --git a/fvdb/setup.py b/fvdb/setup.py
@@ -265,6 +265,8 @@ def download_and_install_cudnn():
         "--extended-lambda",
         "--diag-suppress=186",
         "-diag-suppress=3189",
+        "-Xfatbin",
+        "-compress-all",
     ]
     user_nvcc_flags = os.getenv("NVCC_FLAGS", "").split()
     nvcc_flags += user_nvcc_flags