Releases: pyg-team/pyg-lib
pyg-lib 0.4.0: PyTorch 2.2 support, distributed sampling, sparse softmax, edge-level temporal sampling
pyg-lib==0.4.0 brings PyTorch 2.2 support, distributed neighbor sampling, accelerated softmax operations, and edge-level temporal sampling support to PyG.
Highlights
PyTorch 2.2 Support
pyg-lib==0.4.0 is fully compatible with PyTorch 2.2 (#294). To install for PyTorch 2.2, simply run
pip install pyg-lib -f https://data.pyg.org/whl/torch-2.2.0+${CUDA}.html
where ${CUDA} should be replaced by either cpu, cu118 or cu121.
The following combinations are supported:
| PyTorch 2.2 | cpu | cu118 | cu121 |
|---|---|---|---|
| Linux | ✅ | ✅ | ✅ |
| macOS | ✅ | | |
Older PyTorch versions like PyTorch 1.12, 1.13, 2.0.0 and 2.1.0 are still supported, and can be installed as described in our README.md.
Distributed Sampling
pyg-lib==0.4.0 integrates all the low-level code for performing distributed neighbor sampling as part of torch_geometric.distributed in PyG 2.5 (#246, #252, #253, #254).
Sparse Softmax Implementation
pyg-lib==0.4.0 supports a fast sparse softmax_csr implementation based on CSR input representation (#264, #282):
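import torch
from pyg_lib.ops import softmax_csr

src = torch.randn(4, 4)
ptr = torch.tensor([0, 4])  # a single CSR segment spanning all 4 rows
out = softmax_csr(src, ptr)  # softmax computed within each segment of rows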
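The CSR semantics can be spelled out without PyTorch. The following is a pure-Python reference sketch, not pyg-lib's kernel; it assumes that softmax_csr normalizes over dimension 0, so that rows ptr[i] to ptr[i+1] - 1 form one group and each column is normalized independently within that group. The helper softmax_csr_reference is hypothetical.

```python
import math
from typing import List


def softmax_csr_reference(src: List[List[float]], ptr: List[int]) -> List[List[float]]:
    """Hypothetical reference: row segments given by ptr, softmax per column.

    Rows ptr[i] .. ptr[i+1]-1 form one segment; within each segment and for each
    column, values are exponentiated and divided by their segment sum.
    """
    num_cols = len(src[0]) if src else 0
    out = [[0.0] * num_cols for _ in src]
    for start, end in zip(ptr[:-1], ptr[1:]):
        for c in range(num_cols):
            col_vals = [src[r][c] for r in range(start, end)]
            if not col_vals:  # empty segment: nothing to normalize
                continue
            # Subtract the segment maximum for numerical stability.
            m = max(col_vals)
            exps = [math.exp(v - m) for v in col_vals]
            total = sum(exps)
            for offset, e in enumerate(exps):
                out[start + offset][c] = e / total
    return out


src = [[1.0, 0.0], [2.0, 0.0], [3.0, 5.0], [4.0, 1.0]]
ptr = [0, 2, 4]  # two segments: rows {0, 1} and rows {2, 3}
out = softmax_csr_reference(src, ptr)
```

Each column of each segment sums to one, so equal inputs within a segment (column 1 of rows 0 and 1) receive equal weights of 0.5.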
Edge-level Temporal Sampling
pyg-lib==0.4.0 brings edge-level temporal sampling support to PyG (#280). In particular, neighbor_sample and hetero_neighbor_sample now support the edge_time attribute, which restricts sampling to edges whose timestamp is lower than or equal to their corresponding seed_time.
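A minimal pure-Python sketch of that filtering rule for a single hop on a CSR graph: an edge is a candidate only if its edge_time is lower than or equal to the seed's seed_time, and candidates are then sampled uniformly without replacement. The function sample_one_hop_temporal and its arguments are hypothetical illustrations, not the signature or implementation of pyg-lib's neighbor_sample.

```python
import random
from typing import List, Tuple


def sample_one_hop_temporal(
    rowptr: List[int],
    col: List[int],
    edge_time: List[int],
    seeds: List[int],
    seed_time: List[int],
    num_neighbors: int,
    rng: random.Random,
) -> List[Tuple[int, int, int]]:
    """Hypothetical sketch: returns (seed, neighbor, edge_id) triples."""
    sampled = []
    for seed, t in zip(seeds, seed_time):
        # Candidate edges of `seed` whose timestamp does not exceed the seed time.
        candidates = [
            e for e in range(rowptr[seed], rowptr[seed + 1]) if edge_time[e] <= t
        ]
        k = min(num_neighbors, len(candidates))
        # Uniform sampling without replacement among the valid edges only.
        for e in rng.sample(candidates, k):
            sampled.append((seed, col[e], e))
    return sampled


# Node 0 has three outgoing edges (to nodes 1, 2, 3) with timestamps 1, 5 and 3.
rowptr = [0, 3, 3, 3, 3]
col = [1, 2, 3]
edge_time = [1, 5, 3]
out = sample_one_hop_temporal(rowptr, col, edge_time, [0], [3], 10, random.Random(0))
```

With seed_time = 3, the edge to node 2 (timestamp 5) is never a candidate, no matter how large num_neighbors is.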
Additional Features
- Added support for bfloat16 data type in segment_matmul and grouped_matmul on CPU (#272)
- Improved the runtime of biased sampling in neighbor_sample and hetero_neighbor_sample (#270)
Bugfixes
- Dropped the MKL code path in neighbor_sample and hetero_neighbor_sample with replace=False since it did not correctly prevent duplicates (#275)
- Fixed grouped_matmul in case input tensors are not contiguous (#290)
Full Changelog: 0.3.0...0.4.0
pyg-lib 0.3.1: Bugfixes
pyg-lib==0.3.1 includes a variety of bugfixes and improvements.
Bug Fixes
- Fixed an issue introduced in pyg-lib==0.3.0 in which the replace=False option was not correctly respected during neighbor_sample (#275)
- Fixed support for older GLIBC versions (#276)
Improvements
- Biased neighbor_sample has been made approximately twice as fast (#270)
- segment_matmul and grouped_matmul now support bfloat16 CPU tensors (#271)
Full Changelog: 0.3.0...0.3.1
pyg-lib 0.3.0: PyTorch 2.1 support, METIS partitioning, neighbor sampler improvements
pyg-lib==0.3.0 brings PyTorch 2.1 support, METIS partitioning and further neighbor sampling improvements to PyG.
Highlights
PyTorch 2.1 Support
pyg-lib==0.3.0 is fully compatible with PyTorch 2.1 (#256). To install for PyTorch 2.1, simply run
pip install pyg-lib -f https://data.pyg.org/whl/torch-2.1.0+${CUDA}.html
where ${CUDA} should be replaced by either cpu, cu118 or cu121.
The following combinations are supported:
| PyTorch 2.1 | cpu | cu118 | cu121 |
|---|---|---|---|
| Linux | ✅ | ✅ | ✅ |
| macOS | ✅ | | |
Older PyTorch versions like PyTorch 1.12, 1.13 and 2.0.0 are still supported, and can be installed as described in our README.md. PyTorch 1.11 support has been dropped.
METIS partitioning
pyg-lib==0.3.0 enables METIS partitioning by introducing pyg_lib.partition (#229).
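import torch
from pyg_lib.partition import metis
rowptr, col = torch.tensor([0, 2, 4, 6, 8]), torch.tensor([1, 3, 0, 2, 1, 3, 0, 2])  # CSR of a 4-node cycle
cluster = metis(rowptr, col, num_partitions=2)  # partition ID for each node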
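metis takes the graph in CSR form: rowptr holds num_nodes + 1 offsets into col, which lists the neighbors of each node. A minimal pure-Python sketch of building that representation from an edge list; edge_list_to_csr is a hypothetical helper, and it inserts each edge in both directions because METIS partitions undirected graphs.

```python
from typing import List, Tuple


def edge_list_to_csr(
    edges: List[Tuple[int, int]], num_nodes: int, undirected: bool = True
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: converts (src, dst) pairs into (rowptr, col)."""
    adj: List[List[int]] = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        if undirected and u != v:
            adj[v].append(u)  # mirror the edge so the adjacency is symmetric
    rowptr = [0]
    col: List[int] = []
    for neighbors in adj:
        col.extend(sorted(neighbors))  # neighbors of each node, in sorted order
        rowptr.append(len(col))  # offset where the next node's neighbors start
    return rowptr, col


# A 4-node cycle: 0-1-2-3-0.
rowptr, col = edge_list_to_csr([(0, 1), (1, 2), (2, 3), (3, 0)], num_nodes=4)
```

The neighbors of node v are col[rowptr[v]:rowptr[v + 1]], e.g. nodes 1 and 3 for node 0.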
Neighbor Sampling Improvements
pyg-lib==0.3.0 brings various improvements to our neighbor sampling routine:
- Support for biased/weighted sampling: pyg_lib.sampler.neighbor_sample and pyg_lib.sampler.hetero_neighbor_sample now support the additional edge_weight argument (#247, #251)
- pyg_lib.sampler.hetero_neighbor_sample now performs neighborhood sampling across edge types in parallel (#211)
- Added low-level support for distributed neighborhood sampling (#246, #252, #253, #254)
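With edge_weight, each neighbor is drawn with probability proportional to its weight. The sketch below shows one standard way to do this without replacement (sequential draws that renormalize over the remaining edges); weighted_sample_without_replacement is hypothetical, and pyg-lib may use a different algorithm internally.

```python
import random
from typing import List


def weighted_sample_without_replacement(
    candidates: List[int], weights: List[float], k: int, rng: random.Random
) -> List[int]:
    """Hypothetical sketch: draw up to k distinct candidates, proportional to weight."""
    # Edges with non-positive weight are never drawn in this sketch.
    pool = [(c, w) for c, w in zip(candidates, weights) if w > 0]
    chosen: List[int] = []
    while pool and len(chosen) < k:
        total = sum(w for _, w in pool)
        r = rng.random() * total
        acc = 0.0
        idx = len(pool) - 1  # fallback guards against floating-point round-off
        for i, (_, w) in enumerate(pool):
            acc += w
            if r < acc:
                idx = i
                break
        # Remove the drawn edge so that it cannot be selected twice.
        chosen.append(pool.pop(idx)[0])
    return chosen


rng = random.Random(42)
picked = weighted_sample_without_replacement([10, 11, 12], [1.0, 0.0, 3.0], k=3, rng=rng)
```

Here the zero-weight candidate 11 is skipped, so only two distinct neighbors come back even though k = 3. Over many single draws from [10, 12] with weights [1.0, 3.0], 12 is picked roughly 75% of the time.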
Additional Features
- Added dispatch for XPU device in index_sort (#243)
- Updated cutlass version for speed boosts in segment_matmul and grouped_matmul (#235)
Bugfixes
- Fixed vector-based mapping issue in Mapping (#244)
- Fixed performance issues reported by Coverity Tool (#240)
- Fixed TorchScript support in grouped_matmul (#220)
New Contributors
- @yaox12 made their first contribution in #213
- @yanbing-j made their first contribution in #231
- @akihironitta made their first contribution in #248
Full Changelog: 0.2.0...0.3.0
pyg-lib 0.2.0: PyTorch 2.0 support, sampled operations, and further accelerations
pyg-lib==0.2.0 brings PyTorch 2.0 support, sampled operations and further accelerations to PyG.
Highlights
PyTorch 2.0 Support
pyg-lib==0.2.0 is fully compatible with PyTorch 2.0. To install for PyTorch 2.0, simply run
pip install pyg-lib -f https://data.pyg.org/whl/torch-2.0.0+${CUDA}.html
where ${CUDA} should be replaced by either cpu, cu117 or cu118.
The following combinations are supported:
| PyTorch 2.0 | cpu | cu117 | cu118 |
|---|---|---|---|
| Linux | ✅ | ✅ | ✅ |
| macOS | ✅ | | |
Older PyTorch versions like PyTorch 1.11, 1.12 and 1.13 are still supported, and can be installed as described in our README.md.
Sampled Operations
We added support for sampled_op implementations (#156, #159, #160), which implement the scheme
out = left_tensor[left_index] (op) right_tensor[right_index]
efficiently, without materializing intermediate representations:
from pyg_lib.ops import sampled_add
edge_index = ...
row, col = edge_index
# Replace ...
out = x[row] + x[col]
# ... with
out = sampled_add(left=x, right=x, left_index=row, right_index=col)
Supported operations are sampled_add, sampled_sub, sampled_mul and sampled_div.
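For reference, each sampled operation computes out[i] = left[left_index[i]] (op) right[right_index[i]] row by row. The pure-Python sketch below states only those semantics; sampled_op_reference is a hypothetical helper, and it makes no attempt to reproduce the fused kernels that give the pyg-lib implementation its speed. Here operator.add, operator.sub, operator.mul and operator.truediv play the roles of sampled_add, sampled_sub, sampled_mul and sampled_div.

```python
import operator
from typing import Callable, List, Sequence

Row = List[float]


def sampled_op_reference(
    left: Sequence[Row],
    right: Sequence[Row],
    left_index: Sequence[int],
    right_index: Sequence[int],
    op: Callable[[float, float], float],
) -> List[Row]:
    """Hypothetical reference for out[i] = op(left[left_index[i]], right[right_index[i]])."""
    out = []
    for li, ri in zip(left_index, right_index):
        # Combine the two indexed rows element-wise; no gathered copy of
        # left or right is kept beyond the current output row.
        out.append([op(a, b) for a, b in zip(left[li], right[ri])])
    return out


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
row, col = [0, 1, 2], [1, 2, 0]  # edges 0->1, 1->2, 2->0
out_add = sampled_op_reference(x, x, row, col, operator.add)  # like x[row] + x[col]
out_mul = sampled_op_reference(x, x, row, col, operator.mul)  # like x[row] * x[col]
```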
Further Accelerations
- index_sort implements a (way) faster alternative to torch.sort() for sorting one-dimensional indices (#181, #192). This significantly reduces dataset loading times in PyG.
- Optimized segment_matmul and grouped_matmul CPU implementations via MKL BLAS gemm_batch (#146, #172).
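Node indices are bounded non-negative integers, which is what makes beating a general comparison sort possible. As an illustration, here is a stable counting sort in pure Python that, like torch.sort(), returns both the sorted values and the permutation, in O(n + max_value) time. counting_index_sort is hypothetical, and this is not a description of how index_sort is implemented.

```python
from typing import List, Optional, Tuple


def counting_index_sort(
    inputs: List[int], max_value: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Hypothetical sketch: stable counting sort for indices in [0, max_value)."""
    if max_value is None:
        max_value = max(inputs) + 1 if inputs else 0
    # Histogram of each index value.
    counts = [0] * max_value
    for v in inputs:
        counts[v] += 1
    # Exclusive prefix sum: first output position of each value.
    offsets = [0] * max_value
    running = 0
    for v in range(max_value):
        offsets[v] = running
        running += counts[v]
    # Scatter input positions in input order, which keeps the sort stable.
    perm = [0] * len(inputs)
    for i, v in enumerate(inputs):
        perm[offsets[v]] = i
        offsets[v] += 1
    values = [inputs[i] for i in perm]
    return values, perm


values, perm = counting_index_sort([3, 1, 2, 1, 0])
```

Because ties keep their input order, the two occurrences of 1 (positions 1 and 3) appear in that order in perm.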
Breaking Changes
- Temporal neighbor_sample and hetero_neighbor_sample will now sample nodes with the same or smaller timestamp than the seed node (changed from only sampling nodes with a smaller timestamp) (#187)
Full Changelog
Added
- Added PyTorch 2.0 support (#214)
- neighbor_sample routines now also return information about the number of sampled nodes/edges per layer (#197)
- Added index_sort implementation (#181, #192)
- Added triton>=2.0 support (#171)
- Added bias term to grouped_matmul and segment_matmul (#161)
- Added sampled_op implementation (#156, #159, #160)
Changed
- Sample the nodes with the same timestamp as seed nodes (#187)
- Added write-csv (saves benchmark results as csv file) and libraries (determines which libraries will be used in benchmark) parameters (#167)
- Enable benchmarking of neighbor sampler on temporal graphs (#165)
- Improved [segment|grouped]_matmul CPU implementation via at::matmul_out and MKL BLAS gemm_batch (#146, #172)
Full commit list: 0.1.0...0.2.0
pyg-lib 0.1.0: Optimized neighborhood sampling and heterogeneous GNN acceleration
We are proud to release pyg-lib==0.1.0, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG.
Extensive documentation is available online. Once pyg-lib is installed, PyG picks it up automatically (e.g., during neighborhood sampling or heterogeneous GNN execution) and uses it to accelerate its computation.
Installation
You can install pyg-lib as described in our README.md:
pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
where
- ${TORCH} should be replaced by either 1.11.0, 1.12.0 or 1.13.0
- ${CUDA} should be replaced by either cpu, cu102, cu113, cu115, cu116 or cu117
The following combinations are supported:
| PyTorch 1.13 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
|---|---|---|---|---|---|---|
| Linux | ✅ | ✅ | ✅ | | | |
| Windows | | | | | | |
| macOS | ✅ | | | | | |

| PyTorch 1.12 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
|---|---|---|---|---|---|---|
| Linux | ✅ | ✅ | ✅ | ✅ | | |
| Windows | | | | | | |
| macOS | ✅ | | | | | |

| PyTorch 1.11 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
|---|---|---|---|---|---|---|
| Linux | ✅ | ✅ | ✅ | ✅ | | |
| Windows | | | | | | |
| macOS | ✅ | | | | | |
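The ${TORCH} and ${CUDA} placeholders in the install command can also be filled in programmatically. A small sketch, restricted to the version tags listed above; pyg_lib_wheel_url is a hypothetical convenience, not part of pyg-lib, and it does not check whether a given pair is actually marked as supported in the tables above.

```python
from typing import Tuple

# Version tags listed in the pyg-lib 0.1.0 installation instructions.
SUPPORTED_TORCH: Tuple[str, ...] = ("1.11.0", "1.12.0", "1.13.0")
SUPPORTED_CUDA: Tuple[str, ...] = ("cpu", "cu102", "cu113", "cu115", "cu116", "cu117")


def pyg_lib_wheel_url(torch_version: str, cuda: str) -> str:
    """Hypothetical helper: fills ${TORCH} and ${CUDA} in the wheel index URL."""
    if torch_version not in SUPPORTED_TORCH:
        raise ValueError(f"unsupported PyTorch version: {torch_version}")
    if cuda not in SUPPORTED_CUDA:
        raise ValueError(f"unsupported CUDA tag: {cuda}")
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda}.html"


url = pyg_lib_wheel_url("1.13.0", "cu117")
command = f"pip install pyg-lib -f {url}"
```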
Highlights
pyg_lib.sampler: Optimized homogeneous and heterogeneous neighborhood sampling
pyg-lib provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and substantially improves upon the neighborhood sampling techniques previously used in PyG. For example, it pre-allocates random numbers, uses vector-based mapping for nodes in smaller node types, and leverages a faster hashmap implementation. Overall, it achieves speed-ups of about 10x-15x:
pyg_lib.sampler.neighbor_sample(
rowptr: Tensor,
col: Tensor,
seed: Tensor,
num_neighbors: List[int],
time: Optional[Tensor] = None,
seed_time: Optional[Tensor] = None,
csc: bool = False,
replace: bool = False,
directed: bool = True,
disjoint: bool = False,
temporal_strategy: str = 'uniform',
return_edge_id: bool = True,
)
and
pyg_lib.sampler.hetero_neighbor_sample(
rowptr_dict: Dict[EdgeType, Tensor],
col_dict: Dict[EdgeType, Tensor],
seed_dict: Dict[NodeType, Tensor],
num_neighbors_dict: Dict[EdgeType, List[int]],
time_dict: Optional[Dict[NodeType, Tensor]] = None,
seed_time_dict: Optional[Dict[NodeType, Tensor]] = None,
csc: bool = False,
replace: bool = False,
directed: bool = True,
disjoint: bool = False,
temporal_strategy: str = 'uniform',
return_edge_id: bool = True,
)
pyg_lib.sampler.neighbor_sample and pyg_lib.sampler.hetero_neighbor_sample recursively sample neighbors from all node indices in seed in the graph given by (rowptr, col). Both also support temporal sampling via the time argument, such that no nodes are sampled that violate the temporal constraints given by seed_time.
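A simplified pure-Python sketch of the recursive scheme described above for the homogeneous case: hop i samples up to num_neighbors[i] neighbors (a negative value takes all of them, mirroring PyG's -1 convention) for every node discovered in the previous hop, and a dictionary maps global node IDs to consecutive local IDs, the job the Mapper utility does in pyg-lib. For brevity it applies a single scalar seed_time to all seeds. neighbor_sample_sketch is hypothetical and does not reproduce the exact semantics or return values of pyg_lib.sampler.neighbor_sample.

```python
import random
from typing import Dict, List, Optional, Tuple


def neighbor_sample_sketch(
    rowptr: List[int],
    col: List[int],
    seed: List[int],
    num_neighbors: List[int],
    time: Optional[List[int]] = None,
    seed_time: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical sketch returning (node, row, col), with row/col in local indices."""
    if rng is None:
        rng = random.Random(0)
    node: List[int] = list(seed)  # local index -> global node ID
    mapper: Dict[int, int] = {v: i for i, v in enumerate(node)}  # global -> local
    row: List[int] = []
    out_col: List[int] = []
    frontier = list(seed)
    for k in num_neighbors:  # one entry per hop
        next_frontier: List[int] = []
        for v in frontier:
            nbrs = col[rowptr[v]:rowptr[v + 1]]
            if time is not None and seed_time is not None:
                # Temporal constraint: skip neighbors newer than the seed time.
                nbrs = [w for w in nbrs if time[w] <= seed_time]
            if 0 <= k < len(nbrs):
                nbrs = rng.sample(nbrs, k)  # uniform, without replacement
            for w in nbrs:
                if w not in mapper:  # first visit: assign the next local index
                    mapper[w] = len(node)
                    node.append(w)
                    next_frontier.append(w)
                row.append(mapper[v])
                out_col.append(mapper[w])
        frontier = next_frontier
    return node, row, out_col


# Directed graph in CSR form: 0 -> {1, 2}, 1 -> {3}, 2 -> {3}, 3 -> {0}.
rowptr, col = [0, 2, 3, 4, 5], [1, 2, 3, 3, 0]
node, row, out_col = neighbor_sample_sketch(rowptr, col, seed=[0], num_neighbors=[-1, -1])
node_t, _, _ = neighbor_sample_sketch(
    rowptr, col, seed=[0], num_neighbors=[-1, -1], time=[5, 1, 9, 2], seed_time=5
)
```

In the temporal call, node 2 (time 9) is newer than seed_time = 5 and is never sampled, so node 3 is reached only through node 1.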
pyg_lib.ops: Heterogeneous GNN acceleration
pyg-lib provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types:
segment_matmul(inputs: Tensor, ptr: Tensor, other: Tensor) -> Tensor
pyg_lib.ops.segment_matmul performs dense-dense matrix multiplication according to segments along the first dimension of inputs as given by ptr.
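import torch
import pyg_lib

inputs = torch.randn(8, 16)
ptr = torch.tensor([0, 5, 8])  # two segments: rows 0-4 and rows 5-7
other = torch.randn(2, 16, 32)  # one weight matrix per segment
out = pyg_lib.ops.segment_matmul(inputs, ptr, other)
assert out.size() == (8, 32)
# Compare with tolerance; `==` on tensors is element-wise and cannot be asserted directly.
assert torch.allclose(out[0:5], inputs[0:5] @ other[0], atol=1e-5)
assert torch.allclose(out[5:8], inputs[5:8] @ other[1], atol=1e-5)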
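The same segment semantics in plain Python, as a reference sketch: rows ptr[i] to ptr[i+1] - 1 of inputs are multiplied by the i-th matrix in other. segment_matmul_reference is a hypothetical helper and ignores everything that makes the CUTLASS-based kernel fast.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]


def segment_matmul_reference(inputs: Matrix, ptr: List[int], other: List[Matrix]) -> Matrix:
    """Hypothetical reference: out[ptr[i]:ptr[i+1]] = inputs[ptr[i]:ptr[i+1]] @ other[i]."""
    out: Matrix = []
    for i, (start, end) in enumerate(zip(ptr[:-1], ptr[1:])):
        out.extend(matmul(inputs[start:end], other[i]))
    return out


inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ptr = [0, 2, 3]  # rows {0, 1} use other[0]; row {2} uses other[1]
other = [
    [[1.0, 0.0], [0.0, 1.0]],  # identity
    [[0.0, 1.0], [1.0, 0.0]],  # swaps the two columns
]
out = segment_matmul_reference(inputs, ptr, other)
```

The first segment passes through unchanged while the last row has its two columns swapped, which makes it easy to see which weight matrix each row was paired with.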
Full Changelog
Added
- Added PyTorch 1.13 support (#145)
- Added native PyTorch support for grouped_matmul (#137)
- Added fused_scatter_reduce operation for multiple reductions (#141, #142)
- Added triton dependency (#133, #134)
- Enable pytest testing (#132)
- Added C++-based autograd and TorchScript support for segment_matmul (#120, #122)
- Allow overriding time for seed nodes via seed_time in neighbor_sample (#118)
- Added [segment|grouped]_matmul CPU implementation (#111)
- Added temporal_strategy option to neighbor_sample (#114)
- Added benchmarking tool (Google Benchmark) along with pyg::sampler::Mapper benchmark example (#101)
- Added CSC mode to pyg::sampler::neighbor_sample and pyg::sampler::hetero_neighbor_sample (#95, #96)
- Speed up pyg::sampler::neighbor_sample via IndexTracker implementation (#84)
- Added pyg::sampler::hetero_neighbor_sample implementation (#90, #92, #94, #97, #98, #99, #102, #110)
- Added pyg::utils::to_vector implementation (#88)
- Added support for PyTorch 1.12 (#57, #58)
- Added grouped_matmul and segment_matmul CUDA implementations via cutlass (#51, #56, #61, #64, #69, #73, #123)
- Added pyg::sampler::neighbor_sample implementation (#54, #76, #77, #78, #80, #81, #85, #86, #87, #89)
- Added pyg::sampler::Mapper utility for mapping global to local node indices (#45, #83)
- Added benchmark script (#45, #79, #82, #91, #93, #106)
- Added download script for benchmark data (#44)
- Added biased sampling utils (#38)
- Added CHANGELOG.md (#39)
- Added pyg.subgraph() (#31)
- Added nightly builds ([#28](https://github.com...