
OutOfMemoryError when >= 200 entities & using GPU #17

Open
Gintasz opened this issue Aug 25, 2023 · 1 comment
Gintasz commented Aug 25, 2023

I was investigating models to obtain rankings for pairwise comparison and wondered if this model would suit my needs.
In my project, I'd have < 10,000 entities that could be pairwise compared with each other.

I tried the current implementation in this repository, and there is a memory allocation issue when the number of entities is larger (e.g. 200) and the GPU is used. I don't have the theoretical background to evaluate the algorithm, so my question is: is this issue related only to the implementation rather than the algorithm itself, and therefore potentially resolvable?

import ranking_models.asap.asap_gpu as asap_gpu
import numpy as np
N = 200
pwc_mat = np.random.randint(0, 100, size=(N, N))
pairs, scores_mean, scores_std = asap_gpu.ASAP(pwc_mat, mst_mode=True, cuda=True, get_scores = True)

print("Indeces from pwc_mat to compare:")
print(pairs)
print("Scores means \n",scores_mean)
print("Scores standard deviaion \n", scores_std)
Traceback (most recent call last):
  File "/root/test/test_asap_gpu.py", line 5, in <module>
    pairs, scores_mean, scores_std = asap_gpu.ASAP(pwc_mat, mst_mode=True, cuda=True, get_scores = True)
  File "/root/test/asap/asap_gpu.py", line 158, in ASAP
    G = torch.zeros(G0.size(0),I.size(0),G0.size(1)+1).to(G0)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 35.23 GiB (GPU 0; 23.65 GiB total capacity; 1.52 MiB already allocated; 23.14 GiB free; 2.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
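
For reference, the requested allocation can be checked against free device memory before calling ASAP. A minimal sketch (the 35.23 GiB figure is simply the allocation reported in the traceback above for N = 200, not something derived from ASAP's internals):

import torch

# Observed size of the failed G allocation for N = 200 (taken from the traceback).
REQUIRED_GIB = 35.23

if torch.cuda.is_available():
    # mem_get_info() returns (free, total) in bytes for the current device.
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    free_gib = free_bytes / 1024**3
    print(f"Free GPU memory: {free_gib:.2f} GiB; observed requirement: {REQUIRED_GIB} GiB")
    if free_gib < REQUIRED_GIB:
        print("The G allocation inside asap_gpu.ASAP will not fit on this device.")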

machur commented Aug 29, 2023

@Gintasz I have the same issue with the GPU version: the implementation doesn't scale well in terms of memory footprint. I have a couple of A100s and 128 GB of RAM at hand, but I still cannot run it for matrices larger than 150x150 because of the allocation of that huge G matrix you mentioned.
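
A possible stopgap until the allocation is reworked: catch the OOM and retry with cuda=False. This is only a sketch and assumes the cuda flag of the same ASAP call switches the computation to CPU tensors (host RAM can of course still be exhausted for large N):

import numpy as np
import torch
import ranking_models.asap.asap_gpu as asap_gpu

N = 200
pwc_mat = np.random.randint(0, 100, size=(N, N))

try:
    # Try the GPU path first.
    pairs, scores_mean, scores_std = asap_gpu.ASAP(
        pwc_mat, mst_mode=True, cuda=True, get_scores=True)
except torch.cuda.OutOfMemoryError:
    # Fall back to CPU tensors: slower, but bounded by host RAM instead of GPU memory.
    pairs, scores_mean, scores_std = asap_gpu.ASAP(
        pwc_mat, mst_mode=True, cuda=False, get_scores=True)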
