
OutOfMemoryError when >= 200 entities & using GPU #17

Open
Gintasz opened this issue Aug 25, 2023 · 1 comment
Gintasz commented Aug 25, 2023

I was investigating models to obtain rankings for pairwise comparison and wondered if this model would suit my needs.
In my project, I'd have < 10,000 entities that could be pairwise compared with each other.

I tried the current implementation in this repository, and there is a memory allocation issue when the number of entities is larger (e.g. 200) and the GPU is used. I don't have the theoretical background to evaluate the algorithm, so my question is: is this issue related only to the implementation rather than the algorithm itself, and therefore potentially resolvable?

import ranking_models.asap.asap_gpu as asap_gpu
import numpy as np
N = 200
pwc_mat = np.random.randint(0, 100, size=(N, N))
pairs, scores_mean, scores_std = asap_gpu.ASAP(pwc_mat, mst_mode=True, cuda=True, get_scores = True)

print("Indeces from pwc_mat to compare:")
print(pairs)
print("Scores means \n",scores_mean)
print("Scores standard deviaion \n", scores_std)
Traceback (most recent call last):
  File "/root/test/test_asap_gpu.py", line 5, in <module>
    pairs, scores_mean, scores_std = asap_gpu.ASAP(pwc_mat, mst_mode=True, cuda=True, get_scores = True)
  File "/root/test/asap/asap_gpu.py", line 158, in ASAP
    G = torch.zeros(G0.size(0),I.size(0),G0.size(1)+1).to(G0)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 35.23 GiB (GPU 0; 23.65 GiB total capacity; 1.52 MiB already allocated; 23.14 GiB free; 2.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
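
For reference, the requested allocation can be checked against free device memory before calling ASAP. A minimal sketch (the 35.23 GiB figure is simply the allocation reported in the traceback above for N = 200, not something derived from ASAP's internals):

import torch

# Observed size of the failed G allocation for N = 200 (taken from the traceback).
REQUIRED_GIB = 35.23

if torch.cuda.is_available():
    # mem_get_info() returns (free, total) in bytes for the current device.
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    free_gib = free_bytes / 1024**3
    print(f"Free GPU memory: {free_gib:.2f} GiB; observed requirement: {REQUIRED_GIB} GiB")
    if free_gib < REQUIRED_GIB:
        print("The G allocation inside asap_gpu.ASAP will not fit on this device.")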

machur commented Aug 29, 2023

@Gintasz I have the same issue with the GPU version: the implementation doesn't scale well in terms of memory footprint. I have a couple of A100s and 128 GB of RAM at hand, but I still cannot run it for matrices larger than 150x150 because of the allocation of that huge G matrix you mentioned.
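
A possible stopgap until the allocation is reworked: catch the OOM and retry with cuda=False. This is only a sketch and assumes the cuda flag of the same ASAP call switches the computation to CPU tensors (host RAM can of course still be exhausted for large N):

import numpy as np
import torch
import ranking_models.asap.asap_gpu as asap_gpu

N = 200
pwc_mat = np.random.randint(0, 100, size=(N, N))

try:
    # Try the GPU path first.
    pairs, scores_mean, scores_std = asap_gpu.ASAP(
        pwc_mat, mst_mode=True, cuda=True, get_scores=True)
except torch.cuda.OutOfMemoryError:
    # Fall back to CPU tensors: slower, but bounded by host RAM instead of GPU memory.
    pairs, scores_mean, scores_std = asap_gpu.ASAP(
        pwc_mat, mst_mode=True, cuda=False, get_scores=True)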
