GPU run out of Storage #5

Open
lunachern opened this issue Mar 29, 2024 · 13 comments

@lunachern

lunachern commented Mar 29, 2024

While downloading the model "bigcode/starcoder" and the embedder "bert-nli-mean-tokens", the disk ran out of storage, even though I had already deleted the HuggingFace folder and everything from assignment 1 on the device. Is there some other download I have missed that takes up a lot of space?
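
One way to find what is still using the space is to rank everything directly under the home directory by size, hidden folders included. The sketch below is only an illustration: `largest_entries` is a hypothetical helper name, and it assumes a `find` that supports `-mindepth`/`-maxdepth` (GNU and BSD versions do). Caches often sit in hidden folders such as `~/.cache/huggingface` (the default Hugging Face cache location) and `~/.cache/pip`, which a plain `ls` does not show.

```shell
# Hypothetical helper: print the N largest entries directly under DIR,
# sizes in KiB, largest first. Hidden entries such as .cache are included.
largest_entries() {
    local dir="${1:-$HOME}" n="${2:-10}"
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + 2>/dev/null |
        sort -rn | head -n "$n"
}
```

For example, `largest_entries "$HOME" 10` lists the ten biggest items in the home directory, and `du -sh ~/.cache/huggingface` then shows how much the model cache alone occupies.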

@lunachern lunachern changed the title from "Base Model is gated." to "Where is download.py called?" Mar 29, 2024
@lunachern lunachern changed the title from "Where is download.py called?" to "GPU run out of Storage" Mar 29, 2024
@tengwang0318

StarCoder is so large that an ordinary machine cannot handle it. Try a lighter-weight model.

@lunachern
Author

> StarCoder is so large that an ordinary machine cannot handle it. Try a lighter-weight model.

First, thanks a lot!
I used it because it is the model recommended in the given code.

However, now I'm in much bigger trouble. When I tried to delete things to free up space, I saw some unknown folders in FileZilla. At the time I thought I had downloaded them by mistake, but now I realize they are other students' folders on the cs2 GPU server. I tried to delete everything I didn't recognize, but it turned out that this deleted some of my own settings and anaconda3.

Now I cannot use conda or even reinstall it.

I don't know what to do and I'm desperate. I wish I could start over from scratch!

Here is the error message:
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[547266] Failed to execute script 'entry_point' due to unhandled exception!

@tengwang0318

Trust me. As I remember, bigcode/starcoder needs almost 30 GB of GPU memory (HBM) in its float16 version. Even if you download and configure it successfully, you won't be able to run it on the HKU GPU Farm.

What's more, if you try to delete someone else's files, you will just get a permission denied error, so don't worry about that. Try logging out and configuring the environment again. Maybe that will work.
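
The 30 GB figure can be checked with back-of-the-envelope arithmetic: the weights alone take roughly the parameter count times the bytes per parameter. The sketch below assumes StarCoder has about 15.5 billion parameters (the figure given on its model card) and that float16 uses 2 bytes per parameter. `weights_gb` is a hypothetical helper, and activations and the attention cache come on top of this estimate.

```shell
# Hypothetical helper: approximate size of model weights in GB (10^9 bytes).
# Usage: weights_gb PARAM_COUNT BYTES_PER_PARAM
weights_gb() {
    echo $(( $1 * $2 / 1000000000 ))
}

weights_gb 15500000000 2   # assumed 15.5B parameters in float16, prints 31
weights_gb 15500000000 4   # the same model in float32, prints 62
```

Under these assumptions, the float16 weights alone come to about 31 GB, in line with the estimate above.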

@lunachern
Author

> Trust me. As I remember, bigcode/starcoder needs almost 30 GB of GPU memory (HBM) in its float16 version. Even if you download and configure it successfully, you won't be able to run it on the HKU GPU Farm.
>
> What's more, if you try to delete someone else's files, you will just get a permission denied error, so don't worry about that. Try logging out and configuring the environment again. Maybe that will work.

Thanks. However, I failed to delete the others' files, while it seems I did successfully delete my own! T^T

So I cannot use conda now, and I cannot reinstall it because of the "unhandled exception" error.

@tengwang0318

I have hit this error too. It comes from running out of space. Maybe your files were not actually deleted.
Source: link
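
A quick way to check whether space is the problem is to compare the free space on the home filesystem with what an install would need. This is a minimal sketch assuming a POSIX `df`. `has_free_kb` is a hypothetical helper, and the 5 GiB threshold is an arbitrary illustration, not an official Anaconda requirement. On servers that enforce per-user quotas, `df` may report the whole filesystem rather than the personal quota, so treat the result as an upper bound.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEED_KB kibibytes available.
has_free_kb() {
    local dir="$1" need_kb="$2" avail_kb
    # -P forces the POSIX one-line-per-filesystem layout; column 4 is "Available".
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Illustrative threshold only: 5 GiB expressed in KiB.
if has_free_kb "$HOME" $((5 * 1024 * 1024)); then
    echo "at least 5 GiB free under $HOME"
else
    echo "less than 5 GiB free under $HOME"
fi
```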

@lunachern
Author

> I have hit this error too. It comes from running out of space. Maybe your files were not actually deleted. Source: link

So, can I fix it by deleting something? Can I delete the previous Anaconda folder, which seems very big?

Actually, this is already my third attempt to install it, so there are already two Anaconda folders. The folder is so big that the deletion has been running for a long time and it is still there. I'm also afraid that I might have destroyed something. T.T

@lunachern
Author

> I have hit this error too. It comes from running out of space. Maybe your files were not actually deleted. Source: link

I tried to use conda but it failed. The messages are below. Could you please take a look? Thanks!

Do you accept the license terms? [yes|no]
[no] >>> yes

Anaconda3 will now be installed into this location:
/userhome/cs2/mchenal/anaconda3

  • Press ENTER to confirm the location
  • Press CTRL-C to abort the installation
  • Or specify a different location below

[/userhome/cs2/mchenal/anaconda3] >>>
PREFIX=/userhome/cs2/mchenal/anaconda3
mchenal@gpu2-comp-111:~$ conda create -n nlp_env python=3.10.9
conda: command not found
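
`conda: command not found` right after the installer exits usually means one of two things: the install did not finish, so there is no `conda` binary under the prefix, or it finished but the current shell has not picked it up yet. The sketch below tells the two apart. `conda_status` is a hypothetical helper, and the prefix in the example call is the default one shown in the installer output above.

```shell
# Hypothetical helper: report the state of a conda install under PREFIX.
#   missing      no executable PREFIX/bin/conda (install incomplete)
#   not-on-path  binary exists, but this shell cannot find a conda command
#   ready        binary exists and a conda command is on PATH
conda_status() {
    local prefix="$1"
    if [ ! -x "$prefix/bin/conda" ]; then
        echo "missing"
    elif ! command -v conda >/dev/null 2>&1; then
        echo "not-on-path"
    else
        echo "ready"
    fi
}

conda_status "$HOME/anaconda3"
```

If this prints `not-on-path`, running `eval "$("$HOME/anaconda3/bin/conda" shell.bash hook)"` enables conda in the current shell, and `"$HOME/anaconda3/bin/conda" init bash` sets it up for new shells. If it prints `missing`, the installer most likely stopped early, which would fit the out-of-space error reported earlier in the thread.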

@tengwang0318

tengwang0318 commented Mar 29, 2024

Try removing your files under the /userhome/cs2/your_name folder with rm -rf. Don't worry, you won't destroy anything else, thanks to file permissions.
If something does get deleted by mistake, the ops team takes the blame XD

Search for how to install and configure conda on Linux.
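
Before running `rm -rf`, it can help to list what would actually be affected. The sketch below prints only the entries directly under a directory that are owned by the current user, which is also a way to tell one's own folders from other students' folders. `my_entries` is a hypothetical helper, and it assumes a `find` that supports `-maxdepth` (GNU and BSD versions do).

```shell
# Hypothetical helper: list entries directly under DIR that are owned by
# the current user. Nothing is deleted; this is a dry run.
my_entries() {
    local dir="${1:-.}"
    find "$dir" -mindepth 1 -maxdepth 1 -user "$(id -u)" 2>/dev/null
}
```

For example, `my_entries "$HOME"` shows the candidates, after which a specific path such as `rm -rf "$HOME/anaconda3"` can be removed deliberately instead of everything at once.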

@lunachern
Author

> Try removing your files under the /userhome/cs2/your_name folder with rm -rf. Don't worry, you won't destroy anything else, thanks to file permissions. If something does get deleted by mistake, the ops team takes the blame XD
>
> Search for how to install and configure conda on Linux.

mchenal@gpu2-comp-111:~$ conda create -n nlp_env python=3.10.9
conda: command not found
mchenal@gpu2-comp-111:~$ pip install torch==2.0.1
Command 'pip' not found, but can be installed with:
apt install python3-pip
Please ask your administrator.
mchenal@gpu2-comp-111:~$ apt install python3-pip
E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?

Not only conda, even pip is missing... 😢

@tengwang0318

Try running: bash Miniconda3-latest-Linux-x86_64.sh

You don't have permission to use apt on the HKU GPU server.
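
For reference, a user-level Miniconda install needs no root access, because everything goes under the home directory. The sketch below is one possible sequence, not the course's official procedure. `install_miniconda` and the `DRY_RUN` switch are hypothetical names, and it assumes `wget` is available on the server. The `-b` (batch mode, which accepts the license without prompting) and `-p` (install prefix) options belong to the Miniconda installer. Batch mode does not edit shell startup files, hence the final `conda init bash`.

```shell
# Hypothetical helper: install Miniconda into PREFIX without prompts.
# Set DRY_RUN=1 to print the commands instead of running them.
install_miniconda() {
    local prefix="${1:-$HOME/miniconda3}"
    local installer="Miniconda3-latest-Linux-x86_64.sh"
    local url="https://repo.anaconda.com/miniconda/$installer"
    local run="${DRY_RUN:+echo}"

    # Download, install in batch mode, then register conda for new shells.
    $run wget -q "$url" -O "$installer" &&
        $run bash "$installer" -b -p "$prefix" &&
        $run "$prefix/bin/conda" init bash
}

# Preview only: prints the three commands without executing them.
DRY_RUN=1 install_miniconda "$HOME/miniconda3"
```

Calling `install_miniconda` without `DRY_RUN` runs the same three steps for real. Opening a new shell afterwards should make `conda create -n nlp_env python=3.10.9` work again.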

@lunachern
Author

> Try running: bash Miniconda3-latest-Linux-x86_64.sh
>
> You don't have permission to use apt on the HKU GPU server.

Trying it now. Conda is back! I'm still installing packages and hope everything goes well. Thanks a million!

@tengwang0318

Miniconda is a minimal distribution of conda, whereas Anaconda ships with many more packages preinstalled. For our purposes there is no difference.

@lunachern
Author

> Miniconda is a minimal distribution of conda, whereas Anaconda ships with many more packages preinstalled. For our purposes there is no difference.

Solved. THANK YOU! Thanks so much, you're a legend!
