The script run_prune_finetune_rubble.sh got stuck during execution, causing the server to crash and restart. #90

Closed
fangxu622 opened this issue Jan 20, 2025 · 3 comments

Comments

@fangxu622

Hardware: seven RTX 3090 GPUs.

Console output:

@szpu:~/Code/CityGaussian/LargeLightGaussian$ bash ./scripts/run_prune_finetune_rubble.sh
GPU 3 is available. Starting prune_finetune.py with dataset 'rubble_c9_r4', prune_percent '0.4', prune_type 'v_important_score', prune_decay '1', and v_pow '0.1' on port 7041

rubble_c9_r4_light_40_prunned.log :

Reading camera 1656/1657
Reading camera 1657/1657ic| self.max_sh_degree: 3
ic| 3 * (self.max_sh_degree + 1) ** 2 - 3: 45
ic| gaussians.get_xyz.shape: torch.Size([9642167, 3])
[20/01 12:07:44]
Number of points at initialisation : 1694315 [20/01 12:07:45]
#722754 dataloader seed to 42 [20/01 12:07:45]

Training progress: 0%| | 0/30000 [00:00<?, ?it/s]ic| "Before prune iteration, number of gaussians: " + str(len(gaussians.get_xyz)): 'Before prune iteration, number of gaussians: 9642167'

The run gets stuck in the "Training progress" phase.

How can I resolve this?

@DekuLiuTesla
Owner

Hi @fangxu622, this part can stall for a while, since the model is calculating importance scores and deciding which points to prune.
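
To illustrate why this step is slow: an importance-based prune has to visit every training camera before it can rank the Gaussians, so the progress bar stays at 0 until that pass finishes. The following is a minimal pure-Python sketch of that idea, not the code in prune_finetune.py. The function names, the per-view contribution dictionaries, and the way volume is folded in through v_pow are all assumptions made for illustration.

```python
def accumulate_importance(per_view_contributions, num_gaussians):
    """Sum each Gaussian's contribution over all views (one pass per camera)."""
    scores = [0.0] * num_gaussians
    for view in per_view_contributions:         # one entry per training camera
        for idx, contribution in view.items():  # only Gaussians visible in this view
            scores[idx] += contribution
    return scores


def volume_weighted(scores, volumes, v_pow):
    """Assumed weighting: scale each score by (relative volume) ** v_pow."""
    max_volume = max(volumes)
    return [s * (v / max_volume) ** v_pow for s, v in zip(scores, volumes)]


def keep_indices(scores, prune_percent):
    """Return sorted indices of the Gaussians left after dropping the lowest-scoring fraction."""
    num_prune = int(len(scores) * prune_percent)
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending score
    return sorted(order[num_prune:])


# Toy data: 3 views, 5 Gaussians (hypothetical values).
views = [{0: 0.9, 1: 0.1}, {0: 0.5, 2: 0.4, 3: 0.05}, {1: 0.2, 4: 0.7}]
scores = accumulate_importance(views, num_gaussians=5)
weighted = volume_weighted(scores, volumes=[1.0, 1.0, 2.0, 0.5, 1.0], v_pow=0.1)
kept = keep_indices(weighted, prune_percent=0.4)
print(kept)
```

In the log above, the same kind of pass would cover 1657 cameras and 9642167 Gaussians, which is consistent with a long wait before the first training iteration.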

@fangxu622
Author

fangxu622 commented Jan 21, 2025

> Hi @fangxu622, this part can stall for a while, since the model is calculating importance scores and deciding which points to prune.

Thanks for your reply!

I encountered some problems while executing the following steps:

cd LargeLightGaussian
bash scripts/run_prune_finetune_$your_scene.sh
bash scripts/run_distill_finetune_$your_scene.sh
bash scripts/run_vectree_quantize_$your_scene.sh
cd ..

After the script run_prune_finetune_rubble.sh was executed, no model file was saved except for chkpt0.pth, yet the console reported "all completed". It looks like the run did not actually complete.

Then, when executing the next script, such as scripts/run_distill_finetune_rubble.sh, it shows the error "No such file XXX 30000 XXX".
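
One way to catch this earlier is to check that the previous stage wrote its outputs before launching the next script. Below is a minimal sketch using only the standard library. The helper name `missing_outputs` and the example path are hypothetical, since the real output path is not shown in full in this thread.

```python
from pathlib import Path


def missing_outputs(expected_paths):
    """Return the expected output files that do not exist on disk."""
    return [str(p) for p in map(Path, expected_paths) if not p.is_file()]


# Hypothetical placeholder; substitute the file the distill script reports as missing.
expected = ["path/to/prune_output/hypothetical_iteration_30000_file"]
missing = missing_outputs(expected)
if missing:
    print("Previous stage did not finish; missing:", missing)
```

Running a check like this between stages turns a silent failure in the prune step into an explicit message instead of a "No such file" error one script later.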

After that, the GPU server restarts. It looks like the process consumes a lot of memory (RAM), which causes the restart.
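
To confirm whether host RAM is running out before the restart, available memory can be logged in a second terminal while the script runs. The sketch below assumes a Linux host, since it reads /proc/meminfo. The function names and the sampling interval are illustrative choices, not part of the repository.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def log_available_ram(interval_s=5.0, samples=3, path="/proc/meminfo"):
    """Print MemAvailable in GiB a few times; Linux-only because it reads /proc."""
    for _ in range(samples):
        with open(path) as f:
            available_kb = parse_meminfo(f.read()).get("MemAvailable", 0)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} MemAvailable: {available_kb / 2**20:.2f} GiB")
        time.sleep(interval_s)
```

Calling `log_available_ram(interval_s=5.0, samples=1000)` while the prune script runs gives a timestamped trace. If MemAvailable falls steadily toward zero right before the reboot, that supports the RAM explanation.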

@DekuLiuTesla
Copy link
Owner

@fangxu622 One possible solution is to first check whether 60% compression works well. If it does, VRAM might be the key problem. Regarding RAM, you can refer to issue #53.
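
As a rough back-of-the-envelope check of how the prune ratio changes model size, the sketch below counts only the raw parameter values per Gaussian. It assumes the usual 3D Gaussian Splatting attribute layout (position, spherical-harmonic colour, opacity, scale, rotation) stored as float32, and it assumes "60% compression" means a prune_percent of 0.6. Gradients, optimizer state, and rasterizer buffers are not included, so actual VRAM use will be higher. The function names are hypothetical.

```python
def gaussian_param_bytes(num_gaussians, max_sh_degree=3, bytes_per_value=4):
    """Bytes for raw parameters only, under the assumed float32 attribute layout."""
    sh_values = 3 * (max_sh_degree + 1) ** 2  # 48 at degree 3: 3 DC + 45 higher-order
    per_gaussian = 3 + sh_values + 1 + 3 + 4  # xyz, SH, opacity, scale, rotation
    return num_gaussians * per_gaussian * bytes_per_value


def remaining_after_prune(num_gaussians, prune_percent):
    """Gaussians left after removing int(num_gaussians * prune_percent) of them."""
    return num_gaussians - int(num_gaussians * prune_percent)


total = 9642167  # Gaussian count reported in the log above
for prune_percent in (0.0, 0.4, 0.6):
    kept = remaining_after_prune(total, prune_percent)
    gib = gaussian_param_bytes(kept) / 2**30
    print(f"prune_percent={prune_percent}: {kept} Gaussians, {gib:.2f} GiB of parameters")
```

This counts only the stored parameters, so it gives a lower bound on memory for comparing settings, not a prediction of peak usage.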
