The script run_prune_finetune_rubble.sh got stuck during execution, causing the server to crash and restart. #90

fangxu622 · 2025-01-20T04:09:39Z

Hardware: seven RTX 3090 GPUs.

Console output:

@szpu:~/Code/CityGaussian/LargeLightGaussian$ bash ./scripts/run_prune_finetune_rubble.sh
GPU 3 is available. Starting prune_finetune.py with dataset 'rubble_c9_r4', prune_percent '0.4', prune_type 'v_important_score', prune_decay '1', and v_pow '0.1' on port 7041

rubble_c9_r4_light_40_prunned.log :

Reading camera 1656/1657
Reading camera 1657/1657ic| self.max_sh_degree: 3
ic| 3 * (self.max_sh_degree + 1) ** 2 - 3: 45
ic| gaussians.get_xyz.shape: torch.Size([9642167, 3])
[20/01 12:07:44]
Number of points at initialisation : 1694315 [20/01 12:07:45]
#722754 dataloader seed to 42 [20/01 12:07:45]

Training progress: 0%| | 0/30000 [00:00<?, ?it/s]ic| "Before prune iteration, number of gaussians: " + str(len(gaussians.get_xyz)): 'Before prune iteration, number of gaussians: 9642167'

The run got stuck in the "Training progress" phase.

How can I resolve this?

DekuLiuTesla · 2025-01-21T03:04:01Z

Hi, @fangxu622, this part can appear stuck for a while, since the model is calculating importance scores and deciding which points to prune.
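For readers unfamiliar with this step, here is a minimal sketch of what pruning by importance score can look like. It is not the repository's implementation: the scoring formula (importance multiplied by volume raised to `v_pow`) and the function name `keep_mask` are assumptions made for illustration, and real code would operate on GPU tensors rather than Python lists. Only `prune_percent` and `v_pow` correspond to names that appear in the console output above.

```python
import random


def keep_mask(importance, volume, prune_percent=0.4, v_pow=0.1):
    """Return a list of booleans, True for every point that survives pruning.

    Assumption for illustration: each point's score is its importance
    weighted by its volume raised to the power v_pow.
    """
    scores = [imp * (vol ** v_pow) for imp, vol in zip(importance, volume)]
    # Number of points to drop: the lowest-scoring prune_percent fraction.
    n_prune = int(len(scores) * prune_percent)
    # Point indices ordered from lowest to highest score.
    order = sorted(range(len(scores)), key=scores.__getitem__)
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(scores))]


# Toy data: 1000 points with random importance and volume values.
random.seed(0)
n_points = 1000
importance = [random.random() for _ in range(n_points)]
volume = [random.random() + 0.1 for _ in range(n_points)]

mask = keep_mask(importance, volume, prune_percent=0.4, v_pow=0.1)
print(sum(mask), "of", n_points, "points kept")
```

Computing the importance values themselves is the expensive part in practice, since every Gaussian has to be scored before the threshold can be chosen. That is consistent with the progress bar sitting at 0% while 9,642,167 Gaussians are processed.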

fangxu622 · 2025-01-21T14:18:18Z

> Hi, @fangxu622, this part can appear stuck for a while, since the model is calculating importance scores and deciding which points to prune.

Thanks for your reply!

I encountered some problems while executing the following steps:

cd LargeLightGaussian
bash scripts/run_prune_finetune_$your_scene.sh
bash scripts/run_distill_finetune_$your_scene.sh
bash scripts/run_vectree_quantize_$your_scene.sh
cd ..

After run_prune_finetune_rubble.sh was executed, no model file was saved except chkpt0.pth, even though the console reported that everything had completed. The run appears to be incomplete.

Then, when I execute the next script, such as scripts/run_distill_finetune_rubble.sh, it shows the error "No such file XXX 30000 XXX".

After that, the GPU server restarts. It looks like the process consumes a lot of memory (RAM), which causes the restart.
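One way to avoid launching the next stage against a missing result is a small pre-flight check. The sketch below only looks for files or directories whose names contain the iteration number from the error message (30000). The directory `output/rubble_c9_r4_light_40_prunned` and the helper `find_iteration_outputs` are hypothetical, so adjust the path to wherever the prune stage actually writes its results.

```python
from pathlib import Path


def find_iteration_outputs(output_dir, iteration=30000):
    """Return all paths under output_dir whose names contain the iteration number."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    # rglob searches recursively; the pattern matches both files and directories.
    return sorted(root.rglob(f"*{iteration}*"))


if __name__ == "__main__":
    # Hypothetical output location; replace with the real one.
    outputs = find_iteration_outputs("output/rubble_c9_r4_light_40_prunned")
    if outputs:
        for path in outputs:
            print("found:", path)
    else:
        print("No iteration-30000 output found; the prune stage did not finish.")
```

If the check finds nothing, the prune stage ended early, and starting the distill stage will only reproduce the "No such file" error.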

DekuLiuTesla · 2025-02-14T06:12:48Z

@fangxu622 One possible solution is to first check whether 60% compression works well. If it does, VRAM is probably the key problem. Regarding RAM, you can refer to issue #53.

DekuLiuTesla closed this as completed Feb 18, 2025
