how to train the model #16

Sally551 · 2024-01-16T16:05:04Z

Hi, I am training the model using the KITTI dataset only, but I ran into a problem. During training, it reports that ./ssd/kitti_scene/training/calib/000044.txt doesn't exist. Is there a calib file for KITTI training, or is calib_cam_to_cam the calib file? Here is a list of the files I can fetch in the training section. Is a calib file provided but missing from my list? If so, could you please share the link to that calib file?

gengshan-y · 2024-01-16T16:41:31Z

Here you go: calib.zip

Sally551 · 2024-01-17T01:36:59Z

./ssd//kitti_scene/training/disp_occ_0_ganet/000044_10.png
What about this file, under /disp_occ_0_ganet/?

Sally551 · 2024-01-17T01:43:00Z

gengshan-y · 2024-01-17T04:20:42Z

What about this: https://drive.google.com/file/d/1hBBrhq8lvSEmQNOAqY_xZ98NZ5AD65Ct/view?usp=sharing

Sally551 · 2024-01-17T14:53:32Z

RuntimeError: All input tensors must be on the same device. Received cpu and cuda:0
How can I deal with this problem? I am currently training with only 1 GPU. Also, when I train with 4 GPUs, distributed data-parallel training gets stuck as well. How can I cope with that?

RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1603729062494/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8
(This is the error I get when I run with 4 GPUs.)

gengshan-y · 2024-01-18T15:49:32Z

Typically you can solve the first one by moving the tensor that was on CPU to cuda with .cuda().
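
As a rough illustration of that fix, here is a minimal sketch, not code from this repository. The helper name `move_to_device` and the toy tensors are made up for the example. It uses `.to(device)`, which behaves like `.cuda()` when the device is a GPU and lets the snippet also run on a CPU-only machine.

```python
import torch

def move_to_device(batch, device):
    """Recursively move every tensor in a nested batch onto one device (hypothetical helper)."""
    if torch.is_tensor(batch):
        return batch.to(device)
    if isinstance(batch, dict):
        return {k: move_to_device(v, device) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_device(v, device) for v in batch)
    return batch  # leave non-tensor values (ints, strings, ...) untouched

# Fall back to CPU so the sketch still runs without a GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 3)                  # created on the CPU by default
b = torch.zeros(2, 3, device=device)  # created directly on the target device
a = a.to(device)                      # same effect as a.cuda() when device is a GPU
c = torch.cat([a, b], dim=0)          # works because both inputs now share a device
```

To find the offending tensor in practice, printing `tensor.device` for each input of the operation named in the traceback usually shows which one was left on the CPU.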

I haven't seen the second issue before, but a Google search turns up some possible solutions like this
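
Independently of whatever fix applies, a common first step with an "unhandled system error" from NCCL is to turn on NCCL's own logging so the underlying failure becomes visible. The sketch below is a general suggestion, not the solution referred to above. `NCCL_DEBUG` is a standard NCCL environment variable, while the helper name `enable_nccl_debug` is made up for this example.

```python
import os

def enable_nccl_debug(level="INFO"):
    """Ask NCCL to log diagnostics (hypothetical helper).

    Call this before the distributed process group is created,
    otherwise NCCL will not pick up the setting.
    """
    # setdefault keeps any value already exported in the shell.
    os.environ.setdefault("NCCL_DEBUG", level)
    return os.environ["NCCL_DEBUG"]

enable_nccl_debug()
```

The same effect can be had by exporting the variable in the shell before launching the training script; the extra log lines often point at the network interface or shared-memory problem behind the generic error.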

Sally551 · 2024-01-22T13:56:53Z

Thanks. I have already trained for some of the iterations (niters), but not all of them because of a time limit. I used one of the checkpoints saved under the predefined logname (finetune_49999.pth, for example). I just want to evaluate on kitti-sceneflow (stereo, Tab. 6) and also generate results for the kitti-sceneflow benchmark (stereo setup, Tab. 3), but I failed to do so.

It reports a size mismatch, although the same command works with the pretrained weights. I don't know where the problem is. Here is my command:
CUDA_VISIBLE_DEVICES=1 python submission.py --dataset 2015test --datapath ./ssd/kitti_scene/testing/ --outdir ./weights/test1/ --loadmodel ./weights/test1/finetune_49999.pth --disp_path input/disp/kittisf-test-ganet-disp/ --fac 2 --maxdisp 512 --refine --sensor stereo

Here is my checkpoint file
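
One way to narrow down a size mismatch like this is to compare the tensor shapes stored in the checkpoint with the shapes the evaluation model expects. The following is a minimal sketch under stated assumptions: `find_shape_mismatches` is a made-up helper, the two `nn.Linear` layers are stand-ins for the real network, and how the real checkpoint nests its weights (for example under a key such as `state_dict`) is not known from this thread.

```python
import torch
import torch.nn as nn

def find_shape_mismatches(model, ckpt_state):
    """Return {name: (model_shape, ckpt_shape)} for parameters whose shapes differ."""
    model_state = model.state_dict()
    mismatches = {}
    for name, tensor in ckpt_state.items():
        if name in model_state and model_state[name].shape != tensor.shape:
            mismatches[name] = (tuple(model_state[name].shape), tuple(tensor.shape))
    return mismatches

# Stand-in modules. With a real checkpoint, ckpt_state would come from
# torch.load(path, map_location="cpu"), possibly after unwrapping a nested dict.
trained = nn.Linear(4, 8)    # plays the role of the network that was fine-tuned
evaluated = nn.Linear(4, 2)  # plays the role of the network built at test time

mismatches = find_shape_mismatches(evaluated, trained.state_dict())
for name, (expected, found) in mismatches.items():
    print(f"{name}: model expects {expected}, checkpoint has {found}")
```

If the listed parameters depend on options that differ between the training run and the evaluation command, aligning those options is the likely fix.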
