Depth Anything V2 for KITTI #8

@parallelsucc

Description

Hello,

I have tested the model using depth_anything_v2_vits.pth on the KITTI dataset to evaluate its zero-shot performance. The dataset path used is Depth_Anything_V2/metric_depth/dataset/splits/kitti/val2.txt. The resulting abs_rel is around 0.31, which significantly deviates from the 0.078 value reported in Table 2 of the paper.
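For reference, here is a minimal sketch of how the abs_rel metric is conventionally defined (mean absolute relative error over valid pixels). This assumes the repo's `eval_depth` computes it this way; the function name `abs_rel` below is illustrative, not taken from the repo.

```python
import torch

def abs_rel(pred: torch.Tensor, gt: torch.Tensor) -> float:
    """Mean absolute relative error over already-masked depth values."""
    return ((pred - gt).abs() / gt).mean().item()

gt = torch.tensor([10.0, 20.0, 40.0])
pred = torch.tensor([11.0, 18.0, 44.0])
# per-pixel errors: 0.1, 0.1, 0.1 -> abs_rel = 0.1
print(abs_rel(pred, gt))
```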

Here is the code I used for testing:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from tqdm import tqdm

kitti_path = "/Depth_Anything_V2/metric_depth/dataset/splits/kitti/val.txt"
kitti_dataset = KITTI(filelist_path=kitti_path, mode='val', size=(518, 518))
dataloader = DataLoader(kitti_dataset, batch_size=1, shuffle=False, num_workers=4)

results = {k: torch.tensor([0.0]).cuda() for k in
           ['d1', 'd2', 'd3', 'abs_rel', 'sq_rel', 'rmse',
            'rmse_log', 'log10', 'silog']}

nums_image = len(dataloader)
with tqdm(total=nums_image) as bar:
    for sample in dataloader:
        bar.update(1)
        img = sample['image'].cuda().float()
        depth = sample['depth'].cuda()[0]
        valid_mask = sample['valid_mask'].cuda()[0]

        # Keep only pixels with valid ground truth inside the KITTI depth cap.
        valid_mask = (valid_mask == 1) & (depth > 0) & (depth < 80)

        # Standard evaluation crop (Garg or Eigen) at ground-truth resolution.
        eval_mask = torch.zeros_like(valid_mask.squeeze()).bool()
        gt_height, gt_width = eval_mask.shape

        valid_mask_crop = "eigen"
        if valid_mask_crop == "garg":
            eval_mask[
                int(0.40810811 * gt_height) : int(0.99189189 * gt_height),
                int(0.03594771 * gt_width) : int(0.96405229 * gt_width),
            ] = 1
        elif valid_mask_crop == "eigen":
            eval_mask[
                int(0.3324324 * gt_height) : int(0.91351351 * gt_height),
                int(0.0359477 * gt_width) : int(0.96405229 * gt_width),
            ] = 1

        # reshape() is not in-place, so the result must be assigned back.
        eval_mask = eval_mask.reshape(valid_mask.shape).cuda()
        valid_mask = torch.logical_and(valid_mask, eval_mask)

        with torch.no_grad():
            pred = model(img)
            pred = F.interpolate(pred[:, None], depth.shape[-2:],
                                 mode='bilinear', align_corners=True)[0, 0]

        # Min-max normalize the relative prediction, then map it into the
        # ground-truth disparity range and invert to depth.
        pred = (pred - pred.min()) / (pred.max() - pred.min())
        A = 1 / depth.max()
        B = 1 / max(depth.min(), 1) - 1 / depth.max()
        pred = 1 / (B * pred + A)

        cur_results = eval_depth(pred[valid_mask], depth[valid_mask])
        for k in results.keys():
            results[k] += cur_results[k]

for k in results:
    results[k] /= nums_image
    print(k, results[k].item())
```
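One thing worth noting about the code above: min-max rescaling forces the prediction's extremes to coincide with the ground-truth depth range, which discards the prediction's own structure and can heavily penalize a relative-depth model. For affine-invariant models, evaluations commonly fit a per-image least-squares scale and shift in disparity space against the ground truth before inverting to depth (as in the MiDaS protocol). It is an assumption that the paper uses this alignment; a sketch of the idea:

```python
import torch

def align_scale_shift(pred_disp: torch.Tensor, gt_depth: torch.Tensor,
                      mask: torch.Tensor) -> torch.Tensor:
    """Fit gt_disp ~ s * pred_disp + t on masked pixels, return aligned depth."""
    gt_disp = 1.0 / gt_depth[mask]
    x = pred_disp[mask]
    A = torch.stack([x, torch.ones_like(x)], dim=1)       # (N, 2) design matrix
    sol = torch.linalg.lstsq(A, gt_disp.unsqueeze(1)).solution
    s, t = sol[0, 0], sol[1, 0]
    aligned_disp = (s * pred_disp + t).clamp(min=1e-6)    # avoid divide-by-zero
    return 1.0 / aligned_disp

# Toy check: pred_disp is an exact affine transform of gt disparity,
# so alignment should recover the ground truth.
gt = torch.tensor([2.0, 4.0, 8.0])
pred = 2.0 * (1.0 / gt) + 1.0
mask = torch.ones(3, dtype=torch.bool)
print(align_scale_shift(pred, gt, mask))  # close to [2., 4., 8.]
```

If the min-max mapping above is replaced by this alignment (applied per image on the valid pixels), the abs_rel should drop substantially when the prediction is only off by an affine transform in disparity.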

Do you have any suggestions on why this might be the case? Is there any additional preprocessing, postprocessing, or testing configuration I should be aware of? Also, would it be possible for you to share the testing code you used in the paper for comparison?

Thank you for your help!
