Depth Anything V2 for KITTI #8

@parallelsucc

Description

Hello,

I have tested the model using depth_anything_v2_vits.pth on the KITTI dataset to evaluate its zero-shot performance. The dataset path used is Depth_Anything_V2/metric_depth/dataset/splits/kitti/val2.txt. The resulting abs_rel is around 0.31, which significantly deviates from the 0.078 value reported in Table 2 of the paper.
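For reference, here is a minimal sketch of how the abs_rel metric is conventionally defined (mean absolute relative error over valid pixels). This assumes the repo's `eval_depth` computes it this way; the function name `abs_rel` below is illustrative, not taken from the repo.

```python
import torch

def abs_rel(pred: torch.Tensor, gt: torch.Tensor) -> float:
    """Mean absolute relative error over already-masked depth values."""
    return ((pred - gt).abs() / gt).mean().item()

gt = torch.tensor([10.0, 20.0, 40.0])
pred = torch.tensor([11.0, 18.0, 44.0])
# per-pixel errors: 0.1, 0.1, 0.1 -> abs_rel = 0.1
print(abs_rel(pred, gt))
```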

Here is the code I used for testing:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from tqdm import tqdm

kitti_path = "/Depth_Anything_V2/metric_depth/dataset/splits/kitti/val.txt"
kitti_dataset = KITTI(filelist_path=kitti_path, mode='val', size=(518, 518))
dataloader = DataLoader(kitti_dataset, batch_size=1, shuffle=False, num_workers=4)

results = {k: torch.tensor([0.0]).cuda() for k in
           ['d1', 'd2', 'd3', 'abs_rel', 'sq_rel', 'rmse',
            'rmse_log', 'log10', 'silog']}

nums_image = len(dataloader)
with tqdm(total=nums_image) as bar:
    for sample in dataloader:
        bar.update(1)
        img = sample['image'].cuda().float()
        depth = sample['depth'].cuda()[0]
        valid_mask = sample['valid_mask'].cuda()[0]

        # Keep only pixels with valid ground truth inside the KITTI depth cap.
        valid_mask = (valid_mask == 1) & (depth > 0) & (depth < 80)

        # Standard evaluation crop (Garg or Eigen) at ground-truth resolution.
        eval_mask = torch.zeros_like(valid_mask.squeeze()).bool()
        gt_height, gt_width = eval_mask.shape

        valid_mask_crop = "eigen"
        if valid_mask_crop == "garg":
            eval_mask[
                int(0.40810811 * gt_height) : int(0.99189189 * gt_height),
                int(0.03594771 * gt_width) : int(0.96405229 * gt_width),
            ] = 1
        elif valid_mask_crop == "eigen":
            eval_mask[
                int(0.3324324 * gt_height) : int(0.91351351 * gt_height),
                int(0.0359477 * gt_width) : int(0.96405229 * gt_width),
            ] = 1

        # reshape() is not in-place, so the result must be assigned back.
        eval_mask = eval_mask.reshape(valid_mask.shape).cuda()
        valid_mask = torch.logical_and(valid_mask, eval_mask)

        with torch.no_grad():
            pred = model(img)
            pred = F.interpolate(pred[:, None], depth.shape[-2:],
                                 mode='bilinear', align_corners=True)[0, 0]

        # Min-max normalize the relative prediction, then map it into the
        # ground-truth disparity range and invert to depth.
        pred = (pred - pred.min()) / (pred.max() - pred.min())
        A = 1 / depth.max()
        B = 1 / max(depth.min(), 1) - 1 / depth.max()
        pred = 1 / (B * pred + A)

        cur_results = eval_depth(pred[valid_mask], depth[valid_mask])
        for k in results.keys():
            results[k] += cur_results[k]

for k in results:
    results[k] /= nums_image
    print(k, results[k].item())
```
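One thing worth noting about the code above: min-max rescaling forces the prediction's extremes to coincide with the ground-truth depth range, which discards the prediction's own structure and can heavily penalize a relative-depth model. For affine-invariant models, evaluations commonly fit a per-image least-squares scale and shift in disparity space against the ground truth before inverting to depth (as in the MiDaS protocol). It is an assumption that the paper uses this alignment; a sketch of the idea:

```python
import torch

def align_scale_shift(pred_disp: torch.Tensor, gt_depth: torch.Tensor,
                      mask: torch.Tensor) -> torch.Tensor:
    """Fit gt_disp ~ s * pred_disp + t on masked pixels, return aligned depth."""
    gt_disp = 1.0 / gt_depth[mask]
    x = pred_disp[mask]
    A = torch.stack([x, torch.ones_like(x)], dim=1)       # (N, 2) design matrix
    sol = torch.linalg.lstsq(A, gt_disp.unsqueeze(1)).solution
    s, t = sol[0, 0], sol[1, 0]
    aligned_disp = (s * pred_disp + t).clamp(min=1e-6)    # avoid divide-by-zero
    return 1.0 / aligned_disp

# Toy check: pred_disp is an exact affine transform of gt disparity,
# so alignment should recover the ground truth.
gt = torch.tensor([2.0, 4.0, 8.0])
pred = 2.0 * (1.0 / gt) + 1.0
mask = torch.ones(3, dtype=torch.bool)
print(align_scale_shift(pred, gt, mask))  # close to [2., 4., 8.]
```

If the min-max mapping above is replaced by this alignment (applied per image on the valid pixels), the abs_rel should drop substantially when the prediction is only off by an affine transform in disparity.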

Do you have any suggestions on why this might be the case? Is there any additional preprocessing, postprocessing, or testing configuration I should be aware of? Also, would it be possible for you to share the testing code you used in the paper for comparison?

Thank you for your help!
