Hello,
I tested the model with depth_anything_v2_vits.pth on the KITTI dataset to evaluate its zero-shot performance, using the split at Depth_Anything_V2/metric_depth/dataset/splits/kitti/val2.txt. The resulting abs_rel is around 0.31, which deviates significantly from the 0.078 reported in Table 2 of the paper.
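For reference, the abs_rel I quote is, as I understand it, the standard mean absolute relative error over valid pixels (I am assuming the repo's `eval_depth` implements this definition; the values below are made up for illustration):

```python
# Hypothetical predicted and ground-truth depths (metres) for three pixels
pred = [9.0, 21.0, 40.0]
gt = [10.0, 20.0, 40.0]

# abs_rel: mean of |pred - gt| / gt over valid pixels
abs_rel = sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)
print(abs_rel)  # ≈ 0.05
```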
Here is the code I used for testing:
```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from tqdm import tqdm

kitti_path = "/Depth_Anything_V2/metric_depth/dataset/splits/kitti/val.txt"
kitti_dataset = KITTI(filelist_path=kitti_path, mode='val', size=(518, 518))
dataloader = DataLoader(kitti_dataset, batch_size=1, shuffle=False, num_workers=4)

metrics = ['d1', 'd2', 'd3', 'abs_rel', 'sq_rel', 'rmse', 'rmse_log', 'log10', 'silog']
results = {k: torch.tensor([0.0]).cuda() for k in metrics}

nums_image = len(dataloader)
with tqdm(total=nums_image) as bar:
    for sample in dataloader:
        bar.update(1)
        img = sample['image'].cuda().float()
        depth = sample['depth'].cuda()[0]
        valid_mask = sample['valid_mask'].cuda()[0]
        valid_mask = (valid_mask == 1) & (depth > 0) & (depth < 80)

        # KITTI evaluation crop (Eigen by default)
        eval_mask = torch.zeros_like(valid_mask.squeeze()).bool()
        gt_height, gt_width = eval_mask.shape
        valid_mask_crop = "eigen"
        if valid_mask_crop == "garg":
            eval_mask[int(0.40810811 * gt_height):int(0.99189189 * gt_height),
                      int(0.03594771 * gt_width):int(0.96405229 * gt_width)] = 1
        elif valid_mask_crop == "eigen":
            eval_mask[int(0.3324324 * gt_height):int(0.91351351 * gt_height),
                      int(0.0359477 * gt_width):int(0.96405229 * gt_width)] = 1
        # reshape() returns a new tensor, so the result must be assigned
        eval_mask = eval_mask.reshape(valid_mask.shape).cuda()
        valid_mask = torch.logical_and(valid_mask, eval_mask)

        with torch.no_grad():
            pred = model(img)
        pred = F.interpolate(pred[:, None], depth.shape[-2:],
                             mode='bilinear', align_corners=True)[0, 0]

        # Min-max normalise the relative prediction, map it onto the
        # GT inverse-depth range, then invert to get metric depth
        pred = (pred - pred.min()) / (pred.max() - pred.min())
        A = 1 / depth.max()
        B = 1 / max(depth.min(), 1) - 1 / depth.max()
        pred = 1 / (B * pred + A)

        cur_results = eval_depth(pred[valid_mask], depth[valid_mask])
        for k in results:
            results[k] += cur_results[k]

for k in results:
    results[k] /= nums_image
    print(k, results[k].item())
```
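For clarity, the range-mapping step in my loop (normalised relative prediction mapped onto the ground-truth inverse-depth range, then inverted) behaves like this in isolation, with made-up depth bounds:

```python
# Hypothetical GT depth range for one image (metres)
d_min, d_max = 2.0, 80.0

# Same A/B mapping as in my evaluation loop:
# pred = 0 maps to the farthest depth, pred = 1 to the nearest
A = 1 / d_max
B = 1 / max(d_min, 1) - 1 / d_max

for p in (0.0, 0.5, 1.0):  # min-max-normalised prediction values
    print(p, 1 / (B * p + A))  # approximately: 0.0 -> 80.0, 1.0 -> 2.0
```

I suspect this per-image min-max alignment may differ from the protocol used in the paper, which is part of what I am asking about.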
Do you have any suggestions on why this might be the case? Is there any additional preprocessing, postprocessing, or testing configuration I should be aware of? Also, would it be possible for you to share the testing code you used in the paper for comparison?
Thank you for your help!