AnyCalib:
On-Manifold Learning for Model-Agnostic Single-View Camera Calibration

Javier Tirado-Garín Javier Civera
I3A, University of Zaragoza

Camera calibration from a single perspective/edited/distorted image using a freely chosen camera model

Usage (pretrained models)

The only requirements are Python (≥3.10) and PyTorch. The project, in development mode, can be installed with:

git clone https://github.com/javrtg/AnyCalib.git && cd AnyCalib
pip install -e .

Alternatively, and optionally, a compatible version of xformers can also be installed for better efficiency by running the following instead of pip install -e .:

pip install -e .[eff]

Minimal usage example

import numpy as np
import torch
from PIL import Image  # the library of choice to load images

from anycalib import AnyCalib


dev = torch.device("cuda")

# load input image and convert it to a (3, H, W) tensor with RGB values in [0, 1]
image = np.array(Image.open("path/to/image.jpg").convert("RGB"))
image = torch.tensor(image, dtype=torch.float32, device=dev).permute(2, 0, 1) / 255

# instantiate AnyCalib according to the desired model_id. Options:
# "anycalib_pinhole": model trained with *only* perspective (pinhole) images,
# "anycalib_gen": trained with perspective, distorted and strongly distorted images,
# "anycalib_dist": trained with distorted and strongly distorted images,
# "anycalib_edit": Trained on edited (stretched and cropped) perspective images.
model = AnyCalib(model_id="anycalib_pinhole").to(dev)

# Alternatively, the weights can be loaded from the huggingface hub as follows:
# NOTE: huggingface_hub (https://pypi.org/project/huggingface-hub/) needs to be installed
# model = AnyCalib().from_pretrained(model_id=<model_id>).to(dev)

# predict according to the desired camera model. Implemented camera models are detailed further below.
output = model.predict(image, cam_id="pinhole")
# output is a dictionary with the following key-value pairs:
# {
#      "intrinsics": (D,) tensor with the estimated intrinsics for the selected camera model,
#      "fov_field": (N, 2) tensor with the regressed FoV field by the network. N≈320^2 (resolution close to the one seen during training),
#      "tangent_coords": alias for "fov_field",
#      "rays": (N, 3) tensor with the corresponding (via the exponential map) ray directions in the camera frame (x right, y down, z forward),
#      "pred_size": (H, W) tuple with the image size used by the network. It can be used e.g. for resizing the FoV/ray fields to the original image size.
# }

The weights of the selected model_id, if not already downloaded, will be automatically downloaded to the:

torch hub cache directory (torch.hub.get_dir()) if AnyCalib(model_id=<model_id>) is used, or
huggingface cache directory if AnyCalib().from_pretrained(model_id=<model_id>) is used.

Additional configuration options are indicated in the docstring of AnyCalib:

help(AnyCalib)

    """AnyCalib class.

    Args for instantiation:
        model_id: one of {'anycalib_pinhole', 'anycalib_gen', 'anycalib_dist', 'anycalib_edit'}.
            Each model differes in the type of images they seen during training:
                * 'anycalib_pinhole': Perspective (pinhole) images,
                * 'anycalib_gen': General images, including perspective, distorted and
                    strongly distorted images, and
                * 'anycalib_dist': Distorted images using the Brown-Conrady camera model
                    and strongly distorted images, using the EUCM camera model,
                * 'anycalib_edit': Trained on edited (stretched and cropped) perspective
                    images.
            Default: 'anycalib_pinhole'.
        nonlin_opt_method: nonlinear optimization method: 'gauss_newton' or 'lev_mar'.
            Default: 'gauss_newton'
        nonlin_opt_conf: nonlinear optimization configuration.
            This config can be used to control the number of iterations and the space
            where the residuals are minimized. See the classes `GaussNewtonCalib` or
            `LevMarCalib` under anycalib/optim for details. Default: None.
        init_with_sac: use RANSAC instead of nonminimal fit for initializating the
            intrinsics. Default: False.
        fallback_to_sac: use RANSAC if nonminimal fit fails. Default: True.
        ransac_conf: RANSAC configuration. This config can be used to control e.g. the
            inlier threshold or the number of minimal samples to try. See the class
            `RANSAC` in anycalib/ransac.py for details. Default: None.
        rm_borders: border size of the dense FoV fields to ignore during fitting.
            Default: 0.
        sample_size: approximate number of 2D-3D correspondences to use for fitting the
            intrinsics. Negative value -> no subsampling. Default: -1.
    """

Minimal batched example

AnyCalib can also be executed in batch and using possibly different camera models for each image. For example:

images = ... # (B, 3, H, W)
# NOTE: if cam_ids is a list, then len(cam_ids) must be equal to B
cam_ids = ["pinhole", "radial:1", "kb:4"]  # different camera models for each image
cam_ids = "pinhole"  # same camera model across images
output = model.predict(images, cam_id=cam_ids)
# corresponding batched output dictionary:
# {
#      "intrinsics": List[(D_i,) tensors] for each camera model "i",
#      "fov_field": (B, N, 2) tensor,
#      "tangent_coords": alias for "fov_field",
#      "rays": (B, N, 3) tensor,
#      "pred_size": (H, W).
# }

Currently implemented camera models

cam_id represents the camera model identifier(s) that can be used in the predict method.
D corresponds to the number of intrinsics of the camera model. It determines the length of each intrinsics tensor in the output dictionary.

`cam_id`	Description	`D`	Intrinsics
`pinhole`	Pinhole camera model	4	$f_x,~f_y,~c_x,~c_y$
`simple_pinhole`	`pinhole` with one focal length	3	$f,~c_x,~c_y$
`radial:k`	Radial (Brown-Conrady) [1] camera model with `k` $\in$ [1, 4] distortion coefficients	4+`k`	$f_x,~f_y,~c_x,~c_y$ $k_1[,~k_2[,~k_3[,~k_4]]]$
`simple_radial:k`	`radial:k` with one focal length	3+`k`	$f,~c_x,~c_y$ $k_1[,~k_2[,~k_3[,~k_4]]]$
`kb:k`	Kannala-Brandt [2] camera model with `k` $\in$ [1, 4] distortion coefficients	4+`k`	$f_x,~f_y,~c_x,~c_y$ $k_1[,~k_2[,~k_3[,~k_4]]]$
`simple_kb:k`	`kb:k` with one focal length	3+`k`	$f,~c_x,~c_y$ $k_1[,~k_2[,~k_3[,~k_4]]]$
`ucm`	Unified Camera Model [3]	5	$f_x,~f_y,~c_x,~c_y$ $k$
`simple_ucm`	`ucm` with one focal length	4	$f,~c_x,~c_y$ $k$
`eucm`	Enhanced Unified Camera Model [4]	6	$f_x,~f_y,~c_x,~c_y$ $k_1,~k_2$
`simple_eucm`	`eucm` with one focal length	5	$f,~c_x,~c_y$ $k_1,~k_2$
`division:k`	Division camera model [5] with `k` $\in$ [1, 4] distortion coefficients	4+`k`	$f_x,~f_y,~c_x,~c_y$ $k_1[,~k_2[,~k_3[,~k_4]]]$
`simple_division:k`	`division:k` with one focal length	3+`k`	$f,~c_x,~c_y$ $k_1[,~k_2[,~k_3[,~k_4]]]$

In addition to the original works, we recommend the works of Usenko et al. [6] and Lochman et al. [7] for a comprehensive comparison of the different camera models.

Evaluation

The evaluation and training code is built upon the siclib library from GeoCalib, which can be installed as:

pip install -e siclib

Running the evaluation commands will write the results to outputs/results/.

LaMAR

Running the evaluation commands will download the dataset to data/lamar2k which will take around 400 MB of disk space.

AnyCalib trained on $\mathrm{OP_{p}}$:

python -m siclib.eval.lamar2k_rays --conf anycalib_pretrained --tag anycalib_p --overwrite