Multi-Cali Anything: Dense Feature Multi-Frame Structure-from-Motion for Large-Scale Camera Array Calibration

1CMU, 2Harvard, 3HKU, 4Cornell, 5Meta Reality Labs, *Equal Contributions
IROS 2025
Multi-Cali Anything Architecture

The overall pipeline of our proposed method. Inputs are multi-frame, multi-view images together with known camera extrinsics; outputs are camera intrinsics and SfM sparse reconstructions. (a) to (d) illustrate several key components.

Abstract

Calibrating large-scale camera arrays, such as those used in dome-based setups, is time-intensive and often relies on dedicated checkerboard captures. While extrinsic parameters are typically fixed due to the physical structure, intrinsic parameters can vary across sessions because of lens adjustments or environmental factors like temperature. We introduce Multi-Cali Anything, a dense-feature-driven, multi-frame calibration method tailored for large-scale camera arrays. Unlike traditional methods, our approach refines intrinsics directly from scene data—eliminating the need for additional calibration captures. Built as a plug-and-play add-on to existing Structure-from-Motion (SfM) pipelines (e.g., COLMAP, Pixel-Perfect SfM), Multi-Cali Anything utilizes sparse reconstruction results and enhances them through dense feature refinement. Our method incorporates: (1) an extrinsics regularization term to progressively align estimated extrinsics with ground-truth values, (2) a dense feature reprojection loss to reduce keypoint errors in the feature space, and (3) an intrinsics variance term to ensure consistency across multiple frames. Experiments on the Multiface dataset demonstrate that our method achieves calibration precision comparable to dedicated calibration procedures, while significantly improving intrinsic parameter estimates and 3D reconstruction accuracy. Efficient, scalable, and fully compatible with existing SfM workflows, Multi-Cali Anything offers a practical solution for challenging calibration scenarios where traditional methods are impractical or infeasible.
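The three refinement terms listed in the abstract can be summarized as one joint objective. The notation below is our own illustrative shorthand (the symbols, weights, and squared-norm form are assumptions, not taken verbatim from the paper):

```latex
\min_{\{K_c\},\,\{T_c\},\,\{X_p\}}\;
\underbrace{\sum_{c,\,p}\big\| F_c\big(\pi(K_c,\,T_c X_p)\big) - F_c(x_{c,p})\big\|^2}_{\text{dense feature reprojection loss}}
\;+\;\lambda_{\mathrm{ext}}\underbrace{\sum_{c} d\big(T_c,\,\bar{T}_c\big)}_{\text{extrinsics regularization}}
\;+\;\lambda_{\mathrm{var}}\underbrace{\sum_{c}\operatorname{Var}_t\big[K_c^{(t)}\big]}_{\text{intrinsics variance}}
```

Here $K_c$ and $T_c$ denote the intrinsics and extrinsics of camera $c$, $\bar{T}_c$ the known (ground-truth) extrinsics, $X_p$ a 3D point observed at pixel $x_{c,p}$, $\pi$ the perspective projection, $F_c$ a dense feature map of camera $c$'s image, and $K_c^{(t)}$ the per-frame intrinsic estimate whose spread across frames $t$ the last term penalizes.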


Why Multi-Cali Anything?

No Calibration Patterns

Uses scene data directly, eliminating the need for dedicated calibration patterns and captures.

Seamless Integration

Works as a simple, powerful add-on for existing SfM pipelines like COLMAP or Pixel-Perfect SfM.

Dense Feature Refinement

Reduces keypoint errors before optimization, leading to higher accuracy.

Multi-Frame Optimization

Ensures consistent and globally optimal camera intrinsics across multiple captured frames.

Efficient Processing

Suitable for large-scale camera arrays and extensive frame captures.

High Accuracy

Achieves precision that is nearly identical to dedicated calibration processes.
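As a toy illustration of the multi-frame consistency idea (our own sketch, not the paper's implementation; all names are hypothetical), the snippet below pools per-frame intrinsic estimates of one camera into a consensus set and measures their spread, which is exactly what an intrinsics-variance penalty drives toward zero:

```python
from statistics import mean, pvariance

def consensus_intrinsics(per_frame_K):
    """Average per-frame (fx, fy, cx, cy) estimates into one consensus set."""
    return tuple(mean(vals) for vals in zip(*per_frame_K))

def intrinsics_variance(per_frame_K):
    """Sum of per-parameter variances across frames.
    Zero exactly when every frame agrees on the intrinsics."""
    return sum(pvariance(vals) for vals in zip(*per_frame_K))

# Per-frame estimates of the same physical camera: (fx, fy, cx, cy).
estimates = [
    (1480.0, 1482.0, 960.5, 540.2),
    (1484.0, 1478.0, 959.5, 539.8),
]
K = consensus_intrinsics(estimates)   # consensus close to (1482, 1480, 960, 540)
penalty = intrinsics_variance(estimates)
```

In the real method this spread is a differentiable term in the joint optimization rather than a post-hoc average, but the invariant is the same: identical per-frame intrinsics incur zero penalty.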

How It Works

First, users run any SfM pipeline, such as COLMAP or Pixel-Perfect SfM, to generate initial sparse reconstructions, including camera parameters and sparse 3D models. Then, they apply Multi-Cali Anything to refine the intrinsics and generate improved 3D reconstructions using dense feature optimization.
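In script form, the two stages might look like the sketch below. The COLMAP subcommands (`feature_extractor`, `exhaustive_matcher`, `mapper`) are real CLI entry points; the final refinement call is a hypothetical placeholder, since the exact Multi-Cali Anything invocation depends on your setup:

```python
import shlex

def colmap_cmd(subcommand, **options):
    """Assemble a COLMAP CLI invocation as an argument list
    (pass it to subprocess.run in a real pipeline)."""
    cmd = ["colmap", subcommand]
    for key, value in options.items():
        cmd += [f"--{key}", str(value)]
    return cmd

pipeline = [
    # Stage 1: standard sparse SfM with COLMAP.
    colmap_cmd("feature_extractor", database_path="db.db", image_path="images"),
    colmap_cmd("exhaustive_matcher", database_path="db.db"),
    colmap_cmd("mapper", database_path="db.db", image_path="images",
               output_path="sparse"),
    # Stage 2: intrinsic refinement (hypothetical entry point, not the
    # project's actual CLI).
    ["python", "multi_cali.py", "--sparse", "sparse", "--output", "refined"],
]

for cmd in pipeline:
    print(shlex.join(cmd))
```

The same pattern applies with Pixel-Perfect SfM as the stage-1 backend: only the first three commands change.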

Quantitative Comparison

The table below presents a quantitative comparison on the Multiface Dataset, evaluating state-of-the-art methods alongside ours across multiple metrics.
Darker colors indicate better performance (i.e., lower values), while lighter colors represent worse performance.
For each baseline method, we also report results when ground-truth extrinsic parameters are provided. These are marked with an asterisk "*" following the method name.

Results Gallery

Reprojection Errors

Visualization of reprojection errors using intrinsics estimated by different methods: COLMAP (1st row), Pixel-Perfect SfM (2nd row), VGGSfM (3rd row), and our method (4th row). A few 3D points from the ground-truth mesh are projected onto images using both ground-truth intrinsics (red crosses) and estimated intrinsics (green dots). Our method produces the smallest reprojection error, demonstrating superior calibration accuracy.
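The comparison in this figure reduces to a simple computation: project the same camera-frame 3D point under the ground-truth and the estimated intrinsics, then measure the pixel offset. A minimal pinhole sketch (our own helper names; all numbers illustrative):

```python
def project(K, X):
    """Pinhole projection of a camera-frame 3D point X = (x, y, z)
    with intrinsics K = (fx, fy, cx, cy); returns pixel (u, v)."""
    fx, fy, cx, cy = K
    x, y, z = X
    return (fx * x / z + cx, fy * y / z + cy)

def reprojection_error(K_true, K_est, X):
    """Euclidean pixel distance between the two projections of X."""
    (u1, v1), (u2, v2) = project(K_true, X), project(K_est, X)
    return ((u1 - u2) ** 2 + (v1 - v2) ** 2) ** 0.5

K_gt  = (1500.0, 1500.0, 960.0, 540.0)   # ground-truth intrinsics
K_est = (1510.0, 1495.0, 958.0, 541.0)   # hypothetical estimate
err = reprojection_error(K_gt, K_est, (0.2, -0.1, 1.0))   # error in pixels
```

In the figure this distance is exactly the gap between each red cross (ground-truth projection) and its green dot (projection with estimated intrinsics).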

Reprojection Errors Comparison

Multi-View Stereo Reconstructions

Comparative visualization of multi-view stereo (MVS) reconstruction using the COLMAP MVS pipeline with intrinsics from COLMAP, Pixel-Perfect SfM, VGGSfM, and our method. Point-to-ground-truth mesh distances are computed and color-encoded: red for negative deviation, blue for positive deviation, and green for near-zero error. Our method shows more green points and tighter error distributions in histograms, indicating better alignment with the ground-truth and superior intrinsic calibration.

Multi-View Stereo Results

DUSt3R Reconstructions

Comparative DUSt3R reconstruction results using ground-truth extrinsics. All models are aligned and scaled to the same pose for visualization. Reconstructions with DUSt3R’s intrinsics show significant noise, depth inconsistencies, and scale mismatch. In contrast, reconstructions using our refined intrinsics yield more accurate geometry and consistent scale, demonstrating improved reconstruction quality for both heads.

DUSt3R Reconstruction Results

BibTeX

@article{you2025multi,
    title={Multi-Cali Anything: Dense Feature Multi-Frame Structure-from-Motion for Large-Scale Camera Array Calibration},
    author={You, Jinjiang and Wang, Hewei and Li, Yijie and Huo, Mingxiao and Ha, Long Van Tran and Ma, Mingyuan and Xu, Jinfeng and Wu, Puzhen and Garg, Shubham and Pu, Wei},
    journal={arXiv preprint arXiv:2503.00737},
    year={2025}
}