{"paper":{"title":"CalibAnyView: Beyond Single-View Camera Calibration in the Wild","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"CalibAnyView enables camera calibration from any number of views in the wild by enforcing cross-view geometric consistency.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Boying Li, Cheng Zhang, Daniel Cremers, Hamid Rezatofighi, Ian Reid, Weirong Chen","submitted_at":"2026-05-14T09:32:12Z","abstract_excerpt":"Camera calibration is a fundamental prerequisite for reliable geometric perception, yet classical approaches rely on controlled acquisition setups that are impractical for in-the-wild imagery. Recent learning-based methods have shown promising results for single-view calibration, but inherently neglect geometric consistency across multiple views. We introduce CalibAnyView, a unified formulation that supports an arbitrary number of input views ($N \\geq 1$) by explicitly modeling cross-view geometric consistency. To facilitate this, we construct a large-scale multi-view video dataset covering di"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"CalibAnyView consistently outperforms state-of-the-art methods, achieves strong robustness under single-view settings, and further improves with multi-view inference.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The constructed large-scale multi-view video dataset sufficiently covers the diversity of real-world camera models, dynamic scenes, motion trajectories, and lens distortions so that the learned model generalizes beyond the training distribution.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A multi-view transformer predicts dense perspective fields that feed a geometric optimizer to estimate camera intrinsics and gravity from arbitrary 
numbers of real-world views.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"CalibAnyView enables camera calibration from any number of views in the wild by enforcing cross-view geometric consistency.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"c1953e2c62b12106d16d10a88780c5208acdca1c2b998be04e02baa03ad21a48"},"source":{"id":"2605.14615","kind":"arxiv","version":1},"verdict":{"id":"1f5bf76f-6aa7-46d1-96f5-0f13ca415c11","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T05:56:58.310994Z","strongest_claim":"CalibAnyView consistently outperforms state-of-the-art methods, achieves strong robustness under single-view settings, and further improves with multi-view inference.","one_line_summary":"A multi-view transformer predicts dense perspective fields that feed a geometric optimizer to estimate camera intrinsics and gravity from arbitrary numbers of real-world views.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The constructed large-scale multi-view video dataset sufficiently covers the diversity of real-world camera models, dynamic scenes, motion trajectories, and lens distortions so that the learned model generalizes beyond the training distribution.","pith_extraction_headline":"CalibAnyView enables camera calibration from any number of views in the wild by enforcing cross-view geometric consistency."},"references":{"count":61,"sample":[{"doi":"","year":2011,"title":"Communications of the ACM54(10), 105–112 (2011)","work_id":"e98ad0ee-c84a-4110-a036-4198d635688d","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"Joint 2d-3d-semantic data for indoor scene 
understanding","work_id":"cd49417b-13e3-4652-87f2-c992e78d093a","ref_index":2,"cited_arxiv_id":"1702.01105","is_internal_anchor":true},{"doi":"","year":2025,"title":"Qwen2.5-VL Technical Report","work_id":"69dffacb-bfe8-442d-be86-48624c60426f","ref_index":3,"cited_arxiv_id":"2502.13923","is_internal_anchor":true},{"doi":"","year":2011,"title":"In: CVPR 2011","work_id":"ececda63-0d1a-4aa7-8188-c69cee730905","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"Bogdan, O., Eckstein, V., Rameau, F., Bazin, J.C.: Deepcalib: A deep learning approach for automatic intrinsic calibration of wide field-of-view cameras. In: CVMP. pp. 1–10 (2018)","work_id":"31c5c3a5-a4b7-4f79-a177-d23da30048ce","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":61,"snapshot_sha256":"d6da3f5bdb80452c222ee150623c4a3b2d4c16f901f0b0e0a405ded8da8ced13","internal_anchors":5},"formal_canon":{"evidence_count":2,"snapshot_sha256":"3922add496d94c357b78d0eb30a471ac4439a70c70f15f0f9cd19fa30e8e809b"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}