A linear method for camera pair self-calibration and multi-view reconstruction with geometrically verified correspondences
Pith reviewed 2026-05-25 13:57 UTC · model grok-4.3
The pith
A linear method recovers metric reconstructions of camera pairs from projective ones when focal lengths are unknown and different.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the stated assumptions a linear method yields two mirror-image metric camera configurations from the projective pair; the viewing directions of the two solutions stand in explicit geometric relation; the correct configuration is identified by the cheirality condition; and a separate ordering test on image-axis coordinates removes inconsistent correspondences before reconstruction proceeds.
What carries the argument
Linear recovery of the two mirror-position camera configurations from the projective pair, followed by cheirality disambiguation and axis-order verification of correspondences.
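The cheirality step can be illustrated with a short sketch. It is not the paper's implementation: it assumes each candidate configuration comes with its own triangulated points, uses the standard signed-depth formula for a finite camera P = [M | p4], and picks the candidate with the most points in front of both cameras (a vote rather than a strict "all points" test, to tolerate noise). The names `depths` and `pick_by_cheirality` are hypothetical.

```python
import numpy as np

def depths(P, X):
    """Signed depth of homogeneous points X (4xN) in a finite camera P (3x4)."""
    x = P @ X                      # projected homogeneous image points
    M = P[:, :3]                   # left 3x3 block of the camera matrix
    # depth = sign(det M) * w / (T * ||m3||), with w the third image coordinate
    return np.sign(np.linalg.det(M)) * x[2] / (X[3] * np.linalg.norm(M[2]))

def pick_by_cheirality(candidates, points):
    """candidates: list of (P1, P2); points: list of 4xN arrays, one per candidate.

    Returns the index of the candidate with the most points in front of
    both cameras, together with the per-candidate counts.
    """
    scores = []
    for (P1, P2), X in zip(candidates, points):
        in_front = (depths(P1, X) > 0) & (depths(P2, X) > 0)
        scores.append(int(in_front.sum()))
    return int(np.argmax(scores)), scores
```

Given two mirror-image candidates, the call `pick_by_cheirality([(P1a, P2a), (P1b, P2b)], [Xa, Xb])` returns the index of the configuration whose points lie in front of both cameras.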
If this is right
- The two recovered solutions are always mirror images of each other with fixed relations between their viewing directions.
- Point correspondences that violate order along both image axes can be rejected before metric reconstruction.
- Multiple verified pair reconstructions can be fused into a consistent multi-view model by rotation averaging and focal-length averaging.
- The linear solver reaches accuracy comparable to the nonlinear pipeline of Kruppa-equation self-calibration followed by the five-point algorithm (median rotation error 3.49° versus 3.77°).
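The ordering test mentioned in the second bullet can be sketched as follows. This is only one plausible reading, not the paper's algorithm: matches are sorted by their coordinate in the first image, and a longest strictly increasing subsequence of the corresponding coordinates in the second image marks the matches that keep their order along that axis. The function names and the choice of a longest-increasing-subsequence criterion are assumptions.

```python
import bisect
import numpy as np

def lis_mask(values):
    """Boolean mask marking one longest strictly increasing subsequence."""
    n = len(values)
    tails_val, tails_idx = [], []   # tails_val[k]: smallest tail of a run of length k+1
    prev = [-1] * n                 # back-pointers used to recover the subsequence
    for i, v in enumerate(values):
        k = bisect.bisect_left(tails_val, v)
        if k == len(tails_val):
            tails_val.append(v)
            tails_idx.append(i)
        else:
            tails_val[k] = v
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else -1
    mask = np.zeros(n, dtype=bool)
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        mask[i] = True
        i = prev[i]
    return mask

def order_consistent(c1, c2):
    """c1, c2: coordinates of the same matches along one axis in images 1 and 2.

    Returns a mask (in the original match order) of matches that preserve
    their relative order along this axis.
    """
    c1 = np.asarray(c1)
    c2 = np.asarray(c2)
    order = np.argsort(c1, kind="stable")      # sort matches by image-1 coordinate
    mask = np.zeros(len(c1), dtype=bool)
    mask[order] = lis_mask(c2[order].tolist())
    return mask
```

Running `order_consistent` once on the x coordinates and once on the y coordinates gives two masks; a match that fails both, `~mask_x & ~mask_y`, is a candidate for rejection before reconstruction.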
Where Pith is reading between the lines
- Pair-wise results could serve as input to global bundle adjustment that refines all focal lengths simultaneously.
- The axis-order test might improve match filtering in any two-view geometry task that assumes locally consistent image ordering.
- The mirror-solution property could simplify initialization of larger camera graphs by propagating orientation constraints.
Load-bearing premise
All camera internal parameters except the two focal lengths are known in advance and a projective reconstruction of the pair is already given.
What would settle it
On a collection of image pairs supplied with ground-truth metric poses, measure whether the linear solver selects the cheirality-correct solution and produces median rotation error no larger than that of the Kruppa baseline.
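A minimal sketch of the error measure behind such a test, assuming estimated and ground-truth rotations are available as 3x3 NumPy arrays. The rotation error is taken as the angle of the relative rotation, a common convention; the paper's exact definition of ΔR is not given in this text, and the function names are hypothetical.

```python
import numpy as np

def rotation_angle_deg(R_est, R_gt):
    """Angle in degrees of the relative rotation R_est^T R_gt."""
    R = R_est.T @ R_gt
    cos_angle = (np.trace(R) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def median_rotation_error(R_ests, R_gts):
    """Median angular error over a collection of image pairs."""
    errors = [rotation_angle_deg(a, b) for a, b in zip(R_ests, R_gts)]
    return float(np.median(errors))
```

Computing `median_rotation_error` for the linear solver and for the Kruppa baseline on the same pairs gives the two numbers to compare.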
Original abstract
We examine 3D reconstruction of architectural scenes in unordered sets of uncalibrated images. We introduce a linear method to self-calibrate and find the metric reconstruction of a camera pair. We assume unknown and different focal lengths but otherwise known internal camera parameters and a known projective reconstruction of the camera pair. We recover two possible camera configurations in space and use the Cheirality condition, that all 3D scene points are in front of both cameras, to disambiguate the solution. We show in two Theorems, first that the two solutions are in mirror positions and then the relations between their viewing directions. Our new method performs on par (median rotation error $\Delta R = 3.49^{\circ}$) with the standard approach of Kruppa equations ($\Delta R = 3.77^{\circ}$) for self-calibration and 5-Point algorithm for calibrated metric reconstruction of a camera pair. We reject erroneous image correspondences by introducing a method to examine whether point correspondences appear in the same order along $x, y$ image axes in image pairs. We evaluate this method by its precision and recall and show that it improves the robustness of point matches in architectural and general scenes. Finally, we integrate all the introduced methods to a 3D reconstruction pipeline. We utilize the numerous camera pair metric recontructions using rotation-averaging algorithms and a novel method to average focal length estimates.
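To make the final fusion step concrete, here is a sketch of averaging several estimates of the same rotation. It deliberately swaps in the simple chordal L2 mean (arithmetic mean projected back onto SO(3) by SVD) instead of the rotation-averaging algorithms the paper relies on, and it does not attempt the paper's focal-length averaging, which is not described in this text. The function name is hypothetical.

```python
import numpy as np

def chordal_mean_rotation(Rs):
    """Chordal L2 mean of a list of 3x3 rotation matrices."""
    M = np.mean(np.stack(Rs), axis=0)          # arithmetic mean, generally not a rotation
    U, _, Vt = np.linalg.svd(M)
    # Force det = +1 so the result is a proper rotation rather than a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt
```

A robust variant (for example an L1 mean) would be preferable when some pairwise estimates are outliers; the L2 mean is used here only because it fits in a few lines.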
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims a linear method to upgrade a known projective reconstruction of a camera pair (unknown distinct focal lengths, other intrinsics known) to metric form by recovering two mirror configurations disambiguated via cheirality, supported by two theorems on their spatial relations and viewing directions. It also presents a geometric verification technique for rejecting erroneous correspondences based on consistent ordering along image axes, evaluates this via precision/recall, and integrates the pair-wise metric reconstructions into a multi-view pipeline using rotation averaging and focal-length averaging. Performance is reported as comparable to Kruppa equations plus 5-point algorithm (median rotation error 3.49° vs 3.77°).
Significance. If the linear derivation and theorems hold under the stated assumptions, the approach supplies an efficient, direct alternative to nonlinear self-calibration for camera pairs and could streamline robust reconstruction pipelines for architectural scenes; the ordering-based correspondence filter offers a simple geometric prior that may complement RANSAC-style methods.
major comments (2)
- [Abstract] Abstract: the central performance claim (median ΔR = 3.49° on par with 3.77°) is presented without any dataset description, number of image pairs, or error-bar statistics, rendering the numerical comparison unverifiable and load-bearing for the assertion that the method performs on par with Kruppa + 5-point.
- [Abstract] Abstract: the two theorems on mirror configurations and viewing-direction relations are announced but no derivation steps, intermediate equations, or proof outlines are supplied, so the soundness of the linear upgrade cannot be assessed from the given text.
minor comments (1)
- [Abstract] Abstract: the final pipeline sentence contains a typo ('recontructions').
Simulated Author's Rebuttal
We thank the referee for their detailed review and constructive comments. We address each major comment below.
Point-by-point responses
-
Referee: [Abstract] Abstract: the central performance claim (median ΔR = 3.49° on par with 3.77°) is presented without any dataset description, number of image pairs, or error-bar statistics, rendering the numerical comparison unverifiable and load-bearing for the assertion that the method performs on par with Kruppa + 5-point.
Authors: We agree that the abstract would benefit from additional context to support the performance claim. The full experimental details, including the number of image pairs and dataset description from architectural scenes, along with error statistics, are provided in the Experiments section. We will revise the abstract to include a brief reference to the evaluation, such as the number of pairs tested, to make the comparison more verifiable. revision: yes
-
Referee: [Abstract] Abstract: the two theorems on mirror configurations and viewing-direction relations are announced but no derivation steps, intermediate equations, or proof outlines are supplied, so the soundness of the linear upgrade cannot be assessed from the given text.
Authors: The abstract serves as a high-level summary of the contributions. The two theorems are fully stated and proven in Section 3, with all derivation steps, intermediate equations, and proof outlines included there. To improve the abstract, we will add a short note referring readers to the theorems' proofs in the main text. However, due to length constraints, detailed derivations cannot be included in the abstract itself. revision: partial
Circularity Check
No significant circularity; derivation is self-contained algebraic upgrade
The paper's central derivation is a linear algebraic procedure that takes as explicit input a known projective reconstruction of a camera pair (plus known intrinsics except distinct focal lengths) and produces metric camera poses via direct solution of the stated equations, followed by Cheirality disambiguation and two theorems on mirror configurations that follow from those equations. No step reduces a fitted parameter to a prediction by construction, renames an empirical pattern, or relies on a load-bearing self-citation whose content is itself unverified. The reported median rotation error is an external empirical comparison against Kruppa/5-point baselines and does not participate in the derivation chain. The correspondence verification step is likewise an independent geometric test. The pipeline integration (rotation averaging, focal-length averaging) is downstream application, not part of the core self-calibration claim. The derivation is therefore self-contained against the stated assumptions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: all internal camera parameters except the focal lengths are known, and a projective reconstruction of the camera pair is given.
Reference graph
Works this paper leans on
-
[1]
R. I. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, 2nd Edition, Cambridge University Press, ISBN: 0521540518, 2004
-
[2]
M. A. Fischler, R. C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM 24 (6) (1981) 381–395
-
[3]
R. I. Hartley, Kruppa’s equations derived from the fundamental matrix, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (2) (1997) 133–135
-
[4]
D. Nistér, An efficient solution to the five-point relative pose problem, Pattern Analysis and Machine Intelligence, IEEE Transactions on 26 (6) (2004) 756–770
-
[5]
M. Pollefeys, R. Koch, L. Van Gool, Self-calibration and metric reconstruction in spite of varying and unknown intrinsic camera parameters, International Journal of Computer Vision 32 (1) (1999) 7–25
-
[6]
Y. Seo, A. Heyden, R. Cipolla, A linear iterative method for auto-calibration using the DAC equation, in: Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, Vol. 1, IEEE, 2001, pp. I–880
- [7]
-
[8]
N. Snavely, et al., Bundler: Structure from motion (SfM) for unordered image collections, available online: phototour.cs.washington.edu/bundler/ (accessed on 12 July 2013)
-
[9]
D. Martinec, T. Pajdla, Robust rotation and translation estimation in multiview reconstruction, in: Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on, IEEE, 2007, pp. 1–8
-
[10]
H. Stewénius, D. Nistér, F. Kahl, F. Schaffalitzky, A minimal solution for relative pose with unknown focal length, in: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 2, IEEE, 2005, pp. 789–794
-
[11]
S. N. Sinha, D. Steedly, R. Szeliski, A multi-stage linear approach to structure from motion, in: ECCV 2010 Workshop on Reconstruction and Modeling of Large-Scale 3D Virtual Environments, Vol. 3002, 2010, pp. 3003–3005
-
[12]
R. Hartley, K. Aftab, J. Trumpf, L1 rotation averaging using the Weiszfeld algorithm, in: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, IEEE, 2011, pp. 3041–3048
-
[13]
V. M. Govindu, Combining two-view constraints for motion estimation, in: Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, Vol. 2, IEEE, 2001, pp. II–218
-
[14]
V. M. Govindu, Lie-algebraic averaging for globally consistent motion estimation, in: Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, Vol. 1, IEEE, 2004, pp. I–684
-
[15]
R. Hartley, J. Trumpf, Y. Dai, H. Li, Rotation averaging, International Journal of Computer Vision (2013) 1–39
-
[16]
F. Kahl, R. Hartley, Multiple-view geometry under the $L_\infty$-norm, Pattern Analysis and Machine Intelligence, IEEE Transactions on 30 (9) (2008) 1603–1617
-
[17]
O. Chum, J. Matas, Homography estimation from correspondences of local elliptical features, in: Pattern Recognition (ICPR), 2012 21st International Conference on, IEEE, 2012, pp. 3236–3239
-
[18]
N. Snavely, S. M. Seitz, R. Szeliski, Photo tourism: exploring photo collections in 3D, in: ACM Transactions on Graphics (TOG), Vol. 25, ACM, 2006, pp. 835–846
-
[19]
M. L. Fredman, On computing the length of longest increasing subsequences, Discrete Mathematics 11 (1) (1975) 29–35
-
[20]
O. Faugeras, Q.-T. Luong, T. Papadopoulo, The geometry of multiple images: the laws that govern the formation of multiple images of a scene and some of their applications, MIT Press, 2004
-
[21]
D. G. Lowe, Distinctive image features from scale-invariant keypoints, International journal of computer vision 60 (2) (2004) 91–110
- [22]
-
[23]
J. Philbin, O. Chum, M. Isard, J. Sivic, A. Zisserman, Object retrieval with large vocabularies and fast spatial matching, in: Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on, IEEE, 2007, pp. 1–8
-
[24]
M. Perd’och, O. Chum, J. Matas, Efficient representation of local geometry for large scale object retrieval, in: Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, IEEE, 2009, pp. 9–16
-
[25]
O. Chum, J. Matas, S. Obdrzalek, Enhancing RANSAC by generalized model optimization, in: Proc. of the ACCV, Vol. 2, 2004, pp. 812–817
-
[26]
O. Chum, J. Matas, Geometric hashing with local affine frames, in: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, Vol. 1, IEEE, 2006, pp. 879–884
- [27]
-
[28]
Z. Wu, Q. Ke, M. Isard, J. Sun, Bundling features for large scale partial-duplicate web image search, in: Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, IEEE, 2009, pp. 25–32
-
[29]
M. I. A. Lourakis, A. A. Argyros, SBA: a software package for generic sparse bundle adjustment, ACM Transactions on Mathematical Software (2009) 1–30
-
[30]
C. Wu, S. Agarwal, B. Curless, S. M. Seitz, Multicore bundle adjustment, in: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, IEEE, 2011, pp. 3057–3064
-
[31]
A. Dalalyan, R. Keriven, $\ell_1$-penalized robust estimation for a class of inverse problems arising in multiview geometry, in: Advances in Neural Information Processing Systems, 2009, pp. 441–449
-
[32]
R. Hartley, F. Kahl, Optimal algorithms in multiview geometry, Computer Vision–ACCV 2007 (2007) 13–34
- [33]
-
[34]
C. Zach, M. Pollefeys, Practical methods for convex multi-view reconstruction, in: Computer Vision–ECCV 2010, Springer, 2010, pp. 354–367
-
[35]
O. Enqvist, F. Kahl, C. Olsson, Non-sequential structure from motion, in: Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, IEEE, 2011, pp. 264–271
-
[36]
Y. Furukawa, J. Ponce, Accurate, dense, and robust multiview stereopsis, Pattern Analysis and Machine Intelligence, IEEE Transactions on 32 (8) (2010) 1362–1376
-
[37]
M. Kazhdan, M. Bolitho, H. Hoppe, Poisson surface reconstruction, in: Proceedings of the fourth Eurographics symposium on Geometry processing, 2006
- [38]
-
[39]
J. W. Hunt, T. G. Szymanski, A fast algorithm for computing longest common subsequences, Communications of the ACM 20 (5) (1977) 350–353
-
[40]
P. van Emde Boas, Preserving order in a forest in less than logarithmic time, in: FOCS, 1975, pp. 75–84
-
[41]
A. C. Gallagher, Using vanishing points to correct camera rotation in images, in: Computer and Robot Vision, 2005. Proceedings. The 2nd Canadian Conference on, IEEE, 2005, pp. 460–467
-
[42]
C. Strecha, W. Von Hansen, L. Van Gool, P. Fua, U. Thoennessen, On benchmarking camera calibration and multi-view stereo for high resolution imagery, in: Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, IEEE, 2008, pp. 1–8
-
[43]
C. Strecha, W. Von Hansen, L. Van Gool, P. Fua, U. Thoennessen, On benchmarking camera calibration and multi-view stereo for high resolution imagery, in: Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, IEEE, 2008, pp. 1–8
- [44]
-
[45]
H. S. Wong, T.-J. Chin, J. Yu, D. Suter, Dynamic and hierarchical multi-structure geometric model fitting, in: International Conference on Computer Vision (ICCV), 2011
-
[46]
M. Chandraker, S. Agarwal, F. Kahl, D. Nistér, D. Kriegman, Autocalibration via rank-constrained estimation of the absolute quadric, in: Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on, IEEE, 2007, pp. 1–8
-
[47]
R. Gherardi, A. Fusiello, Practical autocalibration, in: Computer Vision–ECCV 2010, Springer, 2010, pp. 790–801
-
[48]
Z. Kukelova, M. Bujnak, T. Pajdla, Polynomial eigenvalue solutions to the 5-pt and 6-pt relative pose problems, in: BMVC, 2008, pp. 1–10.
Appendix A. Gaussian elimination in self-calibration and metric reconstruction equations. To simplify the expressions, we introduce the notation $P^{\gamma}_i$: row vector produced from the $i$-th row of $[a]_\times F$ and permute x0 elements w...
-
[49]
The form of homography (4)
-
[50]
Eq. (2): $P^i_{m2} = P_{P2} H_i$, where $H_i$ denotes the homography obtained by substituting the $i$-th solution of Eq. (17); the lemma is readily deduced.
Lemma 2. Let $P^1_{m2}$, $P^2_{m2}$ be as in Lemma 1. We have $K^1_2 R^1_2 - K^2_2 R^2_2 = a n^T$ (25), where $n$ is an appropriate vector.
Proof. As in the proof of Lemma 1, by observing that $H_i$ for different $i$ values differ only in $v ≜ -p^T K$...
-
[51]
$P_1$ is a full-rank matrix (rank 3) for every projection matrix. The exception, referred to in the literature as “camera at infinity”, is out of our scope. Recall that we are handling a metric reconstruction
-
[52]
$P_2$ can be expressed in terms of $P_1$, $n$, $a$, thus permitting the application of Eq. (33) to determine $\det P_2$. Now applying the previous points, we have
$\det P_2 = \det P_1 \iff 1 - n^T {R^1_2}^T K_2^{-1} a = 1 \iff n^T {R^1_2}^T K_2^{-1} a = 0 \iff -n^T {R^1_2}^T K_2^{-1} K_2 R^1_2 C_1 = 0$ (from (28): $a = -K_2 R^1_2 C_1$) $\iff n^T C_1 = 0$ (as $R R^T = I$ for rotation matrices $R$).
Lemma 8. For the reconstruct...
-
[53]
The projective reconstruction $P_{P2}$ is in the canonical representation form $[\,[a]_\times F \mid a\,]$ (36), with $F^T a = 0$
-
[54]
$[a]_\times$ denotes the antisymmetric matrix that computes the cross product with the vector $a$: $[a]_\times v = a \times v$
-
[55]
$e$ denotes the right null vector of $F$: $Fe = 0$ (37).
Lemma 9. Let $P_{P2} = [\,A \mid a\,] = [\,[a]_\times F \mid a\,]$ denote the projection matrix for camera 2 in the projective reconstruction, and let $p$, $p'$ be the solutions for $\pi_\infty$ acquired from Eq. (17), with $p^T = (p_1\ p_2\ p_3)$ and $p'^T = (p'_1\ p'_2\ p'_3)$. Then $p - p' = \psi e_f$ (38), where $e_f = (e_1/f_1^2,\ e_2/f_1^2,\ e_3)^T$.
Proof. From Eq. (5) an...
-
[56]
The definition of n in Eq. (25)
-
[57]
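The relation between $P_{P2}$, $P_{M2}$, $H$ (Eqs. (2), (4)) and the notation for the $P$ matrix of Lemma 9, and have
$P_1 = A K_1 - a p^T K_1$, $P_2 = A K_1 - a p'^T K_1 \iff P_2 = P_1 + a (p - p')^T K_1 ≜ P_1 - a n^T$.
Now, we can rewrite Eq. (47) as $(p - p')^T K_1 C_1 = 0$ (48). We next have
$P^1_{M2} \begin{pmatrix} C_1 \\ 1 \end{pmatrix} = 0 \iff P_{P2} H_1 \begin{pmatrix} C_1 \\ 1 \end{pmatrix} = 0 \iff P_{P2} \begin{pmatrix} K_1 C_1 \\ -p^T K_1 C_1 + 1 \end{pmatrix} = 0$.
From the assumption that $P_{P2}$ is in the c...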
-
[58]
We denote by $v^1_{m2}$, $v^2_{m2}$ the vectors that point along the viewing directions of cameras $P^1_{m2}$, $P^2_{m2}$ respectively
-
[59]
For $P^1_{m2}$ we assume $\det P^1 > 0$, and we write $C ≜ C^1_{m2}$.
Proof of Theorem 2. From Results 6, 7, Lemma 1 and Theorem 1 we have for $P^1_{m2}$:
$K_2 R^1 C = -a \iff \begin{pmatrix} f_2 R_1^T \\ f_2 R_2^T \\ R_3^T \end{pmatrix} C = -a \iff R_3^T C = -a_3$ (51).
We have $\det P^1 = \det K_2 R^1 > 0$ and so $v^1_{2m} = R_3$. Consequently, from Eq. (51), we have
${v^1_{2m}}^T C = \|v^1_{2m}\| \|C\| \cos\angle(C, v^1_{2m}) = -a_3$ (52).
In Eq. (52), $\|R_3^T\| = 1$, since ...