Homography from two orientation- and scale-covariant features

Daniel Barath; Zuzana Kukelova

arxiv: 1906.11927 · v1 · pith:OLRSMSPVnew · submitted 2019-06-27 · 💻 cs.CV

Pith reviewed 2026-05-25 14:42 UTC · model grok-4.3

classification 💻 cs.CV

keywords homography estimation · covariant features · minimal solver · two-point algorithm · scale constraints · rotation constraints · SIFT · RANSAC

The pith

Two orientation- and scale-covariant features suffice to estimate a homography by deriving new constraints from their scales and rotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives two new general constraints on the scales and rotations supplied by detectors such as SIFT. These constraints are specialized to the homography case, producing additional equations that reduce the minimal number of point correspondences from four to two. A solver is then built on the resulting equations. The approach also shows how normalizing the input points stabilizes the recovered scale and rotation values. A reader would care because robust estimators such as RANSAC would require far fewer random samples when only two pairs are needed instead of four.

Core claim

By providing a geometric interpretation of the angles and scales returned by orientation- and scale-covariant feature detectors, two new constraints on these quantities are obtained. When restricted to a homography, the constraints yield a minimal solver that recovers the homography from exactly two correspondences. Normalization of the correspondences is shown to keep the recovered rotation and scale parameters numerically stable.

What carries the argument

Two new constraints on scales and rotations derived from the geometric effect of a homography on covariant features.

If this is right

Homography estimation requires only two feature pairs instead of four.
RANSAC needs substantially fewer iterations when applied to the two-point solver.
The scale and rotation information already supplied by detectors such as SIFT is used at no extra cost.
The same scale-rotation constraints can be inserted into any other geometric estimation task that involves a homography.
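The iteration claim can be made concrete with the standard RANSAC stopping bound, which gives the number of random samples needed to draw at least one all-inlier sample with a chosen confidence. The sketch below is illustrative only: the function name `ransac_iterations` and the inlier ratios are assumptions, and the confidence of 0.95 is borrowed from the Figure 6 caption.

```python
import math


def ransac_iterations(inlier_ratio: float, sample_size: int, confidence: float = 0.95) -> int:
    """Samples needed so that, with probability `confidence`,
    at least one drawn minimal sample contains only inliers."""
    if not 0.0 < inlier_ratio < 1.0:
        raise ValueError("inlier_ratio must lie strictly between 0 and 1")
    # Probability that one minimal sample is outlier-free.
    p_clean = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))


# Hypothetical inlier ratios; compare a two-point and a four-point minimal sample.
for w in (0.7, 0.5, 0.3):
    print(w, ransac_iterations(w, 2), ransac_iterations(w, 4))
```

The gap widens as the inlier ratio drops, because the chance of an outlier-free sample falls with the sample size as an exponent.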

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be extended to other planar transformations whose action on local scale and orientation can be written in closed form.
In practice the two-point solver would be combined with the four-point solver inside a hybrid RANSAC loop to handle both minimal and over-determined cases.
Numerical stability gains from normalization suggest that similar preprocessing steps may improve other minimal solvers that recover rotation or scale parameters.

Load-bearing premise

The angles and scales reported by the detectors correspond directly to the geometric quantities transformed by the homography.
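One way to picture this premise is through the local affine approximation (the 2×2 Jacobian) of a homography at a feature location. The sketch below is not the paper's derivation: it assumes, for illustration only, that the scale ratio of a feature pair behaves like the square root of the Jacobian determinant and that the orientation direction is mapped by the Jacobian. All function names are hypothetical.

```python
import math


def apply_h(H, x, y):
    """Map the point (x, y) through the 3x3 homography H."""
    s = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / s,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / s)


def local_affinity(H, x, y):
    """2x2 Jacobian of the mapping (x, y) -> apply_h(H, x, y)."""
    s = H[2][0] * x + H[2][1] * y + H[2][2]
    u, v = apply_h(H, x, y)
    return [[(H[0][0] - H[2][0] * u) / s, (H[0][1] - H[2][1] * u) / s],
            [(H[1][0] - H[2][0] * v) / s, (H[1][1] - H[2][1] * v) / s]]


def predicted_scale_and_angle(H, x, y, angle1):
    """Scale ratio and second-image orientation implied by H at (x, y).
    Assumption (illustrative): scale ratio = sqrt(|det A|)."""
    A = local_affinity(H, x, y)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    scale_ratio = math.sqrt(abs(det))
    # Push the first-image orientation direction through the Jacobian.
    dx, dy = math.cos(angle1), math.sin(angle1)
    angle2 = math.atan2(A[1][0] * dx + A[1][1] * dy,
                        A[0][0] * dx + A[0][1] * dy)
    return scale_ratio, angle2


# Illustrative similarity: scale 2, rotation 30 degrees, arbitrary translation.
c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
H_sim = [[2 * c, -2 * s, 5.0], [2 * s, 2 * c, -3.0], [0.0, 0.0, 1.0]]
q, angle2 = predicted_scale_and_angle(H_sim, 10.0, 20.0, 0.4)
print(q, angle2 - 0.4)
```

For a pure similarity the Jacobian is the same everywhere, so the predicted scale ratio and rotation do not depend on the point. For a general homography they vary with position, which is why each feature contributes its own constraints.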

What would settle it

Run the two-point solver on synthetic homographies with known ground-truth scales and rotations; if the recovered homography matrix deviates substantially from the ground truth while the four-point DLT succeeds, the derived constraints do not hold.
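A check of this kind needs a four-point baseline and a ground-truth comparison. The following is a minimal sketch of that baseline only, under stated assumptions: it fixes h33 = 1 and solves the resulting 8×8 linear system by Gauss-Jordan elimination instead of the SVD-based DLT, it uses noise-free points, and the function names and the matrix `H_true` are hypothetical. The paper's two-point solver is not reproduced here; its output would be compared against `H_true` in the same way.

```python
def apply_h(H, x, y):
    """Map the point (x, y) through the 3x3 homography H."""
    s = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / s,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / s)


def solve_linear(M, b):
    """Gauss-Jordan elimination with partial pivoting for a square system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[r][k] -= f * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def homography_from_four_points(src, dst):
    """Four-point solve with h33 fixed to 1 (two linear equations per point)."""
    M, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        M.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        M.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(M, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


# Hypothetical ground truth with h33 = 1 and four noise-free correspondences.
H_true = [[1.1, 0.05, 0.3], [-0.03, 0.95, -0.2], [0.1, -0.2, 1.0]]
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = [apply_h(H_true, x, y) for x, y in src]
H_est = homography_from_four_points(src, dst)
max_err = max(abs(H_est[i][j] - H_true[i][j]) for i in range(3) for j in range(3))
print(max_err)
```

Fixing h33 = 1 keeps the example free of external dependencies, but it fails when the true h33 is zero. An SVD-based DLT avoids that restriction.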

Figures

Figures reproduced from arXiv: 1906.11927 by Daniel Barath, Zuzana Kukelova.

**Figure 2.** Inliers of the estimated homographies (by 2SIFT) drawn to example image pairs. The numbers of iterations of …

**Figure 3.** Stability study. The frequencies (100 000 runs; vertical axis) of log10 errors (horizontal) in the estimated homographies by the proposed (red), 4PT (green) and 3ORI (blue) methods. $\hat{\mathbf{A}} = \mathbf{T}_2 \begin{bmatrix} \mathbf{A} & 0 \\ 0 & 1 \end{bmatrix} \mathbf{T}_1^{-1}$ (17), where $\hat{\mathbf{A}}$ is the normalized affinity. Matrix $\mathbf{T}_i$ transforms the points by translating them (last column) and applying a uniform scaling (diagonal). Due to the fact that the last column of $\mathbf{T}_i$ has n…

**Figure 4.** The average (of 10 000 runs on each noise σ) re-projection error of homography fitting to synthesized data by the proposed (2SIFT), normalized 4PT [15] and 3ORI [3] methods. Each camera is located randomly on a center-aligned sphere. Ten points from the object are projected into the cameras, and zero-mean Gaussian noise is added to the coordinates. The affine parameters are calculated from the noisy coordi…

**Figure 5.** The average (10 000 runs on each noise σ) re-projection error of homography fitting to synthesized data by the 2SIFT, normalized 4PT [15] and 3ORI [3] methods. The same test scene is used as in …

**Figure 6.** The results on 15 sequences (9 064 image pairs) of the Malaga dataset using GC-RANSAC [7] as a robust estimator and different minimal solvers (2SIFT, 3ORI, 4PT). The confidence of RANSAC was set to 0.95 and the inlier-outlier threshold to 2.0 pixels. The re-projection error (left; in pixels), average processing time (middle; in seconds) and average iteration number (right) are reported.

Original abstract

This paper proposes a geometric interpretation of the angles and scales which the orientation- and scale-covariant feature detectors, e.g. SIFT, provide. Two new general constraints are derived on the scales and rotations which can be used in any geometric model estimation tasks. Using these formulas, two new constraints on homography estimation are introduced. Exploiting the derived equations, a solver for estimating the homography from the minimal number of two correspondences is proposed. Also, it is shown how the normalization of the point correspondences affects the rotation and scale parameters, thus achieving numerically stable results. Due to requiring merely two feature pairs, robust estimators, e.g. RANSAC, do significantly fewer iterations than by using the four-point algorithm. When using covariant features, e.g. SIFT, the information about the scale and orientation is given at no cost. The proposed homography estimation method is tested in a synthetic environment and on publicly available real-world datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Derives two new scale and rotation constraints from covariant features to enable a working two-point homography solver.

The main thing here is that the paper derives two new general constraints on how a homography transforms local scale and orientation, then uses them to build a minimal solver that needs only two correspondences instead of four. Each point pair supplies the usual two point equations plus one scale and one orientation constraint, giving exactly eight independent equations for the eight degrees of freedom in H. They also work out the normalization step so the rotation and scale parameters stay numerically stable. That combination is the actual advance, and it is presented as a direct geometric derivation rather than a fitted or heuristic trick. The efficiency gain for RANSAC follows immediately when the features are already SIFT or similar, since the scale and orientation come at no additional cost. Synthetic and real-data tests are the right way to check this, and the construction avoids circularity or invented entities.

The soft spots are limited. Independence of the constraints in edge cases like repeated scales or near-collinear points will need the experiments to confirm, but nothing in the stated setup signals a load-bearing flaw or missing argument. The geometric interpretation of the covariant properties appears to hold without contradiction.

This is for computer vision researchers who already run orientation- and scale-covariant detectors and want fewer RANSAC iterations on homography tasks. A reader focused on minimal solvers or robust estimation gets concrete value and could cite the solver in follow-up work. It shows clear geometric thinking and deserves a serious referee.

Referee Report

1 major / 2 minor

Summary. The paper derives two new general constraints on the scales and rotations induced by a homography acting on orientation- and scale-covariant features (e.g., SIFT). These constraints, together with the standard point correspondences, enable a minimal solver for the eight degrees of freedom of a homography from only two feature pairs. The manuscript also shows how point normalization affects the rotation and scale parameters to improve numerical stability and demonstrates the approach on synthetic data and public real-world datasets.

Significance. If the geometric constraints are independent and correctly derived, the method reduces the minimal sample size for homography estimation from four to two points when scale and orientation are already available from the detector. This directly lowers the number of RANSAC iterations required for robust estimation and exploits information that is obtained at no extra cost, which is a practical advantage for vision pipelines that rely on covariant features.

major comments (1)

[Derivation of constraints] The derivation section must explicitly verify that the two new scale/orientation constraints per correspondence are linearly independent of the two point equations and of each other; otherwise the eight-equation system for two correspondences may be rank-deficient. The abstract states that exactly eight independent equations are obtained, but no rank argument or degeneracy analysis is referenced in the provided description.

minor comments (2)

[Normalization] The normalization procedure for maintaining numerical stability should include a brief statement of the condition number improvement or a small synthetic example quantifying the effect on the estimated homography.
[Experiments] A short discussion of degenerate configurations (e.g., when the two features are collinear or have identical scales) would help readers assess practical applicability.
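The normalization comment can be illustrated with a small numeric sketch. It assumes a Hartley-style normalization (centroid moved to the origin, mean distance scaled to √2), which matches the translation-plus-uniform-scaling form of the matrices T_i described in the Figure 3 text, although the exact constants used by the authors are not stated here. The point sets, the affinity `A`, and all function names are hypothetical.

```python
import math


def normalizing_transform(points):
    """Return (T, T_inv) as 3x3 matrices: centroid to origin, mean distance sqrt(2)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    s = math.sqrt(2.0) / mean_dist
    T = [[s, 0.0, -s * cx], [0.0, s, -s * cy], [0.0, 0.0, 1.0]]
    T_inv = [[1.0 / s, 0.0, cx], [0.0, 1.0 / s, cy], [0.0, 0.0, 1.0]]
    return T, T_inv


def matmul(P, Q):
    """Product of two 3x3 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def normalized_affinity(A, T1_inv, T2):
    """Compute T2 * [[A, 0], [0, 1]] * T1^{-1} for a 2x2 affinity A."""
    A3 = [[A[0][0], A[0][1], 0.0], [A[1][0], A[1][1], 0.0], [0.0, 0.0, 1.0]]
    return matmul(matmul(T2, A3), T1_inv)


# Hypothetical point sets in the two images.
pts1 = [(100.0, 120.0), (400.0, 90.0), (380.0, 300.0), (150.0, 310.0)]
pts2 = [(210.0, 260.0), (820.0, 150.0), (790.0, 640.0), (300.0, 650.0)]
T1, T1_inv = normalizing_transform(pts1)
T2, T2_inv = normalizing_transform(pts2)

# Hypothetical local affinity: scale 2, rotation 30 degrees.
c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
A = [[2 * c, -2 * s], [2 * s, 2 * c]]
A_hat = normalized_affinity(A, T1_inv, T2)

k = T2[0][0] / T1[0][0]  # ratio of the two uniform scalings
scale = math.sqrt(A_hat[0][0] * A_hat[1][1] - A_hat[0][1] * A_hat[1][0])
angle = math.atan2(A_hat[1][0], A_hat[0][0])
print(k, scale, angle)
```

Under these assumptions the linear part of the normalized affinity is the original one multiplied by the ratio of the two scalings, so the rotation angle is unchanged and only the scale is rescaled.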

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and positive assessment of the work. We address the single major comment below.

Referee: [Derivation of constraints] The derivation section must explicitly verify that the two new scale/orientation constraints per correspondence are linearly independent of the two point equations and of each other; otherwise the eight-equation system for two correspondences may be rank-deficient. The abstract states that exactly eight independent equations are obtained, but no rank argument or degeneracy analysis is referenced in the provided description.

Authors: We agree that an explicit verification strengthens the paper. The two point equations per correspondence arise solely from the positional mapping x' = Hx. The two new constraints are obtained by applying the homography to the local affine frame encoded by each covariant feature: one from the singular values (scale change) and one from the argument of the complex representation (orientation change). These act on distinct components of the 3×3 homography matrix and are therefore algebraically independent of the positional equations and of each other. Nevertheless, to satisfy the request we will add a short rank analysis (via the Jacobian of the eight-equation system evaluated at generic points) together with a brief degeneracy discussion in the revised derivation section.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is geometrically self-contained

The paper presents a first-principles geometric derivation of two new constraints relating local scale and orientation changes under a homography, obtained directly from the transformation properties of covariant features. These constraints are combined with the standard two-point equations per correspondence to produce an eight-equation system for the eight degrees of freedom of H, enabling a two-correspondence solver. No fitted parameters are renamed as predictions, no self-citation chain supplies the central equations, and the normalization discussion addresses numerical stability rather than redefining the result. The derivation therefore stands independently of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the central claim rests on the domain assumption that covariant features supply scale and orientation values that admit a direct geometric interpretation under homography; no free parameters or invented entities are mentioned.

axioms (1)

domain assumption: The scale and orientation reported by covariant feature detectors admit a geometric interpretation that yields usable constraints under a homography transformation.
Invoked in the first sentence of the abstract as the foundation for deriving the two new constraints.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 1 internal anchor

[1] D. Barath. P-HAF: Homography estimation using partial local affine frames. In International Conference on Computer Vision Theory and Applications, 2017.
[2] D. Baráth. Approximate epipolar geometry from six rotation invariant correspondences. 2018.
[3] D. Barath. Five-point fundamental matrix estimation for uncalibrated cameras. Conference on Computer Vision and Pattern Recognition, 2018.
[4] D. Baráth. Recovering affine features from orientation- and scale-invariant ones. 2018.
[5] D. Barath and L. Hajder. A theory of point-wise homography estimation. Pattern Recognition Letters, 94:7–14, 2017.
[6] D. Barath and L. Hajder. Efficient recovery of essential matrix from two affine correspondences. IEEE Transactions on Image Processing, 27(11):5328–5337, 2018.
[7] D. Barath and J. Matas. Graph-Cut RANSAC. Conference on Computer Vision and Pattern Recognition, 2018.
[8] D. Baráth, J. Molnár, and L. Hajder. Optimal surface normal from affine transformation. 2015.
[9] D. Barath, T. Toth, and L. Hajder. A minimal solution for two-view focal-length estimation using two affine correspondences. In Conference on Computer Vision and Pattern Recognition, 2017.
[10] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. European Conference on Computer Vision,
[11] J. Bentolila and J. M. Francos. Conic epipolar constraints from affine correspondences. Computer Vision and Image Understanding, 2014.
[12] O. Chum and J. Matas. Matching with PROSAC-progressive sample consensus. In Computer Vision and Pattern Recognition, 2005.
[13] D. Cox, J. Little, and D. O'Shea. Using Algebraic Geometry. 2nd edition, 2005.
[14] D. Grayson and M. Stillman. Macaulay2, a software system for research in algebraic geometry. Available at www.math.uiuc.edu/Macaulay2/.
[15] R. Hartley and A. Zisserman. Multiple view geometry in computer vision. Cambridge University Press, 2003.
[16] R. I. Hartley. In defense of the eight-point algorithm. Pattern Analysis and Machine Intelligence, 1997.
[17] K. Köser. Geometric Estimation with Local Affine Frames and Free-form Surfaces. Shaker, 2009.
[18] E. Kreyszig. Introduction to differential geometry and Riemannian geometry, volume 16. University of Toronto Press,
[19] Z. Kukelova, M. Bujnak, and T. Pajdla. Automatic generator of minimal problem solvers. In European Conference on Computer Vision, volume 5304 of Lecture Notes in Computer Science, 2008.
[20] Z. Kukelova, J. Heller, and A. Fitzgibbon. Efficient intersection of three quadrics and applications in computer vision. In Conference on Computer Vision and Pattern Recognition, pages 1799–1808, 2016.
[21] Z. Kukelova, J. Kileel, B. Sturmfels, and T. Pajdla. A clever elimination strategy for efficient minimal solvers. In Conference on Computer Vision and Pattern Recognition, 2017. http://arxiv.org/abs/1703.05289.
[22] D. G. Lowe. Object recognition from local scale-invariant features. In International Conference on Computer Vision,
[23] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 2004.
[24] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, 65(1-2):43–72, 2005.
[25] S. Mills. Four- and seven-point relative camera pose from oriented features. In International Conference on 3D Vision, pages 218–227. IEEE, 2018.
[26] D. Mishkin, J. Matas, and M. Perdoch. MODS: Fast and robust method for two-view matching. Computer Vision and Image Understanding, 2015.
[27] J. Molnár and D. Chetverikov. Quadratic transformation for planar mapping of implicit surfaces. Journal of Mathematical Imaging and Vision, 2014.
[28] J.-M. Morel and G. Yu. ASIFT: A new framework for fully affine invariant image comparison. SIAM Journal on Imaging Sciences, 2(2):438–469, 2009.
[29] M. Perdoch, J. Matas, and O. Chum. Epipolar geometry from two correspondences. In International Conference on Pattern Recognition, 2006.
[30] J. Pritts, Z. Kukelova, V. Larsson, and O. Chum. Radially-distorted conjugate translations. Conference on Computer Vision and Pattern Recognition, 2018.
[31] C. Raposo and J. P. Barreto. Theory and practice of structure-from-motion using affine correspondences. In Computer Vision and Pattern Recognition, 2016.
[32] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. ORB: An efficient alternative to SIFT or SURF. In International Conference on Computer Vision, pages 2564–2571. IEEE, 2011.
