Depth from Small Motion using Rank-1 Initialization

Peter O. Fasogbon

arxiv: 1907.04058 · v1 · pith:NXKQSNVYnew · submitted 2019-07-09 · 💻 cs.CV

Pith reviewed 2026-05-25 00:37 UTC · model grok-4.3

classification 💻 cs.CV

keywords depth from small motion, rank-1 factorization, bundle adjustment initialization, inverse depth, mobile depth estimation, singular value decomposition, feature tracking, camera motion recovery

The pith

Rank-1 factorization supplies an initialization that lets bundle adjustment for depth from small motion converge in roughly one quarter of the usual iterations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to make depth estimation from small handheld motions practical on mobile devices by reducing the number of iterations that bundle adjustment needs. It constructs a constraint matrix that is exactly rank-1 when noise is absent, then uses its singular value decomposition to recover initial inverse depths and camera motion. According to the abstract, this starting point lets bundle adjustment converge with only 10-15 images and yields more robust final depth maps than bundle adjustment alone. A gridded feature extraction step further reduces tracking cost by restricting tracking to small, important features across frames. The paper documents execution times on two mobile platforms with CPU-GPU co-processing.

Core claim

By forming a constraint matrix that is rank-1 in the noiseless case and recovering inverse depth values together with camera motion through singular value decomposition, the method supplies an initialization that lets bundle adjustment converge in approximately one quarter the iterations required without it, while also producing more robust depth maps.

What carries the argument

The rank-1 constraint matrix whose singular value decomposition directly yields initial inverse depths and camera motions for subsequent bundle adjustment.
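As a rough illustration of what such an initialization looks like, the sketch below uses a deliberately simplified model that is an assumption here, not the paper's actual constraint matrix: the displacement of feature i in frame j is taken to be the product of its inverse depth w_i and a scalar per-frame translation t_j, so the displacement matrix equals the outer product of w and t and is rank-1 without noise. The names `rank1_init`, `w_true`, and `t_true` are hypothetical.

```python
import numpy as np

def rank1_init(D):
    """Best rank-1 factors of D (features x frames) via SVD.

    Returns (w, t) with D ~= outer(w, t). The scale is fixed by ||t|| = 1
    and the sign by requiring sum(w) >= 0 (inverse depths are positive).
    """
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    w = U[:, 0] * s[0]   # leading left singular vector, scaled
    t = Vt[0]            # leading right singular vector (unit norm)
    if w.sum() < 0:      # resolve the sign ambiguity of the SVD
        w, t = -w, -t
    return w, t

# Synthetic data under the assumed model (all values are made up).
rng = np.random.default_rng(0)
n_features, n_frames = 200, 12
w_true = rng.uniform(0.2, 2.0, n_features)   # inverse depths
t_true = rng.normal(0.0, 1.0, n_frames)      # per-frame translation
D = np.outer(w_true, t_true) + rng.normal(0.0, 0.01, (n_features, n_frames))

w_est, t_est = rank1_init(D)

# The factorization is only defined up to a global scale, so the ground-truth
# scale is used here purely to evaluate the estimate.
scale = np.linalg.norm(t_true)
rel_err = np.linalg.norm(w_est / scale - w_true) / np.linalg.norm(w_true)
print("relative inverse-depth error:", rel_err)
```

The recovered factors carry the usual global scale ambiguity; in a pipeline of the kind the paper describes, they would serve only as the starting point that bundle adjustment then refines.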

If this is right

Bundle adjustment converges with 10-15 images after roughly one quarter the iterations needed from a conventional start.
The resulting depth maps are more robust than those obtained by bundle adjustment alone.
Overall execution time on mobile devices drops because fewer optimization steps are required.
Gridded feature extraction limits tracking to important small features and reduces computation across frames.
CPU-GPU co-processing on mobile hardware further shortens the measured end-to-end runtime.
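The gridded extraction mentioned in the list above is not specified in the abstract. One common way to implement the idea, shown here as a minimal sketch under that assumption, is to divide the image into square cells and keep only the strongest detections in each cell. The function name `select_per_cell`, the cell size, and the synthetic keypoints are all hypothetical.

```python
import numpy as np

def select_per_cell(xy, scores, width, cell=32, per_cell=1):
    """Indices of the top `per_cell` keypoints in each grid cell.

    xy     : (N, 2) pixel coordinates with 0 <= x < width
    scores : (N,) detector response, higher is better
    """
    if len(xy) == 0:
        return np.empty(0, dtype=int)
    cols = int(np.ceil(width / cell))
    cx = (xy[:, 0] // cell).astype(int)
    cy = (xy[:, 1] // cell).astype(int)
    cell_id = cy * cols + cx
    # Sort by cell, then by descending score inside each cell.
    order = np.lexsort((-scores, cell_id))
    sorted_cells = cell_id[order]
    # Rank of each keypoint within its own cell (0 = strongest).
    first = np.r_[True, sorted_cells[1:] != sorted_cells[:-1]]
    positions = np.arange(len(order))
    group_start = np.maximum.accumulate(np.where(first, positions, 0))
    rank = positions - group_start
    return np.sort(order[rank < per_cell])

# Usage with synthetic detections on a 640x480 image.
rng = np.random.default_rng(0)
n = 2000
xy = np.column_stack([rng.uniform(0, 640, n), rng.uniform(0, 480, n)])
scores = rng.uniform(0, 1, n)
keep = select_per_cell(xy, scores, width=640, cell=32)
print(len(keep), "of", n, "keypoints kept")
```

Capping the number of features per cell bounds the tracking workload regardless of scene texture and spreads the tracks across the image, which is the kind of saving the list attributes to this step.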

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same rank-1 initialization could shorten convergence in other structure-from-motion pipelines that rely on bundle adjustment.
The approach might remain effective on sequences with modestly larger motions provided the rank-1 approximation still holds approximately.
On-device depth maps obtained this way could support real-time augmented-reality overlays without requiring large camera motions.
Replacing the gridded extractor with a learned feature detector would test whether accuracy can be maintained while preserving the speed gain.

Load-bearing premise

The matrix formed from the observed tracks remains sufficiently close to rank-1 under real image noise and small motions that its SVD solution gives bundle adjustment a starting point accurate enough for fast, reliable convergence.
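One way to probe this premise on synthetic data is to watch how the singular value ratio and the leading singular vector behave as noise grows relative to the motion signal. The sketch below assumes a toy outer-product model (inverse depth times a scalar per-frame translation), which stands in for the paper's constraint matrix and is not taken from it; `rank1_diagnostics` and all numeric values are hypothetical.

```python
import numpy as np

def rank1_diagnostics(D, w_true):
    """Return (sigma_2 / sigma_1, |cos| between the leading left singular vector and w_true)."""
    U, s, _ = np.linalg.svd(D, full_matrices=False)
    ratio = s[1] / s[0]
    cos = abs(U[:, 0] @ w_true) / np.linalg.norm(w_true)
    return ratio, cos

rng = np.random.default_rng(1)
w_true = rng.uniform(0.2, 2.0, 200)   # assumed inverse depths
t_true = rng.normal(0.0, 0.5, 12)     # assumed small per-frame translations
signal = np.outer(w_true, t_true)     # exactly rank-1 without noise

results = {}
for sigma in (0.0, 0.05, 0.5, 5.0):
    D = signal + rng.normal(0.0, sigma, signal.shape)
    results[sigma] = rank1_diagnostics(D, w_true)
    ratio, cos = results[sigma]
    print(f"noise sigma={sigma:<4}  sigma2/sigma1={ratio:.3f}  |cos|={cos:.3f}")
```

A ratio near zero with a cosine near one indicates that the SVD solution is a trustworthy starting point; a ratio approaching one means the leading singular vector is dominated by noise, which is the regime in which the premise would fail.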

What would settle it

If real handheld sequences with 10-15 frames show that the rank-1 initialized bundle adjustment still requires the same number of iterations as a conventional initialization or fails to converge in a substantial fraction of trials, the speedup claim is falsified.

Original abstract

Depth from Small Motion (DfSM) (Ha et al., 2016) is particularly interesting for commercial handheld devices because it allows the possibility to get depth information with minimal user effort and cooperation. Due to speed and memory issue on these devices, the self calibration optimization of the method using Bundle Adjustment (BA) need as little as 10-15 images. Therefore, the optimization tends to take many iterations to converge or may not converge at all in some cases. This work propose a robust initialization for the bundle adjustment using the rank-1 factorization method (Tomasi and Kanade, 1992), (Aguiar and Moura, 1999a). We create a constraint matrix that is rank-1 in a noiseless situation, then use SVD to compute the inverse depth values and the camera motion. We only need about quarter fraction of the bundle adjustment iteration to converge. We also propose grided feature extraction technique so that only important and small features are tracked all over the image frames. This also ensure speedup in the full execution time on the mobile device. For the experiments, we have documented the execution time with the proposed Rank-1 initialization on two mobile device platforms using optimized accelerations with CPU-GPU co-processing. The combination of Rank 1-BA generates more robust depth-map and is significantly faster than using BA alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Rank-1 SVD initialization for DfSM bundle adjustment targets mobile speed but the abstract gives no numbers to support the quarter-iteration claim.

The paper's main move is to feed a rank-1 factorization (Tomasi-Kanade style) into the bundle adjustment step of depth-from-small-motion so that 10-15 handheld frames converge faster on phones. The author builds a constraint matrix that is exactly rank-1 in the noiseless case, runs SVD to get inverse depths and motion, and adds a gridded feature tracker to keep only useful small features across frames. That combination is the concrete addition on top of Ha et al. 2016 and the classic factorization work. It directly tackles the speed and memory limits that force BA to run on very few images, which is a practical bottleneck for consumer devices. The gridded extraction is a simple but sensible engineering step to reduce tracking cost.

The central claims of roughly 4x fewer iterations and higher robustness rest on the assumption that the SVD solution lands close enough to the true basin even when parallax is tiny. The abstract states these outcomes without any reported timings, error curves, success rates, or noise-sensitivity tests, so the evidence for the speedup is missing. The stress-test point about the leading singular vector becoming unreliable once feature noise exceeds the small-motion signal is not addressed in the provided text, and that gap matters because DfSM lives exactly in that low-signal regime. If the full paper contains the CPU-GPU timing tables on two devices, those would be the first thing to check.

This is aimed at engineers who ship depth on handheld hardware rather than at people extending the theory of structure-from-motion. A reader already working on mobile bundle adjustment could borrow the initialization trick if the numbers hold.

The work is coherent enough on its own terms to go to a serious referee; the practical target and the explicit algorithmic step justify the time even if the current write-up needs the experimental section strengthened.

Referee Report

3 major / 1 minor

Summary. The paper proposes using rank-1 factorization (citing Tomasi-Kanade 1992 and Aguiar-Moura 1999a) to initialize bundle adjustment for Depth from Small Motion (DfSM) with 10-15 handheld images. It claims the resulting Rank-1-BA combination converges in about one quarter of the iterations of BA alone, produces more robust depth maps, and, with a gridded feature extraction method, yields faster execution on mobile devices via CPU-GPU co-processing. Experiments document execution times on two mobile platforms.

Significance. If the rank-1 SVD initialization reliably delivers the claimed 4x reduction in BA iterations and robustness gains under small-motion conditions, the work would provide a practical engineering improvement for efficient self-calibrated depth estimation on resource-constrained handheld devices.

major comments (3)

[Abstract] Abstract: The central claims that 'We only need about quarter fraction of the bundle adjustment iteration to converge' and that 'The combination of Rank 1-BA generates more robust depth-map' are asserted without any supporting quantitative results, iteration counts, convergence curves, or accuracy metrics comparing Rank-1-BA against BA alone. This evidence gap directly undermines the primary contribution.
[Abstract] Abstract / method description: The constraint matrix is stated to be exactly rank-1 only in the noiseless case, with SVD then used to recover inverse depths and motion as initialization. No analysis or experiments quantify how quickly this approximation degrades under realistic feature noise or the weak parallax typical of small-motion DfSM, which is load-bearing for the robustness and convergence claims.
[Experiments] Experiments: Execution times with CPU-GPU acceleration are reported, but no tables or figures compare depth-map quality, success rates, or iteration counts with versus without the rank-1 initialization, leaving the robustness and speedup assertions unsupported.

minor comments (1)

[Abstract] Abstract: 'This work propose a robust initialization' contains a subject-verb agreement error.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. The feedback highlights important gaps in the presentation of quantitative evidence supporting our claims. We address each major comment below and will revise the manuscript accordingly to include the requested comparisons and analysis.

Referee: [Abstract] Abstract: The central claims that 'We only need about quarter fraction of the bundle adjustment iteration to converge' and that 'The combination of Rank 1-BA generates more robust depth-map' are asserted without any supporting quantitative results, iteration counts, convergence curves, or accuracy metrics comparing Rank-1-BA against BA alone. This evidence gap directly undermines the primary contribution.

Authors: We agree that the abstract states these performance claims without direct supporting quantitative evidence such as iteration counts or convergence curves. The current manuscript reports execution times on mobile platforms but does not include side-by-side comparisons of Rank-1 initialization versus standard BA. We will revise the abstract for accuracy and add a dedicated experiments subsection with tables and figures showing iteration counts, convergence behavior, and depth-map robustness metrics.
revision: yes
Referee: [Abstract] Abstract / method description: The constraint matrix is stated to be exactly rank-1 only in the noiseless case, with SVD then used to recover inverse depths and motion as initialization. No analysis or experiments quantify how quickly this approximation degrades under realistic feature noise or the weak parallax typical of small-motion DfSM, which is load-bearing for the robustness and convergence claims.

Authors: The manuscript correctly identifies that the rank-1 property holds exactly only in the noiseless case. We acknowledge the absence of any quantitative study on degradation under feature noise or weak parallax conditions. We will add a short analysis section with synthetic experiments that measure initialization error as a function of noise level and parallax to support the robustness claims.
revision: yes
Referee: [Experiments] Experiments: Execution times with CPU-GPU acceleration are reported, but no tables or figures compare depth-map quality, success rates, or iteration counts with versus without the rank-1 initialization, leaving the robustness and speedup assertions unsupported.

Authors: The experiments section currently presents execution times for the full pipeline on two mobile platforms but omits direct comparisons of depth-map quality, success rates, and iteration counts between the Rank-1-BA approach and BA alone. We will incorporate new comparative tables and figures addressing these metrics in the revised experiments section.
revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on independent external citations

The paper constructs a rank-1 constraint matrix and applies SVD for initialization by directly invoking the factorization technique of Tomasi and Kanade (1992) and Aguiar and Moura (1999a), which are cited as external prior art with no self-citation load-bearing on the central result. The reduction in bundle-adjustment iterations is presented as an empirical outcome of this initialization rather than a quantity derived by construction from fitted parameters or self-referential definitions. The overall chain remains self-contained against the cited external methods.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method rests on the domain assumption that the constructed constraint matrix has exact rank one under noiseless conditions, enabling direct SVD recovery of motion and depth for initialization.

axioms (1)

domain assumption: The constraint matrix is rank-1 in a noiseless situation.
Explicitly stated in the abstract as the foundation for using SVD to obtain inverse depth and camera motion.
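
For reference, the axiom and its noisy counterpart can be written compactly. The symbols below (inverse-depth vector w, motion vector t, noise matrix E) are generic placeholders chosen for this note rather than the paper's notation, and the outer-product form is an assumed stand-in for the paper's constraint matrix. The inequalities are the standard Weyl bounds on singular values.

```latex
% Noiseless case: the constraint matrix is an outer product, hence rank-1.
D_0 = w\,t^{\top}, \qquad
\sigma_1(D_0) = \lVert w \rVert_2 \,\lVert t \rVert_2, \qquad
\sigma_k(D_0) = 0 \quad (k \ge 2).

% Noisy case: D = D_0 + E. Weyl's inequality bounds each singular value shift.
\lvert \sigma_k(D) - \sigma_k(D_0) \rvert \le \lVert E \rVert_2
\quad\Longrightarrow\quad
\sigma_2(D) \le \lVert E \rVert_2, \qquad
\sigma_1(D) \ge \lVert w \rVert_2 \,\lVert t \rVert_2 - \lVert E \rVert_2 .
```

Read this way, the axiom is useful in practice only while the noise norm stays well below the product of the depth and motion norms, which is the condition the referee asks the author to quantify.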

pith-pipeline@v0.9.0 · 5765 in / 1216 out tokens · 23399 ms · 2026-05-25T00:37:06.351973+00:00 · methodology
