A Compact Light Field Camera for Real-Time Depth Estimation

Didier Stricker; Oliver Wasenmüller; Yuriy Anisimov

arxiv: 1907.10880 · v1 · pith:UFWTNNGHnew · submitted 2019-07-25 · 💻 cs.CV

Pith reviewed 2026-05-24 16:26 UTC · model grok-4.3

classification 💻 cs.CV

keywords light field camera · depth estimation · real-time · compact design · depth camera · computer vision

The pith

A light field depth camera is made both compact and real-time for the first time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a depth camera that uses the light field principle to estimate depth. Earlier light field methods for depth were either too computationally intensive or too physically large to be useful outside labs. The authors claim to have resolved these issues through a new design. A reader would care if this allows depth sensing in everyday devices like phones or robots without the bulk or lag of previous systems. The central object is the integrated camera hardware and processing that makes light field depth viable for real-world settings.

Core claim

For the first time, a depth camera based on the light field principle provides real-time depth information as well as a compact design, overcoming the high computation time and large design of previous approaches.

What carries the argument

The light field principle, applied through a compact optical design (a 4x4 array of single lenses in front of a full-format CMOS sensor) and a real-time depth pipeline running on a GPU-based SoC.

If this is right

Real-time depth information becomes available from a compact device.
Light field depth cameras can now be considered for real-world applications.
Both depth estimation and compact form are achieved simultaneously.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such a camera could enable new portable applications in augmented reality that require fast depth.
Future work might focus on improving accuracy while maintaining the size and speed constraints.

Load-bearing premise

The authors' optical design and software pipeline can meet the conflicting demands of small size, real-time speed, and sufficient depth accuracy at the same time.

What would settle it

Direct measurement of the camera's physical size, the frame rate of depth map output, and the accuracy of depth estimates against ground truth data.
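
Two of these measurements are easy to script. The following is a minimal sketch, not the authors' evaluation code: the function names are hypothetical, mean absolute error is only one possible accuracy metric, and depth maps are assumed to be flat lists in which non-positive values mark invalid pixels.

```python
import time


def mean_absolute_error(estimated, reference):
    """Mean absolute depth error over pixels where both maps are valid (> 0)."""
    errors = [abs(e - r) for e, r in zip(estimated, reference) if e > 0 and r > 0]
    if not errors:
        raise ValueError("no overlapping valid depth samples")
    return sum(errors) / len(errors)


def frames_per_second(process_frame, frames):
    """Average throughput of process_frame over the given frames."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```

A sustained frame rate should be measured over a long sequence on the target hardware; a short run like the one this helper allows only gives a rough average.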

Figures

Figures reproduced from arXiv: 1907.10880 by Didier Stricker, Oliver Wasenmüller, Yuriy Anisimov.

**Figure 1.** In this paper, we propose a new compact light field camera, which is capable of computing depth information in real-time.

In this paper, we propose a novel system that handles these two disadvantages. We build a compact light field camera by placing an array of 4x4 single lenses in front of a full format CMOS sensor. Furthermore, we enable a real-time depth computation by developing the depth algorithm adeq…

**Figure 2.** Overview of the single processing steps in our proposed system. The depth estimation is detailed in …

**Figure 3.** Overview of the proposed real-time depth estimation algorithm.

4.2 Calibration. The camera calibration procedure provides the camera intrinsic values, such as camera focal length and camera center, which are required for the disparity-to-depth conversion, together with the extrinsic values, which represent the camera relative position. Both intrinsics and extrinsics are required for the images rectificati…
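
The calibration excerpt says the focal length feeds the disparity-to-depth conversion but does not show the equation. The sketch below therefore assumes the standard rectified pinhole relation Z = f · B / d; the function name, the baseline parameter, and the units are hypothetical and are not taken from the paper.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Depth in metres from disparity, assuming Z = f * B / d (rectified views).

    Returns None for non-positive disparities, which carry no depth information.
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy at long range than at short range.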

**Figure 4.** Our real-time depth estimation is performed on a GPU-based SoC (a). For the evaluation of accuracy we added a reference laser scanner (b).

Matching cost generation is performed for every pixel in every light field view by

$$S(u, v, d) = \sum_{s=1}^{n} \sum_{t=1}^{m} HD\big(L(u, v, \hat{s}, \hat{t}),\ \hat{p}(u, v, s, t, d)\big). \tag{6}$$

Out of the matched costs, the final disparity map can be estimated as

$$D_s(p) = \arg\min_d C_s(p, d). \tag{7}$$

Performing o…
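
The matching cost of Eq. (6) and a winner-takes-all selection in the spirit of Eq. (7) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: HD is assumed to be a Hamming distance over census-style bit strings, views are assumed to sit on a regular grid so that a pixel shifts by d per grid step, and all function names are hypothetical. The selection here is applied to the raw cost S rather than to the aggregated cost.

```python
def census(img, u, v, radius=1):
    """Bit string with a 1 for every neighbour darker than the centre pixel."""
    h, w = len(img), len(img[0])
    centre = img[v][u]
    bits = 0
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            if du == 0 and dv == 0:
                continue
            y = min(max(v + dv, 0), h - 1)  # clamp at the image border
            x = min(max(u + du, 0), w - 1)
            bits = (bits << 1) | (1 if img[y][x] < centre else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


def matching_cost(views, ref, u, v, d):
    """S(u, v, d): Hamming distances to the reference census, summed over views."""
    s_ref, t_ref = ref
    h, w = len(views[ref]), len(views[ref][0])
    ref_code = census(views[ref], u, v)
    total = 0
    for (s, t), img in views.items():
        # Assumed geometry: one grid step between views shifts a pixel by d.
        x = min(max(u + d * (s - s_ref), 0), w - 1)
        y = min(max(v + d * (t - t_ref), 0), h - 1)
        total += hamming(ref_code, census(img, x, y))
    return total


def best_disparity(views, ref, u, v, max_d):
    """Winner-takes-all: the disparity with the lowest matching cost."""
    return min(range(max_d + 1), key=lambda d: matching_cost(views, ref, u, v, d))


row = [1, 2, 3, 9, 0, 4, 5, 6]
shifted = [row[max(x - 2, 0)] for x in range(len(row))]  # same row moved 2 px right
views = {(0, 0): [row] * 3, (1, 0): [shifted] * 3}
print(best_disparity(views, (0, 0), 3, 1, 3))
```

Census codes compare only intensity orderings, so the cost is insensitive to gain and offset differences between views, which is one reason such costs are popular for multi-lens setups.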

**Figure 5.** Depth accuracy of our system.

… $= \{\Delta u, \Delta v\}$, aggregated cost $L_r$ is

$$L_r(p, d) = C(p, d) + \min\big(L_r(p - r, d),\ L_r(p - r, d - 1) + P_1,\ L_r(p - r, d + 1) + P_1,\ \min_t L_r(p - r, t) + P_2\big), \tag{8}$$

where $P_1$ and $P_2$ are penalty parameters, $P_2 > P_1$. Traversed costs are then summarized through all traversing directions

$$C_s(p, d) = \sum_r L_r(p, d). \tag{9}$$

Disparity-to-depth conversion is performed by a classical equation, based on t…
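
The recursion in Eq. (8) and the summation in Eq. (9) can be illustrated on a single scanline. This is a minimal sketch with hypothetical function names and toy costs: it follows the recursion as written in the excerpt (without the normalisation term some semi-global matching variants subtract), and it uses only two directions, left-to-right and right-to-left.

```python
def aggregate_path(costs, p1, p2):
    """Eq. (8) along one direction; costs[i][d] is C(p_i, d) for pixels on the path."""
    n_disp = len(costs[0])
    aggregated = [list(costs[0])]  # the first pixel has no predecessor
    for i in range(1, len(costs)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]  # same disparity, or a large jump
            if d > 0:
                candidates.append(prev[d - 1] + p1)  # small change, penalty P1
            if d < n_disp - 1:
                candidates.append(prev[d + 1] + p1)
            row.append(costs[i][d] + min(candidates))
        aggregated.append(row)
    return aggregated


def sum_paths(paths):
    """Eq. (9): element-wise sum of the aggregated costs of all directions."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*paths)]


def select_disparity(summed):
    """Per pixel, the disparity with the lowest summed cost."""
    return [min(range(len(row)), key=row.__getitem__) for row in summed]


costs = [[0, 5, 5], [4, 4, 4], [5, 5, 0]]  # the middle pixel is ambiguous
forward = aggregate_path(costs, p1=1, p2=3)
backward = aggregate_path(costs[::-1], p1=1, p2=3)[::-1]
print(select_disparity(sum_paths([forward, backward])))  # [0, 1, 2]
```

In the toy data the middle pixel has identical raw costs for every disparity; the aggregated costs pull it toward a value between its neighbours, which is the smoothing effect the penalties P1 and P2 are meant to produce.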

**Figure 6.** Qualitative results of the proposed system. The scenes are reconstructed with a high level of detail, even for homogeneous regions (wall), filigree objects (pillar) and crowded objects (plant hedge).

5.2 Running Time. As described in Section 3, our system utilizes a Nvidia Jetson TX2 for embedded processing. This platform is equipped with a Tegra X2 GPU. The run times are given in …

Abstract

Depth cameras are utilized in many applications. Recently light field approaches are increasingly being used for depth computation. While these approaches demonstrate the technical feasibility, they can not be brought into real-world application, since they have both a high computation time as well as a large design. Exactly these two drawbacks are overcome in this paper. For the first time, we present a depth camera based on the light field principle, which provides real-time depth information as well as a compact design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper presents a compact real-time light-field depth camera prototype, but the abstract supplies no measurements to show the design meets size, speed, and accuracy constraints together.

The core claim is that the authors built the first light-field depth camera that is both compact enough for real products and fast enough for video rates. If the numbers work out, that would be a useful step for moving the technique into applications like robotics or consumer devices where prior systems were either too bulky or too slow. The paper does a clear job naming those two adoption barriers and describing an optical setup plus processing pipeline meant to address them at the same time. That system-level framing is the actual contribution here.

The main soft spot is the lack of any supporting data. The abstract states the result without physical dimensions, frame-rate figures, depth-error values, or comparisons to earlier work. The stress-test note is accurate on this point: the design must satisfy three constraints simultaneously, and without explicit measurements or trade-off curves it is impossible to know whether all three are met or whether one was relaxed. If the full manuscript contains those benchmarks and they are solid, the work strengthens; if they are absent or marginal, the headline assertion stays unproven.

This is aimed at computer-vision engineers working on depth hardware who want to see a new prototype approach. A reader focused on embedded sensing might extract useful design details even if the performance claims require verification. It deserves peer review because the target problem is practical and the approach is concrete, but any serious referee will need to see the quantitative validation before the central claim can be accepted.

Referee Report

1 major / 0 minor

Summary. The paper claims to introduce the first compact light-field depth camera that simultaneously achieves real-time depth estimation and a small physical form factor, overcoming the high computation time and large size that have prevented prior light-field systems from real-world use.

Significance. If the specific lens array, sensor, and reconstruction pipeline demonstrably meet the joint constraints of handheld-scale envelope, sustained video-rate output on modest hardware, and usable depth accuracy, the work would enable practical deployment of light-field depth sensing in embedded and mobile applications.

major comments (1)

[Abstract] The headline claim that the design 'overcomes' both high computation time and large physical size is presented without any supporting measurements of physical dimensions, sustained frame rate, depth error statistics, or direct comparisons to prior light-field systems; these quantities are load-bearing for the central assertion that all three constraints (size, speed, accuracy) are satisfied simultaneously.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive feedback. We address the major comment on the abstract below and agree that strengthening the quantitative support for the central claims will improve the manuscript.

Referee: [Abstract] The headline claim that the design 'overcomes' both high computation time and large physical size is presented without any supporting measurements of physical dimensions, sustained frame rate, depth error statistics, or direct comparisons to prior light-field systems; these quantities are load-bearing for the central assertion that all three constraints (size, speed, accuracy) are satisfied simultaneously.

Authors: We agree with this observation. The current abstract states the claims at a high level without embedding the supporting numbers. In the revised manuscript we will expand the abstract to include the key quantitative results: the physical envelope of the prototype (dimensions and weight), the sustained frame rate on the target hardware, depth error statistics (e.g., mean absolute error on standard benchmarks), and direct numerical comparisons against representative prior light-field systems. These values are already reported in Sections 4 and 5; the revision will simply surface them in the abstract so that the headline assertion is immediately supported by evidence.

Revision: yes

Circularity Check

0 steps flagged

No circularity; engineering claims rest on physical implementation and measurements

The paper presents a hardware/software design for a compact real-time light-field depth camera. Its central claim is an existence demonstration achieved by construction (specific lens array, sensor, and pipeline) rather than any derivation, equation, or fitted parameter that reduces to its own inputs. No self-referential definitions, fitted-input predictions, or load-bearing self-citations appear, and the equations the paper does give (matching cost, cost aggregation, disparity selection) define the algorithm rather than derive the result. The result is self-contained against external benchmarks of size, speed, and accuracy.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described or implied in the abstract; the contribution is presented as an engineering integration rather than a theoretical construction.

pith-pipeline@v0.9.0 · 5599 in / 1014 out tokens · 28190 ms · 2026-05-24T16:26:03.016282+00:00 · methodology

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1] Y. Anisimov and D. Stricker: “Fast and Efficient Depth Map Estimation from Light Fields,” International Conference on 3D Vision (3DV), pp. 337-346, 2017

[2] H. Hirschmuller: “Accurate and efficient stereo processing by semi-global matching and mutual information,” IEEE Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 807-814, 2005

[3] C. Kim et al.: “Scene reconstruction from high spatio-angular resolution light fields,” ACM Transactions on Graphics, 2013

[4] M. Levoy and P. Hanrahan: “Light field rendering,” Computer Graphics and Interactive Techniques, ACM, 1996

[5] Lytro: http://www.lytro.com/

[6] Raytrix: https://raytrix.de/

[7] Y. Qin et al.: “GPU-based depth estimation for light field images,” IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), pp. 640-645, 2017

[8] J. Peng et al.: “Unsupervised Depth Estimation from Light Field Using a Convolutional Neural Network,” International Conference on 3D Vision (3DV), pp. 295-303, 2018

[9] N. Sabater et al.: “Dataset and Pipeline for Multi-view Light-Field Video,” IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR W), pp. 1743-1753, 2017

[10] R. Schuster et al.: “FlowFields++: Accurate Optical Flow Correspondences Meet Robust Interpolation,” IEEE International Conference on Image Processing (ICIP), 2018

[11] C. Shin et al.: “Epinet: A fully-convolutional neural network using epipolar geometry for depth from light field images,” IEEE Computer Vision and Pattern Recognition (CVPR), pp. 4748-4757, 2018

[12] S. Wanner and B. Goldluecke: “Globally consistent depth labeling of 4d light fields,” IEEE Computer Vision and Pattern Recognition (CVPR), pp. 41-48, 2012

[13] O. Wasenmüller et al.: “Augmented reality 3D discrepancy check in industrial applications,” IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 125-134, 2016

[14] B. Wilburn et al.: “High performance imaging using large camera arrays,” ACM Transactions on Graphics (TOG), vol. 24, no. 3, 2005

[15] T. Yoshida et al.: “Time-of-Flight Sensor Depth Enhancement for Automotive Exhaust Gas,” IEEE International Conference on Image Processing (ICIP), pp. 1955-1959, 2017

[16] R. Zabih and J. Woodfill: “Non-parametric local transforms for computing visual correspondence,” European Conference on Computer Vision (ECCV), pp. 151-158, Springer, 1994

[17] S. Zhang et al.: “Robust depth estimation for light field via spinning parallelogram operator,” Computer Vision and Image Understanding, pp. 148-159, 2016

[18] Z. Zhang: “A Flexible New Technique for Camera Calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2000

[19] Z. Zhang: “Microsoft kinect sensor and its effect,” IEEE Multimedia, vol. 19, no. 2, pp. 4-10, 2012
