Real-Time Highly Accurate Dense Depth on a Power Budget using an FPGA-CPU Hybrid SoC
Pith reviewed 2026-05-24 20:15 UTC · model grok-4.3
The pith
A hybrid FPGA-CPU chip computes dense stereo depth at over 50 frames per second with 8.7 percent error while drawing only 5 watts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a novel stereo method combining the best features of SGM and ELAS on an FPGA-CPU hybrid SoC produces highly accurate dense depth in real time, reaching an 8.7 percent error rate on the KITTI 2015 benchmark at over 50 FPS with a total power draw of only 5 W.
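To make the headline number concrete, here is a minimal sketch of how a KITTI 2015 style disparity outlier rate can be computed. It assumes the 8.7 percent figure refers to the benchmark's standard outlier rate, where a pixel counts as wrong when its disparity error exceeds both 3 pixels and 5 percent of the true disparity. The function name, the flat-list input format, and the sample values are illustrative and do not come from the paper.

```python
def kitti_outlier_rate(estimated, ground_truth, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of valid pixels whose disparity counts as an outlier.

    A pixel is an outlier when its error exceeds abs_thresh pixels AND
    exceeds rel_thresh times the true disparity. Ground-truth values <= 0
    are treated as "no measurement" and skipped (an assumption of this sketch).
    """
    bad = 0
    valid = 0
    for est, gt in zip(estimated, ground_truth):
        if gt <= 0:
            continue  # no ground truth at this pixel
        valid += 1
        err = abs(est - gt)
        if err > abs_thresh and err > rel_thresh * gt:
            bad += 1
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return 100.0 * bad / valid


# Hypothetical disparities for five pixels (the third has no ground truth).
gt = [10.0, 80.0, 0.0, 40.0, 100.0]
est = [10.5, 84.5, 7.0, 44.0, 104.0]
rate = kitti_outlier_rate(est, gt)
print(f"outlier rate: {rate:.1f}%")
```

The last pixel shows why both thresholds matter: an error of 4 pixels exceeds the absolute threshold but stays within 5 percent of a true disparity of 100, so it is not counted as an outlier.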
What carries the argument
Partitioning of SGM and ELAS processing steps across the FPGA fabric and CPU cores of the hybrid SoC so that memory-intensive or iterative operations run where they are efficient.
If this is right
- Real-time dense depth becomes available on platforms limited to a few watts of power.
- Stereo pipelines no longer need to sacrifice accuracy to fit within FPGA resource or timing limits.
- Embedded vision systems can now use depth maps whose quality approaches that of full desktop implementations.
Where Pith is reading between the lines
- The same split of regular and irregular computation steps could be tried on other hybrid chips for different vision tasks.
- Power budgets that once ruled out dense depth sensing may now support it, changing the design space for battery-powered robots.
- Multiple such depth pipelines might run concurrently on one low-power SoC if the reported margins hold.
Load-bearing premise
The SGM and ELAS components can be split between FPGA and CPU without large drops in accuracy or violations of real-time timing on the chosen chip.
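The timing half of this premise can be illustrated with a small budget calculation. The sketch below assumes the FPGA and CPU stages run as a pipeline that overlaps work on consecutive frames, so throughput is set by the slowest stage while latency is the sum of all stages. The stage names and millisecond values are hypothetical; the material reviewed here gives no per-stage timings.

```python
def pipeline_budget(stage_ms, target_fps):
    """Summarise a pipelined design against a frame-rate target.

    stage_ms maps a stage name to its per-frame time in milliseconds.
    Assumes stages overlap fully across frames (an idealisation).
    """
    bottleneck_ms = max(stage_ms.values())
    throughput_fps = 1000.0 / bottleneck_ms
    return {
        "frame_budget_ms": 1000.0 / target_fps,
        "latency_ms": sum(stage_ms.values()),
        "throughput_fps": throughput_fps,
        # "over 50 FPS" means strictly faster than the target.
        "meets_target": throughput_fps > target_fps,
    }


# Hypothetical stage timings, NOT measurements from the paper.
stages = {"fpga_matching": 16.0, "cpu_refinement": 18.0, "transfer": 2.0}
report = pipeline_budget(stages, target_fps=50)
print(report)
```

Under these made-up numbers the design clears 50 FPS even though a single frame takes 36 ms end to end, which is why a per-stage timing profile is needed to tell throughput and latency apart.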
What would settle it
Running the described implementation on the target SoC and measuring an error rate above 8.7 percent on KITTI 2015, a frame rate below 50 FPS, or a power draw above 5 W would falsify the performance result.
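The test can be written down as a simple check. This sketch takes the three thresholds from the claim itself. The function names and the sample measurements are hypothetical, and a real replication would also need to allow for measurement tolerance.

```python
# Thresholds taken from the paper's headline claim.
CLAIM = {"max_error_pct": 8.7, "min_fps": 50.0, "max_power_w": 5.0}


def check_claim(error_pct, fps, power_w, claim=CLAIM):
    """Return the list of ways a measurement contradicts the claim."""
    failures = []
    if error_pct > claim["max_error_pct"]:
        failures.append("error rate above claimed value")
    if fps < claim["min_fps"]:
        failures.append("frame rate below claimed value")
    if power_w > claim["max_power_w"]:
        failures.append("power draw above claimed value")
    return failures


def energy_per_frame_j(power_w, fps):
    """Average energy per frame in joules (watts divided by frames per second)."""
    return power_w / fps


# Hypothetical replication measurements, not real data.
print(check_claim(error_pct=9.1, fps=48.0, power_w=5.5))
print(energy_per_frame_j(5.0, 50.0))
```

At the claimed operating point of 5 W and 50 FPS, the energy budget works out to 0.1 J per frame.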
Original abstract
Obtaining highly accurate depth from stereo images in real time has many applications across computer vision and robotics, but in some contexts, upper bounds on power consumption constrain the feasible hardware to embedded platforms such as FPGAs. Whilst various stereo algorithms have been deployed on these platforms, usually cut down to better match the embedded architecture, certain key parts of the more advanced algorithms, e.g. those that rely on unpredictable access to memory or are highly iterative in nature, are difficult to deploy efficiently on FPGAs, and thus the depth quality that can be achieved is limited. In this paper, we leverage an FPGA-CPU chip to propose a novel, sophisticated stereo approach that combines the best features of SGM and ELAS-based methods to compute highly accurate dense depth in real time. Our approach achieves an 8.7% error rate on the challenging KITTI 2015 dataset at over 50 FPS, with a power consumption of only 5 W.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a hybrid FPGA-CPU stereo depth system that combines SGM and ELAS by mapping unpredictable-memory and iterative ELAS components to the CPU while running the remainder on the FPGA, claiming an 8.7% error rate on KITTI 2015 at >50 FPS with 5 W power draw.
Significance. If the hybrid partitioning demonstrably preserves the accuracy of the combined SGM+ELAS pipeline without introducing latency or synchronization artifacts that violate the real-time bound on the target SoC, the result would be a meaningful contribution to embedded vision, showing how advanced stereo algorithms can be deployed at low power without the usual accuracy trade-offs.
Major comments (1)
- [Abstract] Abstract (and any corresponding method section): the headline 8.7% KITTI 2015 error is presented as evidence that the FPGA-CPU split succeeds, yet no ablation, timing profile, or disparity-map comparison is supplied to show that off-loading the iterative ELAS stages to the CPU preserves accuracy or meets the >50 FPS bound on the specific SoC; without this quantitative support the central claim cannot be evaluated.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comment below.
- Referee: [Abstract] Abstract (and any corresponding method section): the headline 8.7% KITTI 2015 error is presented as evidence that the FPGA-CPU split succeeds, yet no ablation, timing profile, or disparity-map comparison is supplied to show that off-loading the iterative ELAS stages to the CPU preserves accuracy or meets the >50 FPS bound on the specific SoC; without this quantitative support the central claim cannot be evaluated.
  Authors: We agree that the abstract and method section would benefit from explicit quantitative support for the hybrid partitioning. The results section of the manuscript reports the end-to-end accuracy and frame rate achieved on the target SoC, but does not include a dedicated ablation isolating the effect of CPU off-loading. In the revised version we will add an ablation comparing the hybrid system to a pure-FPGA baseline, detailed per-stage timing profiles, and side-by-side disparity-map visualizations to confirm that accuracy and the >50 FPS bound are preserved.
  Revision: yes
Circularity Check
No significant circularity; empirical hardware result with no derivation chain
The paper reports an empirical implementation result (8.7% KITTI 2015 error at >50 FPS on a 5 W FPGA-CPU SoC) obtained by partitioning SGM and ELAS components between the FPGA and the CPU. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or the described approach. The central claim rests on measured hardware performance against an external benchmark (KITTI 2015), so it can be checked independently of anything the paper itself constructs. This matches the default case of a self-contained empirical paper.