Recognition: 2 theorem links
Neuromorphic Monocular Depth Estimation with Uncertainty Modeling
Pith reviewed 2026-05-12 04:50 UTC · model grok-4.3
The pith
Integrating uncertainty estimation into neural networks allows event-based monocular depth prediction to flag reliable pixels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We predict per-pixel depth distributions from monocular event streams using U-Net models and estimate uncertainty with Gaussian, log-normal, and evidential learning frameworks. We compare six event representations and find that the representations perform similarly after synthetic pre-training and real fine-tuning, with 10-bin log-normal and 5-bin evidential models performing best across absolute relative error, root mean squared error, and area under the sparsification error (AUSE). Our experiments demonstrate that uncertainty estimation can be successfully integrated into event-based monocular depth estimation and used to indicate pixels with reliable depth.
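The claim summary does not spell out the training objectives. As one way to make the log-normal variant concrete, here is a minimal sketch of a per-pixel log-normal negative log-likelihood; the tensor names, clamping constants, and masking are illustrative assumptions, not the authors' implementation.

```python
import torch

def lognormal_nll(mu, log_var, depth_gt, valid_mask, eps=1e-6):
    """Per-pixel negative log-likelihood under a log-normal depth model.

    mu, log_var : network outputs of shape (B, 1, H, W); mu parameterises the
                  mean of log-depth and log_var its log-variance.
    depth_gt    : ground-truth metric depth, same shape.
    valid_mask  : boolean mask selecting pixels with valid ground truth.
    """
    log_d = torch.log(depth_gt.clamp(min=eps))        # move to log-depth space
    var = log_var.exp().clamp(min=eps)                # keep variance positive
    nll = 0.5 * (log_var + (log_d - mu) ** 2 / var)   # Gaussian NLL on log-depth
    return nll[valid_mask].mean()

# Predictive depth and a per-pixel uncertainty proxy from the same outputs:
#   depth_hat = torch.exp(mu + 0.5 * var)   # mean of the log-normal
#   sigma_hat = var.sqrt()                  # std of log-depth, used to rank pixels
```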
What carries the argument
U-Net models that predict depth distributions while simultaneously estimating uncertainty, applied to event representations such as multi-bin spatio-temporal voxel grids, CSTR, and TORE volumes.
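For the evidential variant, the U-Net presumably ends in a head that emits the four Normal-Inverse-Gamma parameters of deep evidential regression [26]. A minimal sketch of such a head follows; the 1x1 convolution and softplus activations are assumptions, not the paper's stated architecture.

```python
import torch.nn as nn
import torch.nn.functional as F

class EvidentialHead(nn.Module):
    """Maps decoder features to per-pixel Normal-Inverse-Gamma parameters."""

    def __init__(self, in_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 4, kernel_size=1)

    def forward(self, feats):
        gamma, nu_raw, alpha_raw, beta_raw = self.conv(feats).chunk(4, dim=1)
        nu = F.softplus(nu_raw)               # nu > 0   (virtual observations of the mean)
        alpha = F.softplus(alpha_raw) + 1.0   # alpha > 1 (keeps variances finite)
        beta = F.softplus(beta_raw)           # beta > 0
        return gamma, nu, alpha, beta

# Uncertainties derived from the NIG parameters (Amini et al. [26]):
#   aleatoric = beta / (alpha - 1)
#   epistemic = beta / (nu * (alpha - 1))
```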
If this is right
- Uncertainty estimates can be used to identify and ignore pixels whose depth values are likely incorrect.
- 10 temporal bin voxel grids paired with log-normal uncertainty and 5 temporal bin voxel grids paired with evidential learning achieve the strongest results on standard depth and sparsification metrics.
- Performance remains comparable across the tested event representations once the models are fine-tuned on real data.
- Synthetic pre-training followed by limited real fine-tuning supports deployment in practical event-camera settings.
Where Pith is reading between the lines
- Robotic systems using event cameras could discard uncertain depth values before making navigation decisions.
- Uncertainty maps might guide active sensing strategies that allocate more events to uncertain regions.
- The same uncertainty machinery could be tested on longer temporal sequences to check consistency over time.
Load-bearing premise
Fine-tuning on a limited set of real sequences after synthetic pre-training produces depth and uncertainty estimates that generalize to new real-world environments without significant domain shift or overfitting.
What would settle it
An evaluation on held-out real event sequences in which removing high-uncertainty pixels fails to reduce the sparsification error would show that the uncertainty estimates do not reliably indicate accurate depth.
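That test rests on the sparsification curve underlying AUSE. A minimal sketch of how the curve and its area are typically computed; the removal fractions, oracle baseline, and flattened-array inputs are generic choices, not necessarily the authors' exact protocol.

```python
import numpy as np

def sparsification_curve(errors, uncertainties, steps=100):
    """Mean error of the pixels kept after removing the most-uncertain ones."""
    order = np.argsort(-uncertainties)          # most uncertain first
    errors_sorted = errors[order]
    n = len(errors)
    fractions = np.linspace(0.0, 0.99, steps)
    return np.array([errors_sorted[int(f * n):].mean() for f in fractions])

def ause(errors, uncertainties):
    """Area between the uncertainty-ranked curve and the oracle (error-ranked) curve."""
    curve_unc = sparsification_curve(errors, uncertainties)
    curve_oracle = sparsification_curve(errors, errors)   # oracle: remove worst pixels first
    return np.trapz(curve_unc - curve_oracle, dx=1.0 / len(curve_unc))

# errors and uncertainties are flattened per-pixel arrays over the valid pixels.
# If discarding high-uncertainty pixels does not lower the remaining error, the
# uncertainty curve stays far above the oracle and AUSE stays large.
```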
Original abstract
Event cameras offer distinct advantages over conventional frame-based sensors, including microsecond-level temporal resolution, high dynamic range, and low bandwidth. In this paper, we predict per-pixel depth distributions from monocular event streams using deep neural networks. We estimate uncertainty using Gaussian, log-normal, and evidential learning frameworks. We compare six event representations: spatio-temporal voxel grids with 1, 5, 10, and 20 temporal bins, the Compact Spatio-Temporal Representation (CSTR), and Time-Ordered Recent Event (TORE) volumes. Our U-Net-based models are trained on synthetic data and then fine-tuned on real sequences. We evaluate performance using absolute relative error, root mean squared error, and the area under the sparsification error. Quantitative results show that the representations perform similarly, while 10 bin log-normal and 5 bin evidential learning perform best across metrics. Our experiments demonstrate that uncertainty estimation can be successfully integrated into event-based monocular depth estimation, and be used to indicate pixels with reliable depth.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes U-Net models for per-pixel depth distribution prediction from monocular event camera streams, incorporating uncertainty via Gaussian, log-normal, and evidential frameworks. It evaluates six event representations (voxel grids with varying temporal bins, CSTR, TORE) after synthetic pre-training and real-sequence fine-tuning, reporting best performance for 10-bin log-normal and 5-bin evidential variants on absolute relative error, RMSE, and AUSE metrics. The central claim is that uncertainty estimation integrates successfully into event-based depth estimation and can indicate pixels with reliable depth.
Significance. If the empirical results and generalization claims hold, the work provides a useful empirical benchmark for uncertainty-aware neuromorphic depth estimation, highlighting practical combinations of representations and uncertainty models that could aid robust perception in high-dynamic-range or low-light scenarios. The AUSE-based evaluation of uncertainty calibration is a positive aspect, as is the systematic comparison across representations.
major comments (3)
- [Abstract] Abstract: The claim that uncertainty 'can be used to indicate pixels with reliable depth' is load-bearing but unsupported by evidence of generalization. The abstract reports only aggregate AUSE scores without describing the real-data split, number of sequences, scene diversity, or any held-out real test set drawn from a different environment or sensor; AUSE on the fine-tuning distribution alone cannot substantiate the generalization assumption.
- [Experiments] Experiments section (inferred from quantitative results description): No details are provided on statistical testing, error bars, or variance across multiple training runs for the reported metrics (e.g., the best 10-bin log-normal and 5-bin evidential results). This weakens confidence in the ranking of representations and uncertainty frameworks.
- [Abstract] Abstract and evaluation: Baseline comparisons are limited to internal variants; the manuscript does not report comparisons against prior event-based depth methods or frame-based equivalents on the same real sequences, making it difficult to assess the absolute advance in depth accuracy or uncertainty quality.
minor comments (2)
- [Methods] Clarify the exact definitions and hyperparameters of the six event representations (e.g., how temporal bin counts are chosen and normalized) in the methods section to improve reproducibility.
- [Abstract] The abstract states 'quantitative results show that the representations perform similarly' yet highlights specific winners; add a table or figure with all metric values for all combinations to support this statement.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the AUSE evaluation and systematic comparisons. We address each major comment below and will incorporate revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [Abstract] The claim that uncertainty 'can be used to indicate pixels with reliable depth' is load-bearing but unsupported by evidence of generalization. The abstract reports only aggregate AUSE scores without describing the real-data split, number of sequences, scene diversity, or any held-out real test set drawn from a different environment or sensor; AUSE on the fine-tuning distribution alone cannot substantiate the generalization assumption.
Authors: We appreciate the referee's point on clarifying the scope of our claims. The evaluation uses held-out portions of the real sequences after synthetic pre-training and fine-tuning, demonstrating that uncertainty correlates with depth error on unseen real data from the same sensor and environment distribution. In the revised manuscript, we will expand both the abstract and experiments section to explicitly describe the real-data splits (train/validation/test partitioning), number of sequences, and scene diversity. We will also qualify the generalization statement to reflect that the results support reliable-depth indication within the evaluated real sequences, while noting cross-environment or cross-sensor generalization as future work. These additions will better ground the abstract claim. revision: yes
- Referee: [Experiments] No details are provided on statistical testing, error bars, or variance across multiple training runs for the reported metrics (e.g., the best 10-bin log-normal and 5-bin evidential results). This weakens confidence in the ranking of representations and uncertainty frameworks.
Authors: We agree that reporting variability across runs would increase confidence in the metric rankings. In the revised version, we will add error bars (standard deviation across 3-5 independent training runs with different random seeds) for the key absolute relative error, RMSE, and AUSE results. We will also note any statistically significant differences between the top variants where appropriate. This revision directly addresses the concern without altering the core findings. revision: yes
- Referee: [Abstract] Baseline comparisons are limited to internal variants; the manuscript does not report comparisons against prior event-based depth methods or frame-based equivalents on the same real sequences, making it difficult to assess the absolute advance in depth accuracy or uncertainty quality.
Authors: The manuscript's primary aim is a controlled, systematic benchmark of event representations and uncertainty models under a unified training protocol rather than establishing new state-of-the-art depth accuracy. We will revise the related-work and experiments sections to include a discussion of prior event-based depth methods, referencing their reported metrics on comparable datasets, and to contextualize our depth accuracy and uncertainty quality relative to those works. Direct comparisons on identical real sequences are limited by differences in data splits and protocols across the literature; we will explicitly acknowledge this constraint while emphasizing the value of the internal ablation for isolating representation and uncertainty effects. revision: partial
Circularity Check
No circularity: purely empirical training and metric evaluation on held-out data
Full rationale
The paper trains U-Net models on synthetic event data, fine-tunes on real sequences, and reports aggregate metrics (Abs Rel, RMSE, AUSE) for depth and uncertainty. No equations, predictions, or uniqueness claims are present that reduce by construction to fitted parameters, self-citations, or ansatzes defined in terms of the target outputs. The pipeline is self-contained empirical ML work with standard held-out evaluation; the reader-assigned score of 1.0 is consistent with this finding.
Axiom & Free-Parameter Ledger
free parameters (2)
- Temporal bin counts ∈ {1, 5, 10, 20}
- U-Net weights
axioms (1)
- Domain assumption: Event streams can be losslessly or near-losslessly converted into fixed-size spatio-temporal volumes suitable for convolutional networks.
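This assumption is usually realised by binning events into a fixed-size volume. A minimal sketch of a spatio-temporal voxel grid with bilinear temporal interpolation, one common construction consistent with the representations compared here; bin placement, polarity handling, and normalisation are assumptions rather than the authors' exact recipe.

```python
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, num_bins, height, width):
    """Accumulate events into a (num_bins, H, W) volume.

    xs, ys : integer pixel coordinates of the events.
    ts     : timestamps (any unit); normalised internally.
    ps     : polarities, mapped to +1 / -1.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    span = max(ts.max() - ts.min(), 1e-9)
    t_norm = (ts - ts.min()) / span * (num_bins - 1)   # scale time to [0, B-1]
    lower = np.floor(t_norm).astype(int)
    frac = t_norm - lower
    pol = np.where(ps > 0, 1.0, -1.0).astype(np.float32)

    # Bilinear interpolation in time: each event contributes to two adjacent bins.
    np.add.at(grid, (lower, ys, xs), pol * (1.0 - frac))
    upper = np.clip(lower + 1, 0, num_bins - 1)
    np.add.at(grid, (upper, ys, xs), pol * frac)
    return grid
```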
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "We estimate uncertainty using Gaussian, log-normal, and evidential learning frameworks... U-Net-based models are trained on synthetic data and then fine-tuned on real sequences... area under the sparsification error"
- IndisputableMonolith/Foundation/DimensionForcing.lean · reality_from_one_distinction · unclear
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "spatio-temporal voxel grids with 1, 5, 10, and 20 temporal bins..." (the theorem's 8-tick period is never mentioned in the paper)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] G. Gallego, T. Delbruck, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. J. Davison, J. Conradt, K. Daniilidis, and D. Scaramuzza, "Event-based vision: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 154–180, 2022.
- [2] A. Z. Zhu, L. Yuan, K. Chaney, and K. Daniilidis, "Unsupervised event-based learning of optical flow, depth and egomotion," in Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- [3] J. Hidalgo-Carrio, D. Gehrig, and D. Scaramuzza, "Learning monocular dense depth from events," in IEEE International Conference on 3D Vision (3DV), 2020. [Online]. Available: http://rpg.ifi.uzh.ch/docs/3DV20_Hidalgo.pdf
- [4] Z. A. El Shair, A. Hassani, and S. A. Rawashdeh, "CSTR: A compact spatio-temporal representation for event-based vision," IEEE Access, pp. 102899–102916, 2023.
- [5] R. W. Baldwin, R. Liu, M. Almatrafi, V. Asari, and K. Hirakawa, "Time-ordered recent event (TORE) volumes for event cameras," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 2, pp. 2519–2532, 2023.
- [6] L. Yang, B. Kang, Z. Huang, X. Xu, J. Feng, and H. Zhao, "Depth anything: Unleashing the power of large-scale unlabeled data," in CVPR, 2024.
- [7] Y. Li, Y. Shen, Z. Huang, S. Chen, W. Bian, X. Shi, F.-Y. Wang, K. Sun, H. Bao, Z. Cui, G. Zhang, and H. Li, "BlinkVision: A benchmark for optical flow, scene flow and point tracking estimation using RGB frames and events," in European Conference on Computer Vision (ECCV), 2024.
- [8] S. Lin, Y. Ma, Z. Guo, and B. Wen, "DVS-Voltmeter: Stochastic process-based event simulator for dynamic vision sensors," in Computer Vision – ECCV 2022, 2022. [Online]. Available: https://doi.org/10.1007/978-3-031-20071-7_34
- [9] A. Z. Zhu, D. Thakur, T. Özaslan, B. Pfrommer, V. Kumar, and K. Daniilidis, "The multivehicle stereo event camera dataset: An event camera dataset for 3D perception," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 2032–2039, 2018.
- [10] J. Kogler, C. Sulzbachner, M. Humenberger, and F. Eibensteiner, "Address-event based stereo vision with bio-inspired silicon retina imagers," in Advances in Theory and Applications of Stereo Vision, A. Bhatti, Ed. Rijeka: IntechOpen, 2011, ch. 9. [Online]. Available: https://doi.org/10.5772/12941
- [11] L. A. Camunas-Mesa, T. Serrano-Gotarredona, S. H. Ieng, R. B. Benosman, and B. Linares-Barranco, "On the use of orientation filters for 3D reconstruction in event-driven stereo vision," Frontiers in Neuroscience, vol. 8, 2014. [Online]. Available: https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2014.00048
- [12] P. Rogister, R. Benosman, S.-H. Ieng, P. Lichtsteiner, and T. Delbruck, "Asynchronous event-based binocular stereo matching," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 2, pp. 347–353, 2012.
- [13] H. Rebecq, G. Gallego, E. Mueggler, and D. Scaramuzza, "EMVS: Event-based multi-view stereo—3D reconstruction with an event camera in real-time," Int. J. Comput. Vis., vol. 126, pp. 1394–1414, Dec. 2018.
- [14] G. Gallego, H. Rebecq, and D. Scaramuzza, "A unifying contrast maximization framework for event cameras, with applications to motion, depth, and optical flow estimation," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 3867–3876.
- [15] G. Gallego, M. Gehrig, and D. Scaramuzza, "Focus is all you need: Loss functions for event-based vision," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/CVPR.2019.01256
- [16] S. Shiba, Y. Aoki, and G. Gallego, "Secrets of event-based optical flow," in European Conference on Computer Vision (ECCV), 2022, pp. 628–645.
- [17] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison, "DTAM: Dense tracking and mapping in real-time," in 2011 International Conference on Computer Vision, 2011, pp. 2320–2327.
- [18] Y. Zhou, G. Gallego, H. Rebecq, L. Kneip, H. Li, and D. Scaramuzza, "Semi-dense 3D reconstruction with a stereo event camera," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 235–251.
- [19] N. Cai and P. Bideau, "Active event alignment for monocular distance estimation," in 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 2464–2473.
- [20] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing, 2015.
- [21] D. Gehrig, M. Rüegg, M. Gehrig, J. Hidalgo-Carrio, and D. Scaramuzza, "Combining events and frames using recurrent asynchronous multimodal networks for monocular depth prediction," IEEE Robotics and Automation Letters (RA-L), 2021. [Online]. Available: http://rpg.ifi.uzh.ch/docs/RAL21_Gehrig.pdf
- [22] H. Meng, C. Zhong, S. Tang, L. JunJia, W. Lin, Z. Bing, Y. Chang, G. Chen, and A. Knoll, "Learning monocular depth from events via egomotion compensation," arXiv preprint arXiv:2412.19067, 2024.
- [23] L. Bartolomei, E. Mannocci, F. Tosi, M. Poggi, and S. Mattoccia, "Depth AnyEvent: A cross-modal distillation paradigm for event-based monocular depth estimation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 19669–19678.
- [24] X. Yin, H. Shi, J. Chen, Z. Wang, Y. Ye, K. Yang, and K. Wang, "Exploring event-based human pose estimation with 3D event representations," Computer Vision and Image Understanding, p. 104189, 2024.
- [25] M. Sensoy, L. Kaplan, and M. Kandemir, "Evidential deep learning to quantify classification uncertainty," Advances in Neural Information Processing Systems, vol. 31, 2018.
- [26] A. Amini, W. Schwarting, A. Soleimany, and D. Rus, "Deep evidential regression," Advances in Neural Information Processing Systems, vol. 33, pp. 14927–14937, 2020.
- [27] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2015. [Online]. Available: https://arxiv.org/abs/1412.6980
- [28] R. Liu, R. W. Baldwin, V. Asari, and K. Hirakawa, "TORE-based disparity estimation in stereo event-only vision," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021. CVPR 2021 Workshop on Event-Based Vision, DSEC Competition Submission 1. [Online]. Available: https://dsec.ifi.uzh.ch/wp-content/uploa...
- [29] S. Kristoffersson Lind, Z. Xiong, P.-E. Forssén, and V. Krüger, "Uncertainty quantification metrics for deep regression," Pattern Recognition Letters, vol. 186, pp. 91–97, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167865524002733
- [30] X. Yin, H. Shi, J. Chen, Z. Wang, Y. Ye, K. Yang, and K. Wang, "Exploring event-based human pose estimation with 3D event representations," Computer Vision and Image Understanding, 2023.
- [31] G. Yang, H. Tang, M. Ding, N. Sebe, and E. Ricci, "Transformer-based attention networks for continuous pixel-wise prediction," in ICCV, 2021.