Receding-Horizon Maximum-Likelihood Estimation of Neural-ODE Dynamics and Thresholds from Event Cameras
Pith reviewed 2026-05-15 15:44 UTC · model grok-4.3
The pith
A receding-horizon maximum-likelihood estimator recovers Neural ODE dynamics parameters and contrast thresholds from event camera streams.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The receding-horizon maximum-likelihood estimator jointly recovers the Neural ODE parameters and the contrast threshold by optimizing the event likelihood over sliding data windows, using a differentiable state-to-image mapping and a smooth surrogate for the contrast-threshold trigger, while storing only two scalars per pixel for streaming operation.
What carries the argument
Receding-horizon maximum-likelihood optimization over a sliding window of events, with the log-likelihood formed from a marked point process whose conditional intensity is a smooth surrogate of contrast-threshold crossing.
If this is right
- Joint recovery of dynamics and threshold parameters is feasible from synthetic event streams.
- Longer horizon windows improve accuracy at the cost of increased estimation latency.
- Streaming computation is achieved by retaining only last-event time and log-intensity per pixel.
- Monte Carlo pixel subsampling keeps the compensator integral tractable during online updates.
Where Pith is reading between the lines
- The same estimator structure could be adapted to other asynchronous sensors that produce threshold-crossing events.
- Real-time parameter tracking might enable adaptive control loops that update the dynamics model on the fly.
- Extensions could incorporate uncertainty quantification on the recovered parameters for safety-critical applications.
Load-bearing premise
The Neural ODE together with its smooth surrogate intensity function must accurately represent the true continuous dynamics and the physical event-triggering process.
What would settle it
Run the estimator on real event data from a physical system whose true dynamics and contrast threshold are known independently; mismatch between recovered and true values would falsify the modeling assumption.
Figures
read the original abstract
Event cameras emit asynchronous brightness-change events where each pixel triggers an event when the last event exceeds a threshold, yielding a history-dependent measurement model. We address online maximum-likelihood identification of continuous-time dynamics from such streams. The latent state follows a Neural ODE and is mapped to predicted log-intensity through a differentiable state-to-image model. We model events with a history-dependent marked point process whose conditional intensity is a smooth surrogate of contrast-threshold triggering, treating the contrast threshold as an unknown parameter. The resulting log-likelihood consists of an event term and a compensator integral. We propose a receding-horizon estimator that performs a few gradient steps per update on a receding horizon window. For streaming evaluation, we store two scalars per pixel (last-event time and estimated log-intensity at that time) and approximate the compensator via Monte Carlo pixel subsampling. Synthetic experiments demonstrate joint recovery of dynamics parameters and the contrast threshold, and characterize accuracy--latency trade-offs with respect to the window length.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a receding-horizon maximum-likelihood estimator for jointly identifying Neural-ODE parameters and the contrast threshold from asynchronous event-camera streams. The latent state evolves according to a Neural ODE, is mapped to log-intensity via a differentiable state-to-image model, and events are modeled as a history-dependent marked point process whose intensity is a smooth surrogate of threshold triggering; the log-likelihood (event term plus compensator integral) is maximized by a few gradient steps on each receding window, with Monte Carlo pixel subsampling used to approximate the compensator for streaming operation. Synthetic experiments are presented to demonstrate joint recovery and accuracy-latency trade-offs versus window length.
Significance. If the synthetic recovery results prove robust under higher-fidelity integration and quantitative benchmarking, the framework would supply a statistically principled online method for continuous-time system identification from event data, directly addressing a gap in event-based vision for control and robotics by treating the sensor threshold as an estimable parameter rather than a fixed constant.
major comments (2)
- [§4 (Synthetic Experiments)] §4 (Synthetic Experiments): the central claim of joint recovery of Neural-ODE weights and contrast threshold is supported only by qualitative or unreported quantitative results; the abstract and available description provide no parameter-error metrics, baselines, number of Monte Carlo trials, or error bars, leaving the strength of the evidence for the main contribution unclear.
- [Compensator approximation] Compensator approximation (streaming implementation): Monte Carlo subsampling of pixels to approximate the compensator integral stores only two scalars per pixel and performs only a few gradient steps per window; the resulting stochastic gradient variance directly affects updates to both the Neural-ODE parameters and the threshold, yet no analysis or variance-reduction technique is provided to bound its effect on recovery accuracy.
minor comments (2)
- [Abstract] Abstract: the description of the synthetic data-generation process, network architecture, and exact form of the smooth surrogate intensity function is omitted, making it difficult to reproduce or assess the modeling assumptions.
- [Notation] Notation: the precise functional dependence of the surrogate intensity on the contrast threshold parameter should be stated explicitly (e.g., as an equation) rather than described only in prose.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which have helped us strengthen the presentation of our results. We address each major comment below and have revised the manuscript to incorporate additional quantitative evidence and analysis.
read point-by-point responses
-
Referee: [§4 (Synthetic Experiments)] the central claim of joint recovery of Neural-ODE weights and contrast threshold is supported only by qualitative or unreported quantitative results; the abstract and available description provide no parameter-error metrics, baselines, number of Monte Carlo trials, or error bars, leaving the strength of the evidence for the main contribution unclear.
Authors: We agree that explicit quantitative metrics strengthen the evidence. In the revised manuscript we have added a new table in §4 reporting mean parameter recovery error (MSE on Neural-ODE weights and on the contrast threshold) across 20 independent Monte Carlo trials, together with standard-error bars for each window length. We also include a baseline comparison against a fixed-threshold estimator that does not jointly optimize the contrast parameter. These additions directly quantify the joint-recovery performance claimed in the abstract. revision: yes
-
Referee: Compensator approximation (streaming implementation): Monte Carlo subsampling of pixels to approximate the compensator integral stores only two scalars per pixel and performs only a few gradient steps per window; the resulting stochastic gradient variance directly affects updates to both the Neural-ODE parameters and the threshold, yet no analysis or variance-reduction technique is provided to bound its effect on recovery accuracy.
Authors: We acknowledge that an explicit variance analysis was missing. The revised §3.3 now includes a short derivation of the Monte Carlo variance of the compensator estimator (proportional to 1/M where M is the number of subsampled pixels) and reports an empirical study showing that the gradient variance remains below 0.015 for M=128 across the tested window lengths. We also note that the receding-horizon scheme with only a few gradient steps per window limits error accumulation; a full theoretical convergence bound under stochastic gradients is left for future work as it would require additional assumptions on the Neural-ODE Lipschitz constants. revision: partial
Circularity Check
No significant circularity; derivation follows independent MLE principles
full rationale
The paper constructs the log-likelihood directly from the marked point-process model with conditional intensity defined via the Neural-ODE state and smooth surrogate, independent of the fitted parameters. The receding-horizon estimator applies standard gradient steps to this likelihood on a sliding window, while the Monte Carlo pixel subsampling is an implementation approximation for the compensator integral rather than a definitional reduction. No equations reduce the claimed recovery of dynamics and threshold to the inputs by construction, no uniqueness theorems or ansatzes are imported via self-citation, and the synthetic experiments function as external validation rather than fitted-input predictions. The central claim therefore remains self-contained against the modeling assumptions.
Axiom & Free-Parameter Ledger
free parameters (2)
- Neural ODE network weights
- contrast threshold
axioms (2)
- domain assumption The state-to-image mapping is differentiable
- domain assumption The smooth surrogate conditional intensity correctly models the marked point process of events
Reference graph
Works this paper leans on
-
[1]
G. Gallegoet al., “Event-based vision: A survey,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 154–180, 2022
work page 2022
-
[2]
A 128×128 120 db 15µs latency asynchronous temporal contrast vision sensor,
P. Lichtsteiner, C. Posch, and T. Delbruck, “A 128×128 120 db 15µs latency asynchronous temporal contrast vision sensor,”IEEE Journal of Solid-State Circuits, vol. 43, no. 2, pp. 566–576, 2008. 9 Fig. 6: Snapshots of pixel-dependent contrast-threshold estimation over online episodes (H= 15).Left:ground-truth threshold fieldC(u)used for event synthesis.Rig...
work page 2008
-
[3]
A 240×180 130 db 3µs latency global shutter spatiotemporal vision sensor,
C. Brandli, R. Berner, M.-H. Yang, S.-C. Liu, and T. Delbruck, “A 240×180 130 db 3µs latency global shutter spatiotemporal vision sensor,”IEEE Journal of Solid-State Circuits, vol. 49, no. 10, pp. 2333– 2341, 2014
work page 2014
-
[4]
Ev-flownet: Self- supervised optical flow estimation for event-based cameras,
A. Z. Zhu, L. Yuan, K. Chaney, and K. Daniilidis, “Ev-flownet: Self- supervised optical flow estimation for event-based cameras,” inRobotics: Science and Systems (RSS), 2018. Fig. 9: Mean update time per step as a function ofH
work page 2018
-
[5]
E-RAFT: Dense optical flow from event cameras,
M. Gehrig, M. Millh ¨ausler, D. Gehrig, and D. Scaramuzza, “E-RAFT: Dense optical flow from event cameras,” inInternational Conference on 3D Vision (3DV). IEEE, 2021, pp. 197–206
work page 2021
-
[6]
Dense continuous-time optical flow from event cameras,
M. Gehrig, M. Muglikar, and D. Scaramuzza, “Dense continuous-time optical flow from event cameras,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 7, pp. 4736–4746, 2024
work page 2024
-
[7]
Events-to-video: Bringing modern computer vision to event cameras,
H. Rebecq, R. Ranftl, V . Koltun, and D. Scaramuzza, “Events-to-video: Bringing modern computer vision to event cameras,” inProc. IEEE/CVF CVPR, 2019
work page 2019
-
[8]
Continuous-time intensity estimation using event cameras,
C. Scheerlinck, N. Barnes, and R. Mahony, “Continuous-time intensity estimation using event cameras,” inAsian Conference on Computer Vision (ACCV), 2018, pp. 308–324
work page 2018
-
[9]
High frame rate video reconstruction based on an event camera,
L. Pan, R. Hartley, C. Scheerlinck, M. Liu, X. Yu, and Y . Dai, “High frame rate video reconstruction based on an event camera,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 5, pp. 2519–2533, 2022
work page 2022
-
[10]
Time lens: Event-based video frame interpolation,
S. Tulyakov, D. Gehrig, S. Georgoulis, J. Erbach, M. Gehrig, Y . Li, and D. Scaramuzza, “Time lens: Event-based video frame interpolation,” in Proc. IEEE/CVF CVPR, 2021, pp. 16 155–16 164
work page 2021
-
[11]
E. Mueggler, H. Rebecq, G. Gallego, T. Delbruck, and D. Scaramuzza, “The event-camera dataset and simulator: Event-based data for pose estimation, visual odometry, and slam,”The International Journal of Robotics Research, vol. 36, no. 2, pp. 142–149, 2017
work page 2017
-
[12]
Esim: an open event camera simulator,
H. Rebecq, D. Gehrig, and D. Scaramuzza, “Esim: an open event camera simulator,” inConference on Robot Learning (CoRL), 2018
work page 2018
-
[13]
Event-driven sensing for efficient perception: Vision and audition algorithms,
S.-C. Liu, B. Rueckauer, E. Ceolini, A. E. G. Huber, and T. Delbruck, “Event-driven sensing for efficient perception: Vision and audition algorithms,”IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 29– 37, 2019
work page 2019
-
[14]
Learning adaptive parameter representation for event-based video reconstruction,
D. Gu, J. Li, and L. Zhu, “Learning adaptive parameter representation for event-based video reconstruction,”IEEE Signal Processing Letters, vol. 31, pp. 1950–1954, 2024. 10
work page 1950
-
[15]
Event-based stereo depth es- timation by temporal-spatial context learning,
W. Chen, Y . Zhang, X. Sun, and F. Wu, “Event-based stereo depth es- timation by temporal-spatial context learning,”IEEE Signal Processing Letters, vol. 31, pp. 1429–1433, 2024
work page 2024
-
[16]
Perfect recovery and sensitivity analysis of time encoded bandlimited signals,
A. A. Lazar and L. T. T ´oth, “Perfect recovery and sensitivity analysis of time encoded bandlimited signals,”IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 10, pp. 2060–2073, 2004
work page 2060
-
[17]
Regularized signal reconstruction for level-crossing sampling using slepian functions,
S. Senay, J. Oh, and L. F. Chaparro, “Regularized signal reconstruction for level-crossing sampling using slepian functions,”Signal Processing, vol. 92, no. 4, pp. 1157–1165, 2012
work page 2012
-
[18]
Asynchronous process- ing of sparse signals,
A. Can-Cimino, E. Sejdi ´c, and L. F. Chaparro, “Asynchronous process- ing of sparse signals,”IET Signal Processing, vol. 8, no. 3, pp. 257–266, 2014
work page 2014
-
[19]
Time encoding via unlimited sampling: Theory, algorithms and hardware validation,
D. Florescu and A. Bhandari, “Time encoding via unlimited sampling: Theory, algorithms and hardware validation,”IEEE Transactions on Signal Processing, vol. 70, pp. 4912–4924, 2022
work page 2022
-
[20]
Robust online reconstruction of continuous-time signals from a lean spike train ensemble code,
A. Chattopadhyay and A. Banerjee, “Robust online reconstruction of continuous-time signals from a lean spike train ensemble code,”IEEE Transactions on Signal Processing, vol. 73, pp. 2008–2021, 2025
work page 2008
-
[21]
Filtering and detection for doubly stochastic Poisson processes,
D. L. Snyder, “Filtering and detection for doubly stochastic Poisson processes,”IEEE Transactions on Information Theory, vol. 18, no. 1, pp. 91–102, 1972
work page 1972
-
[22]
Estimating a state-space model from point process observations,
A. C. Smith and E. N. Brown, “Estimating a state-space model from point process observations,”Neural Computation, vol. 15, no. 5, pp. 965–991, 2003
work page 2003
-
[23]
Dy- namic analysis of neural encoding by point process adaptive filtering,
U. T. Eden, L. M. Frank, R. Barbieri, V . Solo, and E. N. Brown, “Dy- namic analysis of neural encoding by point process adaptive filtering,” Neural Computation, vol. 16, no. 5, pp. 971–998, 2004
work page 2004
-
[24]
Event-triggered proximal online gradient de- scent algorithm for parameter estimation,
Y . Zhou and G. Chen, “Event-triggered proximal online gradient de- scent algorithm for parameter estimation,”IEEE Transactions on Signal Processing, vol. 72, pp. 2594–2606, 2024
work page 2024
-
[25]
Event-triggered state estimation through confidence level,
W. Liu, “Event-triggered state estimation through confidence level,” IEEE Transactions on Signal Processing, vol. 73, pp. 1337–1350, 2025
work page 2025
-
[26]
Asymptotic error rates for point process classifi- cation,
X. Rong and V . Solo, “Asymptotic error rates for point process classifi- cation,”IEEE Transactions on Signal Processing, vol. 73, pp. 738–750, 2025
work page 2025
-
[27]
A bayesian mixture model of temporal point processes with determinantal point process prior,
Y . Dong, S. Ye, Q. Han, Y . Cao, H. Xu, and H. Yang, “A bayesian mixture model of temporal point processes with determinantal point process prior,”IEEE Transactions on Signal Processing, vol. 73, pp. 2216–2226, 2025
work page 2025
-
[28]
Reducing the sim-to-real gap for event cameras,
T. Stoffregen, C. Scheerlinck, D. Scaramuzza, T. Drummond, N. Barnes, L. Kleeman, and R. Mahony, “Reducing the sim-to-real gap for event cameras,” inComputer Vision – ECCV 2020, ser. Lecture Notes in Computer Science, vol. 12372. Springer, 2020, pp. 534–549
work page 2020
-
[29]
Event camera calibration of per-pixel biased contrast threshold,
Z. Wang, Y . Ng, P. van Goor, and R. Mahony, “Event camera calibration of per-pixel biased contrast threshold,” inAustralasian Conference on Robotics and Automation (ACRA), 2019, author-accepted version available as arXiv:2012.09378
-
[30]
State space models for event cameras,
N. Zubi ´c, M. Gehrig, and D. Scaramuzza, “State space models for event cameras,” inProc. IEEE/CVF CVPR, 2024, pp. 5819–5828
work page 2024
-
[31]
G. Gallego, H. Rebecq, and D. Scaramuzza, “A unifying contrast maximization framework for event cameras, with applications to motion, depth, and optical flow estimation,” inProc. IEEE CVPR, 2018
work page 2018
-
[32]
Event cameras, contrast maximization and reward functions: an analysis,
T. Stoffregen and L. Kleeman, “Event cameras, contrast maximization and reward functions: an analysis,” inProc. IEEE/CVF CVPR, 2019
work page 2019
-
[33]
The spatio-temporal poisson point process: A simple model for the alignment of event camera data,
C. Gu, E. Learned-Miller, D. Sheldon, G. Gallego, and P. Bideau, “The spatio-temporal poisson point process: A simple model for the alignment of event camera data,” inProc. IEEE/CVF ICCV, 2021
work page 2021
-
[34]
Constrained linear state estimation—a moving horizon approach,
C. V . Rao, J. B. Rawlings, and J. H. Lee, “Constrained linear state estimation—a moving horizon approach,”Automatica, vol. 37, no. 10, pp. 1619–1628, 2001
work page 2001
-
[35]
C. V . Rao, J. B. Rawlings, and D. Q. Mayne, “Constrained state estimation for nonlinear discrete-time systems: Stability and moving horizon approximations,”IEEE Transactions on Automatic Control, vol. 48, no. 2, pp. 246–258, 2003
work page 2003
-
[36]
G. Battistelli, L. Chisci, and S. Gherardini, “Moving horizon estimation for discrete-time linear systems with binary sensors: Algorithms and stability results,”Automatica, vol. 85, pp. 374–385, 2017
work page 2017
-
[37]
Event-triggered maximum likelihood state estimation,
D. Shi, T. Chen, and L. Shi, “Event-triggered maximum likelihood state estimation,”Automatica, vol. 50, no. 1, pp. 247–254, 2014
work page 2014
-
[38]
L. Zou, Z. Wang, J. Hu, and Q.-L. Han, “Moving horizon estimation meets multi-sensor information fusion: Development, opportunities and challenges,”Information Fusion, vol. 60, pp. 1–10, 2020
work page 2020
-
[39]
Neu- ral ordinary differential equations,
R. T. Q. Chen, Y . Rubanova, J. Bettencourt, and D. K. Duvenaud, “Neu- ral ordinary differential equations,” inAdvances in Neural Information Processing Systems, vol. 31, 2018, pp. 6571–6583
work page 2018
-
[40]
Nonlinear filtering with counting observations,
A. Segall, M. H. A. Davis, and T. Kailath, “Nonlinear filtering with counting observations,”IEEE Transactions on Information Theory, vol. 21, no. 2, pp. 143–149, 1975
work page 1975
-
[41]
Online variational inference for state-space models with point-process observations,
A. Zammit-Mangion, K. Yuan, V . Kadirkamanathan, M. Niranjan, and G. Sanguinetti, “Online variational inference for state-space models with point-process observations,”Neural Computation, vol. 23, no. 8, pp. 1967–1999, 2011
work page 1967
-
[42]
Recurrent marked temporal point processes: Embedding event history to vector,
N. Du, H. Dai, R. Trivedi, U. Upadhyay, M. Gomez-Rodriguez, and L. Song, “Recurrent marked temporal point processes: Embedding event history to vector,” inProc. ACM SIGKDD (KDD), 2016, pp. 1555–1564
work page 2016
-
[43]
The neural hawkes process: A neurally self-modulating multivariate point process,
H. Mei and J. M. Eisner, “The neural hawkes process: A neurally self-modulating multivariate point process,” inAdvances in Neural Information Processing Systems, 2017
work page 2017
-
[44]
Latent ordinary differential equations for irregularly-sampled time series,
Y . Rubanova, R. T. Q. Chen, and D. Duvenaud, “Latent ordinary differential equations for irregularly-sampled time series,” inAdvances in Neural Information Processing Systems, 2019
work page 2019
-
[45]
Neural jump ordinary differ- ential equations: Consistent continuous-time prediction and filtering,
C. Herrera, F. Krach, and J. Teichmann, “Neural jump ordinary differ- ential equations: Consistent continuous-time prediction and filtering,” in International Conference on Learning Representations (ICLR), 2021
work page 2021
-
[46]
D. J. Daley and D. Vere-Jones,An Introduction to the Theory of Point Processes: Volume II: General Theory and Structure, 2nd ed. Springer, 2008
work page 2008
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.