
arxiv: 2410.07430 · v3 · submitted 2024-10-09 · 💻 cs.LG · stat.ML

EventFlow: Forecasting Temporal Point Processes with Flow Matching

Pith reviewed 2026-05-23 19:02 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords temporal point processes · flow matching · non-autoregressive · event forecasting · generative models · machine learning

The pith

EventFlow uses flow matching to model temporal point processes non-autoregressively and cuts forecast error 20-53 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EventFlow as a non-autoregressive generative model for temporal point processes based on flow matching. Traditional autoregressive models predict one event after another and can suffer from errors that accumulate over longer forecasts. EventFlow instead learns the full joint distribution of event times directly. On standard benchmarks this yields 20 to 53 percent lower forecast error and requires fewer model evaluations when generating predictions. The result matters for domains that rely on accurate multi-step forecasts of irregular event sequences.

Core claim

EventFlow is a non-autoregressive generative model for temporal point processes. The model builds on the flow matching framework in order to directly learn joint distributions over event times, side-stepping the autoregressive process. It achieves a 20%-53% lower forecast error than the nearest baseline on standard TPP benchmarks while simultaneously using fewer model calls at sampling time.
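To make the headline percentage concrete, the sketch below computes a relative error reduction against the best-performing baseline. It assumes that "nearest baseline" means the baseline with the lowest error on a given dataset. The function name `relative_reduction` and every number are hypothetical placeholders, not results from the paper.

```python
def relative_reduction(model_error, baseline_errors):
    """Fractional error reduction relative to the lowest-error baseline."""
    nearest = min(baseline_errors.values())
    if nearest <= 0:
        raise ValueError("baseline error must be positive")
    return (nearest - model_error) / nearest


# Hypothetical errors for a single dataset (illustrative only, not from the paper).
baselines = {"baseline_a": 2.0, "baseline_b": 2.5}
reduction = relative_reduction(model_error=1.5, baseline_errors=baselines)
print(f"{100 * reduction:.0f}% lower error than the nearest baseline")
# prints "25% lower error than the nearest baseline"
```

Under this reading, a reported range of 20% to 53% would correspond to the smallest and largest such per-dataset reductions.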

What carries the argument

Flow matching objective applied to the joint distribution of event times and marks across sequences of varying lengths.
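The objective named above can be pictured with a minimal NumPy sketch. It is not the authors' implementation. It assumes a fixed number of events per sequence, no marks, a straight-line path between a noise sample and a data sample, and plain Euler integration at sampling time. The names `interpolate`, `cfm_loss`, `euler_sample`, and `oracle_velocity` (a stand-in for a trained network) are hypothetical.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def cfm_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocities."""
    xt = interpolate(x0, x1, t[:, None])  # shape (batch, n_events)
    target = x1 - x0                      # velocity of the straight path
    return float(np.mean((velocity_fn(xt, t) - target) ** 2))


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with one model call per step."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t)    # all event times move together
    return x


# Toy setting (assumption): every training sequence equals `target_times`,
# so the exact velocity field is known in closed form.
target_times = np.array([0.5, 1.2, 3.0])


def oracle_velocity(x, t):
    return (target_times - x) / (1.0 - t[:, None])


rng = np.random.default_rng(0)
noise = rng.normal(size=(4, 3))
samples = euler_sample(oracle_velocity, noise, num_steps=25)
```

The loop in `euler_sample` runs over integration steps, not over events, which is the sense in which sampling is non-autoregressive: the cost is `num_steps` model calls regardless of how many events are forecast.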

Load-bearing premise

Flow matching can faithfully capture the joint distribution of event times and marks without autoregressive conditioning to prevent cascading errors.

What would settle it

A benchmark result in which EventFlow forecast error on long sequences equals or exceeds autoregressive baselines because the joint distribution modeling fails to capture dependencies.

Figures

Figures reproduced from arXiv: 2410.07430 by Gavin Kerrigan, Kai Nelson, Padhraic Smyth.

Figure 1: An illustration of forecasting with our …
Figure 2: Sequence distance (9) between the forecasted and ground-truth event sequences on a held-out test set. We report the mean ± one standard deviation over five random seeds. EventFlow (with 25 NFEs) achieves the lowest mean distance (forecasting error) for each of the 7 datasets. … non-autoregressive diffusion model. These models use an RNN-based history encoder, with the exception of Add-and-Thin which uses a C…
Figure 3: Overview of our model architecture for unconditional generation. The model takes as input …
Figure 4: Overview of our model architecture for conditional generation. The encoder (left) takes as …
Figure 5: Overview of our architecture modeling the event count distribution
Original abstract

Continuous-time event sequences, in which events occur at irregular intervals, are ubiquitous across a wide range of industrial and scientific domains. The contemporary modeling paradigm is to treat such data as realizations of a temporal point process, and in machine learning it is common to model temporal point processes in an autoregressive fashion using a neural network. While autoregressive models are successful in predicting the time of a single subsequent event, their performance can degrade when forecasting longer horizons due to cascading errors and myopic predictions. We propose EventFlow, a non-autoregressive generative model for temporal point processes. The model builds on the flow matching framework in order to directly learn joint distributions over event times, side-stepping the autoregressive process. EventFlow is simple to implement and achieves a 20%-53% lower forecast error than the nearest baseline on standard TPP benchmarks while simultaneously using fewer model calls at sampling time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes EventFlow, a non-autoregressive generative model for temporal point processes based on flow matching. It directly learns joint distributions over event times and marks to avoid the cascading errors and myopic predictions of autoregressive neural TPP models, claiming 20%-53% lower forecast error than the nearest baseline on standard benchmarks while requiring fewer model calls at sampling time.

Significance. If the central empirical claims hold after verification that the flow-matching vector field respects point-process constraints (no simultaneous events, correct ordering) for variable-length sequences, the work would offer a substantive alternative modeling paradigm for TPP forecasting. The non-autoregressive formulation and reported sampling efficiency would be notable strengths if supported by reproducible experiments.

major comments (2)
  1. [Methods (flow-matching adaptation)] The non-autoregressive premise requires explicit verification that the fixed-dimensional flow-matching objective, after any padding/masking/length-conditioning, preserves the point-process measure (no simultaneous events, strict ordering) across the empirical distribution of sequence lengths N; without such checks the reported gains could be artifacts of benchmark length statistics rather than a solution to joint modeling.
  2. [Experiments] The central performance claim (20-53% lower forecast error) is load-bearing; the experiments section must supply baseline definitions, dataset statistics, error-bar information, and the exact forecast horizon/metrics used, as the abstract alone supplies none of these details.
minor comments (1)
  1. [Introduction] Notation for marks m_{1:N} and history conditioning should be clarified when first introduced to avoid ambiguity with standard TPP intensity notation.
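
The padding and masking mentioned in major comment 1 can be pictured with a small sketch. It assumes sequences are right-padded to a common length and that a boolean mask marks the real events, so padded positions never contribute to the loss. Whether the paper handles variable lengths this way is not stated on this page, and `pad_sequences` and `masked_mse` are hypothetical helper names.

```python
import numpy as np


def pad_sequences(seqs, pad_value=0.0):
    """Right-pad event-time sequences and return (times, mask)."""
    max_len = max(len(s) for s in seqs)
    times = np.full((len(seqs), max_len), pad_value, dtype=float)
    mask = np.zeros((len(seqs), max_len), dtype=bool)
    for i, s in enumerate(seqs):
        times[i, :len(s)] = s
        mask[i, :len(s)] = True  # True marks a real event, False marks padding
    return times, mask


def masked_mse(pred, target, mask):
    """Mean squared error over real events only; padded slots are ignored."""
    sq_err = (pred - target) ** 2
    return float(sq_err[mask].mean())


seqs = [[0.3, 1.1, 2.4], [0.7]]
times, mask = pad_sequences(seqs)
```

With this layout, the referee's concern amounts to checking that results do not depend on `pad_value` or on how many padded slots a batch happens to contain.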

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point-by-point below and will revise the manuscript to incorporate the requested clarifications and additions.

Point-by-point responses
  1. Referee: [Methods (flow-matching adaptation)] The non-autoregressive premise requires explicit verification that the fixed-dimensional flow-matching objective, after any padding/masking/length-conditioning, preserves the point-process measure (no simultaneous events, strict ordering) across the empirical distribution of sequence lengths N; without such checks the reported gains could be artifacts of benchmark length statistics rather than a solution to joint modeling.

    Authors: We agree that explicit verification strengthens the non-autoregressive claim. In the revision we will add a dedicated subsection describing how the flow-matching vector field, together with the padding/masking and length-conditioning mechanisms, enforces no simultaneous events and strict ordering. We will also report empirical diagnostics (e.g., fraction of invalid sequences and ordering violations) computed on held-out length distributions from each benchmark. revision: yes

  2. Referee: [Experiments] The central performance claim (20-53% lower forecast error) is load-bearing; the experiments section must supply baseline definitions, dataset statistics, error-bar information, and the exact forecast horizon/metrics used, as the abstract alone supplies none of these details.

    Authors: We accept that the experiments section should be self-contained. The revised manuscript will include: (i) precise definitions and hyper-parameter settings for all baselines, (ii) full dataset statistics (number of sequences, mean/variance of lengths and marks), (iii) error bars from at least five independent runs, and (iv) explicit statements of the forecast horizons and metrics (e.g., mean absolute error on inter-event times, log-likelihood on marks) used to obtain the reported 20-53% improvements. revision: yes
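
The empirical diagnostics promised in the first response (fraction of invalid sequences and ordering violations) could be computed along the following lines. This is a sketch under stated assumptions: generated sequences arrive as plain lists of event times in the order the model emitted them, a sequence counts as invalid if any adjacent pair is out of order or tied, and `validity_report` is a hypothetical name.

```python
import numpy as np


def validity_report(sequences):
    """Fractions of sequences that break strict ordering of event times."""
    n = len(sequences)
    if n == 0:
        raise ValueError("need at least one sequence")
    unordered = ties = invalid = 0
    for seq in sequences:
        gaps = np.diff(np.asarray(seq, dtype=float))
        has_unordered = bool(np.any(gaps < 0))  # a later slot holds an earlier time
        has_tie = bool(np.any(gaps == 0))       # adjacent events share a timestamp
        unordered += has_unordered
        ties += has_tie
        invalid += has_unordered or has_tie
    return {
        "ordering_violations": unordered / n,
        "simultaneous_events": ties / n,
        "invalid": invalid / n,
    }


# Hypothetical generated sequences: one valid, one with a tie, one out of order.
report = validity_report([[0.1, 0.5, 0.9], [0.2, 0.2, 0.7], [0.6, 0.4]])
```

If a sampler sorts its outputs before returning them, the ordering fraction is zero by construction, so the tie fraction and a check on the unsorted outputs would be the more informative numbers.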

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper presents EventFlow as a new non-autoregressive flow-matching model for joint distributions over event times and marks in temporal point processes, explicitly positioned as an alternative to autoregressive conditioning to avoid cascading errors. No equations, parameter-fitting steps, or self-citations in the abstract or described approach reduce the reported 20-53% error reductions to quantities fitted from the evaluation data by construction. The central claim rests on applying an external flow-matching framework to TPP data with benchmark comparisons, which constitutes independent content rather than self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated modeling assumption that flow matching can represent the required joint distributions.

pith-pipeline@v0.9.0 · 5678 in / 1109 out tokens · 31750 ms · 2026-05-23T19:02:54.391785+00:00 · methodology


