Recognition: no theorem link
Massively Parallel Exact Inference for Hawkes Processes
Pith reviewed 2026-05-13 22:00 UTC · model grok-4.3
The pith
The Hawkes intensity for linear exponential models can be rewritten as a product of sparse transition matrices that multiply associatively, allowing exact likelihood computation by parallel prefix scan.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Hawkes process intensity can be expressed as a product of sparse transition matrices admitting a linear-time associative multiply, enabling computation via a parallel prefix scan. This yields a massively parallelizable algorithm for estimation of linear exponential Hawkes processes that reduces complexity to approximately O(N/P) with P processors, maintains constant memory via batching, and computes the exact likelihood without additional assumptions.
What carries the argument
Sparse transition matrices whose associative product computes the exact Hawkes intensity and permits a parallel prefix scan.
If this is right
- Exact maximum-likelihood estimation becomes feasible at scales of tens of millions of events and thousands of nodes.
- Memory usage stays bounded by a constant batch size independent of total event count.
- The method delivers orders-of-magnitude wall-clock speedups on both simulated and real multivariate point-process data.
- The algorithm remains fully interpretable because it computes the original likelihood without approximation.
Where Pith is reading between the lines
- The same associative-matrix trick could be tested on other self-exciting point-process families if analogous linear recurrences exist.
- Streaming or online updating of Hawkes parameters might become practical once the prefix-scan structure is adapted to incremental arrivals.
- Large-scale social or financial event datasets that were previously intractable for exact fitting can now be modeled without subsampling.
Load-bearing premise
The matrix-product reformulation and its associative property hold exactly only for the canonical linear exponential Hawkes process.
What would settle it
Running both the new parallel scan and the classic sequential O(N) recurrence on the same dataset of several million events and verifying that the computed log-likelihood values match to machine precision.
Figures
read the original abstract
Multivariate Hawkes processes are a widely used class of self-exciting point processes, but maximum likelihood estimation naively scales as $O(N^2)$ in the number of events. The canonical linear exponential Hawkes process admits a faster $O(N)$ recurrence, but prior work evaluates this recurrence sequentially, without exploiting parallelization on modern GPUs. We show that the Hawkes process intensity can be expressed as a product of sparse transition matrices admitting a linear-time associative multiply, enabling computation via a parallel prefix scan. This yields a massively parallelizable algorithm for estimation of linear exponential Hawkes processes. Our method reduces the computational complexity to approximately $O(N/P)$ with $P$ parallel processors, and naturally yields a batching scheme to maintain constant memory usage, avoiding GPU memory constraints. Importantly, it computes the exact likelihood without any additional assumptions or approximations, preserving the simplicity and interpretability of the model. We demonstrate orders-of-magnitude speedups on simulated and real datasets, scaling to thousands of nodes and tens of millions of events, substantially beyond scales reported in prior work. We provide an open-source PyTorch library implementing our optimizations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that the intensity and integrated compensator of the canonical linear-exponential multivariate Hawkes process can be exactly rewritten as a product of sparse transition matrices whose multiplication is associative; this identity permits exact likelihood evaluation via a parallel prefix scan, yielding an O(N/P) algorithm on P processors together with a constant-memory batching scheme, all without approximation. An open-source PyTorch implementation is provided.
Significance. If the algebraic identity holds, the work supplies a practical, exact, and hardware-native route to maximum-likelihood estimation for linear Hawkes models at scales (tens of millions of events, thousands of nodes) that were previously inaccessible to sequential O(N) recurrences. The absence of any post-hoc approximation or fitted parameters, combined with released code, makes the contribution immediately verifiable and usable.
minor comments (3)
- [§3.2] §3.2, Eq. (9): the definition of the transition matrix A_t could explicitly state the dimension of the zero blocks to avoid ambiguity when the number of marks differs from the state dimension.
- [Figure 2] Figure 2 caption: the legend should indicate whether the plotted curves are wall-clock time or normalized speedup; the current wording leaves the y-axis interpretation slightly unclear.
- [§5.3] §5.3: the memory-complexity claim for the batching scheme is stated as O(1) per batch but would benefit from a short derivation showing that the prefix-scan workspace does not grow with batch size.
Simulated Author's Rebuttal
We thank the referee for their positive review, accurate summary of the contribution, and recommendation to accept the manuscript. No major comments were raised that require a point-by-point response.
Circularity Check
No significant circularity: exact algebraic rewriting of known linear Hawkes recurrence
full rationale
The central derivation rewrites the standard linear-exponential Hawkes intensity recurrence as a product of explicitly constructed sparse transition matrices. Matrix multiplication is associative by the definition of matrix algebra, and the parallel prefix scan is the standard parallel algorithm for any associative binary operation. The paper states that the reformulation holds exactly for the canonical linear exponential kernel and computes the identical likelihood value as the sequential O(N) recurrence. No parameters are fitted to data and then relabeled as predictions, no self-citations supply load-bearing uniqueness theorems, and no ansatz is smuggled in. The released PyTorch implementation supplies an independent executable verification that the matrix-product form equals the original recurrence. The derivation is therefore self-contained against external benchmarks and contains no circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Matrix multiplication is associative
Reference graph
Works this paper leans on
-
[1]
Paul Embrechts, Thomas Liniger, and Lu Lin. Multivariate Hawkes processes: An application to financial data.Journal of Applied Probability, 48(A):367–378, August 2011
work page 2011
-
[2]
Yacine Aït-Sahalia, Julio Cacho-Diaz, and Roger J. A. Laeven. Modeling financial contagion using mutually exciting jump processes.Journal of Financial Economics, 117(3):585–606, September 2015
work page 2015
-
[3]
Emmanuel Bacry, Iacopo Mastromatteo, and Jean-François Muzy. Hawkes Processes in Finance. Market Microstructure and Liquidity, 01(01):1550005, June 2015
work page 2015
-
[4]
Alan G. Hawkes. Hawkes processes and their applications to finance: A review.Quantitative Finance, 18(2):193–198, February 2018
work page 2018
-
[5]
Yosihiko Ogata. Statistical Models for Earthquake Occurrences and Residual Analysis for Point Processes.Journal of the American Statistical Association, 83(401):9–27, March 1988
work page 1988
-
[6]
Yosihiko Ogata. Space-Time Point-Process Models for Earthquake Occurrences.Annals of the Institute of Statistical Mathematics, 50(2):379–402, June 1998
work page 1998
-
[7]
Jiancang Zhuang, Yosihiko Ogata, and David Vere-Jones. Stochastic Declustering of Space-Time Earthquake Occurrences.Journal of the American Statistical Association, 97(458):369–380, June 2002
work page 2002
-
[8]
Felipe Gerhard, Moritz Deger, and Wilson Truccolo. On the stability and dynamics of stochastic spiking neuron models: Nonlinear Hawkes process and point process GLMs.PLOS Computa- tional Biology, 13(2):e1005390, February 2017
work page 2017
-
[9]
H. Juliette T. Unwin, Isobel Routledge, Seth Flaxman, Marian-Andrei Rizoiu, Shengjie Lai, Justin Cohen, Daniel J. Weiss, Swapnil Mishra, and Samir Bhatt. Using Hawkes Processes to model imported and local malaria cases in near-elimination settings.PLOS Computational Biology, 17(4):e1008830, April 2021
work page 2021
-
[10]
Raiha Browning, Deborah Sulem, Kerrie Mengersen, Vincent Rivoirard, and Judith Rousseau. Simple discrete-time self-exciting models can describe complex dynamic processes: A case study of COVID-19.PLOS ONE, 16(4):e0250015, April 2021
work page 2021
-
[11]
Wen-Hao Chiang, Xueying Liu, and George Mohler. Hawkes process modeling of COVID-19 with mobility leading indicators and spatial covariates.International Journal of Forecasting, 38(2):505–520, April 2022
work page 2022
-
[12]
Wilson Truccolo. From point process observations to collective neural dynamics: Nonlinear Hawkes process GLMs, low-dimensional dynamics and coarse graining.Journal of Physiology- Paris, 110(4, Part A):336–347, November 2016
work page 2016
-
[13]
Régis C. Lambert, Christine Tuleau-Malot, Thomas Bessaih, Vincent Rivoirard, Yann Bouret, Nathalie Leresche, and Patricia Reynaud-Bouret. Reconstructing the functional connectivity of multiple spike trains using Hawkes models.Journal of Neuroscience Methods, 297:9–21, March 2018
work page 2018
-
[14]
Learning Social Infectivity in Sparse Low-rank Net- works Using Multi-dimensional Hawkes Processes
Ke Zhou, Hongyuan Zha, and Le Song. Learning Social Infectivity in Sparse Low-rank Net- works Using Multi-dimensional Hawkes Processes. InProceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, pages 641–649. PMLR, April 2013
work page 2013
-
[15]
Ryota Kobayashi and Renaud Lambiotte. TiDeH: Time-Dependent Hawkes Process for Predict- ing Retweet Dynamics.Proceedings of the International AAAI Conference on Web and Social Media, 10(1):191–200, 2016. 10
work page 2016
-
[16]
Expecting to be HIP: Hawkes Intensity Processes for Social Media Popularity
Marian-Andrei Rizoiu, Lexing Xie, Scott Sanner, Manuel Cebrian, Honglin Yu, and Pascal Van Hentenryck. Expecting to be HIP: Hawkes Intensity Processes for Social Media Popularity. InProceedings of the 26th International Conference on World Wide Web, WWW ’17, pages 735–744, Republic and Canton of Geneva, CHE, April 2017. International World Wide Web Confer...
work page 2017
-
[17]
Hawkes processes for events in social media
Marian-Andrei Rizoiu, Young Lee, Swapnil Mishra, and Lexing Xie. Hawkes processes for events in social media. InFrontiers of Multimedia Research, volume 17, pages 191–218. Association for Computing Machinery and Morgan & Claypool, December 2017
work page 2017
-
[18]
Learning network of multivariate Hawkes processes: A time series approach
Jalal Etesami, Negar Kiyavash, Kun Zhang, and Kushagra Singhal. Learning network of multivariate Hawkes processes: A time series approach. InProceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI’16, pages 162–171, Arlington, Vir- ginia, USA, June 2016. AUAI Press
work page 2016
-
[19]
Michael Eichler, Rainer Dahlhaus, and Johannes Dueck. Graphical Modeling for Multivariate Hawkes Processes with Nonparametric Link Functions.Journal of Time Series Analysis, 38(2):225–242, 2017
work page 2017
-
[20]
Y . Ogata. On Lewis’ simulation method for point processes.IEEE Transactions on Information Theory, 27(1):23–31, January 1981
work page 1981
-
[21]
Modeling Sparse Information Diffusion at Scale via Lazy Multivariate Hawkes Processes
Maximilian Nickel and Matthew Le. Modeling Sparse Information Diffusion at Scale via Lazy Multivariate Hawkes Processes. InProceedings of the Web Conference 2021, WWW ’21, pages 706–717, New York, NY , USA, June 2021. Association for Computing Machinery
work page 2021
-
[22]
Rémi Lemonnier, Kevin Scaman, and Argyris Kalogeratos. Multivariate Hawkes Processes for Large-Scale Inference.Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), February 2017
work page 2017
-
[23]
Emmanuel Bacry, Martin Bompaire, Stéphane Gaïffas, and Jean-Francois Muzy. Sparse and low-rank multivariate Hawkes processes.Journal of Machine Learning Research, 21(50):1–32, 2020
work page 2020
-
[24]
Alejandro Veen and Frederic P Schoenberg. Estimation of Space–Time Branching Process Models in Seismology Using an EM–Type Algorithm.Journal of the American Statistical Association, 103(482):614–624, June 2008
work page 2008
-
[25]
Erik Lewis and George Mohler. A Nonparametric EM Algorithm for Multiscale Hawkes Processes.Journal of Nonparametric Statistics, January 2011
work page 2011
-
[26]
An estimation procedure for the Hawkes process.Quantitative Finance, 17(4):571–595, April 2017
Matthias Kirchner. An estimation procedure for the Hawkes process.Quantitative Finance, 17(4):571–595, April 2017
work page 2017
-
[27]
W. Daniel Hillis and Guy L. Steele. Data parallel algorithms.Commun. ACM, 29(12):1170– 1183, December 1986
work page 1986
- [28]
-
[29]
Béla Bollobás, Christian Borgs, Jennifer Chayes, and Oliver Riordan. Directed scale-free graphs. InProceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’03, pages 132–139, USA, January 2003. Society for Industrial and Applied Mathematics
work page 2003
-
[30]
Scale-Free Networks: A Decade and Beyond.Science, 325(5939):412– 413, July 2009
Albert-László Barabási. Scale-Free Networks: A Decade and Beyond.Science, 325(5939):412– 413, July 2009
work page 2009
-
[31]
Meme-tracking and the dynamics of the news cycle
Jure Leskovec, Lars Backstrom, and Jon Kleinberg. Meme-tracking and the dynamics of the news cycle. InProceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, pages 497–506, New York, NY , USA, June 2009. Association for Computing Machinery. 11
work page 2009
-
[32]
Structure and dynamics of information pathways in online media
Manuel Gomez Rodriguez, Jure Leskovec, and Bernhard Schölkopf. Structure and dynamics of information pathways in online media. InProceedings of the Sixth ACM International Conference on Web Search and Data Mining, WSDM ’13, pages 23–32, New York, NY , USA, February 2013. Association for Computing Machinery
work page 2013
-
[33]
G. O. Mohler, M. B. Short, P. J. Brantingham, F. P. Schoenberg, and G. E. Tita. Self- Exciting Point Process Modeling of Crime.Journal of the American Statistical Association, 106(493):100–108, March 2011
work page 2011
-
[34]
G. O. Mohler, M. B. Short, Sean Malinowski, Mark Johnson, G. E. Tita, Andrea L. Bertozzi, and P. J. Brantingham. Randomized Controlled Field Trials of Predictive Policing.Journal of the American Statistical Association, 110(512):1399–1411, October 2015
work page 2015
-
[35]
Jiancang Zhuang and Jorge Mateu. A Semiparametric Spatiotemporal Hawkes-Type Point Process Model with Periodic Background for Crime Data.Journal of the Royal Statistical Society Series A: Statistics in Society, 182(3):919–942, June 2019
work page 2019
-
[36]
Crimes - 2001 to Present, 2025
Chicago Police Department. Crimes - 2001 to Present, 2025
work page 2001
-
[37]
Chicago Police Department. 2024 Annual Report. Technical report, Chicago Police Department, 2024. 12 A Batched Training Algorithm In this section, we present the extension of Algorithm 1 to the batched case. That is, the forward and backward passes for training multivariate Hawkes processes using our approach, split across batches. Note that in order to a...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.