pith. machine review for the scientific record. sign in

arxiv: 2604.01342 · v2 · submitted 2026-04-01 · 💻 cs.LG · stat.ML

Recognition: no theorem link

Massively Parallel Exact Inference for Hawkes Processes

Authors on Pith no claims yet

Pith reviewed 2026-05-13 22:00 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords Hawkes processesparallel prefix scanexact inferenceGPU accelerationpoint processesmaximum likelihoodsparse matricesself-exciting processes
0
0 comments X

The pith

The Hawkes intensity for linear exponential models can be rewritten as a product of sparse transition matrices that multiply associatively, allowing exact likelihood computation by parallel prefix scan.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that the standard linear exponential Hawkes process intensity admits an exact reformulation as a chain of sparse transition matrices whose multiplication is associative and linear-time. This structure replaces the usual sequential recurrence with a parallel prefix scan that runs across many processors while preserving the exact likelihood value. The approach also introduces a batching scheme that keeps memory usage constant regardless of event count. As a result, maximum-likelihood fitting becomes feasible for datasets with tens of millions of events on current GPUs, without introducing approximations or changing the model.

Core claim

The Hawkes process intensity can be expressed as a product of sparse transition matrices admitting a linear-time associative multiply, enabling computation via a parallel prefix scan. This yields a massively parallelizable algorithm for estimation of linear exponential Hawkes processes that reduces complexity to approximately O(N/P) with P processors, maintains constant memory via batching, and computes the exact likelihood without additional assumptions.

What carries the argument

Sparse transition matrices whose associative product computes the exact Hawkes intensity and permits a parallel prefix scan.

If this is right

  • Exact maximum-likelihood estimation becomes feasible at scales of tens of millions of events and thousands of nodes.
  • Memory usage stays bounded by a constant batch size independent of total event count.
  • The method delivers orders-of-magnitude wall-clock speedups on both simulated and real multivariate point-process data.
  • The algorithm remains fully interpretable because it computes the original likelihood without approximation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same associative-matrix trick could be tested on other self-exciting point-process families if analogous linear recurrences exist.
  • Streaming or online updating of Hawkes parameters might become practical once the prefix-scan structure is adapted to incremental arrivals.
  • Large-scale social or financial event datasets that were previously intractable for exact fitting can now be modeled without subsampling.

Load-bearing premise

The matrix-product reformulation and its associative property hold exactly only for the canonical linear exponential Hawkes process.

What would settle it

Running both the new parallel scan and the classic sequential O(N) recurrence on the same dataset of several million events and verifying that the computed log-likelihood values match to machine precision.

Figures

Figures reproduced from arXiv: 2604.01342 by Ahmer Raza, Hudson Smith.

Figure 1
Figure 1. Figure 1: Scaling of our method across various sequence lengths, showing average epoch time (left) [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparison between our method (black) and other implementations of the intensity [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Interaction matrix for all crime types learned from Chicago crime data. Choropleth maps [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
read the original abstract

Multivariate Hawkes processes are a widely used class of self-exciting point processes, but maximum likelihood estimation naively scales as $O(N^2)$ in the number of events. The canonical linear exponential Hawkes process admits a faster $O(N)$ recurrence, but prior work evaluates this recurrence sequentially, without exploiting parallelization on modern GPUs. We show that the Hawkes process intensity can be expressed as a product of sparse transition matrices admitting a linear-time associative multiply, enabling computation via a parallel prefix scan. This yields a massively parallelizable algorithm for estimation of linear exponential Hawkes processes. Our method reduces the computational complexity to approximately $O(N/P)$ with $P$ parallel processors, and naturally yields a batching scheme to maintain constant memory usage, avoiding GPU memory constraints. Importantly, it computes the exact likelihood without any additional assumptions or approximations, preserving the simplicity and interpretability of the model. We demonstrate orders-of-magnitude speedups on simulated and real datasets, scaling to thousands of nodes and tens of millions of events, substantially beyond scales reported in prior work. We provide an open-source PyTorch library implementing our optimizations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that the intensity and integrated compensator of the canonical linear-exponential multivariate Hawkes process can be exactly rewritten as a product of sparse transition matrices whose multiplication is associative; this identity permits exact likelihood evaluation via a parallel prefix scan, yielding an O(N/P) algorithm on P processors together with a constant-memory batching scheme, all without approximation. An open-source PyTorch implementation is provided.

Significance. If the algebraic identity holds, the work supplies a practical, exact, and hardware-native route to maximum-likelihood estimation for linear Hawkes models at scales (tens of millions of events, thousands of nodes) that were previously inaccessible to sequential O(N) recurrences. The absence of any post-hoc approximation or fitted parameters, combined with released code, makes the contribution immediately verifiable and usable.

minor comments (3)
  1. [§3.2] §3.2, Eq. (9): the definition of the transition matrix A_t could explicitly state the dimension of the zero blocks to avoid ambiguity when the number of marks differs from the state dimension.
  2. [Figure 2] Figure 2 caption: the legend should indicate whether the plotted curves are wall-clock time or normalized speedup; the current wording leaves the y-axis interpretation slightly unclear.
  3. [§5.3] §5.3: the memory-complexity claim for the batching scheme is stated as O(1) per batch but would benefit from a short derivation showing that the prefix-scan workspace does not grow with batch size.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review, accurate summary of the contribution, and recommendation to accept the manuscript. No major comments were raised that require a point-by-point response.

Circularity Check

0 steps flagged

No significant circularity: exact algebraic rewriting of known linear Hawkes recurrence

full rationale

The central derivation rewrites the standard linear-exponential Hawkes intensity recurrence as a product of explicitly constructed sparse transition matrices. Matrix multiplication is associative by the definition of matrix algebra, and the parallel prefix scan is the standard parallel algorithm for any associative binary operation. The paper states that the reformulation holds exactly for the canonical linear exponential kernel and computes the identical likelihood value as the sequential O(N) recurrence. No parameters are fitted to data and then relabeled as predictions, no self-citations supply load-bearing uniqueness theorems, and no ansatz is smuggled in. The released PyTorch implementation supplies an independent executable verification that the matrix-product form equals the original recurrence. The derivation is therefore self-contained against external benchmarks and contains no circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the standard algebraic property that matrix multiplication is associative, which is invoked to justify the parallel prefix scan. No free parameters or new entities are introduced beyond the existing linear Hawkes model.

axioms (1)
  • standard math Matrix multiplication is associative
    Invoked to allow reordering of the transition matrix products into a parallel prefix scan without changing the result.

pith-pipeline@v0.9.0 · 5487 in / 1166 out tokens · 28104 ms · 2026-05-13T22:00:08.240802+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

  1. [1]

    Multivariate Hawkes processes: An application to financial data.Journal of Applied Probability, 48(A):367–378, August 2011

    Paul Embrechts, Thomas Liniger, and Lu Lin. Multivariate Hawkes processes: An application to financial data.Journal of Applied Probability, 48(A):367–378, August 2011

  2. [2]

    Yacine Aït-Sahalia, Julio Cacho-Diaz, and Roger J. A. Laeven. Modeling financial contagion using mutually exciting jump processes.Journal of Financial Economics, 117(3):585–606, September 2015

  3. [3]

    Hawkes Processes in Finance

    Emmanuel Bacry, Iacopo Mastromatteo, and Jean-François Muzy. Hawkes Processes in Finance. Market Microstructure and Liquidity, 01(01):1550005, June 2015

  4. [4]

    Alan G. Hawkes. Hawkes processes and their applications to finance: A review.Quantitative Finance, 18(2):193–198, February 2018

  5. [5]

    Statistical Models for Earthquake Occurrences and Residual Analysis for Point Processes.Journal of the American Statistical Association, 83(401):9–27, March 1988

    Yosihiko Ogata. Statistical Models for Earthquake Occurrences and Residual Analysis for Point Processes.Journal of the American Statistical Association, 83(401):9–27, March 1988

  6. [6]

    Space-Time Point-Process Models for Earthquake Occurrences.Annals of the Institute of Statistical Mathematics, 50(2):379–402, June 1998

    Yosihiko Ogata. Space-Time Point-Process Models for Earthquake Occurrences.Annals of the Institute of Statistical Mathematics, 50(2):379–402, June 1998

  7. [7]

    Stochastic Declustering of Space-Time Earthquake Occurrences.Journal of the American Statistical Association, 97(458):369–380, June 2002

    Jiancang Zhuang, Yosihiko Ogata, and David Vere-Jones. Stochastic Declustering of Space-Time Earthquake Occurrences.Journal of the American Statistical Association, 97(458):369–380, June 2002

  8. [8]

    On the stability and dynamics of stochastic spiking neuron models: Nonlinear Hawkes process and point process GLMs.PLOS Computa- tional Biology, 13(2):e1005390, February 2017

    Felipe Gerhard, Moritz Deger, and Wilson Truccolo. On the stability and dynamics of stochastic spiking neuron models: Nonlinear Hawkes process and point process GLMs.PLOS Computa- tional Biology, 13(2):e1005390, February 2017

  9. [9]

    Juliette T

    H. Juliette T. Unwin, Isobel Routledge, Seth Flaxman, Marian-Andrei Rizoiu, Shengjie Lai, Justin Cohen, Daniel J. Weiss, Swapnil Mishra, and Samir Bhatt. Using Hawkes Processes to model imported and local malaria cases in near-elimination settings.PLOS Computational Biology, 17(4):e1008830, April 2021

  10. [10]

    Simple discrete-time self-exciting models can describe complex dynamic processes: A case study of COVID-19.PLOS ONE, 16(4):e0250015, April 2021

    Raiha Browning, Deborah Sulem, Kerrie Mengersen, Vincent Rivoirard, and Judith Rousseau. Simple discrete-time self-exciting models can describe complex dynamic processes: A case study of COVID-19.PLOS ONE, 16(4):e0250015, April 2021

  11. [11]

    Hawkes process modeling of COVID-19 with mobility leading indicators and spatial covariates.International Journal of Forecasting, 38(2):505–520, April 2022

    Wen-Hao Chiang, Xueying Liu, and George Mohler. Hawkes process modeling of COVID-19 with mobility leading indicators and spatial covariates.International Journal of Forecasting, 38(2):505–520, April 2022

  12. [12]

    Wilson Truccolo. From point process observations to collective neural dynamics: Nonlinear Hawkes process GLMs, low-dimensional dynamics and coarse graining.Journal of Physiology- Paris, 110(4, Part A):336–347, November 2016

  13. [13]

    Lambert, Christine Tuleau-Malot, Thomas Bessaih, Vincent Rivoirard, Yann Bouret, Nathalie Leresche, and Patricia Reynaud-Bouret

    Régis C. Lambert, Christine Tuleau-Malot, Thomas Bessaih, Vincent Rivoirard, Yann Bouret, Nathalie Leresche, and Patricia Reynaud-Bouret. Reconstructing the functional connectivity of multiple spike trains using Hawkes models.Journal of Neuroscience Methods, 297:9–21, March 2018

  14. [14]

    Learning Social Infectivity in Sparse Low-rank Net- works Using Multi-dimensional Hawkes Processes

    Ke Zhou, Hongyuan Zha, and Le Song. Learning Social Infectivity in Sparse Low-rank Net- works Using Multi-dimensional Hawkes Processes. InProceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, pages 641–649. PMLR, April 2013

  15. [15]

    TiDeH: Time-Dependent Hawkes Process for Predict- ing Retweet Dynamics.Proceedings of the International AAAI Conference on Web and Social Media, 10(1):191–200, 2016

    Ryota Kobayashi and Renaud Lambiotte. TiDeH: Time-Dependent Hawkes Process for Predict- ing Retweet Dynamics.Proceedings of the International AAAI Conference on Web and Social Media, 10(1):191–200, 2016. 10

  16. [16]

    Expecting to be HIP: Hawkes Intensity Processes for Social Media Popularity

    Marian-Andrei Rizoiu, Lexing Xie, Scott Sanner, Manuel Cebrian, Honglin Yu, and Pascal Van Hentenryck. Expecting to be HIP: Hawkes Intensity Processes for Social Media Popularity. InProceedings of the 26th International Conference on World Wide Web, WWW ’17, pages 735–744, Republic and Canton of Geneva, CHE, April 2017. International World Wide Web Confer...

  17. [17]

    Hawkes processes for events in social media

    Marian-Andrei Rizoiu, Young Lee, Swapnil Mishra, and Lexing Xie. Hawkes processes for events in social media. InFrontiers of Multimedia Research, volume 17, pages 191–218. Association for Computing Machinery and Morgan & Claypool, December 2017

  18. [18]

    Learning network of multivariate Hawkes processes: A time series approach

    Jalal Etesami, Negar Kiyavash, Kun Zhang, and Kushagra Singhal. Learning network of multivariate Hawkes processes: A time series approach. InProceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI’16, pages 162–171, Arlington, Vir- ginia, USA, June 2016. AUAI Press

  19. [19]

    Graphical Modeling for Multivariate Hawkes Processes with Nonparametric Link Functions.Journal of Time Series Analysis, 38(2):225–242, 2017

    Michael Eichler, Rainer Dahlhaus, and Johannes Dueck. Graphical Modeling for Multivariate Hawkes Processes with Nonparametric Link Functions.Journal of Time Series Analysis, 38(2):225–242, 2017

  20. [20]

    Y . Ogata. On Lewis’ simulation method for point processes.IEEE Transactions on Information Theory, 27(1):23–31, January 1981

  21. [21]

    Modeling Sparse Information Diffusion at Scale via Lazy Multivariate Hawkes Processes

    Maximilian Nickel and Matthew Le. Modeling Sparse Information Diffusion at Scale via Lazy Multivariate Hawkes Processes. InProceedings of the Web Conference 2021, WWW ’21, pages 706–717, New York, NY , USA, June 2021. Association for Computing Machinery

  22. [22]

    Multivariate Hawkes Processes for Large-Scale Inference.Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), February 2017

    Rémi Lemonnier, Kevin Scaman, and Argyris Kalogeratos. Multivariate Hawkes Processes for Large-Scale Inference.Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), February 2017

  23. [23]

    Sparse and low-rank multivariate Hawkes processes.Journal of Machine Learning Research, 21(50):1–32, 2020

    Emmanuel Bacry, Martin Bompaire, Stéphane Gaïffas, and Jean-Francois Muzy. Sparse and low-rank multivariate Hawkes processes.Journal of Machine Learning Research, 21(50):1–32, 2020

  24. [24]

    Estimation of Space–Time Branching Process Models in Seismology Using an EM–Type Algorithm.Journal of the American Statistical Association, 103(482):614–624, June 2008

    Alejandro Veen and Frederic P Schoenberg. Estimation of Space–Time Branching Process Models in Seismology Using an EM–Type Algorithm.Journal of the American Statistical Association, 103(482):614–624, June 2008

  25. [25]

    A Nonparametric EM Algorithm for Multiscale Hawkes Processes.Journal of Nonparametric Statistics, January 2011

    Erik Lewis and George Mohler. A Nonparametric EM Algorithm for Multiscale Hawkes Processes.Journal of Nonparametric Statistics, January 2011

  26. [26]

    An estimation procedure for the Hawkes process.Quantitative Finance, 17(4):571–595, April 2017

    Matthias Kirchner. An estimation procedure for the Hawkes process.Quantitative Finance, 17(4):571–595, April 2017

  27. [27]

    Daniel Hillis and Guy L

    W. Daniel Hillis and Guy L. Steele. Data parallel algorithms.Commun. ACM, 29(12):1170– 1183, December 1986

  28. [28]

    Blelloch

    Guy E. Blelloch. Prefix Sums and Their Applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990

  29. [29]

    Directed scale-free graphs

    Béla Bollobás, Christian Borgs, Jennifer Chayes, and Oliver Riordan. Directed scale-free graphs. InProceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’03, pages 132–139, USA, January 2003. Society for Industrial and Applied Mathematics

  30. [30]

    Scale-Free Networks: A Decade and Beyond.Science, 325(5939):412– 413, July 2009

    Albert-László Barabási. Scale-Free Networks: A Decade and Beyond.Science, 325(5939):412– 413, July 2009

  31. [31]

    Meme-tracking and the dynamics of the news cycle

    Jure Leskovec, Lars Backstrom, and Jon Kleinberg. Meme-tracking and the dynamics of the news cycle. InProceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, pages 497–506, New York, NY , USA, June 2009. Association for Computing Machinery. 11

  32. [32]

    Structure and dynamics of information pathways in online media

    Manuel Gomez Rodriguez, Jure Leskovec, and Bernhard Schölkopf. Structure and dynamics of information pathways in online media. InProceedings of the Sixth ACM International Conference on Web Search and Data Mining, WSDM ’13, pages 23–32, New York, NY , USA, February 2013. Association for Computing Machinery

  33. [33]

    G. O. Mohler, M. B. Short, P. J. Brantingham, F. P. Schoenberg, and G. E. Tita. Self- Exciting Point Process Modeling of Crime.Journal of the American Statistical Association, 106(493):100–108, March 2011

  34. [34]

    G. O. Mohler, M. B. Short, Sean Malinowski, Mark Johnson, G. E. Tita, Andrea L. Bertozzi, and P. J. Brantingham. Randomized Controlled Field Trials of Predictive Policing.Journal of the American Statistical Association, 110(512):1399–1411, October 2015

  35. [35]

    Jiancang Zhuang and Jorge Mateu. A Semiparametric Spatiotemporal Hawkes-Type Point Process Model with Periodic Background for Crime Data.Journal of the Royal Statistical Society Series A: Statistics in Society, 182(3):919–942, June 2019

  36. [36]

    Crimes - 2001 to Present, 2025

    Chicago Police Department. Crimes - 2001 to Present, 2025

  37. [37]

    2024 Annual Report

    Chicago Police Department. 2024 Annual Report. Technical report, Chicago Police Department, 2024. 12 A Batched Training Algorithm In this section, we present the extension of Algorithm 1 to the batched case. That is, the forward and backward passes for training multivariate Hawkes processes using our approach, split across batches. Note that in order to a...