
arxiv: 2307.09077 · v2 · submitted 2023-07-18 · 💱 q-fin.TR · stat.ML

Estimation of an Order Book Dependent Hawkes Process for Large Datasets

Pith reviewed 2026-05-24 08:18 UTC · model grok-4.3

classification 💱 q-fin.TR stat.ML
keywords Hawkes process · order book covariates · high frequency trading · point process · large dataset estimation · NYSE stocks · nonlinear intensity · self-exciting process

The pith

A Hawkes process multiplied by high-dimensional functions of order book covariates models high-frequency trading arrivals, with out-of-sample tests on NYSE stocks showing that the nonlinear terms add value beyond self-excitation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a point process for high-frequency trading event arrivals whose intensity equals the product of a standard Hawkes process and high-dimensional functions of order book covariates. It states stationarity conditions that keep the process well-defined, then supplies an estimation algorithm that scales to billions of observations across multiple instruments. Consistency of the estimator is established under weak conditions, and a test statistic is given for comparing model specifications on held-out data. When fitted to four NYSE stocks, the out-of-sample results indicate that the order-book nonlinearity improves predictive performance over pure self-exciting specifications. A reader cares because better intensity models directly affect forecasts of trade timing and market liquidity in liquid assets.

Core claim

The intensity of the point process is the product of a Hawkes process and high-dimensional functions of covariates derived from the order book. Conditions for stationarity of the process are stated. An algorithm is presented to estimate the model even in the presence of billions of data points. Convergence of the algorithm is shown, consistency results under weak conditions are established, and a test statistic to assess out-of-sample performance of different model specifications is suggested. The methodology is applied to four stocks that trade on the NYSE, where the out-of-sample testing procedure suggests that capturing the nonlinearity of the order book information adds value to the self-exciting nature of high-frequency trading events.

What carries the argument

The intensity function formed as the product of a Hawkes process and high-dimensional functions of order book covariates.
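
The product form described above can be sketched in a few lines. This is a minimal illustration only, assuming an exponential Hawkes kernel and a scalar covariate multiplier supplied by the caller. The names `mu`, `alpha`, `beta`, `hawkes_component`, and `product_intensity` are hypothetical and are not taken from the paper, whose exact kernel and covariate functions are not given on this page.

```python
import numpy as np


def hawkes_component(t, event_times, mu, alpha, beta):
    """Baseline mu plus exponentially decaying excitation from events strictly before t.

    The exponential kernel alpha * exp(-beta * s) is an assumption made for illustration.
    """
    event_times = np.asarray(event_times, dtype=float)
    past = event_times[event_times < t]
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))


def product_intensity(t, event_times, covariate_multiplier, mu, alpha, beta):
    """Product-form intensity: Hawkes component times the covariate multiplier at time t.

    covariate_multiplier stands in for the value f(X(t)) of the order-book function.
    """
    return covariate_multiplier * hawkes_component(t, event_times, mu, alpha, beta)


# Two past events at t = 0 and t = 1, evaluated at t = 2 with multiplier 2.
events = np.array([0.0, 1.0])
lam = product_intensity(2.0, events, covariate_multiplier=2.0, mu=0.5, alpha=1.0, beta=1.0)
```

Setting `covariate_multiplier=1.0` recovers a pure self-exciting specification in this sketch, which is the baseline against which the order-book terms are compared.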

If this is right

  • The estimation procedure remains consistent and computationally feasible when the dataset grows to billions of points from multiple liquid instruments.
  • A formal test statistic can rank model variants according to their out-of-sample performance on held-out high-frequency data.
  • The stationarity conditions ensure that the combined Hawkes-plus-covariate intensity defines a valid point process for long trading sessions.
  • Application to NYSE stocks shows measurable predictive gain from the order-book component over a pure Hawkes specification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same product-form construction could be tried on limit-order data from other exchanges to check whether the added value persists outside NYSE equities.
  • If the nonlinear terms capture persistent order-book effects, forecasts derived from the model could be used to adjust execution schedules in real time.
  • The framework leaves open the possibility of replacing the Hawkes kernel with other self-exciting specifications while retaining the same covariate multiplier and estimation algorithm.

Load-bearing premise

The true intensity takes exactly the product form of a Hawkes process multiplied by high-dimensional functions of order book covariates, and the stated stationarity conditions hold so that the process remains well-defined for estimation.

What would settle it

Applying the same out-of-sample test procedure to additional stocks or later time periods and obtaining no statistically significant improvement when the nonlinear order-book terms are included would falsify the claim that those terms add value.
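
One generic way to run such a comparison is a paired statistic on held-out losses. The sketch below is not the paper's test statistic, which is not described on this page; it assumes per-block out-of-sample losses (for example, negative log-likelihoods) are available for two specifications and treats the blocks as roughly independent. The function name `paired_loss_tstat` and the sample numbers are hypothetical.

```python
import numpy as np


def paired_loss_tstat(loss_a, loss_b):
    """t-type statistic for the mean difference of paired held-out losses.

    Positive values favour model B (lower loss). Serial correlation between
    blocks is ignored here, which is an assumption of this sketch.
    """
    d = np.asarray(loss_a, dtype=float) - np.asarray(loss_b, dtype=float)
    n = d.size
    return np.sqrt(n) * d.mean() / d.std(ddof=1)


# Hypothetical per-block losses: A = pure Hawkes, B = Hawkes with order-book terms.
loss_pure = [1.0, 2.0, 3.0, 4.0]
loss_with_book = [0.5, 1.0, 2.5, 3.0]
t_stat = paired_loss_tstat(loss_pure, loss_with_book)
```

A statistic near zero on new stocks or later periods would correspond to the "no significant improvement" outcome described above.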

Figures

Figures reproduced from arXiv: 2307.09077 by Alessio Sancetta, Luca Mucciante.

Figure 1: CSCO Volume Imbalance Level 1. The model is for L = 1 in (2) for CISCO and Any Trade Arrivals of buy trades. The estimated coefficients b_k for VolImb1 are plotted on the Y-axis as a function of k, which is the bin number out of 8 bins based on quantiles (see Section 5.2). Bin numbers greater than 4 correspond to positive values of VolImb1.
Figure 2: CSCO Duration (Dur98). The model is for L = 1 in (2) for CISCO and Any Trade Arrivals of buy trades. The estimated coefficients b_k for Dur98 are plotted on the Y-axis as a function of k, which is the bin number out of 8 bins based on quantiles (see Section 5.2). For example, bin number 1 corresponds to Dur98 smaller than the 1% quantile.
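
The quantile binning mentioned in the captions can be illustrated as follows. This is a sketch under stated assumptions: the seven cut levels in `HYPOTHETICAL_LEVELS` are invented for illustration (only the 1% level and the count of 8 bins appear in the captions), bins are indexed from 0 rather than 1, and the piecewise-constant function built from the coefficients is a plausible reading of the plots rather than the paper's stated functional form.

```python
import numpy as np

# Hypothetical quantile levels giving 8 bins; the paper's actual levels are in its Section 5.2.
HYPOTHETICAL_LEVELS = [0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99]


def quantile_bin_features(x, levels):
    """Assign each covariate value to a quantile bin and return one-hot indicators."""
    x = np.asarray(x, dtype=float)
    cuts = np.quantile(x, levels)                   # interior cut points
    bins = np.searchsorted(cuts, x, side="right")   # bin index in 0..len(levels)
    onehot = np.eye(len(levels) + 1)[bins]          # one row per observation
    return bins, onehot


def binned_multiplier(onehot, b):
    """Piecewise-constant function of the covariate: coefficient b[k] on bin k."""
    return onehot @ np.asarray(b, dtype=float)


# Simulated stand-in for a covariate such as a volume imbalance.
rng = np.random.default_rng(0)
covariate = rng.normal(size=10_000)
bins, onehot = quantile_bin_features(covariate, HYPOTHETICAL_LEVELS)
```

In practice the cut points would be computed on the estimation sample and reused unchanged on held-out data, so that bin membership means the same thing in both.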
Original abstract

A point process for event arrivals in high frequency trading is presented. The intensity is the product of a Hawkes process and high dimensional functions of covariates derived from the order book. Conditions for stationarity of the process are stated. An algorithm is presented to estimate the model even in the presence of billions of data points, possibly mapping covariates into a high dimensional space. The large sample size can be common for high frequency data applications using multiple liquid instruments. Convergence of the algorithm is shown, consistency results under weak conditions are established, and a test statistic to assess out of sample performance of different model specifications is suggested. The methodology is applied to the study of four stocks that trade on the New York Stock Exchange (NYSE). The out of sample testing procedure suggests that capturing the nonlinearity of the order book information adds value to the self exciting nature of high frequency trading events.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a point process model for high-frequency trading event arrivals in which the intensity is the product of a Hawkes process and high-dimensional functions of order-book covariates. Stationarity conditions are stated, an estimation algorithm scalable to billions of data points is developed with proofs of convergence and consistency under weak conditions, an out-of-sample test statistic is proposed, and the model is applied to four NYSE stocks; the out-of-sample results indicate that nonlinear order-book effects add value beyond self-excitation alone.

Significance. If the stationarity conditions hold for the fitted model and the out-of-sample comparison is valid, the work supplies a practical, scalable framework for combining self-exciting dynamics with nonlinear order-book information in large HFT datasets. The consistency results and large-scale algorithm constitute a technical contribution that could support more accurate modeling of event clustering in quantitative finance.

major comments (2)
  1. [Theoretical section stating stationarity conditions] The stationarity conditions (stated in the theoretical development) impose integrability or boundedness requirements on the covariate multiplier function f. The manuscript does not report post-estimation verification that the fitted high-dimensional nonlinear f satisfies these bounds on the observed NYSE order-book paths, including extreme states such as near-empty books or large imbalances. Because the central out-of-sample test statistic and consistency claims presuppose a well-defined stationary process, this verification is load-bearing.
  2. [Section describing the out-of-sample test statistic and its application] The out-of-sample test statistic is presented as an independent check on model specifications. However, the manuscript does not detail how the test accounts for the estimation of the high-dimensional nonlinear covariate functions (e.g., via cross-validation or penalty terms), raising the possibility that apparent gains from nonlinearity partly reflect in-sample overfitting rather than genuine predictive improvement.
minor comments (2)
  1. [Abstract] The abstract refers to 'consistency results under weak conditions' without enumerating those conditions or citing the relevant theorem; a brief pointer would improve readability.
  2. [Empirical application section] Notation for the covariate functions and their mapping to high-dimensional space is introduced without an explicit table or diagram summarizing the functional forms used in the NYSE application.
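
The post-estimation verification requested in the first major comment could take a form like the sketch below. It is not the paper's stationarity condition, which is not reproduced on this page. It assumes an exponential kernel with integral `alpha / beta` and checks a commonly used sufficient bound: the largest fitted multiplier times the kernel integral stays below 1, and the multiplier is nonnegative. The function name and the sample values are hypothetical.

```python
import numpy as np


def check_multiplier_bound(multiplier_values, alpha, beta):
    """Report simple boundedness diagnostics for a fitted covariate multiplier.

    multiplier_values: fitted f evaluated along the observed order-book path.
    alpha / beta: integral of an assumed exponential kernel alpha * exp(-beta * s).
    """
    f = np.asarray(multiplier_values, dtype=float)
    worst = f.max() * alpha / beta
    return {
        "min_f": float(f.min()),
        "max_f": float(f.max()),
        "worst_case_branching": float(worst),
        "nonnegative": bool(f.min() >= 0.0),
        "subcritical": bool(worst < 1.0),
    }


# Hypothetical fitted multipliers, including an extreme order-book state.
report = check_multiplier_bound([0.5, 1.0, 1.5], alpha=0.5, beta=1.0)
```

Running the check separately on extreme states, such as near-empty books or large imbalances, would show whether the bound holds exactly where the referee expects it to be strained.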

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and will revise the manuscript to incorporate the suggested clarifications and verifications.

point-by-point responses
  1. Referee: [Theoretical section stating stationarity conditions] The stationarity conditions (stated in the theoretical development) impose integrability or boundedness requirements on the covariate multiplier function f. The manuscript does not report post-estimation verification that the fitted high-dimensional nonlinear f satisfies these bounds on the observed NYSE order-book paths, including extreme states such as near-empty books or large imbalances. Because the central out-of-sample test statistic and consistency claims presuppose a well-defined stationary process, this verification is load-bearing.

    Authors: We agree that explicit post-estimation verification of the stationarity conditions on the fitted f is important, especially for extreme order-book states. In the revised version we will add numerical checks computing the relevant integrability/boundedness quantities on the observed NYSE paths (including near-empty books and large imbalances) using the estimated nonlinear f. This will directly support the validity of the out-of-sample statistic and consistency results. revision: yes

  2. Referee: [Section describing the out-of-sample test statistic and its application] The out-of-sample test statistic is presented as an independent check on model specifications. However, the manuscript does not detail how the test accounts for the estimation of the high-dimensional nonlinear covariate functions (e.g., via cross-validation or penalty terms), raising the possibility that apparent gains from nonlinearity partly reflect in-sample overfitting rather than genuine predictive improvement.

    Authors: We acknowledge that the current description of the out-of-sample test does not explicitly address how estimation of the high-dimensional nonlinear functions is accounted for. In revision we will expand the relevant section to detail the procedure, including the role of any cross-validation, regularization, or penalty terms used during estimation and how these carry into the test statistic, thereby clarifying that reported gains reflect genuine predictive improvement rather than overfitting. revision: yes

Circularity Check

0 steps flagged

No circularity: model, estimation, and out-of-sample test are independently specified.

full rationale

The paper defines a product-form intensity (Hawkes times covariate functions), states stationarity conditions as assumptions, derives an estimation algorithm with convergence and consistency proofs under weak conditions, and proposes a separate out-of-sample test statistic. The central empirical claim (nonlinear order-book terms improve fit) is evaluated via this test on NYSE data rather than being forced by construction from the fitted parameters or any self-citation chain. No step reduces a claimed prediction or uniqueness result to a redefinition or fit of its own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the product intensity form, stationarity conditions, and convergence of the estimation algorithm; no explicit free parameters, axioms, or invented entities are detailed in the abstract.

free parameters (1)
  • Hawkes kernel and covariate function parameters
    Parameters of the intensity functions are estimated from data but not enumerated in the abstract.
axioms (1)
  • domain assumption: Stationarity conditions for the process
    Required for the intensity to define a valid stationary point process.


