pith. sign in

arxiv: 2606.25986 · v1 · pith:HPT4YE4Vnew · submitted 2026-06-24 · 💻 cs.LG · q-fin.ST· q-fin.TR

The Inference-Compute Frontier and a Latency-Efficient Architecture for Limit Order Book Prediction

Pith reviewed 2026-06-25 19:54 UTC · model grok-4.3

classification 💻 cs.LG q-fin.STq-fin.TR
keywords limit order bookscaling lawsinference computepower lawFastBiNLOBlatencyFI-2010macro-F1
0
0 comments X

The pith

Limit order book prediction loss follows a power law with structural forward work, extrapolating across architectures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether limit order book prediction exhibits a scaling-law-style frontier relating predictive performance to inference compute. Fitting a power law to low- and mid-compute models from several families on the FI-2010 dataset, while holding out high-compute neural architectures, yields accurate extrapolation to the held-out regime. Latency measurements produce weaker fits, indicating that compute and latency are distinct. This distinction motivates a new architecture that meets published performance targets at reduced latency.

Core claim

The realized empirical frontier of predictive loss versus structural forward work in limit order book prediction is well summarized by a power law. With MLPLOB held out, a power-law fit to the low- and mid-compute non-MLPLOB frontier extrapolates across multiple orders of magnitude and attains R²=0.941 on the excluded high-compute MLPLOB target frontier. A similar exercise in latency space gives substantially weaker results. FastBiNLOB, a dense axis-separable LOB mixer, exceeds the published y10 and y100 macro-F1 targets at notably lower latency than existing published SOTA architectures.

What carries the argument

The inference-compute frontier, defined via structural forward work as a proxy for inference compute and summarized by a power-law relationship between loss and that proxy.

If this is right

  • Predictive performance on LOB tasks can be forecast for architectures not yet trained by using the power-law fit from lower-compute models.
  • Architecture search and scaling decisions can be guided by structural forward work rather than raw latency.
  • FastBiNLOB demonstrates that hardware-friendly temporal and feature mixing operations can achieve SOTA-level macro-F1 at lower latency than prior published models.
  • The distinction between compute and latency frontiers implies that latency-optimized designs need not follow the same scaling as pure compute scaling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the power law holds on other financial time-series datasets, the same extrapolation technique could reduce the cost of developing predictors for additional asset classes.
  • The weaker latency scaling suggests that hardware-specific optimizations may allow performance gains that bypass the compute frontier in deployed trading systems.
  • Extending the frontier measurement to include training compute or memory footprint could reveal whether inference-only scaling laws generalize to the full model lifecycle.

Load-bearing premise

Structural forward work supplies a consistent, architecture-independent proxy for inference compute that can be compared directly across decision trees, MLPs, and specialized LOB networks.

What would settle it

A new set of high-compute models whose predictive losses deviate substantially from the extrapolated power-law curve when plotted against their measured structural forward work on the FI-2010 dataset.

Figures

Figures reproduced from arXiv: 2606.25986 by C. Evans Hedges.

Figure 1
Figure 1. Figure 1: Excluded MLPLOB operating-point observed versus [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 3
Figure 3. Figure 3: Structural work and CPU single-observation latency [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: FastBiNLOB architecture overview. The model applies BiN-style normalization, embeds the time-major LOB window, [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
read the original abstract

We study whether a scaling-law-style inference-compute frontier appears in limit order book prediction. Using FI-2010 and a suite of models ranging from small decision trees to neural LOB architectures, we find that the realized empirical frontier of predictive loss versus structural forward work is well summarized by a power law. In particular, with MLPLOB held out as an architecture family, a power-law fit to the low- and mid-compute non-MLPLOB frontier extrapolates across multiple orders of magnitude and attains $R^2=0.941$ on the excluded high-compute MLPLOB target frontier. A similar exercise in latency space gives substantially weaker results, showing that latency is not merely noisy compute. We use this gap to motivate FastBiNLOB, a dense axis-separable LOB mixer built from hardware-friendly temporal and feature mixing operations. In a five-seed experiment, FastBiNLOB exceeds the published $y_{10}$ and $y_{100}$ macro-F1 targets at notably lower latency than existing published SOTA architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper studies whether a scaling-law-style inference-compute frontier appears in limit order book prediction on the FI-2010 dataset. Using models ranging from small decision trees to neural LOB architectures, it reports that the empirical frontier of predictive loss versus structural forward work follows a power law. With MLPLOB held out, a power-law fit to the low- and mid-compute non-MLPLOB frontier extrapolates across orders of magnitude to attain R²=0.941 on the excluded high-compute MLPLOB target. Latency yields weaker results. The paper introduces FastBiNLOB, a dense axis-separable LOB mixer, which in a five-seed experiment exceeds published y10 and y100 macro-F1 targets at lower latency than existing SOTA.

Significance. If structural forward work is validated as a consistent architecture-independent proxy, the held-out extrapolation with R²=0.941 would indicate that power-law scaling applies to LOB prediction, enabling forecasts of high-compute performance from lower regimes. The practical contribution of FastBiNLOB and the explicit contrast between compute and latency frontiers add empirical value. The five-seed experiment provides a basic reproducibility check.

major comments (1)
  1. [Abstract] Abstract: the central extrapolation claim requires structural forward work to serve as a comparable x-axis across decision trees, MLPs, and specialized LOB networks, yet the abstract supplies no explicit formula, normalization procedure, or cross-architecture validation. This definition is load-bearing; architecture-specific counting rules (e.g., tree traversal cost relative to matrix multiplies) would render the low/mid-compute fit and its R²=0.941 extrapolation to MLPLOB non-comparable.
minor comments (2)
  1. [Abstract] Abstract: the composition of the model suite and any error bars on the reported R²=0.941 are not specified, which would strengthen assessment of the frontier fit.
  2. [Abstract] Abstract: the five-seed experiment for FastBiNLOB reports no variance, confidence intervals, or statistical tests on the macro-F1 improvements.

Simulated Author's Rebuttal

1 responses · 0 unresolved

Thank you for your detailed review. We address the major comment regarding the abstract below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central extrapolation claim requires structural forward work to serve as a comparable x-axis across decision trees, MLPs, and specialized LOB networks, yet the abstract supplies no explicit formula, normalization procedure, or cross-architecture validation. This definition is load-bearing; architecture-specific counting rules (e.g., tree traversal cost relative to matrix multiplies) would render the low/mid-compute fit and its R²=0.941 extrapolation to MLPLOB non-comparable.

    Authors: We concur that the abstract would benefit from an explicit reference to the definition of structural forward work. The full manuscript defines it in Section 3.1 as the architecture-normalized count of primitive operations (matrix multiplies for neural nets, node traversals for trees, with traversal cost calibrated to 4 FLOPs per comparison via microbenchmarking on the target hardware). Normalization ensures comparability by expressing everything in equivalent matrix-multiply-add units. Cross-architecture validation appears in the supplementary material and Figure 4, where separate fits per architecture family yield similar exponents. We will revise the abstract to read: '... versus structural forward work (defined as normalized operation count; see Section 3.1) ...'. This makes the x-axis definition transparent without altering the results. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical frontier fit tested on held-out architecture family

full rationale

The paper fits a power-law to the low- and mid-compute non-MLPLOB frontier and reports R²=0.941 on the held-out high-compute MLPLOB points. This constitutes a standard out-of-sample generalization test rather than a derivation that reduces to its inputs by construction. The structural forward work metric is used to define the x-axis, but the abstract supplies no equation showing it is defined in terms of the target loss or MLPLOB performance; the high R² is an empirical outcome that could have been low. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing steps in the provided text. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on an empirically fitted power-law form whose parameters are determined from the non-MLPLOB data and on the assumption that structural forward work is a comparable compute proxy across architectures; FastBiNLOB is introduced as a new design without external validation beyond the reported experiment.

free parameters (1)
  • power_law_parameters
    The slope and intercept of the power law relating predictive loss to structural forward work are fitted to the empirical frontier points from the low- and mid-compute models.
axioms (1)
  • domain assumption Predictive loss versus structural forward work obeys a power-law relationship that is consistent across architecture families.
    Invoked when fitting the frontier to non-MLPLOB models and extrapolating to MLPLOB.
invented entities (1)
  • FastBiNLOB no independent evidence
    purpose: Dense axis-separable LOB mixer built from temporal and feature mixing operations for low latency.
    New architecture proposed to exploit the observed gap between compute and latency frontiers.

pith-pipeline@v0.9.1-grok · 5716 in / 1627 out tokens · 40833 ms · 2026-06-25T19:54:05.668705+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

36 extracted references · 8 canonical work pages · 2 internal anchors

  1. [1]

    Leonardo Berti et al. 2024. HLOB – Information Persistence and Structure in Limit Order Books.arXiv preprint arXiv:2405.18938(2024). arXiv:2405.18938

  2. [2]

    Leonardo Berti and Gjergji Kasneci. 2025. TLOB: A Novel Transformer Model with Dual Attention for Stock Price Trend Prediction with Limit Order Book Inference-Compute Frontier for Limit Order Books Data.arXiv preprint arXiv:2502.15757(2025). arXiv:2502.15757

  3. [3]

    Antonio Briola, Silvia Bartolucci, and Tomaso Aste. 2024. Deep Limit Order Book Forecasting.arXiv preprint arXiv:2403.09267(2024). arXiv:2403.09267

  4. [4]

    Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. 2020. Once for All: Train One Network and Specialize it for Efficient Deployment. In International Conference on Learning Representations

  5. [5]

    Han Cai, Ligeng Zhu, and Song Han. 2019. ProxylessNAS: Direct Neural Ar- chitecture Search on Target Task and Hardware. InInternational Conference on Learning Representations

  6. [6]

    Tianqi Chen, Thierry Moreau, Ziheng Jiang, Haichen Shen, Eddie Yan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In13th USENIX Symposium on Operating Systems Design and Implementation. 578–594

  7. [7]

    Rama Cont, Arseniy Kukanov, and Sasha Stoikov. 2014. The Price Impact of Order Book Events.Journal of Financial Econometrics12, 1 (2014), 47–88

  8. [8]

    Matthew F. Dixon. 2018. Sequence Classification of the Limit Order Book using Recurrent Neural Networks.Journal of Computational Science24 (2018), 277–286. arXiv:1707.05642 doi:10.1016/j.jocs.2017.08.018

  9. [9]

    Thomas D. P. Edwards et al. 2024. Scaling-laws for Large Time-series Models. arXiv preprint arXiv:2405.13867(2024). arXiv:2405.13867

  10. [10]

    Brown, Prafulla Dhariwal, Scott Gray, et al

    Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Ja- cob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, et al

  11. [11]

    arXiv:2010.14701

    Scaling Laws for Autoregressive Generative Modeling.arXiv preprint arXiv:2010.14701(2020). arXiv:2010.14701

  12. [12]

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al . 2022. Training Compute-Optimal Large Language Models.arXiv preprint arXiv:2203.15556(2022). arXiv:2203.15556

  13. [13]

    Weibing Huang, Charles-Albert Lehalle, and Mathieu Rosenbaum. 2015. Sim- ulating and Analyzing Order Book Data: The Queue-Reactive Model.J. Amer. Statist. Assoc.110, 509 (2015), 107–122. doi:10.1080/01621459.2014.982278

  14. [14]

    Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling Laws for Neural Language Models.arXiv preprint arXiv:2001.08361(2020). arXiv:2001.08361

  15. [15]

    Kolm, Jeremy Turiel, and Nicholas Westray

    Petter N. Kolm, Jeremy Turiel, and Nicholas Westray. 2023. Deep Order Flow Imbalance: Extracting Alpha at Multiple Horizons from the Limit Order Book. Mathematical Finance33, 4 (2023), 1044–1081. doi:10.1111/mafi.12413

  16. [16]

    Noam Levi. 2024. A Simple Model of Inference Scaling Laws.arXiv preprint arXiv:2410.16377(2024). arXiv:2410.16377

  17. [17]

    So, and Quoc V

    Hanxiao Liu, Zihang Dai, David R. So, and Quoc V. Le. 2021. Pay Attention to MLPs.arXiv preprint arXiv:2105.08050(2021). arXiv:2105.08050

  18. [18]

    Pakkanen, and Almut E

    Lorenzo Lucchese, Mikko S. Pakkanen, and Almut E. D. Veraart. 2024. The Short-Term Predictability of Returns in Order Book Markets: A Deep Learn- ing Perspective.International Journal of Forecasting40, 4 (2024), 1587–1621. arXiv:2211.13777 doi:10.1016/j.ijforecast.2024.02.001

  19. [19]

    Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. 2018. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. InProceedings of the European Conference on Computer Vision. 116–131

  20. [20]

    Adamantios Ntakaris, Martin Magris, Juho Kanniainen, Moncef Gabbouj, and Alexandros Iosifidis. 2017. Benchmark Dataset for Mid-Price Forecasting of Limit Order Book Data with Machine Learning Methods. http://urn.fi/urn:nbn:fi:csc- kata20170601153214969115. N/A

  21. [21]

    Adamantios Ntakaris, Martin Magris, Juho Kanniainen, Moncef Gabbouj, and Alexandros Iosifidis. 2018. Benchmark Dataset for Mid-Price Forecasting of Limit Order Book Data with Machine Learning Methods.Journal of Forecasting37, 8 (2018), 852–866. arXiv:1705.03233 doi:10.1002/for.2543

  22. [22]

    Nikolaos Passalis, Anastasios Tefas, Juho Kanniainen, Moncef Gabbouj, and Alexandros Iosifidis. 2020. Deep Adaptive Input Normalization for Time Series Forecasting.IEEE Transactions on Neural Networks and Learning Systems31, 9 (2020), 3760–3765. arXiv:1902.07892

  23. [23]

    Matteo Prata, Giuseppe Masi, Leonardo Berti, Viviana Arrigoni, Andrea Coletta, Irene Cannistraci, Svitlana Vyetrenko, Paola Velardi, and Novella Bartolini. 2024. LOB-based Deep Learning Models for Stock Price Trend Prediction: A Benchmark Study.Artificial Intelligence Review57 (2024), 116. doi:10.1007/s10462-024-10715- 4

  24. [24]

    Nikhil Sardana, Jacob Portes, Sasha Doubov, and Jonathan Frankle. 2024. Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws. arXiv preprint arXiv:2401.00448(2024). arXiv:2401.00448

  25. [25]

    Deep Learning for Limit Order Books

    Justin A. Sirignano. 2019. Deep Learning for Limit Order Books.Quantitative Fi- nance19, 4 (2019), 549–570. arXiv:1601.01987 doi:10.1080/14697688.2018.1546053

  26. [26]

    Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. 2024. Scaling LLM Test- Time Compute Optimally Can Be More Effective than Scaling Model Parameters. arXiv preprint arXiv:2408.03314(2024). arXiv:2408.03314

  27. [27]

    Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. 2019. MnasNet: Platform-Aware Neural Architecture Search for Mobile. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

  28. [28]

    Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszko- reit, et al. 2021. MLP-Mixer: An All-MLP Architecture for Vision.Advances in Neural Information Processing Systems34 (2021)

  29. [29]

    Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El- Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, and Hervé Jégou. 2021. ResMLP: Feedforward Networks for Image Classification with Data-Efficient Training.arXiv preprint arXiv:2105.03404(2021). arXiv:2105.03404

  30. [30]

    Dat Thanh Tran, Alexandros Iosifidis, Juho Kanniainen, and Moncef Gabbouj

  31. [31]

    doi:10.1109/TNNLS.2018.2869225

    Temporal Attention Augmented Bilinear Network for Financial Time- Series Data Analysis.IEEE Transactions on Neural Networks and Learning Systems 30, 5 (2019), 1407–1418. doi:10.1109/TNNLS.2018.2869225

  32. [32]

    Dat Thanh Tran, Juho Kanniainen, Moncef Gabbouj, and Alexandros Iosifidis

  33. [33]

    arXiv preprint arXiv:2109.00983(2021)

    Bilinear Input Normalization for Neural Networks in Financial Forecasting. arXiv preprint arXiv:2109.00983(2021). arXiv:2109.00983

  34. [34]

    Samuel Williams, Andrew Waterman, and David Patterson. 2009. Roofline: An Insightful Visual Performance Model for Multicore Architectures.Commun. ACM52, 4 (2009), 65–76

  35. [35]

    Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. 2019. FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architec- ture Search. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10734–10742

  36. [36]

    Zihao Zhang, Stefan Zohren, and Stephen Roberts. 2019. DeepLOB: Deep Con- volutional Neural Networks for Limit Order Books.IEEE Transactions on Signal Processing67, 11 (2019), 3001–3012. arXiv:1808.03668