Zero-shot adaptation to order book dynamics

Arip Asadulaev

arxiv: 2605.21707 · v1 · pith:5ZZSJY7Mnew · submitted 2026-05-20 · 💻 cs.CE · cs.LG


Pith reviewed 2026-05-22 07:50 UTC · model grok-4.3

classification 💻 cs.CE cs.LG

keywords: market making, order book, Avellaneda-Stoikov, Hamilton-Jacobi-Bellman, zero-shot adaptation, adaptive trading, regime shifts


The pith

Market makers can adapt to new regimes by feeding recent rewards into an unchanged Avellaneda-Stoikov HJB map.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to keep the fast analytical structure of the classic Avellaneda-Stoikov market-making model while allowing the strategy to adjust to changing market conditions and trading goals. Market dynamics are captured in a small set of parameters, while recent realized rewards are compressed into a separate low-dimensional objective vector. The existing HJB forward map then turns that objective vector into optimal bid and ask quotes by scalarizing future reward features. This separation is meant to enable zero-shot adaptation whenever regimes or objectives shift. A sympathetic reader would care because it offers a route to flexible, interpretable trading without sacrificing the speed and transparency of analytical solutions.

Core claim

By separating market-state parameters from a low-dimensional objective vector derived from recent realized rewards, the original Avellaneda-Stoikov HJB structure can be reused without modification. The HJB forward map converts the objective vector into optimal bid and ask quotes through scalarization of future reward features, preserving analytical tractability across arbitrary regime shifts and trading objectives.
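To make the separation concrete, here is a minimal sketch of a forward map built on the standard closed-form Avellaneda-Stoikov approximation from [2]. The `MarketState` class and `as_quotes` function are hypothetical names, and a single risk-aversion parameter `gamma` stands in for the trading objective, because how the paper's objective vector enters the map is not reproduced on this page.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketState:
    """Hypothetical container for the market-state inputs of the AS map."""
    sigma: float  # mid-price volatility (price units per sqrt(time unit))
    k: float      # decay of the fill intensity lambda(delta) = A * exp(-k * delta)


def as_quotes(mid: float, inventory: float, time_left: float,
              state: MarketState, gamma: float) -> tuple[float, float]:
    """Approximate closed-form Avellaneda-Stoikov quotes, returned as (bid, ask).

    gamma (risk aversion) is a one-number stand-in for the trading objective;
    the paper's objective vector is not reproduced here.
    """
    # Reservation price: the mid shifted against the current inventory.
    reservation = mid - inventory * gamma * state.sigma ** 2 * time_left
    # Total spread delta_a + delta_b from the standard AS approximation.
    spread = gamma * state.sigma ** 2 * time_left + (2.0 / gamma) * math.log1p(gamma / state.k)
    return reservation - spread / 2.0, reservation + spread / 2.0


# A regime change swaps only the market state; the formula itself is untouched.
calm, stressed = MarketState(sigma=2.0, k=1.5), MarketState(sigma=6.0, k=0.5)
print(as_quotes(100.0, 3.0, 0.5, calm, gamma=0.1))
print(as_quotes(100.0, 3.0, 0.5, stressed, gamma=0.1))
```

Swapping `calm` for `stressed` changes only the inputs, which is the sense in which the map is reused without modification; whether the resulting quotes remain optimal after such a swap is the question the referee report raises.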

What carries the argument

The HJB forward map that scalarizes future reward features from the low-dimensional objective vector into bid and ask quotes while the market-state parameters remain fixed.

If this is right

Optimal quotes for any new trading objective can be obtained instantly once the objective vector is estimated from recent rewards.
Market regime changes require only an update to the low-dimensional market-state parameters while the HJB map stays fixed.
The computational cost of producing quotes remains the same as that of the original, non-adaptive Avellaneda-Stoikov solver.
Trading decisions stay interpretable because the objective is explicitly expressed as a scalarization of observable reward features.
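The second consequence above says a regime change only requires refreshing the market-state parameters. A minimal sketch of one such refresh follows, assuming the exponential intensity λ(δ) = A exp(−kδ) named in the authors' rebuttal: fit log fill rates observed at several quote distances by ordinary least squares. The `fit_intensity` helper is hypothetical, and the paper's own estimation procedure is not reproduced here.

```python
import numpy as np


def fit_intensity(distances, fills, exposure_time):
    """Fit lambda(delta) = A * exp(-k * delta) to observed fill rates.

    distances: quote distances from the mid-price (price units)
    fills: number of fills observed while quoting at each distance
    exposure_time: time spent quoting at each distance (scalar or array)
    Returns the estimates (A, k).
    """
    d = np.asarray(distances, dtype=float)
    rates = np.asarray(fills, dtype=float) / np.asarray(exposure_time, dtype=float)
    keep = rates > 0  # log is undefined where nothing filled
    if keep.sum() < 2:
        raise ValueError("need positive fill rates at two or more distances")
    # log(lambda) = log(A) - k * delta is linear in delta; polyfit returns [slope, intercept].
    slope, intercept = np.polyfit(d[keep], np.log(rates[keep]), deg=1)
    return float(np.exp(intercept)), float(-slope)
```

In a rolling setup one would call this on the most recent window of fills and pass the new `k` (and `A`) into the quote formula; the window length is a tuning choice not specified here.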

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same separation pattern could be applied to other stochastic control problems in finance to enable regime-adaptive control without re-deriving the value function.
If the low-dimensional objective vector generalizes across assets, the method might support zero-shot transfer of market-making policies between related instruments.
Backtesting the approach on real order-book data with documented volatility jumps would directly measure how much performance is retained relative to full re-optimization.

Load-bearing premise

Recent realized rewards can be compressed into a low-dimensional objective vector independent of the market-state parameters, and the HJB forward map remains valid under this separation for arbitrary regime shifts.

What would settle it

In a simulation with known abrupt regime shifts, if the quotes generated by the separated objective vector produce materially lower cumulative reward than those obtained by fully re-solving the HJB equation on the new regime, the separation would fail to preserve optimality.
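A minimal harness for this test, under heavy simplifying assumptions: an arithmetic Brownian mid-price, at most one bid fill and one ask fill per time step with probability λ(δ)dt for λ(δ) = A exp(−kδ), and closed-form AS quotes with fixed risk aversion. "Full re-solve" is proxied here by quoting with the true post-shift parameters; the paper's separated-objective policy is not reproduced and would be passed in as another `policy`. All parameter values and function names are illustrative assumptions.

```python
import math
import numpy as np


def as_quotes(mid, q, time_left, sigma, k, gamma):
    """Closed-form approximate AS quotes (bid, ask) for the given parameters."""
    reservation = mid - q * gamma * sigma ** 2 * time_left
    half = 0.5 * (gamma * sigma ** 2 * time_left + (2.0 / gamma) * math.log1p(gamma / k))
    return reservation - half, reservation + half


def simulate(policy, regimes, shift_step, n_steps=200, dt=0.005, A=140.0, s0=100.0, seed=0):
    """One path with an abrupt regime shift; returns terminal mark-to-market wealth.

    regimes = ((sigma_before, k_before), (sigma_after, k_after)) are the TRUE dynamics.
    policy(mid, inventory, time_left, step) -> (bid, ask).
    """
    rng = np.random.default_rng(seed)
    mid, q, cash = s0, 0, 0.0
    horizon = n_steps * dt
    for step in range(n_steps):
        sigma, k = regimes[0] if step < shift_step else regimes[1]
        bid, ask = policy(mid, q, horizon - step * dt, step)
        # Fill probability over one step from lambda(delta) = A * exp(-k * delta);
        # a quote through the mid is treated as delta = 0.
        p_bid = min(1.0, A * math.exp(-k * max(mid - bid, 0.0)) * dt)
        p_ask = min(1.0, A * math.exp(-k * max(ask - mid, 0.0)) * dt)
        # Both uniforms are drawn every step, so every policy sees the same random stream.
        u_bid, u_ask = rng.random(), rng.random()
        if u_bid < p_bid:
            q, cash = q + 1, cash - bid
        if u_ask < p_ask:
            q, cash = q - 1, cash + ask
        mid += sigma * math.sqrt(dt) * rng.standard_normal()
    return cash + q * mid


def mean_wealth(policy, regimes, shift_step, n_paths=100):
    return float(np.mean([simulate(policy, regimes, shift_step, seed=i) for i in range(n_paths)]))


REGIMES = ((2.0, 1.5), (6.0, 0.5))  # illustrative (sigma, k) pairs, not from the paper
GAMMA, SHIFT = 0.1, 100


def stale(mid, q, time_left, step):
    # Keeps the pre-shift market parameters for the whole path.
    return as_quotes(mid, q, time_left, *REGIMES[0], GAMMA)


def resolved(mid, q, time_left, step):
    # Proxy for a full re-solve: uses the true parameters of the current regime.
    sigma, k = REGIMES[0] if step < SHIFT else REGIMES[1]
    return as_quotes(mid, q, time_left, sigma, k, GAMMA)


print(mean_wealth(stale, REGIMES, SHIFT), mean_wealth(resolved, REGIMES, SHIFT))
```

Reusing the same seeds across policies gives common random numbers (identical mid-price paths), which sharpens the comparison; a real test would also need many more paths and a confidence interval on the reward gap.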

Figures

Figures reproduced from arXiv: 2605.21707 by Arip Asadulaev.

**Figure 2.** AS single diagnostic. Excerpt from Section 7 (Experiments): We evaluate three market-making policies on the same reconstructed BTCUSDT limit-order-book dataset: a Guéant–Lehalle–Fernandez-Tapia (GLFT) fixed-latency strategy [3], a fixed-latency Avellaneda–Stoikov strategy, and our rolling forward-backward Avellaneda–Stoikov HJB method. Please see full algorithm implementation details in Appendix C. All experiments use the same m…

**Figure 3.** FB-AS diagnostics. Excerpt from Section 7: … drawdown, daily number of trades, daily turnover, return over maximum drawdown, return per trade, and maximum position value. Maximum drawdown is reported as a positive magnitude. For each method, we save both summary equity and position plots and diagnostic plots for volatility, order-arrival parameters, quote distances, and adaptive variables. Section 7.2 (Baselines): The first baseline is the GL…
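The Figure 3 excerpt lists several of the reported metrics. A sketch of three of them follows, assuming returns are measured as absolute changes in an equity series; the paper's exact definitions (for example, per-day aggregation) are not shown here, and the function names are hypothetical.

```python
import numpy as np


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a positive magnitude."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    return float(np.max(running_peak - equity))


def summary_metrics(equity, n_trades):
    """Total return, max drawdown, return over max drawdown, and return per trade."""
    equity = np.asarray(equity, dtype=float)
    total_return = float(equity[-1] - equity[0])
    mdd = max_drawdown(equity)
    return {
        "total_return": total_return,
        "max_drawdown": mdd,
        # Undefined when the curve never draws down.
        "return_over_max_drawdown": total_return / mdd if mdd > 0 else float("nan"),
        "return_per_trade": total_return / n_trades if n_trades > 0 else float("nan"),
    }


print(summary_metrics([100, 110, 105, 120, 90, 95], n_trades=5))
```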

Original abstract

We describe an adaptive market-making architecture that preserves the analytical structure of the Avellaneda–Stoikov framework while introducing a successor measure-style adaptation mechanism. In this paper we keep the fast Hamilton–Jacobi–Bellman (HJB) structure of Avellaneda–Stoikov and make it adaptive to changing market regimes and trading objectives. The central idea is to separate market dynamics from the trading objective. The market state determines a low-dimensional set of Avellaneda–Stoikov parameters, while recent realized rewards determine a low-dimensional objective vector. The HJB forward map then converts this objective into optimal bid and ask quotes through a scalarization of future reward features.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a zero-shot adaptive market-making architecture that retains the analytical structure of the Avellaneda-Stoikov framework. Market state determines a low-dimensional set of AS parameters while recent realized rewards determine a low-dimensional objective vector; an HJB forward map then scalarizes future reward features to produce optimal bid and ask quotes, enabling adaptation to regime shifts and changing objectives without retraining.

Significance. If the separation of dynamics from objectives can be shown to preserve optimality and the forward map remains valid under arbitrary regime shifts, the method would provide a computationally lightweight way to adapt market-making policies while retaining the closed-form advantages of the original AS HJB solution. This addresses a practical gap in high-frequency trading where both market microstructure and reward preferences evolve.

major comments (2)

[Abstract] The central claim that the HJB forward map converts the separated objective vector into optimal quotes rests on an unstated derivation; no explicit PDE, intensity function λ(δ), or scalarization step is supplied, making it impossible to verify whether the construction recovers the true optimum of the joint dynamics-reward problem when arrival rates or volatility change.
[Abstract] The assumption that recent realized rewards can be compressed into a low-dimensional objective vector that remains independent of the current market-state parameters is load-bearing for zero-shot adaptation, yet the original AS value function couples the control offsets to both the generator and the running reward; regime shifts that alter the state process therefore risk making the separated map suboptimal.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address the two major points raised regarding the abstract in turn, providing clarifications drawn from the full manuscript and indicating planned revisions.

Point-by-point responses

Referee: [Abstract] The central claim that the HJB forward map converts the separated objective vector into optimal quotes rests on an unstated derivation; no explicit PDE, intensity function λ(δ), or scalarization step is supplied, making it impossible to verify whether the construction recovers the true optimum of the joint dynamics-reward problem when arrival rates or volatility change.

Authors: The abstract is intentionally concise and therefore omits the explicit equations. The underlying PDE is the standard Avellaneda-Stoikov Hamilton-Jacobi-Bellman equation with intensity λ(δ) = A exp(−k δ) on each side. The scalarization step incorporates the low-dimensional objective vector as linear weights on the inventory-penalty and terminal-wealth features inside the running reward; the successor measure then yields the adjusted controls while the market-state parameters remain fixed. This construction is derived in Section 3. We will revise the abstract to include a one-sentence reference to the PDE, the intensity form, and the scalarization weights so that the forward-map claim can be verified without reading the body. revision: partial
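For reference, the standard system from [2] with the stated intensity is written out below in its original exponential-utility form, with value function $u(s,x,q,t)$ for mid-price $s$, cash $x$, inventory $q$, horizon $T$, and risk aversion $\gamma$. The paper's reweighting of the running reward by the objective vector is not shown. The approximate solution on the last line makes the referee's coupling point concrete: $\sigma$ (from the generator) and $\gamma$ (from the objective) enter the quotes jointly, with bid and ask placed at $r \mp (\delta^a + \delta^b)/2$.

$$
\partial_t u + \tfrac{1}{2}\sigma^2 \partial_{ss} u
+ \max_{\delta^b} \lambda(\delta^b)\big[u(s, x - s + \delta^b, q + 1, t) - u(s, x, q, t)\big]
+ \max_{\delta^a} \lambda(\delta^a)\big[u(s, x + s + \delta^a, q - 1, t) - u(s, x, q, t)\big] = 0,
$$

$$
u(s, x, q, T) = -e^{-\gamma (x + q s)}, \qquad \lambda(\delta) = A e^{-k\delta},
$$

$$
r(s, q, t) = s - q\gamma\sigma^2 (T - t), \qquad
\delta^a + \delta^b \approx \gamma\sigma^2 (T - t) + \frac{2}{\gamma}\ln\!\Big(1 + \frac{\gamma}{k}\Big).
$$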
Referee: [Abstract] The assumption that recent realized rewards can be compressed into a low-dimensional objective vector that remains independent of the current market-state parameters is load-bearing for zero-shot adaptation, yet the original AS value function couples the control offsets to both the generator and the running reward; regime shifts that alter the state process therefore risk making the separated map suboptimal.

Authors: The original AS value function does couple controls to both the generator and the reward. Our separation exploits the successor measure to treat the objective vector as an independent rescaling of the reward features; the market-state parameters (volatility, arrival-rate coefficients) are updated separately from recent reward statistics. Under this decomposition the forward map remains optimal for the current dynamics and the new objective. We will add an explicit paragraph in the methods section stating the precise conditions under which the independence holds and noting the limitation when a regime shift also alters the reward structure itself. revision: yes
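The appendix excerpt that accompanies reference [6] on this page defines the pure FB estimate as $\hat{z}_t = \mathbb{E}_{\rho_t}[r_H(s,a)\, B_\psi(s,a)]$. A minimal sketch of its empirical version over recent samples follows, assuming the backward embeddings $B_\psi(s,a)$ are already available as an array. The helper names, the uniform weighting, and the `window` argument are assumptions, and the stabilizing mix mentioned in the excerpt is elided in the source and not filled in. `greedy_action` shows the generic FB action rule from [6], $\arg\max_a F(s,a,z)^\top z$, with `forward_features` a hypothetical array of $F$ values; the paper instead uses the AS HJB forward map for this step.

```python
import numpy as np


def estimate_objective_vector(rewards, backward_embeddings, window=None):
    """Empirical FB-style estimate z_hat = mean over samples of r_H(s, a) * B(s, a).

    rewards: shape (n,), realized rewards for recent (s, a) samples.
    backward_embeddings: shape (n, d), B_psi(s, a) for the same samples.
    window: if given, only the most recent `window` samples are used (hypothetical).
    """
    r = np.asarray(rewards, dtype=float)
    B = np.asarray(backward_embeddings, dtype=float)
    if B.ndim != 2 or B.shape[0] != r.shape[0]:
        raise ValueError("expected one d-dimensional embedding per reward")
    if window is not None:
        r, B = r[-window:], B[-window:]
    # Uniform weights: the empirical distribution of the samples plays the role of rho_t.
    return (r[:, None] * B).mean(axis=0)


def greedy_action(forward_features, z):
    """Generic FB action choice: argmax over candidate actions of F(s, a, z) . z."""
    scores = np.asarray(forward_features, dtype=float) @ np.asarray(z, dtype=float)
    return int(np.argmax(scores))
```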

Circularity Check

0 steps flagged

No significant circularity; separation and HJB map rely on external AS framework

full rationale

The paper presents a modeling choice to separate market-state parameters (from dynamics) from an objective vector (from recent rewards), followed by application of the HJB forward map drawn from the established Avellaneda-Stoikov framework. No equations or derivations are shown that reduce a claimed prediction to a fitted parameter by construction, nor is there a self-definitional loop or load-bearing self-citation chain. The central construction introduces an adaptation mechanism while explicitly preserving the external analytical structure, rendering the derivation self-contained against independent benchmarks rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the validity of the original Avellaneda-Stoikov HJB solution and on the unstated assumption that market state and trading objective can be treated as independent low-dimensional inputs; no free parameters or new entities are explicitly listed in the abstract.

axioms (1)

Domain assumption: The Avellaneda-Stoikov framework supplies a valid and fast HJB solution for market making under standard assumptions on price dynamics and inventory risk.
The paper states that it preserves the analytical structure of this framework.

invented entities (1)

successor measure-style adaptation mechanism (no independent evidence)
purpose: To update the objective vector from recent realized rewards while keeping market-state parameters separate.
Introduced as the central idea enabling zero-shot adaptation.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

The central idea is to separate market dynamics from the trading objective. The market state determines a low-dimensional set of Avellaneda–Stoikov parameters, while recent realized rewards determine a low-dimensional objective vector. The HJB forward map then converts this objective into optimal bid and ask quotes through a scalarization of future reward features.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages · 1 internal anchor

[1] Arip Asadulaev, Maksim Bobrin, Salem Lahlou, Dmitry Dylov, Fakhri Karray, and Martin Takac. Zero-shot off-policy learning. arXiv preprint arXiv:2602.01962, 2026.

[2] Marco Avellaneda and Sasha Stoikov. High-frequency trading in a limit order book. Quantitative Finance, 8(3):217–224, 2008.

[3] Olivier Guéant. Optimal market making. Applied Mathematical Finance, 24(2):112–154, 2017.

[4] nkaz001. Hftbacktest, 2024. https://github.com/nkaz001/hftbacktest

[5] Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother, Mateusz Guzek, Anssi Kanervisto, Yingchen Xu, Alessandro Lazaric, and Matteo Pirotta. Zero-shot whole-body humanoid control via behavioral foundation models. arXiv preprint arXiv:2504.11054, 2025.

[6] Ahmed Touati and Yann Ollivier. Learning one representation to optimize all rewards. Advances in Neural Information Processing Systems, 34:13–23, 2021.

Excerpt extracted alongside reference [6] (Appendix A, Practical Specifications; A.1, Stabilization and Corrections): The pure FB estimate $\hat{z}_t = \mathbb{E}_{\rho_t}\big[r_H(s,a)\, B_\psi(s,a)\big]$ can be noisy, especially in short windows. A stable production version should therefore mix it ...
