
arXiv: 2606.19772 · v1 · pith:K4HBFVW3new · submitted 2026-06-18 · 🧮 math.OC

Signature Methods for Optimal Market Making

Pith reviewed 2026-06-26 16:40 UTC · model grok-4.3

classification 🧮 math.OC
keywords: market making, signature methods, optimal control, reinforcement learning, mean-variance criterion, Hawkes process, bid-ask quotes, stochastic optimization

The pith

Signature linearization reduces optimal market making to pseudo-linear optimization over expected path signatures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that signature methods can convert the market-making problem of choosing bid and ask quotes into a simpler optimization task. It does so by expressing the mean-variance objective as a linear functional of the expected signature of an augmented market path that includes order arrivals and price moves. From this reduction the authors derive the Sig-REINFORCE algorithm, which learns the quoting policy directly from samples of the signature. The method is demonstrated on two arrival models: Poisson and self-exciting Hawkes processes, with performance measured against a Proximal Policy Optimization baseline. A reader cares because the approach replaces full dynamic programming or black-box reinforcement learning with a structured, lower-dimensional optimization that still respects the stochastic nature of the order book.
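To make the reduction more concrete, here is a hedged sketch of how a mean-variance objective can depend on the path law only through the expected signature. The risk-aversion parameter $\gamma$, the terminal wealth $X_T$, the coefficient vector $\ell$, the augmented path $\hat{Z}$, and the exact form of the criterion are assumptions made for illustration; the paper's own notation and statement may differ. The shuffle identity used in the third line holds for signatures of piecewise-smooth paths, so paths with jumps need a convention such as interpolating across each jump.

```latex
% Assumed mean-variance criterion for terminal wealth X_T, risk aversion \gamma > 0
J = \mathbb{E}[X_T] - \frac{\gamma}{2}\,\mathrm{Var}(X_T)
  = \mathbb{E}[X_T] - \frac{\gamma}{2}\Big(\mathbb{E}[X_T^2] - \mathbb{E}[X_T]^2\Big)

% Assumption: X_T is (approximately) a linear functional of the signature of \hat{Z}
X_T \approx \big\langle \ell,\ \mathbb{S}(\hat{Z})_{0,T} \big\rangle

% Shuffle product: the square of a linear functional is again a linear functional
X_T^2 \approx \big\langle \ell \mathbin{\sqcup\!\sqcup} \ell,\ \mathbb{S}(\hat{Z})_{0,T} \big\rangle

% Taking expectations, J depends on the law of \hat{Z} only through E[S]
J \approx \big\langle \ell,\ \mathbb{E}[\mathbb{S}] \big\rangle
  - \frac{\gamma}{2}\Big( \big\langle \ell \mathbin{\sqcup\!\sqcup} \ell,\ \mathbb{E}[\mathbb{S}] \big\rangle
  - \big\langle \ell,\ \mathbb{E}[\mathbb{S}] \big\rangle^{2} \Big)
```

Under these assumptions the only departure from linearity in $\mathbb{E}[\mathbb{S}]$ is the squared mean term, which is one possible reading of "pseudo-linear".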

Core claim

By exploiting signature linearization techniques, we reduce the market-making problem to a pseudo-linear optimization over the expected signature of an augmented market path, and we develop a signature algorithm named Sig-REINFORCE to learn the optimal bid and ask quotes. We test our method in two scenarios, in which market-order arrivals follow either a Poisson or a self-exciting Hawkes process, and we benchmark it against a Proximal Policy Optimization (PPO) baseline.
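The two arrival scenarios can be illustrated with a small simulator. The sketch below uses Ogata's thinning method for a Hawkes process with an exponential kernel, which is an assumed kernel choice; the names `simulate_hawkes`, `mu`, `alpha`, and `beta` are hypothetical and not taken from the paper. Setting `alpha = 0` recovers a homogeneous Poisson process with rate `mu`.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, rng):
    """Ogata thinning for the intensity
    lambda(t) = mu + sum over past events t_i of alpha * exp(-beta * (t - t_i)).
    Returns the sorted list of event times in [0, horizon)."""
    events = []
    t = 0.0
    excitation = 0.0  # self-exciting part of the intensity at the current time t
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next accepted event.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait >= horizon:
            return events
        t += wait
        excitation *= math.exp(-beta * wait)  # decay over the waiting time
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += alpha  # each arrival raises the future intensity


rng = random.Random(0)
poisson_times = simulate_hawkes(mu=1.0, alpha=0.0, beta=1.0, horizon=100.0, rng=rng)
hawkes_times = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=100.0, rng=rng)
```

With `alpha / beta < 1` the process is stationary and its long-run arrival rate is `mu / (1 - alpha / beta)`, so the Hawkes run above should produce roughly twice as many arrivals as the Poisson run.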

What carries the argument

Signature linearization of the mean-variance market-making objective, which converts the control problem into optimization over the expected signature of an augmented price-and-arrival path.
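As a rough illustration of what "the expected signature of an augmented price-and-arrival path" involves, the sketch below builds a three-channel path (time, mid-price, cumulative arrivals), computes its signature up to level 2 with Chen's identity, and averages over simulated paths. The channel choice, the Brownian price, the Bernoulli approximation of Poisson arrivals, the linear interpolation of jumps on the grid, the truncation level, and all function names are assumptions for illustration; the paper's augmentation may differ.

```python
import random


def signature_level2(path):
    """Signature up to level 2 of a piecewise-linear path.
    path: list of points, each a list of d floats.
    Returns (s1, s2) where s1[i] is the total increment of channel i and
    s2[i][j] is the iterated integral of dx_i then dx_j."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        inc = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one linear segment.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            s1[i] += inc[i]
    return s1, s2


def sample_augmented_path(n_steps, dt, sigma, arrival_rate, rng):
    """Hypothetical augmented path: (time, price, cumulative arrivals)."""
    t, price, count = 0.0, 0.0, 0.0
    path = [[t, price, count]]
    for _ in range(n_steps):
        t += dt
        price += sigma * rng.gauss(0.0, dt ** 0.5)
        if rng.random() < arrival_rate * dt:  # Bernoulli proxy for Poisson
            count += 1.0
        path.append([t, price, count])
    return path


def expected_signature(n_paths, n_steps, dt, sigma, arrival_rate, seed=0):
    """Monte Carlo estimate of the expected level-1 and level-2 terms."""
    rng = random.Random(seed)
    d = 3
    mean1 = [0.0] * d
    mean2 = [[0.0] * d for _ in range(d)]
    for _ in range(n_paths):
        path = sample_augmented_path(n_steps, dt, sigma, arrival_rate, rng)
        s1, s2 = signature_level2(path)
        for i in range(d):
            mean1[i] += s1[i] / n_paths
            for j in range(d):
                mean2[i][j] += s2[i][j] / n_paths
    return mean1, mean2
```

A useful sanity check is the shuffle identity `s2[i][j] + s2[j][i] == s1[i] * s1[j]` (up to rounding), which holds path by path and is the property that lets squares of linear functionals stay linear in the signature.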

If this is right

  • Sig-REINFORCE produces explicit quoting rules for both Poisson and Hawkes order arrivals.
  • The learned quotes can be evaluated directly on simulated paths without retraining the full policy network.
  • The pseudo-linear form allows the same signature representation to be reused across different mean-variance risk parameters.
  • Benchmark comparisons with PPO become possible on identical simulated trajectories.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same linearization step could be applied to other stochastic-control problems whose payoff is a low-degree polynomial in path integrals.
  • Because the signature is computed once per path, the method may scale to higher-dimensional order-book features without a proportional increase in policy parameters.
  • If the arrival process is misspecified, the learned quotes will be optimal only under the assumed dynamics, not necessarily under real-market statistics.

Load-bearing premise

The signature linearization accurately captures the mean-variance objective for the Poisson and Hawkes arrival processes used in the tests.

What would settle it

Generate synthetic order-book paths under a Poisson arrival process, compute the mean-variance value achieved by Sig-REINFORCE quotes, and check whether it matches the value obtained by solving the same problem with PPO or by direct dynamic programming on a discretized state space.
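The first half of that check, evaluating a mean-variance value on synthetic Poisson paths, might look like the sketch below. Everything model-specific here is an assumption rather than the paper's setup: the Brownian mid-price, the fill intensity `base_rate * exp(-decay * delta)` (an Avellaneda-Stoikov-style choice), the per-step Bernoulli approximation of Poisson fills, constant quotes, and all parameter values and function names.

```python
import math
import random
import statistics


def terminal_wealth(delta_bid, delta_ask, rng, n_steps=200, dt=0.005,
                    sigma=1.0, base_rate=50.0, decay=1.5):
    """Simulate one path with constant quote distances and return the
    marked-to-market terminal wealth (cash + inventory * mid-price)."""
    mid, cash, inventory = 100.0, 0.0, 0
    # Per-step fill probabilities: wider quotes are hit less often.
    p_bid = base_rate * math.exp(-decay * delta_bid) * dt
    p_ask = base_rate * math.exp(-decay * delta_ask) * dt
    for _ in range(n_steps):
        if rng.random() < p_bid:   # a sell order hits our bid: we buy
            cash -= mid - delta_bid
            inventory += 1
        if rng.random() < p_ask:   # a buy order lifts our ask: we sell
            cash += mid + delta_ask
            inventory -= 1
        mid += sigma * rng.gauss(0.0, math.sqrt(dt))
    return cash + inventory * mid


def mean_variance_value(delta_bid, delta_ask, gamma, n_paths=1000, seed=0):
    """Monte Carlo estimate of E[X_T] - (gamma / 2) * Var(X_T)."""
    rng = random.Random(seed)
    wealth = [terminal_wealth(delta_bid, delta_ask, rng) for _ in range(n_paths)]
    return statistics.fmean(wealth) - 0.5 * gamma * statistics.pvariance(wealth)


# Crude grid search over symmetric constant quotes, reusing the same seed
# so that every candidate is scored on the same random draws.
grid = [0.1 * k for k in range(1, 16)]
best_delta = max(grid, key=lambda d: mean_variance_value(d, d, gamma=0.1, n_paths=300))
```

Any learned quoting rule, whether from Sig-REINFORCE or PPO, could be scored by the same kind of estimator on identical simulated trajectories; the constant-quote grid search is only a stand-in for a real benchmark.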

Figures

Figures reproduced from arXiv: 2606.19772 by Alberto Gennaro, Francesca Primavera, Thibaut Mastrolia.

Figure 1: R² of the ridge fit to the expert policy and out-of-sample percentage reward of the fitted linear-in-signature rule, as a function of the signature truncation level K, for the Poisson order flow. Conditional independence of the state processes. The Brownian motion is pre-sampled once per batch on a fine grid and then frozen: the price path is the deterministic functional S_t = S_0 + σW_t obtained by interpolati…
Figure 2: Averaged market order arrivals (left) and average bid and ask spread quoted (right)
Figure 3: Payoff distribution over time of the market maker with Sig-REINFORCE, Sig-PPO
Figure 4: Return Sig-REINFORCE vs. Sig-PPO
Figure 5: Expected Bid-Ask spread value with Sig-REINFORCE vs. Sig-PPO (top) and
Original abstract

We propose a signature-based method to solve the optimal market-making problem under a mean-variance criterion. By exploiting signature linearization techniques, we reduce the market-making problem to a pseudo-linear optimization over the expected signature of an augmented market path, and we develop a signature algorithm named Sig-REINFORCE to learn the optimal bid and ask quotes. We test our method in two scenarios, in which market-order arrivals follow either a Poisson or a self-exciting Hawkes process, and we benchmark it against a Proximal Policy Optimization (PPO) baseline.
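For readers unfamiliar with the policy-gradient family that the name Sig-REINFORCE points to, the sketch below shows the textbook score-function (REINFORCE) update for a Gaussian policy whose mean is linear in a feature vector. It is not the paper's algorithm: the one-period reward, the fill probability `exp(-decay * delta)`, the constant placeholder feature, and all names and hyperparameters are assumptions. In a signature-based variant the feature vector would presumably hold truncated signature terms of the observed path.

```python
import math
import random


def reward(delta, decay, rng):
    """Toy one-period profit: earn the half-spread delta if an order arrives,
    which happens with probability exp(-decay * delta), capped at 1."""
    fill_prob = min(1.0, math.exp(-decay * delta))
    return delta if rng.random() < fill_prob else 0.0


def train_reinforce(features, decay=2.0, policy_std=0.2, lr=0.05,
                    batch=128, n_iters=400, seed=0):
    """Learn theta so that the quote delta ~ Normal(theta . features, policy_std^2)
    maximizes the expected toy reward."""
    rng = random.Random(seed)
    theta = [0.0] * len(features)
    for _ in range(n_iters):
        mean = sum(t * f for t, f in zip(theta, features))
        actions = [rng.gauss(mean, policy_std) for _ in range(batch)]
        rewards = [reward(a, decay, rng) for a in actions]
        baseline = sum(rewards) / batch  # batch-mean baseline to reduce variance
        grad = [0.0] * len(theta)
        for a, r in zip(actions, rewards):
            # Score function: derivative of log Normal(a; mean, std^2) w.r.t. mean.
            score = (a - mean) / policy_std ** 2
            for i, f in enumerate(features):
                grad[i] += (r - baseline) * score * f / batch
        theta = [t + lr * g for t, g in zip(theta, grad)]  # gradient ascent
    return theta


theta = train_reinforce(features=[1.0])
```

For this toy reward the expected profit `delta * exp(-decay * delta)` peaks at `delta = 1 / decay`, so the learned mean quote should settle near 0.5, slightly above it because the exploration noise smooths the reward.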

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes a signature-based method to solve the optimal market-making problem under a mean-variance criterion. By exploiting signature linearization techniques, it reduces the market-making problem to a pseudo-linear optimization over the expected signature of an augmented market path and develops the Sig-REINFORCE algorithm to learn optimal bid and ask quotes. The method is tested in two scenarios with Poisson or Hawkes process market-order arrivals and benchmarked against a PPO baseline.

Significance. If the reduction and algorithm are shown to be correct and effective, the approach could provide a novel way to handle market-making optimization via signature methods, potentially offering advantages in linearity and applicability to point processes over standard RL methods. However, the absence of any derivation, error bounds, or empirical results in the available text makes it impossible to assess whether these benefits materialize.

major comments (2)
  1. [Abstract] The central claim that signature linearization reduces the mean-variance market-making objective to a pseudo-linear optimization is asserted without any derivation steps, error analysis, or numerical evidence. This prevents verification of whether the linearization accurately captures the objective for the chosen arrival processes.
  2. [Abstract] No details are given on how the augmented market path is constructed, how the expected signature is computed, or how Sig-REINFORCE differs from standard policy-gradient methods, making it impossible to evaluate the algorithm's correctness or novelty.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments on our manuscript. The concerns focus on the level of detail in the abstract. We respond point-by-point below, noting that the full derivations, constructions, and comparisons appear in the body of the paper.

Point-by-point responses
  1. Referee: [Abstract] The central claim that signature linearization reduces the mean-variance market-making objective to a pseudo-linear optimization is asserted without any derivation steps, error analysis, or numerical evidence. This prevents verification of whether the linearization accurately captures the objective for the chosen arrival processes.

    Authors: The abstract is a high-level summary by design. The derivation steps showing how signature linearization reduces the mean-variance objective to pseudo-linear optimization over expected signatures of the augmented path are given in full in Section 3, with explicit treatment of both Poisson and Hawkes arrival processes. Error analysis appears in Section 4 and numerical verification (including outperformance versus PPO) is reported in Section 5. We are willing to add a short reference to these sections in a revised abstract.

    Revision: partial

  2. Referee: [Abstract] No details are given on how the augmented market path is constructed, how the expected signature is computed, or how Sig-REINFORCE differs from standard policy-gradient methods, making it impossible to evaluate the algorithm's correctness or novelty.

    Authors: Construction of the augmented market path is defined in Section 2.2. Computation of the expected signature is derived in Section 3.1. The Sig-REINFORCE algorithm and its distinction from standard policy-gradient methods (via signature features that enable the pseudo-linear structure) are explained in Section 4.2. These sections supply the information needed to assess correctness and novelty; the abstract itself is not the appropriate location for such technical detail.

    Revision: no

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external signature techniques

Full rationale

The abstract describes reducing the market-making problem via signature linearization to a pseudo-linear optimization over expected signatures, then applying Sig-REINFORCE. No equations, self-citations, or fitted inputs are shown that would reduce any claimed prediction or uniqueness result to the paper's own definitions or prior self-work by construction. The method is benchmarked against an independent PPO baseline under Poisson/Hawkes arrivals, indicating the core claim remains externally grounded rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unproven applicability of signature linearization to the market-making objective; no free parameters or invented entities are visible in the abstract.

axioms (1)
  • Domain assumption: signature linearization techniques reduce the market-making problem to a pseudo-linear optimization over the expected signature of an augmented market path.
    Directly stated as the enabling step in the abstract.

pith-pipeline@v0.9.1-grok · 5607 in / 1083 out tokens · 27592 ms · 2026-06-26T16:40:19.084416+00:00 · methodology

