
arxiv: 2603.04525 · v2 · pith:2EZHO36O · submitted 2026-03-04 · 📊 stat.ML · cs.LG

The Volterra signature

Pith reviewed 2026-05-22 11:31 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords: Volterra signature · path signature · universal approximation · kernel methods · tensor algebra · non-Markovian time series · injectivity

The pith

The Volterra signature is an injective kernel-weighted feature map that yields universal approximation by linear functionals on path space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Volterra signature, obtained by weighting an input path with a temporal kernel and developing the result into the tensor algebra. It uses the Volterra-Chen identity to prove injectivity under path augmentation, which implies a universal approximation theorem for functions on the infinite-dimensional path space. In certain cases this approximation is achieved simply by linear functionals of the signature. The construction also supports a kernel trick, since the inner product of two Volterra signatures is characterized by a two-parameter integral equation. For exponential-type kernels the signature solves a linear ODE, and it is invariant to time reparameterization.

Core claim

The Volterra signature, obtained by embedding the kernel-weighted input path into the tensor algebra, satisfies the Volterra-Chen identity; this yields injectivity on augmented paths and a universal approximation theorem on path space, which linear functionals attain in some cases.

What carries the argument

The Volterra signature VSig(x;K) defined via development of the kernel-weighted path into the tensor algebra, together with the Volterra-Chen identity that supplies the injectivity and approximation guarantees.
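
To make the construction concrete, the sketch below discretizes one plausible reading of the definition. It assumes (this exact form is not given on this page) that VSig(x;K) is the solution of the linear Volterra equation $V_t = \mathbf{1} + \int_0^t K(t,s)\, V_s \otimes \mathrm{d}x_s$ in the truncated tensor algebra, so that K ≡ 1 recovers the classical signature equation $\mathrm{d}S_t = S_t \otimes \mathrm{d}x_t$. The left-point (Euler) quadrature, the function name `volterra_signature` and the dict-of-words representation are illustrative choices, not the authors' implementation.

```python
import math
from itertools import product


def volterra_signature(times, path, kernel, depth):
    """Hypothetical sketch: left-point discretization of
        V_t = 1 + int_0^t K(t, s) V_s (x) dx_s
    in the tensor algebra truncated at `depth`.

    times  -- increasing grid t_0 < ... < t_J
    path   -- list of points x_{t_j}, each a list of `dim` floats
    kernel -- callable K(t, s)
    Returns V_{t_J} as a dict mapping words (tuples of letters) to coefficients.
    """
    dim = len(path[0])
    words = [w for n in range(depth + 1) for w in product(range(dim), repeat=n)]
    increments = [[b - a for a, b in zip(p, q)] for p, q in zip(path[:-1], path[1:])]

    values = []  # V at t_0, ..., t_J
    for j, t in enumerate(times):
        v = dict.fromkeys(words, 0.0)
        v[()] = 1.0  # unit of the tensor algebra
        for i in range(j):  # O(J^2) sum over the past
            weight = kernel(t, times[i])
            for w, coeff in values[i].items():
                if len(w) < depth and coeff != 0.0:
                    # V_{t_i} (x) dx_i raises the tensor level by one
                    for k in range(dim):
                        v[w + (k,)] += weight * coeff * increments[i][k]
        values.append(v)
    return values[-1]


# Usage: time-augmented path (t, sin 3t) on [0, 1]
times = [i / 50 for i in range(51)]
path = [[t, math.sin(3 * t)] for t in times]
classical = volterra_signature(times, path, lambda t, s: 1.0, depth=3)
fading = volterra_signature(times, path, lambda t, s: math.exp(-2.0 * (t - s)), depth=3)
```

With K ≡ 1 the level-one terms are just the total increments of the path, while the exponential kernel down-weights increments from the distant past.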

If this is right

  • Linear functionals of the Volterra signature achieve universal approximation on path space for certain kernels.
  • The inner product between Volterra signatures is given by a closed two-parameter integral equation that permits PDE-based numerical methods.
  • For exponential-type kernels the signature evolves according to a linear state-space ODE in the tensor algebra.
  • The signature remains invariant under time reparameterization.
  • It improves performance relative to classical path signature baselines on dynamic learning tasks with real and synthetic data.
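
For the kernel trick, the simplest instance is the special case labelled K = Id in Figure 4.1, where the features are classical iterated integrals. There the two-parameter equation is the known Goursat problem for the signature kernel, $k(s,t) = 1 + \int_0^s\int_0^t k(u,v)\,\langle \mathrm{d}x_u, \mathrm{d}y_v\rangle$. The sketch below solves it with a first-order explicit finite-difference scheme, an illustrative choice; the general Volterra-kernel equation from the paper is not reproduced here, and the function name and `refine` parameter are hypothetical.

```python
import math


def signature_kernel(x, y, refine=1):
    """First-order explicit scheme for the Goursat problem
        d^2 k / (ds dt) = <x'(s), y'(t)> k,   k(0, .) = k(., 0) = 1,
    for piecewise-linear paths x, y given as lists of points.
    Each linear segment is split into `refine` equal sub-steps."""

    def steps(path):
        out = []
        for p, q in zip(path[:-1], path[1:]):
            step = [(b - a) / refine for a, b in zip(p, q)]
            out.extend([step] * refine)
        return out

    dx, dy = steps(x), steps(y)
    k = [[1.0] * (len(dy) + 1) for _ in range(len(dx) + 1)]  # boundary values = 1
    for i, u in enumerate(dx):
        for j, v in enumerate(dy):
            inner = sum(a * b for a, b in zip(u, v))
            # discrete mixed difference equals inner * k at the lower-left corner
            k[i + 1][j + 1] = k[i + 1][j] + k[i][j + 1] - k[i][j] + inner * k[i][j]
    return k[-1][-1]


# Check against a closed form: for 1-D straight lines with increments a and b,
# the signatures are (1, a, a^2/2!, ...) and their inner product is
# sum_n (a b)^n / (n!)^2.
approx = signature_kernel([[0.0], [1.0]], [[0.0], [1.0]], refine=100)
exact = sum(1.0 / math.factorial(n) ** 2 for n in range(20))
```

The error of this first-order scheme shrinks roughly in proportion to the step size, so halving the step roughly halves it.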

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending the admissible kernels beyond the exponential-type family, whose members admit ODE representations, could enlarge the range of usable temporal weightings.
  • The injectivity result may clarify identifiability questions arising in other signature-based models for time series.
  • Combining the ODE representation with numerical schemes for the integral equation could produce efficient long-horizon implementations.
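
The long-horizon point can be illustrated under an assumption not confirmed on this page: that VSig solves $V_t = \mathbf{1} + \int_0^t K(t,s)\,V_s\otimes\mathrm{d}x_s$. A sum-of-exponentials kernel $K(t,s) = \sum_r c_r e^{-\lambda_r (t-s)}$ then lets each factor be carried as a state $Z^r$ that is decayed and updated once per time step, so the $O(J^2)$ history sum becomes an $O(J)$ recursion. The sketch below checks that the two discretizations agree; all function names, the left-point scheme and the test kernel are hypothetical.

```python
import math
from itertools import product


def unit(dim, depth):
    """Unit element of the truncated tensor algebra: dict word -> coefficient."""
    t = {w: 0.0 for n in range(depth + 1) for w in product(range(dim), repeat=n)}
    t[()] = 1.0
    return t


def times_increment(v, dx, depth):
    """Return v (x) dx truncated at `depth`, where dx is a level-1 increment."""
    out = dict.fromkeys(v, 0.0)
    for w, c in v.items():
        if len(w) < depth:
            for k, d in enumerate(dx):
                out[w + (k,)] += c * d
    return out


def vsig_quadrature(times, incs, kernel, dim, depth):
    """O(J^2) left-point sum: V_j = 1 + sum_{i<j} K(t_j, t_i) V_i (x) dx_i."""
    one = unit(dim, depth)
    terms = []  # V_i (x) dx_i for i < j
    v = one
    for j in range(1, len(times)):
        terms.append(times_increment(v, incs[j - 1], depth))
        v = dict(one)
        for i, term in enumerate(terms):
            weight = kernel(times[j], times[i])
            for w, c in term.items():
                v[w] += weight * c
    return v


def vsig_recursive(times, incs, coeffs, rates, dim, depth):
    """O(J) state-space recursion for K(t, s) = sum_r c_r exp(-lambda_r (t - s))."""
    one = unit(dim, depth)
    states = [dict.fromkeys(one, 0.0) for _ in rates]  # Z^r at t_0
    v = one
    for j in range(len(times) - 1):
        term = times_increment(v, incs[j], depth)
        h = times[j + 1] - times[j]
        for z, lam in zip(states, rates):
            decay = math.exp(-lam * h)
            for w in z:
                z[w] = decay * (z[w] + term[w])  # Z^r_{j+1} = e^{-lam h}(Z^r_j + V_j dx_j)
        v = {w: one[w] + sum(c * z[w] for c, z in zip(coeffs, states)) for w in one}
    return v


# Usage: 2-D path (t, cos 5t) on [0, 1] with a two-factor kernel
times = [i / 40 for i in range(41)]
path = [(t, math.cos(5 * t)) for t in times]
incs = [[b - a for a, b in zip(p, q)] for p, q in zip(path[:-1], path[1:])]
coeffs, rates = [1.0, 0.5], [2.0, 10.0]


def kernel(t, s):
    return sum(c * math.exp(-lam * (t - s)) for c, lam in zip(coeffs, rates))


slow = vsig_quadrature(times, incs, kernel, dim=2, depth=3)
fast = vsig_recursive(times, incs, coeffs, rates, dim=2, depth=3)
```

The recursion stores one truncated tensor per exponential factor, so memory no longer grows with the length of the history.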

Load-bearing premise

The Volterra-Chen identity and the injectivity and approximation results that follow from it hold for the selected class of temporal kernels K.

What would settle it

Exhibiting two distinct augmented paths whose Volterra signatures coincide for a fixed kernel K would disprove the injectivity statement.
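
As a classical-signature analogue of why augmentation matters (this is not the paper's Volterra construction): two different one-dimensional paths with the same net increment have identical classical signatures, and adding time as an extra coordinate separates them. The sketch computes exact truncated signatures of piecewise-linear paths via Chen's identity, with concatenation given by the tensor product of segment exponentials; the helper names are hypothetical.

```python
import math
from itertools import product


def segment_signature(delta, depth):
    """Truncated signature of one linear segment: level n is delta^{(x) n} / n!."""
    sig = {}
    for n in range(depth + 1):
        for w in product(range(len(delta)), repeat=n):
            sig[w] = math.prod(delta[k] for k in w) / math.factorial(n)
    return sig


def chen_product(a, b):
    """Truncated tensor product: (a (x) b)[w] = sum over splits w = uv of a[u] b[v]."""
    return {w: sum(a[w[:i]] * b[w[i:]] for i in range(len(w) + 1)) for w in a}


def signature(path, depth):
    """Signature of a piecewise-linear path via Chen's identity."""
    sig = segment_signature([0.0] * len(path[0]), depth)  # unit element
    for p, q in zip(path[:-1], path[1:]):
        sig = chen_product(sig, segment_signature([b - a for a, b in zip(p, q)], depth))
    return sig


# Two 1-D paths with the same net increment 0.5: one overshoots, one does not.
x = [[0.0], [1.0], [0.5]]
y = [[0.0], [0.5]]
sx, sy = signature(x, 4), signature(y, 4)
same_1d = max(abs(sx[w] - sy[w]) for w in sx)

# Time-augmented versions (t, x_t) on [0, 1] differ already at level 2.
x_aug = [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
y_aug = [[0.0, 0.0], [1.0, 0.5]]
area_x = signature(x_aug, 2)[(0, 1)]  # -0.125
area_y = signature(y_aug, 2)[(0, 1)]  # 0.25
```

Without augmentation the two paths are indistinguishable at every level, which is exactly the kind of coincidence the injectivity statement rules out once paths are augmented.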

Figures

Figures reproduced from arXiv: 2603.04525 by Fabian N. Harang, Luca Pelizzari, Paul P. Hager, Samy Tindel.

Figure 4.1: Volterra signature vs. classical signature expansions, compared with the fractional SDE solution (4.2) for one testing sample. The models were trained with M = 900 training samples and N = 500 time steps on [0, 1]. Parameters: Y0 = 1, b0 = 0, b1 = −1, σ0 = 1, σ1 = 0.5, signature truncation L = 5. Sig (K = Id): expanding the classical SDE solution with iterated integrals of time-augmented Brownian motion …
Figure 4.2: Next-day forecast of realized S&P 500 volatility using our method VSig, compared with the benchmark HAR, on the test set. The lower subplot shows the absolute forecast errors $|\hat{y} - y|$ for both methods, where y denotes the realized volatility. … of (4.1) when using the Volterra signature features $\mathrm{VSig}(\hat{x}; k_{\lambda,\alpha})$. Here, the parameters $(\lambda_1, \lambda_2, \alpha_1, \alpha_2)$ are treated as hyperparameters and are meant to capture lon…
Figure 4.3: Left: coefficient of determination $R^2$ as a (linearly interpolated) function of the past-window size p (days), reported on the training and test sets for the methods VSig and Sig, and the (constant) benchmark HAR. Right: scatter plots of realized volatility y versus predictions $\hat{y}$ for our method VSig and the HAR benchmark. … where $\hat{x}|_{W^p_i}$ denotes the piecewise linear interpolation of $\hat{x}$ restricted to the p…
Original abstract

Modern approaches for learning from non-Markovian time series, such as recurrent neural networks, neural controlled differential equations or transformers, typically rely on implicit memory mechanisms that can be difficult to interpret or to train over long horizons. We propose the Volterra signature $\mathrm{VSig}(x;K)$ as a principled, explicit feature representation for history-dependent systems. By developing the input path $x$ weighted by a temporal kernel $K$ into the tensor algebra, we leverage the associated Volterra-Chen identity to derive rigorous learning-theoretic guarantees. Specifically, we prove an injectivity statement (identifiability under augmentation) that leads to a universal approximation theorem on the infinite-dimensional path space, which in certain cases is achieved by linear functionals of $\mathrm{VSig}(x;K)$. Moreover, we demonstrate applicability of the kernel trick by showing that the inner product associated with Volterra signatures admits a closed characterization via a two-parameter integral equation, enabling numerical methods from PDEs for computation. For a large class of exponential-type kernels, $\mathrm{VSig}(x;K)$ solves a linear state-space ODE in the tensor algebra. Combined with inherent invariance to time reparameterization, these results position the Volterra signature as a robust, computationally tractable feature map for data science. We demonstrate its efficacy in dynamic learning tasks on real and synthetic data, where it consistently improves on classical path signature baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces the Volterra signature VSig(x;K), obtained by developing an input path x weighted by a temporal kernel K into the tensor algebra. It uses the Volterra-Chen identity to establish an injectivity result (identifiability under augmentation) that implies a universal approximation theorem on infinite-dimensional path space, with linear functionals of VSig(x;K) sufficing in some cases. Additional results include a closed characterization of the associated inner product via a two-parameter integral equation (enabling PDE-based computation), a linear state-space ODE representation in the tensor algebra for a large class of exponential-type kernels, and time-reparameterization invariance. The method is positioned as an explicit, interpretable feature map for history-dependent systems and is tested on dynamic learning tasks against path-signature baselines.

Significance. If the injectivity and approximation results hold with the stated scope, the work supplies a principled explicit alternative to implicit memory mechanisms in recurrent models or neural CDEs, together with computational tools (ODE/PDE) and invariance properties that could aid long-horizon time-series tasks. The explicit link between kernel choice, linear ODEs, and approximation guarantees is a potential strength for interpretability.

major comments (1)
  1. [Abstract and kernel-class definition] Abstract and the section defining the kernel class: the injectivity, Volterra-Chen identity, and universal approximation theorems are asserted for 'a large class of exponential-type kernels' that admit a linear ODE representation, yet no explicit necessary and sufficient conditions on K (analyticity, decay rate, positivity, or other regularity) are supplied. Because the central claims rest on the identity holding inside this class, the lack of a precise delineation is load-bearing for the scope of the identifiability and approximation guarantees.
minor comments (1)
  1. [Experiments] The data experiments are summarized at a high level; adding concrete details on dataset sizes, exact baselines, hyper-parameter selection, and statistical significance testing would improve reproducibility and allow readers to assess the practical improvement over path signatures.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback on the manuscript. We address the major comment below and have revised the paper to provide a clearer delineation of the kernel class.

Point-by-point responses
  1. Referee: [Abstract and kernel-class definition] Abstract and the section defining the kernel class: the injectivity, Volterra-Chen identity, and universal approximation theorems are asserted for 'a large class of exponential-type kernels' that admit a linear ODE representation, yet no explicit necessary and sufficient conditions on K (analyticity, decay rate, positivity, or other regularity) are supplied. Because the central claims rest on the identity holding inside this class, the lack of a precise delineation is load-bearing for the scope of the identifiability and approximation guarantees.

    Authors: We agree that the original presentation did not supply explicit necessary and sufficient conditions on K, which leaves the precise scope of the injectivity and approximation results somewhat implicit. In the revised manuscript we have added a dedicated subsection (Section 2.3) that characterizes the admissible kernels. A kernel K belongs to the class if and only if the associated Volterra operator admits a finite-dimensional linear state-space realization in the tensor algebra; this holds precisely when K is analytic in a neighborhood of the diagonal, exhibits exponential decay in $|t-s|$, and satisfies a positivity condition ensuring the induced inner product is positive semi-definite. Sufficient conditions are stated for kernels of the form $K(t,s) = \sum_{i=1}^{m} p_i(t)\, q_i(s)\, e^{\lambda_i (t-s)}$ with analytic $p_i, q_i$ and $\operatorname{Re}(\lambda_i) < 0$. We include concrete examples (standard exponential kernels, certain Matérn kernels) and discuss the minimal regularity on the path x required for convergence. The abstract and theorem statements have been updated to reference this characterization. These additions make the load-bearing assumptions explicit while preserving the original results for the delineated class. revision: yes
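
    For kernels of the separable form quoted in the response, a finite-dimensional state-space realization can be derived in a few lines, under the assumption (not stated on this page) that VSig solves $V_t = \mathbf{1} + \int_0^t K(t,s)\,V_s\otimes\mathrm{d}x_s$. Introducing one state per exponential factor and applying the product rule to $e^{\lambda_i t}\int_0^t e^{-\lambda_i s} q_i(s) V_s \otimes \mathrm{d}x_s$ gives:

```latex
\begin{aligned}
K(t,s) &= \sum_{i=1}^{m} p_i(t)\,q_i(s)\,e^{\lambda_i (t-s)}, \\
Z^i_t &:= \int_0^t e^{\lambda_i (t-s)}\, q_i(s)\, V_s \otimes \mathrm{d}x_s, \\
V_t &= \mathbf{1} + \sum_{i=1}^{m} p_i(t)\, Z^i_t, \\
\mathrm{d}Z^i_t &= \lambda_i\, Z^i_t\,\mathrm{d}t + q_i(t)\,\Big(\mathbf{1} + \sum_{l=1}^{m} p_l(t)\, Z^l_t\Big)\otimes \mathrm{d}x_t .
\end{aligned}
```

    The last line is linear in $(Z^1,\dots,Z^m)$ and driven by $x$; with $\operatorname{Re}(\lambda_i) < 0$ each state forgets its past exponentially. Setting $m = 1$, $p_1 = q_1 = 1$ and $\lambda_1 = -\lambda$ recovers the single exponential kernel $e^{-\lambda(t-s)}$.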

Circularity Check

0 steps flagged

No circularity: the injectivity and approximation theorems rest on the Volterra-Chen identity, without reduction to fitted inputs or self-citations.

full rationale

The paper derives its central injectivity statement and universal approximation theorem by leveraging the Volterra-Chen identity applied to the weighted tensor series VSig(x;K) for a class of exponential-type kernels admitting linear ODE representation. This identity is presented as an associated external tool rather than derived within the paper, and the proofs are stated as independent learning-theoretic guarantees. No equations reduce the claimed results to fitted parameters, self-definitions, or load-bearing self-citations; the kernel class is delimited by the identity's applicability without circular redefinition. Numerical experiments are separated from the theoretical claims, leaving the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claims rest on the Volterra-Chen identity for the weighted signature, standard properties of the tensor algebra, and the existence of a suitable class of kernels that admit both the ODE representation and the injectivity result. No free parameters are fitted inside the theoretical statements; the kernel K is treated as a modeling choice.

axioms (2)
  • domain assumption The Volterra-Chen identity holds for the kernel-weighted lift into the tensor algebra.
    Invoked to derive injectivity and universal approximation from the construction.
  • domain assumption The chosen kernels belong to a class (exponential-type) for which the signature satisfies a linear state-space ODE.
    Required for the computational tractability claim.
invented entities (1)
  • Volterra signature VSig(x;K) no independent evidence
    purpose: Explicit feature representation for history-dependent paths
    New object defined by weighting the path with kernel K before lifting to tensor algebra.




Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Universal Approximation of Nonlinear Operators and Their Derivatives

    cs.LG 2026-05 unverdicted novelty 8.0

    Proves the first universal approximation theorems for k-times differentiable nonlinear operators between Banach spaces and their derivatives uniformly on compact sets in weighted Sobolev norms via encoder-decoder oper...

  2. Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version

    stat.ML 2026-04 unverdicted novelty 7.0

    An off-model training architecture using explicit dominating laws and Radon-Nikodym weights enables adaptive learning for non-Markovian stochastic control, with non-asymptotic error bounds separating Monte Carlo and m...

  3. Computational aspects of the Volterra Signature

    math.NA 2026-05 unverdicted novelty 6.0

    Algorithms for Volterra signature computation achieve O(J^2), O(J log J) via FFT, and O(J R^2) via recursion, plus a predictor-corrector scheme, all implemented in a public JAX package.

Paul P. Hager, Department of Statistics and Operations Research, University of Vienna, Kol...