pith. sign in

arxiv: 2410.17426 · v4 · submitted 2024-10-22 · 💻 cs.DS · math.PR

A Unified Construction of Streaming Sketches via the L\'evy-Khintchine Representation Theorem

Pith reviewed 2026-05-23 19:23 UTC · model grok-4.3

classification 💻 cs.DS math.PR
keywords streaming sketchesf-moment estimationLévy processesLévy-Khintchine theoremturnstile modelcharacteristic exponentsunified constructionnearly periodic functions
0
0 comments X

The pith

Hashing indices to Lévy processes produces streaming sketches for any f-moment whose f is a characteristic exponent, as classified by the Lévy-Khintchine theorem.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a generic scheme for building small-space sketches in the turnstile streaming model that estimate sums of the form sum f(x(v)) by hashing each index v to a Lévy process and taking f to be that process's characteristic exponent. The Lévy-Khintchine representation theorem supplies the precise form of every possible characteristic exponent and thereby identifies exactly which f-moments the scheme can handle. This single construction recovers many known sketches as special cases and shows that a large family of nearly periodic functions, previously unclassified, are now tractable. The same idea extends immediately to vector-valued streams by using multidimensional Lévy processes and to heterogeneous moments by assigning different processes to different indices. A reader would care because the approach replaces a collection of ad-hoc constructions with a uniform probabilistic mechanism whose scope is delimited by a classical theorem of probability.

Core claim

The central claim is that any f arising as the characteristic exponent of a Lévy process yields an f-moment that admits an efficient streaming sketch obtained simply by hashing each stream index to an independent copy of that Lévy process; the Lévy-Khintchine theorem then completely characterizes the admissible f by giving their integral representation in terms of a drift, a Gaussian coefficient, and a Lévy measure. The resulting sketches work in the d-dimensional turnstile model and can be made heterogeneous by drawing different Lévy processes for different indices.

What carries the argument

Hashing each index to a Lévy process whose characteristic exponent is the desired f, so that the sketch aggregates independent realizations whose expectation or variance yields an estimator of the sum; the Lévy-Khintchine theorem supplies the exact functional form of every admissible f.

If this is right

  • Many existing sketches for specific moments arise as direct instances of the generic Lévy-process construction.
  • A broad class of nearly periodic functions, not previously known to be tractable, now admit efficient sketches.
  • The scheme extends verbatim to d-dimensional streams via multidimensional Lévy processes.
  • Heterogeneous moments are obtained by projecting different indices onto different Lévy processes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The Fourier-Hahn-Lévy method conjectured in the paper may furnish a complete, parameter-free classification of all tractable f.
  • Choosing Lévy measures with particular tail or periodicity properties could systematically generate sketches for new families of moment functions.
  • The same hashing-to-process idea might be testable on other stochastic-process representations that admit characteristic exponents.

Load-bearing premise

Indices can be hashed to Lévy processes in the streaming model so that the sketch produces an unbiased or low-variance estimator of the characteristic exponent without further assumptions on the input distribution or on the ability to simulate the processes efficiently.

What would settle it

Exhibit an explicit function f that possesses a Lévy-Khintchine representation yet admits no small-space turnstile sketch, or conversely a function outside every Lévy-Khintchine form that nonetheless possesses such a sketch.

Figures

Figures reproduced from arXiv: 2410.17426 by Dingyu Wang, Seth Pettie.

Figure 1
Figure 1. Figure 1: Lévy-Tower with m = 3 and x = (1, 0, . . . , 0). From left to right: linear drift, Cauchy process, Brownian motion, Poisson process with rate 1, Poisson process with rate 2. Different Lévy processes have different “sensitivities” for a target function-moment. For example, linear drift is only sensitive to the sum of the vector and insensitive to how the values are distributed. The Cauchy process is only se… view at source ↗
Figure 2
Figure 2. Figure 2: Left: fL0,32(x) = P31 j=1 1 32 (1 − cos(2πjx/32)). The black dots mark the values at integer xs. Right: The jump distribution of the compound Poisson process X with characteristic exponent fL0,32. The subsampling and uniform random projection tricks used in [16] are recovered from computing the corresponding Lévy process. 5 Tractability and Lévy Processes A function f : Z → R is tractable [4] if the f-mome… view at source ↗
Figure 3
Figure 3. Figure 3: Left: gnp,5(x) = P31 j=1 2 2τ(j)+1+1 1536 (1 − cos(2πjx/32)). The black dots mark the values at integer xs. Right: The jump distribution of the compound Poisson process X with characteristic exponent gnp,5. Such nearly periodic functions do not fit in the L2-heavy-hitter based framework in [4]. Nevertheless, one may compute the corresponding Lévy process and apply the Lévy-Tower. 18 [PITH_FULL_IMAGE:figur… view at source ↗
read the original abstract

In the $d$-dimensional turnstile streaming model, a frequency vector $\mathbf{x}=(\mathbf{x}(1),\ldots,\mathbf{x}(n))\in (\mathbb{R}^d)^n$ is updated entry-wisely over a stream. We consider the problem of $f$-moment estimation, where one wants to estimate $$f(\mathbf{x})=\sum_{v\in[n]}f(\mathbf{x}(v))$$ with a small-space sketch. In this work we present a simple and generic scheme to construct sketches with the novel idea of hashing indices to L\'evy processes, from which one can estimate the $f$-moment $f(\mathbf{x})$ where $f$ is the characteristic exponent of the L\'evy process. The fundamental L\'evy-Khintchine representation theorem completely characterizes the space of all possible characteristic exponents, which in turn characterizes the set of $f$-moments that can be estimated by this generic scheme. The new scheme has strong explanatory power. It unifies the construction of many existing sketches and it implies the tractability of many nearly periodic functions that were previously unclassified. Furthermore, the scheme can be conveniently generalized to multidimensional cases ($d\geq 2$) by considering multidimensional L\'evy processes and can be further generalized to estimate heterogeneous moments by projecting different indices with different L\'evy processes. We conjecture that the set of tractable functions can be characterized using the L\'evy-Khintchine representation theorem via what we called the Fourier-Hahn-L\'evy method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a generic construction for sketches estimating f-moments f(x) = sum_v f(x(v)) in the d-dimensional turnstile streaming model. Indices are hashed to Lévy processes whose characteristic exponent is the target f; the Lévy-Khintchine theorem is invoked to characterize the admissible f. The scheme is claimed to unify existing sketches, render many nearly periodic functions tractable, extend immediately to d ≥ 2 via multidimensional Lévy processes, and support heterogeneous moments by using distinct processes per index. A conjecture asserts that the set of tractable f is characterized by the Lévy-Khintchine representation via a “Fourier-Hahn-Lévy method.”

Significance. If the reduction from an arbitrary Lévy triplet to a constant-time, unbiased, low-variance streaming update is supplied and verified, the work would supply a single probabilistic lens that explains the design of many known sketches and predicts new ones. The appeal to a fundamental theorem rather than ad-hoc parameters is a conceptual strength; the multidimensional and heterogeneous extensions follow naturally once the scalar case is settled.

major comments (3)
  1. [Abstract] Abstract and conjecture paragraph: the claim that “hashing indices to Lévy processes” yields an estimator of f(x(v)) whose expectation recovers the desired moment is stated without an explicit reduction showing how an arbitrary Lévy triplet (drift, Brownian coefficient, Lévy measure) produces a constant-time update rule and an unbiased statistic in the turnstile model for arbitrary real-valued updates.
  2. [Conjecture] The paragraph introducing the Fourier-Hahn-Lévy method: no formal definition of the method, no statement of the precise mapping from Lévy triplets to estimators, and no proof sketch or variance calculation are supplied, rendering the conjecture unverifiable from the given text.
  3. [Abstract] The unification claim: while the abstract asserts that the scheme “unifies the construction of many existing sketches,” no concrete example is worked out (e.g., which existing sketch corresponds to which Lévy triplet and how the update matches the known algorithm).
minor comments (2)
  1. Notation: the vector x is written both as x ∈ (R^d)^n and as x(v) ∈ R^d; a single consistent definition would remove ambiguity.
  2. [Abstract] The phrase “nearly periodic functions that were previously unclassified” is used without citing the prior classification attempts or the precise sense in which the new functions become tractable.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. We will revise the manuscript to provide the explicit details requested, strengthening the presentation of our results.

read point-by-point responses
  1. Referee: [Abstract] Abstract and conjecture paragraph: the claim that “hashing indices to Lévy processes” yields an estimator of f(x(v)) whose expectation recovers the desired moment is stated without an explicit reduction showing how an arbitrary Lévy triplet (drift, Brownian coefficient, Lévy measure) produces a constant-time update rule and an unbiased statistic in the turnstile model for arbitrary real-valued updates.

    Authors: We agree that the manuscript would benefit from an explicit reduction. In the revised version, we will add a dedicated subsection detailing how any Lévy triplet can be used to construct a constant-time update rule that produces an unbiased estimator for the corresponding f-moment in the turnstile model, including handling of arbitrary real-valued updates. This will make the connection between the Lévy process and the streaming sketch fully rigorous and verifiable. revision: yes

  2. Referee: [Conjecture] The paragraph introducing the Fourier-Hahn-Lévy method: no formal definition of the method, no statement of the precise mapping from Lévy triplets to estimators, and no proof sketch or variance calculation are supplied, rendering the conjecture unverifiable from the given text.

    Authors: The conjecture is presented at a high level in the current manuscript. We will revise to include a formal definition of the Fourier-Hahn-Lévy method, the precise mapping from Lévy triplets to the corresponding estimators, a proof sketch for the characterization, and initial variance bounds. This will allow readers to verify the conjecture's scope. revision: yes

  3. Referee: [Abstract] The unification claim: while the abstract asserts that the scheme “unifies the construction of many existing sketches,” no concrete example is worked out (e.g., which existing sketch corresponds to which Lévy triplet and how the update matches the known algorithm).

    Authors: We acknowledge the need for concrete examples to illustrate the unification. The revised manuscript will include a new section or subsection providing specific mappings, for instance, showing how the AMS sketch or Count-Sketch corresponds to particular choices of Lévy triplets (e.g., stable distributions or Brownian motion components), and demonstrating that the resulting update rules coincide with the standard implementations. revision: yes

Circularity Check

0 steps flagged

No significant circularity; relies on external Lévy-Khintchine theorem

full rationale

The paper presents a generic sketching scheme based on hashing to Lévy processes whose characteristic exponents are given by the external Lévy-Khintchine representation theorem. The abstract explicitly invokes this known theorem from probability theory to characterize the admissible f functions, without any self-citation, fitted parameters, or self-referential definitions that reduce the claimed result to its inputs. No equations or steps in the provided text exhibit a reduction by construction, and the conjecture is presented as such rather than as a derived claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The paper relies on the standard Lévy-Khintchine theorem and introduces one conjectural entity; no free parameters are mentioned.

axioms (1)
  • standard math The Lévy-Khintchine representation theorem completely characterizes all possible characteristic exponents of Lévy processes.
    Invoked in the abstract to delimit the set of estimable f-moments.
invented entities (1)
  • Fourier-Hahn-Lévy method no independent evidence
    purpose: To characterize the full set of tractable functions for sketching
    Conjectured at the end of the abstract; no independent evidence or construction is supplied.

pith-pipeline@v0.9.0 · 5816 in / 1252 out tokens · 29387 ms · 2026-05-23T19:23:13.301697+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 1 internal anchor

  1. [1]

    The space complexity of approximating the frequency moments.J

    Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments.J. Comput. Syst. Sci., 58(1):137–147, 1999

  2. [2]

    Jayram, Ravi Kumar, and D

    Ziv Bar-Yossef, Thathachar S. Jayram, Ravi Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity.Journal of Computer and System Sciences, 68(4):702–732, 2004

  3. [3]

    Universal sketches for the frequency negative moments and other decreasing streaming sums.Approximation, Randomization, and Combi- natorial Optimization

    Vladimir Braverman and Stephen R Chestnut. Universal sketches for the frequency negative moments and other decreasing streaming sums.Approximation, Randomization, and Combi- natorial Optimization. Algorithms and Techniques, page 591, 2015

  4. [4]

    Chestnut, David P

    Vladimir Braverman, Stephen R. Chestnut, David P. Woodruff, and Lin F. Yang. Streaming space complexity of nearly all functions of one variable on frequency vectors. InProceedings 35th ACM Symposium on Principles of Database Systems (PODS), pages 261–276, 2016

  5. [5]

    Zero-one frequency laws

    Vladimir Braverman and Rafail Ostrovsky. Zero-one frequency laws. InProceedings 42nd ACM Symposium on Theory of Computing (STOC), pages 281–290, 2010

  6. [6]

    Generalizing the layering method of Indyk and Woodruff: Recursive sketches for frequency-based vectors on streams

    Vladimir Braverman and Rafail Ostrovsky. Generalizing the layering method of Indyk and Woodruff: Recursive sketches for frequency-based vectors on streams. In Proceedings 16th International Workshop on Approximation, Randomization, and Combinatorial Optimization (APPROX), volume 8096 ofLecture Notes in Computer Science, pages 58–70. Springer, 2013

  7. [7]

    PhD thesis, Johns Hopkins University, 2015

    Stephen Robert Chestnut.Stream sketches, sampling, and sabotage. PhD thesis, Johns Hopkins University, 2015

  8. [8]

    Comparing data streams using Hamming norms (how to zero in)

    Graham Cormode, Mayur Datar, Piotr Indyk, and Shanmugavelayutham Muthukrishnan. Comparing data streams using Hamming norms (how to zero in). IEEE Transactions on Knowledge and Data Engineering, 15(3):529–540, 2003

  9. [9]

    A unifying framework forℓ0-sampling algorithms

    Graham Cormode and Donatella Firmani. A unifying framework forℓ0-sampling algorithms. Distributed Parallel Databases, 32(3):315–335, 2014

  10. [10]

    Cambridge University Press, 2019

    Rick Durrett.Probability: theory and examples. Cambridge University Press, 2019

  11. [11]

    HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm

    Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. InProceedings of the 18th Inter- national Meeting on Probabilistic, Combinatorial, and Asymptotic Methods for the Analysis of Algorithms (AofA), pages 127–146, 2007

  12. [12]

    Nigel Martin

    Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base appli- cations. Journal of Computer and System Sciences, 31(2):182–209, 1985

  13. [13]

    Estimating frequency moments of data streams using random linear com- binations

    Sumit Ganguly. Estimating frequency moments of data streams using random linear com- binations. In Proceedings 8th International Workshop on Randomization and Computation (RANDOM), volume 3122 ofLecture Notes in Computer Science, pages 369–380, 2004

  14. [14]

    Estimating hybrid frequency moments of data streams

    Sumit Ganguly, Mohit Bansal, and Shruti Dube. Estimating hybrid frequency moments of data streams. Journal of Combinatorial Optimization, 23(3):373–394, 2012. 21

  15. [15]

    Stable distributions, pseudorandom generators, embeddings, and data stream computation

    Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM, 53(3):307–323, 2006

  16. [16]

    Kane, Jelani Nelson, and David P

    Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings 29th ACM Symposium on Principles of Database Systems (PODS), pages 41–52, 2010

  17. [17]

    Kevin J. Lang. Back to the future: an even more nearly optimal cardinality estimation algo- rithm. CoRR, abs/1708.06839, 2017

  18. [18]

    Estimators and tail bounds for dimension reduction inlα (0 < α ≤ 2) using stable ran- dom projections

    Ping Li. Estimators and tail bounds for dimension reduction inlα (0 < α ≤ 2) using stable ran- dom projections. InProceedings 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 10–19, 2008

  19. [19]

    Nguyen, and David P

    Yi Li, Huy L. Nguyen, and David P. Woodruff. Turnstile streaming algorithms might as well be linear sketches. In Proceedings 46th Annual ACM Symposium on Theory of Computing (STOC), pages 174–183, 2014

  20. [20]

    Counting large number of events in small registers.Communications of the ACM, 21(10):840–842, 1978

    Robert Morris. Counting large number of events in small registers.Communications of the ACM, 21(10):840–842, 1978

  21. [21]

    Ian Munro and Mike Paterson

    J. Ian Munro and Mike Paterson. Selection and sorting with limited storage.Theor. Comput. Sci., 12:315–323, 1980

  22. [22]

    Information theoretic limits of cardinality estimation: Fisher meets Shannon

    Seth Pettie and Dingyu Wang. Information theoretic limits of cardinality estimation: Fisher meets Shannon. In Proceedings 53rd Annual ACM Symposium on Theory of Computing (STOC), pages 556–569, 2021

  23. [23]

    Cambridge University Press, 1999

    Ken-Iti Sato.Lévy processes and infinitely divisible distributions. Cambridge University Press, 1999

  24. [24]

    Better cardinality estimators for HyperLogLog, PCSA, and beyond

    Dingyu Wang and Seth Pettie. Better cardinality estimators for HyperLogLog, PCSA, and beyond. In Proceedings 42nd ACM Symposium on Principles of Database Systems (PODS), pages 317–327, 2023. A Analysis of gnp,w Proof of Lemma 7.This proof is by induction onw. It is true forw = 1 since gnp,1(x) = 2 + 1 3 · 2 (1 − cos(πx)) = 1 (2 ∤ x) . Observe that forx su...