Faster Linear-Space Data Structures for Path Frequency Queries

Ovidiu Rata

arxiv: 2604.18667 · v1 · submitted 2026-04-20 · 💻 cs.DS

Faster Linear-Space Data Structures for Path Frequency Queries

Ovidiu Rata This is my paper

Pith reviewed 2026-05-10 03:19 UTC · model grok-4.3

classification 💻 cs.DS

keywords path queriesfrequency queriestreesdata structuresmode querieslinear spaceword-RAM model

0 comments

The pith

Linear-space data structures answer path mode and least frequent element queries on trees in O(sqrt(n/w)) time after O(n sqrt(n w)) preprocessing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the first linear-space data structures for several frequency queries on paths in a tree. It achieves O(sqrt(n/w)) query time for path mode and path least frequent element problems after O(n sqrt(n w)) preprocessing time, beating the earlier O(log log n sqrt(n/w)) bound. A randomized method also improves path alpha-minority queries to O(1/alpha) time. The same approach yields the first linear-space support for Path Maximum g-value Color queries under the same bounds. These results apply in the word-RAM model with word size at least logarithmic in n.

Core claim

We present the first linear-space data structures, requiring O(n sqrt(n w)) preprocessing time, that can answer path mode and path least frequent element queries in O(sqrt(n/w)) time. For the path alpha-minority problem, where alpha is specified at query time, we reduce the query time of the linear-space data structure from O(alpha^{-1} log log n) down to O(alpha^{-1}) by employing a simple randomized algorithm with a success probability at least 1/2. We also present the first linear-space data structure supporting Path Maximum g-value Color queries in O(sqrt(n/w)) time, requiring O(n sqrt(n w)) preprocessing time.

What carries the argument

A general framework for Path Maximum g-value Color queries that reduces both path mode and path least frequent element problems to it.

If this is right

Path mode queries on trees can be answered in linear space with O(sqrt(n/w)) query time.
Path least frequent element queries achieve the same O(sqrt(n/w)) query time and O(n sqrt(n w)) preprocessing.
Path alpha-minority queries run in O(1/alpha) time using linear space and a randomized procedure.
Path Maximum g-value Color queries receive the first linear-space solution with O(sqrt(n/w)) query time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The improved bounds may support repeated path statistics in large hierarchical datasets such as file systems or biological trees.
The randomization step for alpha-minority queries could be repeated a constant number of times to reach high probability without changing the asymptotic time.
Similar space-time tradeoffs might extend to labeled graphs or dynamic tree settings where paths are updated.
The dependence on word size w suggests that bit-parallel techniques are central to beating the log log n factor.

Load-bearing premise

The word-RAM model with word size w at least log n bits holds, and randomized algorithms for alpha-minority queries succeed with probability at least 1/2.

What would settle it

A concrete tree of size n together with a set of paths where any linear-space structure requires more than O(sqrt(n/w)) time to answer all path mode queries would disprove the claim.

read the original abstract

We present linear-space data structures for several frequency queries on trees, namely: path mode, path least frequent element, and path $\alpha$-minority queries. We present the first linear-space data structures, requiring $O(n \sqrt{nw})$ preprocessing time, that can answer path mode and path least frequent element queries in $O(\sqrt{n/w})$ time. This improves upon the best previously known bound of $O(\log\log n \sqrt{n/w})$ achieved by Durocher et al. in 2016. For the path $\alpha$-minority problem, where $\alpha$ is specified at query time, we reduce the query time of the linear-space data structure of Durocher et al. from $O(\alpha^{-1}\log\log n)$ down to $O(\alpha^{-1})$ by employing a simple randomized algorithm with a success probability $\geq 1/2$. We also present the first linear-space data structure supporting "Path Maximum $g$-value Color" queries in $O(\sqrt{n/w})$ time, requiring $O(n \sqrt{nw})$ preprocessing time. This general framework encapsulates both path mode and path least frequent element queries. For our data structures, we consider the word-RAM model with $w\in \Omega(\log n)$, where $w$ is the word size in bits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper improves path mode and least-frequent queries on trees to O(sqrt(n/w)) query time with linear space, removing the log log n factor from the 2016 bound.

read the letter

This paper improves the query time for path mode and path least frequent element queries on trees to O(sqrt(n/w)) while keeping linear space. It removes the log log n factor that was present in the best 2016 result by Durocher et al. The authors also provide a general framework that covers path maximum g-value color queries with matching bounds. For path alpha-minority queries, they reduce the time to O(1/alpha) with a randomized algorithm that succeeds with probability at least 1/2. The work is a straightforward extension of prior techniques, but the improvement is real and the presentation of the framework is clean. The bounds are explicit and the model assumptions are standard for word-RAM. The main limitation is the preprocessing time of O(n sqrt(n w)), which is superlinear and could be costly in practice. The randomized component is isolated and honestly stated, so it does not affect the main claims. No circular reasoning or hidden factors appear in the stated results. This is for researchers working on path queries in trees, especially in contexts like databases or string processing. Readers who track fine improvements in query times for frequency problems will get direct value from the new bounds. The paper shows clear thinking by building directly on the 2016 work and delivering better times without extra space. It deserves a serious referee. I would send this to peer review.

Referee Report

0 major / 3 minor

Summary. The manuscript presents the first linear-space data structures for path mode and path least frequent element queries on trees, achieving O(n √(n w)) preprocessing time and O(√(n/w)) query time in the word-RAM model with w ∈ Ω(log n). This improves the prior O(log log n √(n/w)) bound of Durocher et al. (2016). It further reduces query time for path α-minority queries (α given at query time) from O(α^{-1} log log n) to O(α^{-1}) via a simple randomized algorithm with success probability ≥ 1/2, and introduces a general framework supporting path maximum g-value color queries with the same bounds.

Significance. If the stated bounds and constructions hold, the work advances the state of the art for frequency-based path queries on trees by removing the log log n factor from query time while preserving strict linear space. The general framework that unifies mode, least-frequent, and g-value color queries is a methodological strength, and the explicit word-RAM parameterization with w makes the results directly comparable to prior work. These improvements are relevant to applications such as XML databases and network analysis where path aggregates must be answered quickly on large trees.

minor comments (3)

The preprocessing bound is written as O(n √(n w)); a parenthesized form O(n √(n w)) or explicit definition of the product inside the square root would eliminate any parsing ambiguity in the abstract and introduction.
The randomized α-minority algorithm is stated to succeed with probability ≥ 1/2; the manuscript should clarify whether a constant number of independent repetitions (to boost to high probability) affects the O(α^{-1}) query bound or is handled separately in the analysis.
The abstract and introduction would benefit from a one-sentence statement of the input representation assumed for the tree (e.g., adjacency-list or Euler-tour) and the precise definition of the g-value in the maximum g-value color problem.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the recommendation for minor revision. We appreciate the recognition that our results advance the state of the art by removing the log log n factor from the query time for path mode and least-frequent-element queries while maintaining linear space, as well as the value placed on the general g-value color framework.

Circularity Check

0 steps flagged

No significant circularity; algorithmic bounds derived independently

full rationale

The paper's core claims consist of explicit data structure constructions in the word-RAM model, with preprocessing time O(n sqrt(n w)) and query time O(sqrt(n/w)) obtained via tree decompositions, heavy-light techniques, and table lookups whose analysis is independent of any fitted parameters or self-referential definitions. The improvement over Durocher et al. (2016) is stated as a direct comparison of concrete bounds rather than a reduction to prior self-citations. The randomized alpha-minority result is isolated with an explicit success probability of at least 1/2 and does not underpin the deterministic mode/least-frequent bounds. No step equates a derived quantity to its own input by construction, renames a known empirical pattern, or imports uniqueness via author-overlapping citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Claims rest on standard theoretical CS assumptions for data structures; no free parameters, new entities, or ad-hoc axioms beyond the computational model.

axioms (1)

domain assumption Word-RAM model with word size w satisfying w = Omega(log n)
Standard model invoked for all time and space analysis in the abstract.

pith-pipeline@v0.9.0 · 5537 in / 1255 out tokens · 32856 ms · 2026-05-10T03:19:05.096625+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

[1]

Theory of Computing Systems , volume=

Linear-space data structures for range mode query in arrays , author=. Theory of Computing Systems , volume=. 2014 , publisher=

work page 2014
[2]

Scandinavian Workshop on Algorithm Theory , pages=

Linear-space data structures for range minority query in arrays , author=. Scandinavian Workshop on Algorithm Theory , pages=. 2012 , organization=

work page 2012
[3]

Algorithmica , volume=

Linear-space data structures for range frequency queries on arrays and trees , author=. Algorithmica , volume=. 2016 , publisher=

work page 2016
[4]

Algorithms and Data Structures: 6th International Workshop, WADS’99 Vancouver, Canada, August 11--14, 1999 Proceedings 6 , pages=

Resizable arrays in optimal time and space , author=. Algorithms and Data Structures: 6th International Workshop, WADS’99 Vancouver, Canada, August 11--14, 1999 Proceedings 6 , pages=. 1999 , organization=

work page 1999
[5]

Nordic Journal of Computing , volume=

Range mode and range median queries on lists and trees , author=. Nordic Journal of Computing , volume=. 2005 , publisher=

work page 2005
[6]

STACS 2001: 18th Annual Symposium on Theoretical Aspects of Computer Science Dresden, Germany, February 15--17, 2001 Proceedings 18 , pages=

Efficient minimal perfect hashing in nearly minimal space , author=. STACS 2001: 18th Annual Symposium on Theoretical Aspects of Computer Science Dresden, Germany, February 15--17, 2001 Proceedings 18 , pages=. 2001 , organization=

work page 2001
[7]

European Symposium on Algorithms , pages=

Hash, displace, and compress , author=. European Symposium on Algorithms , pages=. 2009 , organization=

work page 2009
[8]

Fredman, János Komlós, and Endre Szemerédi

Fredman, Michael L. and Koml\'. Storing a Sparse Table with 0(1) Worst Case Access Time , year =. doi:10.1145/828.1884 , journal =

work page doi:10.1145/828.1884
[9]

Theoretical Computer Science , volume=

The level ancestor problem simplified , author=. Theoretical Computer Science , volume=. 2004 , publisher=

work page 2004
[10]

Journal of computer and System Sciences , volume=

Finding level-ancestors in trees , author=. Journal of computer and System Sciences , volume=. 1994 , publisher=

work page 1994
[11]

Workshop on Algorithms and Data Structures , pages=

Finding level-ancestors in dynamic trees , author=. Workshop on Algorithms and Data Structures , pages=. 1991 , organization=

work page 1991
[12]

SIAM Journal on Computing , volume=

Recursive star-tree parallel data structure , author=. SIAM Journal on Computing , volume=. 1993 , publisher=

work page 1993
[13]

Latin American Symposium on Theoretical Informatics , pages=

The LCA problem revisited , author=. Latin American Symposium on Theoretical Informatics , pages=. 2000 , organization=

work page 2000
[14]

SIAM Journal on Computing , volume=

Filtering search: A new approach to query-answering , author=. SIAM Journal on Computing , volume=. 1986 , publisher=

work page 1986
[15]

Array Range Queries

Skala, Matthew. Array Range Queries. Space-Efficient Data Structures, Streams, and Algorithms: Papers in Honor of J. Ian Munro on the Occasion of His 66th Birthday. 2013. doi:10.1007/978-3-642-40273-9_21

work page doi:10.1007/978-3-642-40273-9_21 2013

[1] [1]

Theory of Computing Systems , volume=

Linear-space data structures for range mode query in arrays , author=. Theory of Computing Systems , volume=. 2014 , publisher=

work page 2014

[2] [2]

Scandinavian Workshop on Algorithm Theory , pages=

Linear-space data structures for range minority query in arrays , author=. Scandinavian Workshop on Algorithm Theory , pages=. 2012 , organization=

work page 2012

[3] [3]

Algorithmica , volume=

Linear-space data structures for range frequency queries on arrays and trees , author=. Algorithmica , volume=. 2016 , publisher=

work page 2016

[4] [4]

Algorithms and Data Structures: 6th International Workshop, WADS’99 Vancouver, Canada, August 11--14, 1999 Proceedings 6 , pages=

Resizable arrays in optimal time and space , author=. Algorithms and Data Structures: 6th International Workshop, WADS’99 Vancouver, Canada, August 11--14, 1999 Proceedings 6 , pages=. 1999 , organization=

work page 1999

[5] [5]

Nordic Journal of Computing , volume=

Range mode and range median queries on lists and trees , author=. Nordic Journal of Computing , volume=. 2005 , publisher=

work page 2005

[6] [6]

STACS 2001: 18th Annual Symposium on Theoretical Aspects of Computer Science Dresden, Germany, February 15--17, 2001 Proceedings 18 , pages=

Efficient minimal perfect hashing in nearly minimal space , author=. STACS 2001: 18th Annual Symposium on Theoretical Aspects of Computer Science Dresden, Germany, February 15--17, 2001 Proceedings 18 , pages=. 2001 , organization=

work page 2001

[7] [7]

European Symposium on Algorithms , pages=

Hash, displace, and compress , author=. European Symposium on Algorithms , pages=. 2009 , organization=

work page 2009

[8] [8]

Fredman, János Komlós, and Endre Szemerédi

Fredman, Michael L. and Koml\'. Storing a Sparse Table with 0(1) Worst Case Access Time , year =. doi:10.1145/828.1884 , journal =

work page doi:10.1145/828.1884

[9] [9]

Theoretical Computer Science , volume=

The level ancestor problem simplified , author=. Theoretical Computer Science , volume=. 2004 , publisher=

work page 2004

[10] [10]

Journal of computer and System Sciences , volume=

Finding level-ancestors in trees , author=. Journal of computer and System Sciences , volume=. 1994 , publisher=

work page 1994

[11] [11]

Workshop on Algorithms and Data Structures , pages=

Finding level-ancestors in dynamic trees , author=. Workshop on Algorithms and Data Structures , pages=. 1991 , organization=

work page 1991

[12] [12]

SIAM Journal on Computing , volume=

Recursive star-tree parallel data structure , author=. SIAM Journal on Computing , volume=. 1993 , publisher=

work page 1993

[13] [13]

Latin American Symposium on Theoretical Informatics , pages=

The LCA problem revisited , author=. Latin American Symposium on Theoretical Informatics , pages=. 2000 , organization=

work page 2000

[14] [14]

SIAM Journal on Computing , volume=

Filtering search: A new approach to query-answering , author=. SIAM Journal on Computing , volume=. 1986 , publisher=

work page 1986

[15] [15]

Array Range Queries

Skala, Matthew. Array Range Queries. Space-Efficient Data Structures, Streams, and Algorithms: Papers in Honor of J. Ian Munro on the Occasion of His 66th Birthday. 2013. doi:10.1007/978-3-642-40273-9_21

work page doi:10.1007/978-3-642-40273-9_21 2013