Faster Linear-Space Data Structures for Path Frequency Queries
Pith reviewed 2026-05-10 03:19 UTC · model grok-4.3
The pith
Linear-space data structures answer path mode and least frequent element queries on trees in O(sqrt(n/w)) time after O(n sqrt(n w)) preprocessing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present the first linear-space data structures, requiring O(n sqrt(n w)) preprocessing time, that can answer path mode and path least frequent element queries in O(sqrt(n/w)) time. For the path alpha-minority problem, where alpha is specified at query time, we reduce the query time of the linear-space data structure from O(alpha^{-1} log log n) down to O(alpha^{-1}) by employing a simple randomized algorithm with a success probability at least 1/2. We also present the first linear-space data structure supporting Path Maximum g-value Color queries in O(sqrt(n/w)) time, requiring O(n sqrt(n w)) preprocessing time.
What carries the argument
A general framework for Path Maximum g-value Color queries that reduces both path mode and path least frequent element problems to it.
If this is right
- Path mode queries on trees can be answered in linear space with O(sqrt(n/w)) query time.
- Path least frequent element queries achieve the same O(sqrt(n/w)) query time and O(n sqrt(n w)) preprocessing.
- Path alpha-minority queries run in O(1/alpha) time using linear space and a randomized procedure.
- Path Maximum g-value Color queries receive the first linear-space solution with O(sqrt(n/w)) query time.
Where Pith is reading between the lines
- The improved bounds may support repeated path statistics in large hierarchical datasets such as file systems or biological trees.
- The randomization step for alpha-minority queries could be repeated a constant number of times to reach high probability without changing the asymptotic time.
- Similar space-time tradeoffs might extend to labeled graphs or dynamic tree settings where paths are updated.
- The dependence on word size w suggests that bit-parallel techniques are central to beating the log log n factor.
Load-bearing premise
The word-RAM model with word size w at least log n bits holds, and randomized algorithms for alpha-minority queries succeed with probability at least 1/2.
What would settle it
A concrete tree of size n together with a set of paths where any linear-space structure requires more than O(sqrt(n/w)) time to answer all path mode queries would disprove the claim.
read the original abstract
We present linear-space data structures for several frequency queries on trees, namely: path mode, path least frequent element, and path $\alpha$-minority queries. We present the first linear-space data structures, requiring $O(n \sqrt{nw})$ preprocessing time, that can answer path mode and path least frequent element queries in $O(\sqrt{n/w})$ time. This improves upon the best previously known bound of $O(\log\log n \sqrt{n/w})$ achieved by Durocher et al. in 2016. For the path $\alpha$-minority problem, where $\alpha$ is specified at query time, we reduce the query time of the linear-space data structure of Durocher et al. from $O(\alpha^{-1}\log\log n)$ down to $O(\alpha^{-1})$ by employing a simple randomized algorithm with a success probability $\geq 1/2$. We also present the first linear-space data structure supporting "Path Maximum $g$-value Color" queries in $O(\sqrt{n/w})$ time, requiring $O(n \sqrt{nw})$ preprocessing time. This general framework encapsulates both path mode and path least frequent element queries. For our data structures, we consider the word-RAM model with $w\in \Omega(\log n)$, where $w$ is the word size in bits.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents the first linear-space data structures for path mode and path least frequent element queries on trees, achieving O(n √(n w)) preprocessing time and O(√(n/w)) query time in the word-RAM model with w ∈ Ω(log n). This improves the prior O(log log n √(n/w)) bound of Durocher et al. (2016). It further reduces query time for path α-minority queries (α given at query time) from O(α^{-1} log log n) to O(α^{-1}) via a simple randomized algorithm with success probability ≥ 1/2, and introduces a general framework supporting path maximum g-value color queries with the same bounds.
Significance. If the stated bounds and constructions hold, the work advances the state of the art for frequency-based path queries on trees by removing the log log n factor from query time while preserving strict linear space. The general framework that unifies mode, least-frequent, and g-value color queries is a methodological strength, and the explicit word-RAM parameterization with w makes the results directly comparable to prior work. These improvements are relevant to applications such as XML databases and network analysis where path aggregates must be answered quickly on large trees.
minor comments (3)
- The preprocessing bound is written as O(n √(n w)); a parenthesized form O(n √(n w)) or explicit definition of the product inside the square root would eliminate any parsing ambiguity in the abstract and introduction.
- The randomized α-minority algorithm is stated to succeed with probability ≥ 1/2; the manuscript should clarify whether a constant number of independent repetitions (to boost to high probability) affects the O(α^{-1}) query bound or is handled separately in the analysis.
- The abstract and introduction would benefit from a one-sentence statement of the input representation assumed for the tree (e.g., adjacency-list or Euler-tour) and the precise definition of the g-value in the maximum g-value color problem.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work and the recommendation for minor revision. We appreciate the recognition that our results advance the state of the art by removing the log log n factor from the query time for path mode and least-frequent-element queries while maintaining linear space, as well as the value placed on the general g-value color framework.
Circularity Check
No significant circularity; algorithmic bounds derived independently
full rationale
The paper's core claims consist of explicit data structure constructions in the word-RAM model, with preprocessing time O(n sqrt(n w)) and query time O(sqrt(n/w)) obtained via tree decompositions, heavy-light techniques, and table lookups whose analysis is independent of any fitted parameters or self-referential definitions. The improvement over Durocher et al. (2016) is stated as a direct comparison of concrete bounds rather than a reduction to prior self-citations. The randomized alpha-minority result is isolated with an explicit success probability of at least 1/2 and does not underpin the deterministic mode/least-frequent bounds. No step equates a derived quantity to its own input by construction, renames a known empirical pattern, or imports uniqueness via author-overlapping citations.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Word-RAM model with word size w satisfying w = Omega(log n)
Reference graph
Works this paper leans on
-
[1]
Theory of Computing Systems , volume=
Linear-space data structures for range mode query in arrays , author=. Theory of Computing Systems , volume=. 2014 , publisher=
work page 2014
-
[2]
Scandinavian Workshop on Algorithm Theory , pages=
Linear-space data structures for range minority query in arrays , author=. Scandinavian Workshop on Algorithm Theory , pages=. 2012 , organization=
work page 2012
-
[3]
Linear-space data structures for range frequency queries on arrays and trees , author=. Algorithmica , volume=. 2016 , publisher=
work page 2016
-
[4]
Resizable arrays in optimal time and space , author=. Algorithms and Data Structures: 6th International Workshop, WADS’99 Vancouver, Canada, August 11--14, 1999 Proceedings 6 , pages=. 1999 , organization=
work page 1999
-
[5]
Nordic Journal of Computing , volume=
Range mode and range median queries on lists and trees , author=. Nordic Journal of Computing , volume=. 2005 , publisher=
work page 2005
-
[6]
Efficient minimal perfect hashing in nearly minimal space , author=. STACS 2001: 18th Annual Symposium on Theoretical Aspects of Computer Science Dresden, Germany, February 15--17, 2001 Proceedings 18 , pages=. 2001 , organization=
work page 2001
-
[7]
European Symposium on Algorithms , pages=
Hash, displace, and compress , author=. European Symposium on Algorithms , pages=. 2009 , organization=
work page 2009
-
[8]
Fredman, János Komlós, and Endre Szemerédi
Fredman, Michael L. and Koml\'. Storing a Sparse Table with 0(1) Worst Case Access Time , year =. doi:10.1145/828.1884 , journal =
-
[9]
Theoretical Computer Science , volume=
The level ancestor problem simplified , author=. Theoretical Computer Science , volume=. 2004 , publisher=
work page 2004
-
[10]
Journal of computer and System Sciences , volume=
Finding level-ancestors in trees , author=. Journal of computer and System Sciences , volume=. 1994 , publisher=
work page 1994
-
[11]
Workshop on Algorithms and Data Structures , pages=
Finding level-ancestors in dynamic trees , author=. Workshop on Algorithms and Data Structures , pages=. 1991 , organization=
work page 1991
-
[12]
SIAM Journal on Computing , volume=
Recursive star-tree parallel data structure , author=. SIAM Journal on Computing , volume=. 1993 , publisher=
work page 1993
-
[13]
Latin American Symposium on Theoretical Informatics , pages=
The LCA problem revisited , author=. Latin American Symposium on Theoretical Informatics , pages=. 2000 , organization=
work page 2000
-
[14]
SIAM Journal on Computing , volume=
Filtering search: A new approach to query-answering , author=. SIAM Journal on Computing , volume=. 1986 , publisher=
work page 1986
-
[15]
Skala, Matthew. Array Range Queries. Space-Efficient Data Structures, Streams, and Algorithms: Papers in Honor of J. Ian Munro on the Occasion of His 66th Birthday. 2013. doi:10.1007/978-3-642-40273-9_21
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.