Residual-Entropy Accounting for Routed Atom-Budgeted Learned Indexes

Faruk Alpay; Levent Sarioglu

arxiv: 2605.29061 · v1 · pith:IKR7BSM2new · submitted 2026-05-27 · 💻 cs.DS · cs.DB

Residual-Entropy Accounting for Routed Atom-Budgeted Learned Indexes

Faruk Alpay , Levent Sarioglu This is my paper

Pith reviewed 2026-06-29 09:03 UTC · model grok-4.3

classification 💻 cs.DS cs.DB

keywords learned indexesresidual entropyaccounting theorempredecessor searchrank searchcertified repairatom budgetpiecewise linear

0 comments

The pith

Conditional residual answer entropy determines the query-time scale in routed atom-budgeted learned indexes with certified repair.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines exact predecessor and rank search inside a routed architecture where a directory sends each query to a contiguous interval, a counted local predictor supplies a certified rank window, and exact repair finishes the answer by comparisons. It introduces conditional residual answer entropy as the uncertainty left in the exact answer once the leaf, predictor output, certificate, and charged pre-repair information are known. The central result is a two-sided accounting theorem that equates this entropy to the scale of query time under the given local predictor-atom budget. Directory space, sorted-array storage, and transcript-indexed repair-program space are kept as separate costs, so the theorem supplies an accounting relation rather than a single byte-level bound. The work also derives a rank-spread specialization and reports concrete RGapM and GapM values plus benchmarks for counted piecewise-linear segments on real data sets.

Core claim

We prove a two-sided accounting theorem showing that conditional residual answer entropy gives the query-time scale under the stated routed, atom-budgeted, certified-repair architecture and local predictor-atom budget, with directory space, sorted-array storage, and transcript-indexed repair-program space treated as separate system costs.

What carries the argument

The two-sided accounting theorem that equates conditional residual answer entropy to query-time scale.

If this is right

Query time is governed by the entropy remaining after the predictor transcript and certificate are observed.
For counted piecewise-linear segments the profile term becomes non-oracular through a shadow-price allocation rule.
Finite-instance RGapM and GapM values can be computed on real SOSD and Zenodo samples.
Benchmarks against PGM-index, RadixSpline, and binary search reveal overheads and bottlenecks rather than overall speed claims.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The entropy accounting could guide atom-budget allocation across multiple predictor types if the certificate and repair structure stays fixed.
Treating repair-program space as a distinct cost allows explicit trade-offs between precomputed repair transcripts and on-line comparisons.
The rank-spread specialization may apply when residual ranks after the transcript remain uniformly likely.

Load-bearing premise

The result holds only inside the architecture that routes via directory to contiguous intervals, uses counted local predictors for certified rank windows, resolves the rest by exact repair, and keeps the three space costs separate.

What would settle it

An implementation of the routed atom-budgeted certified-repair architecture in which the measured number of repair comparisons deviates systematically from the computed conditional residual answer entropy for the same atom budget.

read the original abstract

We study exact predecessor and rank search in a routed, atom-budgeted, certified-repair learned-index architecture. An ordered directory routes each query to a contiguous interval, a counted local predictor returns a certified rank window, and exact repair resolves the remaining uncertainty by comparisons. The result is scoped to this architecture and does not claim guarantees for arbitrary learned-index designs such as unconstrained RMI dispatch, hash routing, neural routing, or exact-payload indexes without additional accounting. The main parameter is conditional residual answer entropy: the entropy of the exact answer after the leaf, predictor output, certificate, and charged pre-repair information are observed. We prove a two-sided accounting theorem showing that this functional gives the query-time scale under the stated architecture and local predictor-atom budget. Directory space, sorted-array storage, and transcript-indexed repair-program space are treated as separate system costs, so the theorem is not a byte-level space lower bound or a compact implementation recipe. We also give a rank-spread specialization in which the radius term log(1 + Delta) is valid only when many residual ranks remain likely after the predictor transcript is known. For counted piecewise-linear segments, we make the profile term non-oracular, derive a shadow-price allocation rule, compute finite-instance RGapM and GapM values on real SOSD and Zenodo samples, and report benchmarks against PGM-index, RadixSpline, and binary search. The benchmarks expose overheads and bottlenecks rather than claiming speed for the shadow prototype.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A narrowly scoped entropy accounting theorem for one specific routed learned-index architecture.

read the letter

The paper introduces conditional residual answer entropy as a functional on the outputs of a routed, atom-budgeted learned index with directory routing, counted local predictor, certified rank window, and exact repair. It proves a two-sided accounting theorem that this functional gives the query-time scale inside that exact model, with directory, array, and repair-program space kept as separate costs. A rank-spread specialization and a shadow-price allocation rule for counted piecewise-linear segments are also derived, and some RGapM and GapM numbers are computed on real SOSD and Zenodo samples.

The new pieces are the functional itself and the theorem, plus the two specializations. The work stays honest about its limits and does not pretend to give general lower bounds or a ready-to-use index. The benchmarks are framed as exposing overheads rather than claiming wins, which matches the scoped goal.

The main limitation is the narrow architecture: everything rests on the directory routing to contiguous intervals, the counted predictor, the certificate, and the exact repair step. Remove any of those and the accounting does not apply. The abstract states the theorem but shows no derivation steps, so the soundness cannot be checked from the given material. The result is architecture-dependent by design, so it does not reduce to a parameter-free claim.

This is for researchers already working on entropy or cost models for learned indexes who want to see the accounting applied inside one concrete routed setup. A reader outside that niche will not find a new algorithm or a broad result.

I would send it to peer review so the full derivations can be examined.

Referee Report

2 major / 3 minor

Summary. The manuscript studies exact predecessor and rank search in a routed, atom-budgeted, certified-repair learned-index architecture consisting of an ordered directory, counted local predictor returning a certified rank window, and exact repair. The central claim is a two-sided accounting theorem equating conditional residual answer entropy (entropy of the exact answer after observing the leaf, predictor output, certificate, and charged pre-repair information) to query-time scale under the local predictor-atom budget. Directory, sorted-array, and transcript-indexed repair-program space are treated as separate costs. Additional contributions include a rank-spread specialization (with radius term log(1 + Delta) valid only when many residual ranks remain likely), a shadow-price allocation rule making the profile term non-oracular for counted piecewise-linear segments, finite-instance RGapM and GapM computations on SOSD and Zenodo samples, and benchmarks against PGM-index, RadixSpline, and binary search that expose overheads rather than claiming superiority.

Significance. If the scoped theorem holds, it supplies a precise, architecture-specific identity relating residual entropy to query time while cleanly separating distinct space costs; this is a useful theoretical tool for analyzing this class of learned indexes. The finite-instance empirical values and benchmark exposure of bottlenecks provide concrete data points on practical overheads within the stated model.

major comments (2)

[Abstract / empirical evaluation] Abstract and empirical section: The reported finite-instance RGapM and GapM values on SOSD and Zenodo samples are presented without error bars, confidence intervals, sample-size justification, or explicit exclusion criteria; because these values are used to illustrate the accounting in concrete cases, the absence of statistical characterization weakens the empirical support for the theorem's practical relevance.
[Theorem on residual-entropy accounting] Theorem statement (two-sided accounting): The derivation that conditional residual answer entropy exactly equals query-time scale must be shown to hold without introducing architecture-dependent fitted parameters beyond those already declared; if the proof relies on the counted local predictor and certified rank window, the steps establishing the two-sided bound should be expanded to confirm no circular reduction to the functional itself occurs.

minor comments (3)

[Rank-spread specialization] The rank-spread specialization should include a precise, quantitative definition of the phrase 'many residual ranks remain likely' to make the validity condition for log(1 + Delta) operational rather than informal.
[Shadow-price rule] The shadow-price allocation rule for piecewise-linear segments is presented as making the profile term non-oracular; a short worked example on a small instance would clarify how the rule is applied in the finite-instance computations.
[Benchmarks] The benchmarks section would benefit from an explicit statement of the machine model and cost model used for the reported query times to ensure reproducibility of the overhead comparisons.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and the recommendation of minor revision. We address each major comment below.

read point-by-point responses

Referee: [Abstract / empirical evaluation] Abstract and empirical section: The reported finite-instance RGapM and GapM values on SOSD and Zenodo samples are presented without error bars, confidence intervals, sample-size justification, or explicit exclusion criteria; because these values are used to illustrate the accounting in concrete cases, the absence of statistical characterization weakens the empirical support for the theorem's practical relevance.

Authors: We agree that the empirical section would benefit from statistical characterization. The reported values were provided as concrete illustrations of the accounting theorem on real data rather than as a comprehensive statistical evaluation. In the revised manuscript we will add bootstrap-derived 95% confidence intervals for the RGapM and GapM figures, state the exact sample sizes drawn from each dataset, and document any exclusion criteria applied. revision: yes
Referee: [Theorem on residual-entropy accounting] Theorem statement (two-sided accounting): The derivation that conditional residual answer entropy exactly equals query-time scale must be shown to hold without introducing architecture-dependent fitted parameters beyond those already declared; if the proof relies on the counted local predictor and certified rank window, the steps establishing the two-sided bound should be expanded to confirm no circular reduction to the functional itself occurs.

Authors: The two-sided accounting is derived directly from the definition of conditional residual answer entropy together with the atom-budget constraint on the counted local predictor and the certified rank window; no additional fitted parameters are introduced. We nevertheless accept that the proof presentation can be strengthened. In the revision we will expand the derivation steps (in the main text or a dedicated appendix) to make explicit the sequence from the predictor transcript and certificate to the entropy bound, confirming that the argument does not reduce circularly to the functional being bounded. revision: yes

Circularity Check

0 steps flagged

No significant circularity; architecture-scoped identity is self-contained

full rationale

The central result is a two-sided accounting theorem scoped explicitly to the routed directory + counted local predictor + certified rank window + exact repair model, with directory/array/repair-program space treated as separate costs. The theorem equates the defined conditional residual answer entropy functional (entropy of exact answer after observing leaf, predictor output, certificate, and charged pre-repair information) to query-time scale under the local predictor-atom budget. No equations, self-citations, or derivations are exhibited in the provided material that reduce this identity to a fitted parameter renamed as prediction, a self-definitional loop, or a load-bearing self-citation chain. The rank-spread specialization and shadow-price rule are presented as further derivations inside the same scoped model. The result is therefore architecture-dependent by explicit statement rather than circular by construction. Benchmarks against external indexes are reported without claiming the theorem itself depends on them.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Central claim rests on the definition of conditional residual answer entropy and the architectural constraints listed in the abstract; no free parameters are fitted to data in the abstract description, and no new entities are postulated.

free parameters (1)

conditional residual answer entropy
Defined as entropy of exact answer after leaf, predictor output, certificate, and pre-repair information; serves as the main parameter for query-time scale.

axioms (1)

domain assumption Entropy functional is an appropriate measure of residual uncertainty for query cost
Invoked as the quantity that gives the query-time scale under the architecture.

pith-pipeline@v0.9.1-grok · 5799 in / 1254 out tokens · 35986 ms · 2026-06-29T09:03:11.720969+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

52 extracted references · 21 canonical work pages

[1]

Peyman Afshani, Jérémie Barbay, and Timothy M. Chan. Instance-optimal geometric algorithms.Journal of the ACM, 64(1):4:1–4:38, 2017

2017
[2]

A lower bound for finding predecessors in Yao’s cell probe model.Combina- torica, 8(3):235–247, 1988

Miklós Ajtai. A lower bound for finding predecessors in Yao’s cell probe model.Combina- torica, 8(3):235–247, 1988

1988
[3]

BF-Tree: Approximate tree indexing

Manos Athanassoulis and Anastasia Ailamaki. BF-Tree: Approximate tree indexing. Proceedings of the VLDB Endowment, 7(14):1881–1892, 2014

2014
[4]

Paul Beame and Faith E. Fich. Optimal bounds for the predecessor problem and related problems.Journal of Computer and System Sciences, 65(1):38–72, 2002

2002
[5]

Bent, Daniel D

Stephen W. Bent, Daniel D. Sleator, and Robert E. Tarjan. Biased search trees.SIAM Journal on Computing, 14(3):545–568, 1985

1985
[6]

A learned approach to design compressed rank/select data structures.ACM Transactions on Algorithms, 18(3):24:1–24:28, 2022

Antonio Boffa, Paolo Ferragina, and Giorgio Vinciguerra. A learned approach to design compressed rank/select data structures.ACM Transactions on Algorithms, 18(3):24:1–24:28, 2022

2022
[7]

Beyond logarithmic bounds: querying in constant expected time with learned indexes

Luis Alberto Croquevielle, Guang Yang, Liang Liang, Ali Hadian, and Thomas Heinis. Beyond logarithmic bounds: querying in constant expected time with learned indexes. In 28th International Conference on Database Theory (ICDT 2025), LIPIcs 328, 19:1–19:21,

2025
[8]

doi: 10.4230/LIPIcs.ICDT.2025.19

work page doi:10.4230/lipics.icdt.2025.19 2025
[9]

Lower bounds for the algorithmic complexity of learned indexes

Luis Alberto Croquevielle, Roman Sokolovskii, and Thomas Heinis. Lower bounds for the algorithmic complexity of learned indexes. In29th International Conference on Database Theory (ICDT 2026), LIPIcs 365, 14:1–14:21, 2026. doi: 10.4230/LIPIcs.ICDT.2026.14

work page doi:10.4230/lipics.icdt.2026.14 2026
[10]

Demaine, Thouis Jones, and Mihai Pătraşcu

Erik D. Demaine, Thouis Jones, and Mihai Pătraşcu. Interpolation search for non- independent data. InSODA, pages 522–523, 2004

2004
[11]

Martin Dietzfelbinger, Anna Karlin, Kurt Mehlhorn, Friedhelm Meyer auf der Heide, Hans Rohnert, and Robert E. Tarjan. Dynamic perfect hashing: upper and lower bounds.SIAM Journal on Computing, 23(4):738–761, 1994

1994
[12]

ALEX: An updatable adaptive learned index

Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, and Tim Kraska. ALEX: An updatable adaptive learned index. InProceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pages 969–984, 2020. doi: 10.1145/3318464.3389711

work page doi:10.1145/3318464.3389711 2020
[13]

Why are learned indexes so effective? InICML, PMLR 119, pages 3123–3132, 2020

Paolo Ferragina, Fabrizio Lillo, and Giorgio Vinciguerra. Why are learned indexes so effective? InICML, PMLR 119, pages 3123–3132, 2020. 42

2020
[14]

The PGM-index: A fully-dynamic compressed learned index with provable worst-case bounds.Proceedings of the VLDB Endowment, 13(8):1162–1175, 2020

Paolo Ferragina and Giorgio Vinciguerra. The PGM-index: A fully-dynamic compressed learned index with provable worst-case bounds.Proceedings of the VLDB Endowment, 13(8):1162–1175, 2020. doi: 10.14778/3389133.3389135

work page doi:10.14778/3389133.3389135 2020
[15]

FL-RMQ: A learned approach to range minimum queries

Paolo Ferragina and Filippo Lari. FL-RMQ: A learned approach to range minimum queries. InCPM 2025, LIPIcs 331, 7:1–7:23, 2025. doi: 10.4230/LIPIcs.CPM.2025.7

work page doi:10.4230/lipics.cpm.2025.7 2025
[16]

Fredman, János Komlós, and Endre Szemerédi

Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table withO(1) worst case access time.Journal of the ACM, 31(3):538–544, 1984

1984
[17]

Fredman and Michael E

Michael L. Fredman and Michael E. Saks. The cell probe complexity of dynamic data structures. InProceedings of the 21st Annual ACM Symposium on Theory of Computing (STOC), pages 345–354, 1989. doi: 10.1145/73007.73040

work page doi:10.1145/73007.73040 1989
[18]

A dynamic piecewise-linear geometric index with worst-case guarantees

Emil Toftegaard Gæde, Ivor van der Hoog, Eva Rotenberg, and Tord Stordalen. A dynamic piecewise-linear geometric index with worst-case guarantees. In33rd Annual European Symposium on Algorithms (ESA 2025), LIPIcs 351, 64:1–64:18, 2025. doi: 10.4230/LIPIcs.ESA.2025.64

work page doi:10.4230/lipics.esa.2025.64 2025
[19]

FITing-Tree: A data-aware index structure

Alexandros Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, Tim Kraska, and Umar Farooq Minhas. FITing-Tree: A data-aware index structure. InProceedings of the 2019 ACM SIGMOD International Conference on Management of Data, pages 1189–1206,

2019
[20]

doi: 10.1145/3299869.3319860

work page doi:10.1145/3299869.3319860
[21]

Computational-geometric methods for polygonal approximations of a curve.Computer Vision, Graphics, and Image Processing, 36(1):31–41, 1986

Hiroshi Imai and Masao Iri. Computational-geometric methods for polygonal approximations of a curve.Computer Vision, Graphics, and Image Processing, 36(1):31–41, 1986. doi: 10.1016/S0734-189X(86)80027-5

work page doi:10.1016/s0734-189x(86)80027-5 1986
[22]

PhD thesis, Carnegie Mellon University, 1989

Guy Jacobson.Succinct Static Data Structures. PhD thesis, Carnegie Mellon University, 1989

1989
[23]

RadixSpline: A single-pass learned index

Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. RadixSpline: A single-pass learned index. InProceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM), pages 5:1–5:5, 2020. doi: 10.1145/3401071.3401659

work page doi:10.1145/3401071.3401659 2020
[24]

Donald E. Knuth. Optimum binary search trees.Acta Informatica, 1:14–25, 1971

1971
[25]

Chi, Jeffrey Dean, and Neoklis Polyzotis

Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. The case for learned index structures. InProceedings of the 2018 International Conference on Management of Data (SIGMOD), pages 489–504, 2018. doi: 10.1145/3183713.3196909

work page doi:10.1145/3183713.3196909 2018
[26]

Shane Culpepper, and Renata Borovica-Gajic

Hai Lan, Zhifeng Bao, J. Shane Culpepper, and Renata Borovica-Gajic. Updatable learned indexes meet disk-resident DBMS: from evaluations to design choices. InProceedings of the ACM SIGMOD International Conference on Management of Data, 2023. doi: 10.1145/3589284

work page doi:10.1145/3589284 2023
[27]

The adaptive radix tree: ARTful indexing for main-memory databases

Viktor Leis, Alfons Kemper, and Thomas Neumann. The adaptive radix tree: ARTful indexing for main-memory databases. InICDE, pages 38–49, 2013

2013
[28]

Why are learned indexes so effective but sometimes ineffective?Proceedings of the VLDB Endowment, 18(9):2886–2898, 2025

Qiyu Liu, Siyuan Han, Yanlin Qi, and others. Why are learned indexes so effective but sometimes ineffective?Proceedings of the VLDB Endowment, 18(9):2886–2898, 2025. doi: 10.14778/3746405.3746415

work page doi:10.14778/3746405.3746415 2025
[29]

Understanding robustness issues of updatable learned indexes: experiments and analysis.Proceedings of the ACM on Management of Data, 3(4):270, 2025

Yuanhui Luo, Minhui Xie, Yiheng Tong, Shichao Jiang, and Yunpeng Chai. Understanding robustness issues of updatable learned indexes: experiments and analysis.Proceedings of the ACM on Management of Data, 3(4):270, 2025. doi: 10.1145/3749188. 43

work page doi:10.1145/3749188 2025
[30]

A critical analysis of recursive model indexes.Proceedings of the VLDB Endowment, 15(5):1079–1091, 2022

Marcel Maltry and Jens Dittrich. A critical analysis of recursive model indexes.Proceedings of the VLDB Endowment, 15(5):1079–1091, 2022

2022
[31]

CDFShop: Exploring and optimizing learned index structures

Ryan Marcus, Emily Zhang, and Tim Kraska. CDFShop: Exploring and optimizing learned index structures. InProceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pages 2789–2792, 2020. doi: 10.1145/3318464.3384706

work page doi:10.1145/3318464.3384706 2020
[32]

Benchmarking learned indexes.Proceedings of the VLDB Endowment, 14(1):1–13, 2021

Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska. Benchmarking learned indexes.Proceedings of the VLDB Endowment, 14(1):1–13, 2021. doi: 10.14778/3421424.3421425

work page doi:10.14778/3421424.3421425 2021
[33]

Nearly optimal binary search trees.Acta Informatica, 5:287–295, 1975

Kurt Mehlhorn. Nearly optimal binary search trees.Acta Informatica, 5:287–295, 1975

1975
[34]

Cell probe complexity – a survey

Peter Bro Miltersen. Cell probe complexity – a survey. InAdvances in Data Structures Workshop, 1999

1999
[35]

Predecessor search.ACM Computing Surveys, 53(5):105:1–105:35, 2020

Gonzalo Navarro and Matías Rojas-Ledesma. Predecessor search.ACM Computing Surveys, 53(5):105:1–105:35, 2020

2020
[36]

Overmars.The Design of Dynamic Data Structures

Mark H. Overmars.The Design of Dynamic Data Structures. Lecture Notes in Computer Science 156. Springer, 1983

1983
[37]

Mihai Pătraşcu and Erik D. Demaine. Logarithmic lower bounds in the cell-probe model. SIAM Journal on Computing, 35(4):932–963, 2006

2006
[38]

Time-space trade-offs for predecessor search

Mihai Pătraşcu and Mikkel Thorup. Time-space trade-offs for predecessor search. InSTOC, pages 232–240, 2006

2006
[39]

Randomization does not help searching predecessors

Mihai Pătraşcu and Mikkel Thorup. Randomization does not help searching predecessors. InSODA, pages 555–564, 2007

2007
[40]

Cell-probe lower bounds for succinct partial sums

Mihai Pătraşcu and Emanuele Viola. Cell-probe lower bounds for succinct partial sums. In SODA, pages 117–122, 2010

2010
[41]

Srinivasa Rao

Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Succinct indexable dictionaries with applications to encodingk-ary trees and multisets.ACM Transactions on Algorithms, 3(4):43, 2007

2007
[42]

Jun Rao and Kenneth A. Ross. Making B+-trees cache conscious in main memory. In SIGMOD, pages 475–486, 2000

2000
[43]

Sleator and Robert E

Daniel D. Sleator and Robert E. Tarjan. Self-adjusting binary search trees.Journal of the ACM, 32(3):652–686, 1985

1985
[44]

Zhang, J

Lorenzo Bellomo. Sorted unsigned integer datasets. Zenodo, 2025. doi: 10.5281/zen- odo.15240501

work page doi:10.5281/zen- 2025
[45]

Kaas, and E

Peter van Emde Boas, R. Kaas, and E. Zijlstra. Design and implementation of an efficient priority queue.Mathematical Systems Theory, 10:99–127, 1977

1977
[46]

Are updatable learned indexes ready?Proceedings of the VLDB Endowment, 15(11):3004–3017, 2022

Chaichon Wongkham, Baotong Lu, Chris Liu, Zhicong Zhong, Eric Lo, and Tianzheng Wang. Are updatable learned indexes ready?Proceedings of the VLDB Endowment, 15(11):3004–3017, 2022. doi: 10.14778/3551793.3551848

work page doi:10.14778/3551793.3551848 2022
[47]

Updatable learned index with precise positions.Proceedings of the VLDB Endowment, 14(8):1276–1288,

JiachengWu, YongZhang, ShiminChen, JinWang, YuChen, andChunxiaoXing. Updatable learned index with precise positions.Proceedings of the VLDB Endowment, 14(8):1276–1288,
[48]

doi: 10.14778/3457390.3457393. 44

work page doi:10.14778/3457390.3457393
[49]

Kornaropoulos, and Yue Cheng

Rui Yang, Evgenios M. Kornaropoulos, and Yue Cheng. Algorithmic complexity attacks on dynamic learned indexes.Proceedings of the VLDB Endowment, 17(4):780–792, 2024. doi: 10.14778/3636218.3636232

work page doi:10.14778/3636218.3636232 2024
[50]

Andrew C.-C. Yao. Probabilistic computations: toward a unified measure of complexity. In FOCS, pages 222–227, 1977

1977
[51]

On distribution dependent sub-logarithmic query time of learned indexing

Sepanta Zeighami and Cyrus Shahabi. On distribution dependent sub-logarithmic query time of learned indexing. InICML, PMLR 202, pages 40669–40680, 2023

2023
[52]

Making in-memory learned indexes efficient on disk.Proceedings of the ACM on Management of Data, 2(3):151:1–151:27, 2024

Jiaoyi Zhang, Kai Su, and Huanchen Zhang. Making in-memory learned indexes efficient on disk.Proceedings of the ACM on Management of Data, 2(3):151:1–151:27, 2024. doi: 10.1145/3654954. 45

work page doi:10.1145/3654954 2024

[1] [1]

Peyman Afshani, Jérémie Barbay, and Timothy M. Chan. Instance-optimal geometric algorithms.Journal of the ACM, 64(1):4:1–4:38, 2017

2017

[2] [2]

A lower bound for finding predecessors in Yao’s cell probe model.Combina- torica, 8(3):235–247, 1988

Miklós Ajtai. A lower bound for finding predecessors in Yao’s cell probe model.Combina- torica, 8(3):235–247, 1988

1988

[3] [3]

BF-Tree: Approximate tree indexing

Manos Athanassoulis and Anastasia Ailamaki. BF-Tree: Approximate tree indexing. Proceedings of the VLDB Endowment, 7(14):1881–1892, 2014

2014

[4] [4]

Paul Beame and Faith E. Fich. Optimal bounds for the predecessor problem and related problems.Journal of Computer and System Sciences, 65(1):38–72, 2002

2002

[5] [5]

Bent, Daniel D

Stephen W. Bent, Daniel D. Sleator, and Robert E. Tarjan. Biased search trees.SIAM Journal on Computing, 14(3):545–568, 1985

1985

[6] [6]

A learned approach to design compressed rank/select data structures.ACM Transactions on Algorithms, 18(3):24:1–24:28, 2022

Antonio Boffa, Paolo Ferragina, and Giorgio Vinciguerra. A learned approach to design compressed rank/select data structures.ACM Transactions on Algorithms, 18(3):24:1–24:28, 2022

2022

[7] [7]

Beyond logarithmic bounds: querying in constant expected time with learned indexes

Luis Alberto Croquevielle, Guang Yang, Liang Liang, Ali Hadian, and Thomas Heinis. Beyond logarithmic bounds: querying in constant expected time with learned indexes. In 28th International Conference on Database Theory (ICDT 2025), LIPIcs 328, 19:1–19:21,

2025

[8] [8]

doi: 10.4230/LIPIcs.ICDT.2025.19

work page doi:10.4230/lipics.icdt.2025.19 2025

[9] [9]

Lower bounds for the algorithmic complexity of learned indexes

Luis Alberto Croquevielle, Roman Sokolovskii, and Thomas Heinis. Lower bounds for the algorithmic complexity of learned indexes. In29th International Conference on Database Theory (ICDT 2026), LIPIcs 365, 14:1–14:21, 2026. doi: 10.4230/LIPIcs.ICDT.2026.14

work page doi:10.4230/lipics.icdt.2026.14 2026

[10] [10]

Demaine, Thouis Jones, and Mihai Pătraşcu

Erik D. Demaine, Thouis Jones, and Mihai Pătraşcu. Interpolation search for non- independent data. InSODA, pages 522–523, 2004

2004

[11] [11]

Martin Dietzfelbinger, Anna Karlin, Kurt Mehlhorn, Friedhelm Meyer auf der Heide, Hans Rohnert, and Robert E. Tarjan. Dynamic perfect hashing: upper and lower bounds.SIAM Journal on Computing, 23(4):738–761, 1994

1994

[12] [12]

ALEX: An updatable adaptive learned index

Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, and Tim Kraska. ALEX: An updatable adaptive learned index. InProceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pages 969–984, 2020. doi: 10.1145/3318464.3389711

work page doi:10.1145/3318464.3389711 2020

[13] [13]

Why are learned indexes so effective? InICML, PMLR 119, pages 3123–3132, 2020

Paolo Ferragina, Fabrizio Lillo, and Giorgio Vinciguerra. Why are learned indexes so effective? InICML, PMLR 119, pages 3123–3132, 2020. 42

2020

[14] [14]

The PGM-index: A fully-dynamic compressed learned index with provable worst-case bounds.Proceedings of the VLDB Endowment, 13(8):1162–1175, 2020

Paolo Ferragina and Giorgio Vinciguerra. The PGM-index: A fully-dynamic compressed learned index with provable worst-case bounds.Proceedings of the VLDB Endowment, 13(8):1162–1175, 2020. doi: 10.14778/3389133.3389135

work page doi:10.14778/3389133.3389135 2020

[15] [15]

FL-RMQ: A learned approach to range minimum queries

Paolo Ferragina and Filippo Lari. FL-RMQ: A learned approach to range minimum queries. InCPM 2025, LIPIcs 331, 7:1–7:23, 2025. doi: 10.4230/LIPIcs.CPM.2025.7

work page doi:10.4230/lipics.cpm.2025.7 2025

[16] [16]

Fredman, János Komlós, and Endre Szemerédi

Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table withO(1) worst case access time.Journal of the ACM, 31(3):538–544, 1984

1984

[17] [17]

Fredman and Michael E

Michael L. Fredman and Michael E. Saks. The cell probe complexity of dynamic data structures. InProceedings of the 21st Annual ACM Symposium on Theory of Computing (STOC), pages 345–354, 1989. doi: 10.1145/73007.73040

work page doi:10.1145/73007.73040 1989

[18] [18]

A dynamic piecewise-linear geometric index with worst-case guarantees

Emil Toftegaard Gæde, Ivor van der Hoog, Eva Rotenberg, and Tord Stordalen. A dynamic piecewise-linear geometric index with worst-case guarantees. In33rd Annual European Symposium on Algorithms (ESA 2025), LIPIcs 351, 64:1–64:18, 2025. doi: 10.4230/LIPIcs.ESA.2025.64

work page doi:10.4230/lipics.esa.2025.64 2025

[19] [19]

FITing-Tree: A data-aware index structure

Alexandros Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, Tim Kraska, and Umar Farooq Minhas. FITing-Tree: A data-aware index structure. InProceedings of the 2019 ACM SIGMOD International Conference on Management of Data, pages 1189–1206,

2019

[20] [20]

doi: 10.1145/3299869.3319860

work page doi:10.1145/3299869.3319860

[21] [21]

Computational-geometric methods for polygonal approximations of a curve.Computer Vision, Graphics, and Image Processing, 36(1):31–41, 1986

Hiroshi Imai and Masao Iri. Computational-geometric methods for polygonal approximations of a curve.Computer Vision, Graphics, and Image Processing, 36(1):31–41, 1986. doi: 10.1016/S0734-189X(86)80027-5

work page doi:10.1016/s0734-189x(86)80027-5 1986

[22] [22]

PhD thesis, Carnegie Mellon University, 1989

Guy Jacobson.Succinct Static Data Structures. PhD thesis, Carnegie Mellon University, 1989

1989

[23] [23]

RadixSpline: A single-pass learned index

Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. RadixSpline: A single-pass learned index. InProceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM), pages 5:1–5:5, 2020. doi: 10.1145/3401071.3401659

work page doi:10.1145/3401071.3401659 2020

[24] [24]

Donald E. Knuth. Optimum binary search trees.Acta Informatica, 1:14–25, 1971

1971

[25] [25]

Chi, Jeffrey Dean, and Neoklis Polyzotis

Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. The case for learned index structures. InProceedings of the 2018 International Conference on Management of Data (SIGMOD), pages 489–504, 2018. doi: 10.1145/3183713.3196909

work page doi:10.1145/3183713.3196909 2018

[26] [26]

Shane Culpepper, and Renata Borovica-Gajic

Hai Lan, Zhifeng Bao, J. Shane Culpepper, and Renata Borovica-Gajic. Updatable learned indexes meet disk-resident DBMS: from evaluations to design choices. InProceedings of the ACM SIGMOD International Conference on Management of Data, 2023. doi: 10.1145/3589284

work page doi:10.1145/3589284 2023

[27] [27]

The adaptive radix tree: ARTful indexing for main-memory databases

Viktor Leis, Alfons Kemper, and Thomas Neumann. The adaptive radix tree: ARTful indexing for main-memory databases. InICDE, pages 38–49, 2013

2013

[28] [28]

Why are learned indexes so effective but sometimes ineffective?Proceedings of the VLDB Endowment, 18(9):2886–2898, 2025

Qiyu Liu, Siyuan Han, Yanlin Qi, and others. Why are learned indexes so effective but sometimes ineffective?Proceedings of the VLDB Endowment, 18(9):2886–2898, 2025. doi: 10.14778/3746405.3746415

work page doi:10.14778/3746405.3746415 2025

[29] [29]

Understanding robustness issues of updatable learned indexes: experiments and analysis.Proceedings of the ACM on Management of Data, 3(4):270, 2025

Yuanhui Luo, Minhui Xie, Yiheng Tong, Shichao Jiang, and Yunpeng Chai. Understanding robustness issues of updatable learned indexes: experiments and analysis.Proceedings of the ACM on Management of Data, 3(4):270, 2025. doi: 10.1145/3749188. 43

work page doi:10.1145/3749188 2025

[30] [30]

A critical analysis of recursive model indexes.Proceedings of the VLDB Endowment, 15(5):1079–1091, 2022

Marcel Maltry and Jens Dittrich. A critical analysis of recursive model indexes.Proceedings of the VLDB Endowment, 15(5):1079–1091, 2022

2022

[31] [31]

CDFShop: Exploring and optimizing learned index structures

Ryan Marcus, Emily Zhang, and Tim Kraska. CDFShop: Exploring and optimizing learned index structures. InProceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pages 2789–2792, 2020. doi: 10.1145/3318464.3384706

work page doi:10.1145/3318464.3384706 2020

[32] [32]

Benchmarking learned indexes.Proceedings of the VLDB Endowment, 14(1):1–13, 2021

Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska. Benchmarking learned indexes.Proceedings of the VLDB Endowment, 14(1):1–13, 2021. doi: 10.14778/3421424.3421425

work page doi:10.14778/3421424.3421425 2021

[33] [33]

Nearly optimal binary search trees.Acta Informatica, 5:287–295, 1975

Kurt Mehlhorn. Nearly optimal binary search trees.Acta Informatica, 5:287–295, 1975

1975

[34] [34]

Cell probe complexity – a survey

Peter Bro Miltersen. Cell probe complexity – a survey. InAdvances in Data Structures Workshop, 1999

1999

[35] [35]

Predecessor search.ACM Computing Surveys, 53(5):105:1–105:35, 2020

Gonzalo Navarro and Matías Rojas-Ledesma. Predecessor search.ACM Computing Surveys, 53(5):105:1–105:35, 2020

2020

[36] [36]

Overmars.The Design of Dynamic Data Structures

Mark H. Overmars.The Design of Dynamic Data Structures. Lecture Notes in Computer Science 156. Springer, 1983

1983

[37] [37]

Mihai Pătraşcu and Erik D. Demaine. Logarithmic lower bounds in the cell-probe model. SIAM Journal on Computing, 35(4):932–963, 2006

2006

[38] [38]

Time-space trade-offs for predecessor search

Mihai Pătraşcu and Mikkel Thorup. Time-space trade-offs for predecessor search. InSTOC, pages 232–240, 2006

2006

[39] [39]

Randomization does not help searching predecessors

Mihai Pătraşcu and Mikkel Thorup. Randomization does not help searching predecessors. InSODA, pages 555–564, 2007

2007

[40] [40]

Cell-probe lower bounds for succinct partial sums

Mihai Pătraşcu and Emanuele Viola. Cell-probe lower bounds for succinct partial sums. In SODA, pages 117–122, 2010

2010

[41] [41]

Srinivasa Rao

Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Succinct indexable dictionaries with applications to encodingk-ary trees and multisets.ACM Transactions on Algorithms, 3(4):43, 2007

2007

[42] [42]

Jun Rao and Kenneth A. Ross. Making B+-trees cache conscious in main memory. In SIGMOD, pages 475–486, 2000

2000

[43] [43]

Sleator and Robert E

Daniel D. Sleator and Robert E. Tarjan. Self-adjusting binary search trees.Journal of the ACM, 32(3):652–686, 1985

1985

[44] [44]

Zhang, J

Lorenzo Bellomo. Sorted unsigned integer datasets. Zenodo, 2025. doi: 10.5281/zen- odo.15240501

work page doi:10.5281/zen- 2025

[45] [45]

Kaas, and E

Peter van Emde Boas, R. Kaas, and E. Zijlstra. Design and implementation of an efficient priority queue.Mathematical Systems Theory, 10:99–127, 1977

1977

[46] [46]

Are updatable learned indexes ready?Proceedings of the VLDB Endowment, 15(11):3004–3017, 2022

Chaichon Wongkham, Baotong Lu, Chris Liu, Zhicong Zhong, Eric Lo, and Tianzheng Wang. Are updatable learned indexes ready?Proceedings of the VLDB Endowment, 15(11):3004–3017, 2022. doi: 10.14778/3551793.3551848

work page doi:10.14778/3551793.3551848 2022

[47] [47]

Updatable learned index with precise positions.Proceedings of the VLDB Endowment, 14(8):1276–1288,

JiachengWu, YongZhang, ShiminChen, JinWang, YuChen, andChunxiaoXing. Updatable learned index with precise positions.Proceedings of the VLDB Endowment, 14(8):1276–1288,

[48] [48]

doi: 10.14778/3457390.3457393. 44

work page doi:10.14778/3457390.3457393

[49] [49]

Kornaropoulos, and Yue Cheng

Rui Yang, Evgenios M. Kornaropoulos, and Yue Cheng. Algorithmic complexity attacks on dynamic learned indexes.Proceedings of the VLDB Endowment, 17(4):780–792, 2024. doi: 10.14778/3636218.3636232

work page doi:10.14778/3636218.3636232 2024

[50] [50]

Andrew C.-C. Yao. Probabilistic computations: toward a unified measure of complexity. In FOCS, pages 222–227, 1977

1977

[51] [51]

On distribution dependent sub-logarithmic query time of learned indexing

Sepanta Zeighami and Cyrus Shahabi. On distribution dependent sub-logarithmic query time of learned indexing. InICML, PMLR 202, pages 40669–40680, 2023

2023

[52] [52]

Making in-memory learned indexes efficient on disk.Proceedings of the ACM on Management of Data, 2(3):151:1–151:27, 2024

Jiaoyi Zhang, Kai Su, and Huanchen Zhang. Making in-memory learned indexes efficient on disk.Proceedings of the ACM on Management of Data, 2(3):151:1–151:27, 2024. doi: 10.1145/3654954. 45

work page doi:10.1145/3654954 2024