pith. sign in

arxiv: 2605.31406 · v1 · pith:5PEVG73Enew · submitted 2026-05-29 · 💻 cs.DB

AIM: A practical approach to automated index management for SQL databases

Pith reviewed 2026-06-28 19:52 UTC · model grok-4.3

classification 💻 cs.DB
keywords automatic index managementSQL databasessecondary indexesworkload adaptationdatabase physical designperformance optimizationindex recommendation
0
0 comments X

The pith

AIM automatically recommends secondary indexes for SQL databases that adapt to workload changes with a no-regression guarantee.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AIM as a configurable system that identifies secondary indexes to improve CPU, I/O, and storage efficiency in SQL databases. It has been applied to thousands of production databases and allows the physical design to adjust automatically as workloads evolve. The design emphasizes fast convergence on wide composite indexes, lower dependence on the query optimizer, and a guarantee against performance regressions on live systems. Recommendations include metrics-based explanations to support verification of the changes.

Core claim

AIM is an industrial-strength index recommendation engine deployed at large scale that identifies impactful secondary indexes, converges quickly even on wide indexes, reduces optimizer reliance, and provides a no-regression guarantee, with the resulting physical design close to optimal on thousands of production databases.

What carries the argument

The AIM system, which combines workload monitoring with design choices for fast convergence on composite indexes and metrics-driven explanations for each recommendation.

If this is right

  • Database physical design adapts automatically to workload changes.
  • Index recommendations include metrics-driven explanations for easier verification.
  • The system scales to thousands of production databases.
  • Resulting designs are close to optimal while avoiding regressions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Manual index tuning by administrators could become less necessary if the system maintains its guarantees across diverse workloads.
  • The approach of pairing recommendations with explanations might extend to other automated database tuning tasks.
  • Large-scale validation suggests the system handles variability in real environments, though edge cases in query patterns could still require monitoring.

Load-bearing premise

That the described design choices can deliver fast convergence, reduced optimizer reliance, and no hidden regressions on real production workloads without extensive manual oversight.

What would settle it

A production workload where AIM-recommended indexes produce a measurable performance regression on at least one key query despite the no-regression claim.

Figures

Figures reproduced from arXiv: 2605.31406 by Mohamed Za\"it, Ritwik Yadav, Satyanarayana R. Valluri.

Figure 1
Figure 1. Figure 1: The algorithm can be thought of as having two [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Join graph for Q2 join predicates with at most j other table instances are selected. The candidate indexes for the selected table instances include all possibilities of join orders with respect to that table. The upper bound is exponential in j, since every participating table could either precede or succeed the selected table instance. Therefore, the value of j is usually a small positive integer. Althoug… view at source ↗
Figure 3
Figure 3. Figure 3: CPU utilization & throughput profiles before and after AIM execution [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Query processing costs for TPC￾H (scale factor 10) on PostgreSQL [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Effect of join parameter a pair of tables with three sub-predicates. It is possible that any combination of two sub-predicates is not selective enough but a combination of all three is highly selective. In such cases, greedy algorithms would not be able to detect efficient join orders for the query. Some machine learning algorithms [31] try to explore suboptimal intermediate configurations in a probabilist… view at source ↗
read the original abstract

This paper describes AIM (Automatic Index Manager), a configurable index management system, which identifies impactful secondary indexes for SQL databases to efficiently use available resources such as CPU, I/O and storage. It has been validated on thousands of databases which support production systems. With AIM, the physical design of the database adapts itself to the changes in the workload.We lay out the end to end design of AIM while calling out the guarantees and tradeoffs associated with our design choices. Some of the salient features of AIM include fast convergence even while recommending wide composite indexes, reduced reliance on the query optimizer and a "no regression" guarantee for production workloads. Each index recommendation from AIM is accompanied with a metrics driven explanation, making it easier to verify machine driven changes.AIM is one of the few industrial strength index recommendation engines that is deployed on production databases at a large scale. The experimental results show that AIM is quick in identifying the most effective indexes and the resulting physical design is close to optimal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper presents AIM (Automatic Index Manager), a configurable system for identifying impactful secondary indexes in SQL databases to optimize CPU, I/O, and storage usage. It claims validation on thousands of production databases, with the physical design adapting to workload changes. Key features include fast convergence on wide composite indexes, reduced reliance on the query optimizer, a no-regression guarantee, and metrics-driven explanations for each recommendation. The experimental results demonstrate that AIM quickly identifies effective indexes and achieves physical designs close to optimal.

Significance. If the claims hold, this is a significant contribution as it describes a deployed, industrial-strength automated index management system at large scale, which is uncommon. The explicit discussion of guarantees, trade-offs, and explainability makes it particularly useful for practitioners. The validation on real production workloads strengthens the practical impact.

minor comments (2)
  1. [Abstract] The abstract claims validation on thousands of databases and closeness to optimality without referencing the specific experimental setup or metrics; while the full paper provides this in the experimental section, a brief pointer would improve standalone readability.
  2. [Experimental Results] Ensure that all performance metrics in the experimental results include error bars or statistical significance tests to support the claims of near-optimality and quick identification.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation of our work on AIM and the recommendation for minor revision. The report does not raise any specific major comments or concerns about the manuscript content, claims, or experiments.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The manuscript is a systems paper describing an end-to-end index recommendation engine. It contains no equations, fitted parameters, predictions derived from subsets of data, or mathematical derivations. Claims rest on explicit design choices with stated trade-offs plus separate experimental metrics on production workloads. No self-citation chain, ansatz smuggling, or renaming of known results is load-bearing. The argument is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a systems description with no mathematical derivations, fitted constants, or new postulated entities; the central claims rest on engineering choices and empirical validation whose details are absent from the provided abstract.

pith-pipeline@v0.9.1-grok · 5701 in / 1070 out tokens · 20202 ms · 2026-06-28T19:52:40.094151+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

56 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1]

    Database tuning advisor for Microsoft SQL Server 2005,

    S. Agrawal, S. Chaudhuri, L. Kollar, A. Marathe, V . Narasayya, and M. Syamala, “Database tuning advisor for Microsoft SQL Server 2005,” inProceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 930–932, 2005

  2. [2]

    An online approach to physical design tun- ing,

    N. Bruno and S. Chaudhuri, “An online approach to physical design tun- ing,” in2007 IEEE 23rd International Conference on Data Engineering (ICDE), pp. 826–835, IEEE, 2007

  3. [3]

    An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server,

    S. Chaudhuri and V . R. Narasayya, “An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server,” inProceedings of the 23rd International Conference on Very Large Databases (VLDB), p. 146–155, 1997

  4. [4]

    AutoAdmin “What-If

    S. Chaudhuri and V . Narasayya, “AutoAdmin “What-If” Index Analysis Utility,” inProceedings of the 1998 ACM SIGMOD International Conference on Management of Data, p. 367–378, 1998

  5. [5]

    Self-Tuning Database Systems: A Decade of Progress,

    S. Chaudhuri and V . Narasayya, “Self-Tuning Database Systems: A Decade of Progress,” inProceedings of the 33rd International Con- ference on Very Large Databases (VLDB), pp. 3–14, 2007

  6. [6]

    Automatic SQL tuning in Oracle 10g,

    B. Dageville, D. Das, K. Dias, K. Yagoub, M. Zait, and M. Ziauddin, “Automatic SQL tuning in Oracle 10g,” inProceedings of the 30th International Conference on Very Large Databases (VLDB), pp. 1098– 1109, 2004

  7. [7]

    Db2 advisor: An optimizer smart enough to recommend its own indexes,

    G. Valentin, M. Zuliani, D. C. Zilio, G. Lohman, and A. Skelley, “Db2 advisor: An optimizer smart enough to recommend its own indexes,” in 2000 IEEE 16th International Conference on Data Engineering (ICDE), pp. 101–110, IEEE, 2000

  8. [8]

    DB2 Design Advisor: Integrated Automatic Physical Database Design,

    D. C. Zilio, J. Rao, S. Lightstone, G. Lohman, A. Storm, C. Garcia- Arellano, and S. Fadden, “DB2 Design Advisor: Integrated Automatic Physical Database Design,” inProceedings of the 30th International Conference on Very Large Databases (VLDB), pp. 1087–1097, 2004

  9. [9]

    Index selection for databases: A hardness study and a principled heuristic solution,

    S. Chaudhuri, M. Datar, and V . Narasayya, “Index selection for databases: A hardness study and a principled heuristic solution,”IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 16, no. 11, pp. 1313–1323, 2004

  10. [10]

    How good are query optimizers, really?,

    V . Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neu- mann, “How good are query optimizers, really?,”Proceedings of the VLDB Endowment, vol. 9, no. 3, pp. 204–215, 2015

  11. [11]

    Efficient scalable multi- attribute index selection using recursive strategies,

    R. Schlosser, J. Kossmann, and M. Boissier, “Efficient scalable multi- attribute index selection using recursive strategies,” in2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 1238–1249, IEEE, 2019

  12. [12]

    Chaudhuri and V

    S. Chaudhuri and V . Narasayya,Anytime Algorithm of Database Tuning Advisor for Microsoft SQL Server, October

  13. [13]

    https://www.microsoft.com/en-us/research/publication/ anytime-algorithm-of-database-tuning-advisor-for-microsoft-sql-server/

  14. [14]

    Magic mirror in my hand, which is the best in the land? an experimental evaluation of index selection algorithms,

    J. Kossmann, S. Halfpap, M. Jankrift, and R. Schlosser, “Magic mirror in my hand, which is the best in the land? an experimental evaluation of index selection algorithms,”Proceedings of the VLDB Endowment, vol. 13, no. 12, pp. 2382–2395, 2020

  15. [15]

    Efficient use of the query optimizer for automated physical design,

    S. Papadomanolakis, D. Dash, and A. Ailamaki, “Efficient use of the query optimizer for automated physical design,” inProceedings of the 33rd International Conference on Very Large Databases (VLDB), pp. 1093–1104, 2007

  16. [16]

    Partially ordered sets,

    B. Dushnik and E. W. Miller, “Partially ordered sets,”American journal of mathematics, vol. 63, no. 3, pp. 600–610, 1941

  17. [17]

    Ordinal sums and equational doctrines,

    H. Appelgate, M. Barr, J. Beck, F. Lawvere, F. Linton, E. Manes, M. Tierney, F. Ulmer, and F. W. Lawvere, “Ordinal sums and equational doctrines,” inSeminar on Triples and Categorical Homology Theory: ETH 1966/67, pp. 141–155, Springer, 1969

  18. [18]

    MySQL Reference Manual 8.0,Estimating Query Performance, January

  19. [19]

    https://dev.mysql.com/doc/refman/8.0/en/estimating-performance. html

  20. [20]

    Algorithms for knapsack problems,

    S. Martello and P. Toth, “Algorithms for knapsack problems,”North- Holland Mathematics Studies, vol. 132, pp. 213–257, 1987

  21. [21]

    Factorizing Complex Predicates in Queries to Exploit Indexes,

    S. Chaudhuri, P. Ganesan, and S. Sarawagi, “Factorizing Complex Predicates in Queries to Exploit Indexes,” inProceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 361–372, 2003

  22. [22]

    https://dev.mysql.com/doc/refman/8.0/en/ range-optimization.html#range-access-multi-part

    MySQL Reference Manual 8.0,Range Access Method for Multiple- Part Indexes, October 2022. https://dev.mysql.com/doc/refman/8.0/en/ range-optimization.html#range-access-multi-part

  23. [23]

    https://dev.mysql.com/doc/refman/8.0/en/ index-condition-pushdown-optimization.htm

    MySQL Reference Manual 8.0,Index Condition Pushdown Op- timization, October 2022. https://dev.mysql.com/doc/refman/8.0/en/ index-condition-pushdown-optimization.htm

  24. [24]

    https://dev

    MySQL Reference Manual 8.0,Join Clause, October 2022. https://dev. mysql.com/doc/refman/8.0/en/join.html

  25. [25]

    On the optimal nesting order for computing n-relational joins,

    I. Toshihide and K. Tiko, “On the optimal nesting order for computing n-relational joins,”ACM Transactions on Database Systems (TODS), vol. 9, no. 3, pp. 482–502, 1984

  26. [26]

    Analyzing Plan Diagrams of Database Query Optimizers,

    N. Reddy and J. R. Haritsa, “Analyzing Plan Diagrams of Database Query Optimizers,” inProceedings of the 31st International Conference on Very Large Databases (VLDB), p. 1228–1239, 2005

  27. [27]

    CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads

    D. Dash, N. Polyzotis, and A. Ailamaki, “Cophy: A scalable, portable, and interactive index advisor for large workloads,”arXiv preprint arXiv:1104.3214, 2011

  28. [28]

    https://github

    Dexter,The automatic indexer for Postgres, October 2022. https://github. com/ankane/dexter

  29. [29]

    Index selection in relational databases,

    K. Y . Whang, “Index selection in relational databases,”Foundations of Data Organization, pp. 487–500, 1987

  30. [30]

    Automatic physical database tuning: A relaxation-based approach,

    N. Bruno and S. Chaudhuri, “Automatic physical database tuning: A relaxation-based approach,” inProceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 227–238, 2005

  31. [31]

    https: //github.com/HypoPG/hypopg

    HypoPG,Hypothetical Indexes for PostgreSQL, October 2022. https: //github.com/HypoPG/hypopg

  32. [32]

    https: //dev.mysql.com/doc/refman/8.0/en/innodb-limits.html

    MySQL Reference Manual 8.0,InnoDB Limits, October 2022. https: //dev.mysql.com/doc/refman/8.0/en/innodb-limits.html

  33. [33]

    AutoIndex: An Incremental Index Management System for Dynamic Workloads,

    X. Zhou, L. Liu, W. Li, L. Jin, S. Li, T. Wang, and J. Feng, “AutoIndex: An Incremental Index Management System for Dynamic Workloads,” in 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 2196–2208, IEEE, 2022

  34. [34]

    On-line index selection for shifting workloads,

    K. Schnaitter, S. Abiteboul, T. Milo, and N. Polyzotis, “On-line index selection for shifting workloads,” in2007 IEEE 23rd International Conference on Data Engineering Workshop (ICDEW), pp. 459–468, IEEE, 2007

  35. [35]

    FlexiRaft: Flexible Quorums with Raft,

    R. Yadav and A. Rahut, “FlexiRaft: Flexible Quorums with Raft,”The Conference on Innovative Data Systems Research (CIDR), 2023

  36. [36]

    Apache Kafka: Next generation distributed messaging system,

    K. M. M. Thein, “Apache Kafka: Next generation distributed messaging system,”International Journal of Scientific Engineering and Technology Research, vol. 3, no. 47, pp. 9478–9483, 2014

  37. [37]

    Predicting query execution time: Are optimizer cost models really unusable?,

    W. Wu, Y . Chi, S. Zhu, J. Tatemura, H. Hacig ¨um¨us, and J. F. Naughton, “Predicting query execution time: Are optimizer cost models really unusable?,” in2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 1081–1092, IEEE, 2013

  38. [38]

    https://dev.mysql.com/doc/refman/8.0/en/range-optimization

    MySQL Reference Manual 8.0,Skip Scan Range Access Method, Octo- ber 2022. https://dev.mysql.com/doc/refman/8.0/en/range-optimization. html#range-access-skip-scan

  39. [39]

    https://dev.mysql.com/doc/refman/8.0/en/ index-merge-optimization.html

    MySQL Reference Manual 8.0,Index Merge Optimization, October 2022. https://dev.mysql.com/doc/refman/8.0/en/ index-merge-optimization.html

  40. [40]

    https: //bugs.mysql.com/bug.php?id=100253

    MySQL Bug,Skip Scan retrieves incorrect Result, October 2022. https: //bugs.mysql.com/bug.php?id=100253

  41. [41]

    https://bugs.mysql.com/bug.php? id=80390

    MySQL Bug,Clustered primary key included in index merge may cause higher execution times, October 2022. https://bugs.mysql.com/bug.php? id=80390

  42. [42]

    A quantitative approach to the selection of secondary indexes,

    F. Palermo, “A quantitative approach to the selection of secondary indexes,”IBM Research, RJ, vol. 730, 1970

  43. [43]

    Configuration-parametric query optimiza- tion for physical design tuning,

    N. Bruno and R. V . Nehme, “Configuration-parametric query optimiza- tion for physical design tuning,” inProceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 941– 952, 2008

  44. [44]

    Budget-aware Index Tuning with Reinforcement Learning,

    W. Wu, C. Wang, T. Siddiqui, J. Wang, V . Narasayya, S. Chaudhuri, and P. A. Bernstein, “Budget-aware Index Tuning with Reinforcement Learning,” inProceedings of the 2022 ACM SIGMOD International Conference on Management of Data, pp. 1528–1541, 2022

  45. [45]

    Automated physical designers: what you see is (not) what you get,

    R. Borovica, I. Alagiannis, and A. Ailamaki, “Automated physical designers: what you see is (not) what you get,” inProceedings of the Fifth International Workshop on Testing Database Systems, pp. 1–6, 2012

  46. [46]

    Automatically indexing millions of databases in Microsoft Azure SQL database,

    S. Das, M. Grbic, I. Ilic, I. Jovandic, A. Jovanovic, V . R. Narasayya, M. Radulovic, M. Stikic, G. Xu, and S. Chaudhuri, “Automatically indexing millions of databases in Microsoft Azure SQL database,” in Proceedings of the 2019 ACM SIGMOD International Conference on Management of Data, pp. 666–679, 2019

  47. [47]

    Exact and approximate al- gorithms for the index selection problem in physical database design,

    A. Caprara, M. Fischetti, and D. Maio, “Exact and approximate al- gorithms for the index selection problem in physical database design,” IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 7, no. 6, pp. 955–967, 1995

  48. [48]

    A branch-and-cut algorithm for a general- ization of the uncapacitated facility location problem,

    A. Caprara and J. Gonz ´alez, “A branch-and-cut algorithm for a general- ization of the uncapacitated facility location problem,”Top, vol. 4, no. 1, pp. 135–163, 1996

  49. [49]

    Regularized cost-model oblivious database tuning with reinforce- ment learning,

    D. Basu, Q. Lin, W. Chen, H. T. V o, Z. Yuan, P. Senellart, and S. Bres- san, “Regularized cost-model oblivious database tuning with reinforce- ment learning,” inTransactions on Large-Scale Data-and Knowledge- Centered Systems XXVIII, pp. 96–132, Springer, 2016

  50. [50]

    The Case for Automatic Database Administration using Deep Reinforcement Learning

    A. Sharma, F. M. Schuhknecht, and J. Dittrich, “The case for auto- matic database administration using deep reinforcement learning,”arXiv preprint arXiv:1801.05643, 2018

  51. [51]

    Online index selection using deep reinforcement learning for a cluster database,

    Z. Sadri, L. Gruenwald, and E. Leal, “Online index selection using deep reinforcement learning for a cluster database,” in2020 IEEE 36th International Conference on Data Engineering Workshops (ICDEW), pp. 158–161, IEEE, 2020

  52. [52]

    DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees,

    R. M. Perera, B. Oetomo, B. I. Rubinstein, and R. Borovica-Gajic, “DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees,” in2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 600–611, IEEE, 2021

  53. [53]

    UDO: universal database optimiza- tion using reinforcement learning,

    J. Wang, I. Trummer, and D. Basu, “UDO: universal database optimiza- tion using reinforcement learning,”arXiv preprint arXiv:2104.01744, 2021

  54. [54]

    AI meets AI: Leveraging query executions to improve index recom- mendations,

    B. Ding, S. Das, R. Marcus, W. Wu, S. Chaudhuri, and V . R. Narasayya, “AI meets AI: Leveraging query executions to improve index recom- mendations,” inProceedings of the 2019 ACM SIGMOD International Conference on Management of Data, pp. 1241–1258, 2019

  55. [55]

    Deep learning models for selectivity estimation of multi-attribute queries,

    S. Hasan, S. Thirumuruganathan, J. Augustine, N. Koudas, and G. Das, “Deep learning models for selectivity estimation of multi-attribute queries,” inProceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 1035–1050, 2020

  56. [56]

    Which Sort Orders Are Interesting?,

    R. Guravannavar, S. Sudarshan, A. A. Diwan, and C. S. Babu, “Which Sort Orders Are Interesting?,”The VLDB Journal, vol. 21, no. 1, pp. 145––165, 2012