AgenticDB: Agentic Performance Reconfiguration for Database Workloads

Chaozheng Wang; Chen Zheng; Heng Zhang; Xinyue Yang; Yanjun Wu

arxiv: 2606.20318 · v2 · pith:DGS4IME6new · submitted 2026-06-18 · 💻 cs.DB

Xinyue Yang, Chaozheng Wang, Chen Zheng, Heng Zhang, Yanjun Wu

Pith reviewed 2026-06-29 04:54 UTC · model grok-4.3

classification 💻 cs.DB

keywords database configuration tuning · agentic framework · runtime feedback · workload reconfiguration · MySQL · PostgreSQL · OS-level actions · performance optimization

The pith

AgenticDB turns database tuning into a self-refining process by letting an agent propose safe DBMS and OS changes guided by runtime feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that existing automatic database tuners are limited by high execution costs, narrow DBMS-only search spaces, and weak use of runtime states for diagnosis or safety. AgenticDB addresses this with a context-grounded harness that proposes changes at both DBMS and OS levels, applies them under constraints, observes performance and states, and feeds results back into planning. This interaction lets the system diagnose bottlenecks, avoid unsafe actions, and build experience across tasks. If the claim holds, tuning on real servers becomes more practical because feedback iteratively improves decisions instead of relying on blind search. The experiments on MySQL and PostgreSQL with standard workloads support this by showing consistent gains over prior methods.

Core claim

AgenticDB implements a context-grounded harness that interacts with the target database environment by proposing DBMS- and OS-level changes, applying them under safety constraints, observing workload performance and runtime states, and using execution feedback to guide subsequent decisions, turning database tuning into a self-refining reconfiguration process.
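
The interaction loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Action` and `Memory` types, the `ALLOWED_KNOBS` bounds, the toy `propose` policy (standing in for an LLM planner), and the simulated `run_workload` function (standing in for a real benchmark run) are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical allowlist: knob name -> (min, max). A real harness would
# derive bounds from DBMS/OS documentation and the host's resources.
ALLOWED_KNOBS = {
    "innodb_buffer_pool_size_mb": (128, 8192),  # DBMS-level knob
    "vm.swappiness": (0, 100),                  # OS-level knob
}

@dataclass(frozen=True)
class Action:
    knob: str
    value: int

@dataclass
class Memory:
    # (action, observed score) pairs; score is None for rejected actions.
    history: list = field(default_factory=list)

def is_safe(action: Action) -> bool:
    """Reject unknown knobs and out-of-range values before applying."""
    bounds = ALLOWED_KNOBS.get(action.knob)
    return bounds is not None and bounds[0] <= action.value <= bounds[1]

def run_workload(config: dict) -> float:
    """Simulated objective: rises with buffer pool size, falls with swappiness."""
    return config["innodb_buffer_pool_size_mb"] * 0.5 - config["vm.swappiness"] * 2.0

def propose(config: dict, memory: Memory, step: int) -> Action:
    """Toy planner. It ignores `memory`; a real planner would condition on it."""
    if step == 1:
        # Deliberately out-of-range proposal to exercise the safety check.
        return Action("innodb_buffer_pool_size_mb", 10**6)
    if step % 2 == 0:
        return Action("innodb_buffer_pool_size_mb",
                      min(8192, config["innodb_buffer_pool_size_mb"] * 2))
    return Action("vm.swappiness", max(0, config["vm.swappiness"] - 30))

def reconfigure(config: dict, steps: int):
    """Propose, check, apply, observe, and record for a fixed number of steps."""
    memory = Memory()
    best = run_workload(config)
    for step in range(steps):
        action = propose(config, memory, step)
        if not is_safe(action):
            memory.history.append((action, None))  # record the rejection
            continue
        trial = {**config, action.knob: action.value}
        score = run_workload(trial)
        memory.history.append((action, score))
        if score > best:  # keep the change only if the objective improved
            config, best = trial, score
    return config, best, memory

start = {"innodb_buffer_pool_size_mb": 1024, "vm.swappiness": 60}
final_config, final_score, mem = reconfigure(start, steps=6)
print(final_config, final_score)
```

The design point the sketch tries to capture is that safety checks run before a change touches the environment, and that every outcome, including a rejected proposal, is recorded so that later planning can use it.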

What carries the argument

The context-grounded harness, which proposes changes, enforces safety constraints during application, records runtime states, and incorporates feedback into planning.

If this is right

AgenticDB reaches the best final performance on every evaluated workload.
It improves over the strongest baseline by 118.1 percent on average.
It reduces aggregate time-to-best by 22.6 percent.
Its OS-level action space, robust execution lifecycle, and memory-enhanced planning each contribute to the gains.
Experience accumulated within and across tasks improves later reconfiguration decisions.
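
The abstract does not define how these two headline numbers are computed. As a rough illustration only, assuming "improvement over the strongest baseline" is a relative gain in a higher-is-better objective and "time-to-best" is the elapsed time at which a run first reaches its best objective value, the quantities could be computed as follows. The function names and sample values are hypothetical.

```python
def relative_improvement(ours: float, baseline: float) -> float:
    """Percent gain of `ours` over `baseline` for a higher-is-better objective."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (ours - baseline) / baseline * 100.0

def time_to_best(trace: list[tuple[float, float]]) -> float:
    """Given (elapsed_seconds, objective) samples, return the earliest time
    at which the run's maximum objective is observed."""
    if not trace:
        raise ValueError("trace must not be empty")
    best = max(obj for _, obj in trace)
    return min(t for t, obj in trace if obj == best)

# Made-up samples, not data from the paper.
trace = [(60.0, 1000.0), (120.0, 1500.0), (180.0, 1400.0), (240.0, 1500.0)]
print(relative_improvement(1500.0, 1000.0))  # 50.0
print(time_to_best(trace))                   # 120.0
```

Under this assumed definition, an average gain of 118.1 percent would mean the objective is roughly 2.18 times the strongest baseline's value on average.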

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same harness structure could be reused on other relational or NoSQL systems that expose runtime metrics.
Over repeated deployments the accumulated experience might reduce the need for per-workload expert intervention.
Adding explicit cost models for change application time could further shorten the observed time-to-best.
The approach may extend naturally to multi-tenant or cloud-managed databases where OS-level knobs are partially exposed.

Load-bearing premise

The harness can reliably diagnose bottlenecks from runtime states and enforce safety constraints to prevent unsafe actions without creating new failure modes.

What would settle it

On the same MySQL and PostgreSQL instances with YCSB, Sysbench, and TPC-H workloads, AgenticDB produces final performance no better than the strongest baseline or triggers unsafe configuration changes that the harness was supposed to block.

Figures

Figures reproduced from arXiv: 2606.20318 by Chaozheng Wang, Chen Zheng, Heng Zhang, Xinyue Yang, Yanjun Wu.

**Figure 2.** Overall reconfiguration progression of AgenticDB and baseline methods across MySQL and PostgreSQL workloads. For YCSB and Sysbench, each workload reports the objective, TPS, and P95 latency; TPC-H reports execution time.

**Figure 3.** Reconfiguration progression of AgenticDB with different LLM backends on representative MySQL workloads.

**Figure 4.** Effect of experience memory on representative …

**Figure 5.** Representative reconfiguration traces showing how …

Original abstract

Database configuration tuning is critical for workload performance, but practical tuning on real deployments remains difficult. Existing automatic tuners mostly formulate tuning as iterative search over DBMS knob values. This formulation often incurs high execution cost, depends on predefined DBMS-only search spaces, and provides limited support for using runtime feedback to diagnose bottlenecks and safely apply configuration changes on real servers. To address these limitations, we propose AgenticDB, an agentic framework for database workload reconfiguration. AgenticDB implements a context-grounded harness that interacts with the target database environment by proposing DBMS- and OS-level changes, applying them under safety constraints, observing workload performance and runtime states, and using execution feedback to guide subsequent decisions. This runtime interaction enables AgenticDB to diagnose bottlenecks, explore a broader DBMS- and OS-level reconfiguration space, avoid unsafe or unsupported actions, and accumulate experience within and across reconfiguration tasks. As a result, AgenticDB turns database tuning into a self-refining reconfiguration process in which runtime feedback iteratively improves later decisions. We conduct extensive experiments on MySQL and PostgreSQL using YCSB, Sysbench, and TPC-H workloads. The results show that AgenticDB achieves the best final performance on all evaluated workloads, improving over the strongest baseline by 118.1% on average and reducing aggregate time-to-best by 22.6%. The results also demonstrate that its OS-level action space, robust execution lifecycle, and memory-enhanced planning contribute to more effective and practical database reconfiguration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

A private letter to a colleague.

AgenticDB adds OS-level actions and a feedback loop with safety checks to DB tuning, but the big reported gains rest on an abstract with no experimental details.

The main point is that this paper frames tuning as an ongoing agent interaction that proposes both DBMS and OS changes, applies them under constraints, observes states, and uses the feedback to refine later steps. That setup directly targets the high cost and narrow scope of standard iterative search methods.

It does a clear job naming the practical problems: predefined DBMS-only spaces, limited runtime diagnosis, and risk of unsafe changes on real servers. Extending the action space and adding a harness for bottleneck diagnosis plus memory across tasks is a logical next step if the execution works.

The reported outcomes say it beats the strongest baseline by 118.1% on average across YCSB, Sysbench, and TPC-H on MySQL and PostgreSQL while cutting aggregate time-to-best by 22.6%. If those numbers are backed by proper controls, that would matter for deployment.

The soft spot is the complete absence of experimental specifics in the abstract: no baseline names, no run counts, no variance or statistical tests, and no ablation showing what the OS space or memory planning actually add. The safety claims are asserted without evidence they avoid new failure modes. Until the full methods and results sections are checked, the performance edge cannot be evaluated.

This is aimed at systems researchers working on auto-tuning or agent-based configuration. It is worth sending for peer review so the experimental setup and reproducibility can be examined; the idea addresses real gaps even if the current evidence is thin.

Referee Report

2 major / 1 minor

Summary. The paper proposes AgenticDB, an agentic framework for database workload reconfiguration. It implements a context-grounded harness that proposes DBMS- and OS-level configuration changes, applies them under safety constraints, observes runtime performance and states, and uses execution feedback to guide subsequent decisions. Experiments on MySQL and PostgreSQL with YCSB, Sysbench, and TPC-H workloads are reported to show that AgenticDB achieves the best final performance on all workloads, improving over the strongest baseline by 118.1% on average and reducing aggregate time-to-best by 22.6%, with contributions from the OS action space, robust lifecycle, and memory-enhanced planning.

Significance. If the empirical claims hold with verifiable experimental support, the work could advance practical database tuning by expanding the reconfiguration space to OS-level actions and incorporating runtime feedback in a self-refining agentic loop, addressing limitations of traditional knob-search approaches.

major comments (2)

[Abstract / Experimental Evaluation] Abstract and Experimental Evaluation section: the central claim of 118.1% average improvement and 22.6% time-to-best reduction is presented without any description of baselines, workload parameters, number of trials, statistical significance tests, or error bars, rendering the performance numbers unverifiable from the manuscript text.
[Framework Description] Framework section (interaction loop description): the assertion that the context-grounded harness can reliably diagnose bottlenecks and enforce safety constraints without new failure modes is stated at a high level with no concrete examples, pseudocode, or failure-mode analysis, which is load-bearing for the practical applicability claim.

minor comments (1)

The abstract refers to 'extensive experiments' but supplies no table or figure references for the reported aggregate metrics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity and verifiability of the claims.

Referee: [Abstract / Experimental Evaluation] Abstract and Experimental Evaluation section: the central claim of 118.1% average improvement and 22.6% time-to-best reduction is presented without any description of baselines, workload parameters, number of trials, statistical significance tests, or error bars, rendering the performance numbers unverifiable from the manuscript text.

Authors: We agree that the abstract presents the aggregate performance numbers at a high level. While the Experimental Evaluation section provides descriptions of the baselines, workloads (YCSB, Sysbench, TPC-H), and MySQL/PostgreSQL setups, we acknowledge that explicit details on the number of trials, statistical significance tests, and error bars are not sufficiently highlighted. To address this, we will revise the abstract to briefly note the baselines and experimental conditions, and expand the Experimental Evaluation section to include the number of independent trials, statistical tests, and error bars on the reported improvements. revision: yes
Referee: [Framework Description] Framework section (interaction loop description): the assertion that the context-grounded harness can reliably diagnose bottlenecks and enforce safety constraints without new failure modes is stated at a high level with no concrete examples, pseudocode, or failure-mode analysis, which is load-bearing for the practical applicability claim.

Authors: The Framework section describes the context-grounded harness, the proposal-application-observation loop, safety constraints, and use of runtime feedback for bottleneck diagnosis. We agree that the description remains high-level and would benefit from additional support. In revision, we will add pseudocode for the harness decision process, concrete examples of bottleneck diagnosis from observed states (e.g., CPU, I/O metrics), and a dedicated discussion of potential new failure modes with mitigation via the safety constraints and rollback mechanisms. revision: yes
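
To make the rollback mitigation mentioned in this response concrete, the following is a minimal sketch of one possible guard, assuming a higher-is-better, positive objective. It is not taken from the paper: `apply_with_rollback`, the `tolerance` threshold, and the simulated `fake_measure` function are hypothetical, and a real harness would restart or reload the DBMS rather than copy a dictionary.

```python
def apply_with_rollback(config, change, measure, tolerance=0.05):
    """Try `change` on a copy of `config`. Keep it unless the measurement
    fails or the objective drops more than `tolerance` below the
    pre-change value; otherwise return the previous config unchanged."""
    before = measure(config)
    candidate = {**config, **change}
    try:
        after = measure(candidate)
    except Exception:
        return config, False      # failed run: restore previous config
    if after < before * (1.0 - tolerance):
        return config, False      # regression: restore previous config
    return candidate, True

def fake_measure(cfg):
    """Simulated objective; oversizing the buffer makes the server fail."""
    if cfg["shared_buffers_mb"] > 4096:
        raise RuntimeError("server failed to start")
    return 100.0 + cfg["shared_buffers_mb"] / 100.0

base = {"shared_buffers_mb": 1024}
print(apply_with_rollback(base, {"shared_buffers_mb": 2048}, fake_measure))
print(apply_with_rollback(base, {"shared_buffers_mb": 8192}, fake_measure))
```

In this toy setup the first change is kept because the simulated objective rises, and the second is rolled back because the simulated server fails to start.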

Circularity Check

0 steps flagged

No significant circularity; empirical results only

The paper describes an agentic framework for database reconfiguration and reports performance improvements from direct experiments on MySQL/PostgreSQL with YCSB/Sysbench/TPC-H workloads. No equations, fitted parameters, predictions derived from inputs, or self-citation chains appear in the provided text. The central claims rest on experimental outcomes rather than any derivation that reduces to its own definitions or prior fitted values by construction. This is the expected non-finding for a purely empirical systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the framework description implies unstated assumptions about agent planning reliability and safety enforcement but supplies no details for enumeration.

pith-pipeline@v0.9.1-grok · 5804 in / 1122 out tokens · 37941 ms · 2026-06-29T04:54:00.946077+00:00 · methodology


[1] [1]

Anthropic. 2026. Introducing Claude Opus 4.7. https://www.anthropic.com/ne ws/claude-opus-4-7

2026

[2] [2]

Baoqing Cai, Yu Liu, Ce Zhang, Guangyu Zhang, Ke Zhou, Li Liu, Chunhua Li, Bin Cheng, Jie Yang, and Jiashu Xing. 2022. HUNTER: an online cloud database hybrid tuning system for personalized requirements. InProceedings of the 2022 International Conference on Management of Data. 646–659

2022

[3] [3]

Ben Cane. 2017. Improving Linux System Performance with I/O Scheduler Tuning. https://www.cloudbees.com/blog/linux-io-scheduler-tuning

2017

[4] [4]

Brian F Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. InProceedings of the 1st ACM symposium on Cloud computing. 143–154

2010

[5] [5]

Biplob K Debnath, David J Lilja, and Mohamed F Mokbel. 2008. SARD: A statistical approach for ranking database tuning parameters. In2008 IEEE 24th International Conference on Data Engineering Workshop. IEEE, 11–18

2008

[6] [6]

2026.DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

DeepSeek-AI. 2026.DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence. Technical Report. DeepSeek-AI. https://huggingface.co/deepseek- ai/DeepSeek-V4-Pro/resolve/main/DeepSeek_V4.pdf

2026

[7] [7]

Songyun Duan, Vamsidhar Thummala, and Shivnath Babu. 2009. Tuning database configuration parameters with ituned.Proc. VLDB Endow.2, 1 (2009), 1246–1257

2009

[8] [8]

Victor Giannakouris and Immanuel Trummer. 2025. 𝜆-tune: Harnessing large language models for automated database system tuning.Proceedings of the ACM on Management of Data3, 1 (2025), 1–26

2025

[9] [9]

Xinmei Huang, Haoyang Li, Jing Zhang, Xinxin Zhao, Zhiming Yao, Yiyan Li, Tieying Zhang, Jianjun Chen, Hong Chen, and Cuiping Li. 2024. E2etune: End- to-end knob tuning via fine-tuned generative language model.arXiv preprint arXiv:2404.11581(2024)

work page arXiv 2024

[10] [10]

Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. 2011. Sequential model- based optimization for general algorithm configuration. InInternational confer- ence on learning and intelligent optimization. Springer, 507–523

2011

[11] [11]

Konstantinos Kanellis, Ramnatthan Alagappan, and Shivaram Venkataraman

[12] [12]

In12th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 20)

Too many knobs to tune? towards faster database tuning by pre-selecting important knobs. In12th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 20)

[13] [13]

Konstantinos Kanellis, Cong Ding, Brian Kroth, Andreas Müller, Carlo Curino, and Shivaram Venkataraman. 2022. LlamaTune: sample-efficient DBMS configu- ration tuning.arXiv preprint arXiv:2203.05128(2022)

work page arXiv 2022

[14] [14]

2009.SysBench: A Modular, Cross-Platform and Multi-Threaded Benchmark Tool

Alexey Kopytov. 2009.SysBench: A Modular, Cross-Platform and Multi-Threaded Benchmark Tool. MySQL AB. https://imysql.com/wp-content/uploads/2014/10/ sysbench-manual.pdf

2009

[15] [15]

Jiale Lao, Yibo Wang, Yufei Li, Jianping Wang, Yunjia Zhang, Zhiyuan Cheng, Wanghu Chen, Mingjie Tang, and Jianguo Wang. 2025. Gptuner: An llm-based database tuning system.ACM SIGMOD Record54, 1 (2025), 101–110

2025

[16] [16]

Guoliang Li, Xuanhe Zhou, Shifu Li, and Bo Gao. 2019. Qtune: A query-aware database tuning system with deep reinforcement learning.Proceedings of the VLDB Endowment12, 12 (2019), 2118–2130

2019

[17] [17]

Yiyan Li, Haoyang Li, Jing Zhang, Renata Borovica-Gajic, Shuai Wang, Tieying Zhang, Jianjun Chen, Rui Shi, Cuiping Li, and Hong Chen. 2025. AgentTune: An Agent-Based Large Language Model Framework for Database Knob Tuning. Proceedings of the ACM on Management of Data3, 6 (2025), 1–29

2025

[18] [18]

Moonshot AI. 2026. Kimi K2.6 API Guide. https://platform.kimi.ai/docs/guide/ki mi-k2-6-quickstart

2026

[19] [19]

OpenAI. 2026. Introducing GPT-5.5. https://openai.com/index/introducing-gpt- 5-5/

2026

[20] [20]

Oracle. 2020. Oracle Database 2 Day + Performance Tuning Guide, 19c. https: //docs.oracle.com/en/database/oracle/oracle-database/19/tdppt/

2020

[21] [21]

Oracle. 2022. Oracle Database Performance Tuning Guide, 19c. https://docs.ora cle.com/en/database/oracle/oracle-database/19/tgdba/

2022

[22] [22]

2026.MySQL 8.0 Reference Manual: Configuring InnoDB Buffer Pool Size Online

Oracle. 2026.MySQL 8.0 Reference Manual: Configuring InnoDB Buffer Pool Size Online. Oracle. https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool- resize.html

2026

[23] [23]

Charles Packer, Vivian Fang, Shishir_G Patil, Kevin Lin, Sarah Wooders, and Joseph_E Gonzalez. 2023. MemGPT: towards LLMs as operating systems. (2023)

2023

[24] [24]

Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. InProceedings of the 36th annual acm symposium on user interface software and technology. 1–22

2023

[25] [25]

Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C Mowry, Matthew Perron, Ian Quah, et al. 2017. Self- Driving Database Management Systems.. InCIDR, Vol. 4. 1

2017

[26] [26]

2026.PostgreSQL 18 Documentation: Monitoring Database Activity

PostgreSQL Global Development Group. 2026.PostgreSQL 18 Documentation: Monitoring Database Activity. PostgreSQL Global Development Group. https: //www.postgresql.org/docs/18/monitoring.html

2026

[27] [27]

2026.PostgreSQL 18 Documentation: Resource Consumption

PostgreSQL Global Development Group. 2026.PostgreSQL 18 Documentation: Resource Consumption. PostgreSQL Global Development Group. https://www. postgresql.org/docs/18/runtime-config-resource.html

2026

[28] [28]

Red Hat. 2020. Red Hat Enterprise Linux 6 Performance Tuning Guide: Tuning Virtual Memory. https://docs.redhat.com/en/documentation/red_hat_enterpris e_linux/6/html/performance_tuning_guide/s-memory-tunables

2020

[29] [29]

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools.Advances in neural information processing systems36 (2023), 68539–68551

2023

[30]

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learning. Advances in neural information processing systems 36 (2023), 8634–8652

[31]

Transaction Processing Performance Council. 2022. TPC Benchmark H (TPC-H), Standard Specification, Revision 3.0.1. Transaction Processing Performance Council. https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-H_v3.0.1.pdf

[32]

Immanuel Trummer. 2022. DB-BERT: a Database Tuning Tool that "Reads the Manual". In Proceedings of the 2022 international conference on management of data. 190–203

[33]

Dana Van Aken, Andrew Pavlo, Geoffrey J Gordon, and Bohan Zhang. 2017. Automatic database management system tuning through large-scale machine learning. In Proceedings of the 2017 ACM international conference on management of data. 1009–1024

[34]

Dana Van Aken, Dongsheng Yang, Sebastien Brillard, Ari Fiorino, Bohan Zhang, Christian Bilien, and Andrew Pavlo. 2021. An inquiry into machine learning-based automatic configuration tuning services on real-world database management systems. Proceedings of the VLDB Endowment 14, 7 (2021), 1241–1253

[35]

Rik van Riel and Peter W. Morreale. 2008. Documentation for /proc/sys/vm. https://docs.kernel.org/admin-guide/sysctl/vm.html

[36]

Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291 (2023)

[37]

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. 2024. Autogen: Enabling next-gen LLM applications via multi-agent conversations. In First conference on language modeling

[38]

Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, et al. 2025. The rise and potential of large language model based agents: A survey. Science China Information Sciences 68, 2 (2025), 121101

[39]

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2022. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629 (2022)

[40]

Z.AI. 2026. GLM-5.1 Overview. https://docs.z.ai/guides/llm/glm-5.1

[41]

Yueyang Zhan, Rui Xi, Jianming Liao, Shuhuan Fan, and Mengshu Hou. 2024. KnobTune: A dynamic database configuration tuning strategy leveraging historical workload similarities. In Proceedings of the International Conference on Computing, Machine Learning and Data Science. 1–8

[42]

Ji Zhang, Yu Liu, Ke Zhou, Guoliang Li, Zhili Xiao, Bin Cheng, Jiashu Xing, Yangtao Wang, Tianheng Cheng, Li Liu, et al. 2019. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In Proceedings of the 2019 international conference on management of data. 415–432

[43]

Limeng Zhang and M Ali Babar. 2024. Automatic configuration tuning on cloud database: A survey. arXiv preprint arXiv:2404.06043 (2024)

[44]

Xinyi Zhang, Zhuo Chang, Yang Li, Hong Wu, Jian Tan, Feifei Li, and Bin Cui. 2021. Facilitating database tuning with hyper-parameter optimization: a comprehensive experimental evaluation. arXiv preprint arXiv:2110.12654 (2021)

[45]

Xinyi Zhang, Zhuo Chang, Hong Wu, Yang Li, Jia Chen, Jian Tan, Feifei Li, and Bin Cui. 2023. A unified and efficient coordinating framework for autonomous DBMS tuning. Proceedings of the ACM on Management of Data 1, 2 (2023), 1–26

[46]

Xinyi Zhang, Hong Wu, Zhuo Chang, Shuowei Jin, Jian Tan, Feifei Li, Tieying Zhang, and Bin Cui. 2021. Restune: Resource oriented tuning boosted by meta-learning for cloud databases. In Proceedings of the 2021 international conference on management of data. 2102–2114.

SIGMOD/PODS ’27, June, 2027, California, USA. Yang et al.

[47]

Xinyi Zhang, Hong Wu, Yang Li, Jian Tan, Feifei Li, and Bin Cui. 2022. Towards dynamic and safe configuration tuning for cloud databases. In Proceedings of the 2022 International Conference on Management of Data. 631–645

[48]

Zeyu Zhang, Quanyu Dai, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. 2025. A survey on the memory mechanism of large language model-based agents. ACM Transactions on Information Systems 43, 6 (2025), 1–47

[49]

Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Liu, and Gao Huang. 2024. Expel: LLM agents are experiential learners. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 19632–19642

[50]

Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. 2024. Memorybank: Enhancing large language models with long-term memory. In Proceedings of the AAAI conference on artificial intelligence, Vol. 38. 19724–19731

[51]

Yuqing Zhu, Jianxun Liu, Mengying Guo, Yungang Bao, Wenlong Ma, Zhuoyue Liu, Kunpeng Song, and Yingchun Yang. 2017. Bestconfig: tapping the performance potential of systems via automatic configuration tuning. In Proceedings of the 2017 symposium on cloud computing. 338–350