On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Aaron Guan; Ada Zhou; Adrian Zhou; Alexy Li; Allen Lin; Andrew Chen; Andrew Lei; Anson Qiu; Anya Zhang; Arthur Fu

arxiv: 2606.02437 · v2 · pith:FGENLQI7new · submitted 2026-06-01 · 💻 cs.LG · cs.CL


Mind Lab: Vin Bo, Song Cao, Vic Cao, Andrew Chen, Kaijie Chen, Cleon Cheng, Steven Chiang, Kaixuan Fan, Hera Feng, Huan Feng, Arthur Fu, Jun Gao, Hongquan Gu, Aaron Guan, Nolan Ho, Mutian Hong, Hailee Hou, Peixuan Hua, Charles Huang, Miles Jiang, Nora Jiang, Yuyi Jiang, Qiuyu Jin, Fancy Kong, Andrew Lei, Kyrie Lei, Alexy Li, Lucian Li, Ray Li, Theo Li, Wenhao Li, Zhihui Li, Allen Lin, Jiayi Lin, Kairus Liu, Kieran Liu, Logan Liu, Xiang Liu, Irvine Lu, Maeve Luo, Runze Lv, Pony Ma, Verity Niu, Anson Qiu, Vincent Wang, Rio Yang, Maxwell Yao, Carrie Ye, Regis Ye, Wenlin Ye, Josh Ying, Danney Zeng, Yuhan Zhan, Anya Zhang, Di Zhang, Ruijia Zhang, Shiyang Zhang, Sueky Zhang, Ya Zhang, Wei Zhao, Ada Zhou, Adrian Zhou, Yuhua Zhou, Xinyue Zhu, Murphy Zhuang


Pith reviewed 2026-06-28 15:30 UTC · model grok-4.3


keywords PEFT · parameter-efficient fine-tuning · adapters · personal models · scaling · foundation models · persistent state · MinT


The pith

Small PEFT adapters can serve as persistent local state carrying instance-specific behavior on shared foundation models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reframes parameter-efficient fine-tuning from a mere cost-saving method to a substrate for maintaining small, trainable adapters that hold user-specific preferences, skills, tool habits, and memory-like updates atop powerful shared models. It structures the investigation along three axes: Scale Up, in which stronger shared priors amplify the value of tiny local changes; Scale Down, which tests how compact adapters can remain while staying reliable; and Scale Out, which examines the management of many coexisting persistent instances. An infrastructure system called MinT is presented as one concrete way to handle adapter identity, revision, provenance, evaluation, and serving. If the framing holds, PEFT shifts from a temporary workaround to the practical mechanism for building millions of individualized models without retraining entire foundation models from scratch.

Core claim

The central claim is that small trainable adapters function as persistent local state on top of strong shared foundation models, with the base model supplying shared competence and the adapters supplying instance-specific behavior such as preferences, skills, tool habits, and memory-like updates; the problem is organized around the three scaling axes of Scale Up, Scale Down, and Scale Out, and MinT supplies one infrastructure example for managing adapter identity, revision, provenance, evaluation, and serving residency, leading to the conclusion that PEFT can act as a compact substrate for persistent personal models rather than only a budget substitute for full fine-tuning.

What carries the argument

small trainable adapters as persistent local state on top of strong shared foundation models
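
The framing can be made concrete with a minimal sketch. It assumes a LoRA-style low-rank update, which the abstract does not name as the adapter type; the names `SharedBase`, `Adapter`, and `forward` are hypothetical and a single linear layer stands in for a full model. The shared weight is never modified, and each user's behavior lives entirely in a small pair of matrices.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass(frozen=True)
class SharedBase:
    """Frozen weight shared by every user (hypothetical single linear layer)."""
    weight: Matrix  # shape: d_out x d_in


@dataclass
class Adapter:
    """Per-user low-rank update, delta_W = up @ down (LoRA-style assumption)."""
    down: Matrix  # shape: r x d_in
    up: Matrix    # shape: d_out x r


def forward(base: SharedBase, adapter: Adapter, x: Vector) -> Vector:
    """Compute W x + up (down x) without touching the shared weight."""
    shared = matvec(base.weight, x)
    local = matvec(adapter.up, matvec(adapter.down, x))
    return [s + l for s, l in zip(shared, local)]


base = SharedBase(weight=[[1.0, 0.0], [0.0, 1.0]])
alice = Adapter(down=[[1.0, 1.0]], up=[[0.5], [0.0]])  # rank-1 local state
bob = Adapter(down=[[0.0, 0.0]], up=[[0.0], [0.0]])    # zero update: base behavior

out_alice = forward(base, alice, [2.0, 3.0])
out_bob = forward(base, bob, [2.0, 3.0])
```

Both users read the same `base` object; only the adapter differs, which is what lets the instance-specific part be stored, versioned, and swapped independently of the shared model.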

If this is right

Stronger shared priors increase the usefulness of small local updates.
Adapters can be reduced in size while still carrying reliable instance-specific behavior.
Many persistent adapted instances can be managed and served simultaneously.
PEFT moves from a temporary budget option to a standing substrate for personal models.
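
The Scale Down and Scale Out consequences can be illustrated with back-of-the-envelope arithmetic. The sketch below assumes a LoRA-style adapter, for which a rank-r update to a d_out x d_in matrix adds r * (d_in + d_out) trainable parameters. The layer size, number of adapted matrices, and 16-bit storage are hypothetical values chosen for illustration, not figures from the paper.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in a rank-`rank` update of a d_out x d_in matrix."""
    return rank * (d_in + d_out)


def adapter_bytes(d_in: int, d_out: int, rank: int,
                  adapted_matrices: int, bytes_per_param: int = 2) -> int:
    """Storage for one adapter covering `adapted_matrices` equally sized matrices."""
    return lora_params(d_in, d_out, rank) * adapted_matrices * bytes_per_param


# Hypothetical configuration: 4096 x 4096 matrices, rank 8, 64 adapted matrices.
per_matrix = lora_params(4096, 4096, 8)        # adapter parameters per matrix
full_matrix = 4096 * 4096                      # parameters in the full matrix
fraction = per_matrix / full_matrix            # share of the full matrix

one_adapter = adapter_bytes(4096, 4096, 8, adapted_matrices=64)
million_adapters = one_adapter * 1_000_000

print(f"adapter / full matrix: {fraction:.4%}")
print(f"one adapter: {one_adapter / 2**20:.1f} MiB")
print(f"one million adapters: {million_adapters / 1e12:.2f} TB")
```

Under these assumed numbers each adapter is about 8 MiB, so a million of them occupy roughly 8.4 TB, which is a storage problem rather than a training-cluster problem. Whether adapters of that size reliably carry the behaviors listed above is the open question the review raises.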

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If adapters prove stable, deployment architectures could shift toward serving one shared model plus per-user adapters rather than per-user full copies.
The three scaling axes suggest research questions on the minimal adapter size that still supports long-term memory-like updates without drift.
Managing provenance and evaluation at million-instance scale would require new tooling for version control and safety checks on adapters.
The approach raises questions about how to handle conflicting updates across many adapters without affecting the shared base.
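
Serving one shared model with per-user adapters implies deciding which adapters stay resident in memory at any moment. The following is a minimal sketch of one possible policy, least-recently-used eviction. The source does not describe how MinT handles serving residency, so `AdapterCache` and `load_fn` are hypothetical names and the policy itself is an assumption.

```python
from collections import OrderedDict
from typing import Callable, List


class AdapterCache:
    """Keeps at most `capacity` adapters resident, evicting the least recently used."""

    def __init__(self, capacity: int, load_fn: Callable[[str], object]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_fn = load_fn          # hypothetical loader, e.g. reads from storage
        self._resident: "OrderedDict[str, object]" = OrderedDict()
        self.loads = 0                  # number of cache misses so far

    def get(self, user_id: str) -> object:
        """Return the adapter for `user_id`, loading and evicting as needed."""
        if user_id in self._resident:
            self._resident.move_to_end(user_id)   # mark as most recently used
            return self._resident[user_id]
        adapter = self.load_fn(user_id)
        self.loads += 1
        self._resident[user_id] = adapter
        if len(self._resident) > self.capacity:
            self._resident.popitem(last=False)    # drop the least recently used
        return adapter

    def resident_ids(self) -> List[str]:
        """Resident adapter ids, least recently used first."""
        return list(self._resident)


cache = AdapterCache(capacity=2, load_fn=lambda uid: {"owner": uid})
for uid in ["alice", "bob", "alice", "carol"]:
    cache.get(uid)
```

After this request sequence the cache holds the adapters for `alice` and `carol`; `bob` was evicted because `alice` was touched more recently. The shared base model never enters the cache, since it is loaded once for all users.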

Load-bearing premise

Small trainable adapters can reliably carry instance-specific behavior such as preferences, skills, tool habits, and memory-like updates while remaining stable on top of shared foundation models.

What would settle it

A longitudinal test in which adapters of varying sizes are updated with user-specific data and then evaluated on held-out tasks after extended periods of non-use or continued shared-model updates, checking whether the instance-specific behavior is retained or lost.
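
The bookkeeping for such a test could look like the sketch below. It is only an illustration of the comparison being proposed: the function names, the 0.9 threshold, and all scores are placeholders invented for the example, not measurements from the paper or from Pith.

```python
from typing import Dict, List


def retention(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Fraction of the held-out score kept after the idle or base-update period."""
    ratios = {}
    for name, score_before in before.items():
        if score_before <= 0:
            raise ValueError(f"non-positive baseline for {name!r}")
        ratios[name] = after[name] / score_before
    return ratios


def lost_behavior(ratios: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Adapter configurations whose retention fell below the chosen threshold."""
    return sorted(name for name, ratio in ratios.items() if ratio < threshold)


# Placeholder held-out scores per adapter size (illustrative only).
before = {"rank_1": 0.50, "rank_8": 0.80, "rank_64": 0.80}
after = {"rank_1": 0.25, "rank_8": 0.76, "rank_64": 0.80}

ratios = retention(before, after)
flagged = lost_behavior(ratios)
```

With these placeholder inputs only the smallest configuration is flagged. A real study would also need repeated runs and error bars, which is the gap the referee report points to.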

Original abstract

Parameter-efficient fine-tuning (PEFT) is usually treated as a cheaper alternative to full fine-tuning. We study a broader role: small trainable adapters as persistent local state on top of strong shared foundation models. In this framing, the base model provides shared competence while adapters carry instance-specific behavior such as preferences, skills, tool habits, and memory-like updates. We organize the problem around three scaling axes: Scale Up, where stronger shared priors make small local updates more useful; Scale Down, where we study how small adapters can be while remaining reliable; and Scale Out, where many persistent adapted instances coexist. MinT provides one infrastructure example for managing adapter identity, revision, provenance, evaluation, and serving residency. Together, the results suggest that PEFT can be a compact substrate for persistent personal models rather than only a budget substitute for full fine-tuning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a conceptual reframing of PEFT as persistent personal state organized around three scaling axes, but it supplies no experiments, data, or derivations to back the central claims.


The paper's main move is to treat small adapters not just as cheap fine-tuning but as carriers of instance-specific state like preferences, skills, and memory-like updates on a shared base model. It organizes the idea around Scale Up (stronger bases make small adapters more effective), Scale Down (how small can they get while staying reliable), and Scale Out (managing many such instances), with MinT as an example infrastructure for identity and serving. That framing is a clean way to think about personalization at scale and extends existing PEFT work without new math or methods.

What it does well is lay out the problem in those three directions and point to the infrastructure needs. The abstract is clear about the shift from budget substitute to compact substrate.

The soft spot is exactly what the stress-test note flags: the load-bearing assumption that adapters can reliably hold and retain complex, long-term instance-specific behavior while staying stable on the base model. The abstract mentions results supporting the suggestion, yet no experiments, error bars, sequential update tests, or stability measurements appear. Without that evidence the persistence claim stays a suggestion rather than a demonstrated outcome. The circularity burden is zero because there are no equations or fitted parameters to check.

This is for readers already working on LLM personalization who want a high-level map of the scaling questions. It does not yet show the kind of empirical grounding or reproducible result that would make it ready for a serious referee. I would not bring it to a reading group or cite it as is.

Referee Report

2 major / 0 minor

Summary. The paper claims that parameter-efficient fine-tuning (PEFT) can serve as a compact substrate for persistent personal models by using small trainable adapters as local state on shared foundation models, where the base model supplies shared competence and adapters encode instance-specific behaviors such as preferences, skills, tool habits, and memory-like updates. It organizes the discussion around three scaling axes (Scale Up with stronger priors, Scale Down to minimal reliable adapter sizes, and Scale Out to many coexisting instances) and introduces MinT as an infrastructure example for managing adapter identity, revision, provenance, evaluation, and serving. The abstract concludes that the results suggest PEFT enables persistent personal models rather than functioning only as a budget substitute for full fine-tuning.

Significance. If the central suggestion were supported by evidence, the work could meaningfully reframe PEFT research toward scalable personalization, enabling efficient maintenance of millions of instance-specific models without duplicating full foundation models. The three-axis scaling organization offers a useful conceptual structure for future studies. However, the manuscript supplies no empirical results, derivations, or technical details, so any significance is currently prospective rather than realized. The introduction of MinT as a management system is noted as a potential concrete element but remains unelaborated.

major comments (2)

[Abstract] Abstract: the statement 'Together, the results suggest that PEFT can be a compact substrate for persistent personal models' is unsupported because the manuscript contains no experiments, data, error analysis, or derivations demonstrating adapter stability or retention under sequential updates; this is load-bearing for the central claim that adapters can carry memory-like instance-specific state persistently.
[the scaling results] The scaling results: the abstract refers to results supporting the persistence framing, yet no sections detail experiments on adapter reliability for complex behaviors (preferences, skills, tool habits, memory-like updates) or base-model stability across revisions, leaving the weakest assumption unaddressed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We agree that the abstract's reference to 'results' is imprecise for a conceptual manuscript without new experiments on adapter persistence or sequential updates. We will revise the abstract and framing to clarify that the work proposes a scaling organization and infrastructure example, with the persistence claim presented as a direction suggested by the framework rather than empirically demonstrated here.


Referee: [Abstract] Abstract: the statement 'Together, the results suggest that PEFT can be a compact substrate for persistent personal models' is unsupported because the manuscript contains no experiments, data, error analysis, or derivations demonstrating adapter stability or retention under sequential updates; this is load-bearing for the central claim that adapters can carry memory-like instance-specific state persistently.

Authors: The comment is correct: the manuscript presents no new experiments, derivations, or analyses of adapter stability under sequential updates. The referenced 'results' are the conceptual synthesis across the three scaling axes and the MinT example. We will revise the abstract to remove the implication of empirical validation and instead state that the proposed framing and axes suggest this potential for future investigation. revision: yes

Referee: [the scaling results] The scaling results: the abstract refers to results supporting the persistence framing, yet no sections detail experiments on adapter reliability for complex behaviors (preferences, skills, tool habits, memory-like updates) or base-model stability across revisions, leaving the weakest assumption unaddressed.

Authors: We agree there are no such experiments or technical details on reliability for the listed behaviors or cross-revision stability. The manuscript's contribution is the three-axis organization and MinT as an infrastructure sketch; the persistence aspects are identified as open questions within the Scale Down and Scale Out axes. In revision we will explicitly label these as directions for empirical work rather than supported outcomes. revision: yes

Circularity Check

0 steps flagged

High-level conceptual proposal with no derivations or fitted predictions


The paper is a conceptual framing of PEFT as a substrate for persistent personal models, organized around Scale Up/Down/Out axes and referencing MinT as an infrastructure example. No equations, parameter fittings, derivations, or mathematical claims appear in the provided text. The central suggestion that 'PEFT can be a compact substrate for persistent personal models' is presented as an organizing perspective rather than a result derived from inputs. Per the reader's assessment, there are no load-bearing steps that reduce to self-definition, fitted inputs called predictions, or self-citation chains. This is a normal non-finding for a high-level proposal paper that does not attempt quantitative derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that small adapters can encode and maintain instance-specific behaviors, plus the introduction of MinT as an unvalidated infrastructure layer.

axioms (1)

Domain assumption: Small trainable adapters can reliably carry instance-specific behavior such as preferences, skills, tool habits, and memory-like updates.
This premise is required for the persistent personal model framing to hold.

invented entities (1)

MinT (no independent evidence)
Purpose: Infrastructure for managing adapter identity, revision, provenance, evaluation, and serving residency.
Presented as an example solution without implementation details or validation.
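
Since no implementation details are given, the sketch below is purely hypothetical and is not MinT's API. It only illustrates how identity, revision, provenance, and evaluation results might be attached to each stored adapter; serving residency is left out. All class, field, and method names are assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AdapterRevision:
    adapter_id: str                  # identity: stable across revisions
    revision: int                    # revision: increases by one per commit
    base_model: str                  # provenance: shared model the adapter sits on
    parent_revision: Optional[int]   # provenance: revision it was derived from
    weights_sha256: str              # provenance: content hash of the weights
    eval_scores: Dict[str, float] = field(default_factory=dict)  # evaluation


class AdapterRegistry:
    """In-memory, append-only history of adapter revisions (illustrative only)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[AdapterRevision]] = {}

    def commit(self, adapter_id: str, base_model: str, weights: bytes,
               eval_scores: Optional[Dict[str, float]] = None) -> AdapterRevision:
        """Record a new revision and link it to the previous one, if any."""
        history = self._history.setdefault(adapter_id, [])
        parent = history[-1].revision if history else None
        rev = AdapterRevision(
            adapter_id=adapter_id,
            revision=(parent or 0) + 1,
            base_model=base_model,
            parent_revision=parent,
            weights_sha256=hashlib.sha256(weights).hexdigest(),
            eval_scores=dict(eval_scores or {}),
        )
        history.append(rev)
        return rev

    def latest(self, adapter_id: str) -> AdapterRevision:
        """Return the most recent revision for an adapter."""
        return self._history[adapter_id][-1]


registry = AdapterRegistry()
first = registry.commit("user-42", "base-v1", b"\x00\x01")
second = registry.commit("user-42", "base-v1", b"\x00\x02", {"held_out": 0.9})
```

Recording `base_model` on every revision matters for the stability question raised in the review: if the shared model is updated, the registry can at least tell which adapters were trained against the older base.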

pith-pipeline@v0.9.1-grok · 5898 in / 1191 out tokens · 31422 ms · 2026-06-28T15:30:13.263529+00:00 · methodology




URLhttps://arxiv.org/abs/2601.19897. Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Chris Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, and Ion Stoica. S-LoRA: Serving thousands of concurrent LoRA adapters,

work page internal anchor Pith review Pith/arXiv arXiv

[28] [28]

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao

URLhttps://arxiv.org/abs/2311.03285. Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning,

work page arXiv

[29] [29]

Reflexion: Language Agents with Verbal Reinforcement Learning

URLhttps://arxiv.org/abs/2303.11366. Reece Shuttleworth, Jacob Andreas, Antonio Torralba, and Pratyusha Sharma. LoRA vs full fine-tuning: An illusion of equivalence,

work page internal anchor Pith review Pith/arXiv arXiv

[30] [30]

Lora vs full fine-tuning: An illusion of equivalence

URLhttps://arxiv.org/abs/2410.21228. David Silver and Richard S. Sutton. Welcome to the era of experience. Essay,

work page arXiv

[31] [31]

Xingyao Wang, Boxuan Chen, Hao Tang, et al

URLhttps://arxiv.org/abs/2406.09044. Xingyao Wang, Boxuan Chen, Hao Tang, et al. OpenHands: An open platform for AI software developers as generalist agents,

work page arXiv

[32] [32]

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

URLhttps://arxiv.org/abs/2407.16741. Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning.Machine Learning, 8(3–4):229–256,

work page internal anchor Pith review Pith/arXiv arXiv

[33] [33]

Williams

doi: 10.1007/BF00992696. URLhttps://doi.org/10.1007/BF00992696. Haotian Xia et al. SkillRL: Evolving agents via recursive skill-augmented reinforcement learning,

work page doi:10.1007/bf00992696

[34] [34]

SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

URLhttps: //arxiv.org/abs/2602.08234. An Yang et al. Qwen3 technical report,

work page internal anchor Pith review Pith/arXiv arXiv

[35] [35]

Qwen3 Technical Report

URLhttps://arxiv.org/abs/2505.09388. Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, et al. Tensor programs V: Tuning large neural networks via zero-shot hyperparameter transfer,

work page internal anchor Pith review Pith/arXiv arXiv

[36] [36]

Tensor programs v: Tuning large neural networks via zero-shot hyperparameter transfer.arXiv preprint arXiv:2203.03466,

URLhttps://arxiv.org/abs/2203.03466. Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. InConference on Empirical Methods in Natural Language Processing (EMNLP),

work page arXiv

[37] [37]

42 Shunyu Yao

URL https://arxiv.org/abs/2411.11581. 42 Shunyu Yao. The second half. Blog post,

work page arXiv

[38] [38]

Ruijia Zhang, Jiacheng Zhu, Hanqing Zhu, and Laixi Shi

URLhttps://arxiv.org/abs/2512.23165. Ruijia Zhang, Jiacheng Zhu, Hanqing Zhu, and Laixi Shi. Geometry-preserving orthonormal initialization for low-rank adaptation in reinforcement learning. InProceedings of the 43rd International Conference on Machine Learning, Proceedings of Machine Learning Research. PMLR,

work page arXiv

[39] [39]

Chujie Zheng et al

URLhttps://papers.nips.cc/paper/2015/ hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html. Chujie Zheng et al. Stabilizing reinforcement learning with LLMs: Formulation and practices,

2015

[40] [40]

Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang

URLhttps: //arxiv.org/abs/2512.01374. Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. Memorybank: Enhancing large language models with long-term memory. InProceedings of the AAAI Conference on Artificial Intelligence,

work page arXiv

[41] [41]

Changhai Zhou, Shijie Han, Lining Yang, Yuhua Zhou, Xu Cheng, Yibin Wang, and Hongguang Li

URL https://ojs.aaai.org/index.php/AAAI/article/view/29946. Changhai Zhou, Shijie Han, Lining Yang, Yuhua Zhou, Xu Cheng, Yibin Wang, and Hongguang Li. RankAdaptor: Hierarchical rank allocation for efficient fine-tuning pruned LLMs via performance model. InFindings of the Asso- ciation for Computational Linguistics: NAACL 2025, pages 5796–5810. Associatio...

work page doi:10.1609/aaai.v39i21.34453 2025

[42] [42]

URLhttps://arxiv.org/abs/2511.08567. 43

work page arXiv