TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization
Pith reviewed 2026-05-21 04:57 UTC · model grok-4.3
The pith
TextReg mitigates prompt distributional overfitting by regularizing text-space optimization to control capacity cost and scope narrowness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors argue that prompt distributional overfitting reflects a lack of representation control in discrete text-space optimization and formalize this through representational inefficiency, a dual-factor measure that decomposes prompt inefficiency into capacity cost and scope narrowness. They attribute the failure mode to the coupled growth of these factors during iterative rewriting and propose TextReg as a regularization framework that realizes a soft-penalty objective through regularized textual gradients, combining Dual-Evidence Gradient Purification, Semantic Edit Regularization, and Regularization-Guided Prompt Update. Across reasoning benchmarks this yields substantial gains in out-of-distribution generalization.
What carries the argument
Representational inefficiency, the dual-factor measure that decomposes prompt inefficiency into capacity cost and scope narrowness, which the regularization framework targets to prevent coupled growth.
Load-bearing premise
Prompt distributional overfitting is caused by the coupled growth of capacity cost and scope narrowness, and the proposed regularization components can control this growth without introducing new biases or harming in-distribution performance.
What would settle it
Running the optimization on the same reasoning benchmarks and checking whether TextReg prompts remain shorter and achieve the reported OOD accuracy gains without drops in in-distribution accuracy compared to TextGrad and REVOLVE.
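The settling check described here can be written down as a small predicate. This is only a sketch: `RunSummary`, `passes_settling_check`, and the `id_tolerance` parameter are hypothetical names introduced for illustration, and the page does not say how large an in-distribution drop would count as harmful.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Final measurements for one optimizer on one benchmark (hypothetical container)."""
    prompt_tokens: int    # length of the final optimized prompt
    id_accuracy: float    # in-distribution accuracy
    ood_accuracy: float   # out-of-distribution accuracy


def passes_settling_check(textreg: RunSummary,
                          baseline: RunSummary,
                          id_tolerance: float = 0.0) -> bool:
    """Return True only if all three conditions of the check hold together."""
    shorter = textreg.prompt_tokens < baseline.prompt_tokens
    ood_gain = textreg.ood_accuracy > baseline.ood_accuracy
    # id_tolerance is an assumed slack for what counts as "no drop".
    id_stable = textreg.id_accuracy >= baseline.id_accuracy - id_tolerance
    return shorter and ood_gain and id_stable
```

The predicate would be evaluated once per benchmark and per baseline (TextGrad, REVOLVE), using measured values rather than the headline gains alone.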
Original abstract
Large language models (LLMs) are highly sensitive to the prompts used to specify task objectives and behavioral constraints. Many recent prompt optimization methods iteratively rewrite prompts using LLM-generated feedback, but the resulting prompts often become longer, accumulate narrow sample-specific rules, and generalize poorly beyond the training distribution. We study this failure mode as prompt distributional overfitting and argue that it reflects a lack of representation control in discrete text-space optimization. We formalize this view through representational inefficiency, a dual-factor measure that decomposes prompt inefficiency into capacity cost and scope narrowness, attributing distributional prompt overfitting to their coupled growth during optimization. We propose TextReg, a regularization framework that realizes a soft-penalty objective through regularized textual gradients, combining Dual-Evidence Gradient Purification, Semantic Edit Regularization, and Regularization-Guided Prompt Update. Across multiple reasoning benchmarks, TextReg substantially improves out-of-distribution (OOD) generalization, with accuracy gains of up to +11.8% over TextGrad and +16.5% over REVOLVE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes TextReg, a regularization framework for prompt optimization in LLMs to mitigate prompt distributional overfitting. It defines representational inefficiency as a dual-factor measure combining capacity cost and scope narrowness, attributes overfitting to the coupled growth of these two factors during iterative text-space optimization, introduces three components (Dual-Evidence Gradient Purification, Semantic Edit Regularization, and Regularization-Guided Prompt Update) to realize a soft-penalty objective, and reports OOD accuracy gains of up to +11.8% over TextGrad and +16.5% over REVOLVE on reasoning benchmarks.
Significance. If the empirical results hold under proper controls and the mechanism is directly validated, the work could meaningfully advance prompt optimization methods by offering a principled regularization approach in discrete text space. The dual-factor decomposition of inefficiency provides a potentially useful analytical lens for prompt evolution, though its practical impact depends on whether the gains are shown to stem from the claimed control rather than incidental effects.
major comments (2)
- [Experiments] Experiments section: The paper reports only final OOD accuracies without intermediate measurements of capacity cost and scope narrowness on the evolving prompts, without ablations isolating each regularizer's effect on these factors, and without confirming that in-distribution performance remains stable. This leaves the central causal claim—that the three components control distributional overfitting via representational inefficiency—unsupported by direct evidence.
- [Abstract and §3 (Method)] Method and abstract: No information is supplied on experimental controls, statistical tests, dataset details, or whether regularization hyperparameters were tuned on the same data used for final OOD reporting. This undermines the reliability of the claimed gains of +11.8% and +16.5%.
minor comments (1)
- [§3] The notation for the soft-penalty objective and the three regularization terms could be clarified with explicit equations showing how they combine, to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important areas for strengthening the empirical support and methodological transparency in our work on TextReg. We address each major comment below and commit to revisions that directly respond to the concerns raised.
-
Referee: [Experiments] Experiments section: The paper reports only final OOD accuracies without intermediate measurements of capacity cost and scope narrowness on the evolving prompts, without ablations isolating each regularizer's effect on these factors, and without confirming that in-distribution performance remains stable. This leaves the central causal claim—that the three components control distributional overfitting via representational inefficiency—unsupported by direct evidence.
Authors: We agree that the current presentation focuses on final OOD accuracies and does not include the requested intermediate analyses. In the revised manuscript we will add plots tracking capacity cost and scope narrowness over optimization iterations for both TextReg and baselines. We will also include component-wise ablations measuring the isolated impact of Dual-Evidence Gradient Purification, Semantic Edit Regularization, and Regularization-Guided Prompt Update on these two factors. In-distribution accuracy will be reported alongside OOD results to confirm that gains do not come at the expense of in-distribution performance. These additions will provide direct evidence linking the proposed regularizers to the control of representational inefficiency.
Revision: yes
-
Referee: [Abstract and §3 (Method)] Method and abstract: No information is supplied on experimental controls, statistical tests, dataset details, or whether regularization hyperparameters were tuned on the same data used for final OOD reporting. This undermines the reliability of the claimed gains of +11.8% and +16.5%.
Authors: We acknowledge the absence of these details in the submitted version. The revised manuscript will expand the experimental protocol section to specify: (i) use of a held-out validation split for hyperparameter selection that is disjoint from both in-distribution training and OOD test sets; (ii) statistical reporting with means and standard deviations across at least five random seeds together with appropriate significance tests; (iii) full dataset descriptions including sizes, sources, and OOD construction procedures; and (iv) explicit confirmation that regularization hyperparameters were never tuned on the final OOD evaluation data. These clarifications will substantiate the reliability of the reported improvements.
Revision: yes
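The statistical reporting promised in item (ii) might look roughly like the sketch below. It assumes one accuracy value per random seed for each method, with both methods run on the same seeds. The exact paired sign-flip permutation test is one possible choice of "appropriate significance test", picked here for illustration and not stated by the authors. The function names are hypothetical.

```python
import itertools
import statistics
from typing import Sequence, Tuple


def summarize(accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across seeds."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def paired_sign_flip_pvalue(method_a: Sequence[float],
                            method_b: Sequence[float]) -> float:
    """Exact two-sided sign-flip permutation test on per-seed differences.

    Under the null hypothesis each paired difference is equally likely to
    have either sign, so all 2**n sign assignments are enumerated.
    """
    if len(method_a) != len(method_b):
        raise ValueError("both methods need one score per shared seed")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small slack for float rounding
            hits += 1
    return hits / total
```

With only five seeds the smallest achievable two-sided p-value of this test is 2/32, which is one reason a revision might use more seeds than the stated minimum.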
Circularity Check
No significant circularity; derivation is self-contained via new definitions and empirical claims
The paper introduces representational inefficiency as a dual-factor decomposition (capacity cost and scope narrowness) to formalize prompt distributional overfitting, then proposes three regularization components (Dual-Evidence Gradient Purification, Semantic Edit Regularization, Regularization-Guided Prompt Update) to realize a soft-penalty objective. These steps are presented as additive innovations rather than reductions of prior fitted quantities or self-citations. The central claims rest on reported OOD accuracy gains across benchmarks, without any equations or mechanisms shown to be equivalent to inputs by construction. No load-bearing self-citation chains, fitted-input predictions, or ansatz smuggling are evident in the provided derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs are highly sensitive to the prompts used to specify task objectives and behavioral constraints
invented entities (1)
- representational inefficiency: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We define the representational inefficiency of a prompt as $I(p) = |p|_{\mathrm{tok}} \cdot (1 - \bar{s}(p))$ … multiplicative form emphasizes a mutually amplifying interaction between the two"
- IndisputableMonolith/Foundation/BranchSelection.lean, branch_selection (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: $\min_p \mathcal{L}_{D_{\mathrm{train}}}(p) + \lambda I(p)$
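The two quoted passages, the inefficiency measure I(p) = |p|_tok · (1 − s̄(p)) and the soft-penalty objective L(p) + λ I(p), can be illustrated numerically. The sketch below rests on assumptions not confirmed by this page: s̄(p) is read as the mean of per-rule scope scores in [0, 1], token counting is replaced by whitespace splitting, and every function name is hypothetical.

```python
from typing import Callable, Sequence


def whitespace_token_count(prompt: str) -> int:
    """Stand-in for |p|_tok; a real setup would use the model's tokenizer."""
    return len(prompt.split())


def representational_inefficiency(
    prompt: str,
    rule_scopes: Sequence[float],
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> float:
    """I(p) = |p|_tok * (1 - mean scope), under the assumed reading of s-bar."""
    if not rule_scopes:
        raise ValueError("need at least one scope score")
    if any(not 0.0 <= s <= 1.0 for s in rule_scopes):
        raise ValueError("scope scores are assumed to lie in [0, 1]")
    mean_scope = sum(rule_scopes) / len(rule_scopes)
    # Capacity cost (length) times scope narrowness (1 - mean scope).
    return count_tokens(prompt) * (1.0 - mean_scope)


def regularized_objective(train_loss: float, inefficiency: float, lam: float) -> float:
    """Soft-penalty objective: training loss plus lambda times I(p)."""
    return train_loss + lam * inefficiency
```

Because the two factors multiply, a prompt that grows longer while its rules become narrower is penalized more than either change would be on its own, which matches the "mutually amplifying interaction" the passage describes.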
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.