pith. machine review for the scientific record.

arxiv: 2605.13050 · v2 · submitted 2026-05-13 · 💻 cs.CL · cs.AI

Recognition: no theorem link

Context Training with Active Information Seeking

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 06:03 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords context optimization · active information seeking · LLM adaptation · search tools · candidate pruning · external knowledge · data efficiency · task generalization

The pith

Pairing search tools with multi-candidate pruning during context training produces consistent LLM gains without weight updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that simply attaching Wikipedia search and browser tools to standard context optimization pipelines often reduces performance compared to baselines. When the same tools are instead used inside a training loop that keeps several candidate contexts and prunes the weaker ones, the resulting contexts deliver measurable improvements on low-resource translation, health queries, and reasoning benchmarks. This matters because the method adapts deployed models to new information using only external search and modest training data, avoiding the expense of full retraining.

Core claim

Equipping context optimizers with Wikipedia search and browser tools for active information seeking, when combined with a search-based training procedure that maintains and prunes multiple candidate contexts, produces consistent and substantial performance gains on Flores+ low-resource translation, HealthBench health scenarios, LiveCodeBench, and Humanity's Last Exam reasoning tasks. The resulting textual contexts are data-efficient, robust across hyperparameters, and generalize to different models.

What carries the argument

A search-based training procedure that maintains multiple candidate contexts and prunes them to incorporate active information from external tools.
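The paper's own implementation is not shown here, but the maintain-and-prune idea can be sketched in a few lines. In this hypothetical sketch, `search`, `propose_update`, and `score` are illustrative stand-ins for the external tool call, the context rewriter, and the held-out evaluation; none of these names come from the paper.

```python
# Hypothetical sketch of a maintain-and-prune context optimization loop.
# `search`, `propose_update`, and `score` are illustrative stand-ins,
# not the paper's actual API.
import random

def optimize_context(task_batch, search, propose_update, score,
                     n_candidates=4, n_keep=2, steps=10):
    """Beam-style context optimization with active information seeking."""
    candidates = [""] * n_candidates              # start from empty contexts
    for _ in range(steps):
        expanded = []
        for ctx in candidates:
            evidence = search(ctx, task_batch)    # e.g. a Wikipedia lookup
            expanded.append(propose_update(ctx, evidence, task_batch))
            expanded.append(ctx)                  # keep the parent as a fallback
        # prune: retain only the top-scoring contexts on held-out tasks
        expanded.sort(key=lambda c: score(c, task_batch), reverse=True)
        candidates = expanded[:n_keep]
        # refill the beam so surviving branches keep being explored
        while len(candidates) < n_candidates:
            candidates.append(random.choice(candidates))
    return max(candidates, key=lambda c: score(c, task_batch))
```

The contrast with the naive pipeline the paper criticizes is that a single sequential context has no way to discard a bad tool result; the pruning step above is what lets noisy retrievals be abandoned.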

If this is right

  • Performance improves on low-resource translation benchmarks such as Flores+.
  • Accuracy rises on health-related queries in HealthBench.
  • Reasoning scores increase on LiveCodeBench and Humanity's Last Exam.
  • Training requires relatively little data while remaining stable across hyperparameter choices.
  • Learned contexts transfer effectively to models not seen during training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could reduce the need for repeated full-model retraining when new domain knowledge appears.
  • Similar pruning logic might be applied to other external retrieval sources beyond Wikipedia.
  • Real-time systems could use the same active-seeking loop to keep contexts current without human intervention.
  • The method may complement existing retrieval-augmented generation pipelines by supplying higher-quality initial contexts.

Load-bearing premise

External search tools return sufficiently accurate and relevant passages, and the pruning step can reliably discard noisy contexts without removing useful ones.

What would settle it

Running the same tasks with the pruning step disabled or with deliberately noisy search results and finding no gains relative to the closed-loop baseline would falsify the claim.
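That falsification test can be framed as a three-condition ablation. The harness below is a minimal sketch under stated assumptions: `run_pipeline` stands in for the paper's actual training loop and benchmark scoring, and the noise wrapper is a hypothetical way to corrupt retrieved passages.

```python
# Hypothetical ablation harness for the falsification test above.
# `run_pipeline` stands in for the training loop; the noisy search
# wrapper replaces a fraction of retrieved passages with junk.
import random

def noisy_search(search, noise_rate=0.5, rng=random.Random(0)):
    """Wrap a search tool so a fraction of its results are corrupted."""
    def wrapped(query):
        return [("<corrupted>" if rng.random() < noise_rate else r)
                for r in search(query)]
    return wrapped

def ablation(run_pipeline, search, benchmarks):
    """Score each benchmark under the three conditions of interest."""
    conditions = {
        "full":         dict(search=search, prune=True),
        "no_prune":     dict(search=search, prune=False),
        "noisy_search": dict(search=noisy_search(search), prune=True),
    }
    return {name: {b: run_pipeline(b, **cfg) for b in benchmarks}
            for name, cfg in conditions.items()}
```

If the "full" condition shows no gain over "no_prune" or collapses under "noisy_search", the claim that pruning is the load-bearing component would be undermined.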

read the original abstract

Most existing large language models (LLMs) are expensive to adapt after deployment, especially when a task requires newly produced information or niche domain knowledge. Recent work has shown that, by manipulating and optimizing their context, LLMs can be tailored to downstream tasks without updating their weights. However, most existing methods remain closed-loop, relying solely on the model's intrinsic knowledge. In this paper, we equip these context optimizers with Wikipedia search and browser tools for active information seeking. We show that naively adding these tools to a standard sequential context optimization pipeline can actually degrade performance compared to baselines. However, when paired with a search-based training procedure that maintains and prunes multiple candidate contexts, active information seeking delivers consistent and substantial gains. We demonstrate these improvements across diverse domains, including low-resource translation (Flores+), health scenarios (HealthBench), and reasoning-heavy tasks (LiveCodeBench and Humanity's Last Exam). Furthermore, our method proves to be data-efficient, robust across different hyperparameters, and capable of generating effective textual contexts that generalize well across different models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes equipping context optimizers for LLMs with external Wikipedia search and browser tools to enable active information seeking. It claims that naively adding these tools to sequential optimization pipelines degrades performance relative to baselines, but pairing them with a search-based training procedure that maintains and prunes multiple candidate contexts produces consistent gains on low-resource translation (Flores+), health scenarios (HealthBench), and reasoning tasks (LiveCodeBench and Humanity's Last Exam). The method is further presented as data-efficient, robust to hyperparameter choices, and capable of producing contexts that generalize across models.

Significance. If the central result holds after addressing the noted gaps, the work would be significant for demonstrating how external tools combined with explicit multi-candidate pruning can enable effective, weight-free adaptation of LLMs to new or niche information, with potential implications for data-efficient deployment in dynamic domains.

major comments (2)
  1. [Training Procedure and Experiments] The headline claim that active information seeking yields gains only when paired with the multi-candidate pruning procedure rests on an unablated assumption. No experiment compares performance when all candidates are retained versus when the pruning rule is applied, leaving open the possibility that gains arise from maintaining multiples rather than from the pruning step itself.
  2. [Abstract and Results] The abstract asserts 'consistent and substantial gains' across four benchmarks, yet supplies no quantitative deltas, baseline details, statistical tests, or error analysis of false-negative discards during pruning. This absence makes it impossible to assess whether the pruning metric reliably separates signal from noise on health or reasoning tasks.
minor comments (1)
  1. [Abstract] The abstract states robustness across hyperparameters but does not enumerate the specific hyperparameters varied or the ranges tested.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our manuscript. We address each of the major comments point-by-point below, and we will make the necessary revisions to strengthen the paper.

read point-by-point responses
  1. Referee: [Training Procedure and Experiments] The headline claim that active information seeking yields gains only when paired with the multi-candidate pruning procedure rests on an unablated assumption. No experiment compares performance when all candidates are retained versus when the pruning rule is applied, leaving open the possibility that gains arise from maintaining multiples rather than from the pruning step itself.

    Authors: We agree that an explicit ablation isolating the effect of the pruning rule from merely maintaining multiple candidates would provide clearer evidence for our claims. Our existing comparisons show that the naive tool addition (sequential optimization without multi-candidate maintenance) underperforms, while the full procedure with maintenance and pruning succeeds. To address this, we will add a new ablation experiment in the revised version that retains all candidates without pruning and compares it directly to the pruned version. revision: yes

  2. Referee: [Abstract and Results] The abstract asserts 'consistent and substantial gains' across four benchmarks, yet supplies no quantitative deltas, baseline details, statistical tests, or error analysis of false-negative discards during pruning. This absence makes it impossible to assess whether the pruning metric reliably separates signal from noise on health or reasoning tasks.

    Authors: We acknowledge the need for more quantitative detail and rigor in the abstract and results. We will update the abstract to include specific performance deltas (e.g., absolute and relative improvements on each benchmark) and baseline descriptions. In the results section, we will incorporate statistical tests such as significance testing across runs and an analysis of pruning errors, including false negative discards, to demonstrate the reliability of the pruning metric. These additions will be included in the main text or supplementary material as appropriate. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents an empirical method: equipping context optimizers with external Wikipedia search and browser tools, then pairing them with a search-based training procedure that maintains and prunes multiple candidate contexts. The abstract and described claims contain no equations, fitted parameters, or derivations that reduce the reported gains to inputs by construction. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear. Performance is evaluated on external benchmarks (Flores+, HealthBench, LiveCodeBench, Humanity's Last Exam), making the central claim dependent on those results rather than internal tautology. The pruning component is part of the proposed procedure, not derived from itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the method is described as building on existing context optimization and tool-use techniques.

pith-pipeline@v0.9.0 · 5500 in / 1100 out tokens · 36014 ms · 2026-05-15T06:03:28.957434+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · 8 internal anchors

  1. [1]

    Mastering the game of

    Silver, David and Huang, Aja and Maddison, Chris J and Guez, Arthur and Sifre, Laurent and Van Den Driessche, George and Schrittwieser, Julian and Antonoglou, Ioannis and Panneershelvam, Veda and Lanctot, Marc and others , journal=. Mastering the game of. 2016 , publisher=

  2. [2]

    2024 , eprint=

    Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs , author=. 2024 , eprint=

  3. [3]

    Arora and Jason Wei and Rebecca Soskin Hicks and Preston Bowman and Joaquin Qui

    Rahul K. Arora and Jason Wei and Rebecca Soskin Hicks and Preston Bowman and Joaquin Qui. HealthBench: Evaluating Large Language Models Towards Improved Human Health , journal =

  4. [4]

    Tom B. Brown and Benjamin Mann and Nick Ryder and Melanie Subbiah and Jared Kaplan and Prafulla Dhariwal and Arvind Neelakantan and Pranav Shyam and Girish Sastry and Amanda Askell and Sandhini Agarwal and Ariel Herbert. Language Models are Few-Shot Learners , booktitle =

  5. [5]

    Mondal and Jyoti Prakash Sahoo , title =

    Yisheng Song and Ting Wang and Puyu Cai and Subrota K. Mondal and Jyoti Prakash Sahoo , title =

  6. [6]

    Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,

    Qingxiu Dong and Lei Li and Damai Dai and Ce Zheng and Jingyuan Ma and Rui Li and Heming Xia and Jingjing Xu and Zhiyong Wu and Baobao Chang and Xu Sun and Lei Li and Zhifang Sui , title =. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,

  7. [7]

    CoRR , volume =

    Pranab Sahoo and Ayush Kumar Singh and Sriparna Saha and Vinija Jain and Samrat Mondal and Aman Chadha , title =. CoRR , volume =

  8. [8]

    Kroiz and Feileen Li and Hudson Tao and Ashay Srivastava and Hevander Da Costa and Saloni Gupta and Megan L

    Sander Schulhoff and Michael Ilie and Nishant Balepur and Konstantine Kahadze and Amanda Liu and Chenglei Si and Yinheng Li and Aayush Gupta and HyoJung Han and Sevien Schulhoff and Pranav Sandeep Dulepet and Saurav Vidyadhara and Dayeon Ki and Sweta Agrawal and Chau Pham and Gerson C. Kroiz and Feileen Li and Hudson Tao and Ashay Srivastava and Hevander ...

  9. [9]

    Chi and Quoc V

    Jason Wei and Xuezhi Wang and Dale Schuurmans and Maarten Bosma and Brian Ichter and Fei Xia and Ed H. Chi and Quoc V. Le and Denny Zhou , title =. Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022 , year =

  10. [10]

    Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution , booktitle =

    Chrisantha Fernando and Dylan Banarse and Henryk Michalewski and Simon Osindero and Tim Rockt. Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution , booktitle =

  11. [11]

    Yuxin Wen and Neel Jain and John Kirchenbauer and Micah Goldblum and Jonas Geiping and Tom Goldstein , title =. Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023 , year =

  12. [12]

    CoRR , volume =

    Ryumei Nakada and Wenlong Ji and Tianxi Cai and James Zou and Linjun Zhang , title =. CoRR , volume =

  13. [13]

    NLLB Team and Costa-juss \`a , Marta R. and Cross, James and C elebi, Onur and Elbayad, Maha and Heafield, Kenneth and Heffernan, Kevin and Kalbassi, Elahe and Lam, Janice and Licht, Daniel and Maillard, Jean and Sun, Anna and Wang, Skyler and Wenzek, Guillaume and Youngblood, Al and Akula, Bapi and Barrault, Loic and Gonzalez, Gabriel Mejia and Hansanti,...

  14. [14]

    The F lores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

    Goyal, Naman and Gao, Cynthia and Chaudhary, Vishrav and Chen, Peng-Jen and Wenzek, Guillaume and Ju, Da and Krishnan, Sanjana and Ranzato, Marc ' Aurelio and Guzm \'a n, Francisco and Fan, Angela. The F lores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation. Transactions of the Association for Computational Linguistics. 2022

  15. [15]

    LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code , booktitle =

    Naman Jain and King Han and Alex Gu and Wen. LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code , booktitle =

  16. [16]

    2025 , eprint=

    Humanity's Last Exam , author=. 2025 , eprint=

  17. [17]

    CoRR , volume =

    Qizheng Zhang and Changran Hu and Shubhangi Upasani and Boyuan Ma and Fenglu Hong and Vamsidhar Kamanuru and Jay Rainton and Chen Wu and Mengmeng Ji and Hanchen Li and Urmish Thakker and James Zou and Kunle Olukotun , title =. CoRR , volume =

  18. [18]

    CoRR , volume =

    Gemini Team , title =. CoRR , volume =

  19. [19]

    OpenAI GPT-5 System Card

    Openai gpt-5 system card , author=. arXiv preprint arXiv:2601.03267 , year=

  20. [20]

    Kimi K2: Open Agentic Intelligence

    Kimi k2: Open agentic intelligence , author=. arXiv preprint arXiv:2507.20534 , year=

  21. [21]

    Proceedings of the 27th International Conference on Computational Linguistics,

    Vikas Yadav and Steven Bethard , title =. Proceedings of the 27th International Conference on Computational Linguistics,

  22. [22]

    WIREs Data Mining Knowl

    Lei Zhang and Shuai Wang and Bing Liu , title =. WIREs Data Mining Knowl. Discov. , volume =

  23. [23]

    A survey on code generation with llm-based agents,

    Yihong Dong and Xue Jiang and Jiaru Qian and Tian Wang and Kechi Zhang and Zhi Jin and Ge Li , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2508.00083 , eprinttype =. 2508.00083 , timestamp =

  24. [24]

    Competitive Programming with Large Reasoning Models , journal =

    Ahmed El. Competitive Programming with Large Reasoning Models , journal =. 2025 , url =. doi:10.48550/ARXIV.2502.06807 , eprinttype =. 2502.06807 , timestamp =

  25. [25]

    Vechev , title =

    Mislav Balunovic and Jasper Dekoninck and Ivo Petrov and Nikola Jovanovic and Martin T. Vechev , title =. CoRR , volume =

  26. [26]

    and Huang, Jimin and Qian, Lingfei and Peng, Xueqing and Suchow, Jordan W

    Li, Haohang and Cao, Yupeng and Yu, Yangyang and Javaji, Shashidhar Reddy and Deng, Zhiyang and He, Yueru and Jiang, Yuechen and Zhu, Zining and Subbalakshmi, K.p. and Huang, Jimin and Qian, Lingfei and Peng, Xueqing and Suchow, Jordan W. and Xie, Qianqian. INVESTORBENCH : A Benchmark for Financial Decision-Making Tasks with LLM -based Agent. Proceedings ...

  27. [27]

    CoRR , volume =

    Shuo Ren and Pu Jian and Zhenjiang Ren and Chunlin Leng and Can Xie and Jiajun Zhang , title =. CoRR , volume =. 2025 , url =

  28. [28]

    A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence , journal =

    Huan. A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence , journal =. 2025 , url =

  29. [29]

    CoRR , volume =

    Jinyuan Fang and Yanwen Peng and Xi Zhang and Yingxu Wang and Xinhao Yi and Guibin Zhang and Yi Xu and Bin Wu and Siwei Liu and Zihao Li and Zhaochun Ren and Nikos Aletras and Xi Wang and Han Zhou and Zaiqiao Meng , title =. CoRR , volume =. 2025 , url =

  30. [30]

    The Eleventh International Conference on Learning Representations,

    Zeyu Huang and Yikang Shen and Xiaofeng Zhang and Jie Zhou and Wenge Rong and Zhang Xiong , title =. The Eleventh International Conference on Learning Representations,

  31. [31]

    Smith and Yejin Choi and Kentaro Inui , title =

    Jungo Kasai and Keisuke Sakaguchi and Yoichi Takahashi and Ronan Le Bras and Akari Asai and Xinyan Yu and Dragomir Radev and Noah A. Smith and Yejin Choi and Kentaro Inui , title =. Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 20...

  32. [32]

    Cohen and Emine Yilmaz , title =

    Zheng Zhao and Clara Vania and Subhradeep Kayal and Naila Khan and Shay B. Cohen and Emine Yilmaz , title =. Findings of the Association for Computational Linguistics,

  33. [33]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    Interactive fiction games: A colossal adventure , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  34. [34]

    CoRR , volume =

    Edan Toledo and Karen Hambardzumyan and Martin Josifoski and Rishi Hazra and Nicolas Mario Baldwin and Alexis Audran. CoRR , volume =

  35. [35]

    Differentiation

    Mert Y. TextGrad: Automatic "Differentiation" via Text , journal =

  36. [36]

    Automatic Prompt Optimization with ``Gradient Descent'' and Beam Search

    Pryzant, Reid and Iter, Dan and Li, Jerry and Lee, Yin and Zhu, Chenguang and Zeng, Michael. Automatic Prompt Optimization with ``Gradient Descent'' and Beam Search. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023

  37. [37]

    Forty-second International Conference on Machine Learning,

    Zora Zhiruo Wang and Jiayuan Mao and Daniel Fried and Graham Neubig , title =. Forty-second International Conference on Machine Learning,. 2025 , url =

  38. [38]

    A Benchmark for Learning to Translate a New Language from One Grammar Book , booktitle =

    Garrett Tanzer and Mirac Suzgun and Eline Visser and Dan Jurafsky and Luke Melas. A Benchmark for Learning to Translate a New Language from One Grammar Book , booktitle =

  39. [39]

    The Thirteenth International Conference on Learning Representations,

    Seth Aycock and David Stap and Di Wu and Christof Monz and Khalil Sima'an , title =. The Thirteenth International Conference on Learning Representations,

  40. [40]

    McAuley , title =

    Yuanzhe Hu and Yu Wang and Julian J. McAuley , title =. CoRR , volume =. 2025 , url =

  41. [41]

    CoRR , volume =

    Huichi Zhou and Yihang Chen and Siyuan Guo and Xue Yan and Kin Hei Lee and Zihan Wang and Ka Yiu Lee and Guchun Zhang and Kun Shao and Linyi Yang and Jun Wang , title =. CoRR , volume =

  42. [42]

    ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory , journal =

    Siru Ouyang and Jun Yan and I. ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory , journal =

  43. [43]

    arXiv preprint arXiv:2502.00592 , year=

    M+: Extending MemoryLLM with Scalable Long-Term Memory , author=. arXiv preprint arXiv:2502.00592 , year=

  44. [44]

    A Survey of Context Engineering for Large Language Models

    A survey of context engineering for large language models , author=. arXiv preprint arXiv:2507.13334 , year=

  45. [45]

    ACM Computing Surveys , volume=

    Continual learning of large language models: A comprehensive survey , author=. ACM Computing Surveys , volume=. 2025 , publisher=

  46. [46]

    Contextual Experience Replay for Self-Improvement of Language Agents

    Liu, Yitao and Si, Chenglei and Narasimhan, Karthik R and Yao, Shunyu. Contextual Experience Replay for Self-Improvement of Language Agents. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025

  47. [47]

    Guanzhi Wang and Yuqi Xie and Yunfan Jiang and Ajay Mandlekar and Chaowei Xiao and Yuke Zhu and Linxi Fan and Anima Anandkumar , title =. Trans. Mach. Learn. Res. , volume =

  48. [48]

    Proceedings of the 31st International Conference on Computational Linguistics , pages=

    In-context continual learning assisted by an external continual learner , author=. Proceedings of the 31st International Conference on Computational Linguistics , pages=

  49. [49]

    Zhang, Q., Chen, S., Bei, Y ., Yuan, Z., Zhou, H., Hong, Z., Dong, J., Chen, H., Chang, Y ., and Huang, X

    A survey of graph retrieval-augmented generation for customized large language models , author=. arXiv preprint arXiv:2501.13958 , year=

  50. [50]

    PLOS Digital Health , volume=

    Retrieval augmented generation for large language models in healthcare: A systematic review , author=. PLOS Digital Health , volume=. 2025 , publisher=

  51. [51]

    Computers and Education: Artificial Intelligence , pages=

    Retrieval-augmented generation for educational application: A systematic survey , author=. Computers and Education: Artificial Intelligence , pages=. 2025 , publisher=

  52. [52]

    EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems

    Evotest: Evolutionary test-time learning for self-improving agentic systems , author=. arXiv preprint arXiv:2510.13220 , year=

  53. [53]

    arXiv preprint arXiv:2505.18524 , year=

    metaTextGrad: Automatically optimizing language model optimizers , author=. arXiv preprint arXiv:2505.18524 , year=

  54. [54]

    Patil and Kevin Lin and Sarah Wooders and Joseph E

    Charles Packer and Vivian Fang and Shishir G. Patil and Kevin Lin and Sarah Wooders and Joseph E. Gonzalez , title =. CoRR , volume =. 2023 , url =

  55. [55]

    A-MEM: Agentic Memory for LLM Agents

    Wujiang Xu and Zujie Liang and Kai Mei and Hang Gao and Juntao Tan and Yongfeng Zhang , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2502.12110 , eprinttype =. 2502.12110 , timestamp =

  56. [56]

    Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory , journal =

    Mirac Suzgun and Mert Y. Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory , journal =. 2025 , url =. doi:10.48550/ARXIV.2504.07952 , eprinttype =. 2504.07952 , timestamp =

  57. [57]

    arXiv preprint arXiv:2511.06449 , year=

    Flex: Continuous agent evolution via forward learning from experience , author=. arXiv preprint arXiv:2511.06449 , year=

  58. [58]

    ExpSeek: Self-Triggered Experience Seeking for Web Agents

    ExpSeek: Self-Triggered Experience Seeking for Web Agents , author=. arXiv preprint arXiv:2601.08605 , year=

  59. [59]

    Advances in Neural Information Processing Systems , volume=

    Reflexion: Language agents with verbal reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

  60. [60]

    Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

    Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory , author=. arXiv preprint arXiv:2511.20857 , year=

  61. [61]

    Aman Madaan and Niket Tandon and Prakhar Gupta and Skyler Hallinan and Luyu Gao and Sarah Wiegreffe and Uri Alon and Nouha Dziri and Shrimai Prabhumoye and Yiming Yang and Shashank Gupta and Bodhisattwa Prasad Majumder and Katherine Hermann and Sean Welleck and Amir Yazdanbakhsh and Peter Clark , title =. Advances in Neural Information Processing Systems ...

  62. [62]

    The Twelfth International Conference on Learning Representations , year=

    DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines , author=. The Twelfth International Conference on Learning Representations , year=

  63. [63]

    Nature , volume=

    AI models collapse when trained on recursively generated data , author=. Nature , volume=. 2024 , publisher=

  64. [64]

    arXiv preprint arXiv:2601.18510 , year=

    Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates , author=. arXiv preprint arXiv:2601.18510 , year=

  65. [65]

    Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

    Diverse beam search: Decoding diverse solutions from neural sequence models , author=. arXiv preprint arXiv:1610.02424 , year=

  66. [66]

    Ponti and Ivan Titov , title =

    Zeyu Huang and Tianhao Cheng and Zihan Qiu and Zili Wang and Yinghui Xu and Edoardo M. Ponti and Ivan Titov , title =. CoRR , volume =

  67. [67]

    Ponti and Ivan Titov , title =

    Zeyu Huang and Zihan Qiu and Zili Wang and Edoardo M. Ponti and Ivan Titov , title =. The Thirteenth International Conference on Learning Representations,