Contexting as Recommendation: Evolutionary Collaborative Filtering for Context Engineering

Congmin Zheng; Jiachen Zhu; Jianghao Lin; Lingyu Yang; Lionel Z. Wang; Rong Shan; Weinan Zhang; Weiwen Liu; Yong Yu; Yuxiang Chen

arxiv: 2605.15721 · v1 · pith:ZS7EFEVSnew · submitted 2026-05-15 · 💻 cs.CL

Jiachen Zhu, Zhuoying Ou, Congmin Zheng, Yuxiang Chen, Zeyu Zheng, Rong Shan, Lingyu Yang, Lionel Z. Wang, Weiwen Liu, Yong Yu, Weinan Zhang, Jianghao Lin

Pith reviewed 2026-05-20 19:01 UTC · model grok-4.3

classification 💻 cs.CL

keywords context engineering · collaborative filtering · LLM context · personalization · recommendation · prompt optimization · neural collaborative filtering

The pith

Context engineering as recommendation enables matching each input with its optimal context.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing context engineering methods treat the task as a global search for one strategy that works best on average, which overlooks that different inputs may need different contexts. The paper introduces Neural Collaborative Context Engineering (NCCE), which formulates the problem as recommendation. It bootstraps a catalog of anchor contexts and then runs a Context-CF Co-Evolution process in which a neural collaborative filtering (NCF) model learns instance-context preferences and uses them to guide context generation. The result is a router that assigns a tailored context to each new input; the abstract reports that this improves task accuracy.
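The gap between a single global context and per-input routing can be made concrete with a small score matrix. The sketch below is illustrative only: the matrix values are invented, and the function names `global_best_accuracy` and `oracle_routing_accuracy` are hypothetical, not taken from the paper.

```python
import numpy as np

# Toy score matrix: rows are inputs, columns are candidate contexts.
# An entry of 1.0 means the LLM answered correctly with that context.
# The values are invented for illustration only.
scores = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
])

def global_best_accuracy(s: np.ndarray) -> float:
    """Accuracy of the single context with the highest average score."""
    return float(s.mean(axis=0).max())

def oracle_routing_accuracy(s: np.ndarray) -> float:
    """Accuracy if every input were given its own best context."""
    return float(s.max(axis=1).mean())

print(global_best_accuracy(scores))     # 0.5
print(oracle_routing_accuracy(scores))  # 1.0
```

The oracle value is an upper bound that a learned router can only approach; the difference between the two numbers is the headroom that instance-wise routing is trying to capture.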

Core claim

We propose a paradigm shift by formulating context engineering as a recommendation problem. We introduce Neural Collaborative Context Engineering (NCCE), a framework that transitions optimization from a static global search to dynamic, instance-wise routing. NCCE first bootstraps a diverse catalog of anchor contexts and then employs a novel Context-CF Co-Evolution mechanism. This stage establishes a synergistic feedback loop: a lightweight Neural Collaborative Filtering (NCF) model learns instance-context preferences to guide the generation of specialized context variants, while the newly evaluated contexts continuously refine the NCF model's understanding of latent preferences. At inference

What carries the argument

The Context-CF Co-Evolution mechanism that creates a feedback loop between a Neural Collaborative Filtering model learning preferences and the generation of context variants for instance-specific routing.
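One possible reading of this feedback loop is sketched below. It is not the authors' algorithm: the function `co_evolve`, its callables (`evaluate`, `fit`, `mutate`), the "hard instance" rule, and the toy stand-ins are all hypothetical and chosen so the loop runs without an LLM. It assumes a non-empty anchor list.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[Tuple[int, int], float]  # (instance index, context index) -> score

def co_evolve(
    instances: List[str],
    anchors: List[str],
    evaluate: Callable[[str, str], float],
    fit: Callable[[Scores], Callable[[int, int], float]],
    mutate: Callable[[str, List[str]], str],
    rounds: int = 2,
) -> Tuple[List[str], Scores]:
    """Alternate between fitting a preference model and growing the catalog."""
    catalog = list(anchors)
    observed: Scores = {
        (i, c): evaluate(x, ctx)
        for i, x in enumerate(instances)
        for c, ctx in enumerate(catalog)
    }
    for _ in range(rounds):
        predict = fit(observed)  # preference model; an NCF in the paper's framing
        # Instances that no context in the catalog serves well yet.
        hard = [
            i for i in range(len(instances))
            if max(observed[(i, c)] for c in range(len(catalog))) < 1.0
        ]
        if not hard:
            break
        # Parent: the context the model rates highest on the hard instances.
        parent = max(
            range(len(catalog)),
            key=lambda c: sum(predict(i, c) for i in hard),
        )
        child = mutate(catalog[parent], [instances[i] for i in hard])
        catalog.append(child)
        new_c = len(catalog) - 1
        for i, x in enumerate(instances):  # new evaluations feed the next fit
            observed[(i, new_c)] = evaluate(x, child)
    return catalog, observed

# Toy stand-ins (assumptions, not the paper's components).
def toy_evaluate(x: str, ctx: str) -> float:
    return 1.0 if x.split()[0] in ctx.split() else 0.0

def toy_fit(observed: Scores) -> Callable[[int, int], float]:
    return lambda i, c: observed.get((i, c), 0.5)

def toy_mutate(parent: str, hard_inputs: List[str]) -> str:
    return parent + " " + hard_inputs[0].split()[0]

catalog, observed = co_evolve(
    ["add 2+2", "capital of France", "sort a list"],
    ["add carefully"],
    toy_evaluate, toy_fit, toy_mutate,
)
print(catalog)
```

In a real pipeline `evaluate` would call the LLM and a task metric, `fit` would train the preference model, and `mutate` would ask an LLM to rewrite the parent context; the sketch only shows how the two halves of the loop feed each other.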

If this is right

Instance-wise context routing captures performance gains missed by global optimization.
The NCF model enables efficient dynamic assignment at inference without repeated searches.
Personalization in context engineering is shown to be critical for LLM task accuracy.
The co-evolution process refines both the preference model and the context catalog over iterations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This framing could be applied to selecting few-shot examples or other prompt components on a per-input basis.
Testing on a wider range of tasks would reveal how much the gains depend on the diversity of the initial anchor catalog.

Load-bearing premise

A diverse catalog of anchor contexts can be bootstrapped such that the subsequent Context-CF Co-Evolution loop produces genuinely instance-specific improvements rather than simply rediscovering a few strong global contexts.

What would settle it

An experiment showing no accuracy improvement when using the NCF-routed contexts compared to the best single global context or the initial anchor set on new inputs would indicate the claim does not hold.
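A minimal sketch of such a comparison follows, assuming held-out scores are available for every instance-context pair. The function name `routing_gain` and its arguments are hypothetical; the global context index is assumed to have been selected on training data so that the comparison does not leak test information.

```python
import numpy as np

def routing_gain(test_scores, routed_idx, global_idx, n_boot=1000, seed=0):
    """Mean per-instance gain of routed contexts over one fixed global context.

    test_scores: (n_instances, n_contexts) array of held-out task scores.
    routed_idx:  context index chosen by the router for each instance.
    global_idx:  index of the global context, selected on training data.
    Returns (gain, ci_low, ci_high) with a 95% paired-bootstrap interval.
    """
    test_scores = np.asarray(test_scores, dtype=float)
    rows = np.arange(test_scores.shape[0])
    # Paired difference: same instance, routed context vs. global context.
    diff = test_scores[rows, routed_idx] - test_scores[:, global_idx]
    rng = np.random.default_rng(seed)
    samples = rng.choice(diff, size=(n_boot, diff.size), replace=True)
    boot_means = samples.mean(axis=1)
    ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
    return float(diff.mean()), float(ci_low), float(ci_high)

# Invented toy data: 4 held-out instances, 2 contexts.
gain, lo, hi = routing_gain(
    [[1, 0], [0, 1], [1, 1], [0, 0]],
    routed_idx=[0, 1, 0, 1],
    global_idx=0,
)
print(gain, lo, hi)
```

An interval that includes zero would mean the data do not separate routed contexts from the global baseline, which is the outcome the paragraph above describes as falsifying.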

Figures

Figures reproduced from arXiv: 2605.15721 by Congmin Zheng, Jiachen Zhu, Jianghao Lin, Lingyu Yang, Lionel Z. Wang, Rong Shan, Weinan Zhang, Weiwen Liu, Yong Yu, Yuxiang Chen, Zeyu Zheng, Zhuoying Ou.

**Figure 1.** Context engineering as recommendation: learning to assign instance-specific composite contexts instead of optimizing a single global context strategy.

**Figure 2.** The overall architecture of NCCE, featuring a synergistic co-evolutionary loop between a …

**Figure 3.** Performance evolution across iterative rounds. The curves track the task scores of NCCE …

**Figure 5.** Performance across different data densities in the collaborative filtering matrix.

**Figure 6.** t-SNE visualization of context routing assignments. Colors represent different context …

Original abstract

Large Language Models (LLMs) are highly sensitive to their input contexts, motivating the development of automated context engineering. However, existing methods predominantly treat this as a global search problem, seeking a single context strategy that maximizes average performance across a dataset. This restrictive assumption overlooks the fact that different inputs often require distinct guidance, leaving substantial instance-level performance gains untapped. In this paper, we propose a paradigm shift by formulating context engineering as a recommendation problem. We introduce **Neural Collaborative Context Engineering (NCCE)**, a framework that transitions optimization from a static global search to dynamic, instance-wise routing. NCCE first bootstraps a diverse catalog of anchor contexts and then employs a novel **Context-CF Co-Evolution** mechanism. This stage establishes a synergistic feedback loop: a lightweight Neural Collaborative Filtering (NCF) model learns instance-context preferences to guide the generation of specialized context variants, while the newly evaluated contexts continuously refine the NCF model's understanding of latent preferences. At inference time, the trained NCF model acts as a context router, dynamically assigning the most suitable context strategy to each unseen instance. Theoretical Proofs and comprehensive experiments demonstrate that by matching individual inputs with their optimal contexts, NCCE significantly improves task accuracy, highlighting the critical importance of personalization in LLM context engineering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper recasts context engineering as per-instance recommendation via NCCE and a co-evolution loop, but it is unclear whether the gains reflect real personalization or just better global context discovery.

The main takeaway is that this work shifts context engineering from global search to a recommendation framing with NCCE and its Context-CF Co-Evolution mechanism. That formulation is new relative to the static or dataset-wide methods referenced in the abstract. The bootstrap of anchor contexts followed by a feedback loop where NCF learns preferences and steers variant generation is a concrete proposal that could be implemented in production pipelines. If the loop keeps contexts diverse and routes them meaningfully, the approach would give practitioners a practical way to handle input-dependent prompt needs without hand-tuning per case.

The paper does a reasonable job laying out the inference-time router and tying it back to collaborative filtering ideas that already work in other domains. The theoretical proofs mentioned in the abstract are a plus if they actually constrain the behavior of the co-evolution stage.

The central soft spot is exactly the one the stress-test note raises. It is possible for the NCF to converge on a small set of strong contexts and assign them to most inputs, which would make the reported accuracy lift look like improved global search rather than instance-specific routing. The abstract does not spell out controls that would rule this out, such as measuring the entropy of assigned contexts across inputs or comparing against a strong global baseline that also gets to evolve. The bootstrap step for the initial catalog is also load-bearing; any lack of diversity there would propagate. Without seeing quantitative breakdowns of context diversity or ablation on the co-evolution component, the claim that personalization is the driver stays provisional.

This paper is aimed at researchers and engineers working on prompt optimization, retrieval-augmented generation, or production LLM systems where context choice affects downstream accuracy. Readers who already use recommendation models or evolutionary search would find the mapping straightforward. It deserves peer review because the framing is distinct and the proposed mechanism is testable, even though the current evidence leaves the key distinction between global and instance-level effects open. I would send it to referees with a request to verify that the reported gains survive a strong global-search comparator and that context assignments actually vary across instances.

Referee Report

2 major / 2 minor

Summary. The paper proposes Neural Collaborative Context Engineering (NCCE) to reframe LLM context engineering as an instance-wise recommendation task rather than global search. It bootstraps a catalog of anchor contexts and introduces a Context-CF Co-Evolution loop in which a Neural Collaborative Filtering (NCF) model learns preferences to guide generation of context variants; at inference the NCF routes each input to its preferred context. The abstract states that theoretical proofs and comprehensive experiments show significant accuracy gains from this personalization.

Significance. If the results hold and the method demonstrably routes inputs to distinct, instance-specific contexts rather than rediscovering a few strong global ones, the work would be significant for shifting context engineering from dataset-level optimization to per-instance routing, with potential gains on heterogeneous tasks.

major comments (2)

Abstract: the claim that 'Theoretical Proofs and comprehensive experiments demonstrate' significant improvements is unsupported in the provided manuscript, which contains no quantitative results, error bars, baseline comparisons, or description of controls against post-hoc context selection; this is load-bearing for the central accuracy claim.
Context-CF Co-Evolution mechanism (described in the abstract and introduction): the feedback loop between NCF preference learning and variant generation lacks any stated mechanism or metric (e.g., entropy of context assignments, per-instance context diversity, or ablation showing routing variation) to ensure convergence produces genuinely instance-specific contexts rather than a small set of dominant global winners; without such evidence the personalization paradigm cannot be distinguished from improved global search.

minor comments (2)

Notation: 'NCF' and 'NCCE' should be expanded on first use; the distinction between 'anchor contexts' and 'specialized context variants' is not made explicit.
The title 'Contexting as Recommendation' would benefit from a brief clarification of how the evolutionary loop differs from standard collaborative filtering pipelines.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important areas where the current manuscript requires strengthening to support its central claims. We address each major comment below and outline the revisions we will make.

Referee: Abstract: the claim that 'Theoretical Proofs and comprehensive experiments demonstrate' significant improvements is unsupported in the provided manuscript, which contains no quantitative results, error bars, baseline comparisons, or description of controls against post-hoc context selection; this is load-bearing for the central accuracy claim.

Authors: We agree that the abstract claim is currently unsupported, as the submitted manuscript does not yet include the quantitative results, error bars, baseline comparisons, or explicit controls against post-hoc selection. This phrasing was carried over from an earlier outline and does not reflect the present state of the document. In the revised version we will remove the unsupported claim from the abstract and, if the experiments are completed in time, replace it with a more qualified statement that points to the specific results and controls that will be added to the experimental section. revision: yes
Referee: Context-CF Co-Evolution mechanism (described in the abstract and introduction): the feedback loop between NCF preference learning and variant generation lacks any stated mechanism or metric (e.g., entropy of context assignments, per-instance context diversity, or ablation showing routing variation) to ensure convergence produces genuinely instance-specific contexts rather than a small set of dominant global winners; without such evidence the personalization paradigm cannot be distinguished from improved global search.

Authors: We acknowledge that the current description of the Context-CF Co-Evolution loop does not supply the requested metrics or ablations. While the manuscript outlines the iterative feedback between the NCF router and context variant generation, it does not report assignment entropy, per-instance diversity statistics, or controlled ablations that would demonstrate routing variation. We will add these analyses in the revision, including entropy of context assignments across the test set and an ablation that compares instance-specific routing against a global-search baseline, to provide the necessary evidence that the method produces genuinely personalized contexts rather than converging on a few dominant ones. revision: yes
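The assignment-entropy metric discussed in this exchange could be computed along the following lines. This is a sketch under stated assumptions: the function name `assignment_entropy` is hypothetical, and normalizing by the log of the catalog size is one convention among several.

```python
import math
from collections import Counter
from typing import Sequence

def assignment_entropy(assignments: Sequence[int], n_contexts: int) -> float:
    """Normalized Shannon entropy of routed context indices, in [0, 1].

    0 means every input was sent to the same context (no personalization);
    1 means the inputs are spread uniformly over all n_contexts.
    """
    if n_contexts < 2 or not assignments:
        return 0.0
    counts = Counter(assignments)
    total = len(assignments)
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(n_contexts)

print(assignment_entropy([0, 0, 0, 0], n_contexts=4))  # collapsed routing
print(assignment_entropy([0, 1, 2, 3], n_contexts=4))  # uniform routing
```

High entropy alone does not show useful personalization, since random routing would also score high; it is informative only when read together with the accuracy comparison against a global baseline.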

Circularity Check

0 steps flagged

No significant circularity; standard supervised training-inference separation

full rationale

The NCCE framework bootstraps an initial catalog of anchor contexts, evaluates them to create training data for the NCF model, then uses the trained NCF to route contexts for new instances. This follows a conventional supervised learning loop where parameters are fit on observed instance-context preference data and applied to unseen inputs. No equation or step equates a claimed prediction to its own inputs by construction, no uniqueness theorem is imported via self-citation, and the co-evolution is described as iterative refinement rather than a closed definitional loop. The central claim of instance-specific routing rests on empirical generalization rather than tautology.
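The train-then-route separation described here can be illustrated with a much simpler preference model than NCF. The sketch below swaps in neighbourhood-based collaborative filtering over instance embeddings; the function name `knn_route`, the embeddings, and the scores are all invented for illustration and are not the paper's model.

```python
import numpy as np

def knn_route(train_emb, train_scores, new_emb, k=2):
    """Route unseen instances using the scores of their nearest training neighbours.

    train_emb:    (n_train, d) instance embeddings seen during training.
    train_scores: (n_train, n_contexts) observed instance-context scores.
    new_emb:      (n_new, d) embeddings of unseen instances.
    Returns one context index per unseen instance.
    """
    a = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    b = new_emb / np.linalg.norm(new_emb, axis=1, keepdims=True)
    sim = b @ a.T                                    # cosine similarity
    nearest = np.argsort(-sim, axis=1)[:, :k]        # k most similar training rows
    predicted = train_scores[nearest].mean(axis=1)   # (n_new, n_contexts)
    return predicted.argmax(axis=1)

# Fit side: observed preferences for four training instances, two contexts.
train_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_scores = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

# Inference side: two inputs that were never evaluated against any context.
new_emb = np.array([[1.0, 0.2], [0.2, 1.0]])
routes = knn_route(train_emb, train_scores, new_emb)
print(routes)  # [0 1]
```

No score for the new inputs enters the prediction, which is the sense in which the routing claim rests on generalization rather than on its own inputs.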

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the existence of a useful latent preference structure between instances and contexts that can be learned by a lightweight NCF model; no explicit free parameters or invented physical entities are named, but the bootstrap catalog size and the co-evolution stopping criterion function as implicit modeling choices.

free parameters (1)

number of anchor contexts
The size of the initial diverse catalog is chosen to enable the subsequent co-evolution; its value is not derived from first principles.

axioms (1)

domain assumption: Different inputs require distinct guidance that can be captured by a low-rank preference matrix
Invoked when the paper states that global search leaves instance-level gains untapped and that NCF can learn these preferences.
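One way to probe the low-rank assumption on an observed instance-context score matrix is to look at how quickly its singular values decay. The sketch below is an assumption-laden check, not something the paper reports: the function name `effective_rank` and the 95% energy threshold are hypothetical choices.

```python
import numpy as np

def effective_rank(pref: np.ndarray, energy: float = 0.95) -> int:
    """Smallest number of singular values holding `energy` of the squared spectrum."""
    s = np.linalg.svd(pref, compute_uv=False)   # singular values, descending
    total = float((s ** 2).sum())
    if total == 0.0:
        return 0
    cumulative = np.cumsum(s ** 2) / total
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(s))                        # guard against rounding at the tail

# A matrix built from one latent factor is captured by a single component.
print(effective_rank(np.outer([1, 2, 3], [1, 0, 1])))  # 1
# An identity matrix has no shared structure to exploit.
print(effective_rank(np.eye(4)))                        # 4
```

Subtracting each context's mean score before the decomposition may be worth considering, so that overall context quality is not mistaken for instance-specific preference structure.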

invented entities (1)

Context-CF Co-Evolution mechanism (no independent evidence)
purpose: Synergistic feedback loop between NCF preference learning and context variant generation
Newly introduced construct whose independent evidence is the reported accuracy gains; no external falsifiable prediction (e.g., a specific performance curve) is stated.

pith-pipeline@v0.9.0 · 5797 in / 1426 out tokens · 47860 ms · 2026-05-20T19:01:32.903534+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

We propose a paradigm shift by formulating context engineering as a recommendation problem... Neural Collaborative Filtering (NCF) model learns instance-context preferences

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages · 13 internal anchors

[1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

[2] Eshaan Agarwal, Joykirat Singh, Vivek Dani, Raghav Magazine, Tanuja Ganu, and Akshay Nambi. 2024. PromptWizard: Task-Aware Prompt Optimization Framework. arXiv:2405.18369 [cs.CL] https://arxiv.org/abs/2405.18369

[3] Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, and Omar Khattab. 2026. GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning. arXiv:2507.1...

[4] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901

[5] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems. 7–10

[6] Stéphan Clémençon, Gábor Lugosi, and Nicolas Vayatis. 2008. Ranking and empirical minimization of U-statistics. (2008)

[7] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems. 191–198

[8] Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, and Tim Rocktäschel. 2023. Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution. arXiv:2309.16797 [cs.CL] https://arxiv.org/abs/2309.16797

[9] Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, and Yujiu Yang. 2025. EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers. arXiv:2309.08532 [cs.CL] https://arxiv.org/abs/2309.08532

[10] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web. 173–182

[11] Yichen Jiang, Shikha Bordia, Zheng Zhong, Charles Dognin, Maneesh Singh, and Mohit Bansal

[12] HoVer: A dataset for many-hop fact extraction and claim verification. In Findings of the Association for Computational Linguistics: EMNLP 2020. 3441–3460

[13] Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Saiful Haq, Ashutosh Sharma, Thomas T Joshi, Hanna Moazam, Heather Miller, et al. 2023. DSPy: compiling declarative language model calls into state-of-the-art pipelines. In The Twelfth International Conference on Learning Representations

[14] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37

[15] Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet computing 7, 1 (2003), 76–80

[16] Reginald Long, Panupong Pasupat, and Percy Liang. 2016. Simpler context-dependent logical forms via model projections. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1456–1465

[17] Giovanni Monea, Antoine Bosselut, Kianté Brantley, and Yoav Artzi. 2025. LLMs Are In-Context Bandit Reinforcement Learners. arXiv:2410.05362 [cs.CL] https://arxiv.org/abs/2410.05362

[18] OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, et al. 2024. GPT-4o System Card. arXiv:2410.21276 [cs.CL] https://arxiv.org/abs/2410.21276

[19] Krista Opsahl-Ong, Michael J Ryan, Josh Purtell, David Broman, Christopher Potts, Matei Zaharia, and Omar Khattab. 2024. Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs. arXiv:2406.11695 [cs.CL] https://arxiv.org/abs/2406.11695

[20] Jiarui Qin, Jiachen Zhu, Bo Chen, Zhirong Liu, Weiwen Liu, Ruiming Tang, Rui Zhang, Yong Yu, and Weinan Zhang. 2022. Rankflow: Joint optimization of multi-stage cascade ranking systems as flows. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 814–824

[21] Jiarui Qin, Jiachen Zhu, Yankai Liu, Junchao Gao, Jianjie Ying, Chaoxiong Liu, Ding Wang, Junlan Feng, Chao Deng, Xiaozheng Wang, et al. 2023. Learning to distinguish multi-user coupling behaviors for TV recommendation. In Proceedings of the sixteenth ACM international conference on web search and data mining. 204–212

[22] Xuanfei Ren, Allen Nie, Tengyang Xie, and Ching-An Cheng. 2026. POLCA: Stochastic Generative Optimization with LLM. arXiv preprint arXiv:2603.14769 (2026)

[23] Steffen Rendle. 2010. Factorization machines. In 2010 IEEE International conference on data mining. IEEE, 995–1000

[24] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2012. BPR: Bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618 (2012)

[25] Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, and Roberta Raileanu. 2024. Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts. arXiv:2402.16822 [cs.CL] https://arxiv.org/abs/2402.16822

[26] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web. 285–295

[27] Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. Autorec: Autoencoders meet collaborative filtering. In Proceedings of the 24th international conference on World Wide Web. 111–112

[28] Rong Shan, Jiachen Zhu, Jianghao Lin, Chenxu Zhu, Bo Chen, Ruiming Tang, Yong Yu, and Weinan Zhang. 2025. Full-Stack Optimized Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation. ACM Transactions on Recommender Systems 4, 2 (2025), 1–33

[29] Asankhaya Sharma. 2025. OpenEvolve: an open-source evolutionary coding agent. https://github.com/algorithmicsuperintelligence/openevolve

[30] Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366 [cs.AI] https://arxiv.org/abs/2303.11366

[31] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023. Llama 2: open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023)

[32] Jun Wang, Meng Fang, Ziyu Wan, Muning Wen, Jiachen Zhu, Anjie Liu, Ziqin Gong, Yan Song, Lei Chen, Lionel M Ni, et al. 2024. Openr: An open source framework for advanced reasoning with large language models. arXiv preprint arXiv:2410.09671 (2024)

[33] Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. 2020. Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. Advances in neural information processing systems 33 (2020), 5776–5788

[34] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903 [cs.CL] https://arxiv.org/abs/2201.11903

[35] Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, and Xinyun Chen. 2024. Large Language Models as Optimizers. arXiv:2309.03409 [cs.LG] https://arxiv.org/abs/2309.03409

[36] Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D Manning. 2018. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 conference on empirical methods in natural language processing. 2369–2380

[37] Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Zhi Huang, Carlos Guestrin, and James Zou. 2024. TextGrad: Automatic "Differentiation" via Text. arXiv:2406.07496 [cs.CL] https://arxiv.org/abs/2406.07496

[38] Muhan Zhang and Yixin Chen. 2019. Inductive matrix completion based on graph neural networks. arXiv preprint arXiv:1904.12058 (2019)

[39] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 1, 2 (2023), 1–124

[40] Zihao Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. 2021. Calibrate before use: Improving few-shot performance of language models. In International conference on machine learning. PMLR, 12697–12706

[41] Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. 2023. Large Language Models Are Human-Level Prompt Engineers. arXiv:2211.01910 [cs.LG] https://arxiv.org/abs/2211.01910

[42] Jiachen Zhu, Jianghao Lin, Xinyi Dai, Bo Chen, Rong Shan, Jieming Zhu, Ruiming Tang, Yong Yu, and Weinan Zhang. 2024. Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation. arXiv:2408.03533 [cs.IR] https://arxiv.org/abs/2408.03533

[43] Jiachen Zhu, Yichao Wang, Jianghao Lin, Jiarui Qin, Ruiming Tang, Weinan Zhang, and Yong Yu. 2024. M-scan: A multi-scenario causal-driven adaptive network for recommendation. In Proceedings of the ACM Web Conference 2024. 3844–3853
[48]

, " summary

He is younger than Stephen Cummings ( born in 1954) . However , without the specific age or i d e n t i f i c a t i o n of a former Wonder Girls member , we cannot d e f i n i t i v e l y conclude the claim based on the passages provided ." , " summary ": " The passages provide b i r t h d a t e s for several i n d i v i d u a l s named Stephen , but none...

work page 1954
[49]

The Strange Case Of ...\

Prior to that , it had peaked at number 1 on the Cl as si ca l Digital Songs and number 10 on the Dance / E l e c t r o n i c Digital Songs charts , as well as charting in Germany at number 59." , " Ha le st or m | H ale st or m is an American hard rock band from Red Lion , Pennsylvania , c o n s i s t i n g of lead vocalist and gu it ar is t Lzzy Hale , ...

work page 2009
[50]

Calling All Dawns \

is an American composer of concert music , film , and video game scores . His work is pr im ar il y o r c h e s t r a l and choral , often with a world music inf lu en ce . He has won two Grammy Awards for his cl as si ca l c ro ss ov er album \" Calling All Dawns \"." , " Reaching for the Moon ( album ) | Reaching for the Moon is the third album by jazz ...

work page 1991
[51]

Stephen Pearcy was born on July 3 , 1956

work page 1956
[52]

Stephen Duffy was born on May 30 , 1960

work page 1960
[53]

Stephen Cummings was born on S ep te mbe r 13 , 1954

work page 1954
[54]

None of the above i n d i v i d u a l s were a s s o c i a t e d with Wonder Girls , a South Korean girl group formed in 2007 by JYP E n t e r t a i n m e n t

Stephen Gately was born on March 17 , 1976. None of the above i n d i v i d u a l s were a s s o c i a t e d with Wonder Girls , a South Korean girl group formed in 2007 by JYP E n t e r t a i n m e n t . Therefore , we do not have i n f o r m a t i o n from the passages that e x p l i c i t l y i d e n t i f i e s a former Wonder Girls member to compare ...

work page 1976
[55]

He is younger than Stephen Cummings (born in 1954). However, without the specific age or identification of a former Wonder Girls member, we cannot definitively conclude the claim based on the passages provided.", "summary": "The passages provide birthdates for several individuals named Stephen, but n...
[56]

The Strange Case Of ...\

Prior to that, it had peaked at number 1 on the Classical Digital Songs and number 10 on the Dance/Electronic Digital Songs charts, as well as charting in Germany at number 59.", "Halestorm | Halestorm is an American hard rock band from Red Lion, Pennsylvania, consisting of lead vocalist and guitarist Lzzy Hale, ...
[57]

is an American composer of concert music, film, and video game scores. His work is primarily orchestral and choral, often with a world music influence. He has won two Grammy Awards for his classical crossover album \"Calling All Dawns\".", "Reaching for the Moon (album) | Reaching for the Moon is the third album by jazz ...
[58]

Contradictions Collapse \

is a Romanian-American scientist who is the current Professor of Ecology in the Department of Land Resources and Environmental Sciences at Montana State University. He is a principal investigator in the McMurdo Dry Valleys Long Term Ecological Research (LTER) project.", "None (Mes ...
[59]

, known as Odetta, was an American singer, actress, guitarist, songwriter, and a civil and human rights activist, often referred to as \"The Voice of the Civil Rights Movement\". Her musical repertoire consisted largely of American folk music, blues, jazz, and spirituals. An important figure in the American folk music...
[60]

Contradictions Collapse \

is a Romanian-American scientist who is the current Professor of Ecology in the Department of Land Resources and Environmental Sciences at Montana State University. He is a principal investigator in the McMurdo Dry Valleys Long Term Ecological Research (LTER) project.", "None (Mes ...
[61]

**Extract Missing or Ambiguous Information**: Focus on identifying gaps or ambiguities in the `context` that prevent answering the question. The `search_query` should target retrieving the missing information rather than reiterating what is already in the `context`

[62]

**Preserve Key Entities and Relationships**: Ensure all entities (e.g., names, dates, titles) and their relationships from the question are accurately incorporated into the `search_query`. Avoid altering or omitting critical details

[63]

**Avoid Reasoning or Assumptions**: Do not include reasoning, explanations, or inferred conclusions in the `search_query`. The query should remain neutral and factual, aimed solely at finding the missing pieces of information

[64]

**Adapt to Specificity**: When the question contains highly specific details (e.g., dates, names, or unique identifiers), ensure these are included verbatim in the `search_query`. Avoid generalizing or broadening the scope unnecessarily

[65]

**Avoid Redundancies**: Do not include information already fully resolved in the `context`. The `search_query` should focus exclusively on unresolved aspects of the question
[66]

**Examples Clarification**: For cases where the question explicitly references an entity or detail absent in the `context` (e.g., \"Mel Groomes' alma mater\"), prioritize constructing a query that captures the specific missing entity and its relationship to the question (e.g., ...
[67]

, known as Odetta, was an American singer, actress, guitarist, songwriter, and a civil and human rights activist, often referred to as \"The Voice of the Civil Rights Movement\". Her musical repertoire consisted largely of American folk music, blues, jazz, and spirituals. An important figure in the American folk music...
[68]

**Precision in Terminology and Data Extraction**: Carefully extract and use precise and complete details directly from the context. Pay particular attention to numeric data, dates, proper nouns, entity names, and other key details. Do not rely on assumptions or external knowledge unless expl...

[69]

**Contextual Completeness**: Rigorously validate that all elements of the reasoning and the final answer are fully supported by the context. If the context does not directly provide the necessary information, explicitly state what is missing and provide an appropriate fallback response (e. ...

[70]

**Logical Step-by-Step Reasoning**: Construct the reasoning in a clear, explicit, and logically consistent manner. Clearly outline how each piece of information from the context contributes to deriving the answer. Avoid skipping intermediate steps or making vague connections betwe...

[71]

**Query-Specific Interpretation and Nuance Handling**: Thoroughly analyze the phrasing and implied conditions in the question. Pay close attention to details such as specific dates, numeric constraints, entity relationships, and other query-specific nuances. Ensure the reasoning and answer directly and ful...

[72]

**Error Identification and Resolution**: Proactively validate extracted information against the context to avoid errors. For example: - For date-related queries, cross-check all dates in the context to ensure accuracy. - For numeric or quantity-related queries, verify calculations or e ...

[73]

**Fallback Responses for Ambiguity or Missing Context**: If the context does not support a definitive answer, clearly communicate this in the reasoning and provide a suitable fallback response. Avoid guessing or introducing unsupported information

[74]

**Answer Formatting and Consistency**: Adhere strictly to the expected answer format based on the question or provided feedback. For yes/no questions, use lowercase ('yes', 'no'). For other types of queries, ensure the answer matches the exact phrasing or conventions present in the context

[75]

, "fields

**Feedback-Informed Refinement**: Where prior executions have failed due to inaccuracies or misinterpretations, pay special attention to similar patterns in future queries. Use lessons from such failures to refine reasoning and avoid repeating errors. Failure to adhere to these principles will...
[76]

Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...


[1]

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

[2]

Eshaan Agarwal, Joykirat Singh, Vivek Dani, Raghav Magazine, Tanuja Ganu, and Akshay Nambi. 2024. PromptWizard: Task-Aware Prompt Optimization Framework. arXiv:2405.18369 [cs.CL] https://arxiv.org/abs/2405.18369

[3]

Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, and Omar Khattab. 2026. GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning. arXiv:2507.1...

[4]

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901

[5]

Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems. 7–10

[6]

Stéphan Clémençon, Gábor Lugosi, and Nicolas Vayatis. 2008. Ranking and empirical minimization of U-statistics. (2008)

[7]

Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems. 191–198

[8]

Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, and Tim Rocktäschel. 2023. Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution. arXiv:2309.16797 [cs.CL] https://arxiv.org/abs/2309.16797

[9]

Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, and Yujiu Yang. 2025. EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers. arXiv:2309.08532 [cs.CL] https://arxiv.org/abs/2309.08532

[10]

Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web. 173–182

[11]

Yichen Jiang, Shikha Bordia, Zheng Zhong, Charles Dognin, Maneesh Singh, and Mohit Bansal. 2020. HoVer: A dataset for many-hop fact extraction and claim verification. In Findings of the Association for Computational Linguistics: EMNLP 2020. 3441–3460

[13]

Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Saiful Haq, Ashutosh Sharma, Thomas T Joshi, Hanna Moazam, Heather Miller, et al. 2023. DSPy: compiling declarative language model calls into state-of-the-art pipelines. In The Twelfth International Conference on Learning Representations

[14]

Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37

[15]

Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet computing 7, 1 (2003), 76–80

[16]

Reginald Long, Panupong Pasupat, and Percy Liang. 2016. Simpler context-dependent logical forms via model projections. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1456–1465

[17]

Giovanni Monea, Antoine Bosselut, Kianté Brantley, and Yoav Artzi. 2025. LLMs Are In-Context Bandit Reinforcement Learners. arXiv:2410.05362 [cs.CL] https://arxiv.org/abs/2410.05362

[18]

OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, et al. 2024. GPT-4o System Card. arXiv:2410.21276 [cs.CL] https://arxiv.org/abs/2410.21276

[19]

Krista Opsahl-Ong, Michael J Ryan, Josh Purtell, David Broman, Christopher Potts, Matei Zaharia, and Omar Khattab. 2024. Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs. arXiv:2406.11695 [cs.CL] https://arxiv.org/abs/2406.11695

[20]

Jiarui Qin, Jiachen Zhu, Bo Chen, Zhirong Liu, Weiwen Liu, Ruiming Tang, Rui Zhang, Yong Yu, and Weinan Zhang. 2022. Rankflow: Joint optimization of multi-stage cascade ranking systems as flows. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 814–824

[21]

Jiarui Qin, Jiachen Zhu, Yankai Liu, Junchao Gao, Jianjie Ying, Chaoxiong Liu, Ding Wang, Junlan Feng, Chao Deng, Xiaozheng Wang, et al. 2023. Learning to distinguish multi-user coupling behaviors for TV recommendation. In Proceedings of the sixteenth ACM international conference on web search and data mining. 204–212

[22]

Xuanfei Ren, Allen Nie, Tengyang Xie, and Ching-An Cheng. 2026. POLCA: Stochastic Generative Optimization with LLM. arXiv preprint arXiv:2603.14769 (2026)

[23]

Steffen Rendle. 2010. Factorization machines. In 2010 IEEE International conference on data mining. IEEE, 995–1000

[24]

Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2012. BPR: Bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618 (2012)

[25]

Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, and Roberta Raileanu. 2024. Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts. arXiv:2402.16822 [cs.CL] https://arxiv.org/abs/2402.16822

[26]

Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web. 285–295.

[27]

Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. Autorec: Autoencoders meet collaborative filtering. In Proceedings of the 24th international conference on World Wide Web. 111–112

[28]

Rong Shan, Jiachen Zhu, Jianghao Lin, Chenxu Zhu, Bo Chen, Ruiming Tang, Yong Yu, and Weinan Zhang. 2025. Full-Stack Optimized Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation. ACM Transactions on Recommender Systems 4, 2 (2025), 1–33

[29]

Asankhaya Sharma. 2025. OpenEvolve: an open-source evolutionary coding agent. https://github.com/algorithmicsuperintelligence/openevolve

[30]

Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366 [cs.AI] https://arxiv.org/abs/2303.11366

[31]

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023. Llama 2: open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023)

[32]

Jun Wang, Meng Fang, Ziyu Wan, Muning Wen, Jiachen Zhu, Anjie Liu, Ziqin Gong, Yan Song, Lei Chen, Lionel M Ni, et al. 2024. Openr: An open source framework for advanced reasoning with large language models. arXiv preprint arXiv:2410.09671 (2024)

[33]

Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. 2020. Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. Advances in neural information processing systems 33 (2020), 5776–5788

[34]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903 [cs.CL] https://arxiv.org/abs/2201.11903

[35]

Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, and Xinyun Chen. 2024. Large Language Models as Optimizers. arXiv:2309.03409 [cs.LG] https://arxiv.org/abs/2309.03409

[36]

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D Manning. 2018. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 conference on empirical methods in natural language processing. 2369–2380

[37]

Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Zhi Huang, Carlos Guestrin, and James Zou. 2024. TextGrad: Automatic "Differentiation" via Text. arXiv:2406.07496 [cs.CL] https://arxiv.org/abs/2406.07496

[38]

Muhan Zhang and Yixin Chen. 2019. Inductive matrix completion based on graph neural networks. arXiv preprint arXiv:1904.12058 (2019)

[39]

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 1, 2 (2023), 1–124

[40]

Zihao Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. 2021. Calibrate before use: Improving few-shot performance of language models. In International conference on machine learning. PMLR, 12697–12706

[41]

Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. 2023. Large Language Models Are Human-Level Prompt Engineers. arXiv:2211.01910 [cs.LG] https://arxiv.org/abs/2211.01910

[42]

Jiachen Zhu, Jianghao Lin, Xinyi Dai, Bo Chen, Rong Shan, Jieming Zhu, Ruiming Tang, Yong Yu, and Weinan Zhang. 2024. Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation. arXiv:2408.03533 [cs.IR] https://arxiv.org/abs/2408.03533

[43]

Jiachen Zhu, Yichao Wang, Jianghao Lin, Jiarui Qin, Ruiming Tang, Weinan Zhang, and Yong Yu. 2024. M-scan: A multi-scenario causal-driven adaptive network for recommendation. In Proceedings of the ACM Web Conference 2024. 3844–3853.

A Overall Algorithm

Algorithm 1 Neural Collaborative Context Engineering (NCCE)

Require: Training instances X, warm-up opt...
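The Algorithm 1 listing is truncated in this copy, so its steps cannot be reproduced here. As a rough illustration of the inference-time idea stated in the abstract (a learned scorer assigns each input its own context), the Python sketch below scores every context in a small catalog for one instance and routes to the highest-scoring one. Everything in it is an assumption made for illustration: the names `score` and `route`, the toy two-dimensional embeddings, and the plain dot-product scorer, which merely stands in for the paper's NCF model and is not the authors' implementation.

```python
from typing import Dict, List, Sequence


def score(instance_vec: Sequence[float], context_vec: Sequence[float]) -> float:
    """Hypothetical preference score: a plain dot product of two embeddings."""
    if len(instance_vec) != len(context_vec):
        raise ValueError("embedding sizes differ")
    return sum(a * b for a, b in zip(instance_vec, context_vec))


def route(instance_vec: Sequence[float], catalog: Dict[str, List[float]]) -> str:
    """Return the name of the catalog context with the highest score."""
    if not catalog:
        raise ValueError("empty context catalog")
    return max(catalog, key=lambda name: score(instance_vec, catalog[name]))


# Toy catalog of two contexts with made-up embeddings (illustrative only).
catalog = {
    "step_by_step": [1.0, 0.0],
    "retrieval_first": [0.0, 1.0],
}

multi_hop_instance = [0.2, 0.9]   # lies closer to the second context
arithmetic_instance = [0.8, 0.1]  # lies closer to the first context

chosen_a = route(multi_hop_instance, catalog)
chosen_b = route(arithmetic_instance, catalog)
print(chosen_a, chosen_b)
```

The point of the sketch is only the shape of the decision: one score per (instance, context) pair, then an argmax per instance, in contrast to a single context chosen for the whole dataset.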
