pith. machine review for the scientific record.

arxiv: 2605.00369 · v3 · submitted 2026-05-01 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links

· Lean Theorem

InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:31 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords inventory policy evolution · large language models · white-box policies · statistical safety guarantees · non-stationary environments · confidence intervals · reinforcement learning

The pith

InvEvolve uses large language models to evolve white-box inventory policies with statistical safety guarantees and a lower bound on success probability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents InvEvolve as a framework that trains large language models via reinforcement learning to generate inventory policies from demand data and additional features. It adds confidence-interval certification so that deployed policies carry statistical safety assurances for future periods. A unified theoretical model ties training, inference, and deployment together to produce a lower bound on the chance that an evolved policy is both safe and better than benchmarks. The same model also quantifies the multi-period performance gap relative to an oracle. Experiments on synthetic instances and real retail data show the evolved policies beat classical methods and deep learning baselines.
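The certification step is only described at a high level. A minimal sketch of one plausible instantiation (not the paper's actual procedure), assuming per-period costs are i.i.d. and bounded in [0, c_max], using a one-sided Hoeffding interval to certify a candidate policy against a benchmark:

```python
import math

def certify(costs, c_max, benchmark_cost, delta=0.05):
    """One-sided Hoeffding certificate: returns True when the candidate
    policy's mean cost is below the benchmark's with probability at
    least 1 - delta, assuming i.i.d. per-period costs in [0, c_max]."""
    n = len(costs)
    mean = sum(costs) / n
    # Hoeffding deviation radius for a one-sided 1 - delta bound.
    radius = c_max * math.sqrt(math.log(1 / delta) / (2 * n))
    return mean + radius < benchmark_cost
```

A policy is certified only when its optimistic cost estimate still beats the benchmark, which is what makes the guarantee conservative rather than merely empirical.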

Core claim

InvEvolve evolves new policies that improve upon existing benchmarks and provides a lower bound on the probability that it evolves a statistically safe and improved policy, with outperformance shown on both synthetic data and real-world retail data.

What carries the argument

The end-to-end framework that combines LLM-based evolutionary search with confidence-interval-based certification, backed by a unified theoretical model linking training, inference, and deployment.

If this is right

  • Evolved policies come with explicit statistical safety guarantees that can be used directly in deployment decisions.
  • The framework handles non-stationary demand together with numerical and textual features.
  • It produces white-box policies whose logic remains interpretable after evolution.
  • Multi-period performance gaps relative to the oracle benchmark are characterized in closed form.
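The bullets above hinge on the evolved policies staying readable. A hypothetical example of the kind of white-box artifact described — the parameter names and the trend adjustment are illustrative, not taken from the paper:

```python
def evolved_policy(inventory, demand_history, s=20, S=60, trend_window=4):
    """Illustrative white-box (s, S)-style rule with a trend adjustment.
    Every decision is a readable arithmetic expression, so the policy
    can be audited line by line after evolution."""
    recent = demand_history[-trend_window:]
    trend = (recent[-1] - recent[0]) / max(trend_window - 1, 1)
    # Raise the reorder point when recent demand is rising.
    reorder_point = s + max(trend, 0) * trend_window
    if inventory < reorder_point:
        return max(int(S - inventory), 0)  # order up to S
    return 0
```

Because the rule is a short function rather than a network, domain constraints (order caps, service-level floors) could be imposed by inspection.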

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same certification approach could be tested on adjacent sequential decisions such as dynamic pricing or capacity allocation.
  • Because the policies remain white-box, they may be easier to audit or combine with domain constraints than black-box neural policies.
  • If the derived probability bounds prove tight in practice, the method could reduce the sample size needed to validate AI-generated operational rules.

Load-bearing premise

The unified theoretical model correctly connects training, inference, and deployment to deliver a valid lower bound on the probability of a safe improved policy and an accurate characterization of the multi-period performance gap relative to the oracle-safe benchmark.
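The premise leaves the shape of the bound unstated; one schematic way such a guarantee often decomposes (our guess at the form, not the paper's theorem) is

    P(safe and improved) >= p_evolve * (1 - delta_cert),

where delta_cert is the error budget of the confidence-interval certification and p_evolve is the probability that the evolutionary search produces at least one certifiable candidate. The referee's circularity concern amounts to asking whether p_evolve and delta_cert are defined independently of the fitted training quantities.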

What would settle it

Repeatedly apply InvEvolve to fresh inventory instances and observe whether the empirical fraction of safe improved policies falls below the stated lower bound.
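Such a falsification test could be scored as follows; a sketch assuming each run yields a binary safe-and-improved outcome, with a one-sided Hoeffding confidence bound so that random fluctuation is not mistaken for a violation (the helper name and threshold logic are ours, not the paper's):

```python
import math

def bound_violated(successes, trials, stated_lower_bound, delta=0.05):
    """Flag a violation only if even an optimistic (upper confidence)
    estimate of the true success probability falls below the paper's
    stated lower bound p0."""
    p_hat = successes / trials
    # One-sided Hoeffding slack at confidence 1 - delta.
    slack = math.sqrt(math.log(1 / delta) / (2 * trials))
    return p_hat + slack < stated_lower_bound
```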

read the original abstract

We study how large language models can be used to evolve inventory policies in online, non-stationary environments. Our work is motivated by recent advances in LLM-based evolutionary search, such as AlphaEvolve, which demonstrates strong performance for static and highly structured problems such as mathematical discovery, but is not directly suited to online dynamic inventory settings. To this end, we propose InvEvolve, an end-to-end inventory policy evolution and inference framework grounded in confidence-interval-based certification. Built on a large language model trained via reinforcement learning, InvEvolve can process demand data together with additional numerical and textual features, and generates white-box inventory policies with statistical safety guarantees for deployment in future periods. We further introduce a unified theoretical model that connects training, inference, and deployment. This allows us to derive a lower bound on the probability that InvEvolve evolves a statistically safe and improved policy, and to characterize the multi-period performance gap relative to the oracle-safe benchmark. Tested on both synthetic data and real-world retail data, InvEvolve outperforms classical inventory policies and deep learning-based methods. In canonical inventory settings, it evolves new policies that improve upon existing benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes InvEvolve, an end-to-end framework that leverages large language models trained with reinforcement learning to evolve white-box inventory policies for online, non-stationary environments. Grounded in confidence-interval-based certification, it generates policies with statistical safety guarantees. A unified theoretical model is introduced to connect training, inference, and deployment, enabling a lower bound on the probability of evolving a statistically safe and improved policy and characterizing the multi-period performance gap to an oracle-safe benchmark. Experiments on synthetic and real-world retail data demonstrate outperformance over classical inventory policies and deep learning-based methods.

Significance. If the theoretical lower bound and empirical outperformance hold under scrutiny, this work represents a significant advancement in applying LLMs to dynamic decision-making problems in operations research. The integration of evolutionary search with formal guarantees addresses key limitations in prior LLM-based optimization methods for online settings, potentially enabling safer deployment of AI-generated policies in inventory management. The white-box nature of the evolved policies is an additional strength for interpretability.

major comments (2)
  1. [Unified theoretical model] The derivation of the lower bound on the probability that InvEvolve evolves a statistically safe and improved policy must be shown to be independent of the fitted RL parameters and confidence intervals used during training; if the bound is constructed from the same data-dependent quantities that define the evolved policy, it risks circularity and does not constitute a genuine performance guarantee.
  2. [Empirical evaluation] The claims of outperformance on synthetic and real-world retail data require explicit reporting of the exact baselines (including parameter settings for classical policies), the number of independent runs, the statistical tests used, and how non-stationary demand sequences are generated and split, to ensure the reported improvements are not attributable to post-hoc selection or specific data characteristics.
minor comments (2)
  1. [Introduction] The citation to AlphaEvolve in the introduction should include the full bibliographic details rather than a high-level reference.
  2. [Method] Notation for demand features, textual inputs, and policy parameters should be defined once and used consistently to avoid ambiguity in the description of the LLM input processing.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating planned revisions to strengthen the presentation and rigor.

read point-by-point responses
  1. Referee: [Unified theoretical model] The derivation of the lower bound on the probability that InvEvolve evolves a statistically safe and improved policy must be shown to be independent of the fitted RL parameters and confidence intervals used during training; if the bound is constructed from the same data-dependent quantities that define the evolved policy, it risks circularity and does not constitute a genuine performance guarantee.

    Authors: We appreciate the referee highlighting this potential issue with the theoretical guarantee. In the unified model, the lower bound is derived from the structural properties of the evolutionary search combined with the conservative nature of the confidence-interval certification procedure, which is defined at the model level prior to any parameter fitting. The bound relies on worst-case assumptions over demand distributions and certification thresholds rather than the specific fitted RL parameters or realized confidence intervals from training data. The policy evolution and subsequent certification are sequential, with the probability statement holding uniformly. To remove any ambiguity regarding independence, we will add an explicit remark and a short proof sketch in the revised theoretical section (and appendix) demonstrating that the lower bound expression does not depend on the particular values of the fitted parameters or the data-dependent intervals used to certify the final policy. revision: partial

  2. Referee: [Empirical evaluation] The claims of outperformance on synthetic and real-world retail data require explicit reporting of the exact baselines (including parameter settings for classical policies), the number of independent runs, the statistical tests used, and how non-stationary demand sequences are generated and split, to ensure the reported improvements are not attributable to post-hoc selection or specific data characteristics.

    Authors: We agree that additional experimental details are necessary for full reproducibility and to rule out selection effects. In the revised manuscript we will expand the experimental setup to report: (i) exact baseline configurations, including classical policies such as base-stock levels computed via dynamic programming on training data and (s,S) policies with parameters obtained by grid search over historical costs; (ii) all results as averages over 30 independent runs with different random seeds, accompanied by standard errors; (iii) statistical significance assessed via paired t-tests and Wilcoxon signed-rank tests with p-values; and (iv) precise generation and splitting procedures for non-stationary demands (synthetic sequences generated via time-varying Poisson processes with sinusoidal trends plus Gaussian noise; real retail data split chronologically with training on the first 80% of periods and testing on the final 20% to prevent leakage). These additions will be placed in a dedicated experimental details subsection and will be reflected in updated tables and figures. revision: yes
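The generation and splitting protocol promised in (iv) can be sketched as follows; the numeric parameter values are illustrative stand-ins, not the paper's:

```python
import math
import random

def poisson(rng, lam):
    """Knuth's Poisson sampler (adequate for the moderate rates here)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def generate_demand(T=200, base=20.0, amp=8.0, period=52, noise_sd=2.0, seed=0):
    """Non-stationary demand: Poisson draws whose rate follows a
    sinusoidal trend, plus Gaussian observation noise, truncated at 0."""
    rng = random.Random(seed)
    out = []
    for t in range(T):
        rate = base + amp * math.sin(2 * math.pi * t / period)
        out.append(max(poisson(rng, rate) + rng.gauss(0.0, noise_sd), 0.0))
    return out

def chronological_split(series, train_frac=0.8):
    """Train on the first 80% of periods, test on the final 20%,
    preventing temporal leakage."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]
```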

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper proposes InvEvolve as an LLM-based evolutionary framework for inventory policies, grounded in confidence-interval certification, with a unified theoretical model claimed to connect training, inference, and deployment phases. This model is asserted to yield a lower bound on the probability of evolving a statistically safe and improved policy plus a characterization of the multi-period performance gap to an oracle benchmark. The provided abstract and high-level description exhibit no equations, derivations, or self-citations that reduce these bounds or characterizations to fitted parameters, self-definitions, or prior author results by construction. Empirical claims of outperformance on synthetic and retail data are presented as independent validation. The derivation chain therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available. The central unverified element is the unified theoretical model that is asserted to produce the probability lower bound and performance gap; no free parameters or new entities are named.

axioms (1)
  • domain assumption — A unified theoretical model connects training, inference, and deployment to produce a lower bound on the probability that an evolved policy is statistically safe and improved.
    The abstract states that this model allows deriving the lower bound and characterizing the multi-period performance gap.

pith-pipeline@v0.9.0 · 5522 in / 1361 out tokens · 55098 ms · 2026-05-12T02:31:00.648612+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

176 extracted references · 176 canonical work pages · 10 internal anchors
