Automated Reformulation of Robust Optimization via Memory-Augmented Large Language Models
Pith reviewed 2026-05-13 05:56 UTC · model grok-4.3
The pith
Reflecting on failed reformulation attempts lets LLMs build reusable memory that improves robust optimization automation without fine-tuning or expert input.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AutoREM autonomously builds a structured textual experience memory through an offline adaptation procedure that reflects on previously failed reformulation trajectories. The memory encodes reusable patterns for mathematically consistent transformations and is then used at inference time to steer the LLM toward correct deterministic counterparts. The resulting system improves reformulation accuracy and efficiency on both in-distribution and out-of-distribution instances and works with multiple base LLMs without any parameter updates or domain-specific input.
What carries the argument
Structured textual experience memory generated by reflecting on failed trajectories via an offline adaptation procedure, which supplies reusable guidance for multi-step mathematical reformulations.
If this is right
- AutoREM raises reformulation accuracy on both familiar and unseen problem sets.
- The same memory transfers directly to different base LLMs without modification or retraining.
- Efficiency improves because the LLM requires fewer attempts to reach a valid deterministic equivalent.
- No domain expertise or parameter changes are needed for the gains to appear.
Where Pith is reading between the lines
- The same self-reflection mechanism could be tested on reformulation tasks outside robust optimization, such as stochastic or nonlinear programming.
- Practitioners in operations research might adopt the approach to apply robust methods to supply-chain or financial models without hiring specialists.
- If memory size grows with problem complexity, the method may require new compression or retrieval techniques for very large instances.
- The framework implies that error-based memory can substitute for explicit fine-tuning in other technical domains that demand chained reasoning.
Load-bearing premise
Reflecting on failed trajectories produces a memory that generalizes reliably to new robust optimization instances without requiring domain-specific expert knowledge or any parameter updates to the underlying LLM.
What would settle it
Apply the same memory to a fresh collection of robust optimization problems or a previously unseen base LLM and observe no gain, or a decline, in the fraction of correctly reformulated instances relative to the unaugmented model.
Original abstract
Robust optimization (RO) provides a principled framework for decision-making under uncertainty, but its practical use is often limited by the need to manually reformulate uncertain optimization models into tractable deterministic counterparts. Recent large language models (LLMs) have shown promise for automating optimization formulation, yet RO reformulation remains challenging because it requires precise multi-step reasoning and mathematically consistent transformations. To facilitate systematic evaluation of LLM-based reformulation, for which no dedicated benchmark currently exists, we develop AutoRO-Bench, a benchmark featuring an automated data generation pipeline for the core RO reformulation task and a curated dataset for the RO application task. To address the reformulation challenge, we propose Automated Reformulation with Experience Memory (AutoREM), a tuning-free memory-augmented framework that autonomously builds a structured textual experience memory by reflecting on past failed trajectories through a tailored offline adaptation procedure. AutoREM requires neither domain-specific expert knowledge nor parameter updates, and the resulting memory readily transfers across different base LLMs. Experimental results show that AutoREM consistently improves the accuracy and efficiency of RO reformulation across in-distribution datasets, out-of-distribution datasets, and diverse base LLMs.
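To make the reformulation task concrete, consider the textbook box-uncertainty case; this worked example is ours for illustration and is not drawn from the paper's benchmark:

```latex
% Uncertain linear constraint under box uncertainty:
%   a^\top x \le b \quad \text{for all } a = \bar{a} + \xi,\ |\xi_j| \le \Delta_j.
% Taking the worst case over \xi yields the deterministic counterpart:
\max_{|\xi_j| \le \Delta_j} (\bar{a} + \xi)^\top x \le b
\quad\Longleftrightarrow\quad
\bar{a}^\top x + \sum_j \Delta_j\, |x_j| \le b,
% which is linearizable by introducing auxiliary variables t_j \ge |x_j|.
```

It is chains of such worst-case transformations, applied row by row and kept mutually consistent, that the paper asks the LLM to carry out automatically.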
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AutoRO-Bench, a benchmark with an automated data generation pipeline and curated dataset for evaluating LLM-based reformulation of robust optimization (RO) problems into deterministic equivalents. It proposes AutoREM, a tuning-free memory-augmented framework that builds a structured textual experience memory by reflecting on failed reformulation trajectories through an offline adaptation procedure. AutoREM claims to require no domain-specific expert knowledge or parameter updates to the base LLM, with the memory transferring across different LLMs. Experiments reportedly demonstrate consistent gains in accuracy and efficiency on in-distribution, out-of-distribution, and cross-LLM settings.
Significance. If the claims hold, the work could meaningfully advance automation of RO reformulation, a bottleneck due to manual multi-step mathematical transformations. AutoRO-Bench fills a gap by providing a dedicated evaluation resource. The tuning-free memory approach is attractive for practical use across LLMs. Credit is due for the focus on mathematical consistency and the transferability claim. However, insufficient experimental detail on metrics, statistics, OOD construction, and the reflection process limits assessment of whether the gains are robust or generalizable beyond prompt engineering.
Major comments (2)
- [Abstract] The central claim of consistent improvements in accuracy and efficiency across in-distribution and out-of-distribution datasets and diverse base LLMs is presented without any information on the evaluation metrics, statistical significance tests, error bars, number of runs, or how the out-of-distribution cases were constructed. This directly undermines verification of the strongest claim.
- [AutoREM framework] Offline adaptation procedure: The framework's core assumption, that reflecting on failed trajectories autonomously produces a memory that generalizes without expert knowledge or parameter updates, is load-bearing. Yet no details are given on the reflection prompt template, the memory indexing and retrieval structure, or the mechanisms enforcing mathematical consistency (e.g., dualization or worst-case enumeration). This leaves open whether the observed gains reduce to implicit heuristics in the prompts rather than genuine memory augmentation.
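For reference, the dualization mechanism this comment alludes to is the standard polyhedral case; the derivation below is textbook LP duality, not a reconstruction of the paper's procedure:

```latex
% Robust constraint: (\bar{a} + P\zeta)^\top x \le b for all \zeta with F\zeta \le g.
% The inner maximization \max_{F\zeta \le g} \zeta^\top P^\top x dualizes to
% \min \{\, g^\top y : F^\top y = P^\top x,\ y \ge 0 \,\}, so the constraint holds iff
\exists\, y \ge 0:\qquad F^\top y = P^\top x, \qquad \bar{a}^\top x + g^\top y \le b.
```

Verifying that an LLM applies this step with correct signs and dual dimensions is exactly the kind of mechanism the comment asks the authors to expose.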
Minor comments (1)
- [Introduction / Benchmark section] The benchmark name AutoRO-Bench and its two components (reformulation task vs. application task) are introduced clearly in the abstract but would benefit from an explicit high-level diagram or table summarizing the data generation pipeline in the main text.
Simulated Author's Rebuttal
Thank you for the constructive feedback. We have revised the manuscript to address the concerns about experimental details and framework transparency. Our point-by-point responses follow.
Point-by-point responses
Referee: [Abstract] The central claim of consistent improvements in accuracy and efficiency across in-distribution and out-of-distribution datasets and diverse base LLMs is presented without any information on the evaluation metrics, statistical significance tests, error bars, number of runs, or how the out-of-distribution cases were constructed. This directly undermines verification of the strongest claim.
Authors: We agree the abstract should briefly contextualize the metrics that support the claim. In the revised version we specify that accuracy is the fraction of reformulations verified as mathematically equivalent by an automated checker, efficiency is measured by token count and wall-clock time, statistical significance is assessed via paired t-tests (p < 0.05) over five independent runs with standard-deviation error bars, and OOD instances are generated by systematically altering uncertainty-set shapes and constraint structures absent from the training distribution (detailed in Section 4.3). These additions make the central claim verifiable while remaining within abstract length constraints.
Revision: yes
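The evaluation protocol the authors describe can be sketched in a few lines; the per-run accuracies below are invented for illustration, and the paper's actual checker is not reproduced:

```python
import statistics as st

def accuracy(verified_flags):
    """Fraction of instances whose reformulation passed the equivalence checker."""
    return sum(verified_flags) / len(verified_flags)

# Hypothetical per-run accuracies over five independent runs (illustrative numbers).
baseline = [0.58, 0.61, 0.57, 0.60, 0.59]
autorem  = [0.74, 0.71, 0.76, 0.73, 0.72]

diffs = [a - b for a, b in zip(autorem, baseline)]
mean_gain = st.mean(diffs)

# Paired t statistic: mean per-run difference divided by its standard error.
t_stat = mean_gain / (st.stdev(diffs) / len(diffs) ** 0.5)

# Two-sided critical value of Student's t for df = 4 at alpha = 0.05 is 2.776.
significant = abs(t_stat) > 2.776
```

With five paired runs the degrees of freedom are fixed at four, so a single tabulated critical value suffices; a full analysis would also report the error bars (`st.stdev` of each condition) the rebuttal promises.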
Referee: [AutoREM framework] Offline adaptation procedure: The framework's core assumption, that reflecting on failed trajectories autonomously produces a memory that generalizes without expert knowledge or parameter updates, is load-bearing. Yet no details are given on the reflection prompt template, the memory indexing and retrieval structure, or the mechanisms enforcing mathematical consistency (e.g., dualization or worst-case enumeration). This leaves open whether the observed gains reduce to implicit heuristics in the prompts rather than genuine memory augmentation.
Authors: We accept that the original description lacked sufficient implementation detail. The revised manuscript adds the complete reflection prompt template in Appendix B; it directs the LLM to diagnose specific failure modes (incorrect dualization, missed worst-case realizations, etc.) and distill reusable rules without injecting external expert knowledge. Memory is stored as a feature-keyed dictionary (keys encode uncertainty type and constraint signature; values store the corresponding reformulation strategy) and retrieved by cosine similarity on sentence embeddings. Post-retrieval validation enforces mathematical consistency by cross-checking generated dual variables and worst-case enumerations against a lightweight symbolic verifier. These additions demonstrate that performance gains derive from the structured memory rather than prompt heuristics alone; we also include pseudocode for the offline adaptation loop. revision: yes
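A toy sketch of the feature-keyed memory the rebuttal describes. The `embed` function here is a bag-of-words stand-in for the sentence embeddings the authors mention, and both keys and stored strategies are illustrative, not taken from the paper:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; the paper reportedly uses sentence embeddings."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)  # Counter returns 0 for missing keys
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Keys encode uncertainty type and constraint signature; values hold strategies.
memory = {
    "box uncertainty linear constraint":
        "replace a^T x <= b with abar^T x + sum_j Delta_j |x_j| <= b",
    "polyhedral uncertainty linear constraint":
        "dualize: introduce y >= 0 with F^T y = P^T x, add g^T y to the left-hand side",
}

def retrieve(query, store):
    """Return the stored strategy whose key is most similar to the query."""
    q = embed(query)
    best_key = max(store, key=lambda k: cosine(q, embed(k)))
    return store[best_key]
```

The post-retrieval symbolic verification step the authors add would then check the retrieved strategy's dual variables against the instance before the LLM commits to it.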
Circularity Check
No significant circularity; empirical validation on independently generated benchmark
Full rationale
The paper introduces AutoRO-Bench via an automated data generation pipeline and proposes the AutoREM framework that builds textual memory from reflection on failed trajectories. All performance claims are supported by experimental results across in-distribution, out-of-distribution, and multi-LLM settings rather than any mathematical derivation or parameter fit that reduces to the inputs by construction. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the described chain; the method is presented as tuning-free with results serving as external evidence.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: Large language models can perform precise multi-step mathematical transformations when guided by structured memory of prior failures.
- Domain assumption: An automated pipeline can generate representative robust optimization instances that cover both in-distribution and out-of-distribution cases.