ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
Pith reviewed 2026-05-15 18:10 UTC · model grok-4.3
The pith
ReWOO first generates a complete reasoning plan without any tool observations, then executes it in a single pass, cutting token use and enabling offloading to smaller models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ReWOO decouples the reasoning process from external observations by first producing a complete plan of tool calls and subsequent logic without any tool responses, then executing that plan in a single pass once observations are fetched. This modular structure lowers overall token consumption compared to interleaved approaches while preserving or improving task performance on multi-step reasoning problems.
What carries the argument
The ReWOO paradigm: generate a full reasoning plan independently, before any tool observations arrive, then execute the plan in one pass.
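The plan-then-execute loop can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the tuple-based plan format, the #E placeholder syntax, and the stub Search tool are all assumptions made here for clarity.

```python
def plan(question):
    # Planner output: the complete chain of tool calls, produced before
    # any tool runs. Later steps reference earlier observations through
    # #E placeholders. A real Planner is a single LLM call; hardcoded here.
    return [
        ("#E1", "Search", "director of Inception"),
        ("#E2", "Search", "birth year of #E1"),
    ]

def execute(plan_steps, tools):
    # Worker: run the static plan in one pass, substituting each fetched
    # observation into later arguments. No LLM call between tool calls.
    evidence = {}
    for var, tool, arg in plan_steps:
        for prior, obs in evidence.items():
            arg = arg.replace(prior, obs)
        evidence[var] = tools[tool](arg)
    return evidence

# Stub tool for illustration; a real Worker would hit a search API.
TOOLS = {"Search": lambda q: {
    "director of Inception": "Christopher Nolan",
    "birth year of Christopher Nolan": "1970",
}.get(q, "unknown")}
```

A Solver module (omitted above) would then take the question plus the filled-in evidence and produce the final answer in one further LLM call.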
If this is right
- Reduces token consumption by a factor of five on multi-step reasoning benchmarks such as HotpotQA.
- Improves accuracy by about four percent on HotpotQA relative to interleaved methods.
- Preserves performance even when external tools return no response or fail.
- Enables instruction fine-tuning that transfers reasoning ability from a 175B model to a 7B model.
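The token savings in the first bullet come from prompt arithmetic rather than anything model-specific: interleaved prompting re-sends a growing history on every step, while ReWOO pays for the base context roughly twice (once for the Planner, once for the Solver). A toy accounting model, with all lengths illustrative rather than taken from the paper:

```python
def interleaved_tokens(context, steps, obs_len):
    # ReAct-style loop: every step re-sends the base prompt plus the
    # growing history of thoughts and observations.
    total, history = 0, 0
    for _ in range(steps):
        total += context + history
        history += obs_len
    return total

def rewoo_tokens(context, steps, obs_len):
    # One Planner call (base prompt only) plus one Solver call that sees
    # all fetched observations at once.
    return context + (context + steps * obs_len)
```

With a 1000-token context, four steps, and 200-token observations, the interleaved loop consumes 5200 prompt tokens against ReWOO's 2800; the gap widens as steps and observation length grow.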
Where Pith is reading between the lines
- The upfront planning structure could support running tool-augmented systems on hardware with tight memory limits by reducing per-step context size.
- Partial observation feedback could be added later without fully reverting to interleaved prompting, offering a middle ground for adaptive tasks.
- Specialized fine-tuning on distinct tool sets becomes simpler because the reasoning module no longer needs to be retrained alongside every tool change.
Load-bearing premise
That a complete reasoning plan can be generated without any intermediate tool results and that single-pass execution will not miss information that interleaved observations would have provided.
What would settle it
Compare ReWOO against an interleaved baseline on multi-hop questions where each tool call depends on the exact output of the prior call; if ReWOO accuracy falls sharply on cases the baseline solves correctly, the decoupling claim is challenged.
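The proposed experiment reduces to a conditional accuracy: of the dependent-hop cases the interleaved baseline gets right, what fraction does ReWOO's static plan also get right? A sketch of that metric, where the function names and the case format are assumptions of this sketch:

```python
def conditional_accuracy(cases, rewoo_answer, baseline_answer):
    # Restrict to cases the interleaved baseline solves, then measure
    # how often the static-plan system matches the gold answer on those
    # same cases. A sharp drop would challenge the decoupling claim.
    # Assumed case format: each case is a dict with a "gold" answer.
    solved = [c for c in cases if baseline_answer(c) == c["gold"]]
    if not solved:
        return None
    return sum(rewoo_answer(c) == c["gold"] for c in solved) / len(solved)
```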
Original abstract
Augmented Language Models (ALMs) blend the reasoning capabilities of Large Language Models (LLMs) with tools that allow for knowledge retrieval and action execution. Existing ALM systems trigger LLM thought processes while pulling observations from these tools in an interleaved fashion. Specifically, an LLM reasons to call an external tool, gets halted to fetch the tool's response, and then decides the next action based on all preceding response tokens. Such a paradigm, though straightforward and easy to implement, often leads to huge computation complexity from redundant prompts and repeated execution. This study addresses such challenges for the first time, proposing a modular paradigm ReWOO (Reasoning WithOut Observation) that detaches the reasoning process from external observations, thus significantly reducing token consumption. Comprehensive evaluations across six public NLP benchmarks and a curated dataset reveal consistent performance enhancements with our proposed methodology. Notably, ReWOO achieves 5x token efficiency and 4% accuracy improvement on HotpotQA, a multi-step reasoning benchmark. Furthermore, ReWOO demonstrates robustness under tool-failure scenarios. Beyond prompt efficiency, decoupling parametric modules from non-parametric tool calls enables instruction fine-tuning to offload LLMs into smaller language models, thus substantially reducing model parameters. Our illustrative work offloads reasoning ability from 175B GPT3.5 into 7B LLaMA, demonstrating the significant potential for truly efficient and scalable ALM systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ReWOO, a modular paradigm for augmented language models that decouples reasoning from tool observations by first generating a complete static plan via a Planner and then executing it in a single pass via a Worker module. It reports consistent gains across six NLP benchmarks plus a curated dataset, including 5x token efficiency and 4% accuracy improvement on HotpotQA, robustness under tool failures, and successful offloading of reasoning from 175B GPT-3.5 to 7B LLaMA via instruction fine-tuning.
Significance. If the core claims hold, ReWOO could enable substantially more efficient ALM deployments by eliminating interleaved prompting overhead and supporting model compression, with direct implications for scalable multi-step reasoning systems.
major comments (3)
- [§3.2] §3.2 (Planner description): The generation of a non-adaptive plan is presented without addressing how entity references for subsequent hops (e.g., second-hop queries in HotpotQA) are resolved without the first observation; this directly underpins the reported accuracy and efficiency numbers.
- [Table 3] Table 3 (HotpotQA row): The 4% accuracy lift and 5x token reduction are reported without an ablation on plan completeness or cases where the first observation deviates from the Planner's implicit template, leaving the central decoupling assumption untested for the benchmark that most requires adaptive branching.
- [§5.2] §5.2 (tool-failure robustness): The robustness test is described at a high level but provides no quantitative breakdown of recovery success when the Worker must interpret a plan that was generated without the actual observations, which is required to substantiate the claim.
minor comments (2)
- [Abstract] The abstract states 'six public NLP benchmarks' but does not enumerate them; the full list and per-benchmark breakdowns should appear in §4.
- [§3] Notation for Planner output format and Worker execution trace could be formalized with a short pseudocode block or equations in §3 to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on ReWOO. We address each major comment below with clarifications and proposed revisions to strengthen the manuscript.
Point-by-point responses
-
Referee: [§3.2] §3.2 (Planner description): The generation of a non-adaptive plan is presented without addressing how entity references for subsequent hops (e.g., second-hop queries in HotpotQA) are resolved without the first observation; this directly underpins the reported accuracy and efficiency numbers.
Authors: The Planner employs symbolic placeholders (e.g., 'result_of_step_1') to reference prior steps in the static plan. The Worker resolves these by substituting actual tool observations at execution time. We will revise §3.2 to include an explicit HotpotQA example demonstrating this reference resolution mechanism. revision: yes
-
Referee: [Table 3] Table 3 (HotpotQA row): The 4% accuracy lift and 5x token reduction are reported without an ablation on plan completeness or cases where the first observation deviates from the Planner's implicit template, leaving the central decoupling assumption untested for the benchmark that most requires adaptive branching.
Authors: We agree an ablation on plan completeness and deviation cases is needed to validate the core assumption. We will add this analysis to the revised Table 3 (or a new subsection), reporting accuracy and token metrics when the first observation deviates from the plan template on HotpotQA. revision: yes
-
Referee: [§5.2] §5.2 (tool-failure robustness): The robustness test is described at a high level but provides no quantitative breakdown of recovery success when the Worker must interpret a plan that was generated without the actual observations, which is required to substantiate the claim.
Authors: The §5.2 experiments simulate failures by supplying erroneous or null observations to the Worker. We will expand this section with a quantitative breakdown, including recovery success rates across failure scenarios (e.g., null vs. incorrect results), to substantiate the robustness claim. revision: yes
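One way to make the requested breakdown concrete is to inject a null observation at a chosen step and measure how the downstream answer degrades. A minimal harness, in which the plan format, tool names, and failure marker are all assumed rather than taken from the paper:

```python
def execute_with_failures(plan_steps, tools, fail_at=None):
    # Run a static ReWOO-style plan, optionally replacing one tool's
    # observation with a null marker to mimic a tool failure; later
    # substitutions then carry the marker forward.
    evidence = {}
    for i, (var, tool, arg) in enumerate(plan_steps):
        for prior, obs in evidence.items():
            arg = arg.replace(prior, obs)
        evidence[var] = "[tool failed]" if i == fail_at else tools[tool](arg)
    return evidence

PLAN = [("#E1", "Search", "director of Inception"),
        ("#E2", "Search", "birth year of #E1")]
TOOLS = {"Search": lambda q: {
    "director of Inception": "Christopher Nolan",
    "birth year of Christopher Nolan": "1970",
}.get(q, "unknown")}
```

Sweeping `fail_at` over every step and over failure types (null versus plausible-but-wrong observations) would yield the per-scenario recovery rates the referee asks for.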
Circularity Check
No circularity in ReWOO's empirical paradigm
Full rationale
The paper introduces ReWOO as a structural decoupling of reasoning-plan generation from tool observations. All performance claims (5x token efficiency, the 4% HotpotQA accuracy lift, offloading from 175B to 7B) rest on direct empirical measurements across benchmarks rather than on any derivation, equation, or fitted parameter that reduces to its own inputs by construction. No self-citations, ansatzes, or uniqueness theorems are invoked to justify the core method; the efficiency advantage follows directly from avoiding interleaved prompting, a design choice that is evaluated externally.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLMs can generate coherent multi-step plans without intermediate observations.
Forward citations
Cited by 19 Pith papers
-
SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems
SkillOps maintains LLM skill libraries via Skill Contracts and ecosystem graphs, raising ALFWorld task success to 79.5% as a standalone agent and improving retrieval baselines by up to 2.9 points with near-zero librar...
-
TRIAGE: Evaluating Prospective Metacognitive Control in LLMs under Resource Constraints
TRIAGE evaluates LLMs on prospective metacognitive control by requiring a single plan for task selection, sequencing, and token allocation under a calibrated budget, revealing substantial gaps in current models across...
-
Can LLM Agents Respond to Disasters? Benchmarking Heterogeneous Geospatial Reasoning in Emergency Operations
DORA is the first end-to-end agentic benchmark for LLM-based disaster response, covering perception, spatial analysis, evacuation planning, temporal reasoning, and report generation over heterogeneous geospatial data,...
-
Evaluating Plan Compliance in Autonomous Programming Agents
Autonomous programming agents frequently fail to follow instructed plans, falling back on incomplete internalized workflows, while standard plans and periodic reminders improve performance but poor plans can degrade i...
-
KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.
-
Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents
PTR framework profiles a workflow upfront then executes it deterministically with bounded verification and repair, limiting LM calls to 2-3 while outperforming ReAct in 16 of 24 tested configurations.
-
LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents
LLM-X is a scalable architecture for direct negotiation and communication among personal LLM agents, featuring federated gateways, typed protocols, and policy enforcement, shown stable in experiments with up to 12 agents.
-
Hierarchical Visual Agent: Managing Contexts in Joint Image-Text Space for Advanced Chart Reasoning
HierVA improves multi-step chart question answering by having a high-level manager maintain key joint contexts while specialized workers perform targeted reasoning with visual zoom-in.
-
Affordance Agent Harness: Verification-Gated Skill Orchestration
Affordance Agent Harness is a verification-gated orchestration system that unifies skills via an evidence store, episodic memory priors, an adaptive router, and a self-consistency verifier to improve accuracy-cost tra...
-
QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance
QRAFTI is a multi-agent framework using tool-calling and reflection-based planning to emulate quant research tasks like factor replication and signal testing on financial data.
-
Complete Cyclic Subtask Graphs for Tool-Using LLM Agents: Flexibility, Cost, and Bottlenecks in Multi-Agent Workflows
Complete cyclic subtask graphs offer a lens to measure when multi-agent revisitation aids recovery and exploration versus when it increases costs or is dominated by other bottlenecks in LLM agent workflows.
-
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.
-
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
-
A Survey on Large Language Model based Autonomous Agents
A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future di...
-
SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks
SPIN enforces DAG-valid plans and prefix-based stopping for LLM agents, cutting executed tasks from 1061 to 623 and tool calls from 11.81 to 6.82 per run on AssetOpsBench while raising success from 0.638 to 0.706.
-
Do Agents Need to Plan Step-by-Step? Rethinking Planning Horizon in Data-Centric Tool Calling
Full-horizon planning with on-demand replanning achieves accuracy parity with single-step planning in tool-calling agents for knowledge base and multi-hop question answering while consuming 2-3 times fewer tokens.
-
RealRoute: Dynamic Query Routing System via Retrieve-then-Verify Paradigm
RealRoute uses parallel source-agnostic retrieval followed by dynamic verification to improve accuracy over predictive LLM routers in heterogeneous multi-hop RAG tasks.
-
Affordance Agent Harness: Verification-Gated Skill Orchestration
Affordance Agent Harness is a verification-gated orchestration framework that adaptively combines heterogeneous skills, retrieves episodic memories, and uses self-consistency checks to improve affordance grounding acc...
-
The Rise and Potential of Large Language Model Based Agents: A Survey
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.