Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
Learning to Solve Arithmetic Word Problems with Verb Categorization
6 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.
MAmmoTH models trained via hybrid CoT-PoT instruction tuning on MathInstruct outperform prior open-source LLMs by 16-32% average accuracy on nine math datasets, reaching 33% and 44% on MATH for 7B and 34B scales.
Auto-CoT automatically builds chain-of-thought demonstrations by sampling diverse questions and letting the LLM generate reasoning chains, matching manual CoT performance on ten reasoning tasks with GPT-3.
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
A literature survey synthesizing benchmarks, architectures, training strategies, and evaluation methods for mathematical reasoning in LLMs, based on roughly 120 papers.
citing papers explorer
-
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
-
HyperAdapt: Simple High-Rank Adaptation
HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.
-
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
MAmmoTH models trained via hybrid CoT-PoT instruction tuning on MathInstruct outperform prior open-source LLMs by 16-32% average accuracy on nine math datasets, reaching 33% and 44% on MATH for 7B and 34B scales.
-
Automatic Chain of Thought Prompting in Large Language Models
Auto-CoT automatically builds chain-of-thought demonstrations by sampling diverse questions and letting the LLM generate reasoning chains, matching manual CoT performance on ten reasoning tasks with GPT-3.
-
PaLM 2 Technical Report
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
-
Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges
A literature survey synthesizing benchmarks, architectures, training strategies, and evaluation methods for mathematical reasoning in LLMs, based on roughly 120 papers.