Towards understanding chain-of-thought prompting: An empirical study of what matters

Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun · 2023 · DOI 10.18653/v1/2023.acl-long.153

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 1 unclear 1

representative citing papers

On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

Chain of Thought risk decomposes into oracle-trajectory benefit and trajectory-mismatch cost, with stability determining bounded, linear, or exponential error growth.

Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya

cs.AI · 2026-02-14 · conditional · novelty 7.0

Fine-tuning LLMs on Navya-Nyaya's six-phase reasoning structure yields 100% semantic correctness on held-out logical problems despite only 40% strict format adherence.

Pause and Reflect: Conformal Aggregation for Chain-of-Thought Reasoning

stat.ML · 2026-05-13 · unverdicted · novelty 6.0

A conformal procedure for CoT replaces majority voting with weighted aggregation and calibrates abstention to guarantee low confident-error rates, achieving 90.1% selective accuracy on GSM8K by abstaining on under 5% of cases.

SLoW: Select Low-frequency Words! Automatic Dictionary Selection for Translation on Large Language Models

cs.CL · 2025-07-25 · conditional · novelty 6.0

SLoW selects low-frequency word dictionaries to boost LLM translation quality and efficiency across 100 languages from FLORES.

Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models

cs.CL · 2024-11-02 · unverdicted · novelty 6.0

DIP interleaves English word translations into non-English prompts to boost multilingual reasoning on synthetic benchmarks spanning 10-200 languages.

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

cs.CL · 2023-09-11 · conditional · novelty 6.0

MAmmoTH models trained via hybrid CoT-PoT instruction tuning on MathInstruct outperform prior open-source LLMs by 16-32% average accuracy on nine math datasets, reaching 33% and 44% on MATH for 7B and 34B scales.

Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection

cs.AI · 2026-05-04 · unverdicted · novelty 5.0

Reasoning-oriented knowledge distillation from DeepSeek-R1 plus response stabilization improves reliability and often performance of compact models for cross-language code clone detection on pairs like Python-Java and Rust-Java.

Adam's Law: Textual Frequency Law on Large Language Models

cs.CL · 2026-04-02 · unverdicted · novelty 3.0

Frequent sentence-level text improves LLM prompting and fine-tuning performance across math, translation, commonsense, and tool-use tasks via a proposed frequency law and curriculum ordering.

citing papers explorer

On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective
cs.LG · 2026-05-20 · unverdicted · none · ref 93
Chain of Thought risk decomposes into oracle-trajectory benefit and trajectory-mismatch cost, with stability determining bounded, linear, or exponential error growth.

Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya
cs.AI · 2026-02-14 · conditional · none · ref 18
Fine-tuning LLMs on Navya-Nyaya's six-phase reasoning structure yields 100% semantic correctness on held-out logical problems despite only 40% strict format adherence.

Pause and Reflect: Conformal Aggregation for Chain-of-Thought Reasoning
stat.ML · 2026-05-13 · unverdicted · none · ref 12
A conformal procedure for CoT replaces majority voting with weighted aggregation and calibrates abstention to guarantee low confident-error rates, achieving 90.1% selective accuracy on GSM8K by abstaining on under 5% of cases.

SLoW: Select Low-frequency Words! Automatic Dictionary Selection for Translation on Large Language Models
cs.CL · 2025-07-25 · conditional · none · ref 31
SLoW selects low-frequency word dictionaries to boost LLM translation quality and efficiency across 100 languages from FLORES.

Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models
cs.CL · 2024-11-02 · unverdicted · none · ref 32
DIP interleaves English word translations into non-English prompts to boost multilingual reasoning on synthetic benchmarks spanning 10-200 languages.

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
cs.CL · 2023-09-11 · conditional · none · ref 47
MAmmoTH models trained via hybrid CoT-PoT instruction tuning on MathInstruct outperform prior open-source LLMs by 16-32% average accuracy on nine math datasets, reaching 33% and 44% on MATH for 7B and 34B scales.

Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection
cs.AI · 2026-05-04 · unverdicted · none · ref 45
Reasoning-oriented knowledge distillation from DeepSeek-R1 plus response stabilization improves reliability and often performance of compact models for cross-language code clone detection on pairs like Python-Java and Rust-Java.

Adam's Law: Textual Frequency Law on Large Language Models
cs.CL · 2026-04-02 · unverdicted · none · ref 37
Frequent sentence-level text improves LLM prompting and fine-tuning performance across math, translation, commonsense, and tool-use tasks via a proposed frequency law and curriculum ordering.
