Explain Yourself! Leveraging Language Models for Commonsense Reasoning

Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, Richard Socher · 2019 · DOI 10.18653/v1/p19-1487

7 Pith papers cite this work. Polarity classification is still indexing.



representative citing papers

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

cs.CL · 2022-01-28 · accept · novelty 9.0

Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.

What LLMs explain is not what they believe: Evaluating explanation sufficiency under models' own input beliefs

cs.LG · 2026-06-26 · unverdicted · novelty 6.0

Proposes SCSuff metric for evaluating LLM explanation sufficiency via model-generated alternative inputs, showing explanations are typically insufficient and predictable from hidden states.

Prompts for Public-Sector LLMs Should Be Governed as Commons

cs.CY · 2026-05-30 · unverdicted · novelty 6.0

Prompts for public-sector LLMs encode value-laden decisions and should be governed through community-maintained Prompt Commons repositories with provenance, licensing, and moderation.

LIMO: Less is More for Reasoning

cs.CL · 2025-02-05 · unverdicted · novelty 6.0

LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

cs.CL · 2023-05-23 · conditional · novelty 6.0

UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

cs.CL · 2023-05-03 · conditional · novelty 6.0

Distilling step-by-step uses LLM-generated rationales as additional supervision in a multi-task framework so that 770M-parameter models outperform 540B-parameter models on NLP benchmarks with only 80% of the data.

PaLM 2 Technical Report

cs.CL · 2023-05-17 · unverdicted · novelty 5.0

PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.

citing papers explorer


Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
cs.CL · 2022-01-28 · accept · none · ref 53

What LLMs explain is not what they believe: Evaluating explanation sufficiency under models' own input beliefs
cs.LG · 2026-06-26 · unverdicted · none · ref 12

Prompts for Public-Sector LLMs Should Be Governed as Commons
cs.CY · 2026-05-30 · unverdicted · none · ref 29

LIMO: Less is More for Reasoning
cs.CL · 2025-02-05 · unverdicted · none · ref 169

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
cs.CL · 2023-05-23 · conditional · none · ref 142

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
cs.CL · 2023-05-03 · conditional · none · ref 95

PaLM 2 Technical Report
cs.CL · 2023-05-17 · unverdicted · none · ref 250
