Bogdan, Uzay Macar, Neel Nanda, and Arthur Conmy

2025 · arXiv 2506.19143

16 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth

cs.CL · 2026-05-24 · unverdicted · novelty 8.0

Introduces the BonaFide benchmark of 3,066 ground-truth labeled CoTs, showing that most faithfulness metrics perform near chance, exhibit biases, and scale poorly to longer chains.

Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning

cs.AI · 2026-06-24 · conditional · novelty 7.0 · 2 refs

Cliff tokens are single tokens triggering LLM math reasoning failures, identified via adaptive z-test threshold on token potential; a taxonomy and Cliff-DPO optimization yield up to +6.6 accuracy gains.

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

In 1-3B instruction-tuned LMs on GSM8K, arithmetic CoT readout is dominated by positional copying of the trailing number before the answer delimiter, accounting for 54-92 percentage points of accuracy.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories

cs.AI · 2026-06-26 · unverdicted · novelty 6.0 · 2 refs

DynaSteer is a dynamic representation editing framework that uses pattern clustering, Fisher-LDA, and lookahead entropy monitoring to steer LLM reasoning trajectories toward truth on MATH and coding tasks.

Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

Epi2Diff extracts cognitive episode sequences from LRM reasoning traces and combines them with semantic features to predict human item difficulty, outperforming baselines on four educational datasets.

ToxiREX: A Dataset on Toxic REasoning in ConteXt

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

ToxiREX is a new dataset of 128k Reddit comments in six languages with hierarchical annotations for implicit toxicity in conversational context based on an existing reasoning schema.

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

cs.LG · 2026-06-16 · unverdicted · novelty 6.0

Introduces a hierarchical latent selection model showing SFT supplies raw module materials in compound traces while RL decomposes them to identify atomic modules and enable recombination for new reasoning configurations.

Arithmetic Pedagogy for Language Models

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

A small GPT-2 model trained from scratch on GASING-derived CoT supervision for arithmetic reaches over 80% held-out accuracy, exhibits three learning phases, and develops both procedural and associative reasoning.

ReasonOps: Operator Segmentation for LLM Reasoning Traces

cs.AI · 2026-05-28 · unverdicted · novelty 6.0

Unsupervised clustering on sentence-initial 3-token pivots extracts 7 universal reasoning operators from 44k traces across 12 LLMs that enable model fingerprinting and answer-correctness prediction.

Stateful Reasoning via Insight Replay

cs.AI · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

InsightReplay improves long CoT reasoning by extracting critical insights from the trace and replaying them near the active frontier, delivering a +1.65 average accuracy gain across 24 model-benchmark settings.

Large Language Models Decide Early and Explain Later

cs.CL · 2026-04-24 · unverdicted · novelty 6.0

LLMs settle on their answer after a minority of CoT tokens and produce an average of 760 more tokens as post-decision explanation, enabling early stopping that saves 500 tokens per query at a 2% accuracy cost.

Measuring and curing reasoning rigidity: from decorative chain-of-thought to genuine faithfulness

cs.CL · 2026-03-24 · unverdicted · novelty 6.0

SLRC quantifies genuine step necessity in LLM reasoning as a causal estimator, LC-CoSR training reduces rigidity with stability guarantees, and evaluations reveal a faithfulness-sycophancy paradox across frontier models.

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

cs.LG · 2026-06-07 · unverdicted · novelty 5.0

Excessive SFT reduces LLM plasticity for RL; Rejuvenation restores it via base-anchored fusion and targeted neuron resets, yielding better RL performance and OOD generalization.

Smart Picks in the Dark: Towards Efficient RLVR for Reasoning via Tracing Metacognitive Pivots

cs.LG · 2026-06-03 · unverdicted · novelty 4.0

PivotTrace selects unlabeled data for RLVR by quantifying uncertainty via pivot density from attention dynamics, outperforming full supervision while using only 29.3% of the annotations and converging 2.75 times faster.

Can Aha Moments Be Fake? Towards Quantifying Decorative and True Thinking in Chain-of-Thought

cs.LG · 2025-10-28
