
How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning

7 Pith papers cite this work. Polarity classification is still indexing.


Citing papers by year: 2026 (4), 2025 (3)


representative citing papers

Compared to What? Baselines and Metrics for Counterfactual Prompting

cs.CL · 2026-05-01 · conditional · novelty 6.0

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.

Emergent Slow Thinking in LLMs as Inverse Tree Freezing

cs.AI · 2025-09-28 · unverdicted · novelty 6.0

RLVR drives a concept network in LLMs through nucleation and freezing into inverse trees that support slow thinking, and intervening with brief SFT at peak frustration outperforms standard RLVR while post-freeze SFT causes forgetting.

Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal

cs.CL · 2025-09-07 · unverdicted · novelty 6.0

Sparse autoencoders plus greedy filtering and factorization-machine interaction modeling identify minimal sets of features in Gemma-2-2B-IT and LLaMA-3.1-8B-IT whose ablation produces jailbreaks by flipping refusal to compliance.

citing papers explorer

Showing 3 of 3 citing papers after filters.

  • Emergent Slow Thinking in LLMs as Inverse Tree Freezing · cs.AI · 2025-09-28 · unverdicted · none · ref 10

    RLVR drives a concept network in LLMs through nucleation and freezing into inverse trees that support slow thinking, and intervening with brief SFT at peak frustration outperforms standard RLVR while post-freeze SFT causes forgetting.

  • Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal · cs.CL · 2025-09-07 · unverdicted · none · ref 14

    Sparse autoencoders plus greedy filtering and factorization-machine interaction modeling identify minimal sets of features in Gemma-2-2B-IT and LLaMA-3.1-8B-IT whose ablation produces jailbreaks by flipping refusal to compliance.

  • ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs · cs.CR · 2025-04-08 · unverdicted · none · ref 26

    ShadowCoT introduces a reasoning-level backdoor attack on LLMs achieving 94.4% attack success rate and 88.4% hijacking success rate with 0.15% parameter updates via internal state conditioning and reasoning chain pollution.