Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?

Abhilasha Ravichander, Yonatan Belinkov, Eduard Hovy · 2021 · DOI 10.18653/v1/2021.eacl-main.295

5 Pith papers cite this work.

representative citing papers

Observable Patterns Are Not Explanations: A Causal-Geometric Analysis of Latent Reasoning Models

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

Evaluation of two latent reasoning models against controls shows observable latent patterns appear without the proposed mechanisms, have graded causal effects on behavior, and concentrate in structured low-rank directions, arguing that patterns are insufficient evidence for reasoning.

Encoded but Not Routed: Explaining the Table-Chart Gap in Scientific Claim Verification

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

Chart information is encoded but not routed to predictions in VLMs for claim verification, unlike tables, revealed by layer-wise probing and attention analysis on three models.

Where Does Authorship Signal Emerge in Encoder-Based Language Models?

cs.CL · 2026-05-19 · conditional · novelty 7.0

Different scoring mechanisms cause encoder-based authorship attribution models to consolidate authorship signals at different layers, as shown by causal interventions and gradient analysis.

Validating Causal Abstraction Metrics on Simulated Complex Systems

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

Authors create a benchmark across discrete/continuous and static/dynamical systems and introduce the Causal Abstraction Error (CAE) metric that reliably distinguishes valid from invalid causal abstractions when it includes faithfulness testing.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · novelty 5.0

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

citing papers explorer

Observable Patterns Are Not Explanations: A Causal-Geometric Analysis of Latent Reasoning Models cs.CL · 2026-06-10 · unverdicted · none · ref 2
Encoded but Not Routed: Explaining the Table-Chart Gap in Scientific Claim Verification cs.CL · 2026-06-01 · unverdicted · none · ref 4
Validating Causal Abstraction Metrics on Simulated Complex Systems cs.LG · 2026-06-30 · unverdicted · none · ref 171
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models cs.CL · 2026-01-20 · unverdicted · none · ref 256