Information-Theoretic Probing for Linguistic Structure

Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, Ryan Cotterell · 2020 · DOI 10.18653/v1/2020.acl-main.420

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · baseline 1

citation-polarity summary

background 2 · baseline 1

representative citing papers

Brain-LLM Alignment Tracks Training Data, Not Typology

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

Training-language dominance, not inherent properties of English, determines brain-LLM alignment across English, Chinese, and French; typological distance has additional independent effects concentrated in syntactic brain regions.

Deep Minds and Shallow Probes

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Symmetry under affine reparameterizations of hidden coordinates selects a unique hierarchy of shallow coordinate-stable probes and a probe-visible quotient for cross-model transfer.

On the Emergence of Syntax by Means of Local Interaction

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

A 2D neural cellular automaton spontaneously self-organizes into a Proto-CKY representation that exhibits syntactic processing capabilities for context-free grammars when trained on membership problems.

Validating Causal Abstraction Metrics on Simulated Complex Systems

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

The authors build a benchmark spanning discrete and continuous, static and dynamical systems, and introduce the Causal Abstraction Error (CAE) metric, which reliably distinguishes valid from invalid causal abstractions when it includes faithfulness testing.

Now You (Still) See Me: Detecting Evasive Steganographic Payloads in LLMs

cs.CR · 2026-06-08 · unverdicted · novelty 6.0

Adversarial fine-tuning evades activation-based steganography detection in five LLMs while preserving secret recovery, but a recontextualization dataset restores both ridge and MLP probe detectability.

A Geometric Measure of Linear Separability for Neural Representations

cs.LG · 2026-06-07 · unverdicted · novelty 6.0

Introduces the directional linear separability measure (LSM) as an asymmetric diagnostic for one-sided affine separability of neural representations.

Beyond Linear Probes: Dynamic Safety Monitoring for Language Models

cs.LG · 2025-09-30 · unverdicted · novelty 6.0

TPCs allow term-by-term progressive polynomial evaluation on LLM activations for flexible safety monitoring that supports both stronger guardrails and low-cost adaptive cascades.

Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt

cs.CL · 2026-06-01 · unverdicted · novelty 5.0

Larger LLMs reproduce constructional productivity via entrenchment in coercion cases with nonce words but fail to use statistical preemption to avoid overgeneralizing semantically plausible but unobserved patterns.

Exploring Concreteness Through a Figurative Lens

cs.CL · 2026-04-20 · unverdicted · novelty 5.0

LLMs compress concreteness into a consistent 1D direction in mid-to-late layers that separates literal from figurative noun uses and supports efficient classification plus steering.

Probing Classifiers: Promises, Shortcomings, and Advances

cs.CL · 2021-02-24 · unverdicted · novelty 3.0

Probing classifiers are a common but limited method for analyzing linguistic knowledge in neural NLP models, and this review outlines their promises, methodological shortcomings, and recent advances.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Beyond Linear Probes: Dynamic Safety Monitoring for Language Models

cs.LG · 2025-09-30 · unverdicted · none · ref 50

TPCs allow term-by-term progressive polynomial evaluation on LLM activations for flexible safety monitoring that supports both stronger guardrails and low-cost adaptive cascades.
