Training-language dominance, not inherent properties of English, determines brain-LLM alignment across English, Chinese, and French, with additional independent effects of typological distance concentrated in syntactic brain regions.
Information-Theoretic Probing for Linguistic Structure
10 Pith papers cite this work.
UNVERDICTED 10
representative citing papers
Symmetry under affine reparameterizations of hidden coordinates selects a unique hierarchy of shallow coordinate-stable probes and a probe-visible quotient for cross-model transfer.
A 2D neural cellular automaton spontaneously self-organizes into a Proto-CKY representation that exhibits syntactic processing capabilities for context-free grammars when trained on membership problems.
Authors create a benchmark across discrete/continuous and static/dynamical systems and introduce the Causal Abstraction Error (CAE) metric that reliably distinguishes valid from invalid causal abstractions when it includes faithfulness testing.
Adversarial fine-tuning evades activation-based steganography detection in five LLMs while preserving secret recovery, but a recontextualization dataset restores both ridge and MLP probe detectability.
Introduces the directional linear separability measure (LSM) as an asymmetric diagnostic for one-sided affine separability of neural representations.
TPCs allow term-by-term progressive polynomial evaluation on LLM activations for flexible safety monitoring that supports both stronger guardrails and low-cost adaptive cascades.
Larger LLMs reproduce constructional productivity via entrenchment in coercion cases with nonce words but fail to use statistical preemption to avoid overgeneralizing semantically plausible but unobserved patterns.
LLMs compress concreteness into a consistent 1D direction in mid-to-late layers that separates literal from figurative noun uses and supports efficient classification plus steering.
Probing classifiers are a common but limited method for analyzing linguistic knowledge in neural NLP models, and this review outlines their promises, methodological shortcomings, and recent advances.
citing papers explorer
- Beyond Linear Probes: Dynamic Safety Monitoring for Language Models