hub

and Gardner, Matt and Belinkov, Yonatan and Peters, Matthew E

Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith · 2019 · Proceedings of the 2019 Conference of the North · DOI 10.18653/v1/n19-1112

11 Pith papers cite this work, alongside 134 external citations. Polarity classification is still indexing.

11 Pith papers citing it

134 external citations · Crossref

open at publisher browse 11 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Deep Minds and Shallow Probes

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Symmetry under affine reparameterizations of hidden coordinates selects a unique hierarchy of shallow coordinate-stable probes and a probe-visible quotient for cross-model transfer.

Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution

cs.LG · 2025-02-04 · unverdicted · novelty 7.0

Neurons exhibit concept-conditioned activation ranges forming Gaussian-like distributions with minimal overlap, and range-based interventions via NeuronLens outperform neuron-level masking in targeted manipulation with reduced collateral effects.

Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models

cs.CL · 2026-06-19 · unverdicted · novelty 6.0

LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.

Uncovering the Latent Potential of Deep Intermediate Representations

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.

Polar probe linearly decodes semantic structures from LLMs

cs.CL · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

LLMs represent semantic relations geometrically via embedding distance and direction; a linear Polar Probe decodes these structures from middle-layer activations and generalizes to new entities.

Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt

cs.CL · 2026-06-01 · unverdicted · novelty 5.0

Larger LLMs reproduce constructional productivity via entrenchment in coercion cases with nonce words but fail to use statistical preemption to avoid overgeneralizing semantically plausible but unobserved patterns.

Trait-space Monitoring for Emergent Misalignment During Supervised Finetuning

cs.LG · 2026-05-31 · unverdicted · novelty 5.0

Trait-space drift monitoring detects emergent misalignment checkpoints in 7-9B LLMs with 2.2% FNR, 2.9% FPR and 0.99 AUROC, outperforming PCA and SAE baselines.

MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining

cs.CL · 2026-04-27 · unverdicted · novelty 5.0

MIPIC trains Matryoshka representations using self-distilled intra-relational alignment and progressive information chaining, yielding competitive results on STS, NLI, and classification tasks especially at low dimensions.

Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models

cs.CL · 2025-06-02 · unverdicted · novelty 5.0

Inflectional features stay linearly decodable across all layers while lexical identity weakens with depth in modern transformers.

Fast & Faithful Function Vectors

cs.CL · 2026-06-03 · unverdicted · novelty 4.0

LRP-based attention head selection and distributed application improve the efficiency and accuracy of function vectors for steering LLMs compared to prior choices.

Probing Classifiers: Promises, Shortcomings, and Advances

cs.CL · 2021-02-24 · unverdicted · novelty 3.0

Probing classifiers are a common but limited method for analyzing linguistic knowledge in neural NLP models, and this review outlines their promises, methodological shortcomings, and recent advances.

citing papers explorer

Showing 11 of 11 citing papers.

Deep Minds and Shallow Probes cs.LG · 2026-05-12 · unverdicted · none · ref 14
Symmetry under affine reparameterizations of hidden coordinates selects a unique hierarchy of shallow coordinate-stable probes and a probe-visible quotient for cross-model transfer.
Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution cs.LG · 2025-02-04 · unverdicted · none · ref 13
Neurons exhibit concept-conditioned activation ranges forming Gaussian-like distributions with minimal overlap, and range-based interventions via NeuronLens outperform neuron-level masking in targeted manipulation with reduced collateral effects.
Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models cs.CL · 2026-06-19 · unverdicted · none · ref 270
LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.
Uncovering the Latent Potential of Deep Intermediate Representations cs.LG · 2026-05-21 · unverdicted · none · ref 28
Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.
Polar probe linearly decodes semantic structures from LLMs cs.CL · 2026-05-13 · unverdicted · none · ref 40 · 2 links
LLMs represent semantic relations geometrically via embedding distance and direction; a linear Polar Probe decodes these structures from middle-layer activations and generalizes to new entities.
Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt cs.CL · 2026-06-01 · unverdicted · none · ref 32
Larger LLMs reproduce constructional productivity via entrenchment in coercion cases with nonce words but fail to use statistical preemption to avoid overgeneralizing semantically plausible but unobserved patterns.
Trait-space Monitoring for Emergent Misalignment During Supervised Finetuning cs.LG · 2026-05-31 · unverdicted · none · ref 7
Trait-space drift monitoring detects emergent misalignment checkpoints in 7-9B LLMs with 2.2% FNR, 2.9% FPR and 0.99 AUROC, outperforming PCA and SAE baselines.
MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining cs.CL · 2026-04-27 · unverdicted · none · ref 24
MIPIC trains Matryoshka representations using self-distilled intra-relational alignment and progressive information chaining, yielding competitive results on STS, NLI, and classification tasks especially at low dimensions.
Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models cs.CL · 2025-06-02 · unverdicted · none · ref 25
Inflectional features stay linearly decodable across all layers while lexical identity weakens with depth in modern transformers.
Fast & Faithful Function Vectors cs.CL · 2026-06-03 · unverdicted · none · ref 24
LRP-based attention head selection and distributed application improve the efficiency and accuracy of function vectors for steering LLMs compared to prior choices.
Probing Classifiers: Promises, Shortcomings, and Advances cs.CL · 2021-02-24 · unverdicted · none · ref 13
Probing classifiers are a common but limited method for analyzing linguistic knowledge in neural NLP models, and this review outlines their promises, methodological shortcomings, and recent advances.

and Gardner, Matt and Belinkov, Yonatan and Peters, Matthew E

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer