What do you learn from context? Probing for sentence structure in contextualized word representations
6 Pith papers cite this work.
abstract
Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of downstream NLP tasks. Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. We probe word-level contextual representations from four recent models and investigate how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena. We find that existing models trained on language modeling and translation produce strong representations for syntactic phenomena, but only offer comparably small improvements on semantic tasks over a non-contextual baseline.
citing papers explorer
- Mixing Times of Glauber Dynamics on Masked Language Models
  Analysis of Glauber dynamics on masked language models shows O(n log n) mixing under bounded cross-token influence and metastability with exponential escape times at low temperatures, plus empirical phase transitions.
- From Words to Amino Acids: Does the Curse of Depth Persist?
  Protein language models exhibit consistent depth inefficiency where most task-relevant computation occurs in a subset of layers, mirroring patterns in large language models.
- On the Blessing of Pre-training in Weak-to-Strong Generalization
  Pre-training provides a geometric warm start in a single-index model that enables weak-to-strong generalization up to a supervisor-limited bound, with empirical phase-transition evidence in LLMs.
- Enriching and Controlling Global Semantics for Text Summarization
  A normalizing-flow neural topic model and a control mechanism are added to Transformer summarizers to supply and regulate global semantics, with reported gains over prior models on five benchmarks.
- Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction
  BERT embeddings encode narrative dimensions of time, space, causality, and character at the token level: a linear probe achieves 94% accuracy versus 47% on variance-matched random embeddings, though unsupervised clusters do not align with these categories.
- REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations