What do you learn from context? Probing for sentence structure in contextualized word representations

6 Pith papers cite this work.

abstract

Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of downstream NLP tasks. Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. We probe word-level contextual representations from four recent models and investigate how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena. We find that existing models trained on language modeling and translation produce strong representations for syntactic phenomena, but only offer comparably small improvements on semantic tasks over a non-contextual baseline.

fields

cs.CL (3) · cs.LG (3)

years

2026 (5) · 2021 (1)

representative citing papers

Mixing Times of Glauber Dynamics on Masked Language Models

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Analysis of Glauber dynamics on masked language models shows O(n log n) mixing under bounded cross-token influence and metastability with exponential escape times at low temperatures, plus empirical phase transitions.
