A Primer in BERTology: What We Know About How BERT Works

Rogers, A · 2020 · DOI 10.1162/tacl_a_00349

11 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Probabilistic Attribution For Large Language Models

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

Develops a model-agnostic attribution score as the log-ratio of conditional response probabilities with and without a marginalized prompt token, derived via Bayes inversion of next-token distributions, and relates it to conditional entropies.

Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

cs.SE · 2023-05-20 · unverdicted · novelty 7.0

LLMs achieve strong results on syntax parsing tasks but show limited and variable performance on dynamic reasoning, with a clear performance hierarchy across model scales.

Predictive Prefetching for Retrieval-Augmented Generation

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Introduces predictive prefetching for RAG that anticipates retrieval needs several tokens ahead via three components, reporting up to 43.5% latency reduction and 62.4% TTFT improvement while preserving answer quality.

Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt

cs.CL · 2026-06-01 · unverdicted · novelty 5.0

Larger LLMs reproduce constructional productivity via entrenchment in coercion cases with nonce words but fail to use statistical preemption to avoid overgeneralizing semantically plausible but unobserved patterns.

Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

Training a mean-field Transformer under L2 regularization induces an escape from attention-driven token clustering in later layers after initial clustering.

Search-R3: Unifying Reasoning and Embedding in Large Language Models

cs.CL · 2025-10-08 · unverdicted · novelty 5.0

Search-R3 trains LLMs to output search embeddings as a direct product of step-by-step reasoning via supervised pre-training and a specialized RL environment that avoids full corpus re-encoding.

Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models

cs.CL · 2025-06-02 · unverdicted · novelty 5.0

Inflectional features stay linearly decodable across all layers while lexical identity weakens with depth in modern transformers.

Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning

cs.CV · 2026-07-02 · unverdicted · novelty 4.0

EADP filters textual noise via statistical entropy then casts token selection as submodular maximization with spatial prior to preserve fine-grained cues in VLMs under strict budgets.

Fast & Faithful Function Vectors

cs.CL · 2026-06-03 · unverdicted · novelty 4.0

LRP-based attention head selection and distributed application improve the efficiency and accuracy of function vectors for steering LLMs compared to prior choices.

Probing Classifiers: Promises, Shortcomings, and Advances

cs.CL · 2021-02-24 · unverdicted · novelty 3.0

Probing classifiers are a common but limited method for analyzing linguistic knowledge in neural NLP models, and this review outlines their promises, methodological shortcomings, and recent advances.

citing papers explorer

Probabilistic Attribution For Large Language Models · cs.CL · 2026-05-20 · unverdicted · none · ref 30
Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling · cs.CL · 2026-05-18 · unverdicted · none · ref 127
Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs · cs.SE · 2023-05-20 · unverdicted · none · ref 68
Predictive Prefetching for Retrieval-Augmented Generation · cs.CL · 2026-05-18 · unverdicted · none · ref 37
Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt · cs.CL · 2026-06-01 · unverdicted · none · ref 64
Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers · cs.LG · 2026-05-08 · unverdicted · none · ref 63
Search-R3: Unifying Reasoning and Embedding in Large Language Models · cs.CL · 2025-10-08 · unverdicted · none · ref 60
Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models · cs.CL · 2025-06-02 · unverdicted · none · ref 31
Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning · cs.CV · 2026-07-02 · unverdicted · none · ref 50
Fast & Faithful Function Vectors · cs.CL · 2026-06-03 · unverdicted · none · ref 23
Probing Classifiers: Promises, Shortcomings, and Advances · cs.CL · 2021-02-24 · unverdicted · none · ref 58
