A Primer in BERTology: What We Know About How BERT Works

Anna Rogers, Olga Kovaleva, Anna Rumshisky · 2021 · DOI 10.1162/tacl_a_00349

8 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Probabilistic Attribution For Large Language Models

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

Develops a model-agnostic attribution score as the log-ratio of conditional response probabilities with and without a marginalized prompt token, derived via Bayes inversion of next-token distributions, and relates it to conditional entropies.

Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

cs.SE · 2023-05-20 · unverdicted · novelty 7.0

LLMs achieve strong results on syntax parsing tasks but show limited and variable performance on dynamic reasoning, with a clear performance hierarchy across model scales.

Predictive Prefetching for Retrieval-Augmented Generation

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Introduces predictive prefetching for RAG that anticipates retrieval needs several tokens ahead via three components, reporting up to 43.5% latency reduction and 62.4% TTFT improvement while preserving answer quality.

Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

Training a mean-field Transformer under L2 regularization induces an escape from attention-driven token clustering in later layers after initial clustering.

Search-R3: Unifying Reasoning and Embedding in Large Language Models

cs.CL · 2025-10-08 · unverdicted · novelty 5.0

Search-R3 trains LLMs to output search embeddings as a direct product of step-by-step reasoning via supervised pre-training and a specialized RL environment that avoids full corpus re-encoding.

Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models

cs.CL · 2025-06-02 · unverdicted · novelty 5.0

Inflectional features stay linearly decodable across all layers while lexical identity weakens with depth in modern transformers.

Probing Classifiers: Promises, Shortcomings, and Advances

cs.CL · 2021-02-24 · unverdicted · novelty 3.0

Probing classifiers are a common but limited method for analyzing linguistic knowledge in neural NLP models, and this review outlines their promises, methodological shortcomings, and recent advances.

citing papers explorer

Probabilistic Attribution For Large Language Models
cs.CL · 2026-05-20 · unverdicted · none · ref 30

Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling
cs.CL · 2026-05-18 · unverdicted · none · ref 127

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs
cs.SE · 2023-05-20 · unverdicted · none · ref 68

Predictive Prefetching for Retrieval-Augmented Generation
cs.CL · 2026-05-18 · unverdicted · none · ref 37

Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers
cs.LG · 2026-05-08 · unverdicted · none · ref 63

Search-R3: Unifying Reasoning and Embedding in Large Language Models
cs.CL · 2025-10-08 · unverdicted · none · ref 60

Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models
cs.CL · 2025-06-02 · unverdicted · none · ref 31

Probing Classifiers: Promises, Shortcomings, and Advances
cs.CL · 2021-02-24 · unverdicted · none · ref 58
