BERT: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova · 2019

7 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.

Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

A large-scale benchmark finds that recent multimodal domain generalization methods give only marginal gains over a plain ERM baseline, with no method winning consistently and all degrading sharply under corruption or missing modalities.

NEST: Nested Event Stream Transformer for Sequences of Multisets

cs.LG · 2026-01-31 · unverdicted · novelty 7.0

NEST is a nested transformer for sequences of multisets that uses masked set modeling to learn improved set-level representations from hierarchical event streams like EHRs.

Scaling Vision Transformers for Functional MRI with Flat Maps

cs.CV · 2025-10-15 · conditional · novelty 7.0

CortexMAE adapts Vision Transformers to fMRI via cortical flat maps, shows power-law scaling on 2.1K hours of data, and outperforms priors on cognitive state decoding while failing to beat a simple functional connectivity baseline on subject-level trait prediction.

Beyond Syntax: Action Semantics Learning for App Agents

cs.AI · 2025-06-21 · unverdicted · novelty 7.0

Action Semantics Learning trains app agents to align with the semantic effects of actions via a Semantic Estimator module, improving robustness to out-of-distribution scenarios over syntax-matching fine-tuning.

Prototype Guided Post-pretraining for Single-Cell Representation Learning

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

CellRefine adds a marker-gene-guided post-pretraining stage to single-cell models that refines the cell embedding manifold and improves downstream task performance by up to 15%.

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

cs.RO · 2025-07-02 · unverdicted · novelty 5.0

The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.

citing papers explorer

Showing 7 of 7 citing papers.

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models · cs.LG · 2026-05-08 · unverdicted · none · ref 39
Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study · cs.CV · 2026-05-07 · unverdicted · none · ref 7
NEST: Nested Event Stream Transformer for Sequences of Multisets · cs.LG · 2026-01-31 · unverdicted · none · ref 5
Scaling Vision Transformers for Functional MRI with Flat Maps · cs.CV · 2025-10-15 · conditional · none · ref 8
Beyond Syntax: Action Semantics Learning for App Agents · cs.AI · 2025-06-21 · unverdicted · none · ref 29
Prototype Guided Post-pretraining for Single-Cell Representation Learning · cs.LG · 2026-05-08 · unverdicted · none · ref 84
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective · cs.RO · 2025-07-02 · unverdicted · none · ref 44
