BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
1 Pith paper cites this work. Polarity classification is still indexing.
Pith papers citing it: 1
Fields: cs.LG
Years: 2026
Verdicts: UNVERDICTED
Representative citing papers: 1
Citing papers explorer
- Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
This survey organizes LLM optimizer literature into categories and argues the field is shifting toward rigorous, multi-factor comparisons of convergence, memory, stability, and complexity.
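
For context on the title's starting point, here is a minimal sketch of one AdamW update step (Adam moments plus decoupled weight decay), assuming NumPy and common default hyperparameters; the values and variable names are illustrative and not taken from the survey.

import numpy as np

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update on parameters theta given gradient grad at step t."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) EMA
    v = beta2 * v + (1 - beta2) * grad**2       # second-moment (variance) EMA
    m_hat = m / (1 - beta1**t)                  # bias-corrected moments
    v_hat = v / (1 - beta2**t)
    # Decoupled weight decay: applied to theta directly, not folded into grad.
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

# Toy usage: a few steps shrinking theta toward zero on f(theta) = ||theta||^2.
theta = np.array([1.0, -2.0, 3.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 6):
    grad = 2 * theta                            # gradient of ||theta||^2
    theta, m, v = adamw_step(theta, grad, m, v, t)

Note that the two moment buffers m and v triple the training-state memory relative to the parameters alone; reducing exactly this cost is what the memory-efficient and matrix-based alternatives in the survey's title target.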