BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton, Toutanova, Kristina · 2019

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

LEAP adds a layer-wise exit-aware constraint to standard distillation, reconciling it with early-exit mechanisms and delivering 1.61x wall-clock speedup on MiniLM at 0.95 threshold with 91.9% early exits by layer 7.

citing papers explorer

LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference

cs.LG · 2026-05-01 · unverdicted · none · ref 31
