Improving language understanding by generative pre-training
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Block-Based Double Decoders
Block-based double decoders use doubly-causal, block-based attention masks to combine the full pretraining supervision of decoder-only models with the efficient inference of encoder-decoders, and they outperform encoder-decoders in scaling experiments.
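The summary names block-based attention masks without defining them. As a rough illustration only, here is a minimal sketch of a generic block-causal mask, assuming tokens are split into consecutive fixed-size blocks and each query may attend to every key in its own block and in all earlier blocks. This is not the paper's "doubly-causal" construction, whose details are not given here, and the function names `block_causal_mask` and `token_causal_mask` are hypothetical.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Illustrative assumption: positions are grouped into consecutive blocks of
    `block_size`; a query sees all keys in its own block and in earlier blocks.
    """
    if seq_len < 0 or block_size <= 0:
        raise ValueError("seq_len must be >= 0 and block_size must be > 0")
    return [
        [j // block_size <= i // block_size for j in range(seq_len)]
        for i in range(seq_len)
    ]


def token_causal_mask(seq_len: int) -> list[list[bool]]:
    """Standard decoder-only causal mask: position i sees positions 0..i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    # Print a 6-token mask with blocks of 2 as rows of 1 (allowed) / 0 (masked).
    for row in block_causal_mask(6, 2):
        print("".join("1" if allowed else "0" for allowed in row))
    # Prints:
    # 110000
    # 110000
    # 111100
    # 111100
    # 111111
    # 111111
```

With `block_size = 1` the sketch reduces to the ordinary token-level causal mask of a decoder-only model; larger blocks let tokens within a block see each other bidirectionally while keeping causality across blocks.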