A single 15B supernet checkpoint supports runtime switching between attention mixer placements for multiple decode speed presets while retaining 77-96% quality relative to the teacher model.
RoFormer : Enhanced transformer with rotary position embedding
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2representative citing papers
citing papers explorer
-
Super Apriel: One Checkpoint, Many Speeds
A single 15B supernet checkpoint supports runtime switching between attention mixer placements for multiple decode speed presets while retaining 77-96% quality relative to the teacher model.
- SMolLM: Small Language Models Learn Small Molecular Grammar