A single 15B supernet checkpoint supports runtime switching between attention mixer placements for multiple decode speed presets while retaining 77-96% quality relative to the teacher model.
RoFormer : Enhanced transformer with rotary position embedding
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
A 53K-parameter weight-shared transformer generates novel valid SMILES at 95% rate on ZINC-250K and resolves constraints hierarchically via bracket, ring, and valence stages as shown by probing and ablation.
citing papers explorer
-
Super Apriel: One Checkpoint, Many Speeds
A single 15B supernet checkpoint supports runtime switching between attention mixer placements for multiple decode speed presets while retaining 77-96% quality relative to the teacher model.
-
SMolLM: Small Language Models Learn Small Molecular Grammar
A 53K-parameter weight-shared transformer generates novel valid SMILES at 95% rate on ZINC-250K and resolves constraints hierarchically via bracket, ring, and valence stages as shown by probing and ablation.