RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024.
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers

- HRM-Text: Efficient Pretraining Beyond Scaling
  A 1B-parameter hierarchical recurrent model pretrained on 40B instruction-response tokens achieves 60.7% MMLU and strong results on ARC-C, DROP, GSM8K, and MATH while using 100-900x fewer tokens than standard baselines.
- Improving Machine Learning-Based Robot Self-Collision Checking with Input Positional Encoding
  Adding positional encoding to MLP inputs for robot self-collision detection improves accuracy by capturing high-frequency position variations better than standard inputs.