Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.
arXiv preprint arXiv:2210.11693
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: background 1
representative citing papers
RMNP preconditions matrix updates via row-wise L2 normalization instead of Newton-Schulz iteration, reducing complexity to O(mn) while matching Muon's non-convex convergence rate and empirical performance.
A retrospective survey and empirical evaluation of deep learning optimization algorithms that identifies trends, design trade-offs, and future directions.
citing papers explorer
- The Falcon Series of Open Language Models
  Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.
- RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization
  RMNP preconditions matrix updates via row-wise L2 normalization instead of Newton-Schulz iteration, reducing complexity to O(mn) while matching Muon's non-convex convergence rate and empirical performance.
- Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations
  A retrospective survey and empirical evaluation of deep learning optimization algorithms that identifies trends, design trade-offs, and future directions.