Flm-101b: An open llm and how to train it with $100 k budget

11 Preprint Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, et al · 2023 · arXiv 2309.03852

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Beyond Sunk Costs: Boosting LLM Pre-training Efficiency via Orthogonal Growth of Mixture-of-Experts

cs.LG · 2025-10-09 · unverdicted · novelty 5.0

Orthogonal growth recycles pre-trained MoE checkpoints via layer copying and noisy expert duplication, delivering 10.6% higher accuracy than training from scratch with equivalent extra compute.

A Survey of Large Language Models

cs.CL · 2023-03-31 · accept · novelty 3.0

This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.

citing papers explorer

Showing 2 of 2 citing papers.

Beyond Sunk Costs: Boosting LLM Pre-training Efficiency via Orthogonal Growth of Mixture-of-Experts cs.LG · 2025-10-09 · unverdicted · none · ref 12
Orthogonal growth recycles pre-trained MoE checkpoints via layer copying and noisy expert duplication, delivering 10.6% higher accuracy than training from scratch with equivalent extra compute.
A Survey of Large Language Models cs.CL · 2023-03-31 · accept · none · ref 104
This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.

Flm-101b: An open llm and how to train it with $100 k budget

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer