Low-precision training of large language models: Methods, challenges, and opportunities.arXiv preprint arXiv:2505.01043

Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Guoxia Wang, Dianhai Yu, Yonggang Wen, Dacheng Tao · 2025 · arXiv 2505.01043

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

cs.LG · 2025-10-05 · unverdicted · novelty 7.0

Low-precision Flash Attention fails due to similar low-rank attention representations combined with biased rounding errors that accumulate and corrupt weight updates; a minimal fix to reduce rounding bias stabilizes training.

StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

StoSignSGD resolves SignSGD divergence on non-smooth objectives via structural stochasticity, matching optimal convex rates and improving non-convex bounds while delivering 1.44-2.14x speedups in FP8 LLM pretraining.

Reliable Evaluation Protocol for Low-Precision Retrieval

cs.IR · 2025-08-05 · unverdicted · novelty 6.0

Proposes High-Precision Scoring (HPS) and Tie-aware Retrieval Metrics (TRM) to reduce tie-induced instability in low-precision retrieval evaluation.

Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor

cs.LG · 2026-05-19 · 2 refs

citing papers explorer

Showing 4 of 4 citing papers.

Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention cs.LG · 2025-10-05 · unverdicted · none · ref 7
Low-precision Flash Attention fails due to similar low-rank attention representations combined with biased rounding errors that accumulate and corrupt weight updates; a minimal fix to reduce rounding bias stabilizes training.
StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models cs.LG · 2026-04-16 · unverdicted · none · ref 15
StoSignSGD resolves SignSGD divergence on non-smooth objectives via structural stochasticity, matching optimal convex rates and improving non-convex bounds while delivering 1.44-2.14x speedups in FP8 LLM pretraining.
Reliable Evaluation Protocol for Low-Precision Retrieval cs.IR · 2025-08-05 · unverdicted · none · ref 3
Proposes High-Precision Scoring (HPS) and Tie-aware Retrieval Metrics (TRM) to reduce tie-induced instability in low-precision retrieval evaluation.
Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor cs.LG · 2026-05-19 · unreviewed · ref 20 · 2 links

Low-precision training of large language models: Methods, challenges, and opportunities.arXiv preprint arXiv:2505.01043

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer