Mixture-of-Channels: Exploiting Sparse FFNs for Efficient LLMs Pre-training and Inference. arXiv preprint, abs/2511.09323, 2025

Tong Wu, Yutong He, Bin Wang, Kun Yuan · 2025 · arXiv 2511.09323

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs

cs.CL · 2026-06-09 · unverdicted · novelty 5.0

A continual training recipe upcycles a dense Qwen2.5-8B LLM into a 4x channel-sparse model via predictor-gated bank-wise sparsity in the SwiGLU FFN, with a single-layer repair for a long-context failure on RULER-CWE.

citing papers explorer


Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs
cs.CL · 2026-06-09 · unverdicted · none · ref 8
