Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale. arXiv preprint arXiv:2409.17115, 2024.
2 Pith papers cite this work. Polarity classification is still indexing.

Fields: cs.LG
Verdicts: UNVERDICTED (2)
Representative citing papers: 2
- Path-Constrained Mixture-of-Experts
  PathMoE constrains expert paths in MoE models by sharing router parameters across layer blocks, yielding more concentrated paths and better perplexity and downstream-task performance without auxiliary losses (a minimal sketch of the shared-router idea follows this list).
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
  A recurrent-depth architecture lets a language model improve its reasoning by iterating computation in latent space, achieving benchmark gains equivalent to those of much larger models (a sketch of latent iteration also follows this list).
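
The shared-router mechanism mentioned for PathMoE can be made concrete with a short sketch. This is a minimal PyTorch illustration, not the paper's implementation: it assumes a block of MoE layers that reuse one gating layer, so that a token's expert assignments stay correlated across the block. The class and parameter names (`SharedRouterBlock`, `MoELayer`, `n_experts`, `top_k`) are illustrative.

```python
# Minimal sketch (assumed, not PathMoE's actual code): several MoE layers
# in a block share one router, so routing decisions are tied across layers.
import torch
import torch.nn as nn


class MoELayer(nn.Module):
    def __init__(self, d_model, n_experts, top_k, shared_router):
        super().__init__()
        self.top_k = top_k
        self.router = shared_router  # the same nn.Linear instance for every layer in the block
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (tokens, d_model)
        probs = self.router(x).softmax(dim=-1)   # (tokens, n_experts)
        weights, idx = probs.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):              # dispatch tokens to their top-k experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out


class SharedRouterBlock(nn.Module):
    """A block of MoE layers whose routers are parameter-shared."""
    def __init__(self, d_model=256, n_experts=8, top_k=2, n_layers=4):
        super().__init__()
        shared_router = nn.Linear(d_model, n_experts, bias=False)
        self.layers = nn.ModuleList([
            MoELayer(d_model, n_experts, top_k, shared_router)
            for _ in range(n_layers)
        ])

    def forward(self, x):
        for layer in self.layers:
            x = x + layer(x)                     # residual connection around each MoE layer
        return x


# Usage: every layer in the block routes a token with the same gating weights,
# which is one way to obtain more concentrated expert paths.
x = torch.randn(32, 256)
y = SharedRouterBlock()(x)
```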
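
The latent-iteration idea behind the recurrent-depth paper can likewise be sketched. This is a simplified illustration under assumptions, not the paper's architecture: a single weight-tied transformer layer is applied a chosen number of times to a latent state, with the input embedding re-injected at each step, so test-time compute scales with the iteration count. The names and hyperparameters (`RecurrentDepthLM`, `n_iters`, dimensions) are made up for the example.

```python
# Simplified sketch (assumed): iterate a weight-tied core in latent space,
# choosing the number of iterations at inference time.
import torch
import torch.nn as nn


class RecurrentDepthLM(nn.Module):
    def __init__(self, d_model=256, vocab_size=32000, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One transformer layer whose weights are reused at every iteration.
        self.core = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, n_iters=4):
        x = self.embed(tokens)            # (batch, seq, d_model)
        state = torch.zeros_like(x)       # latent state (zero-initialized here for simplicity)
        for _ in range(n_iters):          # more iterations = more test-time compute
            state = self.core(state + x)  # re-inject the input embedding each step
        return self.head(state)           # logits over the vocabulary


# Usage: the same weights can be run deeper at inference time.
model = RecurrentDepthLM()
tokens = torch.randint(0, 32000, (1, 16))
logits_shallow = model(tokens, n_iters=2)
logits_deep = model(tokens, n_iters=16)   # scale up latent reasoning depth
```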