pith. machine review for the scientific record.

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale. arXiv preprint arXiv:2409.17115.

2 Pith papers cite this work. Polarity classification is still in progress.

2 Pith papers citing it

fields

cs.LG 2

years

2026 1

2025 1

verdicts

UNVERDICTED 2

representative citing papers

Path-Constrained Mixture-of-Experts

cs.LG · 2026-03-18 · unverdicted · novelty 7.0

PathMoE constrains expert paths in MoE models by sharing router parameters across layer blocks, yielding more concentrated paths, better performance on perplexity and tasks, and no need for auxiliary losses.
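The summary above describes the core PathMoE idea: routing decisions are tied by sharing one router's parameters across all MoE layers in a block. A minimal toy sketch of that parameter-sharing pattern (top-1 routing; all shapes, names, and weight scales here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, router_w, expert_ws):
    """Top-1 MoE layer: the router weights may be shared with other layers."""
    logits = x @ router_w                              # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    choice = probs.argmax(-1)                          # expert index per token
    out = np.zeros_like(x)
    for e, w in enumerate(expert_ws):
        mask = choice == e
        out[mask] = (x[mask] @ w) * probs[mask, e, None]
    return out, choice

d, n_experts, layers_per_block = 8, 4, 2
router_w = rng.normal(size=(d, n_experts))             # ONE router for the whole block
experts = [[rng.normal(size=(d, d)) * 0.1 for _ in range(n_experts)]
           for _ in range(layers_per_block)]           # experts stay per-layer

x = rng.normal(size=(5, d))
path = []                                              # expert path of each token
for expert_ws in experts:
    x, choice = moe_layer(x, router_w, expert_ws)
    path.append(choice)
```

Because every layer in the block scores tokens with the same `router_w`, the per-token sequence of chosen experts in `path` is constrained to be more consistent than with independent per-layer routers.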

citing papers explorer

Showing 2 of 2 citing papers.

  • Path-Constrained Mixture-of-Experts cs.LG · 2026-03-18 · unverdicted · none · ref 20

    PathMoE constrains expert paths in MoE models by sharing router parameters across layer blocks, yielding more concentrated paths, better performance on perplexity and tasks, and no need for auxiliary losses.

  • Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach cs.LG · 2025-02-07 · unverdicted · none · ref 186

    A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
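The recurrent-depth summary above describes iterating a weight-tied core block in latent space, so test-time compute can grow without adding parameters. A toy sketch of that pattern (the zero latent init, weight scales, and names are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
# toy weights; the 0.1 scale is an arbitrary choice for a stable iteration
W_in, W_core, W_out = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

def forward(x, n_steps):
    """Iterate one weight-tied core block in latent space n_steps times."""
    h = np.tanh(x @ W_in)              # embed the input once
    s = np.zeros_like(h)               # latent state (init is an assumption)
    for _ in range(n_steps):
        s = np.tanh((s + h) @ W_core)  # same parameters at every iteration
    return s @ W_out

x = rng.normal(size=(3, d))
y_shallow = forward(x, 4)
y_deep = forward(x, 32)                # 8x the depth at inference, same weights
```

Scaling `n_steps` at inference changes the effective depth (and hence compute) while the parameter count stays fixed, which is the sense in which such a model can match larger ones on reasoning benchmarks.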