Title resolution pending

How much over-parameterization is sufficient to learn deep · 1911 · arXiv 1911.12360

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

HPO enables unbiased policy optimization in hybrid action spaces by mixing differentiable simulation gradients with score-function estimates, outperforming PPO as continuous dimensions increase.

How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

The authors derive a Maximally Scale-Stable Parameterization (MSSP) for MoE models that achieves robust learning-rate transfer and monotonic performance gains with scale across co-scaling regimes of width, experts, and sparsity.

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

cs.LG · 2024-01-02 · unverdicted · novelty 6.0

SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients cs.LG · 2026-05-14 · unverdicted · none · ref 54
HPO enables unbiased policy optimization in hybrid action spaces by mixing differentiable simulation gradients with score-function estimates, outperforming PPO as continuous dimensions increase.
How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization cs.LG · 2026-05-13 · unverdicted · none · ref 248
The authors derive a Maximally Scale-Stable Parameterization (MSSP) for MoE models that achieves robust learning-rate transfer and monotonic performance gains with scale across co-scaling regimes of width, experts, and sparsity.
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models cs.LG · 2024-01-02 · unverdicted · none · ref 241
SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer