arxiv: 2401.01335 · v3 · submitted 2024-01-02 · cs.LG · cs.AI · cs.CL · stat.ML

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu

Pith reviewed 2026-05-14 22:55 UTC · model grok-4.3

classification: cs.LG · cs.AI · cs.CL · stat.ML

keywords: self-play fine-tuning · language model self-improvement · supervised fine-tuning · preference optimization · LLM alignment · iterative refinement

The pith

Self-play fine-tuning turns a weak supervised LLM into a strong one by iteratively contrasting its own generations against fixed human data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Self-Play fIne-tuNing (SPIN), which begins with a supervised fine-tuned language model and has it generate responses from its own previous iteration. The training objective then teaches the model to favor the original human-annotated responses over these self-generated ones. The process repeats across iterations, and the theory shows that the objective reaches its global minimum only when the model's policy exactly matches the target human data distribution. Experiments on standard benchmarks show consistent gains, and SPIN even outperforms models trained with direct preference optimization on additional GPT-4 preference pairs.

Core claim

SPIN refines an LLM policy through repeated self-play in which the model produces its own training examples from previous checkpoints and learns to distinguish them from the fixed set of human demonstrations; the resulting objective has a unique global optimum achieved exclusively when the policy aligns with the target data distribution.
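For readers who prefer the claim in symbols, the objective can be written roughly as follows. This page does not reproduce the formula, so the notation here (the regularization weight $\lambda$, the logistic loss $\ell$, the prompt distribution $q$, and the previous iterate $p_{\theta_t}$) is recalled from the paper and should be checked against it. Treat it as a sketch, not a quotation.

```latex
L_{\mathrm{SPIN}}(\theta, \theta_t)
  = \mathbb{E}_{x \sim q,\; y \sim p_{\mathrm{data}}(\cdot \mid x),\; y' \sim p_{\theta_t}(\cdot \mid x)}
    \left[ \ell\!\left(
        \lambda \log \frac{p_\theta(y \mid x)}{p_{\theta_t}(y \mid x)}
      - \lambda \log \frac{p_\theta(y' \mid x)}{p_{\theta_t}(y' \mid x)}
    \right) \right],
\qquad
\ell(t) = \log\!\left(1 + e^{-t}\right).
```

Here $y$ is the human-annotated response and $y'$ is the response sampled from the previous iterate. The next policy $\theta_{t+1}$ is obtained by minimizing this quantity over $\theta$.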

What carries the argument

The self-play mechanism in which responses generated by the model at iteration t are contrasted against the unchanging human-annotated demonstrations to update the policy at iteration t+1.
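The contrast described above can be sketched for a single (prompt, human response, self-generated response) triple. The function names, the argument order, and the default `lam=0.1` are illustrative assumptions, not the authors' code; the inputs are sequence-level log-probabilities that a real implementation would obtain from the current and previous checkpoints.

```python
import math


def logistic_loss(t: float) -> float:
    """Numerically stable log(1 + exp(-t))."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    return -t + math.log1p(math.exp(t))


def spin_pair_loss(logp_new_human: float, logp_old_human: float,
                   logp_new_self: float, logp_old_self: float,
                   lam: float = 0.1) -> float:
    """Loss for one pair: human response (preferred) vs. self-generated response.

    logp_new_* are log-probabilities under the policy being trained;
    logp_old_* are log-probabilities under the frozen previous iterate.
    lam is a hypothetical default for the regularization weight.
    """
    human_gain = logp_new_human - logp_old_human  # log-ratio on the human response
    self_gain = logp_new_self - logp_old_self     # log-ratio on the self-generated response
    return logistic_loss(lam * (human_gain - self_gain))
```

When the trained policy still equals the previous iterate, both log-ratios are zero and the loss is log 2; it falls as the policy shifts probability toward the human response and away from its own earlier generation.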

If this is right

Performance rises on the HuggingFace Open LLM Leaderboard, MT-Bench, and Big-Bench tasks without any new human annotations.
SPIN surpasses direct preference optimization even when the latter receives supplementary GPT-4 preference data.
The initial supervised fine-tuning dataset alone suffices to reach higher capability levels through iterative refinement.
The training objective attains its global optimum only when the policy matches the target distribution, which gives the iteration a natural stopping point.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach implies that limited human demonstration data can be amplified through internal generation loops rather than external collection.
Self-play of this form may extend to other sequence-generation tasks where synthetic examples are inexpensive to produce.
If the contrast remains informative across many rounds, the method could reduce dependence on large-scale preference labeling pipelines.

Load-bearing premise

Responses generated by earlier model versions supply clean contrastive signals that steadily move the policy toward the human data distribution without accumulating biases or shifts that would block further gains.

What would settle it

If repeated self-play iterations produce no measurable reduction in divergence between the model's output distribution and the human-annotated distribution, or if benchmark scores plateau or degrade, the alignment claim would be refuted.
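One practical proxy for this test, offered as an assumption and not as something the paper prescribes: the average negative log-likelihood (NLL) of held-out human responses under each iterate equals the entropy of the human data plus the forward KL divergence to the model, so a drop in NLL is a drop in that divergence. The helper names and the tolerance below are hypothetical.

```python
def nll_deltas(avg_nll_by_iteration):
    """Change in held-out NLL from each iteration to the next (negative = improvement)."""
    return [b - a for a, b in zip(avg_nll_by_iteration, avg_nll_by_iteration[1:])]


def stalled(avg_nll_by_iteration, tol=1e-3):
    """True if no iteration reduced held-out NLL by more than tol."""
    return all(delta > -tol for delta in nll_deltas(avg_nll_by_iteration))
```

With made-up numbers, `stalled([2.0, 1.8, 1.7])` is false (each round still helps), while `stalled([2.0, 2.0, 2.01])` is true and would count as evidence against the alignment claim.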

Original abstract

Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data. We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN), which starts from a supervised fine-tuned model. At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. More specifically, the LLM generates its own training data from its previous iterations, refining its policy by discerning these self-generated responses from those obtained from human-annotated data. Our method progressively elevates the LLM from a nascent model to a formidable one, unlocking the full potential of human-annotated demonstration data for SFT. Theoretically, we prove that the global optimum to the training objective function of our method is achieved only when the LLM policy aligns with the target data distribution. Empirically, we evaluate our method on several benchmark datasets including the HuggingFace Open LLM Leaderboard, MT-Bench, and datasets from Big-Bench. Our results show that SPIN can significantly improve the LLM's performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents. Codes are available at https://github.com/uclaml/SPIN.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

SPIN improves SFT models via self-play but the iterative convergence isn't proven.

Hey colleague,

The main takeaway from this paper is that their SPIN method can take a basic SFT model and boost its performance by having it play against its own previous versions, using self-generated outputs as the 'losing' side in a contrast with human data. They report solid gains on standard LLM benchmarks and even surpass DPO that had extra GPT-4 preference data.

On the positive side, the approach is simple to understand and implement, and the empirical results across multiple datasets like the Open LLM Leaderboard and MT-Bench look promising. Having the code available at that GitHub link makes it easier to verify and build on.

Where it could be stronger is the theoretical backing for the iterative process. The proof covers the static case where the optimum is the target distribution, but it doesn't address whether the self-play loop reliably gets there or could get stuck due to shifting distributions from early iterations. That's a gap the stress-test highlights, and it seems valid based on the abstract. Also, while they compare to DPO, more details on matching the total training effort would help rule out that the gains come just from more steps.

This kind of work is aimed at the LLM fine-tuning community, especially those looking to maximize existing human data. A reader focused on practical improvements in alignment would get value from the method and results. It has enough substance and novelty to merit peer review rather than a desk reject. I'd recommend sending it out for review.

Referee Report

1 major / 2 minor

Summary. The paper proposes Self-Play fIne-tuNing (SPIN), which starts from an SFT model and iteratively generates responses from previous policy iterates to create contrastive training pairs against the fixed human-annotated data; the model is then fine-tuned to prefer the human data. It proves that the resulting objective has a unique global optimum precisely when the policy matches the target data distribution, and reports empirical gains on the HuggingFace Open LLM Leaderboard, MT-Bench, and Big-Bench tasks that exceed those of DPO trained with additional GPT-4 preference data.

Significance. If the empirical improvements are robust to compute-matched controls and the iterative dynamics reliably reach the claimed optimum, the result would be significant: it offers a route to strengthen LLMs using only existing SFT demonstrations, without further human or GPT-4 annotations. The theoretical statement is a standard consequence of imitation-learning objectives, but the self-play mechanism itself is the novel element whose practical reliability remains to be fully established.

major comments (1)

[Abstract / Theoretical Analysis] Abstract and theoretical section: the proof establishes that the static objective attains its global minimum only at exact alignment with the target distribution, yet no analysis (contraction mapping, Lyapunov function, or fixed-point convergence argument) is supplied for the sequence of policies generated by the self-play iteration. Early weak generations could therefore induce a persistent distribution shift or suboptimal fixed point from which gradient updates cannot escape, so the static guarantee does not automatically transfer to the training trajectory.
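The distinction between the static optimum and the training trajectory can be made concrete on a toy problem. The sketch below assumes a single prompt, a finite set of candidate responses, a softmax policy over them, and the logistic-loss form of the objective; all names are illustrative. At the point where the trained policy equals the opponent, the gradient with respect to the logits works out to -(lam / 2) * (p_data - p_opponent), so the opponent is a stationary point only when it already matches the data.

```python
import math


def softmax(logits):
    """Softmax with max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_spin_loss(logits, p_data, p_opp, lam=1.0):
    """Exact expected logistic loss over y ~ p_data and y' ~ p_opp (toy setting)."""
    p = softmax(logits)
    total = 0.0
    for y, pd in enumerate(p_data):
        for y2, po in enumerate(p_opp):
            margin = lam * ((math.log(p[y]) - math.log(p_opp[y]))
                            - (math.log(p[y2]) - math.log(p_opp[y2])))
            total += pd * po * math.log1p(math.exp(-margin))
    return total


def gradient_at_opponent(p_data, p_opp, lam=1.0):
    """Gradient w.r.t. logits, evaluated where the policy equals the opponent.

    There every margin is 0 and the logistic loss has slope -1/2.
    """
    return [-0.5 * lam * (pd - po) for pd, po in zip(p_data, p_opp)]
```

This is a one-step statement only. It shows why the iteration keeps moving while the opponent differs from the data, but it says nothing about whether the sequence of iterates converges, which is the referee's point.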

minor comments (2)

[Experiments] Empirical section: the comparison with DPO augmented by GPT-4 data should explicitly state total training tokens, learning-rate schedules, and whether the extra preference data is matched in volume to the self-generated data used by SPIN.
[Method] The manuscript would benefit from a short discussion of how the self-generated negative examples are sampled (temperature, top-p, number of samples per prompt) and whether any filtering is applied to avoid degenerate outputs.
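To make the sampling knobs in the second minor comment concrete, here is a minimal sketch of temperature scaling followed by top-p (nucleus) sampling over one step's logits. It is a generic illustration in plain Python; the default values are hypothetical and nothing here reflects the settings the authors actually used.

```python
import math
import random


def sample_top_p(logits, temperature=1.0, top_p=0.9, rng=random):
    """Sample one token index using temperature scaling and nucleus (top-p) filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of highest-probability tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices renormalizes the weights over the kept tokens.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Lower temperature or smaller `top_p` makes the self-generated negatives more deterministic; reporting both, along with the number of samples per prompt, would let readers judge how diverse the contrast set is.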

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the single major comment point by point below.

Referee: [Abstract / Theoretical Analysis] Abstract and theoretical section: the proof establishes that the static objective attains its global minimum only at exact alignment with the target distribution, yet no analysis (contraction mapping, Lyapunov function, or fixed-point convergence argument) is supplied for the sequence of policies generated by the self-play iteration. Early weak generations could therefore induce a persistent distribution shift or suboptimal fixed point from which gradient updates cannot escape, so the static guarantee does not automatically transfer to the training trajectory.

Authors: We acknowledge that the theoretical analysis establishes the global optimum of the static objective but does not include a formal convergence argument (e.g., contraction mapping or Lyapunov function) for the sequence of policies produced by the iterative self-play procedure. This is a valid observation. The self-play iteration is motivated by the fact that each step contrasts the current policy's outputs against fixed human-annotated data, which in principle reduces distribution shift over time; however, we do not claim or prove that the iteration is guaranteed to reach the global optimum from arbitrary initializations. In the revised manuscript we will add a dedicated paragraph in the theoretical section and an accompanying figure in the experiments section that plots benchmark performance versus iteration number. These additions will empirically document monotonic improvement and the absence of observable stagnation on the evaluated tasks, thereby clarifying the practical behavior of the iteration while preserving the paper's primary contribution of the self-play objective and its static optimality result.

revision: partial

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper proves that the global optimum of its training objective occurs precisely when the policy matches the target human data distribution. This is a direct consequence of the standard form of the preference-based loss (human responses preferred over self-generated ones), which has its unique minimum at the target distribution by construction of the objective itself rather than by any self-referential fit or redefinition. The iterative self-play generates negatives from prior policy iterates, but the static objective's minimum is independently fixed by the human-annotated data and does not reduce to the outputs of the iteration. No load-bearing self-citations, uniqueness theorems imported from prior author work, or ansatzes are invoked to establish the claim. The derivation remains self-contained; the lack of a separate convergence argument for the sequence of iterates is a gap in dynamics analysis, not a circular reduction in the stated theorem.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on standard assumptions from reinforcement learning and imitation learning that a policy can be improved by contrasting against a fixed target distribution; no new free parameters or invented entities are introduced in the abstract description.

axioms (1)

standard math: A global optimum of the training objective exists and is achieved precisely when the learned policy matches the target human data distribution.
This is invoked as the theoretical foundation for why the self-play objective drives improvement.

pith-pipeline@v0.9.0 · 5614 in / 1298 out tokens · 51513 ms · 2026-05-14T22:55:44.631426+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Foundation.LawOfExistence defect_zero_iff_one contradicts

CONTRADICTS: the theorem conflicts with this paper passage, or marks a claim that would need revision before publication.

Theoretically, we prove that the global optimum to the training objective function of our method is achieved only when the LLM policy aligns with the target data distribution.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability
cs.LG 2026-05 unverdicted novelty 7.0

The paper establishes the first Õ(ε⁻¹) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general functi...
RewardHarness: Self-Evolving Agentic Post-Training
cs.AI 2026-05 unverdicted novelty 7.0

RewardHarness self-evolves a tool-and-skill library from 100 preference examples to reach 47.4% accuracy on image-edit evaluation, beating GPT-5, and yields stronger RL-tuned models.
IRIS: Interpolative Rényi Iterative Self-play for Large Language Model Fine-Tuning
cs.LG 2026-04 unverdicted novelty 7.0

IRIS unifies self-play fine-tuning under an interpolative Rényi objective with adaptive alpha scheduling and reports better benchmark scores than baselines while surpassing full supervised fine-tuning with only 13% of...
Structural Verification for Reliable EDA Code Generation without Tool-in-the-Loop Debugging
cs.SE 2026-04 unverdicted novelty 7.0

Structural dependency graphs and staged pre-execution verification raise LLM-based EDA code pass rates to 82.5% (single-step) and 70-84% (multi-step) while halving tool calls by catching dependency violations before runtime.
KTO: Model Alignment as Prospect Theoretic Optimization
cs.LG 2024-02 conditional novelty 7.0

KTO aligns LLMs by directly maximizing prospect-theoretic utility on binary signals and matches or exceeds preference-based methods like DPO from 1B to 30B parameters.
Self-Rewarding Language Models
cs.CL 2024-01 conditional novelty 7.0

Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.
Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation
cs.CL 2026-05 unverdicted novelty 6.0

Local teachability collapse in trajectory suffixes makes uniform dense supervision suboptimal in strong-to-weak OPD; truncating at BIC-style change points on teacher margin improves performance.
Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation
cs.LG 2026-05 unverdicted novelty 6.0

RESD turns failure trajectories into token-level supervision via retrospective reflections and a persistent global playbook, enabling faster improvement than standard self-distillation or GRPO with only one rollout pe...
Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion
cs.AI 2026-05 unverdicted novelty 6.0

MORA breaks the safety-helpfulness ceiling in LLMs by pre-sampling single-reward prompts and rewriting them to incorporate multi-dimensional intents, delivering 5-12.4% gains in sequential alignment and 4.6% overall i...
Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion
cs.AI 2026-05 unverdicted novelty 6.0

MORA breaks the safety-helpfulness trade-off in LLM alignment by pre-sampling single-reward prompts and rewriting them to expand multi-dimensional reward diversity, yielding 5-12.4% single-preference gains in sequenti...
Seirênes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning
cs.AI 2026-05 unverdicted novelty 6.0

Seirênes trains LLMs via adversarial self-play to generate and overcome evolving distractions, producing gains of 7-10 points on math reasoning benchmarks and exposing blind spots in larger models.
G-Zero: Self-Play for Open-Ended Generation from Zero Data
cs.LG 2026-05 unverdicted novelty 6.0

G-Zero uses the Hint-δ intrinsic reward to drive co-evolution between a Proposer and Generator via GRPO and DPO, providing a theoretical suboptimality guarantee for self-improvement from internal dynamics alone.
Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph
cs.LG 2026-05 unverdicted novelty 6.0

GraphDPO generalizes pairwise DPO to a graph-structured Plackett-Luce objective over DAGs induced by rollout rankings, enforcing transitivity with linear complexity and recovering DPO as a special case.
PaT: Planning-after-Trial for Efficient Test-Time Code Generation
cs.CL 2026-05 unverdicted novelty 6.0

PaT defers planning until after failed trials in LLM code generation, enabling heterogeneous cheap-plus-powerful model setups that match large-model performance at roughly 69% lower cost.
Gradient-Gated DPO: Stabilizing Preference Optimization in Language Models
cs.LG 2026-05 conditional novelty 6.0

Gate-DPO attenuates gradients on low-probability rejected responses to reduce probability collapse and improve chosen-response likelihood during preference optimization.
SignDPO: Multi-level Direct Preference Optimisation for Skeleton-based Gloss-free Sign Language Translation
cs.CL 2026-04 unverdicted novelty 6.0

SignDPO uses hierarchical perturbations, self-guided attention-based sampling, and an automated language-level preference generator to align skeleton trajectories with linguistic semantics, outperforming prior gloss-f...
GroupDPO: Memory efficient Group-wise Direct Preference Optimization
cs.CL 2026-04 unverdicted novelty 6.0

GroupDPO decouples group-wise preference optimization during backpropagation to cut peak memory while keeping the same gradients, allowing larger groups and consistent gains over single-pair DPO plus an NLL term on positives.
π-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data
cs.LG 2026-04 unverdicted novelty 6.0

π-Play uses self-generated question construction paths as privileged information in multi-agent self-distillation to convert sparse-reward self-play into a dense-feedback loop, surpassing supervised search agents and ...
Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution
cs.CL 2026-04 unverdicted novelty 6.0

Vocabulary dropout prevents diversity collapse in LLM co-evolution by masking proposer logits, yielding average +4.4 point solver gains on mathematical reasoning benchmarks at 8B scale.
Autogenesis: A Self-Evolving Agent Protocol
cs.AI 2026-04 unverdicted novelty 5.0

Autogenesis Protocol defines resource and evolution layers for LLM agents, enabling a system that shows performance gains on long-horizon planning benchmarks.
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence
cs.AI 2025-07 accept novelty 4.0

The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.


work page internal anchor Pith review Pith/arXiv arXiv
[62]

Depth Creates No Bad Local Minima

Depth Creates No Bad Local Minima , author=. arXiv preprint arXiv:1702.08580 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[63]

International Conference on Learning Representations , year=

Topology and Geometry of Half-Rectified Network Optimization , author=. International Conference on Learning Representations , year=

work page
[64]

How regularization affects the critical points in linear networks

How regularization affects the critical points in linear networks , author=. arXiv preprint arXiv:1709.09625 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[65]

Porcupine Neural Networks: (Almost) All Local Optima are Global

Porcupine Neural Networks:(Almost) All Local Optima are Global , author=. arXiv preprint arXiv:1710.02196 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[66]

"Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions

" Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions , author=. arXiv preprint arXiv:1705.02766 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[67]

Artificial Intelligence and Statistics , pages=

The loss surfaces of multilayer networks , author=. Artificial Intelligence and Statistics , pages=

work page
[68]

Provable learning of Noisy-or Networks

Provable learning of Noisy-or Networks , author=. arXiv preprint arXiv:1612.08795 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[69]

Artificial Intelligence and Statistics , pages=

On the Learnability of Fully-Connected Neural Networks , author=. Artificial Intelligence and Statistics , pages=

work page
[70]

Conference on Learning Theory , pages=

Fast exact matrix completion with finite samples , author=. Conference on Learning Theory , pages=

work page
[71]

International Conference on Machine Learning , pages=

Expressiveness of rectifier networks , author=. International Conference on Machine Learning , pages=

work page
[72]

IEEE Transactions on Information theory , volume=

Universal approximation bounds for superpositions of a sigmoidal function , author=. IEEE Transactions on Information theory , volume=. 1993 , publisher=

work page 1993
[73]

The Landscape of Empirical Risk for Non-convex Losses

The landscape of empirical risk for non-convex losses , author=. arXiv preprint arXiv:1607.06534 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[74]

Advances in neural information processing systems , pages=

Exponentially many local minima for single neurons , author=. Advances in neural information processing systems , pages=

work page
[75]

Learning One-hidden-layer Neural Networks with Landscape Design

Learning One-hidden-layer Neural Networks with Landscape Design , author=. arXiv preprint arXiv:1711.00501 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[76]

, author=

The Isotron Algorithm: High-Dimensional Isotonic Regression. , author=. COLT , year=

work page
[77]

arXiv preprint arXiv:1802.06463 , year=

Local Geometry of One-Hidden-Layer Neural Networks for Logistic Regression , author=. arXiv preprint arXiv:1802.06463 , year=

work page arXiv
[78]

Advances in Neural Information Processing Systems , pages=

Efficient learning of generalized linear and single index models with isotonic regression , author=. Advances in Neural Information Processing Systems , pages=

work page
[79]

Proceedings of the forty-fifth annual ACM symposium on Theory of computing , pages=

Low-rank matrix completion using alternating minimization , author=. Proceedings of the forty-fifth annual ACM symposium on Theory of computing , pages=. 2013 , organization=

work page 2013
[80]

International conference on machine learning , pages=

On the importance of initialization and momentum in deep learning , author=. International conference on machine learning , pages=

work page

Showing first 80 references.