Recognition: 2 theorem links
Lean Theorem · Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Pith reviewed 2026-05-14 22:55 UTC · model grok-4.3
The pith
Self-play fine-tuning turns a weak supervised LLM into a strong one by iteratively contrasting its own generations against fixed human data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SPIN refines an LLM policy through repeated self-play in which the model produces its own training examples from previous checkpoints and learns to distinguish them from the fixed set of human demonstrations; the resulting objective has a unique global optimum achieved exclusively when the policy aligns with the target data distribution.
What carries the argument
The self-play mechanism in which responses generated by the model at iteration t are contrasted against the unchanging human-annotated demonstrations to update the policy at iteration t+1.
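The contrast can be made concrete. Below is a minimal sketch of a SPIN-style pairwise loss: a logistic loss on the regularized log-ratio margin between the human response and the self-generated one, each scored against the frozen previous-iteration policy. The function name and the default `lam` are illustrative, not the paper's code.

```python
import math

def spin_pair_loss(logp_human_new, logp_human_old,
                   logp_synth_new, logp_synth_old, lam=0.1):
    """SPIN-style logistic loss for one (human, self-generated) pair.

    logp_*_new: log-prob of the response under the policy being trained.
    logp_*_old: log-prob under the frozen previous-iteration policy.
    lam: regularization weight (hypothetical default; the paper tunes it).
    """
    margin = lam * ((logp_human_new - logp_human_old)
                    - (logp_synth_new - logp_synth_old))
    # log(1 + exp(-margin)), written stably for large |margin|
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)
```

Driving this loss down raises the model's relative likelihood of human responses and lowers that of its own previous generations, which is exactly the discrimination step described above.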
If this is right
- Performance rises on the HuggingFace Open LLM Leaderboard, MT-Bench, and Big-Bench tasks without any new human annotations.
- SPIN surpasses direct preference optimization even when the latter receives supplementary GPT-4 preference data.
- The initial supervised fine-tuning dataset alone suffices to reach higher capability levels through iterative refinement.
- The training objective converges to the target distribution only at its global optimum, providing a clear stopping criterion.
Where Pith is reading between the lines
- The approach implies that limited human demonstration data can be amplified through internal generation loops rather than external collection.
- Self-play of this form may extend to other sequence-generation tasks where synthetic examples are inexpensive to produce.
- If the contrast remains informative across many rounds, the method could reduce dependence on large-scale preference labeling pipelines.
Load-bearing premise
Responses generated by earlier model versions supply clean contrastive signals that steadily move the policy toward the human data distribution without accumulating biases or shifts that would block further gains.
What would settle it
If repeated self-play iterations produce no measurable reduction in divergence between the model's output distribution and the human-annotated distribution, or if benchmark scores plateau or degrade, the alignment claim would be refuted.
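One cheap proxy for that divergence check, assuming model and human responses can be bucketed into discrete outcomes, is the empirical total-variation distance between the two samples tracked across iterations; a plateau well above zero would be the stagnation described. A sketch (names ours, not from the paper):

```python
from collections import Counter

def empirical_tv(sample_a, sample_b):
    """Empirical total-variation distance between two samples of
    discrete outcomes; 0 means indistinguishable empirical
    distributions, 1 means disjoint support."""
    ca, cb = Counter(sample_a), Counter(sample_b)
    na, nb = len(sample_a), len(sample_b)
    support = set(ca) | set(cb)
    return 0.5 * sum(abs(ca[x] / na - cb[x] / nb) for x in support)
```

In practice the bucketing would come from some fixed featurization of responses; the point is only that the refutation criterion is directly measurable.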
Original abstract
Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data. We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN), which starts from a supervised fine-tuned model. At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. More specifically, the LLM generates its own training data from its previous iterations, refining its policy by discerning these self-generated responses from those obtained from human-annotated data. Our method progressively elevates the LLM from a nascent model to a formidable one, unlocking the full potential of human-annotated demonstration data for SFT. Theoretically, we prove that the global optimum to the training objective function of our method is achieved only when the LLM policy aligns with the target data distribution. Empirically, we evaluate our method on several benchmark datasets including the HuggingFace Open LLM Leaderboard, MT-Bench, and datasets from Big-Bench. Our results show that SPIN can significantly improve the LLM's performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents. Codes are available at https://github.com/uclaml/SPIN.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Self-Play fIne-tuNing (SPIN), which starts from an SFT model and iteratively generates responses from previous policy iterates to create contrastive training pairs against the fixed human-annotated data; the model is then fine-tuned to prefer the human data. It proves that the resulting objective has a unique global optimum precisely when the policy matches the target data distribution, and reports empirical gains on the HuggingFace Open LLM Leaderboard, MT-Bench, and Big-Bench tasks that exceed those of DPO trained with additional GPT-4 preference data.
Significance. If the empirical improvements are robust to compute-matched controls and the iterative dynamics reliably reach the claimed optimum, the result would be significant: it offers a route to strengthen LLMs using only existing SFT demonstrations, without further human or GPT-4 annotations. The theoretical statement is a standard consequence of imitation-learning objectives, but the self-play mechanism itself is the novel element whose practical reliability remains to be fully established.
Major comments (1)
- [Abstract / Theoretical Analysis] The proof establishes that the static objective attains its global minimum only at exact alignment with the target distribution, yet no analysis (contraction mapping, Lyapunov function, or fixed-point convergence argument) is supplied for the sequence of policies generated by the self-play iteration. Early weak generations could therefore induce a persistent distribution shift or a suboptimal fixed point from which gradient updates cannot escape, so the static guarantee does not automatically transfer to the training trajectory.
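To see why the static guarantee and the iterative dynamics are distinct questions, the dynamics can be studied in a tabular caricature: positives drawn from a fixed target distribution, negatives from the frozen previous policy, logistic loss on log-ratio margins. This toy (ours, not the paper's algorithm) converges toward the target, which is the behavior the comment asks the authors to establish in general:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def tv(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def spin_toy(p_star, rounds=20, inner_steps=10, lr=1.0):
    """Tabular caricature of the self-play iteration: positives come
    from the fixed target p_star, negatives from the frozen previous
    policy; returns TV distance to p_star after each round."""
    z = [0.0] * len(p_star)               # start from uniform logits
    history = []
    for _ in range(rounds):
        z_old = list(z)                   # frozen opponent this round
        q_old = softmax(z_old)
        for _ in range(inner_steps):
            grad = [0.0] * len(z)
            for y, py in enumerate(p_star):          # positives ~ p*
                for yp, qyp in enumerate(q_old):     # negatives ~ q_old
                    margin = (z[y] - z_old[y]) - (z[yp] - z_old[yp])
                    # weight = -dloss/dmargin, averaged over the pair prob
                    w = py * qyp / (1.0 + math.exp(margin))
                    grad[y] -= w
                    grad[yp] += w
            z = [zi - lr * gi for zi, gi in zip(z, grad)]
        history.append(tv(softmax(z), p_star))
    return history
```

In this idealized case, fully optimizing each round's inner logistic problem would set the new logits to the old ones plus log(p*/q_old), recovering the target in one step; whether anything like this holds for finite samples and neural parameterizations is precisely the open question raised above.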
Minor comments (2)
- [Experiments] Empirical section: the comparison with DPO augmented by GPT-4 data should explicitly state total training tokens, learning-rate schedules, and whether the extra preference data is matched in volume to the self-generated data used by SPIN.
- [Method] The manuscript would benefit from a short discussion of how the self-generated negative examples are sampled (temperature, top-p, number of samples per prompt) and whether any filtering is applied to avoid degenerate outputs.
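For reference, a standard nucleus (top-p) sampling sketch of the kind such a discussion would pin down; the paper's actual decoding settings are not stated here, so everything below is illustrative:

```python
import random

def sample_top_p(probs, p=0.9, rng=random):
    """Nucleus (top-p) sampling over a discrete distribution given as
    {token: probability}: keep the smallest set of highest-probability
    tokens whose mass reaches p, renormalize, then sample."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for tok, pr in items:
        kept.append((tok, pr))
        mass += pr
        if mass >= p:
            break
    r = rng.random() * mass               # sample within the kept mass
    acc = 0.0
    for tok, pr in kept:
        acc += pr
        if r <= acc:
            return tok
    return kept[-1][0]
```

Temperature would enter upstream by rescaling logits before the probabilities are formed; reporting both settings, plus any degenerate-output filtering, would make the negative-sampling procedure reproducible.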
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the single major comment below.
Point-by-point responses
- Referee: [Abstract / Theoretical Analysis] The proof establishes that the static objective attains its global minimum only at exact alignment with the target distribution, yet no analysis (contraction mapping, Lyapunov function, or fixed-point convergence argument) is supplied for the sequence of policies generated by the self-play iteration. Early weak generations could therefore induce a persistent distribution shift or a suboptimal fixed point from which gradient updates cannot escape, so the static guarantee does not automatically transfer to the training trajectory.
  Authors: We acknowledge that the theoretical analysis establishes the global optimum of the static objective but does not include a formal convergence argument (e.g., contraction mapping or Lyapunov function) for the sequence of policies produced by the iterative self-play procedure. This is a valid observation. The self-play iteration is motivated by the fact that each step contrasts the current policy's outputs against fixed human-annotated data, which in principle reduces distribution shift over time; however, we do not claim or prove that the iteration is guaranteed to reach the global optimum from arbitrary initializations. In the revised manuscript we will add a dedicated paragraph in the theoretical section and an accompanying figure in the experiments section that plots benchmark performance versus iteration number. These additions will empirically document monotonic improvement and the absence of observable stagnation on the evaluated tasks, thereby clarifying the practical behavior of the iteration while preserving the paper's primary contribution of the self-play objective and its static optimality result.
  Revision: partial
Circularity Check
No circularity in derivation chain
Full rationale
The paper proves that the global optimum of its training objective occurs precisely when the policy matches the target human data distribution. This is a direct consequence of the standard form of the preference-based loss (human responses preferred over self-generated ones), which has its unique minimum at the target distribution by construction of the objective itself rather than by any self-referential fit or redefinition. The iterative self-play generates negatives from prior policy iterates, but the static objective's minimum is independently fixed by the human-annotated data and does not reduce to the outputs of the iteration. No load-bearing self-citations, uniqueness theorems imported from prior author work, or ansatzes are invoked to establish the claim. The derivation remains self-contained; the lack of a separate convergence argument for the sequence of iterates is a gap in dynamics analysis, not a circular reduction in the stated theorem.
Axiom & Free-Parameter Ledger
Axioms (1)
- [standard math] A global optimum of the training objective exists and is achieved precisely when the learned policy matches the target human data distribution.
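In standard notation, this axiom corresponds to the following statement of the objective and its optimum (our reconstruction of the paper's claim; symbols are ours, with the logistic loss and a regularization weight as unstated assumptions):

```latex
% Our reconstruction of the SPIN objective and its static optimality claim.
% q: prompt distribution; p_data: human data distribution;
% p_{\theta_t}: frozen previous iterate; \ell(t) = \log(1 + e^{-t}); \lambda > 0.
L_{\mathrm{SPIN}}(\theta)
  = \mathbb{E}_{x \sim q,\; y \sim p_{\mathrm{data}}(\cdot \mid x),\; y' \sim p_{\theta_t}(\cdot \mid x)}
    \left[ \ell\!\left( \lambda \log \frac{p_\theta(y \mid x)}{p_{\theta_t}(y \mid x)}
                      - \lambda \log \frac{p_\theta(y' \mid x)}{p_{\theta_t}(y' \mid x)} \right) \right],
\qquad
L_{\mathrm{SPIN}}(\theta)\ \text{is globally minimized}
  \iff p_\theta(\cdot \mid x) = p_{\mathrm{data}}(\cdot \mid x).
```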
Lean theorems connected to this paper
- IndisputableMonolith.Foundation.LawOfExistence.defect_zero_iff_one · tagged: contradicts
  CONTRADICTS: the theorem conflicts with this paper passage, or marks a claim that would need revision before publication.
  Tagged passage: "Theoretically, we prove that the global optimum to the training objective function of our method is achieved only when the LLM policy aligns with the target data distribution."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 21 Pith papers
- Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability. The paper establishes the first tilde O(epsilon^{-1}) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general functi...
- RewardHarness: Self-Evolving Agentic Post-Training. RewardHarness self-evolves a tool-and-skill library from 100 preference examples to reach 47.4% accuracy on image-edit evaluation, beating GPT-5, and yields stronger RL-tuned models.
- IRIS: Interpolative Rényi Iterative Self-play for Large Language Model Fine-Tuning. IRIS unifies self-play fine-tuning under an interpolative Rényi objective with adaptive alpha scheduling and reports better benchmark scores than baselines while surpassing full supervised fine-tuning with only 13% of...
- Structural Verification for Reliable EDA Code Generation without Tool-in-the-Loop Debugging. Structural dependency graphs and staged pre-execution verification raise LLM-based EDA code pass rates to 82.5% (single-step) and 70-84% (multi-step) while halving tool calls by catching dependency violations before runtime.
- KTO: Model Alignment as Prospect Theoretic Optimization. KTO aligns LLMs by directly maximizing prospect-theoretic utility on binary signals and matches or exceeds preference-based methods like DPO from 1B to 30B parameters.
- Self-Rewarding Language Models. Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.
- Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation. Local teachability collapse in trajectory suffixes makes uniform dense supervision suboptimal in strong-to-weak OPD; truncating at BIC-style change points on teacher margin improves performance.
- Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation. RESD turns failure trajectories into token-level supervision via retrospective reflections and a persistent global playbook, enabling faster improvement than standard self-distillation or GRPO with only one rollout pe...
- Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion. MORA breaks the safety-helpfulness ceiling in LLMs by pre-sampling single-reward prompts and rewriting them to incorporate multi-dimensional intents, delivering 5-12.4% gains in sequential alignment and 4.6% overall i...
- Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion. MORA breaks the safety-helpfulness trade-off in LLM alignment by pre-sampling single-reward prompts and rewriting them to expand multi-dimensional reward diversity, yielding 5-12.4% single-preference gains in sequenti...
- Seirênes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning. Seirênes trains LLMs via adversarial self-play to generate and overcome evolving distractions, producing gains of 7-10 points on math reasoning benchmarks and exposing blind spots in larger models.
- G-Zero: Self-Play for Open-Ended Generation from Zero Data. G-Zero uses the Hint-δ intrinsic reward to drive co-evolution between a Proposer and Generator via GRPO and DPO, providing a theoretical suboptimality guarantee for self-improvement from internal dynamics alone.
- Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph. GraphDPO generalizes pairwise DPO to a graph-structured Plackett-Luce objective over DAGs induced by rollout rankings, enforcing transitivity with linear complexity and recovering DPO as a special case.
- PaT: Planning-after-Trial for Efficient Test-Time Code Generation. PaT defers planning until after failed trials in LLM code generation, enabling heterogeneous cheap-plus-powerful model setups that match large-model performance at roughly 69% lower cost.
- Gradient-Gated DPO: Stabilizing Preference Optimization in Language Models. Gate-DPO attenuates gradients on low-probability rejected responses to reduce probability collapse and improve chosen-response likelihood during preference optimization.
- SignDPO: Multi-level Direct Preference Optimisation for Skeleton-based Gloss-free Sign Language Translation. SignDPO uses hierarchical perturbations, self-guided attention-based sampling, and an automated language-level preference generator to align skeleton trajectories with linguistic semantics, outperforming prior gloss-f...
- GroupDPO: Memory efficient Group-wise Direct Preference Optimization. GroupDPO decouples group-wise preference optimization during backpropagation to cut peak memory while keeping the same gradients, allowing larger groups and consistent gains over single-pair DPO plus an NLL term on positives.
- π-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data. π-Play uses self-generated question construction paths as privileged information in multi-agent self-distillation to convert sparse-reward self-play into a dense-feedback loop, surpassing supervised search agents and ...
- Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution. Vocabulary dropout prevents diversity collapse in LLM co-evolution by masking proposer logits, yielding average +4.4 point solver gains on mathematical reasoning benchmarks at 8B scale.
- Autogenesis: A Self-Evolving Agent Protocol. Autogenesis Protocol defines resource and evolution layers for LLM agents, enabling a system that shows performance gains on long-horizon planning benchmarks.
- A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence. The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.