Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models

Daniel Soudry; Jason D. Lee; Mor Shpigel Nacson; Nathan Srebro; Suriya Gunasekar

arxiv: 1905.07325 · v1 · pith:7ENGHTO4new · submitted 2019-05-17 · 📊 stat.ML · cs.LG

Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry

keywords: homogeneous, models, limit, non-homogeneous, solution, deep, descent, gradient

With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization leads to margin-maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models. To this end, we study the limit of loss minimization with a diverging norm constraint (the "constrained path"), relate it to the limit of a "margin path", and characterize the resulting solution. For non-homogeneous ensemble models, whose output is a sum of homogeneous sub-models, we show that this solution discards the shallowest sub-models if they are unnecessary. For homogeneous models, we show convergence to a "lexicographic max-margin solution", and provide conditions under which max-margin solutions are also attained as the limit of unconstrained gradient descent.
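
The last claim of the abstract can be illustrated in the simplest homogeneous case, a linear classifier without bias trained by plain gradient descent on the exponential loss. The sketch below is not taken from the paper: the two-point dataset, the step size, and the function names are assumptions chosen for illustration. On this toy data the best achievable normalized margin is sqrt(2), attained in the direction (1, 1), and the run tracks how the weight norm grows while the normalized margin rises toward that value.

```python
import math

# Hypothetical toy data (not from the paper): two separable points in 2-D.
# With no bias term, the max-margin direction is (1, 1) / sqrt(2), and the
# best normalized margin is sqrt(2), set by the point (1, 1).
X = [(1.0, 1.0), (-10.0, 0.0)]
Y = [1.0, -1.0]


def normalized_margin(w):
    """Smallest value of y * <w, x> over the data, divided by the norm of w."""
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x[0] + w[1] * x[1]) for x, y in zip(X, Y)) / norm


def gradient_descent(steps, lr=0.1):
    """Plain gradient descent on the loss sum_n exp(-y_n * <w, x_n>)."""
    w = [0.0, 0.0]
    history = []  # (norm of w, normalized margin) after each step
    for _ in range(steps):
        g = [0.0, 0.0]
        for x, y in zip(X, Y):
            # Derivative of exp(-y * <w, x>) with respect to w.
            coeff = -y * math.exp(-y * (w[0] * x[0] + w[1] * x[1]))
            g[0] += coeff * x[0]
            g[1] += coeff * x[1]
        w[0] -= lr * g[0]
        w[1] -= lr * g[1]
        history.append((math.hypot(w[0], w[1]), normalized_margin(w)))
    return w, history


w, history = gradient_descent(20000)
print("first step (norm, margin):", history[0])
print("last step  (norm, margin):", history[-1])
print("upper bound on the margin:", math.sqrt(2))
```

The norm keeps growing because the loss has no finite minimizer on separable data, so only the direction of the weights is meaningful in the limit. Convergence of that direction is slow in this setting, so the margin gets close to sqrt(2) without reaching it in a finite run.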

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
cs.LG 2024-01 unverdicted novelty 6.0

SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on be...