Visualizing the Loss Landscape of Neural Nets

https://arxiv · 2017 · cs.LG · arXiv 1712.09913

10 Pith papers cite this work. Polarity classification is still indexing.


abstract

Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well-known that certain network architecture designs (e.g., skip connections) produce loss functions that train easier, and well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effects on the underlying loss landscape, are not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple "filter normalization" method that helps us visualize loss function curvature and make meaningful side-by-side comparisons between loss functions. Then, using a variety of visualizations, we explore how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.
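
The abstract names "filter normalization" without defining it. As a rough sketch, assume it means drawing a random Gaussian direction and rescaling it filter by filter, so that each filter of the direction has the same norm as the corresponding filter of the trained weights. In NumPy this might look as follows. The function names, the toy weight shape, and the quadratic stand-in loss are illustrative assumptions and do not come from this page.

```python
import numpy as np

def filter_normalized_direction(weights, rng, eps=1e-10):
    """Random direction whose filters match the norms of the filters in `weights`.

    Assumes axis 0 of `weights` indexes filters (a hypothetical layout).
    """
    direction = rng.standard_normal(weights.shape)
    n_filters = weights.shape[0]
    w_norms = np.linalg.norm(weights.reshape(n_filters, -1), axis=1)
    d_norms = np.linalg.norm(direction.reshape(n_filters, -1), axis=1)
    scale = w_norms / (d_norms + eps)
    # Broadcast one scale factor per filter over the remaining axes.
    return direction * scale.reshape((n_filters,) + (1,) * (weights.ndim - 1))

def toy_loss(w):
    # Stand-in loss for illustration only; a real plot would evaluate the network.
    return float(np.sum(w ** 2))

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 3, 3, 3))  # 4 filters, each of shape 3x3x3
direction = filter_normalized_direction(weights, rng)

# 1D slice of the loss along the normalized direction: f(alpha) = L(theta + alpha * d).
alphas = np.linspace(-1.0, 1.0, 5)
slice_values = [toy_loss(weights + a * direction) for a in alphas]
```

Because every filter of the direction is rescaled to the size of the matching weight filter, slices taken from differently scaled networks are plotted in comparable units, which is what makes side-by-side comparison plausible.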

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Editing Models with Task Arithmetic

cs.LG · 2022-12-08 · accept · novelty 8.0

Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.

Ravines in quantum cost landscapes: opportunities for improved VQA predictions

quant-ph · 2026-07-01 · unverdicted · novelty 6.0

NEB-adapted ravine ensembles for QNNs classifying concentratable entanglement outperform naive methods when local-prediction variability is high and reduce costs, with ravines persisting under depth and qubit scaling.

Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and bounded heterogeneity.

Sharpness-Aware Minimization for Efficiently Improving Generalization

cs.LG · 2020-10-03 · conditional · novelty 6.0

SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise tasks.

Multiscale reconstruction of protein conformations from cryo-EM images

eess.IV · 2026-06-16 · unverdicted · novelty 5.0

A multiscale optimization method using explicit protein backbone geometry reconstructs atomic models from cryo-EM data, showing improved RMSD and TM scores on three simulated datasets.

A Unified Stability Analysis of SAM vs SGD: Role of Data Coherence and Emergence of Simplicity Bias

cs.LG · 2025-11-21 · unverdicted · novelty 5.0

A linear stability analysis introduces data coherence to explain why SGD and SAM prefer stable and simple minima in two-layer ReLU networks.

Improving robustness of jet tagging algorithms with adversarial training: exploring the loss surface

hep-ex · 2023-03-25 · unverdicted · novelty 5.0

Adversarial training enhances robustness of jet tagging classifiers while preserving performance, with loss surface geometry providing insights into correlations and vulnerability.

Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD

cs.LG · 2019-06-26 · unverdicted · novelty 5.0

GNC convolves stochastic gradient noise to smooth sharp minima in large-batch SGD, outperforming isotropic noise for better generalization in distributed deep learning.

Conditional Wasserstein GAN for Simulating Neutrino Event Summaries using Incident Energy of Electron Neutrinos

hep-ph · 2026-03-23 · unverdicted · novelty 4.0

A conditional Wasserstein GAN generates complete kinematic event summaries for IBD-CC, NC, and NuEElastic electron neutrino interactions that match GENIE distributions in 1D marginals and correlations.

Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics

cs.LG · 2022-12-18 · unverdicted · novelty 2.0

A comprehensive review of deep learning techniques for computational mechanics, including LSTM for constitutive modeling, PINNs for PDE solving, optimizers, and kernel methods.

citing papers explorer

Showing 10 of 10 citing papers.

Editing Models with Task Arithmetic · cs.LG · 2022-12-08 · accept · none · ref 57
Ravines in quantum cost landscapes: opportunities for improved VQA predictions · quant-ph · 2026-07-01 · unverdicted · none · ref 79 · internal anchor
Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity · cs.LG · 2026-05-13 · unverdicted · none · ref 20 · internal anchor
Sharpness-Aware Minimization for Efficiently Improving Generalization · cs.LG · 2020-10-03 · conditional · none · ref 30 · internal anchor
Multiscale reconstruction of protein conformations from cryo-EM images · eess.IV · 2026-06-16 · unverdicted · none · ref 59 · internal anchor
A Unified Stability Analysis of SAM vs SGD: Role of Data Coherence and Emergence of Simplicity Bias · cs.LG · 2025-11-21 · unverdicted · none · ref 3 · internal anchor
Improving robustness of jet tagging algorithms with adversarial training: exploring the loss surface · hep-ex · 2023-03-25 · unverdicted · none · ref 18 · internal anchor
Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD · cs.LG · 2019-06-26 · unverdicted · none · ref 16 · internal anchor
Conditional Wasserstein GAN for Simulating Neutrino Event Summaries using Incident Energy of Electron Neutrinos · hep-ph · 2026-03-23 · unverdicted · none · ref 46 · internal anchor
Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics · cs.LG · 2022-12-18 · unverdicted · none · ref 134 · internal anchor
