Navigating Potholes with Geometry-Aware Sharpness Minimization

Aristide Baratin; Damien Scieur; Ioannis Mitliagkas; Mehrab Hamidi; Razvan Pascanu; Simon Dufort-Labbé

arxiv: 2605.16134 · v1 · pith:SZBKRAHXnew · submitted 2026-05-15 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 19:34 UTC · model grok-4.3

keywords: sharpness-aware minimization · loss landscape geometry · preconditioner · potholes · flat minima · two-timescale optimization · neural network training

The pith

A slow geometry preconditioner combined with sharpness-aware minimization amplifies escape from local loss potholes while keeping wide flat basins stable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LLQR+SAM to improve sharpness-aware minimization by incorporating a learned preconditioner that captures smoothed loss landscape geometry. The preconditioner comes from the LLQR framework and is maintained as a slow exponential moving average, providing a low-resolution view of average curvature. SAM perturbations then run on top of this view at a faster timescale, boosting the escape signal in directions that appear flat on average but prove sharp locally. If the two-timescale separation works as described, optimizers gain a way to navigate loss surfaces with hidden traps without destabilizing good regions. Empirical gains on vision and sequence tasks support treating slow geometry and fast sharpness correction as complementary rather than redundant.

Core claim

The central claim is that the preconditioner amplifies the SAM escape signal in directions that are flat under the average geometry but locally sharp, called potholes, while wide flat basins remain stable. This occurs because the preconditioner is updated sparsely as a slow exponential moving average from LLQR, capturing smoothed geometry, and the SAM perturbation probes curvature at a faster timescale on top of that geometry.

What carries the argument

The two-timescale structure of a slow LLQR-derived preconditioner maintained as an exponential moving average that supplies average loss geometry for a faster SAM perturbation to act on.
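
The slow/fast structure described here can be sketched in a few lines. The example below is only an illustration under loud assumptions: it uses a diagonal EMA of squared gradients as a stand-in for the preconditioner (the actual LLQR preconditioner is not specified on this page), and it assumes the SAM perturbation is the maximizer of the linearized loss inside an ellipsoid defined by that preconditioner. The names `ema_update`, `preconditioned_sam_step`, `train`, and `precond_every` are hypothetical.

```python
import numpy as np

def ema_update(stat, grad, decay=0.999):
    # Slow EMA of squared gradients: a diagonal stand-in, NOT the LLQR preconditioner.
    return decay * stat + (1.0 - decay) * grad ** 2

def preconditioned_sam_step(w, grad_fn, stat, lr=0.1, rho=0.05, eps=1e-8):
    p = 1.0 / (np.sqrt(stat) + eps)           # diagonal preconditioner P
    g = grad_fn(w)
    pg = p * g
    # Assumed perturbation rule: maximize g^T e subject to e^T P^{-1} e <= rho^2,
    # which gives e = rho * P g / sqrt(g^T P g).
    perturb = rho * pg / (np.sqrt(np.dot(g, pg)) + eps)
    g_sam = grad_fn(w + perturb)              # gradient at the perturbed point
    return w - lr * p * g_sam, perturb        # preconditioned descent step

def train(w, grad_fn, steps=100, precond_every=10, **kw):
    stat = np.ones_like(w)
    for t in range(steps):
        if t % precond_every == 0:            # sparse, slow geometry update
            stat = ema_update(stat, grad_fn(w))
        w, _ = preconditioned_sam_step(w, grad_fn, stat, **kw)  # fast SAM step
    return w, stat
```

The point of the sketch is the separation of cadences: the geometry statistic changes rarely and slowly, while the perturbation is recomputed at every step inside whatever geometry is currently stored.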

If this is right

The method produces consistent gains over both SAM and LLQR alone on standard vision and sequence modeling benchmarks.
The preconditioner selectively boosts escape from directions that look flat on average but are sharp at finer scale.
Wide flat basins stay stable under the combined updates rather than being destabilized.
Slow geometry learning and fast sharpness probing function as complementary mechanisms rather than alternatives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same slow-fast separation could be tested with other adaptive or second-order optimizers to see if geometry awareness improves their sharpness handling.
If the low-resolution geometry picture holds, one could experiment with even slower update rates or different smoothing windows on very large models.
The pothole concept suggests checking whether similar local sharpness within flat regions appears in other optimization problems outside neural networks.

Load-bearing premise

The slow exponential moving average of the LLQR-derived preconditioner supplies a sufficiently accurate low-resolution picture of loss geometry that reliably distinguishes locally sharp potholes from stable flat basins.

What would settle it

A test on a synthetic loss surface containing explicit potholes and wide flat regions where LLQR+SAM shows no improvement in escaping the potholes relative to plain SAM would disprove the amplification effect.
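
A minimal version of such a test can be sketched on a one-dimensional quadratic. This is an assumed toy, not the paper's experiment: the pothole is modeled as a local curvature `h_local` larger than the slowly averaged curvature `h_avg`, the preconditioner is frozen at `1 / h_avg`, and "escape" is read off as the iterate growing instead of settling. The function `run` and all parameter values are hypothetical.

```python
import numpy as np

def run(h_local, h_avg, lr=0.3, rho=0.01, steps=50, precondition=True):
    """Iterate 1-D SAM on L(w) = 0.5 * h_local * w**2 and return |w| at the end.

    The preconditioner is fixed at p = 1 / h_avg (assumed slow geometry);
    with precondition=False the update is plain Euclidean SAM.
    """
    p = 1.0 / h_avg if precondition else 1.0
    w = 0.0                                    # start at the bottom of the well
    for _ in range(steps):
        g = h_local * w
        direction = np.sign(g) if g != 0 else 1.0
        perturb = rho * np.sqrt(p) * direction  # SAM probe in the p-geometry
        g_sam = h_local * (w + perturb)
        w = w - lr * p * g_sam
    return abs(w)

# Pothole: flat on average (h_avg = 0.1) but locally sharp (h_local = 1.0).
pothole_precond = run(1.0, 0.1, precondition=True)
pothole_plain = run(1.0, 0.1, precondition=False)
# Wide basin: local curvature matches the average curvature.
basin_precond = run(0.1, 0.1, precondition=True)
basin_plain = run(0.1, 0.1, precondition=False)
```

In this toy the effective step multiplier is `lr * h_local / h_avg`, so only the preconditioned pothole case exceeds the stability threshold of 2 and is ejected; the other three cases stay within a small oscillation around the minimum. A result in which the preconditioned pothole run behaved no differently from plain SAM would be the kind of evidence the paragraph above describes.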

Figures

Figures reproduced from arXiv: 2605.16134 by Aristide Baratin, Damien Scieur, Ioannis Mitliagkas, Mehrab Hamidi, Razvan Pascanu, Simon Dufort-Labbé.

**Figure 1.** Interaction between FSAM and LLQR on ResNet-50/ImageNet. Left: Top-1 error for SGDM and FSAM, a SAM variant, with and without LLQR. Although both SAM-style methods and LLQR can be interpreted as curvature-correcting mechanisms, their combination yields gains over either component alone, suggesting complementary effects. Right: Test loss versus elapsed training time. Despite the usual concern that second-or…

**Figure 2.** Toy sharp-well escape mechanism. The surface has a flat basin at the origin and a sharp annular basin near radius 5. All four optimizers use the same learning rate, and the SAM variants use the same radius, with starts chosen in the sharp basin, but not at minima. The non-SAM variants remain trapped, while the SAM variants leave the sharp well; the LLQR+SAM trajectory reaches the flat region with faster l…

**Figure 3.** Gradient-noise escape from the sharp minimum. All variants start at the bottom of the sharp well and receive a shared deterministic Gaussian perturbation schedule in the update gradient. The non-SAM variants remain near the sharp well, while SAM variants are ejected. At variance 10^-9, both SAM variants reach the flat basin, but LLQR+SAM has substantially shorter path length than Euclidean SAM, consistent…

**Figure 4.** IWSLT14 German-to-English convergence. Validation BLEU and token error curves for the fairseq Transformer benchmark. Pairing LLQR with SAM accelerates optimization while offering the modest best-performance gains reported in …

**Figure 5.** LLQR cadence sweep for ViT-B/16 and ViT-L/16. Each panel shows compile-adjusted …

**Figure 6.** Across ViT scales, preconditioner update time grows nearly linearly for both LLQR and LLQR+SAM, far from the quadratic scaling typically associated with second-order methods. We introduced LLQR+SAM, which pairs a slowly-updated LLQR preconditioner with a SAM perturbation evaluated and transported in the induced geometry. On a quadratic two-scale model the dynamics is closed-form: SAM prevents the iterate …

Original abstract

Sharpness-aware minimization (SAM) encourages flat minima by perturbing parameters along directions of high loss curvature, but treats all parameter directions uniformly, ignoring the underlying loss geometry. We introduce LLQR+SAM, which combines SAM with a learned preconditioner obtained from the recently proposed LLQR framework, a second-order method that recasts steepest descent as a layerwise linear-quadratic regulator problem. The preconditioner is updated sparsely and maintained as a slow exponential moving average, so it captures a smoothed, low-resolution picture of the loss landscape geometry. The SAM perturbation then operates on top of this learned geometry, probing curvature at a faster timescale. We show that this two-timescale structure is not merely a computational convenience: theoretically, the preconditioner amplifies the SAM escape signal in directions that are flat under the average geometry but locally sharp (potholes). Wide, flat basins, by contrast, remain stable. Empirically, LLQR+SAM gives consistent gains over both SAM and LLQR alone across standard vision and sequence modeling benchmarks, supporting the view that slow learned geometry and fast sharpness correction are genuinely complementary.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LLQR+SAM adds a slow LLQR EMA preconditioner to guide SAM perturbations toward local sharpness in average-flat directions, with empirical gains but weak support for the timescale separation.

The main takeaway is that this paper adds a slow LLQR preconditioner to SAM so the sharpness perturbations can respect the average loss geometry. The preconditioner runs on a slow exponential moving average and is meant to highlight directions that look flat overall but have local sharpness, which they call potholes. What is new is the specific two-timescale setup where the learned geometry acts as a prior for the SAM step. The theory claims this amplifies the escape signal in those pothole directions while wide flat basins remain stable. On the practical side, they show consistent gains over SAM and LLQR alone across vision and sequence benchmarks.

The work does well at framing the combination as complementary rather than just another tweak. Using LLQR's layerwise linear-quadratic regulator view to get a smoothed geometry picture is a reasonable way to inject second-order information without full Hessian costs.

The soft spots are in the theory. The analysis assumes the slow EMA stays a stable low-resolution view even though it is updated from gradients that include the fast SAM perturbations. Without a bound on the feedback error or a fixed-point argument, the separation could break down and the preconditioner might incorporate local sharpness information. The abstract does not provide the derivation steps or explicit assumptions, which makes the amplification result hard to assess from the given text.

This paper is for researchers working on optimizer improvements in deep learning, particularly those interested in sharpness-aware methods or preconditioning. A reader who follows SAM literature would get value from seeing how geometry awareness can be added with modest changes and some empirical upside. It deserves a serious referee because the idea is grounded in recent work and the experiments are on standard tasks. I recommend sending it for peer review, but ask the authors to add a perturbation analysis for the timescale separation.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes LLQR+SAM, which augments sharpness-aware minimization (SAM) with a learned preconditioner derived from the LLQR framework. The preconditioner is updated sparsely via a slow exponential moving average to capture a smoothed view of loss geometry at a slower timescale, while SAM operates at a faster timescale. The central theoretical claim is that this two-timescale structure amplifies the SAM escape signal specifically in directions that are flat under the average geometry but locally sharp (potholes), whereas wide flat basins remain stable. Empirical results are reported to show consistent gains over SAM and LLQR alone on vision and sequence modeling benchmarks.

Significance. If the two-timescale separation can be rigorously justified, the work provides a concrete mechanism for making SAM geometry-aware without uniform treatment of directions, which could improve the discovery of flat minima in deep networks. The empirical consistency across benchmarks, if reproducible with full experimental details, would indicate that slow learned geometry and fast sharpness correction are complementary rather than redundant.

major comments (2)

Section 3 (theoretical analysis): the derivation treats the LLQR-derived preconditioner as fixed over the fast SAM timescale and claims amplification of escape signals in pothole directions, but provides no perturbation analysis, error bound, or fixed-point argument quantifying how much the slow EMA incorporates information from the fast SAM perturbations. This assumption is load-bearing for the central claim that the preconditioner reliably distinguishes locally sharp potholes from stable wide basins.
Abstract and Section 3: the theoretical amplification result is stated without explicit assumptions, derivation steps, or the precise definition of the two-timescale separation; it is therefore unclear whether the result introduces independent grounding or reduces to quantities already present in the LLQR framework.

minor comments (2)

Provide the exact EMA decay rate, update frequency of the preconditioner, and full experimental protocol (including hyperparameter ranges and number of runs) so that the reported gains can be reproduced.
Define all LLQR-specific notation (e.g., the form of the preconditioner) at first use in the theoretical section to improve readability for readers unfamiliar with the base framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments on the theoretical analysis are well-taken and point to opportunities for strengthening the presentation of the two-timescale argument. We address each major comment below and commit to revisions that improve rigor and clarity without altering the core claims.

Referee: Section 3 (theoretical analysis): the derivation treats the LLQR-derived preconditioner as fixed over the fast SAM timescale and claims amplification of escape signals in pothole directions, but provides no perturbation analysis, error bound, or fixed-point argument quantifying how much the slow EMA incorporates information from the fast SAM perturbations. This assumption is load-bearing for the central claim that the preconditioner reliably distinguishes locally sharp potholes from stable wide basins.

Authors: We agree that the current derivation would benefit from an explicit perturbation analysis to justify treating the preconditioner as approximately fixed. In the revised manuscript we will add a first-order perturbation argument showing that the contribution of fast SAM steps to the slow EMA update is bounded by the timescale separation ratio (specifically O(α / η), where α is the EMA decay and η the perturbation step size). This bound confirms that the preconditioner continues to reflect the averaged geometry at leading order, thereby preserving the differential amplification between pothole directions and wide basins. The added analysis will be placed in Section 3 with supporting calculations in an appendix.
Revision: yes

Referee: Abstract and Section 3: the theoretical amplification result is stated without explicit assumptions, derivation steps, or the precise definition of the two-timescale separation; it is therefore unclear whether the result introduces independent grounding or reduces to quantities already present in the LLQR framework.

Authors: We acknowledge that the assumptions and derivation steps can be stated more explicitly. The two-timescale separation is defined as the preconditioner update frequency being at least an order of magnitude slower than the SAM perturbation frequency, with the EMA decay factor satisfying α ≪ 1. Under this separation the amplification result follows from a direct expansion of the preconditioned sharpness term, which isolates an extra positive contribution precisely in directions that are flat on average but locally sharp. This interaction term is not present in the original LLQR analysis and therefore supplies independent grounding. In the revision we will list all assumptions at the beginning of Section 3, expand the derivation with intermediate steps, and move supporting lemmas to an appendix.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper's central theoretical claim—that the slow EMA preconditioner from the LLQR framework amplifies the SAM escape signal specifically in directions flat on average but locally sharp—is presented as a new derivation in Section 3 that exploits the two-timescale separation. This does not reduce by construction to a self-definition, a fitted parameter, or an unverified self-citation chain; the LLQR framework is invoked as an external base method whose properties are used to ground the new amplification result, while the paper supplies its own analysis of the interaction with SAM perturbations. Empirical gains on vision and sequence benchmarks supply independent external support. No load-bearing step equates the claimed prediction to its inputs by definition or forces the outcome through renaming or ansatz smuggling.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the LLQR framework as a black-box source of geometry and on the modeling choice that a slow EMA supplies a useful average geometry distinct from instantaneous curvature.

free parameters (1)

EMA decay rate for preconditioner
Controls how slowly the low-resolution geometry picture is updated; its value is chosen to separate timescales.
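
To make the role of this parameter concrete, the sketch below shows how an EMA decay translates into an averaging horizon. It relies only on the standard EMA recursion; the helper names `ema_weights` and `effective_window` are hypothetical, and the mapping to the rebuttal's notation (where α ≪ 1 would plausibly correspond to `1 - decay` here) is an assumption, since the paper's exact convention is not given on this page.

```python
def ema_weights(decay, n):
    """Weight an EMA places on the sample k steps in the past, for k = 0..n-1."""
    return [(1.0 - decay) * decay ** k for k in range(n)]

def effective_window(decay):
    """Rule-of-thumb averaging horizon of an EMA: 1 / (1 - decay) updates."""
    return 1.0 / (1.0 - decay)

# A decay of 0.99 averages over roughly 100 preconditioner updates. If the
# preconditioner is refreshed only every K optimizer steps (K is hypothetical),
# the horizon measured in optimizer steps is about K times longer.
horizon_updates = effective_window(0.99)
```

A larger decay lengthens the horizon and makes the stored geometry coarser, which is the knob the ledger entry above refers to.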

axioms (1)

Domain assumption: the loss landscape admits a meaningful separation between average geometry (captured by slow EMA) and local curvature (probed by fast SAM).
Invoked to justify why the preconditioner amplifies escape only in pothole directions.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Passage: "the preconditioner amplifies the SAM escape signal in directions that are flat under the average geometry but locally sharp (potholes). Wide, flat basins, by contrast, remain stable."

IndisputableMonolith/Foundation/ArrowOfTime.lean · forward_accumulates · echoes

Passage: "two-timescale structure... slow exponential moving average... fast timescale"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

