Integrable Elasticity via Neural Demand Potentials

Carlos Heredia; Daniel Roncel

arxiv: 2605.22820 · v1 · pith:JTVQC7L4new · submitted 2026-05-21 · 💻 cs.LG

Pith reviewed 2026-05-22 06:28 UTC · model grok-4.3

classification 💻 cs.LG

keywords demand estimation · neural networks · price elasticities · retail demand · cross-price effects · integrable models · multiproduct pricing · scanner data

The pith

Neural network learns smooth log-demand to derive exact, stable elasticities

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes modeling multiproduct retail demand by training a neural network to output log-demand as a smooth function of log-prices, where the function is also conditioned on context. Elasticities are then obtained by taking derivatives of this single learned surface, which enforces consistency across own-price and cross-price responses. On the Dominick's beer dataset the resulting model produces better predictions on held-out observations than a standard directed log-log regression and returns more stable estimates for cross-price effects that are hard to identify from data alone. A reader would care because retailers rely on these numbers for pricing decisions that account for substitution between products.

Core claim

The Integrable Context-Dependent Demand Network learns log-demand as a smooth, context-conditioned function of log-prices, allowing elasticities to be derived exactly from the learned demand surface. On the Dominick's beer dataset, ICDN improves out-of-sample generalization over a directed log-log benchmark and yields more stable, economically plausible elasticity estimates, especially for weakly identified cross-price effects.

What carries the argument

The Integrable Context-Dependent Demand Network (ICDN), a neural model that represents log-demand directly as a function of log-prices and context so that all elasticities follow from differentiation of one consistent surface.
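
To make "elasticities follow from differentiation of one consistent surface" concrete, here is a minimal sketch in plain Python. It does not reproduce ICDN: the two-product `log_demand` function and its coefficients are hypothetical, and the `Dual` class is a small forward-mode automatic differentiation helper written for this illustration. The point is only that, once log-demand is a smooth function of log-prices, the full elasticity matrix is its Jacobian and can be computed exactly rather than by finite differences.

```python
import math


class Dual:
    """Number val + der*eps with eps**2 = 0; der carries an exact first derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__


def tanh(x):
    """tanh with the chain rule applied to the derivative part."""
    t = math.tanh(x.val)
    return Dual(t, (1.0 - t * t) * x.der)


def log_demand(u):
    """Toy smooth log-demand surface for two products (hypothetical, not ICDN)."""
    u1, u2 = u
    f1 = 2.0 + (-1.5) * u1 + 0.4 * u2 + 0.3 * tanh(u1 + (-1.0) * u2)
    f2 = 1.0 + 0.2 * u1 + (-2.0) * u2 + 0.1 * tanh(u1 * u2)
    return [f1, f2]


def elasticity_matrix(f, u):
    """Return E with E[i][j] = d f_i / d u_j, one forward pass per log-price."""
    n = len(u)
    columns = []
    for j in range(n):
        # Seed a unit derivative on log-price j only.
        seeded = [Dual(u[k], 1.0 if k == j else 0.0) for k in range(n)]
        columns.append([out.der for out in f(seeded)])
    return [[columns[j][i] for j in range(n)] for i in range(len(columns[0]))]


E = elasticity_matrix(log_demand, [0.0, 0.0])
print(E)
```

At the evaluation point used here the diagonal (own-price) entries are negative and the off-diagonal (cross-price) entries are positive, which is what the toy coefficients were chosen to produce. Because every entry comes from the same function, the own-price and cross-price responses cannot be fitted inconsistently with one another.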

If this is right

Out-of-sample demand predictions improve relative to directed log-log models.
Cross-price elasticity estimates gain stability when identification from data is weak.
Derived elasticities align more closely with economic expectations for substitution patterns.
A single demand surface supplies all own-price and cross-price responses without separate regressions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same architecture could be applied to scanner data from other product categories to check whether stability gains hold beyond beer.
If the integrability property scales, neural demand surfaces might replace parametric systems in larger industrial-organization datasets.
Adding richer context such as promotions or seasonal indicators would test whether the smoothness constraint still yields usable elasticities.

Load-bearing premise

The neural network can learn a sufficiently accurate and smooth log-demand surface from the available data such that the derived elasticities remain stable and economically meaningful rather than reflecting model artifacts or overfitting.

What would settle it

Applying ICDN and the log-log benchmark to the Dominick's beer dataset and finding neither improved out-of-sample prediction accuracy nor reduced instability in cross-price elasticity estimates would falsify the performance claims.
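
For orientation, a log-log benchmark treats elasticities as constant regression coefficients. The sketch below assumes a plain ordinary-least-squares log-log specification fitted with NumPy on synthetic data; the paper's "directed" variant is not specified on this page and may differ, and the data, coefficients, and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 500

# Synthetic log-prices for two products (illustrative, not the Dominick's data).
log_p = rng.normal(size=(n_obs, 2))

# Assumed "true" own-price and cross-price elasticities for product 1.
true_beta = np.array([-1.8, 0.5])
log_q1 = 3.0 + log_p @ true_beta + 0.1 * rng.normal(size=n_obs)

# Design matrix with an intercept column, solved by least squares.
X = np.column_stack([np.ones(n_obs), log_p])
coef, *_ = np.linalg.lstsq(X, log_q1, rcond=None)
intercept, own_elasticity, cross_elasticity = coef

print(own_elasticity, cross_elasticity)
```

In this specification each elasticity is a single number that does not vary with the price level or with context, which is the restriction a learned, context-conditioned surface relaxes.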

Figures

Figures reproduced from arXiv: 2605.22820 by Carlos Heredia, Daniel Roncel.

**Figure 2.** ICDN forward pass. Context tokens are encoded into one latent representation per SKU, which generate own-price response terms, sparse attention-weighted cross-price response terms, and the parameters of the structured demand potential. Spline bases and analytic derivatives of log-prices are combined with these parameters to produce log-demand predictions and elasticities by exact differentiation.

**Figure 3.** Generalization comparison between ICDN and the benchmark. Left: fold-level …

**Figure 4.** Own-price elasticity stability diagnostics. Left: bootstrap confidence interval width for ICDN …

**Figure 5.** Cross-price elasticity diagnostics. Top left: distribution of bootstrap mean cross-price elasticities …

**Figure 6.** Own-price resampling-based uncertainty diagnostics for bootstrap confidence intervals. Left: …

Original abstract

We propose the Integrable Context-Dependent Demand Network (ICDN), a demand-first neural model for multiproduct retail demand. The model learns log-demand as a smooth, context-conditioned function of log-prices, allowing elasticities to be derived exactly from the learned demand surface. On the Dominick's beer dataset, ICDN improves out-of-sample generalization over a directed log-log benchmark and yields more stable, economically plausible elasticity estimates, especially for weakly identified cross-price effects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the Integrable Context-Dependent Demand Network (ICDN), a neural model that learns log-demand as a smooth, context-conditioned function of log-prices for multiproduct retail settings. Elasticities are obtained exactly via differentiation of the learned surface. On the Dominick's beer dataset, ICDN is claimed to improve out-of-sample generalization relative to a directed log-log benchmark while producing more stable and economically plausible elasticity estimates, especially for weakly identified cross-price effects.

Significance. If the central claims hold after verification, the work provides a practical route to enforcing integrability in flexible neural demand models, which could improve elasticity estimation in retail data where cross-price identification is challenging. The approach leverages demand potentials to maintain theoretical consistency while retaining neural flexibility.

major comments (2)

[§4] (Empirical results on Dominick's beer): No empirical check is reported that the derived own- and cross-price elasticities satisfy Slutsky symmetry (or negative semi-definiteness of the Slutsky matrix) on held-out price vectors. This verification is load-bearing for the claim that integrability yields more plausible elasticities rather than artifacts of architecture or regularization.
[§3] (Model description): The architecture, loss function, regularization, and training procedure for the neural log-demand surface are described only at a high level. Without these details it is impossible to assess whether the smoothness required for stable differentiation is reliably achieved or whether finite-sample training preserves the integrability property.

minor comments (2)

The abstract and introduction would benefit from an explicit comparison to existing integrable demand systems in the econometrics literature (e.g., those based on indirect utility or expenditure functions).
[Results tables] Results tables should report standard errors or confidence intervals for the reported gains in out-of-sample fit and elasticity stability to allow assessment of statistical significance.
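
One common way to attach an interval to a reported out-of-sample gain is a percentile bootstrap over paired per-fold differences. The sketch below is a generic illustration using only the Python standard library; the `gains` values are hypothetical placeholders, not results from the paper, and the function name and defaults are assumptions made for this example.

```python
import random
import statistics


def bootstrap_ci(gains, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of paired per-fold gains."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        # Resample folds with replacement, keeping the sample size fixed.
        sample = [rng.choice(gains) for _ in gains]
        means.append(statistics.fmean(sample))
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical per-fold error reductions (benchmark error minus model error).
gains = [0.04, 0.06, 0.03, 0.05, 0.07, 0.02, 0.05, 0.04]
lower, upper = bootstrap_ci(gains)
print(lower, upper)
```

An interval that excludes zero would indicate that the gain is unlikely to be an artifact of fold-to-fold variation, subject to the usual caveat that a small number of folds makes any such interval coarse.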

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important aspects for strengthening the empirical validation and reproducibility of the ICDN model. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Point-by-point responses

Referee: [§4] (Empirical results on Dominick's beer): No empirical check is reported that the derived own- and cross-price elasticities satisfy Slutsky symmetry (or negative semi-definiteness of the Slutsky matrix) on held-out price vectors. This verification is load-bearing for the claim that integrability yields more plausible elasticities rather than artifacts of architecture or regularization.

Authors: We agree that verifying Slutsky symmetry and negative semi-definiteness on held-out price vectors is a valuable addition to substantiate the benefits of the integrability constraint. In the revised manuscript, we will add this analysis by sampling a set of held-out price vectors from the Dominick's beer dataset, deriving the corresponding Slutsky matrices via automatic differentiation of the learned demand surface, and reporting quantitative metrics such as the mean absolute deviation from symmetry across off-diagonal elements and the fraction of non-positive eigenvalues to assess negative semi-definiteness. This will provide direct evidence that the observed improvements in elasticity stability are tied to the theoretical properties rather than incidental effects of the architecture.

revision: yes

Referee: [§3] (Model description): The architecture, loss function, regularization, and training procedure for the neural log-demand surface are described only at a high level. Without these details it is impossible to assess whether the smoothness required for stable differentiation is reliably achieved or whether finite-sample training preserves the integrability property.

Authors: We accept that the current presentation in §3 is insufficiently detailed for full reproducibility and assessment. The revised manuscript will expand §3 with complete specifications, including the precise neural network architecture (layer counts, hidden dimensions, and activation functions chosen to promote smoothness), the full loss function (including the primary demand-fitting term and any explicit regularization components such as gradient penalties or Hessian regularization to enforce smoothness), and the training details (optimizer, learning-rate schedule, epoch count, batch size, and any mechanisms such as constrained optimization or post-training projection used to maintain the integrability property in finite samples). These additions will enable readers to evaluate the reliability of the differentiation step and the preservation of theoretical consistency.

revision: yes
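
The symmetry and definiteness diagnostics proposed in the first response can be sketched as follows. This is an illustration with NumPy under stated assumptions: the matrix `S` is a hypothetical stand-in supplied by hand, and the sketch does not show how to build a Slutsky matrix from elasticities (that step needs income effects, which this page does not describe). The function name and tolerance are likewise choices made for this example.

```python
import numpy as np


def symmetry_and_nsd_diagnostics(S, tol=1e-8):
    """Return (mean |S - S^T| off-diagonal, share of eigenvalues <= tol, max eigenvalue)."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    asymmetry = float(np.mean(np.abs(S - S.T)[off_diagonal]))
    # Definiteness is a property of the symmetric part, whose eigenvalues are real.
    eigenvalues = np.linalg.eigvalsh(0.5 * (S + S.T))
    share_nonpositive = float(np.mean(eigenvalues <= tol))
    return asymmetry, share_nonpositive, float(eigenvalues.max())


# Hypothetical symmetric, negative definite matrix (not estimated from data).
S = np.array([[-2.0, 0.5, 0.1],
              [0.5, -1.5, 0.3],
              [0.1, 0.3, -1.0]])
asym, share, max_eig = symmetry_and_nsd_diagnostics(S)

# Perturb one off-diagonal entry to show a nonzero asymmetry score.
S_perturbed = S.copy()
S_perturbed[0, 1] = 0.9
asym_perturbed, _, _ = symmetry_and_nsd_diagnostics(S_perturbed)

print(asym, share, max_eig, asym_perturbed)
```

Negative semi-definiteness requires every eigenvalue of the symmetric part to be non-positive, so the share reported here must equal one. The tolerance matters in practice because a true Slutsky matrix is singular, which puts one eigenvalue at zero up to numerical error.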

Circularity Check

0 steps flagged

No significant circularity; elasticities derived from data-fitted neural surface

full rationale

The paper's core construction fits a neural network to learn a smooth log-demand surface from observed data on the Dominick's dataset, then obtains elasticities by exact differentiation of that surface. This is a standard supervised learning pipeline with no reduction of the claimed predictions to the inputs by construction. Integrability is imposed architecturally via demand potentials rather than being asserted as an empirical outcome that is then smuggled back in. No self-citation load-bearing steps, fitted-input-as-prediction patterns, or ansatz smuggling appear in the derivation chain. The out-of-sample generalization and stability comparisons to the log-log benchmark constitute independent empirical content.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

Ledger entries are inferred from the abstract description only; full paper may introduce additional fitted elements or background assumptions.

free parameters (1)

Neural network weights and biases
Parameters fitted to the sales data to approximate the log-demand surface.

axioms (1)

domain assumption: Log-demand can be represented as a smooth, context-conditioned function of log-prices that a neural network can learn accurately enough for derivative-based elasticities to be reliable.
This premise is required for the exact-derivation property and the claim of improved stability.

invented entities (1)

ICDN (Integrable Context-Dependent Demand Network) · no independent evidence
purpose: Neural architecture that enforces integrability so elasticities follow directly from the demand surface.
New model introduced in the work; no independent evidence outside the paper is provided.

pith-pipeline@v0.9.0 · 5590 in / 1397 out tokens · 58863 ms · 2026-05-22T06:28:26.073508+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

If one models log-demand directly and defines elasticities by differentiation, $E_{ij} = \partial_{\ln p_j} \ln v_i$, then $\omega_i = d\ln v_i$ holds by construction, and hence $d\omega_i = 0$ follows automatically under the required smoothness conditions.

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear

Proposition 2 … $\hat{\omega}_{\theta,i}(u,x)$ is exact … the row-wise closure conditions hold automatically
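
The quoted passages compress a short argument. The derivation below spells it out under two assumptions that are not stated on this page: that $u_j = \ln p_j$ denotes the log-price of product $j$, and that $f_i(u, x) = \ln v_i$ is twice continuously differentiable in $u$. It is a reading aid, not a reproduction of the paper's Proposition 2.

```latex
% Elasticity one-form for product i (row i of the elasticity matrix):
\omega_i \;=\; \sum_j E_{ij}(u, x)\, \mathrm{d}u_j .

% If elasticities are defined by differentiating log-demand,
E_{ij} \;=\; \frac{\partial f_i}{\partial u_j}
\quad\Longrightarrow\quad
\omega_i \;=\; \sum_j \frac{\partial f_i}{\partial u_j}\, \mathrm{d}u_j \;=\; \mathrm{d}f_i ,

% so each \omega_i is exact. Exactness implies closedness, d\omega_i = 0,
% which in components is the row-wise closure condition
\frac{\partial E_{ij}}{\partial u_k}
\;=\; \frac{\partial^2 f_i}{\partial u_k\, \partial u_j}
\;=\; \frac{\partial^2 f_i}{\partial u_j\, \partial u_k}
\;=\; \frac{\partial E_{ik}}{\partial u_j}
\qquad \text{for all } j, k ,

% where the middle equality is the symmetry of mixed partial derivatives.
```

This condition holds within each row $i$ separately. It is distinct from cross-row conditions such as Slutsky symmetry, which relate the response of product $i$ to price $j$ with the response of product $j$ to price $i$.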

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[27] [27]

2002 , issn =

Formulation of Euler–Lagrange equations for fractional variational problems , journal =. 2002 , issn =. doi:https://doi.org/10.1016/S0022-247X(02)00180-4 , author =

work page doi:10.1016/s0022-247x(02)00180-4 2002

[28] [28]

2021 , eprint=

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics , author=. 2021 , eprint=

work page 2021

[29] [29]

2019 , eprint=

Decoupled Weight Decay Regularization , author=. 2019 , eprint=

work page 2019

[30] [30]

Journal of Machine Learning Research , volume=

Continuous time analysis of momentum methods , author=. Journal of Machine Learning Research , volume=

work page

[31] [31]

2015 , eprint=

On Accelerated Methods in Optimization , author=. 2015 , eprint=

work page 2015

[32] [32]

Learning internal representations by error propagation , author=

work page

[33] [33]

Large-Scale Machine Learning with Stochastic Gradient Descent

Bottou, L \'e on. Large-Scale Machine Learning with Stochastic Gradient Descent. Proceedings of COMPSTAT'2010. 2010

work page 2010

[34] [34]

Journal of Machine Learning Research , year =

John Duchi and Elad Hazan and Yoram Singer , title =. Journal of Machine Learning Research , year =

work page

[35] [35]

An overview of gradient descent optimization algorithms

An overview of gradient descent optimization algorithms , author=. arXiv preprint arXiv:1609.04747 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[36] [36]

2012 , howpublished =

Geoffrey Hinton , title =. 2012 , howpublished =

work page 2012

[37] [37]

Action Principle and Nonlocal Field Theories , author =. Phys. Rev. D , volume =. 1973 , month =

work page 1973

[38] [38]

and Llosa, J

Jaen, X. and Llosa, J. and Molina, A. A Reduction of order two for infinite order lagrangians. Phys. Rev. D. 1986. doi:10.1103/PhysRevD.34.2302

work page doi:10.1103/physrevd.34.2302 1986

[39] [39]

2011 , publisher=

Generalized Classical Mechanics and Field Theory: A Geometrical Approach of Lagrangian and Hamiltonian Formalisms Involving Higher Order Derivatives , author=. 2011 , publisher=

work page 2011

[40] [40]

1976 , publisher=

Mechanics: Volume 1 , author=. 1976 , publisher=

work page 1976

[41] [41]

Energy-momentum tensor for the electromagnetic field in a dispersive medium , volume=

Heredia, Carlos and Llosa, Josep , year=. Energy-momentum tensor for the electromagnetic field in a dispersive medium , volume=. Journal of Physics Communications , publisher=. doi:10.1088/2399-6528/abfd14 , number=

work page doi:10.1088/2399-6528/abfd14

[42] [42]

Nonlocal Lagrangian fields and the second Noether theorem

Heredia, Carlos and Llosa, Josep , year=. Nonlocal Lagrangian fields and the second Noether theorem. Non-commutative U(1) gauge theory , volume=. Journal of High Energy Physics , publisher=. doi:10.1007/jhep04(2024)021 , number=

work page doi:10.1007/jhep04(2024)021 2024

[43] [43]

Ostrogradskii, M , title =. Mem. Acad. St. Petersburg , volume =. 1850 , pages =

work page

[44] [44]

Vladimirov, V. S. , booktitle =. Generalized functions in mathematical physics , year =

work page

[45] [45]

2006 , publisher =

Josep Peñarrocha Gantes and Arcadi Santamaria and Jordi Vidal , title =. 2006 , publisher =

work page 2006

[46] [46]

2021 , publisher=

Handbook of differential equations , author=. 2021 , publisher=

work page 2021

[47] [47]

, date-added =

Noether, E. , date-added =. Invariante Variationsprobleme , volume =. Nachrichten von der Gesellschaft der Wissenschaften zu G. 1918 , bdsk-url-1 =

work page 1918

[48] [48]

2004 , publisher=

Convex optimization , author=. 2004 , publisher=

work page 2004

[49] [49]

Nonlocal Lagrangian fields: Noether’s theorem and Hamiltonian formalism , volume=

Heredia, Carlos and Llosa, Josep , year=. Nonlocal Lagrangian fields: Noether’s theorem and Hamiltonian formalism , volume=. Physical Review D , publisher=. doi:10.1103/physrevd.105.126002 , number=

work page doi:10.1103/physrevd.105.126002

[50] [50]

Advances in Neural Information Processing Systems (NeurIPS) , year=

Neural Ordinary Differential Equations , author=. Advances in Neural Information Processing Systems (NeurIPS) , year=

work page

[51] [51]

1982 , publisher=

Analysis, Manifolds and Physics Revised Edition , author=. 1982 , publisher=

work page 1982

[52] [52]

Brown and M

A. Brown and M. C. Bartholomew-Biggs , title =. Journal of Optimization Theory and Applications , volume =

work page

[53] [53]

, booktitle=

Romero, Orlando and Benosman, Mouhacine and Pappas, George J. , booktitle=. ODE Discretization Schemes as Optimization Algorithms , year=

work page

[54] [54]

1970 , publisher=

Lectures in Analytical Mechanics: Translated from the Russian by George Yankovsky , author=. 1970 , publisher=

work page 1970

[55] [55]

H. K. Khalil , title =. 2002 , address =

work page 2002

[56] [56]

Soviet Mathematics Doklady , volume=

A method of solving a convex programming problem with convergence rate o(1/k^2) , author=. Soviet Mathematics Doklady , volume=

work page

[57] [57]

Weijie Su and Stephen Boyd and Emmanuel J. Cand. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights , journal =. 2016 , volume =

work page 2016

[58] [58]

2022 , eprint=

Neural Integro-Differential Equations , author=. 2022 , eprint=

work page 2022

[59] [59]

Advances in neural information processing systems , volume=

Neural ordinary differential equations , author=. Advances in neural information processing systems , volume=

work page

[60] [60]

SIAM Journal on Financial Mathematics , volume=

Stochastic gradient descent in continuous time , author=. SIAM Journal on Financial Mathematics , volume=. 2017 , publisher=

work page 2017

[61] [61]

2014 , issn =

IDSOLVER: A general purpose solver for nth-order integro-differential equations , journal =. 2014 , issn =. doi:doi.org/10.1016/j.cpc.2013.09.008 , author =

work page doi:10.1016/j.cpc.2013.09.008 2014

[62] [62]

2024 , eprint=

Are nonlocal Lagrangian systems fatally unstable? , author=. 2024 , eprint=

work page 2024

[63] [63]

Proceedings of the 34th International Conference on Machine Learning , pages =

Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms , author =. Proceedings of the 34th International Conference on Machine Learning , pages =. 2017 , editor =

work page 2017

[64] [64]

Communications in Mathematics and Statistics , year =

Weinan E , title =. Communications in Mathematics and Statistics , year =

work page

[65] [65]

2018 , journal =

Michael Betancourt , title =. 2018 , journal =

work page 2018

[66] [66]

Journal of Mathematical Imaging and Vision , year =

Lars Ruthotto and Eldad Haber , title =. Journal of Mathematical Imaging and Vision , year =

work page

[67] [67]

Advances in neural information processing systems , volume=

Hamiltonian neural networks , author=. Advances in neural information processing systems , volume=

work page

[68] [68]

Lagrangian neural networks,

Lagrangian neural networks , author=. arXiv preprint arXiv:2003.04630 , year=

work page arXiv 2003

[69] [69]

Advances in Neural Information Processing Systems , volume=

Noether networks: meta-learning useful conserved quantities , author=. Advances in Neural Information Processing Systems , volume=

work page

[70] [70]

1976 , isbn =

Walter Rudin , title =. 1976 , isbn =

work page 1976

[71] [71]

Bauschke and Patrick L

Heinz H. Bauschke and Patrick L. Combettes , title =. 2017 , doi =

work page 2017

[72] [72]

1998 , booktitle =

Chapter One - Linear Integral Inequalities , editor =. 1998 , booktitle =. doi:https://doi.org/10.1016/S0076-5392(98)80003-9 , author =

work page doi:10.1016/s0076-5392(98)80003-9 1998

[73] [73]

2004 , isbn =

Yurii Nesterov , title =. 2004 , isbn =

work page 2004

[74] [74]

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak- ojasiewicz Condition

Karimi, Hamed and Nutini, Julie and Schmidt, Mark. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak- ojasiewicz Condition. Machine Learning and Knowledge Discovery in Databases. 2016

work page 2016

[75] [75]

Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-

Attouch, H\'. Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-. Math. Oper. Res. , month = may, pages =. 2010 , issue_date =. doi:10.1287/moor.1100.0449 , abstract =

work page doi:10.1287/moor.1100.0449 2010

[76] [76]

Lipschitz Functions , subtitle =

Ştefan Cobzaş and Radu Miculescu and Adriana Nicolae , series =. Lipschitz Functions , subtitle =. 2019 , isbn =. doi:10.1007/978-3-030-16489-8 , pages =

work page doi:10.1007/978-3-030-16489-8 2019

[77] [77]

Variations on Barbălat’s Lemma , volume=

Bálint Farkas and Sven-Ake Wegner , year=. Variations on Barbălat’s Lemma , volume=. The American Mathematical Monthly , publisher=. doi:10.4169/amer.math.monthly.123.8.825 , number=

work page doi:10.4169/amer.math.monthly.123.8.825

[78] [78]

Barbalat , title =

I. Barbalat , title =. Revue Math. 1959 , pages =

work page 1959

[79] [79]

Journal of Machine Learning Research , volume=

Adaptive subgradient methods for online learning and stochastic optimization , author=. Journal of Machine Learning Research , volume=

work page

[80] [80]

2025 , eprint=

Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations , author=. 2025 , eprint=

work page 2025