
arxiv: 2502.08534 · v1 · pith:5TL6CZBHnew · submitted 2025-02-12 · 💻 cs.CE · cs.AI

Input convex neural networks: universal approximation theorem and implementation for isotropic polyconvex hyperelastic energies

Pith reviewed 2026-05-23 03:57 UTC · model grok-4.3

classification 💻 cs.CE cs.AI
keywords: input convex neural networks · polyconvex hyperelasticity · universal approximation · frame-indifference · isotropic materials · deformation gradient · hyperelastic energies · polyconvex hulls

The pith

Input convex neural networks can approximate any frame-indifferent isotropic polyconvex energy when large enough.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a neural network architecture for hyperelastic materials that satisfies frame-indifference and polyconvexity by construction and for which a universal approximation theorem holds. It combines an input-convex network structure with a representation of the energy in terms of the elementary polynomials of the signed singular values of the deformation gradient. Convexity in these inputs is a sufficient and necessary criterion for the target class of functions. Consequently, any frame-indifferent isotropic polyconvex energy can be approximated to arbitrary accuracy by a sufficiently large network of this form. The same construction also enforces additional constraints such as balance of angular momentum and growth conditions.
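To make the inputs of such a network concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It computes signed singular values of a 3x3 deformation gradient and the three elementary symmetric polynomials of those values. Two assumptions are made here because this page does not spell them out: the sign convention (the sign of det F is attached to the smallest singular value) and the choice of e1, e2, e3 as the polynomial inputs. The function names `signed_singular_values` and `elementary_symmetric` are hypothetical, and the paper's exact input vector may differ.

```python
import numpy as np


def signed_singular_values(F):
    """Singular values of F, with the sign of det(F) put on the smallest one.

    Assumed convention: it makes the product of the returned values equal
    det(F). Other equivalent sign choices exist.
    """
    F = np.asarray(F, dtype=float)
    s = np.linalg.svd(F, compute_uv=False).copy()  # non-negative, descending
    if np.linalg.det(F) < 0:
        s[-1] = -s[-1]
    return s


def elementary_symmetric(nu):
    """Elementary symmetric polynomials e1, e2, e3 of three numbers."""
    n1, n2, n3 = nu
    e1 = n1 + n2 + n3
    e2 = n1 * n2 + n1 * n3 + n2 * n3
    e3 = n1 * n2 * n3
    return np.array([e1, e2, e3])


# Usage: a pure stretch with principal stretches 2, 3 and 0.5.
F = np.diag([2.0, 3.0, 0.5])
e = elementary_symmetric(signed_singular_values(F))
# e1 = 5.5, e2 = 8.5, e3 = 3.0, and e3 equals det(F).
```

Under this convention the last polynomial always coincides with det F, which is one reason a signed rather than unsigned definition is convenient.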

Core claim

The proposed input-convex neural network, when formulated using the elementary polynomials of the signed singular values of the deformation gradient, satisfies the universal approximation theorem for the class of frame-indifferent, isotropic polyconvex energies and can therefore approximate any such energy (provided the network is large enough).

What carries the argument

Input-convex neural network architecture operating on the elementary polynomials of the signed singular values of the deformation gradient, which supplies a sufficient and necessary representation for frame-indifferent isotropic polyconvex functions.
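For readers unfamiliar with input-convex networks, here is a small NumPy sketch of the general idea in the style of Amos et al. (2017), not the architecture used in the paper. The output is convex in the input because the weights acting on hidden activations are non-negative and the activation (softplus) is convex and non-decreasing, while direct connections from the input are unconstrained. The class name `TinyICNN`, the layer widths, and the random weights are illustrative assumptions, and at least one hidden layer is assumed.

```python
import numpy as np


def softplus(t):
    """Convex, non-decreasing activation log(1 + exp(t))."""
    return np.logaddexp(0.0, t)


class TinyICNN:
    """Scalar-valued network that is convex in its input x.

    Illustrative only: weights are random, not trained.
    `widths` must contain at least one hidden width.
    """

    def __init__(self, n_in, widths, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx, self.Wz, self.b = [], [], []
        prev = None
        for w in list(widths) + [1]:
            # Direct connections from the input may have any sign.
            self.Wx.append(rng.normal(size=(w, n_in)))
            # Weights on the previous hidden layer must be non-negative.
            self.Wz.append(None if prev is None
                           else np.abs(rng.normal(size=(w, prev))))
            self.b.append(rng.normal(size=w))
            prev = w

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        # First layer: affine map of x followed by a convex activation.
        z = softplus(self.Wx[0] @ x + self.b[0])
        # Hidden layers: non-negative mix of convex functions plus affine term.
        for Wx, Wz, b in zip(self.Wx[1:-1], self.Wz[1:-1], self.b[1:-1]):
            z = softplus(Wz @ z + Wx @ x + b)
        # Output layer: kept linear, which still preserves convexity.
        out = self.Wz[-1] @ z + self.Wx[-1] @ x + self.b[-1]
        return float(out[0])


# Usage: spot-check midpoint convexity on one random pair of inputs.
net = TinyICNN(n_in=3, widths=[8, 8])
rng = np.random.default_rng(1)
x, y = rng.normal(size=3), rng.normal(size=3)
gap = 0.5 * (net(x) + net(y)) - net(0.5 * (x + y))  # non-negative if convex
```

In a trained model the non-negativity would typically be maintained by clipping or reparameterizing the hidden-to-hidden weights after each update; which option the paper uses is not stated on this page.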

If this is right

  • The networks rigorously enforce frame-indifference, polyconvexity, balance of angular momentum, and growth conditions.
  • They remain capable of approximating non-polyconvex energies.
  • They support computation of polyconvex hulls.
  • They exhibit measurable advantages over prior network constructions in comparative tests.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The architecture could be inserted directly into existing finite-element codes without additional constraint-enforcement steps.
  • Similar input-convex constructions might be tested on anisotropic or non-hyperelastic constitutive models.
  • The networks could serve as drop-in surrogates for expensive polyconvex hull calculations in optimization loops.

Load-bearing premise

The elementary polynomials of the signed singular values of the deformation gradient form a sufficient and necessary representation for every frame-indifferent isotropic polyconvex function.

What would settle it

An explicit example of a frame-indifferent isotropic polyconvex energy that cannot be approximated to arbitrary accuracy by any finite input-convex network built on these polynomials.

Figures

Figures reproduced from arXiv: 2502.08534 by David Wiedemann, Gian-Luca Geuken, Jörn Mosler, Patrick Kurzeja.

Figure 1: Illustration of the convex signed singular value neural network (CSSV-NN) framework.
Figure 2: Mean squared error of the training data set from each architecture and random initializations for ...
Figure 3: Polyconvex signed singular value energy (Eq. (32)): Energy and stress predictions from the ...
Figure 4: Non-polyconvex Hencky energy (Eq. (33)): Energy and stress predictions from the CSSV-NN ...
Figure 5: Approximation of a polyconvex hull (Eq. (34)): Comparison of the ground-truth (non-polyconvex) ...

Original abstract

This paper presents a novel framework of neural networks for isotropic hyperelasticity that enforces necessary physical and mathematical constraints while simultaneously satisfying the universal approximation theorem. The two key ingredients are an input convex network architecture and a formulation in the elementary polynomials of the signed singular values of the deformation gradient. In line with previously published networks, it can rigorously capture frame-indifference and polyconvexity - as well as further constraints like balance of angular momentum and growth conditions. However and in contrast to previous networks, a universal approximation theorem for the proposed approach is proven. To be more explicit, the proposed network can approximate any frame-indifferent, isotropic polyconvex energy (provided the network is large enough). This is possible by working with a sufficient and necessary criterion for frame-indifferent, isotropic polyconvex functions. Comparative studies with existing approaches identify the advantages of the proposed method, particularly in approximating non-polyconvex energies as well as computing polyconvex hulls.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an input-convex neural network (ICNN) architecture for isotropic hyperelastic energies, formulated using elementary polynomials of the signed singular values of the deformation gradient F. It enforces frame-indifference, polyconvexity, balance of angular momentum, and growth conditions, and claims to prove a universal approximation theorem (UAT) showing that sufficiently large networks of this form can approximate any frame-indifferent isotropic polyconvex energy. Comparative numerical studies are included against prior networks.

Significance. If the UAT holds via a complete representation theorem, the work would provide the first ICNN-based model for this function class that is both constraint-enforcing and dense in the admissible energies, strengthening the theoretical foundation for physics-informed neural networks in computational mechanics. The explicit use of a sufficient-and-necessary criterion for polyconvexity distinguishes it from earlier architectures that lack proven density.

major comments (2)
  1. [Section stating the sufficient and necessary criterion (likely §3 or the theorem preceding the UAT proof)] The UAT in the abstract and main theorem rests on the claim that the chosen elementary polynomials of the signed singular values constitute a sufficient and necessary representation (every frame-indifferent isotropic polyconvex W is convex in these inputs, and conversely). The manuscript must explicitly prove this equivalence holds pointwise on the domain of admissible F (with det F > 0), including verification that no admissible polyconvex energies are missed and that convexity in the polynomials implies polyconvexity without over-restriction; any gap here directly falsifies the density claim.
  2. [UAT proof and preceding representation lemma] The proof of the UAT (presumably via the standard ICNN density result on convex functions) must be connected rigorously to the representation: after mapping F to the polynomial inputs, the ICNN approximates any convex function on that input space, but the manuscript needs to confirm that the input map is surjective onto the relevant domain and that the resulting energies satisfy all required growth and frame-indifference conditions without additional restrictions.
minor comments (2)
  1. [Formulation section] Notation for the elementary polynomials (e.g., how the signed singular values are combined) should be defined once with explicit formulas rather than referenced across sections.
  2. [Numerical experiments] The comparative studies would benefit from reporting the network sizes and training details used for the polyconvex-hull computations to allow direct replication.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which help clarify the presentation of our theoretical results. We address each major comment below.

  1. Referee: [Section stating the sufficient and necessary criterion (likely §3 or the theorem preceding the UAT proof)] The UAT in the abstract and main theorem rests on the claim that the chosen elementary polynomials of the signed singular values constitute a sufficient and necessary representation (every frame-indifferent isotropic polyconvex W is convex in these inputs, and conversely). The manuscript must explicitly prove this equivalence holds pointwise on the domain of admissible F (with det F > 0), including verification that no admissible polyconvex energies are missed and that convexity in the polynomials implies polyconvexity without over-restriction; any gap here directly falsifies the density claim.

    Authors: Theorem 3.1 in the manuscript establishes the pointwise equivalence on {F : det F > 0} by direct appeal to the representation of isotropic polyconvex functions via the signed singular values and their elementary symmetric polynomials; convexity in these inputs is necessary by the standard definition and sufficient because the map recovers the full set of polyconvex invariants without omission or extraneous constraints. We will add an explicit corollary restating the equivalence and domain to make the argument more self-contained.
    Revision: yes

  2. Referee: [UAT proof and preceding representation lemma] The proof of the UAT (presumably via the standard ICNN density result on convex functions) must be connected rigorously to the representation: after mapping F to the polynomial inputs, the ICNN approximates any convex function on that input space, but the manuscript needs to confirm that the input map is surjective onto the relevant domain and that the resulting energies satisfy all required growth and frame-indifference conditions without additional restrictions.

    Authors: The UAT proof (Theorem 4.1) composes the surjective input map (Lemma 3.3) with the known ICNN density result for convex functions on the image domain; frame-indifference and growth conditions hold by construction of the inputs and network, with no further restrictions imposed. We will insert a short paragraph in the proof explicitly verifying surjectivity and preservation of the physical constraints.
    Revision: yes
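
The statement that frame-indifference holds "by construction of the inputs" can be checked numerically for one plausible choice of inputs. The sketch below is an illustration under assumptions, not the paper's code: it takes the elementary symmetric polynomials of the signed singular values as inputs (with the sign of det F attached to the smallest singular value) and verifies that they do not change when F is replaced by Q F R for proper rotations Q and R. The helper names `invariants` and `random_rotation` are hypothetical.

```python
import numpy as np


def invariants(F):
    """e1, e2, e3 of the signed singular values of a 3x3 matrix F.

    Assumed convention: the sign of det(F) is put on the smallest
    singular value, so that e3 equals det(F).
    """
    s = np.linalg.svd(np.asarray(F, dtype=float), compute_uv=False).copy()
    s[-1] *= np.sign(np.linalg.det(F))
    n1, n2, n3 = s
    return np.array([n1 + n2 + n3,
                     n1 * n2 + n1 * n3 + n2 * n3,
                     n1 * n2 * n3])


def random_rotation(rng):
    """Random proper rotation (orthogonal, det = +1) from a QR factorization."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]  # flip one column to turn a reflection into a rotation
    return Q


# Usage: compare the inputs before and after rotating on both sides.
rng = np.random.default_rng(0)
F = rng.normal(size=(3, 3))
if np.linalg.det(F) < 0:
    F[:, 0] = -F[:, 0]  # restrict to admissible F with det F > 0
Q, R = random_rotation(rng), random_rotation(rng)
same = np.allclose(invariants(Q @ F @ R), invariants(F))
```

Any energy evaluated only through such inputs inherits the same invariance, whatever network sits on top. Surjectivity of the input map and the growth conditions raised by the referee are separate questions that this check does not address.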

Circularity Check

0 steps flagged

No significant circularity; UAT rests on external representation criterion

The derivation invokes a sufficient and necessary criterion for frame-indifferent isotropic polyconvex functions to justify the input space for the ICNN, allowing the standard universal approximation property of input-convex networks to transfer to the target class. No quoted step reduces the central claim to a self-definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation chain; the representation is treated as an independent mathematical fact. The argument therefore depends on an externally established characterization of polyconvexity rather than on its own conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the elementary polynomials of the signed singular values provide a complete representation of the target function class; no free parameters or invented entities are described in the abstract.

axioms (1)
  • Domain assumption: The elementary polynomials of the signed singular values of the deformation gradient form a sufficient and necessary criterion for frame-indifferent isotropic polyconvex functions.
    This criterion is invoked to enable the universal approximation result.

pith-pipeline@v0.9.0 · 5702 in / 1209 out tokens · 39105 ms · 2026-05-23T03:57:30.554310+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Adaptive Material Fingerprinting for the fast discovery of polyconvex feature combinations in isotropic and anisotropic hyperelasticity

    cs.CE 2026-04 unverdicted novelty 7.0

    An adaptive database and iterative pattern recognition algorithm lets Material Fingerprinting discover arbitrary linear combinations of polyconvex isotropic and anisotropic hyperelastic features from experimental data.
