Input convex neural networks: universal approximation theorem and implementation for isotropic polyconvex hyperelastic energies
Pith reviewed 2026-05-23 03:57 UTC · model grok-4.3
The pith
Input convex neural networks can approximate any frame-indifferent isotropic polyconvex energy when large enough.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed input-convex neural network, formulated in the elementary polynomials of the signed singular values of the deformation gradient, satisfies a universal approximation theorem for the class of frame-indifferent, isotropic polyconvex energies and can therefore approximate any such energy (provided the network is large enough).
What carries the argument
An input-convex neural network architecture whose inputs are the elementary polynomials of the signed singular values of the deformation gradient. This choice of inputs supplies a sufficient and necessary representation for frame-indifferent isotropic polyconvex functions.
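The input map can be sketched in a few lines of NumPy. This sketch rests on two assumptions. First, it uses the common convention for signed singular values: singular values in descending order, with the smallest one carrying the sign of det F. Second, it reads "elementary polynomials" as the products of the signed singular values over all nonempty index subsets, i.e. the minors of diag(ν). The paper's exact ordering or choice of polynomials may differ, and the function names are hypothetical.

```python
import itertools

import numpy as np


def signed_singular_values(F):
    """Signed singular values of a 3x3 deformation gradient F (assumed convention).

    Singular values are taken in descending order; the smallest one carries the
    sign of det F, so the product of the returned values equals det F.
    """
    nu = np.linalg.svd(F, compute_uv=False)  # nonnegative, descending
    if np.linalg.det(F) < 0:
        nu[-1] = -nu[-1]
    return nu


def elementary_polynomials(nu):
    """Products of nu over all nonempty index subsets (the minors of diag(nu)).

    Hypothetical reading of the input vector: for d = 3 this returns
    (nu1, nu2, nu3, nu1*nu2, nu1*nu3, nu2*nu3, nu1*nu2*nu3).
    """
    d = len(nu)
    return np.array([np.prod(nu[list(subset)])
                     for k in range(1, d + 1)
                     for subset in itertools.combinations(range(d), k)])


# Usage: map a deformation gradient to the (hypothetical) network input vector
rng = np.random.default_rng(0)
F = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
x = elementary_polynomials(signed_singular_values(F))
print(x.shape)  # (7,)
```

These inputs depend on F only through its singular values and the sign of det F. They are therefore unchanged under F ↦ QFR for rotations Q and R, so frame-indifference and isotropy are built into the inputs rather than learned.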
If this is right
- The networks rigorously enforce frame-indifference, polyconvexity, balance of angular momentum, and growth conditions.
- They can also produce polyconvex approximations of non-polyconvex energies.
- They support computation of polyconvex hulls.
- They exhibit measurable advantages over prior network constructions in comparative tests.
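The input-convex property can be illustrated with a small fully input-convex network in the general style of Amos, Xu and Kolter (2017). The hidden-to-hidden weights are kept nonnegative, and the activation (softplus here) is convex and nondecreasing, so the scalar output is convex in its input vector. This is a sketch, not the paper's architecture. The layer widths, the softplus reparametrization of the nonnegative weights, and the names `init_icnn` and `icnn` are illustrative assumptions. Any extra terms the paper uses to enforce growth conditions are not reproduced.

```python
import numpy as np


def softplus(t):
    # Convex, nondecreasing activation; logaddexp(0, t) = log(1 + exp(t)) without overflow
    return np.logaddexp(0.0, t)


def init_icnn(n_in, widths, rng):
    """Random parameters for a small fully input-convex network (illustrative)."""
    layers, prev = [], None
    for w in widths:
        layer = {"Wx": 0.5 * rng.standard_normal((w, n_in)), "b": np.zeros(w)}
        if prev is not None:
            # Unconstrained raw weights; mapped to nonnegative values in the forward pass
            layer["Wz_raw"] = rng.standard_normal((w, prev))
        layers.append(layer)
        prev = w
    head = {"wx": 0.1 * rng.standard_normal(n_in),
            "wz_raw": rng.standard_normal(prev), "b": 0.0}
    return layers, head


def icnn(x, layers, head):
    """Scalar output that is convex in x by construction."""
    z = None
    for layer in layers:
        pre = layer["Wx"] @ x + layer["b"]  # affine in x, hence convex
        if z is not None:
            # A nonnegative combination of convex functions stays convex
            pre = pre + softplus(layer["Wz_raw"]) @ z
        z = softplus(pre)  # convex nondecreasing function of a convex function: convex
    return softplus(head["wz_raw"]) @ z + head["wx"] @ x + head["b"]


rng = np.random.default_rng(1)
layers, head = init_icnn(n_in=7, widths=[16, 16], rng=rng)
x = rng.standard_normal(7)  # e.g. a 7-entry vector of elementary polynomials
print(float(icnn(x, layers, head)))
```

Feeding a vector of elementary polynomials into such a network gives an energy that is convex in those inputs. Under the sufficient and necessary criterion the paper relies on, this is the property that makes the resulting energy polyconvex.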
Where Pith is reading between the lines
- The architecture could be inserted directly into existing finite-element codes without additional constraint-enforcement steps.
- Similar input-convex constructions might be tested on anisotropic or non-hyperelastic constitutive models.
- The networks could serve as drop-in surrogates for expensive polyconvex hull calculations in optimization loops.
Load-bearing premise
The elementary polynomials of the signed singular values of the deformation gradient form a sufficient and necessary representation for every frame-indifferent isotropic polyconvex function.
What would settle it
An explicit example of a frame-indifferent isotropic polyconvex energy that cannot be approximated to arbitrary accuracy by any finite input-convex network built on these polynomials.
Original abstract
This paper presents a novel framework of neural networks for isotropic hyperelasticity that enforces necessary physical and mathematical constraints while simultaneously satisfying the universal approximation theorem. The two key ingredients are an input convex network architecture and a formulation in the elementary polynomials of the signed singular values of the deformation gradient. In line with previously published networks, it can rigorously capture frame-indifference and polyconvexity, as well as further constraints like balance of angular momentum and growth conditions. However, in contrast to previous networks, a universal approximation theorem is proven for the proposed approach. More explicitly, the proposed network can approximate any frame-indifferent, isotropic polyconvex energy (provided the network is large enough). This is possible by working with a sufficient and necessary criterion for frame-indifferent, isotropic polyconvex functions. Comparative studies with existing approaches identify the advantages of the proposed method, particularly in approximating non-polyconvex energies as well as computing polyconvex hulls.
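The claim that balance of angular momentum is satisfied automatically can be checked numerically. For any energy that depends on F only through its signed singular values, frame-indifference W(QF) = W(F) implies that the Kirchhoff stress P Fᵀ is symmetric. This sketch makes three assumptions, and all names in it are illustrative. First, the input map is the products of the signed singular values over nonempty index subsets, a hypothetical reading of the paper's elementary polynomials. Second, a random softplus sum stands in for a convex function g in place of a trained network. Third, P = ∂W/∂F is approximated by central differences.

```python
import itertools

import numpy as np


def minors_of_signed_svals(F):
    """Hypothetical input map: products of the signed singular values of a 3x3 F
    over all nonempty index subsets (7 entries; the last equals det F)."""
    nu = np.linalg.svd(F, compute_uv=False)
    if np.linalg.det(F) < 0:
        nu[-1] = -nu[-1]
    return np.array([np.prod(nu[list(s)])
                     for k in range(1, 4)
                     for s in itertools.combinations(range(3), k)])


rng = np.random.default_rng(2)
A, c = rng.standard_normal((5, 7)), rng.standard_normal(5)


def energy(F):
    # Stand-in convex g applied to the input map; not the paper's trained network
    return np.sum(np.logaddexp(0.0, A @ minors_of_signed_svals(F) + c))


def first_piola(F, h=1e-6):
    """Central-difference approximation of P = dW/dF."""
    P = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            E = np.zeros((3, 3))
            E[i, j] = h
            P[i, j] = (energy(F + E) - energy(F - E)) / (2 * h)
    return P


F = np.eye(3) + 0.2 * rng.standard_normal((3, 3))
P = first_piola(F)
M = P @ F.T  # Kirchhoff stress; balance of angular momentum requires M = M^T
print(np.max(np.abs(M - M.T)))  # should sit at finite-difference / round-off level
```

The symmetry does not rely on g being convex. It follows from the rotation-invariant inputs alone, while convexity of g is what the criterion links to polyconvexity.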
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an input-convex neural network (ICNN) architecture for isotropic hyperelastic energies, formulated using elementary polynomials of the signed singular values of the deformation gradient F. It enforces frame-indifference, polyconvexity, balance of angular momentum, and growth conditions, and claims to prove a universal approximation theorem (UAT) showing that sufficiently large networks of this form can approximate any frame-indifferent isotropic polyconvex energy. Comparative numerical studies against prior networks are included.
Significance. If the UAT holds via a complete representation theorem, the work would provide the first ICNN-based model for this function class that is both constraint-enforcing and dense in the admissible energies, strengthening the theoretical foundation for physics-informed neural networks in computational mechanics. The explicit use of a sufficient-and-necessary criterion for polyconvexity distinguishes it from earlier architectures that lack proven density.
major comments (2)
- [Section stating the sufficient and necessary criterion (likely §3 or the theorem preceding the UAT proof)] The UAT in the abstract and main theorem rests on the claim that the chosen elementary polynomials of the signed singular values constitute a sufficient and necessary representation (every frame-indifferent isotropic polyconvex W is convex in these inputs, and conversely). The manuscript must explicitly prove this equivalence holds pointwise on the domain of admissible F (with det F > 0), including verification that no admissible polyconvex energies are missed and that convexity in the polynomials implies polyconvexity without over-restriction; any gap here directly falsifies the density claim.
- [UAT proof and preceding representation lemma] The proof of the UAT (presumably via the standard ICNN density result on convex functions) must be connected rigorously to the representation: after mapping F to the polynomial inputs, the ICNN approximates any convex function on that input space, but the manuscript needs to confirm that the input map is surjective onto the relevant domain and that the resulting energies satisfy all required growth and frame-indifference conditions without additional restrictions.
minor comments (2)
- [Formulation section] Notation for the elementary polynomials (e.g., how the signed singular values are combined) should be defined once with explicit formulas rather than referenced across sections.
- [Numerical experiments] The comparative studies would benefit from reporting the network sizes and training details used for the polyconvex-hull computations to allow direct replication.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments, which help clarify the presentation of our theoretical results. We address each major comment below.
Point-by-point responses
- Referee: [Section stating the sufficient and necessary criterion (likely §3 or the theorem preceding the UAT proof)] The UAT in the abstract and main theorem rests on the claim that the chosen elementary polynomials of the signed singular values constitute a sufficient and necessary representation (every frame-indifferent isotropic polyconvex W is convex in these inputs, and conversely). The manuscript must explicitly prove this equivalence holds pointwise on the domain of admissible F (with det F > 0), including verification that no admissible polyconvex energies are missed and that convexity in the polynomials implies polyconvexity without over-restriction; any gap here directly falsifies the density claim.
Authors: Theorem 3.1 in the manuscript establishes the pointwise equivalence on {F : det F > 0} by direct appeal to the representation of isotropic polyconvex functions via the signed singular values and their elementary symmetric polynomials; convexity in these inputs is necessary by the standard definition and sufficient because the map recovers the full set of polyconvex invariants without omission or extraneous constraints. We will add an explicit corollary restating the equivalence and domain to make the argument more self-contained. revision: yes
- Referee: [UAT proof and preceding representation lemma] The proof of the UAT (presumably via the standard ICNN density result on convex functions) must be connected rigorously to the representation: after mapping F to the polynomial inputs, the ICNN approximates any convex function on that input space, but the manuscript needs to confirm that the input map is surjective onto the relevant domain and that the resulting energies satisfy all required growth and frame-indifference conditions without additional restrictions.
Authors: The UAT proof (Theorem 4.1) composes the surjective input map (Lemma 3.3) with the known ICNN density result for convex functions on the image domain; frame-indifference and growth conditions hold by construction of the inputs and network, with no further restrictions imposed. We will insert a short paragraph in the proof explicitly verifying surjectivity and preservation of the physical constraints. revision: yes
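The composition step in this response can be written as a short estimate. This is a sketch under stated assumptions, not the paper's Theorem 4.1. Suppose the criterion gives W = g ∘ m on the admissible set. Here m is the continuous map from F to the elementary polynomials of its signed singular values, and g is convex and finite on an open convex set containing m(K), where K is a compact set of admissible deformation gradients (det F > 0). Then m(K) and its convex hull are compact, and g is continuous there. An ICNN density result for convex functions on compact convex sets (assumed here) then supplies a network N with uniform error below ε on that hull:

```latex
\[
\sup_{F\in K}\bigl|W(F)-\mathcal{N}(m(F))\bigr|
  = \sup_{x\in m(K)}\bigl|g(x)-\mathcal{N}(x)\bigr|
  \le \sup_{x\in \operatorname{conv} m(K)}\bigl|g(x)-\mathcal{N}(x)\bigr|
  < \varepsilon
\]
```

The first equality is just the substitution W = g ∘ m. In this form, the estimate uses approximation of g only on the image of K.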
Circularity Check
No significant circularity; UAT rests on external representation criterion
full rationale
The derivation invokes a sufficient and necessary criterion for frame-indifferent isotropic polyconvex functions to justify the input space for the ICNN, allowing the standard universal approximation property of input-convex networks to transfer to the target class. No quoted step reduces the central claim to a self-definition, fitted parameter renamed as prediction, or load-bearing self-citation chain; the representation is treated as an independent mathematical fact. The approach is therefore self-contained against external benchmarks for polyconvexity.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The elementary polynomials of the signed singular values of the deformation gradient form a sufficient and necessary criterion for frame-indifferent isotropic polyconvex functions.
Forward citations
Cited by 1 Pith paper
- Adaptive Material Fingerprinting for the fast discovery of polyconvex feature combinations in isotropic and anisotropic hyperelasticity
An adaptive database and iterative pattern recognition algorithm lets Material Fingerprinting discover arbitrary linear combinations of polyconvex isotropic and anisotropic hyperelastic features from experimental data.