Decision-Aware Evaluation of Physics-Informed Surrogates

Andrzej Czyżewski; Daniel Cieślak

arxiv: 2606.07146 · v1 · pith:FKIQ4Q3Dnew · submitted 2026-06-05 · 💻 cs.LG · cs.CE

Pith reviewed 2026-06-27 22:14 UTC · model grok-4.3

keywords physics-informed neural networks · surrogate models · lattice design · decision metrics · benchmarking · material transfer · regression error · physics-informed losses

The pith

Standard curve-error metrics frequently fail to identify useful lattice designs for engineering decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Physics-informed surrogates for lattice design are usually judged by how closely their predicted force curves match reality. The paper demonstrates that this is not enough when the real goal is to rank candidate designs, reject infeasible ones, and minimize the cost of choosing the wrong one. It introduces a benchmark called pinn-gym that adds decision metrics on top of the usual error measures, using a crush-and-impact simulator and actual polymer materials. Results across different training settings show that physics-informed losses do not improve every metric together and that making the inputs dimensionless helps comparisons without making material transfer work in both directions.

Core claim

Low nRMSE on response curves is frequently insufficient to identify useful design selections. Physics-informed losses alter trade-offs among metrics rather than monotonically improving all of them, and dimensionless conditioning improves comparability without making transfer symmetric.

What carries the argument

The pinn-gym benchmark protocol that measures curve fidelity, physical admissibility, top-k retrieval accuracy, and mass regret on top of a reduced-order oracle for crush and impact.
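
Two of the metrics named above can be sketched in a few lines. This is a minimal illustration, not the benchmark's own code: the range normalisation in `nrmse`, the "lower score is preferred" convention in `precision_at_k`, and both function names are assumptions, and the numbers are toy values rather than results from the paper.

```python
import numpy as np


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the reference curve (assumed normalisation)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    span = y_true.max() - y_true.min()
    if span == 0.0:
        raise ValueError("reference curve is constant; range normalisation undefined")
    return float(rmse / span)


def precision_at_k(scores, oracle_feasible, k=10):
    """Fraction of the k best-scored candidates that the oracle marks feasible.

    Assumed convention: a lower score is preferred (for example, predicted mass).
    """
    scores = np.asarray(scores, dtype=float)
    oracle_feasible = np.asarray(oracle_feasible, dtype=bool)
    if scores.size == 0 or k <= 0:
        raise ValueError("need at least one candidate and k >= 1")
    k = min(k, scores.size)
    top = np.argsort(scores, kind="stable")[:k]  # indices of the k lowest scores
    return float(oracle_feasible[top].mean())


# Toy numbers for illustration only.
print(nrmse([0.0, 1.0, 2.0, 1.5], [0.0, 1.0, 2.0, 2.5]))  # 0.25
print(precision_at_k([3.0, 1.0, 2.0, 4.0], [True, False, True, True], k=2))  # 0.5
```

The two functions take unrelated inputs (a curve versus a ranked candidate pool), which is one way to see why a low value of the first does not by itself constrain the second.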

Load-bearing premise

The transparent reduced-order crush-and-impact oracle, together with the five printable polymer cards and the defined protocol, serves as a valid proxy for real engineering decision outcomes in lattice design.

What would settle it

Showing in physical validation tests that lower-nRMSE surrogates consistently produce better top-k designs and lower regret than higher-nRMSE surrogates would falsify the claim.
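
One way to probe the claim on any set of evaluated models is to count the model pairs that curve error and regret order oppositely. The sketch below is an assumption-laden illustration: the function name, the model labels, and the values are hypothetical placeholders, not figures from the paper.

```python
from itertools import combinations


def discordant_pairs(nrmse_by_model, regret_by_model):
    """Return model pairs that nRMSE and regret rank in opposite order."""
    pairs = []
    for a, b in combinations(sorted(nrmse_by_model), 2):
        d_err = nrmse_by_model[a] - nrmse_by_model[b]
        d_reg = regret_by_model[a] - regret_by_model[b]
        if d_err * d_reg < 0:  # lower error paired with higher regret, or vice versa
            pairs.append((a, b))
    return pairs


# Placeholder values for illustration only.
nrmse_by_model = {"model_a": 0.05, "model_b": 0.08, "model_c": 0.12}
regret_by_model = {"model_a": 0.30, "model_b": 0.10, "model_c": 0.20}

print(discordant_pairs(nrmse_by_model, regret_by_model))
# [('model_a', 'model_b'), ('model_a', 'model_c')]
```

An empty list across many models, materials, and validation runs would indicate that curve error and regret agree; a persistently non-empty list is the kind of disagreement the paper reports.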

Figures

Figures reproduced from arXiv: 2606.07146 by Andrzej Czyżewski, Daniel Cieślak.

**Figure 1.** Overview of the pinn-gym benchmark. A material-aware sampler draws candidate lattice geometries for five printable polymer cards; a declared reduced-order crush-and-impact oracle labels every candidate with a force–displacement curve, absorbed energy, peak force and a feasibility outcome. Force and displacement are nondimensionalised to the target $\hat{f}(\varepsilon) = F/(\sigma_y A_{\mathrm{env}})$, which makes the polymer an input rather…

**Figure 2.** One oracle, five design regimes. (a) Fraction of oracle-feasible candidates in each held-out pool under the fixed fixture; the compliant TPU card (highlighted) is feasible for 64.6% of candidates, whereas the stiff PA-CF card is feasible for only 12.8%. (b) The corresponding candidate-mass envelope (minimum, median and maximum) shows that the cards also occupy different regions of the mass axis. Because fe…

**Figure 3.** Curve accuracy and design utility disagree within a single material card. Per-material surrogate metrics for the three model families across (a) curve fidelity (nRMSE, lower is better), (b) top-10 feasible retrieval (P@10, higher is better) and (c) oracle-violation rate (lower is better). The highlighted case marks a card on which the lowest-curve-error model retrieves no feasible design in its top ten. Re…

**Figure 4.** Pooled material-conditioned evaluation. A single nondimensional surrogate can fit multiple material cards, but curve error, oracle violations and top-k retrieval remain partially decoupled.

**Figure 5.** Dimensionless conditioning aligns scales but leaves cross-material transfer strongly asymmetric. Cross-material transfer of the PINN-energy model across five material cards. Rows denote the material used for training and columns the material used for evaluation; diagonal entries are omitted because self-material performance is reported separately. Panel A shows curve-prediction error as nRMSE (highlighted …
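
The dimensionless force target quoted in the Figure 1 caption, F divided by the product of yield stress and envelope area, can be sketched as follows. Only that ratio comes from the caption; the strain definition (displacement over specimen height), the function name, the `height` parameter, and the numerical values are assumptions made for illustration.

```python
import numpy as np


def nondimensionalise(force, displacement, sigma_y, a_env, height):
    """Map a force-displacement curve to (strain, f_hat).

    f_hat = F / (sigma_y * A_env) follows the Figure 1 caption.
    strain = displacement / height is an assumed definition.
    """
    if sigma_y <= 0 or a_env <= 0 or height <= 0:
        raise ValueError("sigma_y, a_env and height must be positive")
    force = np.asarray(force, dtype=float)
    displacement = np.asarray(displacement, dtype=float)
    strain = displacement / height
    f_hat = force / (sigma_y * a_env)
    return strain, f_hat


# Illustrative SI values, not taken from the released material cards.
strain, f_hat = nondimensionalise(
    force=[0.0, 500.0, 1000.0],         # N
    displacement=[0.0, 0.005, 0.010],   # m
    sigma_y=50e6,                       # Pa
    a_env=1e-4,                         # m^2
    height=0.05,                        # m
)
print(strain, f_hat)  # both approximately [0, 0.1, 0.2]
```

Because the yield stress enters only through the scaling, two materials with different strengths can share one target scale, which is the comparability benefit the paper attributes to dimensionless conditioning.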

Original abstract

Physics-informed machine learning is often assessed by curve error, although engineering use depends on downstream decisions: ranking candidates, avoiding infeasible designs and limiting regret. We introduce pinn-gym, an open benchmark for material-conditioned lattice design that couples a transparent reduced-order crush-and-impact oracle with five printable polymer cards, dimensionless force-response targets and a protocol spanning curve fidelity, physical admissibility, top-k retrieval and mass regret. Across per-material, pooled and cross-material settings, low nRMSE is frequently insufficient to identify useful design selections. Physics-informed losses alter trade-offs rather than monotonically improving all metrics, and dimensionless conditioning improves comparability without making transfer symmetric. The benchmark is not a certified material model; within the released oracle, candidate generator and material cards, pinn-gym provides a reproducible testbed for evaluating PIML surrogates as decision systems rather than curve predictors alone.
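
The abstract names mass regret without defining it here, so the following is only one plausible reading: the surrogate picks the lightest candidate it predicts to be feasible, and regret is the relative excess mass of that pick over the lightest oracle-feasible candidate. The selection rule, the infinite penalty for an infeasible pick, and the function name are all assumptions.

```python
import numpy as np


def mass_regret(mass, pred_feasible, oracle_feasible):
    """Relative excess mass of the surrogate's pick over the oracle-best design.

    Assumed rule: pick the lightest candidate the surrogate predicts feasible.
    Returns inf if nothing is picked or the pick is oracle-infeasible.
    """
    mass = np.asarray(mass, dtype=float)
    pred = np.asarray(pred_feasible, dtype=bool)
    orac = np.asarray(oracle_feasible, dtype=bool)
    if not orac.any():
        raise ValueError("no oracle-feasible candidate; regret is undefined")
    best_mass = mass[orac].min()
    if not pred.any():
        return float("inf")
    candidates = np.flatnonzero(pred)
    pick = candidates[np.argmin(mass[candidates])]
    if not orac[pick]:
        return float("inf")
    return float((mass[pick] - best_mass) / best_mass)


# Toy pool: the surrogate misses the 12 g design and settles for the 15 g one.
print(mass_regret([10.0, 12.0, 15.0, 20.0],
                  [False, False, True, True],
                  [False, True, True, True]))  # 0.25
```

Under this reading a surrogate can fit every curve closely and still incur regret, because regret depends only on which single candidate ends up selected.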

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper ships a new open benchmark (pinn-gym) that tests whether curve error tracks decision quality in material-conditioned lattice design, and the empirical mismatch it reports is worth referee attention even if the oracle remains a proxy.

The main thing here is the release of pinn-gym: a reproducible testbed that pairs a reduced-order crush-and-impact oracle with five printable polymer cards and a protocol that scores models on nRMSE, physical admissibility, top-k retrieval, and mass regret. It runs the same surrogates in per-material, pooled, and cross-material regimes and shows that low curve error frequently fails to surface the best designs while physics-informed losses change the trade-offs rather than improving everything at once. Dimensionless conditioning helps comparability but does not symmetrize transfer. That combination of decision metrics plus the coupled oracle is not already packaged in the literature the abstract cites.

The work is transparent about its limits: the abstract states the oracle is not a certified material model. The stress-test concern about omitted rate-dependent effects or contact nonlinearities is therefore on point; any mismatch between error and regret could be tied to this particular proxy. Without the full methods, data splits, and statistical tests it is also hard to judge how stable the reported insufficiencies of nRMSE actually are.

The paper is aimed at people who already build or evaluate physics-informed surrogates for engineering tasks and want a concrete way to check whether their models are useful for ranking or regret minimization. It is worth sending to referees because the benchmark itself is new, the protocol is specified, and the central observation (curve error is not enough) is falsifiable within the released setup. A revision could tighten the oracle validation or add sensitivity checks, but the core contribution stands on its own.

Referee Report

1 major / 2 minor

Summary. The paper introduces pinn-gym, an open benchmark coupling a transparent reduced-order crush-and-impact oracle with five printable polymer material cards and dimensionless force-response targets. It evaluates physics-informed surrogates across per-material, pooled, and cross-material settings using protocols for curve fidelity (nRMSE), physical admissibility, top-k retrieval, and mass regret. The central empirical claims are that low nRMSE frequently fails to identify useful design selections, that physics-informed losses alter (rather than uniformly improve) decision metrics, and that dimensionless conditioning improves cross-material comparability without symmetric transfer.

Significance. If the reported mismatches between curve error and decision quality hold within the released components, the work supplies a reproducible, decision-oriented testbed that directly addresses a recognized gap in PIML evaluation. The explicit disclaimer that the oracle is not a certified material model, together with the open release of the oracle, candidate generator, and cards, constitutes a concrete strength that enables community scrutiny and extension. The findings provide falsifiable, quantitative evidence that standard error metrics are often insufficient proxies for engineering utility.

major comments (1)

[§4 (experimental protocol) and appendix] The load-bearing assumption for the central claims is the reduced-order oracle's adequacy as a proxy for lattice design decisions. While the abstract and introduction correctly note that the benchmark is not a certified model, the manuscript would benefit from an explicit sensitivity study (e.g., in §4 or the appendix) showing how the reported gaps between nRMSE and top-k/admissibility/regret metrics respond to plausible variations in the oracle's contact or rate-dependent parameters.

minor comments (2)

[§3] Notation for the five polymer cards and the exact definition of the dimensionless conditioning should be consolidated in a single table or subsection to improve readability for readers implementing the benchmark.
[§5.3] The cross-material transfer results would be clearer if the symmetry (or lack thereof) were quantified with an explicit asymmetry metric rather than described qualitatively.
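
As a sketch of what an explicit asymmetry metric could look like, the function below takes a train-by-evaluate transfer matrix and returns the norm of its antisymmetric off-diagonal part relative to the symmetric part. This particular ratio, the function name, and the example matrix are assumptions chosen for illustration; the paper does not specify such a metric.

```python
import numpy as np


def transfer_asymmetry(T):
    """Ratio ||T - T^T|| / ||T + T^T|| over off-diagonal entries.

    T[i, j] is the error when training on material i and evaluating on j.
    Diagonal entries are ignored, so they may be NaN. 0 means symmetric transfer.
    """
    T = np.asarray(T, dtype=float)
    if T.ndim != 2 or T.shape[0] != T.shape[1]:
        raise ValueError("T must be a square matrix")
    off = ~np.eye(T.shape[0], dtype=bool)  # mask selecting off-diagonal entries
    anti = (T - T.T)[off]
    sym = (T + T.T)[off]
    denom = np.linalg.norm(sym)
    if denom == 0.0:
        return 0.0
    return float(np.linalg.norm(anti) / denom)


nan = float("nan")
# Hypothetical nRMSE values; only the pair (0, 1) transfers asymmetrically.
T_example = [[nan, 0.1, 0.2],
             [0.3, nan, 0.1],
             [0.2, 0.1, nan]]
print(transfer_asymmetry(T_example))  # approximately 0.333
```

A single scalar of this kind, reported per metric, would let readers compare asymmetry before and after dimensionless conditioning.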

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive assessment and for recognizing the benchmark's value as a reproducible testbed. We address the major comment below.

Referee: [§4 (experimental protocol) and appendix] The load-bearing assumption for the central claims is the reduced-order oracle's adequacy as a proxy for lattice design decisions. While the abstract and introduction correctly note that the benchmark is not a certified model, the manuscript would benefit from an explicit sensitivity study (e.g., in §4 or the appendix) showing how the reported gaps between nRMSE and top-k/admissibility/regret metrics respond to plausible variations in the oracle's contact or rate-dependent parameters.

Authors: The central empirical claims are explicitly conditioned on the released reduced-order oracle, candidate generator, and material cards, as stated in the abstract and introduction. The benchmark is presented as a transparent, non-certified proxy to enable reproducible evaluation of decision metrics within these components; the open release is intended to support community extensions such as sensitivity analyses. A full sensitivity study on contact or rate-dependent parameters would require additional experiments and computational effort beyond the manuscript's scope of establishing the benchmark and protocols. We therefore do not view such a study as necessary to support the reported mismatches within the provided testbed.

revision: no

Circularity Check

0 steps flagged

No circularity: empirical benchmark with external oracle

The paper introduces pinn-gym as an open benchmark consisting of a reduced-order crush-and-impact oracle, five polymer material cards, dimensionless targets, and evaluation protocols for curve fidelity, admissibility, top-k retrieval and mass regret. All reported findings (low nRMSE insufficient for design selection, physics-informed losses altering trade-offs, dimensionless conditioning effects) are direct empirical observations on this externally defined testbed. No derivation, fitted parameter, uniqueness theorem, or ansatz is presented that reduces to the authors' own prior quantities or self-citations. The work is self-contained against the released oracle and cards; the skeptic concern about oracle fidelity is a correctness question, not a circularity reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only review yields limited visibility into internal assumptions; the central claim rests on the validity of the reduced-order oracle as a decision proxy and on the representativeness of the five polymer cards.

axioms (1)

domain assumption: The reduced-order crush-and-impact oracle accurately represents the physical behaviors relevant to design decisions.
The benchmark treats oracle outputs as ground truth for admissibility and regret calculations.
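
Treating oracle outputs as ground truth makes an admissibility score straightforward to state. The sketch below assumes the oracle-violation rate is the fraction of surrogate-accepted designs that the oracle rejects; that definition and the function name are assumptions, since the review does not give the paper's exact formula.

```python
def oracle_violation_rate(pred_feasible, oracle_feasible):
    """Fraction of surrogate-accepted designs that the oracle marks infeasible.

    Assumed definition; returns 0.0 when the surrogate accepts nothing.
    """
    if len(pred_feasible) != len(oracle_feasible):
        raise ValueError("inputs must have the same length")
    accepted = [o for p, o in zip(pred_feasible, oracle_feasible) if p]
    if not accepted:
        return 0.0
    violations = sum(1 for o in accepted if not o)
    return violations / len(accepted)


# Toy flags: three designs accepted, one of them rejected by the oracle.
print(oracle_violation_rate([True, True, False, True],
                            [True, False, False, True]))  # 0.3333333333333333
```

Any error in the oracle's feasibility labels propagates directly into this rate, which is why the axiom above is load-bearing.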

invented entities (1)

pinn-gym benchmark (no independent evidence)
purpose: Testbed for decision-aware evaluation of PIML surrogates.
Newly defined in the work as the coupling of oracle, material cards and protocol.
