pith.

arxiv: 2605.19263 · v1 · pith:5YKGYW7Unew · submitted 2026-05-19 · 💻 cs.LG · cs.NA · math.NA

From Simple to Complex: Curriculum-Guided Physics-Informed Neural Networks via Gaussian Mixture Models

Pith reviewed 2026-05-20 07:20 UTC · model grok-4.3

classification 💻 cs.LG · cs.NA · math.NA
keywords physics-informed neural networks · curriculum learning · Gaussian mixture models · partial differential equations · residual-based training · adaptive loss weighting

The pith

Gaussian mixture models fitted to PDE residuals drive a curriculum that cuts the relative L2 error of physics-informed neural networks by up to 97.8 percent compared with a standard PINN.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CGMPINN to address the poor convergence of standard physics-informed neural networks on PDEs with strong nonlinearity or sharp features. It periodically fits a Gaussian mixture model to the current residuals across the domain to estimate which locations are easier or harder for the network to learn. Training then follows a smooth curriculum that first focuses on low-difficulty regions and gradually shifts toward high-difficulty ones, while a precision-based modulation down-weights unreliable (high-variance) clusters early on. The theoretical results show that the gradient norm of the resulting time-varying loss converges sublinearly and that this loss remains uniformly equivalent to the original PDE loss. Experiments on six benchmark problems show that the method reaches lower relative L2 and maximum absolute errors than the baselines at comparable training cost.

Core claim

By fitting a Gaussian mixture model to the PDE residual distribution at regular intervals, the method quantifies spatially varying learning difficulty and applies a shared-parameter curriculum schedule that progressively reweights the loss toward harder regions while suppressing unreliable clusters; this produces a time-varying loss whose gradient norm converges sublinearly, remains uniformly equivalent to the standard PDE loss, and yields a generalization bound that explicitly accounts for the induced weighting bias.
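The uniform-equivalence part of this claim can be illustrated under a simple assumption that is not taken from the paper: suppose every curriculum weight stays in a fixed band $0 < w_{\min} \le w_i(t) \le w_{\max}$ for each collocation point $x_i$ and each iteration $t$. Writing $r_\theta(x_i)$ for the PDE residual at $x_i$, the sandwich bound then follows term by term; the paper's own constants and notation may differ.

```latex
\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \lvert r_\theta(x_i)\rvert^2,
\qquad
\mathcal{L}_w(\theta; t) = \frac{1}{N}\sum_{i=1}^{N} w_i(t)\,\lvert r_\theta(x_i)\rvert^2,

w_{\min}\,\mathcal{L}(\theta) \;\le\; \mathcal{L}_w(\theta; t) \;\le\; w_{\max}\,\mathcal{L}(\theta)
\qquad \text{for all } \theta \text{ and } t.
```

In particular, $\mathcal{L}_w(\theta; t) = 0$ exactly when $\mathcal{L}(\theta) = 0$, so driving the weighted loss to zero also drives the unweighted PDE loss to zero. A strictly positive lower bound $w_{\min}$ is what keeps a suppressed cluster from dropping out of the objective entirely.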

What carries the argument

Gaussian mixture model fitted to the PDE residual distribution, which identifies clusters of learning difficulty and supplies weights for the dynamic curriculum schedule.
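As a rough illustration of this step, the sketch below fits a one-dimensional GMM to the log-magnitudes of pointwise residuals with plain expectation-maximization and ranks the components by mean, so that rank 0 marks the lowest-residual (easiest) cluster. Fitting on log|r|, initializing the means at quantiles, and ranking by component mean are assumptions made here, and the function names are hypothetical; the paper may fit a different feature, and a practical implementation would more likely call a library such as scikit-learn's `GaussianMixture`.

```python
import math


def fit_gmm_1d(x, k=3, iters=100):
    """Fit a 1D Gaussian mixture to samples x with plain EM.

    Returns (weights, means, variances, resp), where resp[i][j] is the
    posterior probability that sample i belongs to component j.
    """
    n = len(x)
    xs = sorted(x)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # quantile initialization
    mu = sum(x) / n
    variances = [sum((v - mu) ** 2 for v in x) / n + 1e-6] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample
        resp = []
        for v in x:
            p = [w * math.exp(-(v - m) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)
                 for w, m, s in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against all densities underflowing
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            variances[j] = sum(r[j] * (v - means[j]) ** 2
                               for r, v in zip(resp, x)) / nj + 1e-6
    return weights, means, variances, resp


def difficulty_levels(residuals, k=3):
    """Assign each collocation point a difficulty rank (0 = easiest)."""
    x = [math.log(abs(r) + 1e-12) for r in residuals]
    _, means, variances, resp = fit_gmm_1d(x, k)
    order = sorted(range(k), key=lambda j: means[j])  # low residual -> easy
    rank = {j: level for level, j in enumerate(order)}
    labels = [rank[max(range(k), key=lambda j: r[j])] for r in resp]
    return labels, [means[j] for j in order], [variances[j] for j in order]
```

Working in log space is a design choice for residuals that span several orders of magnitude: it keeps a handful of very large residuals from dominating the fit.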

Load-bearing premise

Fitting a Gaussian mixture model to the current residual distribution reliably identifies spatially varying difficulty levels, and the resulting curriculum schedule improves convergence without introducing harmful bias or instability.

What would settle it

On any of the six benchmark PDEs, train CGMPINN and a standard PINN for the same number of epochs and compare their relative L2 errors: if CGMPINN fails to reduce the error by at least 50 percent relative to the baseline, the claim is undercut.
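This test relies on the two error metrics the paper reports. A minimal sketch, assuming predictions and reference values are plain lists sampled at the same evaluation points; the function names and the `threshold` default (the 50 percent figure from the test) are illustrative rather than taken from the paper's code.

```python
import math


def relative_l2_error(pred, exact):
    """||pred - exact||_2 / ||exact||_2 over a common set of evaluation points."""
    num = math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exact)))
    den = math.sqrt(sum(e ** 2 for e in exact))
    return num / den


def max_abs_error(pred, exact):
    """Largest pointwise absolute error."""
    return max(abs(p - e) for p, e in zip(pred, exact))


def l2_reduction(err_method, err_baseline):
    """Fractional reduction of the method's error relative to the baseline."""
    return 1.0 - err_method / err_baseline


def undercuts_claim(err_method, err_baseline, threshold=0.5):
    """True if the method fails to cut the baseline relative L2 error by `threshold`."""
    return l2_reduction(err_method, err_baseline) < threshold
```

On this scale, the headline "97.8 percent" figure corresponds to `l2_reduction` returning about 0.978.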

Figures

Figures reproduced from arXiv: 2605.19263 by Fujun Cao, Jianan Yang, Junmin Liu, Shuai Li, Xuefei Yan, Yiran Wang.

Figure 1: Architecture of the CGMPINN framework: a neural network approximator, automatic differentiation for PDE-compliant derivatives, the CGM module (GMM fitting, precision modulation, and curriculum scheduling), and an optional self-adaptive loss balancing mechanism.
2.1. Problem Formulation. We consider a general initial-boundary value problem on a bounded Lipschitz domain $\Omega \subset \mathbb{R}^d$ ($d$ is the spatial dimension) …
Figure 2: Optimizer comparison for the 1D Poisson equation: (a) Training loss curves with different optimizers; (b)–(d) Predictions of CGMPINN with Adam, L-BFGS, and Adam→L-BFGS optimizers respectively, compared with the exact solution (the pointwise error in log scale is shown on the auxiliary y-axis).
Figure 3: Performance comparison of different PINN variants for the 1D Poisson equation: (a) Training loss curves of different models; (b) Absolute error probability density distribution of CGMPINN; (c) Pointwise error (log scale) of CGMPINN in the spatial domain; (d) Spatial absolute error distribution and error range of CGMPINN.
… Dirichlet boundary conditions: $u_{xx}(x, y) + u_{yy}(x, y) = f(x, y)$, $(x, y) \in \Omega$; $u(x, y) = g$…
Figure 5: Performance comparison of different PINN variants for the 2D Poisson equation: (a) Training loss curves of different models; (b) Absolute error probability density distribution of CGMPINN; (c) Mean and max absolute error of CGMPINN in the x spatial direction; (d) Mean and max absolute error of CGMPINN in the y spatial direction.
3.3. Heat Equation. We next consider a time-dependent parabolic problem: the 1D hea…
Figure 7: Performance comparison of different PINN variants for the 1D heat equation: (a) Training loss curves of different models; (b) Absolute error probability density distribution of CGMPINN; (c) Temporal evolution of the mean and max absolute error of CGMPINN; (d) Mean and max absolute error of CGMPINN in the x spatial direction.
3.4. Damped Wave Equation. We proceed to a second-order hyperbolic problem: the 1D damp…
Figure 8: Optimizer comparison for the 1D damped wave equation: (a) Training loss curves of different optimizers; (b) Prediction result of CGMPINN with Adam→L-BFGS optimizer; (c) Exact solution of the equation; (d) Absolute error distribution of the prediction.
Figure 9: Performance comparison of different PINN variants for the 1D damped wave equation: (a) Training loss curves of different models; (b) Spatial profiles of CGMPINN prediction at different time steps; (c) Temporal evolution of CGMPINN prediction at x = 0.5 compared with the analytical solution; (d) Absolute and relative error of CGMPINN at x = 0.5.
3.5. Advection-Diffusion Equation. We consider a 1D advection-diffu…
Figure 10: Optimizer comparison for the 1D advection-diffusion equation: (a) Training loss curves of different optimizers; (b) Prediction result of CGMPINN with Adam→L-BFGS optimizer; (c) Exact solution of the equation; (d) Absolute error distribution of the prediction.
Figure 12: Term balance, solution and residual visualization for the 1D advection-diffusion equation: (a)(c)(e) Balance of temporal derivative, advection and diffusion terms at t = 0.25, t = 0.5 and t = 0.75; (b)(d)(f) Predicted solution and PDE residual distribution at the corresponding time steps.
… is $u_t = D\,u_{xx} + r\,u(1 - u)$, $(x, t) \in \Omega \times [0, 2]$; $u(x, 0) = u_0(x)$, $x \in \Omega$; $u(x, t) = g(x, t)$, $(x, t) \in \partial\Omega \times [0, 2]$, …
Figure 14: Performance comparison of different PINN variants for the 1D Fisher-KPP equation: (a) Training loss curves of different models; (b) Wavefront tracking results of different PINN models; (c) Wave speed estimation results of different PINN models; (d) Temporal evolution of wave speed relative error for different models.
… dynamics during front propagation. The PDE residual remains at the $O(10^{-3})$–$O(10^{-4})$ level …
Figure 13: Optimizer comparison for the 1D Fisher-KPP equation: (a) Training loss curves of different optimizers; (b) Prediction result of CGMPINN with Adam→L-BFGS optimizer; (c) Exact solution of the equation; (d) Absolute error distribution of the prediction.
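Every component of the method hinges on the pointwise PDE residual. For the Fisher-KPP problem in Figure 12, the residual of a candidate $u$ is $u_t - D\,u_{xx} - r\,u(1 - u)$. The sketch below evaluates it with central finite differences (a stand-in for the automatic differentiation a PINN would use) and checks it on the classical Ablowitz-Zeppetella traveling wave, an exact solution of this equation for any $D, r > 0$. Whether the paper's benchmark uses this particular solution, and which $D$ and $r$ it uses, is not stated in this summary; the function names are hypothetical.

```python
import math


def fisher_kpp_exact(x, t, D=1.0, r=1.0):
    """Ablowitz-Zeppetella traveling wave for u_t = D u_xx + r u (1 - u)."""
    s = math.sqrt(r / (6.0 * D))
    phi = math.exp(s * x - 5.0 * r * t / 6.0)
    return 1.0 / (1.0 + phi) ** 2


def fisher_kpp_residual(u, x, t, D=1.0, r=1.0, h=1e-3):
    """Pointwise residual u_t - D u_xx - r u (1 - u) via central differences.

    `u` is any callable u(x, t); the truncation error is O(h**2).
    """
    u0 = u(x, t)
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u0 + u(x - h, t)) / h ** 2
    return u_t - D * u_xx - r * u0 * (1 - u0)
```

The exact wave yields a residual near zero (limited only by the finite-difference error), whereas a wrong candidate such as the constant $u = 0.5$ leaves a residual of $-r/4$; residual magnitudes of this kind are what the GMM sorts into difficulty clusters.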
Original abstract

Physics-informed neural networks (PINNs) offer a mesh-free framework for solving partial differential equations (PDEs), yet training often suffers from gradient pathologies, spectral bias, and poor convergence, especially for problems with strong nonlinearity, sharp gradients, or multiscale features. We propose the Curriculum-Guided Gaussian Mixture Physics-Informed Neural Network (CGMPINN), which integrates Gaussian mixture modeling with dynamic curriculum learning. Specifically, a GMM is periodically fitted to the PDE residual distribution to quantify spatially varying learning difficulty. A smooth curriculum schedule progressively shifts training focus from easy to harder regions, while precision-based variance modulation suppresses unreliable clusters during early optimization. This dual curriculum is governed by a shared curriculum parameter and can be combined with self-adaptive loss balancing. We further establish theoretical guarantees, including sublinear convergence of the gradient norm for the induced time-varying loss, uniform equivalence between the curriculum-weighted and standard PDE losses, and a generalization bound with an explicit weighting-induced bias characterization. Experiments on six benchmark PDEs spanning elliptic, parabolic, hyperbolic, advection-dominated, and nonlinear reaction-diffusion types show that CGMPINN consistently achieves the lowest relative $L_2$ and maximum absolute errors among all compared methods, reducing relative $L_2$ error by up to 97.8% over the standard PINN at comparable cost. Our code is publicly available at https://github.com/Mathematics-Yang/CGMPINN.
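The abstract describes the dual curriculum only in words. The sketch below shows one plausible way a single parameter $\tau \in [0, 1]$ could drive both parts at once. The linear easy-to-hard interpolation, the precision exponent $1 - \tau$, the floor `w_min`, and both function names are assumptions for illustration, not the formulas used in the paper or its repository.

```python
def cluster_weights(levels, variances, tau, w_min=0.1):
    """Hypothetical per-cluster loss weights for a shared curriculum parameter tau in [0, 1].

    levels[k]    -- difficulty rank of cluster k (0 = easiest)
    variances[k] -- GMM variance of cluster k (precision = 1 / variance)
    """
    kmax = max(levels) or 1
    precisions = [1.0 / v for v in variances]
    pmax = max(precisions)
    weights = []
    for level, prec in zip(levels, precisions):
        d = level / kmax                            # normalized difficulty in [0, 1]
        focus = (1 - tau) * (1 - d) + tau * d       # easy clusters first, hard clusters last
        modulation = (prec / pmax) ** (1 - tau)     # damp low-precision clusters while tau is small
        weights.append(max(w_min, focus * modulation))  # floor keeps every weight positive
    return weights


def weighted_pde_loss(residuals, labels, weights):
    """Mean of weights[label_i] * r_i**2 over collocation points."""
    return sum(weights[l] * r * r for r, l in zip(residuals, labels)) / len(residuals)
```

At `tau = 0` the easiest, most precise cluster dominates; at `tau = 1` the precision modulation vanishes and the hardest cluster carries the largest weight. The floor `w_min` is a deliberate choice so that no region ever drops out of the loss.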

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Curriculum-Guided Gaussian Mixture Physics-Informed Neural Networks (CGMPINN) that periodically fit a Gaussian Mixture Model to the PDE residual distribution to quantify spatially varying learning difficulty, then apply a smooth curriculum schedule (governed by a shared curriculum parameter) to progressively emphasize harder regions while using precision-based variance modulation to suppress unreliable clusters. The method can be combined with self-adaptive loss balancing. Theoretical claims include sublinear convergence of the gradient norm for the induced time-varying loss, uniform equivalence between the curriculum-weighted and standard PDE losses, and a generalization bound with explicit weighting-induced bias characterization. Experiments across six benchmark PDEs (elliptic, parabolic, hyperbolic, advection-dominated, and nonlinear reaction-diffusion) report that CGMPINN achieves the lowest relative L2 and maximum absolute errors, with reductions up to 97.8% versus standard PINNs at comparable cost. Public code is provided.

Significance. If the core premise holds—that GMM clustering on residuals produces clusters whose ordering by precision or variance corresponds to genuine optimization difficulty without harmful bias or instability—the approach could meaningfully advance PINN training for problems with sharp gradients or multiscale features. The combination of dynamic curriculum, theoretical guarantees (sublinear convergence, uniform equivalence, generalization bound), and open-source implementation would be a positive contribution to the field.

major comments (2)
  1. [Abstract (paragraph on GMM fitting and curriculum schedule)] The central empirical claim (lowest errors on six PDEs, up to 97.8% L2 reduction) rests on the premise that periodically refitting a GMM to the current residual field produces clusters whose ordering corresponds to genuine spatially varying optimization difficulty. The abstract states that precision-based variance modulation suppresses unreliable clusters, yet provides no derivation showing that the GMM parameters remain stable across fitting intervals or that the shared curriculum parameter avoids over-weighting noisy early residuals. If the residual landscape is dominated by initialization artifacts rather than PDE features, the curriculum could systematically delay learning in critical regions.
  2. [Abstract (description of dual curriculum)] The curriculum parameter is shared and governs both the difficulty-based GMM weighting and the precision-based variance modulation. If the schedule is tuned to the same residuals used for evaluation, the reported gains could partly reflect fitting rather than independent prediction. The manuscript should clarify whether the curriculum schedule is determined independently of the evaluation residuals or provide an ablation isolating this effect.
minor comments (2)
  1. Clarify the exact loss formulations and how the time-varying curriculum weights are incorporated into the overall objective; consistent notation across equations would improve readability.
  2. The generalization bound includes an explicit weighting-induced bias characterization; a brief discussion of how this bias scales with the number of GMM components or fitting frequency would strengthen the theoretical section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript proposing CGMPINN. We address each of the major comments below, providing clarifications and indicating planned revisions to strengthen the presentation.

Point-by-point responses
  1. Referee: [Abstract (paragraph on GMM fitting and curriculum schedule)] The central empirical claim (lowest errors on six PDEs, up to 97.8% L2 reduction) rests on the premise that periodically refitting a GMM to the current residual field produces clusters whose ordering corresponds to genuine spatially varying optimization difficulty. The abstract states that precision-based variance modulation suppresses unreliable clusters, yet provides no derivation showing that the GMM parameters remain stable across fitting intervals or that the shared curriculum parameter avoids over-weighting noisy early residuals. If the residual landscape is dominated by initialization artifacts rather than PDE features, the curriculum could systematically delay learning in critical regions.

    Authors: We appreciate the referee's point regarding the potential influence of initialization artifacts on the residual landscape and the need for stability in GMM fitting. In the full manuscript, the GMM is refitted at regular intervals to the current PDE residual distribution, allowing the clusters to evolve with the optimization process rather than being fixed from the initial noisy residuals. The precision-based variance modulation explicitly downweights clusters with high variance (indicating unreliability), which mitigates the impact of early-stage noise. Regarding the shared curriculum parameter, it is designed to provide a unified progression from easy to hard regions across both the difficulty-based weighting and the precision-based variance modulation. While we do not provide a formal derivation of GMM parameter stability in the current version, empirical results across multiple PDEs demonstrate consistent improvements, suggesting that the adaptive fitting captures genuine difficulty variations. We will revise the abstract and add a section discussing the evolution of residuals and GMM stability to address this concern. revision: partial

  2. Referee: [Abstract (description of dual curriculum)] The curriculum parameter is shared and governs both the difficulty-based GMM weighting and the precision-based variance modulation. If the schedule is tuned to the same residuals used for evaluation, the reported gains could partly reflect fitting rather than independent prediction. The manuscript should clarify whether the curriculum schedule is determined independently of the evaluation residuals or provide an ablation isolating this effect.

    Authors: The curriculum schedule is governed by a shared parameter that evolves according to a smooth, predefined progression (e.g., increasing emphasis on harder clusters over training epochs), independent of the specific evaluation residuals used for final error reporting. The GMM fitting occurs during training on the training residuals, while evaluation is performed post-training on held-out or full-domain points. To further isolate the effect and rule out any overfitting to evaluation data, we will include an additional ablation study in the revised manuscript comparing the dynamic curriculum against a static or independently scheduled curriculum. revision: yes
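The rebuttal describes the schedule as a smooth, predefined progression that never reads evaluation data. A minimal sketch of such a schedule, assuming a half-cosine ramp for the shared parameter $\tau$ and a fixed GMM refitting interval; both are illustrative choices, and the function names are hypothetical rather than taken from the paper's code.

```python
import math


def curriculum_tau(epoch, total_epochs):
    """Hypothetical smooth schedule for the shared curriculum parameter tau.

    Depends only on the epoch counter (never on evaluation data) and rises
    monotonically from 0 to 1 along a half-cosine ramp, then stays at 1.
    """
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return 0.5 * (1.0 - math.cos(math.pi * progress))


def should_refit_gmm(epoch, interval):
    """Refit the residual GMM every `interval` epochs (the interval is an assumed hyperparameter)."""
    return epoch % interval == 0
```

Because `curriculum_tau` takes only the epoch counter, the ablation the referee asks for amounts to replacing it with a constant function while keeping everything else fixed.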

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces CGMPINN by periodically fitting a GMM to the PDE residual field and using the resulting clusters to drive a curriculum schedule governed by a shared parameter. The claimed theoretical results (sublinear gradient-norm convergence for the time-varying loss, uniform equivalence to the standard PINN loss, and a generalization bound with explicit bias term) are presented as derived consequences of the weighted loss formulation and standard optimization analysis. No load-bearing step reduces by construction to a fitted parameter renamed as a prediction, a self-citation chain, or an ansatz smuggled through prior work. The empirical error reductions on the six benchmark PDEs are reported as independent experimental outcomes rather than tautological consequences of the fitting procedure itself. The derivation chain is therefore self-contained, and its empirical claims are tested against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the premise that residual-based GMM clustering provides a faithful difficulty measure and that the induced curriculum does not alter the underlying PDE solution set. No new physical entities are introduced. One free parameter (shared curriculum parameter) controls the schedule.

free parameters (1)
  • shared curriculum parameter
    Controls the progressive shift from easy to hard regions and the precision-based variance modulation; its value is not derived from first principles.
axioms (1)
  • (standard math) Neural networks can approximate solutions to the target PDEs under standard regularity assumptions.
    Implicit background assumption for all PINN methods; invoked when claiming the curriculum-weighted loss remains equivalent to the standard PDE loss.

pith-pipeline@v0.9.0 · 5806 in / 1430 out tokens · 33370 ms · 2026-05-20T07:20:16.770729+00:00 · methodology


