Two-scale neural networks for optimal control of linear convection-dominated equations
Pith reviewed 2026-05-20 01:21 UTC · model grok-4.3
The pith
Two-scale neural networks with separate state and adjoint networks and rescaled features solve optimal control problems for convection-dominated equations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that optimal control problems governed by convection-dominated equations can be solved effectively by combining three ingredients: spatial inputs augmented with rescaled features; separate neural networks for the state and adjoint variables, with centers chosen to align with their respective layer locations; and successive training that gradually decreases the diffusion coefficient.
What carries the argument
A two-scale neural network architecture carries the argument: it augments the spatial input with rescaled features and uses separate networks, with different center points, for the state and the adjoint.
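The input augmentation can be sketched in a few lines. This is a minimal illustration built on the augmented vector (x, ε^γ(x−xc), ε^γ) quoted further down this page; the function name, the default exponent gamma = -0.5, and the center values are assumptions made for the example, not settings confirmed by the paper.

```python
from typing import List, Sequence


def two_scale_features(x: Sequence[float], center: Sequence[float],
                       eps: float, gamma: float = -0.5) -> List[float]:
    """Return the augmented input (x, eps**gamma * (x - center), eps**gamma).

    gamma = -0.5 is an assumed default; the page does not state the exponent.
    """
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    if len(x) != len(center):
        raise ValueError("x and center must have the same dimension")
    scale = eps ** gamma
    # Stretched coordinate: O(1) variation across a layer located at `center`.
    stretched = [scale * (xi - ci) for xi, ci in zip(x, center)]
    return list(x) + stretched + [scale]


# Hypothetical 1D centers: state layer near x = 1, adjoint layer near x = 0.
eps = 1e-4
state_in = two_scale_features([0.99], center=[1.0], eps=eps)
adjoint_in = two_scale_features([0.99], center=[0.0], eps=eps)
```

Each network would receive its own augmented vector, so the same physical point x = 0.99 is seen as close to the layer by the state network and far from it by the adjoint network.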
Load-bearing premise
The method assumes that suitably chosen center points and rescaled features align with the actual layer locations of both the state and the adjoint across the full range of diffusion coefficients considered.
What would settle it
If numerical tests with very small diffusion coefficients, such as 10^-8, showed that the networks fail to resolve the layers even with adjusted centers, the effectiveness of the two-scale approach would be in doubt.
Original abstract
We propose a two-scale neural network method for optimal control problems governed by convection-dominated convection-diffusion-reaction equations. Building on two-scale architectures developed for singularly perturbed forward problems, we augment the spatial input with suitably rescaled features that become increasingly important as the diffusion coefficient becomes small. The approach employs separate neural networks for the state and adjoint state variables of the optimality system, reflecting the fact that these quantities develop sharp layers in different parts of the domain due to opposite convection fields. By choosing different center points for the two networks, the architecture naturally aligns with the layer location of each variable. We present two formulations of the method, one based on the first-order optimality conditions and another using penalization of the PDE constraint, and combine them with a successive training strategy that gradually decreases the diffusion coefficient toward its target value. Numerical experiments on benchmark problems illustrate the effectiveness and behavior of the proposed approach.
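The "opposite convection fields" mentioned in the abstract can be made concrete with a standard linear-quadratic model problem. The block below is a sketch under assumed notation (state y, control u, adjoint p, convection field β, reaction coefficient σ, regularization weight α, target y_d, homogeneous Dirichlet conditions, no control constraints); the paper's exact formulation may differ.

```latex
% Assumed model problem:
\min_{y,u}\; J(y,u) = \tfrac12 \|y - y_d\|_{L^2(\Omega)}^2
                    + \tfrac{\alpha}{2}\|u\|_{L^2(\Omega)}^2
\quad\text{subject to}\quad
-\varepsilon \Delta y + \beta\cdot\nabla y + \sigma y = f + u \ \text{in } \Omega,
\qquad y = 0 \ \text{on } \partial\Omega.

% First-order optimality system (state, adjoint, gradient condition):
\begin{aligned}
-\varepsilon \Delta y + \beta\cdot\nabla y + \sigma y &= f + u
  && \text{in } \Omega, \quad y = 0 \ \text{on } \partial\Omega, \\
-\varepsilon \Delta p - \nabla\cdot(\beta p) + \sigma p &= y - y_d
  && \text{in } \Omega, \quad p = 0 \ \text{on } \partial\Omega, \\
\alpha u + p &= 0 && \text{in } \Omega.
\end{aligned}
```

For a divergence-free β the adjoint convection term reduces to −β·∇p, so the adjoint is transported in the direction opposite to the state. This is why the two variables are expected to develop layers at different parts of the boundary.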
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a two-scale neural network method for optimal control problems governed by linear convection-dominated convection-diffusion-reaction equations. Separate networks are used for the state and adjoint, each augmented with rescaled features that gain importance for small diffusion; distinct center points are chosen to align with the opposing layer locations induced by the convection fields. Two formulations are given (first-order optimality conditions and PDE-constraint penalization), both paired with a successive training schedule that lowers the diffusion coefficient toward the target value. Effectiveness is illustrated by numerical experiments on standard benchmark problems.
Significance. If the reported performance is robust, the work supplies a concrete architectural adaptation of two-scale networks to optimality systems, addressing the fact that state and adjoint develop layers in different subdomains. The successive-training strategy and explicit separation of networks constitute a practical response to the known difficulties of standard PINNs with sharp interior layers. The benchmark results, if reproducible, would constitute useful evidence that the method can resolve the coupled system without requiring mesh adaptation.
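The successive training schedule described in the summary can be sketched as a warm-started loop over decreasing diffusion coefficients. This is a minimal sketch: the geometric reduction factor and the `train_stage` callback are hypothetical placeholders, since the page does not specify how the coefficient is lowered or how each stage is trained.

```python
from typing import Any, Callable, List


def epsilon_schedule(eps_start: float, eps_target: float,
                     factor: float = 0.1) -> List[float]:
    """Geometrically decreasing diffusion coefficients ending at eps_target."""
    if not (0.0 < eps_target <= eps_start):
        raise ValueError("require 0 < eps_target <= eps_start")
    if not (0.0 < factor < 1.0):
        raise ValueError("factor must lie in (0, 1)")
    values = []
    eps = eps_start
    # Small tolerance so floating-point drift does not add a near-duplicate stage.
    while eps > eps_target * (1.0 + 1e-9):
        values.append(eps)
        eps *= factor
    values.append(eps_target)  # always finish exactly at the target
    return values


def successive_training(train_stage: Callable[[Any, float], Any],
                        eps_values: List[float], params: Any = None) -> Any:
    """Run one training stage per eps, warm-starting from the previous stage."""
    for eps in eps_values:
        params = train_stage(params, eps)
    return params


# Dummy stage that only records which eps it was called with.
history: List[float] = []


def dummy_stage(params: Any, eps: float) -> dict:
    history.append(eps)
    return {"trained_at": eps}


final_params = successive_training(dummy_stage, epsilon_schedule(1e-1, 1e-4))
```

In an actual experiment `dummy_stage` would be replaced by an optimizer loop that minimizes the chosen loss at the given diffusion coefficient, starting from the parameters of the previous stage.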
major comments (2)
- [§3.1] §3.1 (Architecture description): the claim that different center points 'naturally align with the layer location of each variable' is load-bearing for the two-scale advantage, yet the centers appear to be fixed a priori and chosen by hand for the reported benchmarks. Because the true layer positions depend on the unknown optimal control and become increasingly sensitive as the diffusion coefficient decreases, it is unclear whether the same fixed centers remain effective when the control changes or when diffusion is lowered further in the successive-training loop.
- [§4] §4 (Numerical experiments): the reported error tables compare the two-scale method only against a standard single-network PINN on the same benchmark set; no ablation is shown that isolates the contribution of the rescaled features versus the choice of centers. Without such controls it is difficult to confirm that the observed improvement stems from the two-scale construction rather than from the successive-training schedule alone.
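The ablation requested in the second major comment can be laid out as a small grid of runs. The sketch below is only one way to organize it, and all mode names are hypothetical labels. Because the centers enter the input only through the rescaled feature ε^γ(x−xc), a run without rescaled features has no center to vary, so three feature modes are enough.

```python
from itertools import product

# Hypothetical labels for the architectural variants.
FEATURE_MODES = ("no_rescaled_features", "shared_center", "distinct_centers")
# Hypothetical labels for the training variants.
TRAINING_MODES = ("direct_at_target_eps", "successive")

# One configuration per combination of architecture and training schedule.
ablation_runs = [
    {"features": features, "training": training}
    for features, training in product(FEATURE_MODES, TRAINING_MODES)
]

for run in ablation_runs:
    print(f"{run['features']:>22s} | {run['training']}")
```

Comparing runs that share a training mode isolates the architecture, while comparing runs that share a feature mode isolates the schedule.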
minor comments (2)
- [§2.2] Notation for the rescaled feature functions is introduced without an explicit formula in the main text; a compact definition (perhaps as an equation) would improve readability.
- [§3.3] In the penalization formulation, the weighting parameter between the PDE residual and the cost functional is stated to be fixed, but its dependence (or lack thereof) on the diffusion coefficient is not discussed; a brief remark on this choice would clarify robustness.
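For orientation, a penalized objective of the kind discussed in the second minor comment could take the following form. This is an assumed sketch, not the paper's stated loss: y_θ and u_φ denote network approximations of the state and control, ε is the diffusion coefficient, β the convection field, σ the reaction coefficient, f the source, y_d the target state, α the regularization weight, and μ > 0 the fixed penalty weight.

```latex
% Assumed penalized objective: cost functional plus weighted PDE residual.
\mathcal{L}_\mu(\theta,\phi)
= \tfrac12 \|y_\theta - y_d\|_{L^2(\Omega)}^2
+ \tfrac{\alpha}{2}\|u_\phi\|_{L^2(\Omega)}^2
+ \tfrac{\mu}{2}\,\bigl\| -\varepsilon \Delta y_\theta
    + \beta\cdot\nabla y_\theta + \sigma y_\theta - f - u_\phi
  \bigr\|_{L^2(\Omega)}^2 .
```

Written this way, the referee's question is whether a single value of μ remains appropriate as ε decreases through the successive training stages.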
Simulated Author's Rebuttal
We thank the referee for the thorough review and valuable suggestions. We address each major comment in detail below and outline the changes we plan to make in the revised manuscript.
-
Referee: [§3.1] §3.1 (Architecture description): the claim that different center points 'naturally align with the layer location of each variable' is load-bearing for the two-scale advantage, yet the centers appear to be fixed a priori and chosen by hand for the reported benchmarks. Because the true layer positions depend on the unknown optimal control and become increasingly sensitive as the diffusion coefficient decreases, it is unclear whether the same fixed centers remain effective when the control changes or when diffusion is lowered further in the successive-training loop.
Authors: The convection field is prescribed and fixed in the problem formulation, and the locations of the sharp layers in the state and adjoint variables are determined by the convection direction and the domain boundaries. The optimal control acts as a forcing term that affects the magnitude of the solution but does not shift the layer positions in this linear setting. Therefore, the a priori choice of distinct center points for the state and adjoint networks, aligned with the opposing convection directions, remains valid independently of the specific control. In the successive training procedure, the centers are held fixed as the diffusion coefficient is decreased, and the numerical experiments confirm that the approximation quality is maintained. We will revise Section 3.1 to include a more detailed explanation of this choice and its independence from the control.
revision: partial
-
Referee: [§4] §4 (Numerical experiments): the reported error tables compare the two-scale method only against a standard single-network PINN on the same benchmark set; no ablation is shown that isolates the contribution of the rescaled features versus the choice of centers. Without such controls it is difficult to confirm that the observed improvement stems from the two-scale construction rather than from the successive-training schedule alone.
Authors: We acknowledge that the current experiments do not include an ablation study to separate the effects of the rescaled features and the distinct centers from the successive training. To address this, we will add new numerical results in the revised version of Section 4. These will include comparisons of the proposed method against ablated versions: one without rescaled features and one using identical centers for both networks, all under the same successive training schedule. This will help demonstrate the specific contributions of the two-scale architecture.
revision: yes
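The layer-location argument in the response to the §3.1 comment can be checked on a one-dimensional toy problem. The example is not taken from the paper: it uses the equation −ε y'' + b y' = 1 on (0, 1) with y(0) = y(1) = 0 and a constant forcing, where b > 0 stands in for the state convection and b < 0 for the reversed adjoint convection. The function names are illustrative.

```python
import math


def exact_solution(x: float, eps: float, b: float) -> float:
    """Solution of -eps*y'' + b*y' = 1 on (0, 1) with y(0) = y(1) = 0, b != 0."""
    if b < 0.0:
        # Reflecting x -> 1 - x flips the sign of the convection term.
        return exact_solution(1.0 - x, eps, -b)
    decay = math.exp(-b / eps)
    # Written with non-positive exponents to avoid overflow for small eps.
    layer = (math.exp(b * (x - 1.0) / eps) - decay) / (1.0 - decay)
    return (x - layer) / b


def steepest_point(eps: float, b: float, n: int = 2000) -> float:
    """Midpoint of the grid cell where the finite-difference slope is largest."""
    h = 1.0 / n
    best_x, best_slope = 0.0, -1.0
    for i in range(n):
        left = exact_solution(i * h, eps, b)
        right = exact_solution((i + 1) * h, eps, b)
        slope = abs(right - left) / h
        if slope > best_slope:
            best_x, best_slope = (i + 0.5) * h, slope
    return best_x


state_layer = steepest_point(eps=1e-3, b=1.0)     # layer at the outflow end x = 1
adjoint_layer = steepest_point(eps=1e-3, b=-1.0)  # layer at the opposite end x = 0
```

In this toy setting the layer sits at the outflow boundary of the convection field, so reversing the sign of b moves it to the other end of the interval. Whether the same holds for the coupled optimality system with an unknown control is the point under discussion above.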
Circularity Check
Proposed two-scale NN method validated on external benchmarks with no reduction to self-defined inputs
full rationale
The paper introduces a two-scale neural network architecture augmented with rescaled features and separate networks for state and adjoint variables, combined with successive training, to address optimal control problems for convection-dominated equations. It explicitly builds on prior two-scale methods for forward problems and demonstrates effectiveness through numerical experiments on independent benchmark problems. No load-bearing step in the method description or results reduces by construction to a fitted parameter, self-citation chain, or internal definition; the central claims rest on external validation rather than tautological equivalence to the inputs.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "The two-scale network takes the augmented vector (x, ε^γ(x−xc), ε^γ) … By choosing different center points for the two networks, the architecture naturally aligns with the layer location of each variable."
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "successive training strategy that gradually decreases the diffusion coefficient toward its target value"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Paper authors: Sijing Liu, Marcus Sarkis, Yi Zhang, and Zhongqiang Zhang.