Optimization of randomized neural networks for transfer operator approximation
Pith reviewed 2026-05-25 05:07 UTC · model grok-4.3
The pith
Optimizing the activation function alone in randomized neural networks produces better dictionaries for transfer operator approximation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By optimizing the activation function of a randomized neural network while keeping its randomly initialized weights and biases fixed, a more suitable dictionary can be obtained for the data-driven approximation of transfer operators associated with complex dynamical systems.
What carries the argument
An algorithm that optimizes the activation function in RaNNDy while keeping randomly initialized weights and biases fixed.
If this is right
- The closed-form training of the output layer and low overall training cost of RaNNDy are preserved.
- Improved dictionaries become available for approximating transfer operators of stochastic differential equations.
- The same fixed-weight network can be adapted to random walks on graphons without retraining hidden-layer parameters.
- Dictionary quality improves without the computational expense of fully optimizing all network weights.
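The closed-form training mentioned in the first bullet can be illustrated with a minimal Python sketch. It assumes an EDMD-style least-squares fit on a dictionary of the form σ(w_i · x + b_i). The function names, the tanh activation, and the toy Ornstein-Uhlenbeck data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def random_dictionary(X, W, b, sigma=np.tanh):
    """Evaluate psi_i(x) = sigma(w_i . x + b_i) for every sample; shape (m, N)."""
    return sigma(X @ W.T + b)

def fit_output_layer(X, Y, W, b, sigma=np.tanh):
    """Least-squares matrix K such that Psi(X) @ K approximates Psi(Y)."""
    PX = random_dictionary(X, W, b, sigma)
    PY = random_dictionary(Y, W, b, sigma)
    K, *_ = np.linalg.lstsq(PX, PY, rcond=None)
    return K

rng = np.random.default_rng(0)
d, N, m = 1, 20, 2000            # state dimension, dictionary size, samples

# Hidden-layer parameters: drawn once and never updated.
W = rng.normal(size=(N, d))
b = rng.normal(size=N)

# Toy data (assumption): one Euler-Maruyama step of dX = -X dt + dW.
h = 0.1
X = rng.normal(size=(m, d))
Y = X - h * X + np.sqrt(h) * rng.normal(size=(m, d))

K = fit_output_layer(X, Y, W, b)
eigvals = np.linalg.eigvals(K)   # spectrum of the finite-dimensional approximation
```

Only `K` is fitted, and it comes from a single linear least-squares solve, so changing `sigma` leaves the cost of this step unchanged.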
Where Pith is reading between the lines
- The same activation-tuning step could be inserted into other randomized architectures that rely on fixed hidden layers.
- Sensitivity of approximation quality to the initial random draw of weights may decrease once the activation is free to adjust.
- A two-stage procedure emerges in which randomization sets the scale and activation tuning refines the shape of the basis.
Load-bearing premise
Adjusting only the activation function is enough to overcome the restriction that fixed random weights and biases place on the basis functions.
What would settle it
A benchmark dynamical system on which the optimized activation function produces no reduction in approximation error compared with standard choices would falsify the central efficacy claim.
Original abstract
RaNNDy is a randomized neural network architecture for the data-driven approximation of transfer operators associated with complex dynamical systems. The weights and biases of the hidden layers of the network are randomly initialized and kept fixed, only the output layer is trained. This has several advantages over fully optimized neural networks, notably a closed-form solution for the output layer and significantly lower training costs. Despite these advantages, RaNNDy is restricted to the initial selection of weights and biases that parametrize the basis functions required for the operator approximation. Since the basis functions are determined by the activation function, choosing an appropriate activation function for the hidden layers is crucial. In this work, we propose an algorithm that optimizes the activation function itself, while keeping the weights and biases in the randomized neural network fixed, providing a more suitable dictionary. We illustrate the efficacy of the approach using various benchmark problems, including stochastic differential equations and random walks on graphons.
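One way to write down the structure described in the abstract is as a nested problem: an inner linear least-squares fit of the output layer and an outer search over the activation. The parametrized family σ_θ and the objective J below are placeholders; the abstract does not specify either, so this is a sketch under those assumptions rather than the paper's formulation.

```latex
\begin{align*}
\psi_i(x;\theta) &= \sigma_\theta\!\left(w_i^\top x + b_i\right), \qquad i = 1,\dots,N, \\
K(\theta) &= \Psi_X(\theta)^{+}\,\Psi_Y(\theta), \\
\theta^\ast &\in \operatorname*{arg\,min}_{\theta}\; J\bigl(K(\theta)\bigr).
\end{align*}
```

Here Ψ_X(θ) and Ψ_Y(θ) are the m × N matrices whose k-th rows contain the dictionary evaluated at the sample x_k and at its time-lagged counterpart y_k, and the superscript + denotes the pseudoinverse. The weights w_i and biases b_i are fixed throughout, so only θ changes the dictionary.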
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces RaNNDy, a randomized neural network architecture for data-driven approximation of transfer operators in dynamical systems. Weights and biases in hidden layers are randomly initialized and fixed, with only the output layer trained; the new contribution is an algorithm that optimizes the activation function itself to produce a more suitable dictionary of basis functions while leaving the random parameters unchanged. Efficacy is illustrated on benchmarks including stochastic differential equations and random walks on graphons.
Significance. If the optimization of the activation function demonstrably improves the dictionary beyond what fixed random features allow, the method would offer a low-cost way to adapt randomized networks for operator approximation without full retraining or additional random features. This could be useful for high-dimensional or graph-based dynamical systems where standard random dictionaries are insufficient.
major comments (2)
- [Abstract, §3] Abstract and the description of the proposed algorithm: the claim that optimizing the activation function overcomes the restriction imposed by the initial random weights and biases is not supported. Basis functions remain of the form σ(w_i · x + b_i) with fixed random w_i, b_i; varying σ only warps the nonlinearity along those fixed projections and cannot recover directions missed by the random feature map. The manuscript must supply either a theoretical argument showing how the optimized σ expands the spanned space or a concrete numerical test (e.g., a low-dimensional example where the initial random dictionary fails but the optimized-σ version succeeds).
- [§4] §4 (numerical experiments): the reported improvements on SDE and graphon benchmarks lack quantitative comparison to the baseline RaNNDy with standard activations, error bounds, or ablation on the number of random features. Without these, it is impossible to determine whether the activation optimization compensates for poor random projections or merely refines an already adequate dictionary.
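The first major comment rests on a simple invariance that can be checked numerically: if every w_i misses a direction, the features σ(w_i · x + b_i) are constant along that direction for any σ. The following Python sketch uses a deliberately degenerate draw in which all weights lie along the first coordinate axis; the setup and the list of activations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50

# Deliberately poor draw: every weight vector points along the x1-axis.
W = np.column_stack([rng.normal(size=N), np.zeros(N)])
b = rng.normal(size=N)

def features(X, sigma):
    """Dictionary sigma(w_i . x + b_i) evaluated on all samples; shape (m, N)."""
    return sigma(X @ W.T + b)

X = rng.uniform(-1.0, 1.0, size=(200, 2))
X_shifted = X + np.array([0.0, 0.7])   # move the samples along x2 only

activations = {
    "tanh": np.tanh,
    "relu": lambda z: np.maximum(z, 0.0),
    "sin": np.sin,
    "gauss": lambda z: np.exp(-z**2),
}

# Largest change in any feature value caused by the shift along x2.
max_change = {
    name: np.max(np.abs(features(X, sigma) - features(X_shifted, sigma)))
    for name, sigma in activations.items()
}
```

Every entry of `max_change` is zero up to rounding, so no choice of activation lets this dictionary represent an observable that varies with x2. A generic Gaussian draw is not this degenerate, but the same argument applies to any direction orthogonal to all of the w_i.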
minor comments (2)
- [§3] Notation for the optimized activation function should be introduced explicitly with an equation rather than described only in prose.
- [Abstract, §4] The abstract states efficacy on benchmarks but the main text should include a table summarizing quantitative metrics (e.g., approximation error, runtime) across methods.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below, clarifying the scope of our contribution and committing to revisions where the manuscript requires strengthening.
-
Referee: [Abstract, §3] Abstract and the description of the proposed algorithm: the claim that optimizing the activation function overcomes the restriction imposed by the initial random weights and biases is not supported. Basis functions remain of the form σ(w_i · x + b_i) with fixed random w_i, b_i; varying σ only warps the nonlinearity along those fixed projections and cannot recover directions missed by the random feature map. The manuscript must supply either a theoretical argument showing how the optimized σ expands the spanned space or a concrete numerical test (e.g., a low-dimensional example where the initial random dictionary fails but the optimized-σ version succeeds).
Authors: We agree that the current wording in the abstract and §3 can be read as implying that activation optimization expands the linear span beyond the random projections, which is not the case. The method improves the dictionary by selecting a nonlinearity that better matches the target operator along the fixed random projections w_i · x + b_i; it does not recover missed directions. We will revise the abstract and §3 to state explicitly that the optimization mitigates the restriction on the choice of activation function for a given random feature map, without claiming to overcome limitations of the random weights themselves. We will also add a low-dimensional numerical example (e.g., a 2-D linear SDE) that isolates the effect of σ optimization within a deliberately poor random dictionary and quantifies the resulting improvement in operator approximation error.
revision: yes
-
Referee: [§4] §4 (numerical experiments): the reported improvements on SDE and graphon benchmarks lack quantitative comparison to the baseline RaNNDy with standard activations, error bounds, or ablation on the number of random features. Without these, it is impossible to determine whether the activation optimization compensates for poor random projections or merely refines an already adequate dictionary.
Authors: We acknowledge that the current §4 presents results primarily for the optimized activation without systematic side-by-side metrics against fixed standard activations (e.g., ReLU, tanh), without reported error bounds, and without ablation on the number of random features N. We will revise §4 to include: (i) direct quantitative comparisons (L2 operator error, eigenvalue errors) between optimized-σ RaNNDy and standard-activation RaNNDy on the same random seeds; (ii) error bounds or confidence intervals derived from multiple random initializations; and (iii) ablation plots showing approximation error versus N for both optimized and baseline activations on the SDE and graphon examples. These additions will clarify whether the gains arise from compensating for suboptimal projections or from refining an already sufficient dictionary.
revision: yes
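The comparison protocol promised in this response (same seeds for every activation, intervals over multiple initializations, ablation over N) could be organized as in the Python sketch below. The deterministic toy map, the observable, and the error metric are hypothetical stand-ins chosen so that a ground truth is available; an optimized activation would be added as one more entry in `activations`.

```python
import numpy as np

def prediction_error(seed, sigma, N, m=500):
    """Relative test error of the approximated operator applied to g(x) = x."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(N, 1))             # fixed random hidden layer
    b = rng.normal(size=N)
    psi = lambda X: sigma(X @ W.T + b)
    F = lambda X: 0.9 * X - 0.1 * X**3      # hypothetical toy dynamics
    g = lambda X: X[:, 0]                   # observable to propagate

    X = rng.uniform(-1.0, 1.0, size=(m, 1))
    X_test = rng.uniform(-1.0, 1.0, size=(m, 1))

    K, *_ = np.linalg.lstsq(psi(X), psi(F(X)), rcond=None)   # output layer
    c, *_ = np.linalg.lstsq(psi(X), g(X), rcond=None)        # g in the dictionary

    pred = psi(X_test) @ K @ c              # approximates g(F(x)) on test points
    truth = g(F(X_test))
    return np.linalg.norm(pred - truth) / np.linalg.norm(truth)

activations = {"tanh": np.tanh, "relu": lambda z: np.maximum(z, 0.0)}
seeds = range(10)                           # identical seeds for every activation

summary = {}
for name, sigma in activations.items():
    for N in (5, 10, 20):
        errs = np.array([prediction_error(s, sigma, N) for s in seeds])
        # Normal-approximation 95% interval over the random initializations.
        half_width = 1.96 * errs.std(ddof=1) / np.sqrt(len(errs))
        summary[(name, N)] = (errs.mean(), half_width)
```

Because each seed fixes both the random features and the data, differences between rows of `summary` at the same N reflect the activation alone. With only ten seeds the normal approximation is rough; a t-quantile or a bootstrap would be a reasonable substitute.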
Circularity Check
No significant circularity detected
The paper presents RaNNDy as a randomized NN architecture with fixed random weights/biases and an algorithm to optimize the activation function for a better dictionary in transfer operator approximation. No equations, self-citations, or claims in the provided text reduce any result to fitted parameters, self-definitions, or prior author work by construction. The method is described as an independent algorithmic proposal without load-bearing reductions to inputs. This is the expected self-contained case.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Optimizing the activation function overcomes the restriction imposed by fixed random weights and biases in the basis for transfer operator approximation.