Random Vector Functional Link Neural Network based Ensemble Deep Learning
Pith reviewed 2026-05-25 13:08 UTC · model grok-4.3
The pith
Deep RVFL networks stack random hidden layers and solve output weights in closed form; the paper reports superior accuracy on benchmark datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The deep RVFL network stacks multiple layers whose hidden parameters are randomly generated within a suitable range and kept fixed, with output weights computed by closed-form solution as in standard RVFL. The ensemble edRVFL is obtained by treating the intermediate output layers of a single training run as an ensemble. Both are reported to yield superior performance when tested on diverse benchmark datasets.
What carries the argument
The deep RVFL (dRVFL) network that stacks RVFL layers with randomly generated fixed hidden parameters and closed-form output weight solution.
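The mechanism described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_drvfl` and `predict_drvfl`, the tanh activation, the uniform sampling range `scale`, the ridge term `ridge`, and the choice to concatenate the raw input with every hidden layer's features are all hypothetical choices made here for concreteness.

```python
import numpy as np

def fit_drvfl(X, Y, n_layers=3, n_hidden=64, scale=1.0, ridge=1e-2, seed=0):
    """Toy deep RVFL: random fixed hidden layers, closed-form output weights."""
    rng = np.random.default_rng(seed)
    layers, feats, H = [], [X], X
    for _ in range(n_layers):
        # Hidden parameters are drawn once and never updated (assumed range).
        W = rng.uniform(-scale, scale, size=(H.shape[1], n_hidden))
        b = rng.uniform(-scale, scale, size=n_hidden)
        H = np.tanh(H @ W + b)
        layers.append((W, b))
        feats.append(H)
    # Assumed design: raw input (direct link) plus all hidden-layer features.
    D = np.hstack(feats)
    # Ridge-regularised least squares: beta = (D^T D + ridge * I)^-1 D^T Y
    beta = np.linalg.solve(D.T @ D + ridge * np.eye(D.shape[1]), D.T @ Y)
    return layers, beta

def predict_drvfl(X, layers, beta):
    """Recompute the fixed random features and apply the solved output weights."""
    feats, H = [X], X
    for W, b in layers:
        H = np.tanh(H @ W + b)
        feats.append(H)
    return np.hstack(feats) @ beta
```

For classification, `Y` would be a one-hot target matrix and the predicted class the row-wise argmax of `predict_drvfl(...)`. Only `beta` is fitted; no gradient passes through the hidden layers.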
If this is right
- Any RVFL variant can be turned into a deep model by stacking without changing the core training procedure.
- Ensemble benefits arise from one training run rather than independent model trainings.
- The frameworks apply across classification tasks from diverse domains.
- Integration with sparse-pretrained RVFL further boosts the reported performance.
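The "one training run" point in the list above can be illustrated with a hedged sketch. The assumptions are loud ones: the names `fit_edrvfl` and `predict_edrvfl`, the tanh activation, the [-1, 1] sampling range, the ridge term, feeding each layer the raw input plus the previous layer's features, and averaging the per-layer scores are illustrative choices, not details confirmed by the abstract.

```python
import numpy as np

def fit_edrvfl(X, Y, n_layers=3, n_hidden=32, ridge=1e-2, seed=0):
    """One pass over the stack; each layer gets its own closed-form output weights."""
    rng = np.random.default_rng(seed)
    model, Z = [], X
    for _ in range(n_layers):
        W = rng.uniform(-1.0, 1.0, size=(Z.shape[1], n_hidden))  # fixed random weights
        b = rng.uniform(-1.0, 1.0, size=n_hidden)
        H = np.tanh(Z @ W + b)
        D = np.hstack([X, H])  # assumed: raw input plus this layer's features
        beta = np.linalg.solve(D.T @ D + ridge * np.eye(D.shape[1]), D.T @ Y)
        model.append((W, b, beta))
        Z = D  # the next layer sees the raw input and the current features
    return model

def predict_edrvfl(X, model):
    """Average the scores of the per-layer output heads (assumed combination rule)."""
    Z, scores = X, []
    for W, b, beta in model:
        H = np.tanh(Z @ W + b)
        D = np.hstack([X, H])
        scores.append(D @ beta)
        Z = D
    return np.mean(scores, axis=0)
```

Each ensemble member reuses the features already computed for the deeper layers, so the cost beyond a single deep pass is one extra linear solve per layer rather than a full retraining.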
Where Pith is reading between the lines
- The method may lower training cost relative to gradient-based deep networks since only output weights are solved analytically.
- The approach could extend naturally to regression or other supervised tasks beyond the classification benchmarks tested.
- Choice of the random parameter range might require task-specific tuning to maintain feature usefulness across stacked layers.
Load-bearing premise
Randomly generated hidden-layer parameters within a suitable range will continue to produce useful features when the layers are stacked, allowing the closed-form output solution to yield competitive accuracy.
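One way to probe this premise is to measure how many activations saturate as layers are stacked under different sampling ranges. The sketch below is a diagnostic under assumptions, not something taken from the paper: the helper name `saturation_by_depth`, the tanh activation, the 0.99 saturation threshold, and the uniform range `scale` are all hypothetical.

```python
import numpy as np

def saturation_by_depth(X, scale, n_layers=4, n_hidden=64, seed=0):
    """Fraction of tanh activations with |h| > 0.99 at each stacked layer."""
    rng = np.random.default_rng(seed)
    H, fractions = X, []
    for _ in range(n_layers):
        # Random fixed parameters drawn from [-scale, scale] (assumed range).
        W = rng.uniform(-scale, scale, size=(H.shape[1], n_hidden))
        b = rng.uniform(-scale, scale, size=n_hidden)
        H = np.tanh(H @ W + b)
        fractions.append(float(np.mean(np.abs(H) > 0.99)))
    return fractions
```

A very wide range drives most units into saturation, where they behave like near-binary features, while a very narrow range keeps every unit in its almost-linear regime. Either extreme can reduce what the closed-form output layer has to work with, which is why the unspecified "suitable range" matters.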
What would settle it
A benchmark dataset on which the proposed dRVFL or edRVFL fails to match or exceed the accuracy of a standard shallow RVFL network or a backpropagation-trained deep network would falsify the superiority result.
Original abstract
In this paper, we propose a deep learning framework based on randomized neural network. In particular, inspired by the principles of Random Vector Functional Link (RVFL) network, we present a deep RVFL network (dRVFL) with stacked layers. The parameters of the hidden layers of the dRVFL are randomly generated within a suitable range and kept fixed while the output weights are computed using the closed form solution as in a standard RVFL network. We also propose an ensemble deep network (edRVFL) that can be regarded as a marriage of ensemble learning with deep learning. Unlike traditional ensembling approaches that require training several models independently from scratch, edRVFL is obtained by training a single dRVFL network once. Both dRVFL and edRVFL frameworks are generic and can be used with any RVFL variant. To illustrate this, we integrate the deep learning networks with a recently proposed sparse-pretrained RVFL (SP-RVFL). Extensive experiments on benchmark datasets from diverse domains show the superior performance of our proposed deep RVFL networks.
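The abstract refers to a closed-form solution without writing it out. One common way to state it is the ridge-regularised least-squares form below. The symbols are assumptions introduced here for illustration and are not the paper's notation: D stacks the input features X and the hidden-layer features H^(1), ..., H^(L) column-wise, Y is the target matrix, and λ ≥ 0 is a regularisation strength.

```latex
\boldsymbol{\beta}
  = \left( \mathbf{D}^{\top}\mathbf{D} + \lambda \mathbf{I} \right)^{-1}
    \mathbf{D}^{\top}\mathbf{Y},
\qquad
\mathbf{D} = \left[\, \mathbf{X},\ \mathbf{H}^{(1)},\ \dots,\ \mathbf{H}^{(L)} \,\right]
```

With λ = 0 and D of full column rank, this reduces to the ordinary least-squares solution.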
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a deep RVFL network (dRVFL) in which hidden-layer parameters are randomly generated within a suitable range and held fixed while output weights are obtained via the standard closed-form least-squares solution; it further introduces an ensemble deep RVFL (edRVFL) obtained from a single training run of the dRVFL and demonstrates integration with the sparse-pretrained RVFL variant. Extensive experiments on benchmark datasets from diverse domains are claimed to show superior performance of the proposed deep RVFL networks.
Significance. If the empirical superiority claims hold under rigorous evaluation, the work would supply an efficient route to deep architectures that avoids back-propagation through the hidden layers and yields output weights in closed form, potentially lowering training cost relative to conventional deep networks while retaining the ensemble benefit of edRVFL from a single model.
major comments (2)
- [Abstract] Abstract: the central claim that 'extensive experiments on benchmark datasets from diverse domains show the superior performance' is unsupported by any quantitative results, baselines, error bars, dataset sizes, or statistical tests in the provided text, rendering the empirical contribution unevaluable.
- [Abstract / dRVFL definition] dRVFL construction (abstract and method description): the hidden-layer weights are drawn 'within a suitable range' with no explicit rule, scaling procedure, or dependence on depth or input statistics supplied; because the closed-form output solution can recover accuracy only when the random features remain informative after stacking, the absence of this rule is load-bearing for both the single-network and ensemble claims.
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive comments. We address each major comment below, indicating revisions where appropriate.
- Referee: [Abstract] Abstract: the central claim that 'extensive experiments on benchmark datasets from diverse domains show the superior performance' is unsupported by any quantitative results, baselines, error bars, dataset sizes, or statistical tests in the provided text, rendering the empirical contribution unevaluable.
  Authors: The abstract summarizes the experimental findings reported in full detail in Sections 4 and 5 of the manuscript, which include quantitative comparisons against multiple baselines across 20+ datasets from image, text, and time-series domains, with reported accuracies, standard deviations over multiple runs, and dataset sizes. To make the abstract self-contained and address the concern, we will revise it to include a concise statement of key results (e.g., average accuracy improvement and number of datasets) while respecting length constraints.
  Revision: yes
- Referee: [Abstract / dRVFL definition] dRVFL construction (abstract and method description): the hidden-layer weights are drawn 'within a suitable range' with no explicit rule, scaling procedure, or dependence on depth or input statistics supplied; because the closed-form output solution can recover accuracy only when the random features remain informative after stacking, the absence of this rule is load-bearing for both the single-network and ensemble claims.
  Authors: We agree that an explicit initialization rule is necessary for reproducibility and to ensure informative features across layers. The manuscript uses the conventional RVFL practice of drawing weights uniformly from [-1,1] scaled by 1/sqrt(input dimension), but this was not stated clearly. We will add a dedicated paragraph in Section 3.1 specifying the exact distribution, any depth-dependent scaling (none applied in our experiments), and the ranges used, along with a brief justification based on preserving feature variance.
  Revision: yes
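The initialization rule named in the simulated rebuttal (uniform on [-1, 1], scaled by 1/sqrt(input dimension)) can be written down as a short sketch. The helper names `init_hidden` and `preactivation_variance`, the zero bias, and the synthetic standard-normal inputs are assumptions made here for illustration; the rebuttal does not specify them.

```python
import numpy as np

def init_hidden(fan_in, n_hidden, rng):
    """Uniform[-1, 1] weights divided by sqrt(fan_in); returns (W, b)."""
    W = rng.uniform(-1.0, 1.0, size=(fan_in, n_hidden)) / np.sqrt(fan_in)
    b = np.zeros(n_hidden)  # assumption: the rebuttal does not say how biases are drawn
    return W, b

def preactivation_variance(fan_in, n_hidden=200, n_samples=2000, seed=0):
    """Empirical variance of X @ W + b for synthetic unit-variance inputs."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, fan_in))  # zero-mean, unit-variance toy inputs
    W, b = init_hidden(fan_in, n_hidden, rng)
    return float(np.var(X @ W + b))
```

Under these assumptions each weight has variance 1/(3 * fan_in), so the pre-activation variance stays near 1/3 whatever the input width. That is one reading of the rebuttal's "preserving feature variance" justification, and calling `preactivation_variance` with different `fan_in` values gives a quick check of it.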
Circularity Check
No circularity; architectural proposal validated empirically on benchmarks
The paper introduces dRVFL and edRVFL as direct extensions of the existing RVFL framework: hidden-layer parameters are randomly initialized within a suitable range and held fixed, while output weights are obtained via the standard closed-form least-squares solution. The central claim is empirical superiority on benchmark datasets, not a derivation or prediction that reduces to the inputs by construction. No self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described structure. The work is self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
free parameters (1)
- range for random hidden parameters
axioms (1)
- domain assumption: Closed-form output weight solution remains effective when RVFL layers are stacked
invented entities (2)
- dRVFL network: no independent evidence
- edRVFL network: no independent evidence