
arxiv: 1907.00350 · v1 · pith:7F64M77Cnew · submitted 2019-06-30 · 💻 cs.CV

Random Vector Functional Link Neural Network based Ensemble Deep Learning

Pith reviewed 2026-05-25 13:08 UTC · model grok-4.3

classification 💻 cs.CV
keywords deep RVFL · random vector functional link · ensemble learning · neural networks · closed-form solution · benchmark datasets · classification

The pith

Deep RVFL networks stack random hidden layers and solve output weights in closed form to achieve superior accuracy on benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a deep RVFL network formed by stacking RVFL layers where hidden parameters are randomly generated and fixed while output weights are computed directly via closed-form solution. It further introduces an ensemble deep RVFL obtained from training one such network once. Both frameworks are shown to integrate with existing RVFL variants like sparse-pretrained RVFL. Experiments on benchmark datasets from multiple domains demonstrate that these deep versions deliver better performance than standard approaches. A sympathetic reader would care because the method avoids iterative training of all parameters yet claims competitive or better results.

Core claim

The deep RVFL network stacks multiple layers whose hidden parameters are randomly generated within a suitable range and kept fixed, with output weights computed by closed-form solution as in standard RVFL; the ensemble edRVFL is obtained by treating intermediate output layers as an ensemble from a single training run, and both yield superior performance when tested on diverse benchmark datasets.

What carries the argument

The deep RVFL (dRVFL) network that stacks RVFL layers with randomly generated fixed hidden parameters and closed-form output weight solution.
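
The mechanism can be illustrated with a minimal NumPy sketch. It is a sketch under stated assumptions, not the authors' implementation: the uniform weight range, the `tanh` activation, the ridge penalty, the readout over the raw input plus every hidden layer, and the names `train_drvfl` and `predict_drvfl` are all illustrative choices that the text above does not specify.

```python
import numpy as np


def train_drvfl(X, Y, n_layers=3, n_hidden=64, scale=1.0, ridge=1e-2, seed=0):
    """Fit a stacked random-feature network with a closed-form readout.

    Assumptions (illustrative, not from the paper's text): uniform weights in
    [-scale, scale], tanh activation, ridge-regularised least squares, and a
    readout that sees the raw input plus every hidden layer.
    """
    rng = np.random.default_rng(seed)
    weights, feats, H = [], [X], X
    for _ in range(n_layers):
        W = rng.uniform(-scale, scale, size=(H.shape[1], n_hidden))
        b = rng.uniform(-scale, scale, size=n_hidden)
        H = np.tanh(H @ W + b)  # hidden parameters are drawn once, never updated
        weights.append((W, b))
        feats.append(H)
    D = np.hstack(feats)  # direct link (X) plus all hidden features
    # Closed-form ridge solution: beta = (D^T D + ridge * I)^{-1} D^T Y
    beta = np.linalg.solve(D.T @ D + ridge * np.eye(D.shape[1]), D.T @ Y)
    return weights, beta


def predict_drvfl(X, weights, beta):
    """Recompute the fixed random features and apply the learned readout."""
    feats, H = [X], X
    for W, b in weights:
        H = np.tanh(H @ W + b)
        feats.append(H)
    return np.hstack(feats) @ beta
```

Here `Y` is a one-hot label matrix, and the predicted class is the `argmax` over the columns of the output. The only fitted quantity is `beta`, which is why no gradient-based loop appears.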

If this is right

  • Any RVFL variant can be turned into a deep model by stacking without changing the core training procedure.
  • Ensemble benefits arise from one training run rather than independent model trainings.
  • The frameworks apply across classification tasks from diverse domains.
  • Integration with sparse-pretrained RVFL further boosts the reported performance.
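
The single-run ensemble idea in the list above can be sketched in the same style. Again this is a hedged illustration rather than the paper's code: it assumes each layer receives the previous hidden features concatenated with the raw input, that each layer gets its own ridge readout, and that the ensemble output is a plain average. The names `train_edrvfl` and `predict_edrvfl` are hypothetical.

```python
import numpy as np


def _ridge(D, Y, ridge):
    """Closed-form ridge regression weights for design matrix D."""
    return np.linalg.solve(D.T @ D + ridge * np.eye(D.shape[1]), D.T @ Y)


def train_edrvfl(X, Y, n_layers=3, n_hidden=64, scale=1.0, ridge=1e-2, seed=0):
    """One pass over the stack; every layer yields its own readout (assumed design)."""
    rng = np.random.default_rng(seed)
    layers, inp = [], X
    for _ in range(n_layers):
        W = rng.uniform(-scale, scale, size=(inp.shape[1], n_hidden))
        b = rng.uniform(-scale, scale, size=n_hidden)
        H = np.tanh(inp @ W + b)      # fixed random features
        D = np.hstack([H, X])         # hidden features plus direct link
        layers.append((W, b, _ridge(D, Y, ridge)))
        inp = D                       # next layer sees features and raw input
    return layers


def predict_edrvfl(X, layers):
    """Average the per-layer readouts (simple averaging is an assumption)."""
    outputs, inp = [], X
    for W, b, beta in layers:
        D = np.hstack([np.tanh(inp @ W + b), X])
        outputs.append(D @ beta)
        inp = D
    return np.mean(outputs, axis=0)
```

Each ensemble member reuses the features already computed for the layer below it, so the extra cost over a single network is one small linear solve per layer.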

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method may lower training cost relative to gradient-based deep networks since only output weights are solved analytically.
  • The approach could extend naturally to regression or other supervised tasks beyond the classification benchmarks tested.
  • Choice of the random parameter range might require task-specific tuning to maintain feature usefulness across stacked layers.

Load-bearing premise

Randomly generated hidden-layer parameters within a suitable range will continue to produce useful features when the layers are stacked, allowing the closed-form output solution to yield competitive accuracy.
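
A small probe makes the premise concrete. The sketch below assumes `tanh` units, zero biases, standard-normal inputs, and weights uniform in `[-scale, scale]`; none of these choices comes from the paper, and `mean_abs_activation` is a hypothetical helper. It reports the average magnitude of the last layer's activations, which approaches 1 when units saturate and 0 when the signal vanishes.

```python
import numpy as np


def mean_abs_activation(scale, n_layers=4, n_hidden=50, n_samples=200, dim=20, seed=0):
    """Mean |tanh| output at the last of several stacked random layers."""
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((n_samples, dim))  # stand-in for standardised inputs
    for _ in range(n_layers):
        W = rng.uniform(-scale, scale, size=(H.shape[1], n_hidden))
        H = np.tanh(H @ W)  # biases omitted to isolate the effect of the range
    return float(np.mean(np.abs(H)))


for s in (0.01, 0.3, 10.0):
    print(f"scale={s}: mean |activation| at last layer = {mean_abs_activation(s):.4f}")
```

Under these assumptions a very large range drives nearly every unit to ±1, and a very small range shrinks the features toward zero as depth grows. In both regimes the closed-form readout has little to work with, which is why the unspecified range matters.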

What would settle it

A benchmark dataset on which the proposed dRVFL or edRVFL fails to match or exceed the accuracy of a standard shallow RVFL network or a backpropagation-trained deep network would falsify the superiority result.

Figures

Figures reproduced from arXiv: 1907.00350 by M. Tanveer, P.N. Suganthan, Rakesh Katuwal.

Figure 1: Framework of RVFL (1994) and ELM (2004) networks. The structure of RVFL and …
Figure 2: Framework of a dRVFL network. It consists of several hidden layers stacked on top …
Figure 3: Framework of ensemble deep RVFL network (edRVFL). It differs from dRVFL …
Figure 4: Statistical comparison of classifiers against each other based on Nemenyi test.
Figure 5: Comparison of dRVFL and edRVFL in terms of accuracy (%) w.r.t. different number of hidden …
Figure 6: Training and testing times comparison of dRVFL and edRVFL w.r.t. different number of hidden …
Figure 7: Comparison of dRVFL and RVFL in terms of accuracy (%) with the same number …
Figure 8: Comparison of edRVFL (implicit ensemble) and TedRVFL (true ensemble) in terms of accuracy …
Figure 9: Training and testing times comparison of edRVFL (implicit ensemble) and TedRVFL (true ensem …
Figure 10: Performance variation of the proposed dRVFL (first row) and edRVFL (second row) …
Figure 11: Performance variation of the proposed dRVFL (first row) and edRVFL (second row) …

Original abstract

In this paper, we propose a deep learning framework based on randomized neural network. In particular, inspired by the principles of Random Vector Functional Link (RVFL) network, we present a deep RVFL network (dRVFL) with stacked layers. The parameters of the hidden layers of the dRVFL are randomly generated within a suitable range and kept fixed while the output weights are computed using the closed form solution as in a standard RVFL network. We also propose an ensemble deep network (edRVFL) that can be regarded as a marriage of ensemble learning with deep learning. Unlike traditional ensembling approaches that require training several models independently from scratch, edRVFL is obtained by training a single dRVFL network once. Both dRVFL and edRVFL frameworks are generic and can be used with any RVFL variant. To illustrate this, we integrate the deep learning networks with a recently proposed sparse-pretrained RVFL (SP-RVFL). Extensive experiments on benchmark datasets from diverse domains show the superior performance of our proposed deep RVFL networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes a deep RVFL network (dRVFL) in which hidden-layer parameters are randomly generated within a suitable range and held fixed while output weights are obtained via the standard closed-form least-squares solution; it further introduces an ensemble deep RVFL (edRVFL) obtained from a single training run of the dRVFL and demonstrates integration with the sparse-pretrained RVFL variant. Extensive experiments on benchmark datasets from diverse domains are claimed to show superior performance of the proposed deep RVFL networks.

Significance. If the empirical superiority claims hold under rigorous evaluation, the work would supply an efficient route to deep architectures that avoids back-propagation through the hidden layers and yields output weights in closed form, potentially lowering training cost relative to conventional deep networks while retaining the ensemble benefit of edRVFL from a single model.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'extensive experiments on benchmark datasets from diverse domains show the superior performance' is unsupported by any quantitative results, baselines, error bars, dataset sizes, or statistical tests in the provided text, rendering the empirical contribution unevaluable.
  2. [Abstract / dRVFL definition] dRVFL construction (abstract and method description): the hidden-layer weights are drawn 'within a suitable range' with no explicit rule, scaling procedure, or dependence on depth or input statistics supplied; because the closed-form output solution can recover accuracy only when the random features remain informative after stacking, the absence of this rule is load-bearing for both the single-network and ensemble claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive comments. We address each major comment below, indicating revisions where appropriate.

  1. Referee: [Abstract] Abstract: the central claim that 'extensive experiments on benchmark datasets from diverse domains show the superior performance' is unsupported by any quantitative results, baselines, error bars, dataset sizes, or statistical tests in the provided text, rendering the empirical contribution unevaluable.

    Authors: The abstract summarizes the experimental findings reported in full detail in Sections 4 and 5 of the manuscript, which include quantitative comparisons against multiple baselines across 20+ datasets from image, text, and time-series domains, with reported accuracies, standard deviations over multiple runs, and dataset sizes. To make the abstract self-contained and address the concern, we will revise it to include a concise statement of key results (e.g., average accuracy improvement and number of datasets) while respecting length constraints. revision: yes

  2. Referee: [Abstract / dRVFL definition] dRVFL construction (abstract and method description): the hidden-layer weights are drawn 'within a suitable range' with no explicit rule, scaling procedure, or dependence on depth or input statistics supplied; because the closed-form output solution can recover accuracy only when the random features remain informative after stacking, the absence of this rule is load-bearing for both the single-network and ensemble claims.

    Authors: We agree that an explicit initialization rule is necessary for reproducibility and to ensure informative features across layers. The manuscript uses the conventional RVFL practice of drawing weights uniformly from [-1,1] scaled by 1/sqrt(input dimension), but this was not stated clearly. We will add a dedicated paragraph in Section 3.1 specifying the exact distribution, any depth-dependent scaling (none applied in our experiments), and the ranges used, along with a brief justification based on preserving feature variance. revision: yes
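
The variance argument mentioned in this response can be made explicit. The following is a worked check of the stated rule only (weights uniform in [-1, 1], scaled by 1/sqrt(d) for input dimension d); it assumes the weights are independent of the input x and is not taken from the manuscript.

```latex
w_i \sim \mathcal{U}\!\left[-\tfrac{1}{\sqrt{d}},\, \tfrac{1}{\sqrt{d}}\right]
\;\Longrightarrow\;
\operatorname{Var}(w_i) = \frac{1}{3d},
\qquad
\operatorname{Var}\!\Big(\sum_{i=1}^{d} w_i x_i \,\Big|\, x\Big)
= \sum_{i=1}^{d} x_i^2 \operatorname{Var}(w_i)
= \frac{\lVert x \rVert^2}{3d}.
```

For standardised features, where the squared norm of x is about d, the pre-activation variance is therefore about 1/3 regardless of layer width, which is the sense in which such scaling keeps stacked features from saturating or vanishing.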

Circularity Check

0 steps flagged

No circularity; architectural proposal validated empirically on benchmarks


The paper introduces dRVFL and edRVFL as direct extensions of the existing RVFL framework: hidden-layer parameters are randomly initialized within a suitable range and held fixed, while output weights are obtained via the standard closed-form least-squares solution. The central claim is empirical superiority on benchmark datasets, not a derivation or prediction that reduces to the inputs by construction. No self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described structure. The work is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim rests on the standard RVFL closed-form solvability assumption being preserved under layer stacking and on the empirical superiority being generalizable beyond the tested benchmarks.

free parameters (1)
  • range for random hidden parameters
    Abstract states parameters are generated within a suitable range but provides no method for choosing or validating that range.
axioms (1)
  • domain assumption: Closed-form output weight solution remains effective when RVFL layers are stacked
    Invoked when defining dRVFL; no proof or prior reference supplied in abstract.
invented entities (2)
  • dRVFL network (no independent evidence)
    purpose: Deep stacked version of RVFL
    New architecture introduced in the paper.
  • edRVFL network (no independent evidence)
    purpose: Ensemble obtained from single dRVFL training run
    New ensembling mechanism introduced in the paper.


