Random Vector Functional Link Neural Network based Ensemble Deep Learning

M. Tanveer; P.N. Suganthan; Rakesh Katuwal

arxiv: 1907.00350 · v1 · pith:7F64M77Cnew · submitted 2019-06-30 · 💻 cs.CV

Pith reviewed 2026-05-25 13:08 UTC · model grok-4.3

keywords: deep RVFL, random vector functional link, ensemble learning, neural networks, closed-form solution, benchmark datasets, classification

The pith

Deep RVFL networks stack random hidden layers and solve output weights in closed form to achieve superior accuracy on benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a deep RVFL network formed by stacking RVFL layers where hidden parameters are randomly generated and fixed while output weights are computed directly via closed-form solution. It further introduces an ensemble deep RVFL obtained from training one such network once. Both frameworks are shown to integrate with existing RVFL variants like sparse-pretrained RVFL. Experiments on benchmark datasets from multiple domains demonstrate that these deep versions deliver better performance than standard approaches. A sympathetic reader would care because the method avoids iterative training of all parameters yet claims competitive or better results.

Core claim

The deep RVFL network stacks multiple layers whose hidden parameters are randomly generated within a suitable range and kept fixed, with output weights computed by closed-form solution as in standard RVFL; the ensemble edRVFL is obtained by treating intermediate output layers as an ensemble from a single training run, and both yield superior performance when tested on diverse benchmark datasets.

What carries the argument

The deep RVFL (dRVFL) network that stacks RVFL layers with randomly generated fixed hidden parameters and closed-form output weight solution.
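
A minimal sketch of the stacking idea described above, not the authors' implementation. It assumes tanh activations, uniform random weights in `[-scale, scale]`, a ridge-regularised least-squares readout, and an output layer that sees the raw input together with every hidden layer. The names `drvfl_fit`, `drvfl_predict`, `scale`, and `lam` are hypothetical.

```python
import numpy as np

def drvfl_fit(X, Y, n_layers=3, n_hidden=32, scale=1.0, lam=1e-2, seed=0):
    """Stack random, fixed hidden layers and solve one readout in closed form."""
    rng = np.random.default_rng(seed)
    layers, feats, H = [], [X], X
    for _ in range(n_layers):
        # Hidden parameters are drawn once and never updated (assumed range).
        W = rng.uniform(-scale, scale, size=(H.shape[1], n_hidden))
        b = rng.uniform(-scale, scale, size=n_hidden)
        H = np.tanh(H @ W + b)
        layers.append((W, b))
        feats.append(H)
    # Assumption: the readout uses the raw input plus all hidden layers.
    D = np.hstack(feats)
    # Ridge solution: beta = (D^T D + lam I)^-1 D^T Y
    beta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ Y)
    return layers, beta

def drvfl_predict(X, layers, beta):
    """Recompute the fixed random features and apply the learned readout."""
    feats, H = [X], X
    for W, b in layers:
        H = np.tanh(H @ W + b)
        feats.append(H)
    return np.hstack(feats) @ beta

# Toy usage: two Gaussian blobs with one-hot targets (illustrative data only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
labels = np.repeat([0, 1], 50)
Y = np.eye(2)[labels]
layers, beta = drvfl_fit(X, Y)
train_acc = float(np.mean(drvfl_predict(X, layers, beta).argmax(axis=1) == labels))
```

Only `beta` is learned here; no gradient passes through the hidden layers, which is the property the review attributes to the method.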

If this is right

Any RVFL variant can be turned into a deep model by stacking without changing the core training procedure.
Ensemble benefits arise from one training run rather than independent model trainings.
The frameworks apply across classification tasks from diverse domains.
Integration with sparse-pretrained RVFL further boosts the reported performance.
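
The single-run ensemble in the second point can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact construction: each hidden layer is assumed to receive the previous layer's features concatenated with the raw input, each layer gets its own ridge readout, and the per-layer scores are averaged. The names `ridge`, `edrvfl_fit`, and `edrvfl_predict` are hypothetical.

```python
import numpy as np

def ridge(D, Y, lam):
    """Closed-form ridge regression readout."""
    return np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ Y)

def edrvfl_fit(X, Y, n_layers=3, n_hidden=32, scale=1.0, lam=1e-2, seed=0):
    """One pass through the stack; every layer yields one ensemble member."""
    rng = np.random.default_rng(seed)
    model, inp = [], X
    for _ in range(n_layers):
        W = rng.uniform(-scale, scale, size=(inp.shape[1], n_hidden))
        b = rng.uniform(-scale, scale, size=n_hidden)
        H = np.tanh(inp @ W + b)
        D = np.hstack([H, X])            # this layer's features plus raw input
        model.append((W, b, ridge(D, Y, lam)))
        inp = D                          # assumed input to the next layer
    return model

def edrvfl_predict(X, model):
    """Average the scores of all per-layer readouts."""
    scores, inp = [], X
    for W, b, beta in model:
        H = np.tanh(inp @ W + b)
        D = np.hstack([H, X])
        scores.append(D @ beta)
        inp = D
    return np.mean(scores, axis=0)

# Toy usage on two Gaussian blobs (illustrative data only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
labels = np.repeat([0, 1], 50)
model = edrvfl_fit(X, np.eye(2)[labels])
train_acc = float(np.mean(edrvfl_predict(X, model).argmax(axis=1) == labels))
```

The design point is that the random features are computed once and shared, so the ensemble costs one forward pass plus one small linear solve per layer instead of several independent trainings.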

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method may lower training cost relative to gradient-based deep networks since only output weights are solved analytically.
The approach could extend naturally to regression or other supervised tasks beyond the classification benchmarks tested.
Choice of the random parameter range might require task-specific tuning to maintain feature usefulness across stacked layers.

Load-bearing premise

Randomly generated hidden-layer parameters within a suitable range will continue to produce useful features when the layers are stacked, allowing the closed-form output solution to yield competitive accuracy.
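
One way to probe this premise is to measure how many units saturate at each depth for different weight ranges. The sketch below is a diagnostic under assumptions (tanh activations, no biases, Gaussian toy inputs, a 0.99 saturation threshold) and is not taken from the paper. The function name `saturation_by_layer` is hypothetical.

```python
import numpy as np

def saturation_by_layer(X, n_layers=4, n_hidden=64, scale=1.0, seed=0):
    """Fraction of tanh activations with |h| > 0.99 at each stacked layer."""
    rng = np.random.default_rng(seed)
    H, fractions = X, []
    for _ in range(n_layers):
        W = rng.uniform(-scale, scale, size=(H.shape[1], n_hidden))
        H = np.tanh(H @ W)
        fractions.append(float(np.mean(np.abs(H) > 0.99)))
    return fractions

# Illustrative inputs: 200 samples, 10 standard-normal features.
X = np.random.default_rng(1).normal(size=(200, 10))
small = saturation_by_layer(X, scale=0.1)   # narrow weight range
large = saturation_by_layer(X, scale=5.0)   # wide weight range
```

With a wide range most units sit near ±1 and carry little more than sign information, while a very narrow range shrinks activations toward zero as depth grows. Either extreme leaves the closed-form readout with weak features, which is why the choice of range matters.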

What would settle it

A benchmark dataset on which the proposed dRVFL or edRVFL fails to match or exceed the accuracy of a standard shallow RVFL network or a backpropagation-trained deep network would falsify the superiority result.

Figures

Figures reproduced from arXiv: 1907.00350 by M. Tanveer, P.N. Suganthan, Rakesh Katuwal.

**Figure 2.** Framework of a dRVFL network. It consists of several hidden layers stacked on top …

**Figure 3.** Framework of ensemble deep RVFL network (edRVFL). It differs from dRVFL …

**Figure 4.** Statistical comparison of classifiers against each other based on Nemenyi test.

**Figure 5.** Comparison of dRVFL and edRVFL in terms of accuracy (%) w.r.t. different number of hidden …

**Figure 6.** Training and testing times comparison of dRVFL and edRVFL w.r.t. different number of hidden …

**Figure 7.** Comparison of dRVFL and RVFL in terms of accuracy (%) with the same number …

**Figure 8.** Comparison of edRVFL (implicit ensemble) and TedRVFL (true ensemble) in terms of accuracy …

**Figure 9.** Training and testing times comparison of edRVFL (implicit ensemble) and TedRVFL (true ensem …

**Figure 10.** Performance variation of the proposed dRVFL (first row) and edRVFL (second row) …

**Figure 11.** Performance variation of the proposed dRVFL (first row) and edRVFL (second row) …

Original abstract

In this paper, we propose a deep learning framework based on randomized neural network. In particular, inspired by the principles of Random Vector Functional Link (RVFL) network, we present a deep RVFL network (dRVFL) with stacked layers. The parameters of the hidden layers of the dRVFL are randomly generated within a suitable range and kept fixed while the output weights are computed using the closed form solution as in a standard RVFL network. We also propose an ensemble deep network (edRVFL) that can be regarded as a marriage of ensemble learning with deep learning. Unlike traditional ensembling approaches that require training several models independently from scratch, edRVFL is obtained by training a single dRVFL network once. Both dRVFL and edRVFL frameworks are generic and can be used with any RVFL variant. To illustrate this, we integrate the deep learning networks with a recently proposed sparse-pretrained RVFL (SP-RVFL). Extensive experiments on benchmark datasets from diverse domains show the superior performance of our proposed deep RVFL networks.
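
For orientation, the closed-form step mentioned in the abstract is commonly written as a ridge-regularised least-squares problem. The notation below is assumed here and not taken from the paper: $D$ is the matrix of concatenated features (one row per training sample), $Y$ the target matrix, $\beta$ the output weights, and $\lambda > 0$ a regularisation constant.

```latex
\min_{\beta}\; \lVert D\beta - Y \rVert_F^2 + \lambda \lVert \beta \rVert_F^2
\quad\Longrightarrow\quad
\beta = \left(D^\top D + \lambda I\right)^{-1} D^\top Y
      = D^\top \left(D D^\top + \lambda I\right)^{-1} Y
```

The two expressions are algebraically identical. The first inverts a matrix whose size is the number of features and the second a matrix whose size is the number of samples, so the cheaper one depends on which dimension is smaller.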

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

dRVFL stacking and single-pass edRVFL are the concrete new constructions, but the abstract gives no numbers and leaves the random parameter range unspecified.

The paper's main contributions are the dRVFL, which stacks RVFL layers, and the edRVFL ensemble built from a single training run. Both keep hidden weights random and fixed and solve the output weights in closed form, and the abstract reports that the idea also works with a recently proposed SP-RVFL variant. These constructions look new relative to the classic RVFL literature. The single-pass ensemble is a practical detail that avoids training multiple models separately, and the approach preserves the non-iterative speed of basic RVFL while adding depth. That combination could be useful for quick experiments or constrained hardware.

The abstract claims superior results on diverse benchmarks, yet supplies no tables, baselines, error bars, or dataset names, so the performance claim cannot be checked from the given text. The random hidden parameters are drawn within an unspecified "suitable range", and nothing is said about how to choose or scale that range as layers are added. If the range produces weak or saturating features at depth two or more, the closed-form step has nothing useful to work with. This is the exact condition the method needs to hold for the claimed gains.

The work is aimed at researchers already using randomized networks who want a fast way to add depth or ensembles. A reader testing alternatives to back-propagation might get value from trying the stacking and single-pass ideas once the range rule and full results are available. It deserves peer review because the proposals are specific enough for referees to examine the experiments and parameter choices directly.

Referee Report

2 major / 0 minor

Summary. The paper proposes a deep RVFL network (dRVFL) in which hidden-layer parameters are randomly generated within a suitable range and held fixed while output weights are obtained via the standard closed-form least-squares solution; it further introduces an ensemble deep RVFL (edRVFL) obtained from a single training run of the dRVFL and demonstrates integration with the sparse-pretrained RVFL variant. Extensive experiments on benchmark datasets from diverse domains are claimed to show superior performance of the proposed deep RVFL networks.

Significance. If the empirical superiority claims hold under rigorous evaluation, the work would supply an efficient route to deep architectures that avoids back-propagation through the hidden layers and yields output weights in closed form, potentially lowering training cost relative to conventional deep networks while retaining the ensemble benefit of edRVFL from a single model.

major comments (2)

[Abstract] Abstract: the central claim that 'extensive experiments on benchmark datasets from diverse domains show the superior performance' is unsupported by any quantitative results, baselines, error bars, dataset sizes, or statistical tests in the provided text, rendering the empirical contribution unevaluable.
[Abstract / dRVFL definition] dRVFL construction (abstract and method description): the hidden-layer weights are drawn 'within a suitable range' with no explicit rule, scaling procedure, or dependence on depth or input statistics supplied; because the closed-form output solution can recover accuracy only when the random features remain informative after stacking, the absence of this rule is load-bearing for both the single-network and ensemble claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive comments. We address each major comment below, indicating revisions where appropriate.

Referee: [Abstract] Abstract: the central claim that 'extensive experiments on benchmark datasets from diverse domains show the superior performance' is unsupported by any quantitative results, baselines, error bars, dataset sizes, or statistical tests in the provided text, rendering the empirical contribution unevaluable.

Authors: The abstract summarizes the experimental findings reported in full detail in Sections 4 and 5 of the manuscript, which include quantitative comparisons against multiple baselines across 20+ datasets from image, text, and time-series domains, with reported accuracies, standard deviations over multiple runs, and dataset sizes. To make the abstract self-contained and address the concern, we will revise it to include a concise statement of key results (e.g., average accuracy improvement and number of datasets) while respecting length constraints.

revision: yes

Referee: [Abstract / dRVFL definition] dRVFL construction (abstract and method description): the hidden-layer weights are drawn 'within a suitable range' with no explicit rule, scaling procedure, or dependence on depth or input statistics supplied; because the closed-form output solution can recover accuracy only when the random features remain informative after stacking, the absence of this rule is load-bearing for both the single-network and ensemble claims.

Authors: We agree that an explicit initialization rule is necessary for reproducibility and to ensure informative features across layers. The manuscript uses the conventional RVFL practice of drawing weights uniformly from [-1,1] scaled by 1/sqrt(input dimension), but this was not stated clearly. We will add a dedicated paragraph in Section 3.1 specifying the exact distribution, any depth-dependent scaling (none applied in our experiments), and the ranges used, along with a brief justification based on preserving feature variance.

revision: yes
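
As a worked check of the variance argument in this simulated response, suppose the weights really are drawn as described there (uniform on $[-1,1]$, scaled by $1/\sqrt{d}$ for input dimension $d$). This rule comes from the simulated rebuttal and has not been verified against the manuscript. The derivation also assumes the inputs $x_i$ are independent of the weights with $\mathbb{E}[x_i^2] = 1$.

```latex
w_i = \frac{u_i}{\sqrt{d}},\quad u_i \sim \mathcal{U}[-1,1]
\;\Longrightarrow\;
\mathbb{E}[w_i] = 0,\qquad
\operatorname{Var}(w_i) = \frac{1}{d}\cdot\frac{(1-(-1))^2}{12} = \frac{1}{3d}

z = \sum_{i=1}^{d} w_i x_i
\;\Longrightarrow\;
\operatorname{Var}(z) = \sum_{i=1}^{d} \mathbb{E}[w_i^2]\,\mathbb{E}[x_i^2]
= d \cdot \frac{1}{3d} \cdot 1 = \frac{1}{3}
```

Under these assumptions the pre-activation variance does not depend on $d$. For a later layer whose inputs are tanh activations, $\mathbb{E}[h_i^2] \le 1$, so the same argument gives $\operatorname{Var}(z) \le 1/3$ when $d$ is that layer's input width.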

Circularity Check

0 steps flagged

No circularity; architectural proposal validated empirically on benchmarks

The paper introduces dRVFL and edRVFL as direct extensions of the existing RVFL framework: hidden-layer parameters are randomly initialized within a suitable range and held fixed, while output weights are obtained via the standard closed-form least-squares solution. The central claim is empirical superiority on benchmark datasets, not a derivation or prediction that reduces to the inputs by construction. No self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described structure. The work is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim rests on the standard RVFL closed-form solvability assumption being preserved under layer stacking and on the empirical superiority being generalizable beyond the tested benchmarks.

free parameters (1)

range for random hidden parameters: the abstract states that parameters are generated within a suitable range but provides no method for choosing or validating that range.

axioms (1)

domain assumption: the closed-form output weight solution remains effective when RVFL layers are stacked. Invoked when defining dRVFL; no proof or prior reference is supplied in the abstract.

invented entities (2)

dRVFL network (no independent evidence). Purpose: deep stacked version of RVFL. New architecture introduced in the paper.

edRVFL network (no independent evidence). Purpose: ensemble obtained from a single dRVFL training run. New ensembling mechanism introduced in the paper.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 2 internal anchors

[1]

LeCun, Y

Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436. 29

work page 2015
[2]

Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61 (2015) 85 – 117

J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61 (2015) 85 – 117. doi:10.1016/j.neunet.2014.09.003

work page doi:10.1016/j.neunet.2014.09.003 2015
[3]

P. N. Suganthan, On non-iterative learning algorithms with closed-form solution, Applied Soft Computing 70 (2018) 1078 – 1082. doi:10.1016/j. asoc.2018.07.013

work page doi:10.1016/j 2018
[4]

Olson, A

M. Olson, A. Wyner, R. Berk, Modern neural networks generalize on small data sets, in: Advances in Neural Information Processing Systems 31, 2018, pp. 3623–3632

work page 2018
[5]

P. Guo, C. Chen, Y. Sun, An exact supervised learning for a three-layer supervised neural network, in: Proceedings of the International Conference on neural Information Processing (ICONIP’95), 1995, pp. 1041–1044

work page 1995
[6]

A VEST of the Pseudoinverse Learning Algorithm

P. Guo, A vest of the pseudoinverse learning algorithm, in: arXiv, https://arxiv.org/pdf/1805.07828, 2018, pp. 1–5. doi:https://arxiv. org/pdf/1805.07828

work page internal anchor Pith review Pith/arXiv arXiv 2018
[7]

Berry, M

H. Berry, M. Quoy, Structure and dynamics of random recurrent neural networks, Adaptive Behavior 14 (2) (2006) 129–137

work page 2006
[8]

W. F. Schmidt, M. A. Kraaijveld, R. P. Duin, Feedforward neural networks with random weights, in: Pattern Recognition, 1992. Vol. II. Conference B: Pattern Recognition Methodology and Systems, Proceedings., 11th IAPR International Conference on, IEEE, 1992, pp. 1–4

work page 1992
[9]

H. A. T. Braake, G. V. Straten, Random activation weight neural net (rawn) for fast non-iterative training, Engineering Applications of Artiﬁcial Intelligence 8 (1) (1995) 71 – 80. doi:10.1016/0952-1976(94)00056-S

work page doi:10.1016/0952-1976(94)00056-s 1995
[10]

Widrow, A

B. Widrow, A. Greenblatt, Y. Kim, D. Park, The no-prop algorithm: A new learning algorithm for multilayer neural networks, Neural Networks 37 (2013) 182 – 188, twenty-ﬁfth Anniversay Commemorative Issue. doi: 10.1016/j.neunet.2012.09.020. 30

work page doi:10.1016/j.neunet.2012.09.020 2013
[11]

White, Chapter 9 approximate nonlinear forecasting methods, Vol

H. White, Chapter 9 approximate nonlinear forecasting methods, Vol. 1 of Handbook of Economic Forecasting, Elsevier, 2006, pp. 459 – 512. doi: 10.1016/S1574-0706(05)01009-8

work page doi:10.1016/s1574-0706(05)01009-8 2006
[12]

Y. H. Pao, Y. Takefuji, Functional-link net computing: theory, system architecture, and functionalities, IEEE Computer 25 (5) (1992) 76–79.doi: 10.1109/2.144401

work page doi:10.1109/2.144401 1992
[13]

Zhang, P

L. Zhang, P. N. Suganthan, Visual tracking with convolutional random vector functional link network, IEEE Transactions on Cybernetics 47 (10) (2017) 3243–3253. doi:10.1109/TCYB.2016.2588526

work page doi:10.1109/tcyb.2016.2588526 2017
[14]

Zhang, P

L. Zhang, P. N. Suganthan, Benchmarking ensemble classiﬁers with novel co-trained kernel ridge regression and random vector functional link ensem- bles [research frontier], IEEE Computational Intelligence Magazine 12 (4) (2017) 61–72. doi:10.1109/MCI.2017.2742867

work page doi:10.1109/mci.2017.2742867 2017
[15]

Katuwal, P

R. Katuwal, P. Suganthan, L. Zhang, An ensemble of decision trees with random vector functional link networks for multi-class classiﬁcation, Ap- plied Soft Computing 70 (2018) 1146 – 1153. doi:10.1016/j.asoc.2017. 09.020

work page doi:10.1016/j.asoc.2017 2018
[16]

Vukovi, M

N. Vukovi, M. Petrovi, Z. Miljkovi, A comprehensive experimental eval- uation of orthogonal polynomial expanded random vector functional link neural networks for regression, Applied Soft Computing 70 (2018) 1083 –

work page 2018
[17]

doi:10.1016/j.asoc.2017.10.010

work page doi:10.1016/j.asoc.2017.10.010 2017
[18]

L. Tang, Y. Wu, L. Yu, A non-iterative decomposition-ensemble learning paradigm using rvﬂ network for crude oil price forecasting, Applied Soft Computing 70 (2018) 1097 – 1108. doi:10.1016/j.asoc.2017.02.013

work page doi:10.1016/j.asoc.2017.02.013 2018
[19]

Y. Dash, S. K. Mishra, S. Sahany, B. K. Panigrahi, Indian summer monsoon rainfall prediction: A comparison of iterative and non-iterative approaches, Applied Soft Computing 70 (2018) 1122 – 1134. doi:10.1016/j.asoc. 2017.08.055. 31

work page doi:10.1016/j.asoc 2018
[20]

Pao, G.-H

Y.-H. Pao, G.-H. Park, D. J. Sobajic, Learning and generalization charac- teristics of the random vector functional-link net, Neurocomputing 6 (2) (1994) 163 – 180. doi:10.1016/0925-2312(94)90053-1

work page doi:10.1016/0925-2312(94)90053-1 1994
[21]

Katuwal, P

R. Katuwal, P. N. Suganthan, Enhancing multi-class classiﬁcation of ran- dom forest using random vector functional neural network and oblique de- cision surfaces, in: 2018 International Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–8. doi:10.1109/IJCNN.2018.8489738

work page doi:10.1109/ijcnn.2018.8489738 2018
[22]

Zhang, P

L. Zhang, P. N. Suganthan, A comprehensive evaluation of random vector functional link networks, Information Sciences 367-368 (2016) 1094 – 1105. doi:10.1016/j.ins.2015.09.025

work page doi:10.1016/j.ins.2015.09.025 2016
[23]

Y. Ren, P. N. Suganthan, N. Srikanth, G. Amaratunga, Random vector functional link network for short-term electricity load demand forecasting, Information Sciences 367-368 (2016) 1078 – 1093. doi:10.1016/j.ins. 2015.11.039

work page doi:10.1016/j.ins 2016
[24]

M. J. Kearns, U. V. Vazirani, U. Vazirani, An introduction to computa- tional learning theory, MIT press, 1994

work page 1994
[25]

A. Veit, M. J. Wilber, S. Belongie, Residual networks behave like ensem- bles of relatively shallow networks, in: Advances in Neural Information Processing Systems, 2016, pp. 550–558

work page 2016
[26]

Snapshot Ensembles: Train 1, get M for free

G. Huang, Y. Li, G. Pleiss, Z. Liu, J. E. Hopcroft, K. Q. Weinberger, Snap- shot ensembles: Train 1, get m for free, arXiv preprint arXiv:1704.00109

work page internal anchor Pith review Pith/arXiv arXiv
[27]

Y. Ren, L. Zhang, P. N. Suganthan, Ensemble classiﬁcation and regression- recent developments, applications and future directions [review article], IEEE Computational Intelligence Magazine 11 (1) (2016) 41–53. doi: 10.1109/MCI.2015.2471235

work page doi:10.1109/mci.2015.2471235 2016
[28]

Gallicchio, A

C. Gallicchio, A. Micheli, L. Pedrelli, Design of deep echo state networks, Neural Networks 108 (2018) 33 – 47. doi:10.1016/j.neunet.2018.08. 002. 32

work page doi:10.1016/j.neunet.2018.08 2018
[29]

P. A. Henrquez, G. A. Ruz, Twitter sentiment classiﬁcation based on deep random vector functional link, in: 2018 International Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–6. doi:10.1109/IJCNN.2018. 8489703

work page doi:10.1109/ijcnn.2018 2018
[30]

¨O. F. Ertu˘ grul, A novel type of activation function in artiﬁcial neural net- works: Trained activation function, Neural Networks 99 (2018) 148 – 157. doi:10.1016/j.neunet.2018.01.007

work page doi:10.1016/j.neunet.2018.01.007 2018
[31]

Zhang, J

Y. Zhang, J. Wu, Z. Cai, B. Du, P. S. Yu, An unsupervised parameter learning model for RVFL neural network, Neural Networks 112 (2019) 85 – 97. doi:10.1016/j.neunet.2019.01.007

work page doi:10.1016/j.neunet.2019.01.007 2019
[32]

B. K. Verma, J. J. Mulawka, A modiﬁed backpropagation algorithm, in: Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), Vol. 2, 1994, pp. 840–844 vol.2. doi:10.1109/ICNN.1994. 374289

work page doi:10.1109/icnn.1994 1994
[33]

Huang, Q.-Y

G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: Theory and applications, Neurocomputing 70 (1) (2006) 489 – 501, neural Net- works. doi:10.1016/j.neucom.2005.12.126

work page doi:10.1016/j.neucom.2005.12.126 2006
[34]

A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, Siam Journal on Imaging Sciences 2 (1) (2009) 183–202

work page 2009
[35]

J. Tang, C. Deng, G. Huang, Extreme learning machine for multilayer perceptron, IEEE Transactions on Neural Networks and Learning Systems 27 (4) (2016) 809–821. doi:10.1109/TNNLS.2015.2424995

work page doi:10.1109/tnnls.2015.2424995 2016
[36]

Gallicchio, A

C. Gallicchio, A. Micheli, L. Pedrelli, Deep reservoir computing: A critical experimental analysis, Neurocomputing 268 (2017) 87 – 99, advances in artiﬁcial neural networks, machine learning and computational intelligence. doi:10.1016/j.neucom.2016.12.089. 33

work page doi:10.1016/j.neucom.2016.12.089 2017
[37]

Zhang, J

Y. Zhang, J. Duchi, M. Wainwright, Divide and conquer kernel ridge re- gression: A distributed algorithm with minimax optimal rates, The Journal of Machine Learning Research 16 (1) (2015) 3299–3340

work page 2015
[38]

G. E. Hinton, R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, science 313 (5786) (2006) 504–507

work page 2006
[39]

Vincent, H

P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and com- posing robust features with denoising autoencoders, in: Proceedings of the 25th International Conference on Machine Learning, ICML ’08, ACM, New York, NY, USA, 2008, pp. 1096–1103. doi:10.1145/1390156.1390294

work page doi:10.1145/1390156.1390294 2008
[40]

G. E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Computation 18 (7) (2006) 1527–1554, pMID: 16764513. doi:10.1162/neco.2006.18.7.1527

work page doi:10.1162/neco.2006.18.7.1527 2006
[41]

Salakhutdinov, G

R. Salakhutdinov, G. Hinton, Deep boltzmann machines, in: Artiﬁcial in- telligence and statistics, 2009, pp. 448–455

work page 2009
[42]

Fern´ andez-Delgado, E

M. Fern´ andez-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hundreds of classiﬁers to solve real world classiﬁcation problems?, Journal of Machine Learning Research 15 (2014) 3133–3181

work page 2014
[43]

Demˇ sar, Statistical comparisons of classiﬁers over multiple data sets, Journal of Machine learning research 7 (Jan) (2006) 1–30

J. Demˇ sar, Statistical comparisons of classiﬁers over multiple data sets, Journal of Machine learning research 7 (Jan) (2006) 1–30

work page 2006
[44]

Rakesh, P

K. Rakesh, P. N. Suganthan, An ensemble of kernel ridge regression for multi-class classiﬁcation, Procedia Computer Science 108 (2017) 375 – 383, international Conference on Computational Science, ICCS 2017, 12-14 June 2017, Zurich, Switzerland. doi:10.1016/j.procs.2017.05.109

work page doi:10.1016/j.procs.2017.05.109 2017
[45]

Nemenyi, Distribution-free multiple comparisons, in: Biometrics, Vol

P. Nemenyi, Distribution-free multiple comparisons, in: Biometrics, Vol. 18, International Biometric SOC 1441 I ST, NW, SUITE 700 Wash- ington, DC, 20005-2210, 1962, p. 263. 34

work page 1962

[1] [1]

LeCun, Y

Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436. 29

work page 2015

[2] [2]

Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61 (2015) 85 – 117

J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61 (2015) 85 – 117. doi:10.1016/j.neunet.2014.09.003

work page doi:10.1016/j.neunet.2014.09.003 2015

[3] [3]

P. N. Suganthan, On non-iterative learning algorithms with closed-form solution, Applied Soft Computing 70 (2018) 1078 – 1082. doi:10.1016/j. asoc.2018.07.013

work page doi:10.1016/j 2018

[4] [4]

Olson, A

M. Olson, A. Wyner, R. Berk, Modern neural networks generalize on small data sets, in: Advances in Neural Information Processing Systems 31, 2018, pp. 3623–3632

work page 2018

[5] [5]

P. Guo, C. Chen, Y. Sun, An exact supervised learning for a three-layer supervised neural network, in: Proceedings of the International Conference on neural Information Processing (ICONIP’95), 1995, pp. 1041–1044

work page 1995

[6] [6]

A VEST of the Pseudoinverse Learning Algorithm

P. Guo, A vest of the pseudoinverse learning algorithm, in: arXiv, https://arxiv.org/pdf/1805.07828, 2018, pp. 1–5. doi:https://arxiv. org/pdf/1805.07828

work page internal anchor Pith review Pith/arXiv arXiv 2018

[7] [7]

Berry, M

H. Berry, M. Quoy, Structure and dynamics of random recurrent neural networks, Adaptive Behavior 14 (2) (2006) 129–137

work page 2006

[8] [8]

W. F. Schmidt, M. A. Kraaijveld, R. P. Duin, Feedforward neural networks with random weights, in: Pattern Recognition, 1992. Vol. II. Conference B: Pattern Recognition Methodology and Systems, Proceedings., 11th IAPR International Conference on, IEEE, 1992, pp. 1–4

work page 1992

[9] [9]

H. A. T. Braake, G. V. Straten, Random activation weight neural net (rawn) for fast non-iterative training, Engineering Applications of Artiﬁcial Intelligence 8 (1) (1995) 71 – 80. doi:10.1016/0952-1976(94)00056-S

work page doi:10.1016/0952-1976(94)00056-s 1995

[10] [10]

Widrow, A

B. Widrow, A. Greenblatt, Y. Kim, D. Park, The no-prop algorithm: A new learning algorithm for multilayer neural networks, Neural Networks 37 (2013) 182 – 188, twenty-ﬁfth Anniversay Commemorative Issue. doi: 10.1016/j.neunet.2012.09.020. 30

work page doi:10.1016/j.neunet.2012.09.020 2013

[11] [11]

White, Chapter 9 approximate nonlinear forecasting methods, Vol

H. White, Chapter 9 approximate nonlinear forecasting methods, Vol. 1 of Handbook of Economic Forecasting, Elsevier, 2006, pp. 459 – 512. doi: 10.1016/S1574-0706(05)01009-8

work page doi:10.1016/s1574-0706(05)01009-8 2006

[12] [12]

Y. H. Pao, Y. Takefuji, Functional-link net computing: theory, system architecture, and functionalities, IEEE Computer 25 (5) (1992) 76–79.doi: 10.1109/2.144401

work page doi:10.1109/2.144401 1992

[13] [13]

Zhang, P

L. Zhang, P. N. Suganthan, Visual tracking with convolutional random vector functional link network, IEEE Transactions on Cybernetics 47 (10) (2017) 3243–3253. doi:10.1109/TCYB.2016.2588526

work page doi:10.1109/tcyb.2016.2588526 2017

[14] [14]

Zhang, P

L. Zhang, P. N. Suganthan, Benchmarking ensemble classiﬁers with novel co-trained kernel ridge regression and random vector functional link ensem- bles [research frontier], IEEE Computational Intelligence Magazine 12 (4) (2017) 61–72. doi:10.1109/MCI.2017.2742867

work page doi:10.1109/mci.2017.2742867 2017

[15] [15]

Katuwal, P

R. Katuwal, P. Suganthan, L. Zhang, An ensemble of decision trees with random vector functional link networks for multi-class classiﬁcation, Ap- plied Soft Computing 70 (2018) 1146 – 1153. doi:10.1016/j.asoc.2017. 09.020

work page doi:10.1016/j.asoc.2017 2018

[16] [16]

Vukovi, M

N. Vukovi, M. Petrovi, Z. Miljkovi, A comprehensive experimental eval- uation of orthogonal polynomial expanded random vector functional link neural networks for regression, Applied Soft Computing 70 (2018) 1083 –

work page 2018

[17] [17]

doi:10.1016/j.asoc.2017.10.010

work page doi:10.1016/j.asoc.2017.10.010 2017

[18] [18]

L. Tang, Y. Wu, L. Yu, A non-iterative decomposition-ensemble learning paradigm using rvﬂ network for crude oil price forecasting, Applied Soft Computing 70 (2018) 1097 – 1108. doi:10.1016/j.asoc.2017.02.013

work page doi:10.1016/j.asoc.2017.02.013 2018

[19] [19]

Y. Dash, S. K. Mishra, S. Sahany, B. K. Panigrahi, Indian summer monsoon rainfall prediction: A comparison of iterative and non-iterative approaches, Applied Soft Computing 70 (2018) 1122 – 1134. doi:10.1016/j.asoc. 2017.08.055. 31

work page doi:10.1016/j.asoc 2018

[20] [20]

Pao, G.-H

Y.-H. Pao, G.-H. Park, D. J. Sobajic, Learning and generalization charac- teristics of the random vector functional-link net, Neurocomputing 6 (2) (1994) 163 – 180. doi:10.1016/0925-2312(94)90053-1

work page doi:10.1016/0925-2312(94)90053-1 1994

[21] [21]

Katuwal, P

R. Katuwal, P. N. Suganthan, Enhancing multi-class classiﬁcation of ran- dom forest using random vector functional neural network and oblique de- cision surfaces, in: 2018 International Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–8. doi:10.1109/IJCNN.2018.8489738

work page doi:10.1109/ijcnn.2018.8489738 2018

[22] [22]

Zhang, P

L. Zhang, P. N. Suganthan, A comprehensive evaluation of random vector functional link networks, Information Sciences 367-368 (2016) 1094 – 1105. doi:10.1016/j.ins.2015.09.025

work page doi:10.1016/j.ins.2015.09.025 2016

[23] [23]

Y. Ren, P. N. Suganthan, N. Srikanth, G. Amaratunga, Random vector functional link network for short-term electricity load demand forecasting, Information Sciences 367-368 (2016) 1078 – 1093. doi:10.1016/j.ins. 2015.11.039

work page doi:10.1016/j.ins 2016

[24] [24]

M. J. Kearns, U. V. Vazirani, U. Vazirani, An introduction to computa- tional learning theory, MIT press, 1994

work page 1994

[25] [25]

A. Veit, M. J. Wilber, S. Belongie, Residual networks behave like ensem- bles of relatively shallow networks, in: Advances in Neural Information Processing Systems, 2016, pp. 550–558

work page 2016

[26] [26]

Snapshot Ensembles: Train 1, get M for free

G. Huang, Y. Li, G. Pleiss, Z. Liu, J. E. Hopcroft, K. Q. Weinberger, Snap- shot ensembles: Train 1, get m for free, arXiv preprint arXiv:1704.00109

work page internal anchor Pith review Pith/arXiv arXiv

[27] [27]

Y. Ren, L. Zhang, P. N. Suganthan, Ensemble classiﬁcation and regression- recent developments, applications and future directions [review article], IEEE Computational Intelligence Magazine 11 (1) (2016) 41–53. doi: 10.1109/MCI.2015.2471235

work page doi:10.1109/mci.2015.2471235 2016

[28] [28]

Gallicchio, A

C. Gallicchio, A. Micheli, L. Pedrelli, Design of deep echo state networks, Neural Networks 108 (2018) 33 – 47. doi:10.1016/j.neunet.2018.08. 002. 32

[29]

P. A. Henríquez, G. A. Ruz, Twitter sentiment classification based on deep random vector functional link, in: 2018 International Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–6. doi:10.1109/IJCNN.2018.8489703

[30]

Ö. F. Ertuğrul, A novel type of activation function in artificial neural networks: Trained activation function, Neural Networks 99 (2018) 148–157. doi:10.1016/j.neunet.2018.01.007

[31]

Y. Zhang, J. Wu, Z. Cai, B. Du, P. S. Yu, An unsupervised parameter learning model for RVFL neural network, Neural Networks 112 (2019) 85–97. doi:10.1016/j.neunet.2019.01.007

[32]

B. K. Verma, J. J. Mulawka, A modified backpropagation algorithm, in: Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94), Vol. 2, 1994, pp. 840–844. doi:10.1109/ICNN.1994.374289

[33]

G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: Theory and applications, Neurocomputing 70 (1) (2006) 489–501, Neural Networks. doi:10.1016/j.neucom.2005.12.126

[34]

A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences 2 (1) (2009) 183–202

[35]

J. Tang, C. Deng, G. Huang, Extreme learning machine for multilayer perceptron, IEEE Transactions on Neural Networks and Learning Systems 27 (4) (2016) 809–821. doi:10.1109/TNNLS.2015.2424995

[36]

C. Gallicchio, A. Micheli, L. Pedrelli, Deep reservoir computing: A critical experimental analysis, Neurocomputing 268 (2017) 87–99, Advances in artificial neural networks, machine learning and computational intelligence. doi:10.1016/j.neucom.2016.12.089

[37]

Y. Zhang, J. Duchi, M. Wainwright, Divide and conquer kernel ridge regression: A distributed algorithm with minimax optimal rates, The Journal of Machine Learning Research 16 (1) (2015) 3299–3340

[38]

G. E. Hinton, R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507

[39]

P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in: Proceedings of the 25th International Conference on Machine Learning, ICML '08, ACM, New York, NY, USA, 2008, pp. 1096–1103. doi:10.1145/1390156.1390294

[40]

G. E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Computation 18 (7) (2006) 1527–1554, PMID: 16764513. doi:10.1162/neco.2006.18.7.1527

[41]

R. Salakhutdinov, G. Hinton, Deep Boltzmann machines, in: Artificial Intelligence and Statistics, 2009, pp. 448–455

[42]

M. Fernández-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hundreds of classifiers to solve real world classification problems?, Journal of Machine Learning Research 15 (2014) 3133–3181

[43]

J. Demšar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7 (Jan) (2006) 1–30

[44]

K. Rakesh, P. N. Suganthan, An ensemble of kernel ridge regression for multi-class classification, Procedia Computer Science 108 (2017) 375–383, International Conference on Computational Science, ICCS 2017, 12-14 June 2017, Zurich, Switzerland. doi:10.1016/j.procs.2017.05.109

[45]

P. Nemenyi, Distribution-free multiple comparisons, in: Biometrics, Vol. 18, International Biometric SOC 1441 I ST, NW, SUITE 700 Washington, DC, 20005-2210, 1962, p. 263.
