Fractional-order Backpropagation Neural Networks: Modified Fractional-order Steepest Descent Method for Family of Backpropagation Neural Networks
Pith reviewed 2026-05-25 18:11 UTC · model grok-4.3
The pith
Modified fractional-order steepest descent trains backpropagation networks for superior global optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A fractional-order backpropagation neural network (FBPNN) trained with a modified fractional-order steepest descent method (FSDM) exhibits fractional-order global optimal convergence and fractional-order multi-scale global optimization. This gives it a more efficient search for the global optimal solution than a classic first-order backpropagation neural network (BPNN).
What carries the argument
A modified fractional-order steepest descent method that performs reverse incremental search in the negative directions of the approximate fractional-order partial derivatives of the square error.
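The sketch below shows one such update for a single weight. It is a minimal illustration that assumes one common approximation of the fractional-order partial derivative: the first term of a Caputo expansion with lower terminal `c`. The paper's exact approximation may differ, and `frac_partial` and `fsdm_step` are hypothetical names.

```python
import math


def frac_partial(grad: float, w: float, c: float, alpha: float) -> float:
    """Approximate order-alpha partial derivative of E at w (0 < alpha <= 1).

    Assumed first-term Caputo approximation with lower terminal c:
        D^alpha E(w) ~= E'(w) * |w - c|**(1 - alpha) / Gamma(2 - alpha)
    """
    return grad * abs(w - c) ** (1.0 - alpha) / math.gamma(2.0 - alpha)


def fsdm_step(w: float, grad: float, c: float, alpha: float, lr: float) -> float:
    """Move w against the approximate fractional-order partial derivative."""
    return w - lr * frac_partial(grad, w, c, alpha)


# Square error of a single linear neuron on one sample: E(w) = 0.5 * (w*x - y)**2.
x, y, w = 2.0, 1.0, 1.5
grad = (w * x - y) * x  # first-order partial derivative dE/dw
for alpha in (1.0, 0.9, 0.5):
    # alpha = 1.0 reproduces the classic first-order steepest descent step.
    print(alpha, fsdm_step(w, grad, c=0.0, alpha=alpha, lr=0.1))
```

Under this approximation the fractional order only rescales the ordinary partial derivative. As α moves away from 1, the scale depends on how far the weight sits from the lower terminal.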
If this is right
- The fractional-order network shows improved performance in example function approximation tasks.
- It delivers fractional-order multi-scale global optimization in comparative tests.
- Real data experiments demonstrate advantages over standard backpropagation neural networks.
- The method generalizes classic first-order backpropagation using fractional calculus.
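One standard way to see how such a method generalizes first-order backpropagation is the derivation below. It assumes the Caputo definition with lower terminal $c$, freezes $E'(\tau) \approx E'(w)$ on $[c, w]$, and writes $\mu$ for the learning rate. The paper's own definition and approximation may differ.

```latex
\begin{aligned}
{}^{C}_{c}D^{\alpha}_{w}E(w) &= \frac{1}{\Gamma(1-\alpha)}\int_{c}^{w}\frac{E'(\tau)}{(w-\tau)^{\alpha}}\,d\tau,
  \qquad 0<\alpha<1,\; w>c,\\
&\approx \frac{E'(w)}{\Gamma(1-\alpha)}\cdot\frac{(w-c)^{1-\alpha}}{1-\alpha}
  = \frac{E'(w)\,(w-c)^{1-\alpha}}{\Gamma(2-\alpha)},\\
w_{k+1} &= w_k-\mu\,\frac{E'(w_k)\,\lvert w_k-c\rvert^{1-\alpha}}{\Gamma(2-\alpha)}
  \;\xrightarrow[\alpha\to 1]{}\; w_k-\mu\,E'(w_k).
\end{aligned}
```

The last line uses $\Gamma(1)=1$ and $\lvert w_k-c\rvert^{0}=1$. In the limit $\alpha \to 1$ the update is exactly the classic steepest descent step used by first-order backpropagation.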
Where Pith is reading between the lines
- The same fractional-order update rule could apply to other gradient descent variants in machine learning.
- It might handle non-convex error surfaces more effectively due to the memory property.
- Similar modifications could appear in control or signal processing applications that already use fractional operators.
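The memory property speculated about above comes from the fact that a fractional derivative weights the whole past of a signal. The Grünwald-Letnikov form makes this visible. For α = 1 the weights reduce to a two-point difference, while for 0 < α < 1 every earlier sample keeps a nonzero, slowly decaying weight. Below is a minimal sketch with hypothetical function names; it does not claim to reproduce how the paper discretizes its derivative.

```python
def gl_coefficients(alpha: float, n: int) -> list[float]:
    """Grünwald-Letnikov weights (-1)**k * binom(alpha, k) for k = 0..n-1.

    Uses the recurrence c_k = c_{k-1} * (1 - (alpha + 1) / k).
    """
    coeffs = [1.0]
    for k in range(1, n):
        coeffs.append(coeffs[-1] * (1.0 - (alpha + 1.0) / k))
    return coeffs


def gl_derivative(history: list[float], alpha: float, h: float = 1.0) -> float:
    """Order-alpha GL difference of a sampled signal at its newest sample.

    history[-1] is the newest value; every earlier sample contributes.
    """
    coeffs = gl_coefficients(alpha, len(history))
    # Pair coefficient k with the sample taken k steps back in time.
    total = sum(c * v for c, v in zip(coeffs, reversed(history)))
    return total / h ** alpha


hist = [0.0, 1.0, 4.0, 9.0]  # samples of w**2 at w = 0, 1, 2, 3
print(gl_coefficients(1.0, 4))  # local: only the two newest samples are weighted
print(gl_coefficients(0.5, 4))  # nonlocal: all earlier samples carry weight
print(gl_derivative(hist, 1.0), gl_derivative(hist, 0.5))
```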
Load-bearing premise
The assumed network structure is what enables the analysis of fractional-order global optimal convergence and fractional-order multi-scale global optimization.
What would settle it
An experiment on a multimodal test function or a real dataset in which the classic first-order backpropagation neural network reaches a global optimum as good as, or better than, that of the modified fractional-order version.
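The decisive experiment described above can be prototyped on a toy problem before moving to real data. Everything in this sketch is hypothetical and not taken from the paper:

- the test function,
- the step size,
- the lower terminal `c = -10`,
- the step budget.

The fractional step uses an assumed first-term Caputo approximation. Both optimizers receive the same starting points and the same number of steps, and `alpha = 1.0` is exactly plain gradient descent.

```python
import math


def loss(w: float) -> float:
    # Hypothetical 1-D multimodal test function (not from the paper).
    return 0.1 * w * w + math.sin(3.0 * w)


def dloss(w: float) -> float:
    return 0.2 * w + 3.0 * math.cos(3.0 * w)


def descend(w0: float, alpha: float, c: float, lr: float = 0.02, steps: int = 500) -> float:
    """Descent along the approximate order-alpha derivative (alpha = 1: plain GD)."""
    w = w0
    for _ in range(steps):
        scale = abs(w - c) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
        w -= lr * dloss(w) * scale
    return w


def best_loss(alpha: float, starts: list[float], c: float = -10.0) -> float:
    """Best final loss over a shared set of starting points."""
    return min(loss(descend(w0, alpha, c)) for w0 in starts)


starts = [-4.0 + 0.5 * i for i in range(17)]  # -4.0, -3.5, ..., 4.0
for alpha in (1.0, 0.9, 0.7):
    print(alpha, best_loss(alpha, starts))
```

The printed numbers depend on the step size, the lower terminal and α. A convincing test would repeat the comparison over these settings rather than report a single run.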
Original abstract
This paper offers a novel mathematical approach, the modified Fractional-order Steepest Descent Method (FSDM), for training BackPropagation Neural Networks (BPNNs); this approach differs from the majority of previous approaches. Fractional calculus, a promising mathematical method, has the potential to assume a prominent role in the applications of neural networks and cybernetics because of its inherent strengths, such as long-term memory, nonlocality, and weak singularity. Therefore, to improve the optimization performance of classic first-order BPNNs, this paper studies whether FSDM can be modified and classic first-order BPNNs generalized to modified FSDM based Fractional-order Backpropagation Neural Networks (FBPNNs). Motivated by this idea, the paper proposes a state-of-the-art application of fractional calculus to implement a modified FSDM based FBPNN whose reverse incremental search is in the negative directions of the approximate fractional-order partial derivatives of the square error. First, the theoretical concept of a modified FSDM based FBPNN is described mathematically. Then, the mathematical proof of the fractional-order global optimal convergence, an assumption of the structure, and the fractional-order multi-scale global optimization of a modified FSDM based FBPNN are analysed in detail. Finally, a modified FSDM based FBPNN is compared with a classic first-order BPNN in comparative experiments: an example function approximation, fractional-order multi-scale global optimization, and two performance comparisons on real data. The major advantage of a modified FSDM based FBPNN over a classic first-order BPNN is the more efficient optimal searching capability of its fractional-order multi-scale global optimization in determining the global optimal solution.
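The training loop the abstract describes can be sketched for a one-hidden-layer network fitting sin(x). This is a hypothetical illustration, not the authors' code. It assumes:

- a first-term Caputo approximation with the lower terminal fixed at `c = 0` for every parameter,
- full-batch training on a mean square error,
- tanh hidden units.

Backpropagation computes the ordinary partial derivatives. The fractional step then rescales each one elementwise before moving against it.

```python
import math

import numpy as np


def frac_scale(p: np.ndarray, c: float, alpha: float) -> np.ndarray:
    """Elementwise |p - c|**(1 - alpha) / Gamma(2 - alpha) (assumed approximation)."""
    return np.abs(p - c) ** (1.0 - alpha) / math.gamma(2.0 - alpha)


def train_fbpnn(alpha=0.9, hidden=10, lr=0.05, epochs=2000, c=0.0, seed=0):
    """Hypothetical FBPNN sketch: returns trained parameters and the loss history."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-math.pi, math.pi, 64).reshape(-1, 1)
    y = np.sin(x)
    # Nonzero initialization so that |p - c| > 0 for every parameter when c = 0.
    params = {
        "W1": rng.normal(0.0, 1.0, (1, hidden)),
        "b1": rng.normal(0.0, 0.5, (1, hidden)),
        "W2": rng.normal(0.0, 0.5, (hidden, 1)),
        "b2": rng.normal(0.0, 0.5, (1, 1)),
    }
    n = x.shape[0]
    history = []
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, linear output.
        h = np.tanh(x @ params["W1"] + params["b1"])
        out = h @ params["W2"] + params["b2"]
        err = out - y
        history.append(0.5 * float(np.mean(err ** 2)))
        # Backward pass: first-order partial derivatives of E = 0.5 * mean(err**2).
        d_out = err / n
        d_h = (d_out @ params["W2"].T) * (1.0 - h ** 2)
        grads = {
            "W2": h.T @ d_out,
            "b2": d_out.sum(axis=0, keepdims=True),
            "W1": x.T @ d_h,
            "b1": d_h.sum(axis=0, keepdims=True),
        }
        # Fractional step: move against the approximate order-alpha partial derivatives.
        for name, g in grads.items():
            params[name] -= lr * g * frac_scale(params[name], c, alpha)
    return params, history


_, hist = train_fbpnn(alpha=0.9)
print(f"initial loss {hist[0]:.4f}, final loss {hist[-1]:.4f}")
```

With `alpha=1.0` the scale factor is exactly 1, so `train_fbpnn(alpha=1.0)` doubles as the classic first-order BPNN baseline. A lower terminal of 0 freezes any parameter sitting exactly at 0 when α < 1, which is why every parameter, biases included, is initialized away from zero.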
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a modified fractional-order steepest descent method (FSDM) for training backpropagation neural networks, generalizing them to fractional-order BPNNs (FBPNNs). It asserts mathematical proofs of fractional-order global optimal convergence and multi-scale global optimization under an 'assumption of the structure', and reports comparative experiments claiming superior global search performance over classic first-order BPNNs on function approximation and real data tasks.
Significance. If the proofs hold and the structural assumption is independently justified, the approach could advance the use of fractional calculus for neural network optimization by providing non-local memory effects that improve global convergence properties.
Major comments (2)
- [Abstract] The central superiority claim rests on 'the mathematical proof of the fractional-order global optimal convergence, an assumption of the structure, and the fractional-order multi-scale global optimization', yet no derivation steps, error bounds, or justification for the assumption of the structure are supplied. This assumption is load-bearing for the argument that the modified FSDM enables better global optimality than first-order BPNNs.
- [Theoretical analysis section] The reverse incremental search is described as following 'negative directions of the approximate fractional-order partial derivatives'. Without explicit update rules, a convergence analysis, or a demonstration that the fractional-order modification does not reduce to a reparameterized first-order method under the stated assumption, the proof cannot be evaluated for circularity.
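The referee's reduction concern can be made concrete. Assume the first-term Caputo approximation; this is an assumption, not necessarily the paper's formula. Then each approximate fractional partial derivative equals the ordinary partial derivative times a nonnegative factor |w_i - c_i|^(1-α) / Γ(2-α). The step is therefore a first-order step with a position-dependent diagonal rescaling. It stays a descent direction but carries no memory of past iterates. The check below verifies both facts numerically on random data; `approx_frac_direction` is a hypothetical name.

```python
import math

import numpy as np


def approx_frac_direction(grad, w, c, alpha):
    """Approximate order-alpha partial derivatives, coordinate by coordinate."""
    return grad * np.abs(w - c) ** (1.0 - alpha) / math.gamma(2.0 - alpha)


rng = np.random.default_rng(1)
alpha = 0.6
w = rng.normal(size=5)     # current weights
c = np.zeros(5)            # lower terminals (assumed)
grad = rng.normal(size=5)  # ordinary first-order gradient

d = approx_frac_direction(grad, w, c, alpha)
# Equivalent first-order form: a diagonal, nonnegative rescaling of the gradient.
scale = np.abs(w - c) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
print(np.allclose(d, scale * grad))  # True: the two forms coincide
print(float(-grad @ d) <= 0.0)       # True: stepping along -d is a descent direction
```

Non-local memory enters only if the derivative keeps its integral (or Grünwald-Letnikov sum) over the weight history instead of freezing the gradient. That is the distinction a demonstration of non-reduction would need to exhibit.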
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation of major revision. The comments correctly identify areas where additional clarity on the proofs and assumption would strengthen the manuscript. We address each point below and will make the requested expansions in the revised version.
Point-by-point responses
- Referee: [Abstract] The central superiority claim rests on 'the mathematical proof of the fractional-order global optimal convergence, an assumption of the structure, and the fractional-order multi-scale global optimization', yet no derivation steps, error bounds, or justification for the assumption of the structure are supplied. This assumption is load-bearing for the argument that the modified FSDM enables better global optimality than first-order BPNNs.
  Authors: The abstract is a high-level summary; the full derivation steps, error bounds, and justification of the structural assumption appear in the Theoretical Analysis section. We agree that the abstract does not sufficiently signpost these elements or the assumption's role. In revision we will expand the abstract to reference the key proof components. We will also add an explicit justification subsection for the assumption, together with an error-bound discussion, making its load-bearing status transparent. (Revision: yes.)
- Referee: [Theoretical analysis section] The reverse incremental search is described as following 'negative directions of the approximate fractional-order partial derivatives'. Without explicit update rules, a convergence analysis, or a demonstration that the fractional-order modification does not reduce to a reparameterized first-order method under the stated assumption, the proof cannot be evaluated for circularity.
  Authors: Explicit update rules using the fractional-order partial derivatives are stated in the section, and convergence is shown via the subsequent theorems under the structural assumption. We acknowledge that a direct demonstration that the method does not collapse to a reparameterized first-order scheme is missing, which could raise circularity concerns. In the revision we will insert the complete update-rule equations and expand the convergence proof. We will also add a subsection with a counter-example illustrating the non-local memory effect that produces trajectories distinct from those of standard first-order BPNNs. (Revision: yes.)
Circularity Check
Convergence and multi-scale optimization claims reduce to the posited 'assumption of the structure'
Specific steps
- Self-definitional [Abstract]: "Then, the mathematical proof of the fractional-order global optimal convergence, an assumption of the structure, and the fractional-order multi-scale global optimization of a modified FSDM based FBPNN are analysed in detail."
  The proof of convergence and multi-scale optimization is explicitly bundled with 'an assumption of the structure' that enables those properties. This makes the claimed global optimality advantage equivalent to the assumption by construction, rather than an independent derivation from first principles or external benchmarks.
Full rationale
The abstract states that the paper provides 'the mathematical proof of the fractional-order global optimal convergence, an assumption of the structure, and the fractional-order multi-scale global optimization'. This phrasing indicates the central superiority claim (more efficient global search over first-order BPNN) is analyzed under an assumption of structure that is not shown to be independently derived or verified; the claimed fractional-order advantages are therefore conditional on that modeling choice by construction. No equations or self-citations are quoted that would allow further reduction, but the load-bearing role of the assumption matches the self-definitional pattern at the level of the derivation chain presented.
Axiom & Free-Parameter Ledger
Free parameters (1)
- Fractional order
Axioms (1)
- Domain assumption: the properties of fractional calculus (long-term memory, nonlocality, weak singularity) can be directly transferred to improve the optimization dynamics of first-order gradient descent in neural networks.