Generalized Euler Logarithm and its Applications in Machine Learning: Natural Gradient, Backpropagation, Generalized EG, Mirror Descent and OLPS
Pith reviewed 2026-05-23 02:44 UTC · model grok-4.3
The pith
The two-parameter Euler logarithm unifies many generalized entropies and serves as a link function for generalized exponentiated gradient, mirror descent, and natural gradient methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Euler (a,b)-logarithm is established as a unifying kernel for a wide family of generalized entropies and divergence measures. On the algorithmic side, generalized Exponentiated Gradient (GEG) and Mirror Descent (MD) schemes are introduced in which the Euler (a,b)-logarithm acts as a flexible link function in the underlying Bregman divergence. An Euler-based Generalized Cross-Entropy (GCE) loss is proposed for deep neural networks, together with its exact backpropagation formulas and seamless integration with Fisher-Rao Natural Gradient descent, where the two deformation parameters decouple tail robustness from local gradient shaping.
What carries the argument
The Euler (a,b)-logarithm, used as the link function inside Bregman divergences to generate the generalized algorithms.
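This page does not reproduce the paper's formula, so the sketch below assumes the form commonly associated with Euler's two-parameter function, log_{a,b}(x) = (x^a - x^b)/(a - b) for x > 0 and a ≠ b. Under that assumption it illustrates how two of the listed deformations arise as special cases. The function names are illustrative and are not taken from the paper.

```python
import math

def euler_log(x, a, b):
    """Assumed Euler (a,b)-logarithm: (x**a - x**b) / (a - b), x > 0, a != b."""
    if x <= 0:
        raise ValueError("x must be positive")
    if a == b:
        raise ValueError("a and b must differ")
    return (x**a - x**b) / (a - b)

def tsallis_log(x, q):
    """Tsallis q-logarithm: (x**(1-q) - 1) / (1 - q)."""
    return (x**(1.0 - q) - 1.0) / (1.0 - q)

def kaniadakis_log(x, kappa):
    """Kaniadakis kappa-logarithm: (x**kappa - x**(-kappa)) / (2*kappa)."""
    return (x**kappa - x**(-kappa)) / (2.0 * kappa)

x = 2.5
# Setting b = 0 and a = 1 - q gives the Tsallis form.
assert math.isclose(euler_log(x, 0.3, 0.0), tsallis_log(x, 0.7))
# Setting a = kappa and b = -kappa gives the Kaniadakis form.
assert math.isclose(euler_log(x, 0.4, -0.4), kaniadakis_log(x, 0.4))
# Small symmetric parameters approach the natural logarithm.
assert abs(euler_log(x, 1e-4, -1e-4) - math.log(x)) < 1e-6
```

If the paper's definition differs by a sign convention or a reparametrization, the reductions would need to be adjusted accordingly.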
If this is right
- Generalized EG and MD updates become available for any pair (a,b) inside the valid domain.
- Exact back-propagation rules exist for the Euler-based GCE loss in deep networks.
- Fisher-Rao natural gradient descent can be realized with a diagonal approximation that isolates the two deformation parameters.
- Tail robustness and local gradient shaping are controlled independently by a and b.
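The first consequence above can be sketched as a generic mirror-descent step with a link function f: w_new = f^{-1}(f(w) - eta * grad), followed by a projection back to the simplex. The code below is a minimal illustration under two assumptions that do not come from this page: the logarithm has the form (x^a - x^b)/(a - b), and the projection is a plain renormalization. The inverse is computed numerically by bisection, so no closed-form (a,b)-exponential is assumed.

```python
import math

def euler_log(x, a, b):
    # Assumed form of the Euler (a,b)-logarithm; not quoted from the paper.
    return (x**a - x**b) / (a - b)

def euler_exp(y, a, b, lo=1e-12, hi=1e12, iters=200):
    """Numerical inverse of euler_log by bisection on a log scale.

    Assumes 0 < a <= 1 and b < 0, where the assumed euler_log is strictly
    increasing and maps (0, inf) onto the whole real line.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)           # geometric midpoint
        if euler_log(mid, a, b) < y:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def geg_step(w, grad, eta, a, b):
    """One mirror-descent step using the Euler logarithm as link function."""
    unnormalized = [euler_exp(euler_log(wi, a, b) - eta * gi, a, b)
                    for wi, gi in zip(w, grad)]
    total = sum(unnormalized)
    # Plain renormalization onto the simplex (a simplifying assumption).
    return [u / total for u in unnormalized]

w = [0.25, 0.25, 0.5]
grad = [1.0, -0.5, 0.2]
w_next = geg_step(w, grad, eta=0.1, a=0.5, b=-0.5)
print(w_next)
```

With a = 0.5 and b = -0.5 the link coincides with the Kaniadakis logarithm for kappa = 0.5, whose inverse is known in closed form, which makes this setting convenient for checking the bisection.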
Where Pith is reading between the lines
- The same link-function construction could be applied directly to the online portfolio selection (OLPS) setting named in the title.
- Different (a,b) pairs might be selected per layer or per task to match data tail properties without changing the overall optimizer structure.
- The integral and series representations derived for the logarithm may yield new closed-form expressions for other information-theoretic quantities.
Load-bearing premise
The chosen ranges of a and b must keep the logarithm monotonic, concave, and invertible so that it can serve as a valid link inside Bregman divergences.
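A quick numerical spot-check of this premise is possible once a formula is fixed. The sketch below assumes the form (x^a - x^b)/(a - b), which is not stated on this page, and tests the signs of its first and second derivatives on a logarithmic grid. It is a sanity check, not a proof, and the helper names are hypothetical.

```python
def d1(x, a, b):
    """First derivative of the assumed form (x**a - x**b) / (a - b)."""
    return (a * x**(a - 1) - b * x**(b - 1)) / (a - b)

def d2(x, a, b):
    """Second derivative of the same assumed form."""
    return (a * (a - 1) * x**(a - 2) - b * (b - 1) * x**(b - 2)) / (a - b)

def looks_like_valid_link(a, b, grid=None):
    """Spot-check monotonicity and concavity on a grid; not a proof."""
    if grid is None:
        grid = [10.0 ** (k / 4.0) for k in range(-24, 25)]  # 1e-6 .. 1e6
    return all(d1(x, a, b) > 0 and d2(x, a, b) <= 0 for x in grid)

# Inside 0 < a <= 1, b < 0 both sign conditions hold on the grid.
print(looks_like_valid_link(0.5, -0.5))
# With a > 1 the second derivative turns positive for large x.
print(looks_like_valid_link(2.0, -0.5))
```

The region 0 < a ≤ 1, b < 0 is used here only because its signs can be verified by hand under the assumed formula; the domains clarified in the paper may be wider or differently parametrized.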
What would settle it
Run the proposed GEG or GCE algorithms with parameter pairs lying outside the clarified monotonicity domains and check whether the iterates remain valid probability vectors or whether the loss ceases to decrease.
Original abstract
This paper investigates in depth the fundamental properties of the two-parameter generalized Euler logarithm and its inverse, the associated deformed $(a,b)$-exponential function. We systematically clarify the parameter domains that guarantee monotonicity, concavity, and invertibility, derive series and integral representations, and provide explicit links to a broad class of one- and two-parameter deformations, including Tsallis, Kaniadakis, Schwämmle–Tsallis, Kaniadakis–Scarfone, and Tempesta-type logarithms and their inverse exponentials. In this way, the Euler $(a,b)$-logarithm is established as a unifying kernel for a wide family of generalized entropies and divergence measures. On the algorithmic side, we extend applications of the Euler logarithm to modern machine learning and optimization. We introduce generalized Exponentiated Gradient (GEG) and Mirror Descent (MD) schemes in which the Euler $(a,b)$-logarithm acts as a flexible link function in the underlying Bregman divergence. In addition, we propose an Euler-based Generalized Cross-Entropy (GCE) loss for deep neural networks, derive its exact backpropagation formulas, and detail its seamless integration with Fisher-Rao Natural Gradient (NG) descent. By isolating the Fisher Information Matrix (FIM) and developing a diagonal NG approximation, we demonstrate how the two deformation parameters successfully decouple tail robustness from local gradient shaping.
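To make the backpropagation claim concrete, the sketch below works through one plausible reading of an Euler-based cross-entropy: the loss -log_{a,b}(p_y) applied to softmax probabilities, with log_{a,b}(x) = (x^a - x^b)/(a - b). Both the loss form and the logarithm's formula are assumptions made for illustration and may differ from the paper's definitions. The gradient with respect to the logits follows from the chain rule.

```python
import math

def euler_log(x, a, b):
    # Assumed form of the Euler (a,b)-logarithm.
    return (x**a - x**b) / (a - b)

def euler_log_prime(x, a, b):
    # Derivative of the assumed form with respect to x.
    return (a * x**(a - 1) - b * x**(b - 1)) / (a - b)

def softmax(z):
    m = max(z)                               # shift for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def gce_loss(z, y, a, b):
    """Hypothetical Euler-based loss: -log_{a,b}(p_y) with p = softmax(z)."""
    return -euler_log(softmax(z)[y], a, b)

def gce_grad(z, y, a, b):
    """Gradient of gce_loss with respect to the logits z."""
    p = softmax(z)
    # For the ordinary logarithm this factor equals 1, which recovers the
    # familiar softmax cross-entropy gradient p - onehot(y).
    scale = euler_log_prime(p[y], a, b) * p[y]
    return [scale * (p[k] - (1.0 if k == y else 0.0)) for k in range(len(z))]

z = [0.2, -1.0, 0.7]
print(gce_loss(z, 2, 0.5, -0.5), gce_grad(z, 2, 0.5, -0.5))
```

In this reading the two parameters enter the gradient only through the scalar factor p_y · log'_{a,b}(p_y), which rescales the usual softmax residual according to how confident the prediction is.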
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper defines the two-parameter Euler (a,b)-logarithm and its inverse (a,b)-exponential, clarifies the (a,b) domains ensuring monotonicity, concavity and invertibility, derives series/integral representations, and shows explicit reductions to Tsallis, Kaniadakis, Schwämmle–Tsallis and related one- and two-parameter deformations. It then uses the Euler logarithm as the link function inside Bregman divergences to define generalized exponentiated gradient and mirror descent updates, introduces an Euler-based generalized cross-entropy loss with exact back-propagation rules, and combines it with a diagonal Fisher-Rao natural-gradient approximation that isolates the two parameters to separately control tail robustness and local gradient shaping.
Significance. If the domain clarification and the associated convexity/invertibility proofs hold, the work supplies a single two-parameter kernel that recovers a broad family of generalized entropies and supplies concrete algorithmic extensions (GEG, MD, GCE loss, exact back-prop, diagonal NG) whose parameter decoupling is potentially useful for robust deep-learning training. The explicit links to existing deformations and the machine-learning instantiations would constitute a genuine unification rather than a reparametrization.
major comments (1)
- [Parameter-domain section (and any appendix containing the monotonicity/concavity arguments)] The central load-bearing claim is that the clarified (a,b) domains guarantee strict concavity (hence convexity of the mirror map) and global invertibility so that the Euler logarithm can serve as a valid Bregman link across all listed deformations and algorithms. The manuscript states these domains but supplies no derivative sign charts, second-derivative analysis, or explicit convexity proofs for the interior or boundary points of the claimed region; without such verification the unification and the claimed decoupling of tail robustness from gradient shaping remain unconfirmed.
minor comments (2)
- [Introduction and definitions] Notation for the two-parameter exponential and its inverse should be introduced once with a single consistent symbol rather than alternating between “deformed exponential” and “(a,b)-exponential.”
- [Natural-gradient subsection] The diagonal FIM approximation is presented without a quantitative error bound or comparison to the full-matrix NG baseline; a short remark on the approximation quality would strengthen the NG section.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the importance of rigorous verification of the domain properties. We address the single major comment below and will revise the manuscript accordingly.
Referee: [Parameter-domain section (and any appendix containing the monotonicity/concavity arguments)] The central load-bearing claim is that the clarified (a,b) domains guarantee strict concavity (hence convexity of the mirror map) and global invertibility so that the Euler logarithm can serve as a valid Bregman link across all listed deformations and algorithms. The manuscript states these domains but supplies no derivative sign charts, second-derivative analysis, or explicit convexity proofs for the interior or boundary points of the claimed region; without such verification the unification and the claimed decoupling of tail robustness from gradient shaping remain unconfirmed.
Authors: We agree that explicit second-derivative analysis and sign charts are necessary to fully substantiate the claimed (a,b) domains. While the manuscript states the domains that ensure monotonicity, concavity, and invertibility, the detailed derivative sign analysis and convexity proofs for the interior and boundary points are only outlined rather than presented with complete charts. In the revision we will expand the parameter-domain section and add a dedicated appendix containing (i) the first- and second-derivative expressions, (ii) sign charts over the interior and boundary of the claimed region, and (iii) explicit verification that the second derivative is strictly negative (hence strict concavity of the Euler logarithm and convexity of the associated mirror map) together with global invertibility. These additions will directly confirm the validity of the Bregman link for all listed deformations and support the parameter-decoupling claims in the algorithmic sections. revision: yes
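For orientation, the kind of derivative expressions the promised appendix would need can be written out under an assumed definition. The formula below is not given on this page; it is the form commonly attributed to Euler's two-parameter function, and the sign argument covers only one sub-region that is easy to verify by hand.

```latex
% Assumed definition (not quoted from the paper):
\log_{a,b}(x) = \frac{x^{a} - x^{b}}{a - b}, \qquad x > 0,\ a \neq b .

% First and second derivatives:
\frac{d}{dx}\log_{a,b}(x) = \frac{a\,x^{a-1} - b\,x^{b-1}}{a - b},
\qquad
\frac{d^{2}}{dx^{2}}\log_{a,b}(x) = \frac{a(a-1)\,x^{a-2} - b(b-1)\,x^{b-2}}{a - b} .

% Sign check on the sub-region 0 < a \le 1,\ b < 0 (so a - b > 0):
%   a\,x^{a-1} > 0 and -b\,x^{b-1} > 0      => first derivative  > 0 (strictly increasing)
%   a(a-1) \le 0 and -b(b-1) < 0            => second derivative < 0 (strictly concave)
```

Whether the paper's full claimed region extends beyond this sub-region is exactly what the referee asks the authors to document.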
Circularity Check
No circularity: derivations and applications are self-contained
The paper defines the Euler (a,b)-logarithm, derives its monotonicity/concavity/invertibility domains, series/integral forms, and explicit mappings to Tsallis/Kaniadakis/etc. deformations directly from its functional form, then uses the resulting object as a link inside newly proposed Bregman-based GEG/MD schemes and GCE loss with exact backprop and diagonal NG. No equations reduce a claimed prediction or uniqueness result to a fitted input or prior self-citation by construction; the ML extensions introduce independent algorithmic content grounded in the derived link function rather than tautological re-use of inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- a, b
axioms (1)
- domain assumption: The generalized logarithm is monotonic, concave, and invertible for appropriate parameter domains.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: echoes)
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "the Euler (a,b)-logarithm … acts as a flexible link function in the underlying Bregman divergence … two deformation parameters successfully decouple tail robustness from local gradient shaping"
- IndisputableMonolith/Foundation/BranchSelection.lean, alpha_pin_under_high_calibration (tag: echoes)
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "clarify the parameter domains that guarantee monotonicity, concavity, and invertibility"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.