
arxiv: 2502.17500 · v3 · submitted 2025-02-21 · 💻 cs.LG · cs.AI

Generalized Euler Logarithm and its Applications in Machine Learning: Natural Gradient, Backpropagation, Generalized EG, Mirror Descent and OLPS

Pith reviewed 2026-05-23 02:44 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords Euler logarithm · generalized entropy · Bregman divergence · natural gradient · mirror descent · generalized cross-entropy · backpropagation

The pith

The two-parameter Euler logarithm unifies many generalized entropies and serves as a link function for generalized exponentiated gradient, mirror descent, and natural gradient methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes the Euler (a,b)-logarithm and its inverse deformed exponential as a common kernel that recovers Tsallis, Kaniadakis, and several other one- and two-parameter logarithms under suitable choices of a and b. It derives the parameter domains that keep the function monotonic, concave, and invertible, then uses these properties to embed the logarithm inside Bregman divergences. This construction yields concrete generalized exponentiated gradient and mirror descent updates, an Euler-based generalized cross-entropy loss whose back-propagation formulas are given explicitly, and a natural-gradient scheme in which the two parameters separately control tail behavior and local curvature via the Fisher information matrix.
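The reductions mentioned above can be illustrated with a minimal sketch. It assumes the standard two-parameter form log_{a,b}(x) = (x^a − x^b)/(a − b), which this page does not state explicitly, and the usual Tsallis and Kaniadakis definitions; the function names are illustrative only.

```python
import numpy as np

def euler_log(x, a, b):
    """Euler (a,b)-logarithm, ASSUMED form (x**a - x**b) / (a - b)."""
    x = np.asarray(x, dtype=float)
    if a == b:
        # Limit b -> a of the quotient: x**a * ln(x); gives ln(x) at a = b = 0.
        return x**a * np.log(x)
    return (x**a - x**b) / (a - b)

def tsallis_log(x, q):
    """Tsallis q-logarithm (x**(1-q) - 1) / (1 - q), with ln(x) at q = 1."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if q == 1 else (x**(1 - q) - 1) / (1 - q)

def kaniadakis_log(x, kappa):
    """Kaniadakis kappa-logarithm (x**k - x**(-k)) / (2k), with ln(x) at k = 0."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if kappa == 0 else (x**kappa - x**(-kappa)) / (2 * kappa)

xs = np.linspace(0.1, 5.0, 50)
# Tsallis is recovered with a = 1 - q, b = 0; Kaniadakis with a = kappa, b = -kappa.
print(np.allclose(euler_log(xs, 0.3, 0.0), tsallis_log(xs, 0.7)))
print(np.allclose(euler_log(xs, 0.4, -0.4), kaniadakis_log(xs, 0.4)))
```

Under this assumed form, both comparisons hold identically, and a = b = 0 gives the natural logarithm.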

Core claim

The Euler (a,b)-logarithm is established as a unifying kernel for a wide family of generalized entropies and divergence measures. On the algorithmic side, generalized Exponentiated Gradient (GEG) and Mirror Descent (MD) schemes are introduced in which the Euler (a,b)-logarithm acts as a flexible link function in the underlying Bregman divergence. An Euler-based Generalized Cross-Entropy (GCE) loss is proposed for deep neural networks, together with its exact backpropagation formulas and seamless integration with Fisher-Rao Natural Gradient descent, where the two deformation parameters decouple tail robustness from local gradient shaping.
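To make the backpropagation claim concrete, here is a hedged sketch of one plausible Euler-based cross-entropy. It assumes the loss L = −Σ_k y_k log_{a,b}(p_k) with p = softmax(z) and the form log_{a,b}(x) = (x^a − x^b)/(a − b). Neither the loss definition nor the function names are taken from the paper, so its exact formulas may differ.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def euler_log(p, a, b):
    # ASSUMED form of the Euler (a,b)-logarithm; a == b uses the limiting case.
    if a == b:
        return p**a * np.log(p)
    return (p**a - p**b) / (a - b)

def euler_log_grad(p, a, b):
    # Derivative of the assumed logarithm with respect to p.
    if a == b:
        return p**(a - 1) * (a * np.log(p) + 1)
    return (a * p**(a - 1) - b * p**(b - 1)) / (a - b)

def gce_loss(z, y, a, b):
    """Hypothetical Euler-based cross-entropy: -sum_k y_k * log_{a,b}(p_k)."""
    return -np.sum(y * euler_log(softmax(z), a, b))

def gce_grad(z, y, a, b):
    """Gradient w.r.t. logits via the softmax Jacobian dp_k/dz_j = p_k (delta_kj - p_j)."""
    p = softmax(z)
    u = y * euler_log_grad(p, a, b) * p   # u_k = y_k * f'(p_k) * p_k
    return -(u - p * u.sum())

z = np.array([0.2, -1.0, 0.5])
y = np.array([0.0, 1.0, 0.0])
print(gce_grad(z, y, 0.5, -0.3))
```

For a = b = 0 the gradient reduces to the familiar softmax cross-entropy expression p − y, which gives a quick sanity check on the derivation.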

What carries the argument

The Euler (a,b)-logarithm, used as the link function inside Bregman divergences to generate the generalized algorithms.
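The role of the link function can be written out with the standard mirror-descent construction. This is a generic sketch and not the paper's notation: it assumes a separable mirror map F whose gradient is the Euler (a,b)-logarithm applied componentwise, a loss L, and a step size η > 0.

```latex
% Assumed notation: \log_{a,b} is the Euler (a,b)-logarithm, F a separable mirror map
% with \nabla F = \log_{a,b} (componentwise), L the loss, \eta > 0 the step size.
\begin{align}
F(\mathbf{w}) &= \sum_{i} \int_{1}^{w_i} \log_{a,b}(u)\, du, \\
D_F(\mathbf{p}\,\|\,\mathbf{q}) &= F(\mathbf{p}) - F(\mathbf{q})
  - \sum_{i} \log_{a,b}(q_i)\,(p_i - q_i), \\
\mathbf{w}_{t+1} &= \arg\min_{\mathbf{w}} \Big\{ \eta\,\langle \nabla L(\mathbf{w}_t), \mathbf{w}\rangle
  + D_F(\mathbf{w}\,\|\,\mathbf{w}_t) \Big\}
\;\Longrightarrow\;
\log_{a,b}(w_{i,t+1}) = \log_{a,b}(w_{i,t}) - \eta\,\partial_i L(\mathbf{w}_t).
\end{align}
```

F is convex exactly when the link is increasing, and solving the last equation for w_{t+1} requires the link to be invertible, which is why the parameter domains matter.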

If this is right

  • Generalized EG and MD updates become available for any pair (a,b) inside the valid domain.
  • Exact back-propagation rules exist for the Euler-based GCE loss in deep networks.
  • Fisher-Rao natural gradient descent can be realized with a diagonal approximation that isolates the two deformation parameters.
  • Tail robustness and local gradient shaping are controlled independently by a and b.
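A generalized EG step of the kind listed above can be sketched as follows. The sketch assumes the form log_{a,b}(x) = (x^a − x^b)/(a − b), inverts it numerically by bisection rather than with any closed form the paper may provide, and projects back to the simplex by plain normalization. All function names and these choices are illustrative assumptions.

```python
import numpy as np

def euler_log(x, a, b):
    # ASSUMED form of the Euler (a,b)-logarithm; a == b uses the limiting case.
    x = np.asarray(x, dtype=float)
    if a == b:
        return x**a * np.log(x)
    return (x**a - x**b) / (a - b)

def euler_exp(y, a, b, lo=-60.0, hi=60.0, iters=80):
    """Numerical inverse of euler_log by bisection on t = ln(x).

    Assumes euler_log is increasing for the chosen (a, b), e.g. a >= 0 >= b.
    """
    y = np.asarray(y, dtype=float)
    lo = np.full_like(y, lo)
    hi = np.full_like(y, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        too_small = euler_log(np.exp(mid), a, b) < y
        lo = np.where(too_small, mid, lo)
        hi = np.where(too_small, hi, mid)
    return np.exp(0.5 * (lo + hi))

def geg_step(w, grad, eta, a, b):
    """One generalized EG step: move in the link domain, invert, renormalize."""
    z = euler_log(w, a, b) - eta * np.asarray(grad, dtype=float)
    w_new = euler_exp(z, a, b)
    return w_new / w_new.sum()

w = np.array([0.2, 0.3, 0.5])
g = np.array([1.0, -0.5, 0.2])
print(geg_step(w, g, eta=0.1, a=0.5, b=-0.5))
```

With a = b = 0 the step coincides with the classical exponentiated gradient update w_i ∝ w_i exp(−η g_i), which is a convenient regression check.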

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same link-function construction could be applied directly to online portfolio selection algorithms listed in the title.
  • Different (a,b) pairs might be selected per layer or per task to match data tail properties without changing the overall optimizer structure.
  • The integral and series representations derived for the logarithm may yield new closed-form expressions for other information-theoretic quantities.

Load-bearing premise

The chosen ranges of a and b must keep the logarithm monotonic, concave, and invertible so that it can serve as a valid link inside Bregman divergences.

What would settle it

Run the proposed GEG or GCE algorithms with parameter pairs lying outside the clarified monotonicity domains and check whether the iterates remain valid probability vectors or whether the loss ceases to decrease.

Figures

Figures reproduced from arXiv: 2502.17500 by Andrzej Cichocki.

Figure 1. Surface plots of the Euler (a, b)-logarithm for various values of hyperparameters a and b. These figures illustrate the (a, b)-logarithm in terms of b and x for fixed a = −0.3 and a = −1.1.
Figure 2. Surface plots of the Kaniadakis-Scarfone (κ, λ)-logarithm for various values of hyperparameters λ and κ. These figures illustrate the (λ, κ)-logarithm in terms of α and x for fixed λ = 0.7. The black continuous line represents the reference of the standard natural logarithm, which is obtained for a = b = 0.
Figure 3. Surface plots of the Tempesta (α, κ)-logarithm for various values of hyperparameters α and κ. These figures illustrate the (α, κ)-logarithm in terms of α and x for fixed κ = 0.7 and κ = −0.9. The black continuous line represents the reference of the standard logarithm, which is obtained for α = 1 and κ = 0.
Original abstract

This paper investigates in depth the fundamental properties of the two-parameter generalized Euler logarithm and its inverse, the associated deformed $(a,b)$-exponential function. We systematically clarify the parameter domains that guarantee monotonicity, concavity, and invertibility, derive series and integral representations, and provide explicit links to a broad class of one- and two-parameter deformations, including Tsallis, Kaniadakis, Schwämmle–Tsallis, Kaniadakis–Scarfone, and Tempesta-type logarithms and their inverse exponentials. In this way, the Euler $(a,b)$-logarithm is established as a unifying kernel for a wide family of generalized entropies and divergence measures. On the algorithmic side, we extend applications of the Euler logarithm to modern machine learning and optimization. We introduce generalized Exponentiated Gradient (GEG) and Mirror Descent (MD) schemes in which the Euler $(a,b)$-logarithm acts as a flexible link function in the underlying Bregman divergence. In addition, we propose an Euler-based Generalized Cross-Entropy (GCE) loss for deep neural networks, derive its exact backpropagation formulas, and detail its seamless integration with Fisher-Rao Natural Gradient (NG) descent. By isolating the Fisher Information Matrix (FIM) and developing a diagonal NG approximation, we demonstrate how the two deformation parameters successfully decouple tail robustness from local gradient shaping.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper defines the two-parameter Euler (a,b)-logarithm and its inverse (a,b)-exponential, clarifies the (a,b) domains ensuring monotonicity, concavity and invertibility, derives series/integral representations, and shows explicit reductions to Tsallis, Kaniadakis, Schwämmle–Tsallis and related one- and two-parameter deformations. It then uses the Euler logarithm as the link function inside Bregman divergences to define generalized exponentiated gradient and mirror descent updates, introduces an Euler-based generalized cross-entropy loss with exact back-propagation rules, and combines it with a diagonal Fisher-Rao natural-gradient approximation that isolates the two parameters to separately control tail robustness and local gradient shaping.

Significance. If the domain clarification and the associated convexity/invertibility proofs hold, the work supplies a single two-parameter kernel that recovers a broad family of generalized entropies and supplies concrete algorithmic extensions (GEG, MD, GCE loss, exact back-prop, diagonal NG) whose parameter decoupling is potentially useful for robust deep-learning training. The explicit links to existing deformations and the machine-learning instantiations would constitute a genuine unification rather than a reparametrization.

major comments (1)
  1. [Parameter-domain section (and any appendix containing the monotonicity/concavity arguments)] The central load-bearing claim is that the clarified (a,b) domains guarantee strict concavity (hence convexity of the mirror map) and global invertibility so that the Euler logarithm can serve as a valid Bregman link across all listed deformations and algorithms. The manuscript states these domains but supplies no derivative sign charts, second-derivative analysis, or explicit convexity proofs for the interior or boundary points of the claimed region; without such verification the unification and the claimed decoupling of tail robustness from gradient shaping remain unconfirmed.
minor comments (2)
  1. [Introduction and definitions] Notation for the two-parameter exponential and its inverse should be introduced once with a single consistent symbol rather than alternating between “deformed exponential” and “(a,b)-exponential.”
  2. [Natural-gradient subsection] The diagonal FIM approximation is presented without a quantitative error bound or comparison to the full-matrix NG baseline; a short remark on the approximation quality would strengthen the NG section.
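The comparison requested in the second minor comment could start from a sketch like the one below. It uses the well-known Fisher information of a softmax categorical model with respect to its logits, F = diag(p) − p pᵀ, and a damped diagonal surrogate. The damping value and function names are assumptions, and the paper's own diagonal approximation may be defined differently.

```python
import numpy as np

def fisher_softmax(p):
    """Fisher information of a categorical softmax model w.r.t. its logits."""
    return np.diag(p) - np.outer(p, p)

def ng_full(grad, p, damping=1e-3):
    # Full-matrix natural gradient; damping is needed because F is singular.
    F = fisher_softmax(p) + damping * np.eye(len(p))
    return np.linalg.solve(F, grad)

def ng_diag(grad, p, damping=1e-3):
    # Diagonal surrogate: keep only F_ii = p_i * (1 - p_i).
    return grad / (p * (1.0 - p) + damping)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

p = np.array([0.7, 0.2, 0.1])
y = np.array([0.0, 1.0, 0.0])
g = p - y                      # cross-entropy gradient w.r.t. logits
print(cosine(ng_full(g, p), ng_diag(g, p)))
```

The cosine between the two directions is one simple, if crude, measure of how much is lost by dropping the off-diagonal terms.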

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for highlighting the importance of rigorous verification of the domain properties. We address the single major comment below and will revise the manuscript accordingly.

  1. Referee: [Parameter-domain section (and any appendix containing the monotonicity/concavity arguments)] The central load-bearing claim is that the clarified (a,b) domains guarantee strict concavity (hence convexity of the mirror map) and global invertibility so that the Euler logarithm can serve as a valid Bregman link across all listed deformations and algorithms. The manuscript states these domains but supplies no derivative sign charts, second-derivative analysis, or explicit convexity proofs for the interior or boundary points of the claimed region; without such verification the unification and the claimed decoupling of tail robustness from gradient shaping remain unconfirmed.

    Authors: We agree that explicit second-derivative analysis and sign charts are necessary to fully substantiate the claimed (a,b) domains. While the manuscript states the domains that ensure monotonicity, concavity, and invertibility, the detailed derivative sign analysis and convexity proofs for the interior and boundary points are only outlined rather than presented with complete charts. In the revision we will expand the parameter-domain section and add a dedicated appendix containing (i) the first- and second-derivative expressions, (ii) sign charts over the interior and boundary of the claimed region, and (iii) explicit verification that the second derivative is strictly negative (hence strict concavity of the Euler logarithm and convexity of the associated mirror map) together with global invertibility. These additions will directly confirm the validity of the Bregman link for all listed deformations and support the parameter-decoupling claims in the algorithmic sections. revision: yes
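For orientation, the derivative expressions in item (i) are easy to write down under the assumed form of the logarithm. The region used below is one sufficient choice that can be verified by hand; it is not claimed to be the domain stated in the paper.

```latex
% Assumed form of the Euler (a,b)-logarithm, a \neq b, x > 0.
\begin{align}
\log_{a,b}(x) &= \frac{x^{a} - x^{b}}{a - b}, \\
\frac{d}{dx}\log_{a,b}(x) &= \frac{a\,x^{a-1} - b\,x^{b-1}}{a - b}, \\
\frac{d^{2}}{dx^{2}}\log_{a,b}(x) &= \frac{a(a-1)\,x^{a-2} - b(b-1)\,x^{b-2}}{a - b}.
\end{align}
% Sufficient region (an illustration): 0 \le a \le 1, \; -1 \le b \le 0, \; a \neq b.
% Then a - b > 0, a x^{a-1} \ge 0 and -b x^{b-1} \ge 0 are not both zero, so the first derivative is positive;
% a(a-1) \le 0 and b(b-1) \ge 0, so the second derivative is nonpositive,
% and strictly negative except at (a,b) = (1,0), where \log_{1,0}(x) = x - 1 is linear.
```

The same sign argument applies with the roles of a and b exchanged, since the assumed form is symmetric in the two parameters.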

Circularity Check

0 steps flagged

No circularity: derivations and applications are self-contained


The paper defines the Euler (a,b)-logarithm, derives its monotonicity/concavity/invertibility domains, series/integral forms, and explicit mappings to Tsallis/Kaniadakis/etc. deformations directly from its functional form, then uses the resulting object as a link inside newly proposed Bregman-based GEG/MD schemes and GCE loss with exact backprop and diagonal NG. No equations reduce a claimed prediction or uniqueness result to a fitted input or prior self-citation by construction; the ML extensions introduce independent algorithmic content grounded in the derived link function rather than tautological re-use of inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Review based solely on abstract; free parameters are the two deformation parameters whose domains control key properties. Axioms concern the function's monotonicity and invertibility for valid use in divergences. No invented entities are introduced.

free parameters (1)
  • a, b
    Two deformation parameters whose domains are clarified to ensure monotonicity, concavity, and invertibility for use as link functions.
axioms (1)
  • domain assumption: The generalized logarithm is monotonic, concave, and invertible for appropriate parameter domains
    Abstract states that these domains are clarified to establish the function as a valid kernel for entropies and divergences.

pith-pipeline@v0.9.0 · 5789 in / 1314 out tokens · 35684 ms · 2026-05-23T02:44:43.881827+00:00 · methodology



Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages

  1. [1]

    Abe, S. (1997). A note on the q-deformation-theoretic aspect of the generalized entropies in nonextensive physics. Physics Letters A, 224(6):326–330

  2. [2]

    Amari, S. (2009). Alpha-divergence is unique, belonging to both f-divergence and Bregman divergence classes. IEEE Transactions on Information Theory, 55, 4925–4931

  3. [3]

    Amari, S.; Nagaoka, H. (2000). Methods of Information Geometry. Oxford University Press, New York.

  4. [4]

    Amari, S. (2009). Information geometry and its applications: Convex function and dually flat manifold. In Emerging Trends in Visual Computing; Nielsen, F., Ed. Springer Lecture Notes in Computer Science, pp. 75–102

  5. [5]

    Amari, S. and Cichocki, A. (2010). Information geometry of divergence functions. Bulletin of the Polish Academy of Science, 58, 183–195

  6. [6]

    Amid, E., and Warmuth, M. K. (2020). Reparameterizing mirror descent as gradient descent. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS’20), Curran Associates Inc., Red Hook, NY, USA, Article 706, 8430-8439

  7. [7]

    Amid, E. and Warmuth, M. K. (2020). Winnowing with Gradient Descent. In Proceedings of the 33rd International Conference on Algorithmic Learning Theory, PMLR 125:163-182

  8. [8]

    Beck, A. and Teboulle, M. (2003). Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3), pp. 167-175

  9. [9]

    Ben-Tal, A., Charnes, A. and Teboulle, M. (1989). Entropic means. Journal of Mathematical Analysis and Applications, 139(2):537–551

  10. [10]

    Borges, E.P. and Roditi, I. (1998). A family of nonextensive entropies. Physics Letters A, 246(5):399–402

  11. [11]

    Bregman, L. (1967). The relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. Comp. Math. Phys., USSR, 7, 200–217

  12. [12]

    Canturk, B., Oikonomou, T., and Baris Bagci, G. (2018). The parameter space and third law of thermodynamics for the Borges-Roditi, Abe and Sharma-Mittal entropies. International Journal of Modern Physics B, 32(24), 1850274

  13. [13]

    Chakrabarti, R. and Jagannathan, R. (1991). A (p, q)-oscillator realization of two-parameter quantum algebras. Journal of Physics A: Mathematical and General, 24(13):L711

  14. [14]

    Cichocki, A. (2025). Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms. arXiv preprint arXiv:2506.13984

  15. [15]

    Cichocki, A., Cruces, S., Sarmiento, A., Tanaka, T. (2024). Generalized Exponentiated Gradient Algorithms and Their Application to On-Line Portfolio Selection. IEEE Access, https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10807168

  16. [16]

    Cichocki, A., Tanaka, T., Nielsen, F., Cruces, S. (2025). Mirror Descent and Exponentiated Gradient Algorithms Using Trace-Form Entropies. Entropy (submitted)

  17. [17]

    Cichocki, A. and Amari, S.I. (2010). Families of α- β- and γ-divergences: Flexible and robust measures of similarities. Entropy, 12(6):1532–1568

  18. [18]

    Cichocki, A., Cruces, S. and Amari, S.I. (2011). Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy, 13(1), pp. 134-170.

  19. [19]

    Cichocki, A., Zdunek, R., Phan, A. H. and Amari, S.I. (2009). Nonnegative Matrix and Tensor Factorizations. John Wiley and Sons, Chapter 3, pp. 131-202. https://doi.org/10.1002/9780470747278.ch3

  20. [20]

    Cichocki, A.; Zdunek, R.; Amari, S. (2006). Csiszár’s divergences for nonnegative matrix factorization: Family of new algorithms. Springer, LNCS 3889, 32–39

  21. [21]

    Cornford, J., Pogodin, R., Ghosh, A., Sheng, K., Bicknell, B., Codol, O., Clark, B.A., Lajoie, G. and Richards, B. (2024). Brain-like learning with exponentiated gradients. bioRxiv. https://doi.org/10.1101/2024.10.25.620272

  22. [22]

    Da Silva, G. B. and Ramos, R. V. (2019). The Lambert-Tsallis Wq function. Physica A: Statistical Mechanics and its Applications, 525: 164-170

  23. [23]

    Euler, L. (1779). De serie Lambertina plurimisque eius insignibus proprietatibus. Acta Academiae Scientiarum Petropolitanae (1779: II, 1783) p. 29-51, Sankt Peterburg. Leonardi Euleri Opera Omnia, Series Prima Opera Mathematica, IV 1921 p. 350-369; http://math.dartmouth.edu/ euler.docs/originals/E532 .pdf

  24. [24]

    Furuichi, S. (2010). An axiomatic characterization of a two-parameter extended relative entropy. Journal of Mathematical Physics, 51(12):123302

  25. [25]

    Ghai, U., Hazan, E. and Singer, Y. (2020). Exponentiated Gradient Meets Gradient Descent. In Proceedings of the 31st International Conference on Algorithmic Learning Theory, PMLR 117:386-407. https://doi.org/10.48550/arXiv.1902.01903

  26. [26]

    Gomez, I.S. and Borges, E.P. (2021). Algebraic structures and position-dependent mass Schrödinger equation from group entropy theory. Letters in Mathematical Physics, 111(2), p. 43

  27. [27]

    Havrda, J. and Charvát, F. (1967). Quantification method of classification processes. Concept of structural a-entropy. Kybernetika, 3, 30-45

  28. [28]

    He, W., and Jiang, H. (2008). Explicit update vs implicit update. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, pp. 3441-3447. https://doi.org/10.1109/IJCNN.2008.4634288

  29. [29]

    Helmbold, D.P., Schapire, R.E., Singer, Y. and Warmuth, M. K. (1998). On-line Portfolio Selection Using Multiplicative Updates. Mathematical Finance, 8: 325-347. https://doi.org/10.1111/1467-9965.00058

  30. [30]

    Herbster, M. and Warmuth, M. K. (1998). Tracking the Best Expert. Machine Learning, 32:151-178. https://doi.org/10.1023/A:1007424614876

  31. [31]

    Kaniadakis, G. (2002). Statistical mechanics in the context of special relativity. Physical Review E, 66(5):056125

  32. [32]

    Kaniadakis, G.; Lissia, M. (2004). Editorial on News and expectations in thermostatistics. Phys. A, 340, XV-XIX.

  33. [33]

    Kaniadakis, G., Lissia, M. and Scarfone, A.M. (2004). Deformed logarithms and entropies. Physica A: Statistical Mechanics and its Applications, 340(1-3), pp. 41-49

  34. [34]

    Kaniadakis, G., Lissia, M., and Scarfone, A.M. (2005). Two-parameter deformations of logarithm, exponential, and entropy: A consistent framework for generalized statistical mechanics. Physical Review E, 71(4):046128

  35. [35]

    Kaniadakis, G. (2009). Maximum entropy principle and power-law tailed distributions. The European Physical Journal B, 70(1), 3-13

  36. [36]

    Kaniadakis, G., Scarfone, A.M., Sparavigna, A. and Wada, T. (2017). Composition law of κ-entropy for statistically independent systems. Physical Review E, 95(5), p. 052112

  37. [37]

    Kivinen, J. and Warmuth, M. K. (1997). Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information and Computation, 132:1-63. http://dx.doi.org/10.1006/inco.1996.2612

  38. [38]

    Kivinen, J., Warmuth, M. K. (1995). Additive versus exponentiated gradient updates for linear prediction. In Proceedings of the Twenty-seventh Annual ACM Symposium on Theory of Computing (pp. 209-218). https://doi.org/10.1145/225058.225121

  39. [39]

    Kullback, S. and Leibler, R.A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79-86

  40. [40]

    Lambert, J.H. (1758). Observationes varie in mathesin puram. Acta Helvetica, Physico-mathematico-anatomico-botanico-medica, Basel, 3, 128-168. http://www.kuttaka.org/ JHL/L1758c.pdf

  41. [41]

    Majidi, N., Amid, E., Talebi, H. and Warmuth, M. K. (2021). Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond. ArXiv preprint arXiv:2104.01493

  42. [42]

    McAnally, D.S. (1995). q-exponential and q-gamma functions. I. q-exponential functions. Journal of Mathematical Physics, 36(1):546–573

  43. [43]

    Mittal, D.P. (1975). On some functional equations concerning entropy, directed divergence and inaccuracy. Metrika, 22(1):35–45

  44. [44]

    Naudts, J. (2002). Deformed exponentials and logarithms in generalized thermostatistics. Physica A: Statistical Mechanics and its Applications, 316(1-4):323–334

  45. [45]

    Nemirovsky, A. and Yudin, D. (1983). Problem Complexity and Method Efficiency in Optimization. John Wiley and Sons, https://doi.org/10.1137/1027074

  46. [46]

    Nock, R., Amid, E., Warmuth, M. K. (2023). Boosting with Tempered Exponential Measures. arXiv preprint arXiv:2306.05487. https://doi.org/10.48550/arXiv.2306.05487

  47. [47]

    Scarfone, A.M., Suyari, H. and Wada, T. (2009). Gauss law of error revisited in the framework of Sharma-Taneja-Mittal information measure. Central European Journal of Physics, 7, pp. 414-420.

  48. [48]

    Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423

  49. [49]

    Shalev-Shwartz, S. (2011). Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107-194

  50. [50]

    Sharma, B.D., and Taneja, I.J. (1975). Entropy of type (α, β) and other generalized measures in information theory. Metrika, 22(1):205–215

  51. [51]

    Schwämmle, V. and Tsallis, C. (2007). Two-parameter generalization of the logarithm and exponential functions and Boltzmann-Gibbs-Shannon entropy. Journal of Mathematical Physics, 48(11), AIP Publishing

  52. [52]

    Taneja, I.J. (2001). Generalized information measures and their applications. On-line book. URL www.mtm.ufsc.br/taneja/book/book.html

  53. [53]

    Taneja, I.J. (1989). On generalized information measures and their applications. Advances in Electronics and Electron Physics, 76:327–413

  54. [54]

    Tempesta, P. (2015). A theorem on the existence of trace-form generalized entropies. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 471(2183), p. 20150165

  55. [55]

    Tempesta, P. (2016). Formal groups and Z-entropies. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 472(2195), 20160143

  56. [56]

    Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52(1):479–487

  57. [57]

    Tsallis, C. (1994). What are the numbers that experiments provide. Quimica Nova, 17(6), 468–471

  58. [58]

    Wada, T. and Scarfone, A.M. (2010). Finite difference and averaging operators in generalized entropies. In Journal of Physics: Conference Series, volume 201, page 012005. IOP Publishing.