Generalized Euler Logarithm and its Applications in Machine Learning: Natural Gradient, Backpropagation, Generalized EG, Mirror Descent and OLPS

Andrzej Cichocki

arxiv: 2502.17500 · v3 · submitted 2025-02-21 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-23 02:44 UTC · model grok-4.3

keywords: Euler logarithm · generalized entropy · Bregman divergence · natural gradient · mirror descent · generalized cross-entropy · backpropagation

The pith

The two-parameter Euler logarithm unifies many generalized entropies and serves as a link function for generalized exponentiated gradient, mirror descent, and natural gradient methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes the Euler (a,b)-logarithm and its inverse deformed exponential as a common kernel that recovers Tsallis, Kaniadakis, and several other one- and two-parameter logarithms under suitable choices of a and b. It derives the parameter domains that keep the function monotonic, concave, and invertible, then uses these properties to embed the logarithm inside Bregman divergences. This construction yields concrete generalized exponentiated gradient and mirror descent updates, an Euler-based generalized cross-entropy loss whose back-propagation formulas are given explicitly, and a natural-gradient scheme in which the two parameters separately control tail behavior and local curvature via the Fisher information matrix.
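For orientation, here is a minimal sketch of the kernel. It assumes the standard Euler form log_{a,b}(x) = (x^a − x^b)/(a − b) for x > 0 and a ≠ b. This page does not print the formula, so the definition, the function names, and the sample parameter values are all assumptions. Under that form, b = 0 with a = 1 − q gives the Tsallis q-logarithm, and a = κ, b = −κ gives the Kaniadakis κ-logarithm.

```python
import math

def euler_log(x, a, b):
    """Euler (a,b)-logarithm, assumed form (x**a - x**b) / (a - b), x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    if a == b:
        # Limit a -> b: derivative of x**t with respect to t, taken at t = a
        return (x ** a) * math.log(x)
    return (x ** a - x ** b) / (a - b)

def tsallis_log(x, q):
    """Tsallis q-logarithm, (x**(1-q) - 1) / (1 - q)."""
    return math.log(x) if q == 1 else (x ** (1 - q) - 1) / (1 - q)

def kaniadakis_log(x, kappa):
    """Kaniadakis kappa-logarithm, (x**kappa - x**(-kappa)) / (2*kappa)."""
    return math.log(x) if kappa == 0 else (x ** kappa - x ** (-kappa)) / (2 * kappa)

if __name__ == "__main__":
    x = 2.5
    print(euler_log(x, 0.3, 0.0), tsallis_log(x, 0.7))      # a = 1 - q, b = 0
    print(euler_log(x, 0.4, -0.4), kaniadakis_log(x, 0.4))  # a = kappa, b = -kappa
    print(euler_log(x, 1e-6, -1e-6), math.log(x))           # near the natural log
```

Each printed pair should agree up to rounding, which is one quick way to sanity-check a chosen (a, b) against the one-parameter families named above.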

Core claim

The Euler (a,b)-logarithm is established as a unifying kernel for a wide family of generalized entropies and divergence measures. On the algorithmic side, generalized Exponentiated Gradient (GEG) and Mirror Descent (MD) schemes are introduced in which the Euler (a,b)-logarithm acts as a flexible link function in the underlying Bregman divergence. An Euler-based Generalized Cross-Entropy (GCE) loss is proposed for deep neural networks, together with its exact backpropagation formulas and seamless integration with Fisher-Rao Natural Gradient descent, where the two deformation parameters decouple tail robustness from local gradient shaping.
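One way to read the GCE construction is sketched below. It assumes the loss is the negative Euler logarithm of the softmax probability of the true class, L = −log_{a,b}(p_y), with log_{a,b}(x) = (x^a − x^b)/(a − b). Neither formula is printed on this page, so both are assumptions, as are the function names, and the paper's exact loss may differ. Under these assumptions the chain rule gives the logit gradient ∂L/∂z_j = −w (δ_{jy} − p_j) with w = (a p_y^a − b p_y^b)/(a − b), which tends to the familiar p_j − δ_{jy} as a, b → 0.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def euler_log(x, a, b):
    # Assumed form of the Euler (a,b)-logarithm; requires a != b and x > 0
    return (x ** a - x ** b) / (a - b)

def euler_gce_loss(z, y, a, b):
    # Assumed loss: negative Euler logarithm of the true-class probability
    return -euler_log(softmax(z)[y], a, b)

def euler_gce_grad(z, y, a, b):
    # Chain rule through the softmax: dL/dz_j = -w * (delta_jy - p_j),
    # where w = p_y * log'_{a,b}(p_y) = (a*p_y**a - b*p_y**b) / (a - b)
    p = softmax(z)
    w = (a * p[y] ** a - b * p[y] ** b) / (a - b)
    return [-w * ((1.0 if j == y else 0.0) - p[j]) for j in range(len(z))]

if __name__ == "__main__":
    z, y, a, b = [1.0, -0.5, 0.3], 0, 0.5, -0.3  # hypothetical logits and parameters
    print(euler_gce_loss(z, y, a, b))
    print(euler_gce_grad(z, y, a, b))
```

In this sketch the scalar w rescales the ordinary cross-entropy gradient by a factor that depends only on p_y, which is where the two deformation parameters enter the backward pass.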

What carries the argument

The Euler (a,b)-logarithm, used as the link function inside Bregman divergences to generate the generalized algorithms.

If this is right

Generalized EG and MD updates become available for any pair (a,b) inside the valid domain.
Exact back-propagation rules exist for the Euler-based GCE loss in deep networks.
Fisher-Rao natural gradient descent can be realized with a diagonal approximation that isolates the two deformation parameters.
Tail robustness and local gradient shaping are controlled independently by a and b.
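
The generalized EG/MD update in the first item can be sketched as follows. The sketch assumes the mirror-descent form w ← exp_{a,b}(log_{a,b}(w) − η∇L) followed by renormalization onto the simplex, with log_{a,b}(x) = (x^a − x^b)/(a − b). The update form, the bisection-based inverse, and all names are assumptions made for illustration. The parameters are restricted to 0 < a ≤ 1 and b < 0, where this form is strictly increasing and maps (0, ∞) onto the real line.

```python
import math

def euler_log(x, a, b):
    # Assumed form of the Euler (a,b)-logarithm (a != b, x > 0)
    return (x ** a - x ** b) / (a - b)

def euler_exp(y, a, b, iters=100):
    # Numerical inverse of euler_log by bisection on t = ln(x).
    # Intended for 0 < a <= 1 and b < 0, where euler_log is strictly increasing.
    lo, hi = -1.0, 1.0
    while euler_log(math.exp(lo), a, b) > y:
        lo *= 2.0  # widen the bracket downwards
    while euler_log(math.exp(hi), a, b) < y:
        hi *= 2.0  # widen the bracket upwards
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if euler_log(math.exp(mid), a, b) < y:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def geg_step(w, grad, eta, a, b):
    # Gradient step in Euler-log coordinates, map back, then renormalise
    u = [euler_exp(euler_log(wi, a, b) - eta * gi, a, b)
         for wi, gi in zip(w, grad)]
    s = sum(u)
    return [ui / s for ui in u]

if __name__ == "__main__":
    w = [0.25, 0.25, 0.5]
    grad = [0.2, -0.1, 0.4]  # hypothetical loss gradient
    print(geg_step(w, grad, eta=0.5, a=0.5, b=-0.3))
```

Bisection is used here only because it needs nothing beyond monotonicity; a closed-form or special-function inverse would replace `euler_exp` without changing `geg_step`.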

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same link-function construction could be applied directly to online portfolio selection algorithms listed in the title.
Different (a,b) pairs might be selected per layer or per task to match data tail properties without changing the overall optimizer structure.
The integral and series representations derived for the logarithm may yield new closed-form expressions for other information-theoretic quantities.

Load-bearing premise

The chosen ranges of a and b must keep the logarithm monotonic, concave, and invertible so that it can serve as a valid link inside Bregman divergences.

What would settle it

Run the proposed GEG or GCE algorithms with parameter pairs lying outside the clarified monotonicity domains and check whether the iterates remain valid probability vectors or whether the loss ceases to decrease.
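
A rough numerical screen for this kind of check is sketched below. It assumes the form log_{a,b}(x) = (x^a − x^b)/(a − b), which is not printed on this page, and uses hypothetical helper names. It evaluates the signs of the first and second derivatives on a log-spaced grid. A grid scan can flag parameter pairs that fail, but it is not a proof of monotonicity or concavity.

```python
import math

def d1(x, a, b):
    # First derivative of (x**a - x**b) / (a - b)
    return (a * x ** (a - 1) - b * x ** (b - 1)) / (a - b)

def d2(x, a, b):
    # Second derivative of the same expression
    return (a * (a - 1) * x ** (a - 2) - b * (b - 1) * x ** (b - 2)) / (a - b)

def screen(a, b, lo=1e-4, hi=1e4, n=400):
    # Log-spaced grid on [lo, hi]
    xs = [lo * (hi / lo) ** (k / (n - 1)) for k in range(n)]
    increasing = all(d1(x, a, b) > 0 for x in xs)
    concave = all(d2(x, a, b) < 0 for x in xs)
    return increasing, concave

if __name__ == "__main__":
    # Hypothetical parameter pairs; the last two are chosen to fail a check
    for a, b in [(0.5, -0.3), (0.3, 0.0), (1.5, -0.5), (0.5, 0.2)]:
        print((a, b), screen(a, b))
```

Pairs that fail the screen are natural candidates for the experiment described above, since the link function is then no longer guaranteed to be invertible or to induce a convex mirror map.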

Figures

Figures reproduced from arXiv: 2502.17500 by Andrzej Cichocki.

**Figure 2.** Surface plots of the Kaniadakis-Scarfone (κ, λ)-logarithm for various values of the hyperparameters λ and κ. These figures illustrate the (λ, κ)-logarithm in terms of α and x for fixed λ = 0.7. The black continuous line represents the reference standard natural logarithm, which is obtained for a = b = 0. Similarly, the Tempesta two-parameter (κ, α)-logarithm defined as [54, 14] logTe α,κ(x) = 1 (1 + …

**Figure 3.** Surface plots of the Tempesta (α, κ)-logarithm for various values of the hyperparameters α and κ. These figures illustrate the (α, κ)-logarithm in terms of α and x for fixed κ = 0.7 and κ = −0.9. The black continuous line represents the reference standard logarithm, which is obtained for α = 1 and κ = 0. where λ = (a − b)/a, x̃ = (a − b)x, q = (λ + 1)/λ = (2a − b)/(a − b) and W_q is the Lambert–Tsalli…

Original abstract

This paper investigates in depth the fundamental properties of the two-parameter generalized Euler logarithm and its inverse, the associated deformed $(a,b)$-exponential function. We systematically clarify the parameter domains that guarantee monotonicity, concavity, and invertibility, derive series and integral representations, and provide explicit links to a broad class of one- and two-parameter deformations, including Tsallis, Kaniadakis, Schw\"ammle--Tsallis, Kaniadakis--Scarfone, and Tempesta-type logarithms and their inverse exponentials. In this way, the Euler $(a,b)$-logarithm is established as a unifying kernel for a wide family of generalized entropies and divergence measures. On the algorithmic side, we extend applications of the Euler logarithm to modern machine learning and optimization. We introduce generalized Exponentiated Gradient (GEG) and Mirror Descent (MD) schemes in which the Euler $(a,b)$-logarithm acts as a flexible link function in the underlying Bregman divergence. In addition, we propose an Euler-based Generalized Cross-Entropy (GCE) loss for deep neural networks, derive its exact backpropagation formulas, and detail its seamless integration with Fisher-Rao Natural Gradient (NG) descent. By isolating the Fisher Information Matrix (FIM) and developing a diagonal NG approximation, we demonstrate how the two deformation parameters successfully decouple tail robustness from local gradient shaping.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper unifies several known deformed logarithms under a two-parameter Euler form and sketches their use in mirror descent and natural gradient, but it supplies no derivations or checks, so the claims remain untested.

Colleague, the main takeaway is that this work treats the Euler (a,b)-log as a single kernel covering Tsallis, Kaniadakis, and a few other deformations, then plugs it into Bregman divergences for generalized exponentiated gradient, mirror descent, a generalized cross-entropy loss, exact backpropagation, and a diagonal Fisher-Rao natural gradient step. The two parameters are said to separate tail robustness from local gradient behavior. That is the entire contribution.

The unification itself is the part that holds up on the page. The author lists the (a,b) ranges that preserve monotonicity and concavity, maps them onto the cited one- and two-parameter families, and gives series and integral representations. For someone who already tracks these deformations, having the domains collected in one place is convenient bookkeeping.

The algorithmic extensions are straightforward re-statements of existing schemes with the new link function substituted in. No new convergence rates, no new regret bounds, and no numerical examples appear. The text describes the backpropagation formulas and the FIM isolation but does not derive them or test whether the claimed decoupling survives the approximation. The stress-test point about strict concavity and global invertibility across the full claimed domain is therefore still open; the abstract asserts the domains work, yet the paper does not show the derivative sign charts or convexity arguments that would confirm it.

This is a narrow technical note aimed at the small group already working on deformed entropies and information-geometric optimization. A reader looking for a compact reference on parameter ranges will get some value. It does not contain the independent grounding or verification that would make it required reading outside that circle.

I would send it to referees. The unification is limited but cleanly scoped, and the ML applications are at least consistent with prior literature even if they need the missing steps filled in.

Referee Report

1 major / 2 minor

Summary. The paper defines the two-parameter Euler (a,b)-logarithm and its inverse (a,b)-exponential, clarifies the (a,b) domains ensuring monotonicity, concavity and invertibility, derives series/integral representations, and shows explicit reductions to Tsallis, Kaniadakis, Schwämmle–Tsallis and related one- and two-parameter deformations. It then uses the Euler logarithm as the link function inside Bregman divergences to define generalized exponentiated gradient and mirror descent updates, introduces an Euler-based generalized cross-entropy loss with exact back-propagation rules, and combines it with a diagonal Fisher-Rao natural-gradient approximation that isolates the two parameters to separately control tail robustness and local gradient shaping.

Significance. If the domain clarification and the associated convexity/invertibility proofs hold, the work supplies a single two-parameter kernel that recovers a broad family of generalized entropies and supplies concrete algorithmic extensions (GEG, MD, GCE loss, exact back-prop, diagonal NG) whose parameter decoupling is potentially useful for robust deep-learning training. The explicit links to existing deformations and the machine-learning instantiations would constitute a genuine unification rather than a reparametrization.

major comments (1)

[Parameter-domain section (and any appendix containing the monotonicity/concavity arguments)] The central load-bearing claim is that the clarified (a,b) domains guarantee strict concavity (hence convexity of the mirror map) and global invertibility so that the Euler logarithm can serve as a valid Bregman link across all listed deformations and algorithms. The manuscript states these domains but supplies no derivative sign charts, second-derivative analysis, or explicit convexity proofs for the interior or boundary points of the claimed region; without such verification the unification and the claimed decoupling of tail robustness from gradient shaping remain unconfirmed.

minor comments (2)

[Introduction and definitions] Notation for the two-parameter exponential and its inverse should be introduced once with a single consistent symbol rather than alternating between “deformed exponential” and “(a,b)-exponential.”
[Natural-gradient subsection] The diagonal FIM approximation is presented without a quantitative error bound or comparison to the full-matrix NG baseline; a short remark on the approximation quality would strengthen the NG section.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for highlighting the importance of rigorous verification of the domain properties. We address the single major comment below and will revise the manuscript accordingly.

Referee: [Parameter-domain section (and any appendix containing the monotonicity/concavity arguments)] The central load-bearing claim is that the clarified (a,b) domains guarantee strict concavity (hence convexity of the mirror map) and global invertibility so that the Euler logarithm can serve as a valid Bregman link across all listed deformations and algorithms. The manuscript states these domains but supplies no derivative sign charts, second-derivative analysis, or explicit convexity proofs for the interior or boundary points of the claimed region; without such verification the unification and the claimed decoupling of tail robustness from gradient shaping remain unconfirmed.

Authors: We agree that explicit second-derivative analysis and sign charts are necessary to fully substantiate the claimed (a,b) domains. While the manuscript states the domains that ensure monotonicity, concavity, and invertibility, the detailed derivative sign analysis and convexity proofs for the interior and boundary points are only outlined rather than presented with complete charts. In the revision we will expand the parameter-domain section and add a dedicated appendix containing (i) the first- and second-derivative expressions, (ii) sign charts over the interior and boundary of the claimed region, and (iii) explicit verification that the second derivative is strictly negative (hence strict concavity of the Euler logarithm and convexity of the associated mirror map) together with global invertibility. These additions will directly confirm the validity of the Bregman link for all listed deformations and support the parameter-decoupling claims in the algorithmic sections.

Revision: yes

Circularity Check

0 steps flagged

No circularity: derivations and applications are self-contained

The paper defines the Euler (a,b)-logarithm, derives its monotonicity/concavity/invertibility domains, series/integral forms, and explicit mappings to Tsallis/Kaniadakis/etc. deformations directly from its functional form, then uses the resulting object as a link inside newly proposed Bregman-based GEG/MD schemes and GCE loss with exact backprop and diagonal NG. No equations reduce a claimed prediction or uniqueness result to a fitted input or prior self-citation by construction; the ML extensions introduce independent algorithmic content grounded in the derived link function rather than tautological re-use of inputs.

Axiom & Free-Parameter Ledger

1 free-parameter entry · 1 axiom · 0 invented entities

Review based solely on abstract; free parameters are the two deformation parameters whose domains control key properties. Axioms concern the function's monotonicity and invertibility for valid use in divergences. No invented entities are introduced.

free parameters (1)

a, b
Two deformation parameters whose domains are clarified to ensure monotonicity, concavity, and invertibility for use as link functions.

axioms (1)

domain assumption: The generalized logarithm is monotonic, concave, and invertible for appropriate parameter domains.
Abstract states that these domains are clarified to establish the function as a valid kernel for entropies and divergences.

pith-pipeline@v0.9.0 · 5789 in / 1314 out tokens · 35684 ms · 2026-05-23T02:44:43.881827+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

"the Euler (a,b)-logarithm … acts as a flexible link function in the underlying Bregman divergence … two deformation parameters successfully decouple tail robustness from local gradient shaping"

IndisputableMonolith/Foundation/BranchSelection.lean · alpha_pin_under_high_calibration · echoes

"clarify the parameter domains that guarantee monotonicity, concavity, and invertibility"

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages

[1] Abe, S. (1997). A note on the q-deformation-theoretic aspect of the generalized entropies in nonextensive physics. Physics Letters A, 224(6):326–330.
[2] Amari, S. (2009). Alpha-divergence is unique, belonging to both f-divergence and Bregman divergence classes. IEEE Transactions on Information Theory, 55, 4925–4931.
[3] Amari, S.; Nagaoka, H. (2000). Methods of Information Geometry. Oxford University Press, New York.
[4] Amari, S. (2009). Information geometry and its applications: Convex function and dually flat manifold. In Emerging Trends in Visual Computing; Nielsen, F., Ed. Springer Lecture Notes in Computer Science, pp. 75–102.
[5] Amari, S. and Cichocki, A. (2010). Information geometry of divergence functions. Bulletin of the Polish Academy of Science, 58, 183–195.
[6] Amid, E. and Warmuth, M. K. (2020). Reparameterizing mirror descent as gradient descent. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS'20), Curran Associates Inc., Red Hook, NY, USA, Article 706, 8430–8439.
[7] Amid, E. and Warmuth, M. K. (2020). Winnowing with Gradient Descent. In Proceedings of the 33rd International Conference on Algorithmic Learning Theory, PMLR 125:163–182.
[8] Beck, A. and Teboulle, M. (2003). Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3), pp. 167–175.
[9] Ben-Tal, A., Charnes, A. and Teboulle, M. (1989). Entropic means. Journal of Mathematical Analysis and Applications, 139(2):537–551.
[10] Borges, E.P. and Roditi, I. (1998). A family of nonextensive entropies. Physics Letters A, 246(5):399–402.
[11] Bregman, L. (1967). The relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. Comp. Math. Phys., USSR, 7, 200–217.
[12] Canturk, B., Oikonomou, T., and Baris Bagci, G. (2018). The parameter space and third law of thermodynamics for the Borges-Roditi, Abe and Sharma-Mittal entropies. International Journal of Modern Physics B, 32(24), 1850274.
[13] Chakrabarti, R. and Jagannathan, R. (1991). A (p, q)-oscillator realization of two-parameter quantum algebras. Journal of Physics A: Mathematical and General, 24(13):L711.
[14] Cichocki, A. (2025). Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms. arXiv preprint arXiv:2506.13984.
[15] Cichocki, A., Cruces, S., Sarmiento, A., Tanaka, T. (2024). Generalized Exponentiated Gradient Algorithms and Their Application to On-Line Portfolio Selection. IEEE Access, https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10807168
[16] Cichocki, A., Tanaka, T., Nielsen, F., Cruces, S. (2025). Mirror Descent and Exponentiated Gradient Algorithms Using Trace-Form Entropies. Entropy (submitted).
[17] Cichocki, A. and Amari, S.I. (2010). Families of α-, β- and γ-divergences: Flexible and robust measures of similarities. Entropy, 12(6):1532–1568.
[18] Cichocki, A., Cruces, S. and Amari, S.I. (2011). Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy, 13(1), pp. 134–170.
[19] Cichocki, A., Zdunek, R., Phan, A. H. and Amari, S.I. (2009). Nonnegative Matrix and Tensor Factorizations. John Wiley and Sons, Chapter 3, pp. 131–202. https://doi.org/10.1002/9780470747278.ch3
[20] Cichocki, A.; Zdunek, R.; Amari, S. (2006). Csiszár's divergences for nonnegative matrix factorization: Family of new algorithms. Springer, LNCS-3889, 3889, 32–39.
[21] Cornford, J., Pogodin, R., Ghosh, A., Sheng, K., Bicknell, B., Codol, O., Clark, B.A., Lajoie, G. and Richards, B. (2024). Brain-like learning with exponentiated gradients. bioRxiv. https://doi.org/10.1101/2024.10.25.620272
[22] Da Silva, G. B. and Ramos, R. V. (2019). The Lambert-Tsallis Wq function. Physica A: Statistical Mechanics and its Applications, 525:164–170.
[23] Euler, L. (1779). De serie Lambertina plurimisque eius insignibus proprietatibus. Acta Academiae Scientiarum Petropolitanae (1779: II, 1783), pp. 29–51, Sankt Peterburg. Leonardi Euleri Opera Omnia, Series Prima Opera Mathematica, IV 1921, pp. 350–369; http://math.dartmouth.edu/ euler.docs/originals/E532 .pdf
[24] Furuichi, S. (2010). An axiomatic characterization of a two-parameter extended relative entropy. Journal of Mathematical Physics, 51(12):123302.
[25] Ghai, U., Hazan, E. and Singer, Y. (2020). Exponentiated Gradient Meets Gradient Descent. In Proceedings of the 31st International Conference on Algorithmic Learning Theory, PMLR 117:386–407. https://doi.org/10.48550/arXiv.1902.01903
[26] Gomez, I.S. and Borges, E.P. (2021). Algebraic structures and position-dependent mass Schrödinger equation from group entropy theory. Letters in Mathematical Physics, 111(2), p. 43.
[27] Havrda, J. and Charvat, F. (1967). Quantification method of classification processes. Concept of structural a-entropy. Kybernetika, 3, 30–45.
[28] He, W. and Jiang, H. (2008). Explicit update vs implicit update. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, pp. 3441–3447. https://doi.org/10.1109/IJCNN.2008.4634288
[29] Helmbold, D.P., Schapire, R.E., Singer, Y. and Warmuth, M. K. (1998). On-line Portfolio Selection Using Multiplicative Updates. Mathematical Finance, 8:325–347. https://doi.org/10.1111/1467-9965.00058
[30] Herbster, M. and Warmuth, M. K. (1998). Tracking the Best Expert. Machine Learning, 32:151–178. https://doi.org/10.1023/A:1007424614876
[31] Kaniadakis, G. (2002). Statistical mechanics in the context of special relativity. Physical Review E, 66(5):056125.
[32] Kaniadakis, G.; Lissia, M. (2004). Editorial on News and expectations in thermostatistics. Phys. A, 340, XV–XIX.
[33] Kaniadakis, G., Lissia, M. and Scarfone, A.M. (2004). Deformed logarithms and entropies. Physica A: Statistical Mechanics and its Applications, 340(1-3), pp. 41–49.
[34] Kaniadakis, G., Lissia, M. and Scarfone, A.M. (2005). Two-parameter deformations of logarithm, exponential, and entropy: A consistent framework for generalized statistical mechanics. Physical Review E, 71(4):046128.
[35] Kaniadakis, G. (2009). Maximum entropy principle and power-law tailed distributions. The European Physical Journal B, 70(1), 3–13.
[36] Kaniadakis, G., Scarfone, A.M., Sparavigna, A. and Wada, T. (2017). Composition law of κ-entropy for statistically independent systems. Physical Review E, 95(5), p. 052112.
[37] Kivinen, J. and Warmuth, M. K. (1997). Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information and Computation, 132:1–63. http://dx.doi.org/10.1006/inco.1996.2612
[38] Kivinen, J. and Warmuth, M. K. (1995). Additive versus exponentiated gradient updates for linear prediction. In Proceedings of the Twenty-seventh Annual ACM Symposium on Theory of Computing (pp. 209–218). https://doi.org/10.1145/225058.225121
[39] Kullback, S. and Leibler, R.A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86.
[40] Lambert, J.H. (1758). Observationes variae in mathesin puram. Acta Helvetica, Physico-mathematico-anatomico-botanico-medica, Basel, 3, 128–168. http://www.kuttaka.org/ JHL/L1758c.pdf
[41] Majidi, N., Amid, E., Talebi, H. and Warmuth, M. K. (2021). Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond. arXiv preprint arXiv:2104.01493.
[42] McAnally, D.S. (1995). q-exponential and q-gamma functions. I. q-exponential functions. Journal of Mathematical Physics, 36(1):546–573.
[43] Mittal, D.P. (1975). On some functional equations concerning entropy, directed divergence and inaccuracy. Metrika, 22(1):35–45.
[44] Naudts, J. (2002). Deformed exponentials and logarithms in generalized thermostatistics. Physica A: Statistical Mechanics and its Applications, 316(1-4):323–334.
[45] Nemirovsky, A. and Yudin, D. (1983). Problem Complexity and Method Efficiency in Optimization. John Wiley and Sons, https://doi.org/10.1137/1027074
[46] Nock, R., Amid, E. and Warmuth, M. K. (2023). Boosting with Tempered Exponential Measures. arXiv preprint arXiv:2306.05487. https://doi.org/10.48550/arXiv.2306.05487
[47] Scarfone, A.M., Suyari, H. and Wada, T. (2009). Gauss law of error revisited in the framework of Sharma-Taneja-Mittal information measure. Central European Journal of Physics, 7, pp. 414–420.
[48] Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423.
[49] Shalev-Shwartz, S. (2011). Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194.
[50] Sharma, B.D. and Taneja, I.J. (1975). Entropy of type (α, β) and other generalized measures in information theory. Metrika, 22(1):205–215.
[51] Schwämmle, V. and Tsallis, C. (2007). Two-parameter generalization of the logarithm and exponential functions and Boltzmann-Gibbs-Shannon entropy. Journal of Mathematical Physics, 48(11), AIP Publishing.
[52] Taneja, I.J. (2001). Generalized information measures and their applications. On-line book. URL www.mtm.ufsc.br/taneja/book/book.html
[53] Taneja, I.J. (1989). On generalized information measures and their applications. Advances in Electronics and Electron Physics, 76:327–413.
[54] Tempesta, P. (2015). A theorem on the existence of trace-form generalized entropies. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 471(2183), p. 20150165.
[55] Tempesta, P. (2016). Formal groups and Z-entropies. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 472(2195), 20160143.
[56] Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52(1):479–487.
[57] Tsallis, C. (1994). What are the numbers that experiments provide. Quimica Nova, 17(6), 468–471.
[58] Wada, T. and Scarfone, A.M. (2010). Finite difference and averaging operators in generalized entropies. In Journal of Physics: Conference Series, volume 201, page 012005. IOP Publishing.

[1] [1]

Abe, S. (1997). A note on the q-deformation-theoretic as pect of the generalized entropies in nonextensive physics. Physics Letters A , 224(6):326–330

work page 1997

[2] [2]

Amari, S. (2009). Alpha-divergence is unique, belongin g to both f-divergence and Bregman divergence classes. IEEE Transactions on Informations Theory , 55, 4925–4931

work page 2009

[3] [3]

Amari, S.; Nagaoka, H. (2000). Methods of Information Geometry . Oxford University Press, New York. 17

work page 2000

[4] [4]

Amari, S. (2009). Information geometry and its applicat ions: Convex function and dually ﬂat manifold. In Emerging Trends in Visual Computing ; Nielsen, F., Ed. Springer Lecture Notes in Computer Science, pp. 75–102

work page 2009

[5] [5]

and Cichocki, A

Amari, S. and Cichocki, A. (2010). Information geometry of divergence functions. Bulletin of the Polish Academy of Science , 58, 183–195

work page 2010

[6] [6]

Amid, E., and Warmuth, M. K. (2020). Reparameterizing mi rror descent as gradient descent. In Proceedings of the 34th International Conference on Neur al Information Processing Systems (NIPS’20), Curran Associates Inc., Red Hook, NY, USA, Article 706, 843 0-8439

work page 2020

[7] [7]

and Warmuth, M

Amid, E. and Warmuth, M. K. (2020). Winnowing with Gradie nt Descent. In Proceedings of the 33rd International Conference on Algorithmic Learning Theory, PMLR 125:163-182

work page 2020

[8] [8]

and Teboulle, M

Beck, A. and Teboulle, M. (2003). Mirror descent and nonli near projected subgradient methods for convex optimization. Operations Research Letters, 31(3), pp.167-175

work page 2003

[9] [9]

Charnes, A

Ben-Tal, A. Charnes, A. and Teboulle, M. (1989). Entropic means. Journal of Mathematical Analysis and Applications , 139(2):537–551

work page 1989

[10] [10]

and Roditi, I

Borges, E.P. and Roditi, I. (1998). A family of nonextens ive entropies. Physics Letters A , 246(5):399–402

work page 1998

[11] [11]

Bregman, L. (1967). The relaxation method of ﬁnding a com mon point of convex sets and its application to the solution of problems in convex programmi ng. Comp. Math. Phys., USSR , 7, 200–217

work page 1967

[12] [12]

Canturk, B., Oikonomou, T., and Baris Bagci, G. (2018). The parameter space and third law of thermodynamics for the Borges-Roditi, Abe and Sharma-Mit tal entropies. International Journal of Modern Physics B 32(24), 1850274

work page 2018

[13] [13]

and Jagannathan, R

Chakrabarti, R. and Jagannathan, R. (1991). A (p, q)-os cillator realization of two-parameter quantum algebras. Journal of Physics A: Mathematical and General , 24(13):L711

work page 1991

[14] [14]

Cichocki, A. (2025). Mirror Descent Using the Tempesta Generalized Multi-parametric Loga- rithms. arXiv preprint arXiv:2506.13984

work page arXiv 2025

[15] [15]

Cichocki, A., Cruces, S., Sarmineto A., Tanaka, T. (202 4). Generalized Exponentiated Gradient Algorithms and Their Application to On-Line Portf olio Selection. IEEE Access , https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10807168

work page

[16] [16]

Cichocki, A.,Tanaka, T., Nielsonm F., Cruces, S. (2025 ). Mirror Descent and Exponentiated Gradient Algorithms Using Trace-Form Entropies.. Entropy (submitted)

work page 2025

[17] [17]

Cichocki, A. and S.I. Amari, S.I. (2010). Families of α -β -and γ-divergences: Flexible and robust measures of similarities. Entropy, 12(6):1532–1568

work page 2010

[18] Cichocki, A., Cruces, S. and Amari, S.I. (2011). Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy, 13(1), pp. 134-170.

[19] Cichocki, A., Zdunek, R., Phan, A. H. and Amari, S.I. (2009). Nonnegative Matrix and Tensor Factorizations. John Wiley and Sons, Chapter 3, pp. 131-202. https://doi.org/10.1002/9780470747278.ch3

[20] Cichocki, A., Zdunek, R., Amari, S. (2006). Csiszár’s divergences for nonnegative matrix factorization: Family of new algorithms. Springer, LNCS-3889, 3889, 32–39.

[21] Cornford, J., Pogodin, R., Ghosh, A., Sheng, K., Bicknell, B., Codol, O., Clark, B.A., Lajoie, G. and Richards, B. (2024). Brain-like learning with exponentiated gradients. bioRxiv. https://doi.org/10.1101/2024.10.25.620272

[22] Da Silva, G. B. and Ramos, R. V. (2019). The Lambert-Tsallis Wq function. Physica A: Statistical Mechanics and its Applications, 525:164-170.

[23] Euler, L. (1779). De serie Lambertina plurimisque eius insignibus proprietatibus. Acta Academiae Scientiarum Petropolitanae (1779: II, 1783), p. 29-51, Sankt Peterburg. Leonardi Euleri Opera Omnia, Series Prima Opera Mathematica, IV 1921, p. 350-369. http://math.dartmouth.edu/ euler.docs/originals/E532.pdf

[24] Furuichi, S. (2010). An axiomatic characterization of a two-parameter extended relative entropy. Journal of Mathematical Physics, 51(12):123302.

[25] Ghai, U., Hazan, E. and Singer, Y. (2020). Exponentiated Gradient Meets Gradient Descent. In Proceedings of the 31st International Conference on Algorithmic Learning Theory, PMLR 117:386-407. https://doi.org/10.48550/arXiv.1902.01903

[26] Gomez, I.S. and Borges, E.P. (2021). Algebraic structures and position-dependent mass Schrödinger equation from group entropy theory. Letters in Mathematical Physics, 111(2), p.43.

[27] Havrda, J. and Charvat, F. (1967). Quantification method of classification processes. Concept of structural a-entropy. Kybernetika, 3, 30-45.

[28] He, W. and Jiang, H. (2008). Explicit update vs implicit update. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, pp. 3441-3447. https://doi.org/10.1109/IJCNN.2008.4634288

[29] Helmbold, D.P., Schapire, R.E., Singer, Y. and Warmuth, M. K. (1998). On-line Portfolio Selection Using Multiplicative Updates. Mathematical Finance, 8:325-347. https://doi.org/10.1111/1467-9965.00058

[30] Herbster, M. and Warmuth, M. K. (1998). Tracking the Best Expert. Machine Learning, 32:151-178. https://doi.org/10.1023/A:1007424614876

[31] Kaniadakis, G. (2002). Statistical mechanics in the context of special relativity. Physical Review E, 66(5):056125.

[32] Kaniadakis, G. and Lissia, M. (2004). Editorial on News and expectations in thermostatistics. Phys. A, 340, XV-XIX.

[33] Kaniadakis, G., Lissia, M. and Scarfone, A.M. (2004). Deformed logarithms and entropies. Physica A: Statistical Mechanics and its Applications, 340(1-3), pp.41-49.

[34] Kaniadakis, G., Lissia, M. and Scarfone, A.M. (2005). Two-parameter deformations of logarithm, exponential, and entropy: A consistent framework for generalized statistical mechanics. Physical Review E, 71(4):046128.

[35] Kaniadakis, G. (2009). Maximum entropy principle and power-law tailed distributions. The European Physical Journal B, 70(1), 3-13.

[36] Kaniadakis, G., Scarfone, A.M., Sparavigna, A. and Wada, T. (2017). Composition law of κ-entropy for statistically independent systems. Physical Review E, 95(5), p.052112.

[37] Kivinen, J. and Warmuth, M. K. (1997). Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information and Computation, 132:1-63. http://dx.doi.org/10.1006/inco.1996.2612

[38] Kivinen, J. and Warmuth, M. K. (1995). Additive versus exponentiated gradient updates for linear prediction. In Proceedings of the Twenty-seventh Annual ACM Symposium on Theory of Computing (pp. 209-218). https://doi.org/10.1145/225058.225121

[39] Kullback, S. and Leibler, R.A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79-86.

[40] Lambert, J.H. (1758). Observationes varie in mathesin puram. Acta Helvetica, Physico-mathematicoanatomico-botanico-medica, Basel, 3, 128-168. http://www.kuttaka.org/ JHL/L1758c.pdf

[41] Majidi, N., Amid, E., Talebi, H. and Warmuth, M. K. (2021). Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond. arXiv preprint arXiv:2104.01493.

[42] McAnally, D.S. (1995). q-exponential and q-gamma functions. I. q-exponential functions. Journal of Mathematical Physics, 36(1):546–573.

[43] Mittal, D.P. (1975). On some functional equations concerning entropy, directed divergence and inaccuracy. Metrika, 22(1):35–45.

[44] Naudts, J. (2002). Deformed exponentials and logarithms in generalized thermostatistics. Physica A: Statistical Mechanics and its Applications, 316(1-4):323–334.

[45] Nemirovsky, A. and Yudin, D. (1983). Problem Complexity and Method Efficiency in Optimization. John Wiley and Sons, https://doi.org/10.1137/1027074

[46] Nock, R., Amid, E., Warmuth, M. K. (2023). Boosting with Tempered Exponential Measures. arXiv preprint arXiv:2306.05487. https://doi.org/10.48550/arXiv.2306.05487

[47] Scarfone, A.M., Suyari, H. and Wada, T. (2009). Gauss law of error revisited in the framework of Sharma-Taneja-Mittal information measure. Central European Journal of Physics, 7, pp.414-420.

[48] Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423.

[49] Shalev-Shwartz, S. (2011). Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107-194.

[50] Sharma, B.D. and Taneja, I.J. (1975). Entropy of type (α, β) and other generalized measures in information theory. Metrika, 22(1):205–215.

[51] Schwämmle, V. and Tsallis, C. (2007). Two-parameter generalization of the logarithm and exponential functions and Boltzmann-Gibbs-Shannon entropy. Journal of Mathematical Physics, 48(11), AIP Publishing.

[52] Taneja, I.J. (2001). Generalized information measures and their applications. On-line book. URL www.mtm.ufsc.br/taneja/book/book.html

[53] Taneja, I.J. (1989). On generalized information measures and their applications. Advances in Electronics and Electron Physics, 76:327–413.

[54] Tempesta, P. (2015). A theorem on the existence of trace-form generalized entropies. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 471(2183), p.20150165.

[55] Tempesta, P. (2016). Formal groups and Z-entropies. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 472(2195), 20160143.

[56] Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52(1):479–487.

[57] Tsallis, C. (1994). What are the numbers that experiments provide. Quimica Nova, 17(6), 468–471.

[58] Wada, T. and Scarfone, A.M. (2010). Finite difference and averaging operators in generalized entropies. In Journal of Physics: Conference Series, volume 201, page 012005. IOP Publishing.