pith. machine review for the scientific record.

arXiv: 2605.05808 · v1 · submitted 2026-05-07 · 📊 stat.ML · cs.LG · math.ST · stat.TH

Recognition: unknown

Ratio-based Loss Functions

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 05:29 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · math.ST · stat.TH
keywords ratio-based loss functions · machine learning · regression · convexity · differentiability · relative error · loss functions · survey

The pith

Ratio-based loss functions depend on the ratio of target to prediction; the survey classifies them by general properties such as continuity, convexity, and differentiability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey examines loss functions built around the ratio of observed values to model predictions instead of their absolute difference. The authors focus on regression settings where relative errors are the modeling target. They classify existing ratio-based losses and introduce several new ones while checking their continuity, Lipschitz continuity, convexity, and differentiability. These mathematical traits matter because most machine learning algorithms require them for stable optimization and later theoretical analysis. The paper stops short of proving consistency or learning rates for any particular algorithm, leaving that work for future studies.

Core claim

Ratio-based loss functions are losses that take the ratio y_i / f(x_i) as their argument, in contrast to the more common margin-based or distance-based losses. The paper provides a systematic survey of their general properties, including continuity, Lipschitz continuity, convexity, and differentiability, and proposes a small number of new ratio-based losses. These properties are examined because they are central to the behavior of optimization routines in machine learning, independent of any specific choice of hypothesis space or probability measure.
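
The contrast with distance-based losses can be written compactly. The ψ and ℓ notation below follows excerpts quoted from the paper; restricting to positive predictions t = f(x) > 0 is an assumption added here so the ratio is well defined:

```latex
% Distance-based vs. ratio-based factorization; \psi and \ell follow the
% notation quoted from the paper, and t = f(x) > 0 is assumed here so the
% ratio is well defined.
\[
  \underbrace{L(x, y, t) = \psi(y - t)}_{\text{distance-based}}
  \qquad\text{vs.}\qquad
  \underbrace{L(x, y, t) = \ell(r), \quad r = \frac{y}{f(x)}}_{\text{ratio-based}}
\]
```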

What carries the argument

Ratio-based loss functions themselves: scalar functions of the ratio between the target value and the model's prediction, designed to capture multiplicative rather than additive error structures.
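
A minimal sketch in Python of a few representing functions ℓ drawn from the survey, assuming r = y/f(x) with both quantities positive. The inverse absolute relative and LARE forms follow equations (18) and (21) as quoted from the source; the plain absolute and squared relative forms ℓ(r) = |r − 1| and ℓ(r) = (r − 1)² are standard representations assumed here, not copied from the paper.

```python
import numpy as np

# Representing functions ell(r) of several ratio-based losses, evaluated at
# r = y / f(x) with y and f(x) assumed positive. The inverse absolute relative
# and LARE forms follow equations (18) and (21) of the paper; the plain
# absolute and squared relative forms are standard and assumed here.

def ratio(y, t):
    """Ratio argument r = y / t for a target y and prediction t, both > 0."""
    return y / t

def abs_relative(r):          # assumed: ell(r) = |r - 1|
    return np.abs(r - 1.0)

def squared_relative(r):      # assumed: ell(r) = (r - 1)^2
    return (r - 1.0) ** 2

def inverse_abs_relative(r):  # eq. (18): ell(r) = |r^{-1} - 1|
    return np.abs(1.0 / r - 1.0)

def lare(r):                  # eq. (21): ell(r) = |r - r^{-1}|
    return np.abs(r - 1.0 / r)

r = ratio(3.0, 2.0)  # y = 3, f(x) = 2, so r = 1.5 (underestimation by factor 1.5)
for name, ell in [("absolute relative", abs_relative),
                  ("squared relative", squared_relative),
                  ("inverse absolute relative", inverse_abs_relative),
                  ("LARE", lare)]:
    print(f"{name}: {ell(r):.4f}")
```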

If this is right

  • Optimization algorithms can safely use gradient-based methods on any convex and differentiable ratio-based loss without additional safeguards.
  • Researchers can select or design losses according to whether they need bounded Lipschitz constants for stability arguments (a numerical probe of such properties is sketched after this list).
  • Newly proposed ratio-based losses inherit the same general properties as the surveyed ones and can be substituted into existing frameworks.
  • Theoretical guarantees developed for one ratio-based loss may transfer to others that share the same continuity or convexity class.
  • The separation of ratio-based losses from distance-based losses clarifies when relative-error modeling is mathematically well-behaved.
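
A crude numerical illustration, not a substitute for the paper's proofs: probing convexity via second differences and Lipschitz behavior via the largest slope on a grid. The grid range and tolerance are arbitrary choices made here, and a finite grid check can only refute, never establish, these properties.

```python
import numpy as np

# Probe convexity (second differences >= 0) and Lipschitz behavior
# (bounded slopes) of a representing function ell on a grid over r > 0.
# Illustrative only: a finite grid can refute convexity but never prove it.
def check_on_grid(ell, lo=0.05, hi=5.0, n=2000):
    r = np.linspace(lo, hi, n)
    slopes = np.diff(ell(r)) / np.diff(r)
    return {
        "convex_on_grid": bool(np.all(np.diff(slopes) >= -1e-9)),
        "max_abs_slope": float(np.max(np.abs(slopes))),  # grows off-grid if not Lipschitz
    }

lare = lambda r: np.abs(r - 1.0 / r)   # LARE, eq. (21) in the source
sq_rel = lambda r: (r - 1.0) ** 2      # assumed squared relative form

print("LARE:", check_on_grid(lare))        # fails the convexity probe on r > 1
print("squared relative:", check_on_grid(sq_rel))
```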

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These losses could be especially useful in domains where measurement noise scales with signal magnitude, such as count data or positive quantities.
  • Hybrid losses that blend a ratio term with a small distance term might combine robustness to relative errors with protection near zero predictions (a hypothetical sketch follows this list).
  • The listed properties could guide the construction of surrogate losses that approximate non-convex ratio-based objectives while preserving convexity.
  • Extension of the same ratio construction to classification margins would require careful handling of sign changes and zero crossings.
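
A purely hypothetical sketch of the hybrid idea in the second bullet; nothing of this form appears in the paper, and the names hybrid_loss, t_floor, and the weight lam are invented here for illustration.

```python
import numpy as np

# Hypothetical hybrid loss: an absolute relative term plus a small squared
# distance term. Invented for illustration; not a loss defined in the survey.
def hybrid_loss(y, t, lam=0.1, t_floor=1e-6):
    t = np.maximum(t, t_floor)        # guard: keeps the ratio finite near t = 0
    ratio_term = np.abs(y / t - 1.0)  # multiplicative (relative) error
    dist_term = (y - t) ** 2          # additive error, informative even at t ~ 0
    return ratio_term + lam * dist_term

print(hybrid_loss(2.0, 1.9))   # small relative error: small loss
print(hybrid_loss(2.0, 1e-9))  # near-zero prediction: finite but heavily penalized
```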

Load-bearing premise

The assumption that examining continuity, convexity, and differentiability in isolation will be sufficient to support later proofs of consistency or stability for algorithms that use these losses.

What would settle it

A concrete ratio-based loss that is discontinuous or non-convex at the points where predictions equal zero yet still produces stable empirical risk minimization in practice would weaken the rationale for prioritizing those properties.

Figures

Figures reproduced from arXiv:2605.05808 by Andreas Christmann and Lena Helgerth.

Figure 1: Convex and Lipschitz continuous loss function.

Figure 2: Plots of the representing functions ℓ using the logarithm; Huber-type logarithmic relative loss with parameter α = 3.

Figure 3: Plots of the representing functions ℓ using logarithm and hyperbolic cosine.

Figure 5: Plots of the representing functions ℓ of absolute relative, squared relative, and Huber-type relative loss functions; Huber-type relative loss for parameter α = 3.

Figure 6: Plots of the representing functions ℓ of inverse relative loss functions; Huber-type inverse relative loss function for parameter α = 3.

Figure 7: Plots of the representing functions ℓ of LARE loss functions; Huber-type least absolute relative loss function's representation ℓ with parameter α = 3.

Figure 8: Plots of LPRE's and GRE's representation functions.

Figure 10: Plots of the representing functions ℓ of robust loss based on the maximum loss function; left: loss without insensitivity (ε = 0) for different choices of parameter α; right: robust loss functions with different insensitivity values ε and parameter α = 3.

Figure 11: Plots of the representing functions ℓ of robust loss based on LPRE; left: loss with ε = 0 (no insensitivity around 0) for different values of parameter α; right: ℓ for various insensitivity choices ε and parameter α = 3.

Figure 12: Plots of the representing functions ℓ of [Fu+24]'s robust loss functions based on the log-cosh-log loss for certain choices of parameters λ and b.

Figure 13: Plots of the representing functions ℓ for the Hampel-type loss function; left: ℓ from (35) with parameters α = 2, β = 3, and γ = 5; right: ℓ from (36) with parameters α = 2 and β = 5.

Figure 14: Plots of the representing functions ℓ of weighted loss functions for various weights τ.
read the original abstract

Algorithms in machine learning and AI do critically depend on at least three key components: (i) the risk function, which is the expectation of the loss function, (ii) the function space, which is often called the hypothesis space, and (iii) the set of probability measures, which are allowed for the specified algorithm. This paper gives a survey of a certain class of loss functions, which we call ratio-based. In supervised learning, margin-based loss functions for classification tasks depending on the product of the output values $y_i$ and the predictions $f(x_i)$ as well as distance-based loss functions depending on the difference of $y_i$ and $f(x_i)$ for regression are common. Distance-based loss functions are in particular useful, if an additive model assumption seems plausible, i.e. the common signal plus noise assumption. However, in the literature, several loss functions proposed for regression purposes have a multiplicative error structure in mind and pay attention to relative errors, i.e. to the ratio of $y_i$ and $f(x_i)$. In this survey article, we systematically investigate such ratio-based loss functions and propose a few new losses, which may be interesting for future research. We concentrate on investigating general properties of ratio-based loss functions like continuity, Lipschitz-continuity, convexity, and differentiability, because these properties play a central role in most machine learning algorithms. Therefore, we do not focus on some specific machine learning algorithm to derive universal consistency, learning rates, or stability results. Instead, we want to enable future research in this direction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript surveys ratio-based loss functions (those depending on the ratio y/f(x)) for supervised learning, contrasting them with margin-based and distance-based losses. It reviews existing examples motivated by multiplicative error structures, proposes a few new ratio-based losses, and systematically derives their analytic properties including continuity, Lipschitz-continuity, convexity, and differentiability. The authors explicitly limit scope to these general properties and state that they do not derive consistency, learning rates, or stability results for any concrete algorithm, instead positioning the catalog as a foundation for future work.

Significance. If the property derivations hold, the paper supplies a useful reference catalog for loss functions suited to relative-error regression settings. Credit is given for the systematic treatment of standard properties (continuity, convexity, differentiability) that are load-bearing for optimization and for the constructive proposal of new losses. This modest but focused survey can facilitate subsequent research on algorithmic guarantees without claiming those guarantees itself.

minor comments (2)
  1. Abstract: the phrase 'propose a few new losses' is not accompanied by even a brief indication of their functional forms; adding one sentence would improve reader orientation without lengthening the abstract unduly.
  2. The manuscript should ensure uniform notation for the ratio argument (e.g., consistently using r = y/f(x) or an equivalent) when stating the new losses and their properties.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of our survey on ratio-based loss functions. The report raises no major comments. Regarding the two minor comments: we will add a one-sentence indication of the functional forms of the newly proposed losses to the abstract, and we will verify that the ratio argument is denoted uniformly (r = y/f(x)) wherever the new losses and their properties are stated.

Circularity Check

0 steps flagged

No significant circularity: survey of standard properties

full rationale

The paper is a survey that catalogs existing and proposes new ratio-based loss functions (depending on y/f(x)) and derives their basic analytic properties—continuity, Lipschitz continuity, convexity, differentiability—directly from standard real-analysis definitions. No load-bearing step reduces a claimed prediction or uniqueness result to a fitted parameter, self-citation chain, or ansatz smuggled from prior author work. The central premise explicitly disclaims deriving consistency, learning rates, or stability for any algorithm and instead positions the catalog as enabling future work. All derivations are therefore self-contained against external mathematical benchmarks and exhibit no self-definitional, fitted-input, or renaming circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The survey rests on standard definitions from real analysis and convex analysis. No free parameters are introduced. The new loss functions are proposed rather than postulated as physical entities.

axioms (1)
  • standard math — Standard definitions of continuity, Lipschitz continuity, convexity, and differentiability for real-valued functions of one variable (two are written out below).
    Invoked when the paper states it will investigate these properties for ratio-based losses.
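
For reference, two of the invoked definitions written out for a representing function ℓ on (0, ∞); these are the textbook statements, not quotations from the paper:

```latex
% Textbook definitions for a representing function \ell : (0,\infty) \to [0,\infty).
% Lipschitz continuity (with constant c):
\[
  \exists\, c \ge 0:\quad |\ell(r) - \ell(s)| \le c\,|r - s| \quad \forall\, r, s \in (0,\infty).
\]
% Convexity:
\[
  \ell\bigl(\lambda r + (1-\lambda)s\bigr) \le \lambda\,\ell(r) + (1-\lambda)\,\ell(s)
  \quad \forall\, r, s \in (0,\infty),\ \lambda \in [0,1].
\]
```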

pith-pipeline@v0.9.0 · 5583 in / 1351 out tokens · 34804 ms · 2026-05-08T05:29:18.948732+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

12 extracted references · 2 canonical work pages

  1. [CH07] Yen-Chang Chang and Wen-Liang Hung: "LINEX Loss Functions with Applications to Determining the Optimum Process Parameters". In: Quality & Quantity 41 (2007), pp. 291–301.

  2. [Cla+98] Francis H. Clarke, Yuri S. Ledyaev, Ronald J. Stern, and Peter R. Wolenski: Nonsmooth Analysis and Control Theory. New York, Berlin, Heidelberg: Springer, 1998.

  3. [DDR09] Christine De Mol, Ernesto De Vito, and Lorenzo Rosasco: "Elastic-net regularization in learning theory". In: Journal of Complexity 25 (2009), pp. 201–230.

  4. [Fu+24] Saiji Fu, Xiaoxiao Wang, Jingjing Tang, Shulin Lan, and Yingjie Tian: "Generalized robust loss functions for machine learning". In: Neural Networks 171 (2024), pp. 200–214.

  5. [KS12] JooSeuk Kim and Clayton D. Scott: "Robust Kernel Density Estimation". In: Journal of Machine Learning Research 13 (2012), pp. 2529–2565.

  6. [KB78] Roger W. Koenker and Gilbert W. Bassett: "Regression quantiles". In: Econometrica 46 (1978), pp. 33–50.

  7. [LLL25] Caiyi Li, Kaishuai Liu, and Shuai Liu: "A Survey of Loss Functions in Deep Learning". In: Mathematics 13.15 (2025).

  8. [MYX25] Hao Ming, Hu Yang, and Xiaochao Xia: "L0-regularized high-dimensional sparse multiplicative models". In: Statistical Theory and Related Fields 9.1 (2025), pp. 59–83.

  9. [Ros+13] Lorenzo Rosasco, Silvia Villa, Sofia Mosci, Matteo Santoro, and Alessandro Verri: "Nonparametric Sparsity and Regularization". In: Journal of Machine Learning Research 14.52 (2013), pp. 1665–

  10. [SS04] Alex J. Smola and Bernhard Schölkopf: "A tutorial on support vector regression". In: Statistics and Computing 14 (2004), pp. 199–222.

  11. [SC11] Ingo Steinwart and Andreas Christmann: "Estimating conditional quantiles with the help of the pinball loss". In: Bernoulli 17.1 (2011), pp. 211–225.

  12. [Yan+23] Fan Yang, Zhanyang Li, Yushan Xue, and Yuehan Yang: "A penalized least product relative error loss function based on wavelet decomposition for non-parametric multiplicative additive models". In: Journal of Computational and Applied Mathematics 432 (2023), p. 115299.