On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves

Reza Sameni

arxiv: 2504.02169 · v3 · submitted 2025-04-02 · 💻 cs.LG · cs.AI· math.ST· stat.ML· stat.TH

On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves

Reza Sameni This is my paper

Pith reviewed 2026-05-22 21:21 UTC · model grok-4.3

classification 💻 cs.LG cs.AImath.STstat.MLstat.TH

keywords ROC curvesPR curvesbinary classificationscore distributionsCDF compositionclassifier metricsoperating point selection

0 comments

The pith

Binary classification metrics are all functions of one composition G of the positive and negative score CDFs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that ROC and PR curves, along with many standard performance metrics, are completely determined by the single function G formed by composing the cumulative distribution function of positive-class scores with the inverse of the negative-class CDF. This geometric view makes it possible to see how decision thresholds map to operating points, how class separability and variance shape the curves, and how one classifier dominates another. It also supplies an explicit link between G and Kullback-Leibler divergence and shows how the same object governs cost-sensitive optimization and capacity-constrained deployment.

Core claim

Many of the most commonly used binary classification metrics are merely functions of the composition function G := F_p ∘ F_n^{-1}, where F_p and F_n are the class-conditional cumulative distribution functions of the classifier scores in the positive and negative classes.

What carries the argument

The composition G := F_p ∘ F_n^{-1} that maps the negative-class score distribution through the positive-class distribution and thereby fixes every point on the ROC and PR curves.

If this is right

Operating-point selection and threshold choice reduce to inspecting the shape of G.
Classifier dominance can be decided by comparing the corresponding G functions directly.
The geometry of ROC and PR curves is fully explained by the degree of class separability and the variance ratio encoded in G.
Cost-sensitive and capacity-constrained decisions become direct functions of G and its inverse.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

New parametric families of ROC and PR curves could be generated simply by choosing convenient forms for G.
The same reduction may apply to other threshold-based performance measures not explicitly treated in the paper.
Empirical checks on real datasets could verify whether observed metric differences collapse once G is matched.

Load-bearing premise

The class-conditional score distributions admit well-defined and invertible cumulative distribution functions.

What would settle it

Two classifiers whose score distributions produce identical G yet yield different values for AUC, average precision, or any other standard metric derived from ROC or PR curves.

Figures

Figures reproduced from arXiv: 2504.02169 by Reza Sameni.

**Figure 1.** Figure 1: Illustration of score distributions fn(x) and fp(x) in a binary classification problem. Shaded regions show classifier decisions based on the threshold τ ; observed score distribution f(x) (pool of both classes) not shown. See (4), (5), (6), and (7) for definitions of shaded areas. 2 Binary Classification Setup Consider a binary classification problem in which a classifier C assigns continuous real-valued … view at source ↗

**Figure 2.** Figure 2: Common classification evaluation measures [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: A typical ROC curve highlighting the composition function [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Illustration of different constrained scenarios and the corresponding design regions in the ROC [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: ROC curves for Gaussian score classifiers under different parameter settings. (a) Varying [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

**Figure 6.** Figure 6: Generic ROC family generated from the transformation [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗

**Figure 7.** Figure 7: PR curves for binomial positive and negative class scores under varying classifier parameters and [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗

**Figure 8.** Figure 8: PR curves for binomial positive and negative class scores under fixed values of [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗

read the original abstract

We study the geometry of Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves in binary classification problems. The key finding is that many of the most commonly used binary classification metrics are merely functions of the composition function $G := F_p \circ F_n^{-1}$, where $F_p(\cdot)$ and $F_n(\cdot)$ are the class-conditional cumulative distribution functions of the classifier scores in the positive and negative classes, respectively. This geometric perspective facilitates the selection of operating points, understanding the effect of decision thresholds, and comparison between classifiers. It also helps explain how the shapes and geometry of ROC/PR curves reflect classifier behavior, providing objective tools for building classifiers optimized for specific applications with context-specific constraints. We further explore the conditions for classifier dominance, present analytical and numerical examples demonstrating the effects of class separability and variance on ROC and PR geometries, and derive a link between the positive-to-negative class leakage function $G(\cdot)$ and the Kullback-Leibler divergence. The framework highlights practical considerations, such as model calibration, cost-sensitive optimization, and operating point selection under real-world capacity constraints, enabling more informed approaches to classifier deployment and decision-making.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper reframes ROC/PR geometry around G = F_p ∘ F_n^{-1} but the claim that metrics are merely functions of G does not hold for PR because precision also depends on prevalence π.

read the letter

The main takeaway is that this paper tries to unify the geometry of ROC and PR curves through the single composition G of the positive and negative class-conditional CDFs, but the central phrasing that many metrics are merely functions of G runs into trouble once precision-recall enters the picture. ROC quantities do follow directly from G, with FPR = t and TPR = 1 - G(t). That part is standard and cleanly stated. The paper adds value by giving analytical and numerical examples of how separability and variance change curve shapes, by spelling out conditions for classifier dominance, and by linking G to Kullback-Leibler divergence. Those pieces supply concrete intuition for threshold selection and model comparison that some readers will find handy. The examples on capacity constraints and cost-sensitive settings are also practical. The soft spot is the PR claim. Precision equals (TPR · π) / (TPR · π + FPR · (1-π)), so every PR-derived metric is a joint function of G and the prevalence π. Treating PR geometry as determined by G alone therefore does not hold. If the derivations or figures in the full text omit explicit dependence on π, the geometric story for PR curves is incomplete. The abstract does not show the actual math, so it is hard to judge how far the derivations go, but the wording in the abstract already overreaches on this point. This is a paper for readers who already work with score distributions and want a compact geometric language for operating-point choice. It is not a major theoretical advance; it is mostly a reframing with some useful illustrations. The ROC side is solid enough to deserve referee time, but the PR section needs a clear correction on the role of prevalence before it would be ready for publication.

Referee Report

1 major / 1 minor

Summary. The paper studies the geometry of ROC and PR curves in binary classification. Its central claim is that many commonly used metrics are merely functions of the composition G := F_p ∘ F_n^{-1}, where F_p and F_n are the class-conditional CDFs of classifier scores in the positive and negative classes. The work examines how this view aids operating-point selection, threshold effects, classifier comparison and dominance conditions; it supplies analytical/numerical examples on separability and variance, derives a link between G and KL divergence, and discusses practical issues such as calibration, cost-sensitive learning, and capacity constraints.

Significance. If the geometric reduction is made precise (including explicit handling of prevalence), the framework could supply a compact language for comparing classifiers and selecting thresholds under application-specific constraints. The explicit examples on how separability and variance shape the curves, together with the KL link, would be useful strengths if they are derived without circularity.

major comments (1)

[Abstract] Abstract (key finding paragraph): the statement that 'many of the most commonly used binary classification metrics are merely functions of the composition function G' does not hold for PR-derived quantities. ROC coordinates depend only on G (FPR = t, TPR = 1 - G(t)), but precision = (TPR · π) / (TPR · π + FPR · (1-π)) is a joint function of G and the prevalence π. Any claim that PR geometry is 'merely' a function of G therefore requires either an explicit restriction to ROC or a clear statement that π is carried as an additional parameter; otherwise the central reduction is overstated for the PR case.

minor comments (1)

[Abstract] The abstract announces a derivation linking G to Kullback-Leibler divergence but supplies no equation or sketch; the full manuscript should state the precise relation (e.g., which integral or expectation) so readers can verify it is not tautological.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and the precise observation regarding the abstract. We address the comment below.

read point-by-point responses

Referee: [Abstract] Abstract (key finding paragraph): the statement that 'many of the most commonly used binary classification metrics are merely functions of the composition function G' does not hold for PR-derived quantities. ROC coordinates depend only on G (FPR = t, TPR = 1 - G(t)), but precision = (TPR · π) / (TPR · π + FPR · (1-π)) is a joint function of G and the prevalence π. Any claim that PR geometry is 'merely' a function of G therefore requires either an explicit restriction to ROC or a clear statement that π is carried as an additional parameter; otherwise the central reduction is overstated for the PR case.

Authors: We agree that the abstract phrasing is imprecise for the PR case. ROC coordinates (FPR, TPR) are indeed functions of G alone, whereas precision explicitly incorporates prevalence π. In the body of the manuscript we already treat π as an explicit parameter when deriving PR curves, but the abstract does not make this distinction clear. We will revise the abstract to state that ROC metrics depend only on G while PR metrics depend on G together with π (treated as a fixed problem parameter). This preserves the geometric framework while removing the overstatement. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation follows directly from CDF definitions

full rationale

The paper constructs its geometric framework for ROC and PR curves from the standard definitions of class-conditional CDFs F_p and F_n and their composition G. All subsequent expressions for metrics follow by algebraic substitution using the usual probabilistic definitions of TPR, FPR, precision, etc. No parameters are fitted and then relabeled as predictions, no self-citations serve as load-bearing uniqueness theorems, and no ansatz is smuggled in. The central claim that many metrics are functions of G is a direct consequence of the input definitions rather than a tautology that redefines its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities can be identified from the provided information.

pith-pipeline@v0.9.0 · 5742 in / 1059 out tokens · 48777 ms · 2026-05-22T21:21:01.943879+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

many of the most commonly used binary classification metrics are merely functions of the composition function G := F_p ∘ F_n^{-1}
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

AUROC = 1 - ∫_0^1 G(v) dv

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

[1]

W. J. Krzanowski and D. J. Hand, ROC curves for continuous data . Chapman and Hall/CRC, 2009

work page 2009
[2]

Bipartite Ranking: a Risk-Theoretic Perspective,

A. K. Menon and R. C. Williamson, “Bipartite Ranking: a Risk-Theoretic Perspective,” Journal of Machine Learning Research, vol. 17, no. 195, pp. 1–102, 2016

work page 2016
[3]

Rare Events in the ICU: An Emerging Challenge in Classification and Prediction,

D. E. Leisman, “Rare Events in the ICU: An Emerging Challenge in Classification and Prediction,” Critical Care Medicine, vol. 46, p. 418–424, Mar. 2018

work page 2018
[4]

H. L. Van Trees, Detection, estimation, and modulation theory, part I: detection, estimation, and linear modulation theory. John Wiley & Sons, 2004

work page 2004
[5]

The meaning and use of the area under a receiver operating characteristic (ROC) curve,

J. A. Hanley and B. J. McNeil, “The meaning and use of the area under a receiver operating characteristic (ROC) curve,” Radiology, vol. 143, p. 29–36, Apr. 1982

work page 1982
[6]

An introduction to ROC analysis,

T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, p. 861–874, June 2006. 1Remember that G(·) and g(·) are only defined over [0 , 1]. 16

work page 2006
[7]

The relationship between Precision-Recall and ROC curves,

J. Davis and M. Goadrich, “The relationship between Precision-Recall and ROC curves,” in Proceedings of the 23rd international conference on Machine learning - ICML ’06 , ICML ’06, p. 233–240, ACM Press, 2006

work page 2006
[8]

Detection of Chagas Disease from the ECG: The George B. Moody PhysioNet Challenge 2025,

M. A. Reyna, J. Weigle, Z. Koscova, A. Elola, S. Seyedi, K. Campbell, M.-S. Hassannia, J. Pavlus, A. H. Ribeiro, A. L. P. Ribeiro, R. Sameni, and G. D. Clifford, “Detection of Chagas Disease from the ECG: The George B. Moody PhysioNet Challenge 2025,” 2025. Accessed: 2025-04-01

work page 2025
[9]

D. S. Naidu, Optimal control systems. CRC press, 2018

work page 2018
[10]

Model-Based Prediction and Optimal Control of Pandemics by Non-Pharmaceutical Inter- ventions,

R. Sameni, “Model-Based Prediction and Optimal Control of Pandemics by Non-Pharmaceutical Inter- ventions,” IEEE Journal of Selected Topics in Signal Processing , vol. 16, p. 307–317, Feb. 2022

work page 2022
[11]

Age, sex and race bias in automated arrhythmia detectors,

E. A. Perez Alday, A. B. Rad, M. A. Reyna, N. Sadr, A. Gu, Q. Li, M. Dumitru, J. Xue, D. Albert, R. Sameni, and G. D. Clifford, “Age, sex and race bias in automated arrhythmia detectors,” Journal of Electrocardiology, vol. 74, p. 5–9, Sept. 2022

work page 2022
[12]

Heart murmur detection from phonocardiogram recordings: The George B. Moody PhysioNet Challenge 2022,

M. A. Reyna, Y. Kiarashi, A. Elola, J. Oliveira, F. Renna, A. Gu, E. A. Perez Alday, N. Sadr, A. Sharma, J. Kpodonu, S. Mattos, M. T. Coimbra, R. Sameni, A. B. Rad, and G. D. Clifford, “Heart murmur detection from phonocardiogram recordings: The George B. Moody PhysioNet Challenge 2022,” PLOS Digital Health, vol. 2, p. e0000324, Sept. 2023

work page 2022
[13]

Classification of 12-lead ECGs: the PhysioNet/Computing in Cardiology Challenge 2020,

E. A. Perez Alday, A. Gu, A. J Shah, C. Robichaux, A.-K. Ian Wong, C. Liu, F. Liu, A. Bahrami Rad, A. Elola, S. Seyedi, Q. Li, A. Sharma, G. D. Clifford, and M. A. Reyna, “Classification of 12-lead ECGs: the PhysioNet/Computing in Cardiology Challenge 2020,” Physiological Measurement, vol. 41, p. 124003, Dec. 2020

work page 2020
[14]

Papoulis and S

A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes . McGraw-Hill, 4th ed., 2002. 17

work page 2002

[1] [1]

W. J. Krzanowski and D. J. Hand, ROC curves for continuous data . Chapman and Hall/CRC, 2009

work page 2009

[2] [2]

Bipartite Ranking: a Risk-Theoretic Perspective,

A. K. Menon and R. C. Williamson, “Bipartite Ranking: a Risk-Theoretic Perspective,” Journal of Machine Learning Research, vol. 17, no. 195, pp. 1–102, 2016

work page 2016

[3] [3]

Rare Events in the ICU: An Emerging Challenge in Classification and Prediction,

D. E. Leisman, “Rare Events in the ICU: An Emerging Challenge in Classification and Prediction,” Critical Care Medicine, vol. 46, p. 418–424, Mar. 2018

work page 2018

[4] [4]

H. L. Van Trees, Detection, estimation, and modulation theory, part I: detection, estimation, and linear modulation theory. John Wiley & Sons, 2004

work page 2004

[5] [5]

The meaning and use of the area under a receiver operating characteristic (ROC) curve,

J. A. Hanley and B. J. McNeil, “The meaning and use of the area under a receiver operating characteristic (ROC) curve,” Radiology, vol. 143, p. 29–36, Apr. 1982

work page 1982

[6] [6]

An introduction to ROC analysis,

T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, p. 861–874, June 2006. 1Remember that G(·) and g(·) are only defined over [0 , 1]. 16

work page 2006

[7] [7]

The relationship between Precision-Recall and ROC curves,

J. Davis and M. Goadrich, “The relationship between Precision-Recall and ROC curves,” in Proceedings of the 23rd international conference on Machine learning - ICML ’06 , ICML ’06, p. 233–240, ACM Press, 2006

work page 2006

[8] [8]

Detection of Chagas Disease from the ECG: The George B. Moody PhysioNet Challenge 2025,

M. A. Reyna, J. Weigle, Z. Koscova, A. Elola, S. Seyedi, K. Campbell, M.-S. Hassannia, J. Pavlus, A. H. Ribeiro, A. L. P. Ribeiro, R. Sameni, and G. D. Clifford, “Detection of Chagas Disease from the ECG: The George B. Moody PhysioNet Challenge 2025,” 2025. Accessed: 2025-04-01

work page 2025

[9] [9]

D. S. Naidu, Optimal control systems. CRC press, 2018

work page 2018

[10] [10]

Model-Based Prediction and Optimal Control of Pandemics by Non-Pharmaceutical Inter- ventions,

R. Sameni, “Model-Based Prediction and Optimal Control of Pandemics by Non-Pharmaceutical Inter- ventions,” IEEE Journal of Selected Topics in Signal Processing , vol. 16, p. 307–317, Feb. 2022

work page 2022

[11] [11]

Age, sex and race bias in automated arrhythmia detectors,

E. A. Perez Alday, A. B. Rad, M. A. Reyna, N. Sadr, A. Gu, Q. Li, M. Dumitru, J. Xue, D. Albert, R. Sameni, and G. D. Clifford, “Age, sex and race bias in automated arrhythmia detectors,” Journal of Electrocardiology, vol. 74, p. 5–9, Sept. 2022

work page 2022

[12] [12]

Heart murmur detection from phonocardiogram recordings: The George B. Moody PhysioNet Challenge 2022,

M. A. Reyna, Y. Kiarashi, A. Elola, J. Oliveira, F. Renna, A. Gu, E. A. Perez Alday, N. Sadr, A. Sharma, J. Kpodonu, S. Mattos, M. T. Coimbra, R. Sameni, A. B. Rad, and G. D. Clifford, “Heart murmur detection from phonocardiogram recordings: The George B. Moody PhysioNet Challenge 2022,” PLOS Digital Health, vol. 2, p. e0000324, Sept. 2023

work page 2022

[13] [13]

Classification of 12-lead ECGs: the PhysioNet/Computing in Cardiology Challenge 2020,

E. A. Perez Alday, A. Gu, A. J Shah, C. Robichaux, A.-K. Ian Wong, C. Liu, F. Liu, A. Bahrami Rad, A. Elola, S. Seyedi, Q. Li, A. Sharma, G. D. Clifford, and M. A. Reyna, “Classification of 12-lead ECGs: the PhysioNet/Computing in Cardiology Challenge 2020,” Physiological Measurement, vol. 41, p. 124003, Dec. 2020

work page 2020

[14] [14]

Papoulis and S

A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes . McGraw-Hill, 4th ed., 2002. 17

work page 2002