pith.

arXiv: 2504.02169 · v3 · submitted 2025-04-02 · 💻 cs.LG · cs.AI · math.ST · stat.ML · stat.TH

On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves

Pith reviewed 2026-05-22 21:21 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · math.ST · stat.ML · stat.TH
keywords: ROC curves · PR curves · binary classification · score distributions · CDF composition · classifier metrics · operating point selection

The pith

Many binary classification metrics are functions of one composition G of the positive- and negative-class score CDFs; PR-based metrics also depend on the class prevalence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that ROC curves, along with many standard performance metrics, are completely determined by the single function G formed by composing the cumulative distribution function (CDF) of positive-class scores with the inverse of the negative-class CDF; PR curves are determined by G together with the class prevalence. This geometric view makes it possible to see how decision thresholds map to operating points, how class separability and variance shape the curves, and how one classifier dominates another. It also supplies an explicit link between G and Kullback-Leibler divergence and shows how the same object governs cost-sensitive optimization and capacity-constrained deployment.

Core claim

Many of the most commonly used binary classification metrics are merely functions of the composition function G := F_p ∘ F_n^{-1}, where F_p and F_n are the class-conditional cumulative distribution functions of the classifier scores in the positive and negative classes.
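A minimal numerical sketch of the core claim, under assumptions that are not taken from the paper: negative scores follow N(0, 1), positive scores follow N(1.5, 1), and the decision rule is "predict positive if score > τ". It builds G from the two CDFs with Python's `statistics.NormalDist` and checks that the empirical ROC point at a few thresholds agrees with (FPR, 1 - G(1 - FPR)). The constant `MU` and the helper names are hypothetical.

```python
import random
from statistics import NormalDist

# Hypothetical toy model (not from the paper): negatives ~ N(0, 1),
# positives ~ N(MU, 1), and a score above tau is classified as positive.
MU = 1.5
F_n = NormalDist(0.0, 1.0)
F_p = NormalDist(MU, 1.0)


def G(u: float) -> float:
    """Composition G(u) = F_p(F_n^{-1}(u)), defined for 0 < u < 1."""
    return F_p.cdf(F_n.inv_cdf(u))


def empirical_rates(neg, pos, tau):
    """Empirical (FPR, TPR) of the rule 'predict positive if score > tau'."""
    fpr = sum(s > tau for s in neg) / len(neg)
    tpr = sum(s > tau for s in pos) / len(pos)
    return fpr, tpr


rng = random.Random(0)
neg = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
pos = [rng.gauss(MU, 1.0) for _ in range(50_000)]

for tau in (-0.5, 0.5, 1.5):
    fpr, tpr = empirical_rates(neg, pos, tau)
    # u = F_n(tau) = 1 - FPR, so the ROC point should be close to (FPR, 1 - G(1 - FPR)).
    print(f"tau={tau:+.1f}  FPR={fpr:.3f}  TPR={tpr:.3f}  1-G(1-FPR)={1 - G(1 - fpr):.3f}")
```

The agreement is only up to sampling error, since the ROC points are estimated from finite samples while G is computed from the assumed population CDFs.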

What carries the argument

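The composition G := F_p ∘ F_n^{-1}, which sends each negative-class quantile level u = F_n(τ) (the true-negative rate at threshold τ) to F_p(τ) (the false-negative rate at the same threshold), and thereby fixes every point on the ROC curve and, together with the class prevalence, every point on the PR curve.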

If this is right

  • Operating-point selection and threshold choice reduce to inspecting the shape of G.
  • Classifier dominance can be decided by comparing the corresponding G functions directly.
  • The geometry of ROC and PR curves reflects class separability and the class variance ratio, both of which are encoded in G.
  • Cost-sensitive and capacity-constrained decisions become direct functions of G and its inverse.
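The last bullet can be sketched as follows, assuming a hypothetical equal-variance Gaussian toy model (N(0, 1) negatives, N(1.5, 1) positives, "predict positive if score > τ") and indexing operating points by u = F_n(τ). The expected cost per case is c_fn · π · G(u) + c_fp · (1 - π) · (1 - u), and the fraction of cases flagged positive is π · (1 - G(u)) + (1 - π) · (1 - u); both depend on the classifier only through G, with the prevalence π and the costs as problem parameters. The grid search and the bisection are illustrative choices rather than the paper's procedure, and all names are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical toy model (not from the paper): negatives ~ N(0, 1), positives ~ N(MU, 1),
# "predict positive if score > tau", and operating points indexed by u = F_n(tau).
MU = 1.5
F_n, F_p = NormalDist(0.0, 1.0), NormalDist(MU, 1.0)


def G(u):
    """G(u) = F_p(F_n^{-1}(u)): the false-negative rate when the true-negative rate is u."""
    return F_p.cdf(F_n.inv_cdf(u))


def expected_cost(u, prevalence, c_fn, c_fp):
    """Expected cost per case: c_fn per missed positive, c_fp per false alarm."""
    return c_fn * prevalence * G(u) + c_fp * (1.0 - prevalence) * (1.0 - u)


def min_cost_operating_point(prevalence, c_fn, c_fp, steps=10_000):
    """Grid search over midpoints u in (0, 1) for the cost-minimizing operating point."""
    grid = [(k + 0.5) / steps for k in range(steps)]
    return min(grid, key=lambda u: expected_cost(u, prevalence, c_fn, c_fp))


def flag_rate(u, prevalence):
    """Fraction of all cases flagged positive; non-increasing in u."""
    return prevalence * (1.0 - G(u)) + (1.0 - prevalence) * (1.0 - u)


def capacity_operating_point(prevalence, capacity, iters=100):
    """Bisection for the smallest u (hence largest TPR) with flag_rate(u) <= capacity."""
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if flag_rate(mid, prevalence) > capacity:
            lo = mid
        else:
            hi = mid
    return hi


def gaussian_optimal_threshold(prevalence, c_fn, c_fp):
    """Closed form for this model only: solve g(u) = c_fp (1 - pi) / (c_fn pi)
    with g(u) = exp(MU * x - MU**2 / 2) at x = F_n^{-1}(u); returns the threshold x."""
    ratio = c_fp * (1.0 - prevalence) / (c_fn * prevalence)
    return (math.log(ratio) + MU ** 2 / 2) / MU


u_cost = min_cost_operating_point(prevalence=0.1, c_fn=10.0, c_fp=1.0)
u_cap = capacity_operating_point(prevalence=0.1, capacity=0.2)
for name, u in (("min-cost", u_cost), ("20% capacity", u_cap)):
    print(f"{name}: tau={F_n.inv_cdf(u):.3f}  TPR={1 - G(u):.3f}  FPR={1 - u:.3f}")
print("closed-form min-cost tau:", round(gaussian_optimal_threshold(0.1, 10.0, 1.0), 3))
```

In this model the slope g = G' is the likelihood ratio exp(MU·x - MU²/2), which increases with u, so the expected cost is strictly convex and has a single minimum where g(u) = c_fp(1 - π)/(c_fn π); `gaussian_optimal_threshold` uses that condition as a cross-check on the grid search.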

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • New parametric families of ROC and PR curves could be generated simply by choosing convenient forms for G.
  • The same reduction may apply to other threshold-based performance measures not explicitly treated in the paper.
  • Empirical checks on real datasets could verify whether observed metric differences collapse once G is matched.
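As an illustration of the first bullet (a hypothetical example, not a family proposed in the paper): any continuous, non-decreasing G on [0, 1] with G(0) = 0 and G(1) = 1 yields an ROC curve (FPR, TPR) = (1 - u, 1 - G(u)), for instance by taking uniform negative scores on [0, 1] and positive scores with CDF G. The one-parameter choice G(u) = u^θ gives TPR = 1 - (1 - FPR)^θ, and since AUC = 1 - ∫_0^1 G(u) du under the convention "predict positive if score > τ", its AUC is θ/(θ + 1).

```python
def power_family_G(theta):
    """Hypothetical family G(u) = u**theta; theta > 1 gives G(u) < u on (0, 1),
    i.e. an ROC curve above the chance diagonal."""
    return lambda u: u ** theta


def roc_curve_from_G(G, n=101):
    """ROC points (FPR, TPR) = (1 - u, 1 - G(u)) on an evenly spaced grid of u in [0, 1]."""
    return [(1.0 - k / (n - 1), 1.0 - G(k / (n - 1))) for k in range(n)]


def auc_from_G(G, steps=100_000):
    """AUC = 1 - integral_0^1 G(u) du, approximated by the midpoint rule."""
    h = 1.0 / steps
    return 1.0 - h * sum(G((k + 0.5) * h) for k in range(steps))


for theta in (1, 2, 4):
    G = power_family_G(theta)
    print(f"theta={theta}: AUC~{auc_from_G(G):.4f}  theta/(theta+1)={theta / (theta + 1):.4f}")
```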

Load-bearing premise

The class-conditional score distributions admit well-defined and invertible cumulative distribution functions.

What would settle it

Two classifiers whose score distributions produce identical G yet yield different values for AUC, average precision (at the same class prevalence), or any other standard metric derived from ROC or PR curves.
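One consequence that bears on this test: if the same strictly increasing transform h is applied to every score, F_p becomes F_p ∘ h^{-1} and F_n^{-1} becomes h ∘ F_n^{-1}, so G, and with it the ROC curve and the AUC, is unchanged. The sketch below checks this empirically with the Mann-Whitney form of the AUC on simulated scores; the Gaussian distributions, the sample sizes and the exponential transform are assumptions chosen for illustration. A non-monotone transform such as squaring generally changes G and therefore the AUC.

```python
import math
import random


def auc_mann_whitney(neg, pos):
    """Empirical AUC: P(S_pos > S_neg) + 0.5 * P(S_pos == S_neg) over all pairs."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


rng = random.Random(1)
neg = [rng.gauss(0.0, 1.0) for _ in range(400)]
pos = [rng.gauss(1.0, 2.0) for _ in range(400)]

auc_raw = auc_mann_whitney(neg, pos)
# Strictly increasing transform: G is unchanged, so the AUC should match auc_raw.
auc_exp = auc_mann_whitney([math.exp(s) for s in neg], [math.exp(s) for s in pos])
# Non-monotone transform: G changes, so the AUC generally differs.
auc_sq = auc_mann_whitney([s * s for s in neg], [s * s for s in pos])
print(f"raw={auc_raw:.4f}  exp={auc_exp:.4f}  squared={auc_sq:.4f}")
```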

Figures

Figures reproduced from arXiv: 2504.02169 by Reza Sameni.

Figure 1. Illustration of score distributions f_n(x) and f_p(x) in a binary classification problem. Shaded regions show classifier decisions based on the threshold τ; observed score distribution f(x) (pool of both classes) not shown. See (4), (5), (6), and (7) for definitions of shaded areas.

Figure 2. Common classification evaluation measures

Figure 3. A typical ROC curve highlighting the composition function …

Figure 4. Illustration of different constrained scenarios and the corresponding design regions in the ROC …

Figure 5. ROC curves for Gaussian score classifiers under different parameter settings. (a) Varying …

Figure 6. Generic ROC family generated from the transformation …

Figure 7. PR curves for binomial positive and negative class scores under varying classifier parameters and …

Figure 8. PR curves for binomial positive and negative class scores under fixed values of …
Original abstract

We study the geometry of Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves in binary classification problems. The key finding is that many of the most commonly used binary classification metrics are merely functions of the composition function $G := F_p \circ F_n^{-1}$, where $F_p(\cdot)$ and $F_n(\cdot)$ are the class-conditional cumulative distribution functions of the classifier scores in the positive and negative classes, respectively. This geometric perspective facilitates the selection of operating points, understanding the effect of decision thresholds, and comparison between classifiers. It also helps explain how the shapes and geometry of ROC/PR curves reflect classifier behavior, providing objective tools for building classifiers optimized for specific applications with context-specific constraints. We further explore the conditions for classifier dominance, present analytical and numerical examples demonstrating the effects of class separability and variance on ROC and PR geometries, and derive a link between the positive-to-negative class leakage function $G(\cdot)$ and the Kullback-Leibler divergence. The framework highlights practical considerations, such as model calibration, cost-sensitive optimization, and operating point selection under real-world capacity constraints, enabling more informed approaches to classifier deployment and decision-making.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper studies the geometry of ROC and PR curves in binary classification. Its central claim is that many commonly used metrics are merely functions of the composition G := F_p ∘ F_n^{-1}, where F_p and F_n are the class-conditional CDFs of classifier scores in the positive and negative classes. The work examines how this view aids operating-point selection, threshold effects, classifier comparison and dominance conditions; it supplies analytical/numerical examples on separability and variance, derives a link between G and KL divergence, and discusses practical issues such as calibration, cost-sensitive learning, and capacity constraints.

Significance. If the geometric reduction is made precise (including explicit handling of prevalence), the framework could supply a compact language for comparing classifiers and selecting thresholds under application-specific constraints. The explicit examples on how separability and variance shape the curves, together with the KL link, would be useful strengths if they are derived without circularity.

major comments (1)
  1. [Abstract] Abstract (key finding paragraph): the statement that 'many of the most commonly used binary classification metrics are merely functions of the composition function G' does not hold for PR-derived quantities. ROC coordinates depend only on G (for the rule 'predict positive if score > τ' and t = F_n(τ), FPR = 1 - t and TPR = 1 - G(t), i.e. TPR = 1 - G(1 - FPR)), but precision = (TPR · π) / (TPR · π + FPR · (1-π)) is a joint function of G and the prevalence π. Any claim that PR geometry is 'merely' a function of G therefore requires either an explicit restriction to ROC or a clear statement that π is carried as an additional parameter; otherwise the central reduction is overstated for the PR case.
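The referee's point can be made concrete with a small sketch under an assumed equal-variance Gaussian toy model (N(0, 1) negatives, N(1.5, 1) positives, "predict positive if score > τ"): the ROC point at a fixed threshold is computed from G alone, while the precision at that same threshold changes with π. The model, the threshold and the function names are hypothetical.

```python
from statistics import NormalDist

# Hypothetical toy model (not from the paper): negatives ~ N(0, 1), positives ~ N(MU, 1),
# "predict positive if score > tau", operating point u = F_n(tau).
MU = 1.5
F_n, F_p = NormalDist(0.0, 1.0), NormalDist(MU, 1.0)


def G(u):
    """G(u) = F_p(F_n^{-1}(u))."""
    return F_p.cdf(F_n.inv_cdf(u))


def roc_point(u):
    """(FPR, TPR) at operating point u: a function of G alone."""
    return 1.0 - u, 1.0 - G(u)


def precision(u, prevalence):
    """pi * TPR / (pi * TPR + (1 - pi) * FPR): needs the prevalence pi as well as G."""
    fpr, tpr = roc_point(u)
    return prevalence * tpr / (prevalence * tpr + (1.0 - prevalence) * fpr)


u = F_n.cdf(1.0)  # the threshold tau = 1.0 expressed as an operating point
fpr, tpr = roc_point(u)
for pi in (0.5, 0.1, 0.01):
    print(f"pi={pi}: FPR={fpr:.3f}  TPR={tpr:.3f}  precision={precision(u, pi):.3f}")
```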
minor comments (1)
  1. [Abstract] The abstract announces a derivation linking G to Kullback-Leibler divergence but supplies no equation or sketch; the full manuscript should state the precise relation (e.g., which integral or expectation) so readers can verify it is not tautological.
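The abstract does not state the relation, so the following is only a hedged illustration of one standard identity of this kind, not necessarily the one the paper derives. When class densities f_p and f_n exist, G is differentiable with slope g(u) = G'(u) = f_p(x) / f_n(x) at x = F_n^{-1}(u), i.e. the likelihood ratio at the u-th negative-class quantile. Substituting u = F_n(x) in the definition of the divergence gives D_KL(f_n ‖ f_p) = -∫_0^1 ln g(u) du, and because ∫_0^1 g(u) du = G(1) - G(0) = 1, Jensen's inequality makes this non-negative. The sketch checks the identity numerically against the closed-form Gaussian KL divergence; the two Gaussians are arbitrary assumed examples.

```python
import math
from statistics import NormalDist


def kl_from_slope_of_G(neg: NormalDist, pos: NormalDist, steps: int = 20_000) -> float:
    """Midpoint-rule estimate of KL(f_n || f_p) = -integral_0^1 ln g(u) du,
    where g(u) = G'(u) = f_p(x) / f_n(x) at x = F_n^{-1}(u)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = neg.inv_cdf((k + 0.5) * h)  # negative-class quantile at the cell midpoint
        g = pos.pdf(x) / neg.pdf(x)     # slope of G at u = (k + 0.5) * h
        total -= math.log(g) * h
    return total


def kl_gauss(a: NormalDist, b: NormalDist) -> float:
    """Closed-form KL(a || b) for two univariate Gaussians."""
    return (math.log(b.stdev / a.stdev)
            + (a.variance + (a.mean - b.mean) ** 2) / (2.0 * b.variance) - 0.5)


neg, pos = NormalDist(0.0, 1.0), NormalDist(1.5, 1.2)
print(f"via G: {kl_from_slope_of_G(neg, pos):.5f}  closed form: {kl_gauss(neg, pos):.5f}")
```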

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and the precise observation regarding the abstract. We address the comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract (key finding paragraph): the statement that 'many of the most commonly used binary classification metrics are merely functions of the composition function G' does not hold for PR-derived quantities. ROC coordinates depend only on G (for the rule 'predict positive if score > τ' and t = F_n(τ), FPR = 1 - t and TPR = 1 - G(t), i.e. TPR = 1 - G(1 - FPR)), but precision = (TPR · π) / (TPR · π + FPR · (1-π)) is a joint function of G and the prevalence π. Any claim that PR geometry is 'merely' a function of G therefore requires either an explicit restriction to ROC or a clear statement that π is carried as an additional parameter; otherwise the central reduction is overstated for the PR case.

    Authors: We agree that the abstract phrasing is imprecise for the PR case. ROC coordinates (FPR, TPR) are indeed functions of G alone, whereas precision explicitly incorporates prevalence π. In the body of the manuscript we already treat π as an explicit parameter when deriving PR curves, but the abstract does not make this distinction clear. We will revise the abstract to state that ROC metrics depend only on G while PR metrics depend on G together with π (treated as a fixed problem parameter). This preserves the geometric framework while removing the overstatement. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation follows directly from CDF definitions

Full rationale

The paper constructs its geometric framework for ROC and PR curves from the standard definitions of class-conditional CDFs F_p and F_n and their composition G. All subsequent expressions for metrics follow by algebraic substitution using the usual probabilistic definitions of TPR, FPR, precision, etc. No parameters are fitted and then relabeled as predictions, no self-citations serve as load-bearing uniqueness theorems, and no ansatz is smuggled in. The central claim that many metrics are functions of G is a direct consequence of the input definitions rather than a tautology that redefines its own inputs.
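Written out, the substitution the audit describes takes only a few lines. The derivation below is a sketch that assumes continuous, strictly increasing class CDFs and the convention "predict positive when the score exceeds τ" (the abstract does not state the paper's convention); π denotes the positive-class prevalence. The last line makes the referee's point explicit: ROC quantities involve only G, while precision also involves π.

```latex
% Assumed convention: predict "positive" when the score exceeds \tau;
% \pi is the positive-class prevalence.
\begin{aligned}
\mathrm{FPR}(\tau) &= 1 - F_n(\tau), \qquad \mathrm{TPR}(\tau) = 1 - F_p(\tau), \\
u &:= F_n(\tau) \;\Longrightarrow\; \tau = F_n^{-1}(u), \quad \mathrm{FPR} = 1 - u, \\
\mathrm{TPR} &= 1 - F_p\bigl(F_n^{-1}(u)\bigr) = 1 - G(u) = 1 - G(1 - \mathrm{FPR}), \\
\mathrm{AUC} &= \int_0^1 \mathrm{TPR}\,\mathrm{d}\,\mathrm{FPR}
             = \int_0^1 \bigl(1 - G(u)\bigr)\,\mathrm{d}u
             = 1 - \int_0^1 G(u)\,\mathrm{d}u, \\
\mathrm{Precision} &= \frac{\pi\,\bigl(1 - G(u)\bigr)}
                          {\pi\,\bigl(1 - G(u)\bigr) + (1 - \pi)(1 - u)}.
\end{aligned}
```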

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities can be identified from the provided information.

pith-pipeline@v0.9.0 · 5742 in / 1059 out tokens · 48777 ms · 2026-05-22T21:21:01.943879+00:00


