arxiv: 2605.17749 · v1 · pith:TNHGTJCFnew · submitted 2026-05-18 · 💻 cs.LG · stat.ML

Testable and Actionable Calibration for Full Swap Regret

Pith reviewed 2026-05-20 12:17 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords calibration · swap regret · testable calibration · actionable calibration · machine learning · decision theory · soft binning · finite sample estimation

The pith

Soft-Binned Calibration Decision Loss (SCDL) bounds full swap regret exactly and can be estimated from finite samples at near-optimal rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks a calibration measure that is both actionable and testable. Actionable means the measure directly bounds, through full swap regret, the utility loss a decision maker incurs by treating predictions as true probabilities. Testable means calibration error can be measured accurately from a small sample of predictions and outcomes. Prior measures either relax actionability by bounding only a weaker form of regret or suffer from suboptimal sample complexity in estimation. The authors introduce Soft-Binned Calibration Decision Loss (SCDL) and prove that it achieves both goals simultaneously without weakening either, while also satisfying continuity and consistency. This combination matters because it lets users in high-stakes applications obtain reliable guarantees on both decision quality and measurement error from practical data sizes.

Core claim

SCDL is a calibration measure constructed via soft-binning that exactly upper-bounds the full swap regret incurred when predictions are used as probabilities, admits estimation from finite samples with error rate nearly matching the information-theoretic optimum, and satisfies continuity and consistency.
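
To make "full swap regret" concrete, the following minimal sketch computes the empirical swap regret of a decision maker who best-responds to predictions as if they were true probabilities. It assumes binary outcomes, a finite action set, and predictions that take finitely many distinct values. The function names and the toy utility table are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def expected_utility(utility, action, p):
    """Expected utility of `action` when the outcome is 1 with probability p."""
    return p * utility[action][1] + (1 - p) * utility[action][0]

def best_response(utility, p):
    """Action maximizing expected utility if p were the true probability."""
    return max(utility, key=lambda a: expected_utility(utility, a, p))

def swap_regret(predictions, outcomes, utility):
    """Empirical swap regret: average gain from the best remapping of
    prediction values to actions, relative to trusting the predictions."""
    groups = defaultdict(list)
    for p, y in zip(predictions, outcomes):
        groups[p].append(y)
    total = 0.0
    for p, ys in groups.items():
        # Realized utility of every action on the samples that received prediction p.
        realized = {a: sum(utility[a][y] for y in ys) for a in utility}
        total += max(realized.values()) - realized[best_response(utility, p)]
    return total / len(predictions)

# Toy task (hypothetical): "act" pays +1 if y = 1 and -1 if y = 0; "pass" pays 0.
utility = {"act": {1: 1.0, 0: -1.0}, "pass": {1: 0.0, 0: 0.0}}

# The predictor says 0.8, but the event happens only 1 time in 4.
preds = [0.8, 0.8, 0.8, 0.8]
outs = [1, 0, 0, 0]
print(swap_regret(preds, outs, utility))  # 0.5
```

Because a swap function may remap each prediction value independently, the maximum decomposes into one maximum per distinct prediction value, which is what the loop computes. A predictor that reported 0.25 on the same data would incur zero swap regret in this toy task.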

What carries the argument

The soft-binning construction, which softly discretizes prediction values to retain the exact full-swap-regret bound while supporting efficient finite-sample estimation.
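
This page does not reproduce the paper's soft-binning definition. As a generic illustration of soft discretization, the sketch below contrasts hard bin assignment with triangular ("hat") weights that spread each prediction over its two neighbouring bin centers. The kernel choice, the `num_bins` parameter, and the function names are assumptions made for illustration and should not be read as SCDL's actual construction.

```python
def hard_bin(p, num_bins):
    """Index of the single equal-width bin containing p in [0, 1]."""
    return min(int(p * num_bins), num_bins - 1)

def soft_bin_weights(p, num_bins):
    """Triangular weights of p over num_bins + 1 evenly spaced centers.

    The weights are nonnegative, sum to 1, and vary continuously with p.
    """
    width = 1.0 / num_bins
    centers = [k * width for k in range(num_bins + 1)]
    return [max(0.0, 1.0 - abs(p - c) / width) for c in centers]

# Two predictions on either side of the 0.5 bin edge.
for p in (0.49, 0.51):
    print(p, hard_bin(p, 2), [round(w, 2) for w in soft_bin_weights(p, 2)])
```

The hard bin index flips from 0 to 1 between the two predictions, while the soft weights shift only slightly. That is the qualitative behaviour a soft discretization is meant to provide.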

If this is right

  • Decision makers obtain an explicit upper bound on utility loss from treating predictions as probabilities.
  • Calibration error can be estimated reliably from small datasets without requiring impractically large samples.
  • The measure remains well-behaved under small perturbations of the predictions due to continuity.
  • SCDL approaches zero if and only if the predictions are perfectly calibrated in the limit.
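
The continuity property in the list above can be motivated by a known failure mode of hard-binned expected calibration error (ECE): a tiny shift across a bin edge can change the score sharply. The sketch below is a toy demonstration with two bins, not an experiment from the paper. `binned_ece` is a plain textbook-style implementation written for this example.

```python
def binned_ece(predictions, outcomes, num_bins):
    """Hard-binned ECE: sample-weighted |mean prediction - mean outcome| per bin."""
    sums = [[0.0, 0.0, 0] for _ in range(num_bins)]  # [sum of p, sum of y, count]
    for p, y in zip(predictions, outcomes):
        b = min(int(p * num_bins), num_bins - 1)
        sums[b][0] += p
        sums[b][1] += y
        sums[b][2] += 1
    n = len(predictions)
    # (count / n) * |mean p - mean y| simplifies to |sum p - sum y| / n.
    return sum(abs(sp - sy) / n for sp, sy, c in sums if c > 0)

outcomes = [1, 1, 0, 0]
straddling = [0.51, 0.51, 0.49, 0.49]  # predictions split across the 0.5 edge
merged = [0.50, 0.50, 0.50, 0.50]      # each prediction moved by only 0.01

print(binned_ece(straddling, outcomes, 2))
print(binned_ece(merged, outcomes, 2))
```

The score drops from about 0.49 to 0 although no prediction moved by more than 0.01. A continuous measure rules out jumps of this kind.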

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The construction may allow direct incorporation of calibration auditing into online regret-minimization procedures.
  • Practitioners could apply SCDL to audit deployed predictors in sequential decision systems where both regret and sample efficiency matter.
  • The same soft-binning idea might extend to multi-class or structured output settings while preserving the exact regret bound.

Load-bearing premise

The soft-binning step preserves the precise full-swap-regret bound without any hidden relaxation or extra assumptions on the underlying data distribution.

What would settle it

A concrete prediction-outcome distribution and sample size where either the SCDL value fails to upper-bound the realized full swap regret or the estimation error exceeds the claimed near-optimal rate by more than a constant factor.

Figures

Figures reproduced from arXiv: 2605.17749 by Huy L. Nguyen, Jonathan Ullman, Konstantina Bairaktari, Lunjia Hu.

Figure 1: Prediction curves for different values of
Figure 2: Plots of calibration error measures versus swap regret for predictor
Figure 3: Plots of calibration error measures versus swap regret for predictor 1

Original abstract

AI generated predictions increasingly inform decision making in critical tasks, and therefore must be trustworthy. One widely used measure of trustworthiness is calibration, which requires that the predictions match the true frequencies and can be treated like real probabilities of a given outcome. However, defining calibration is subtle, and designing good measures of calibration error has been an active topic of recent research. The first goal is to find calibration measures that are actionable, meaning they can inform decision makers about their utility loss when predictions are treated as true probabilities, which is known as swap regret. The second goal is to find calibration measures that are testable, meaning that calibration error can be measured from a small sample of predictions and outcomes. Although these are very basic requirements, there is no existing calibration measure that fully satisfies both properties, and all existing measures relax actionability by bounding a weaker notion of swap regret, or relax testability by having suboptimal estimation error. We introduce a new calibration measure, Soft-Binned Calibration Decision Loss (SCDL), which we prove is fully actionable without weakening either requirement, and testable with nearly optimal error rate. In addition, SCDL satisfies other desired properties such as continuity and consistency. We also provide a set of experiments confirming that the theoretical advantages of SCDL compared to other measures lead to better performance in practice.
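
The testability requirement concerns how well a calibration error can be measured from a finite sample. As a rough illustration, the sketch below draws data from a perfectly calibrated toy predictor, whose true calibration error is zero, and prints a naive plug-in estimate at several sample sizes. The estimator, the five prediction values, and the sample sizes are assumptions chosen for this example. This is not the SCDL estimator and not one of the paper's experiments.

```python
import random

def plugin_calibration_error(predictions, outcomes):
    """Plug-in estimate: weighted |v - mean outcome| over distinct prediction values v."""
    stats = {}
    for p, y in zip(predictions, outcomes):
        total, count = stats.get(p, (0, 0))
        stats[p] = (total + y, count + 1)
    n = len(predictions)
    # (count / n) * |v - total / count| simplifies to |v * count - total| / n.
    return sum(abs(v * c - t) / n for v, (t, c) in stats.items())

def sample_calibrated(n, rng):
    """Draw n (prediction, outcome) pairs from a perfectly calibrated predictor."""
    values = [0.1, 0.3, 0.5, 0.7, 0.9]
    preds = [rng.choice(values) for _ in range(n)]
    outs = [1 if rng.random() < p else 0 for p in preds]
    return preds, outs

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (50, 500, 5000, 50000):
    preds, outs = sample_calibrated(n, rng)
    print(n, round(plugin_calibration_error(preds, outs), 4))
```

Sampling noise makes the estimate positive even though the predictor is perfectly calibrated, and the gap shrinks as the sample grows. How fast such a gap can shrink for a given measure is what an estimation error rate quantifies.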

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Soft-Binned Calibration Decision Loss (SCDL) as a new calibration measure for predictions. It claims to prove that SCDL is fully actionable by providing an exact (non-relaxed) upper bound on full swap regret for any decision maker, and testable via a finite-sample estimator whose error rate is near-optimal (matching information-theoretic bounds up to lower-order terms) without hidden modeling assumptions on the outcome distribution. Additional properties shown include continuity and consistency, with experiments demonstrating practical advantages over prior measures.

Significance. If the central derivations hold, SCDL would be the first calibration measure to achieve both exact actionability for full swap regret and near-optimal testability simultaneously, addressing a documented gap where existing measures relax one requirement or the other. This could improve the reliability of AI-assisted decisions in high-stakes settings by directly linking calibration error to utility loss without parameter tuning or distributional assumptions.

major comments (2)
  1. [§4.2] (soft-binning construction and swap-regret inequality): The translation from soft-bin probabilities to the exact full-swap-regret bound is load-bearing for the actionability claim. The manuscript must explicitly show that the softness parameter can be fixed independently of unknown distribution properties (e.g., Lipschitz constants or density bounds) while preserving the non-relaxed inequality; otherwise the bound becomes approximate and one of the two central requirements is weakened.
  2. [§5.3] (finite-sample analysis of the estimator): The near-optimal error rate claim requires that the estimator incurs no extra bias term scaling worse than the stated rate. If the analysis relies on the softness parameter being chosen as a function of unknown quantities to keep the regret bound exact, this must be stated and shown not to degrade the rate beyond lower-order terms; the current sketch leaves open whether hidden relaxations are present.
minor comments (2)
  1. [§3] Notation for the softness parameter should be introduced with an explicit range and independence statement in the definition section to avoid reader confusion about data-dependent tuning.
  2. [Experiments] Figure 2 (experimental comparison) would benefit from error bars or confidence intervals on the reported performance differences to make the practical advantage clearer.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive review. We address each major comment below with clarifications from the manuscript proofs and indicate the revisions that will be incorporated.

Point-by-point responses
  1. Referee: [§4.2] (soft-binning construction and swap-regret inequality): The translation from soft-bin probabilities to the exact full-swap-regret bound is load-bearing for the actionability claim. The manuscript must explicitly show that the softness parameter can be fixed independently of unknown distribution properties (e.g., Lipschitz constants or density bounds) while preserving the non-relaxed inequality; otherwise the bound becomes approximate and one of the two central requirements is weakened.

    Authors: In the proof of the main actionability result (Theorem 4.1), the softness parameter is set to a fixed positive constant chosen independently of any unknown properties of the outcome distribution, such as Lipschitz constants or density bounds. The construction ensures the inequality relating SCDL to full swap regret remains exact (non-relaxed) because the soft bins are defined via a fixed smoothing that upper-bounds the decision loss without requiring distributional knowledge. We will revise §4.2 to include an explicit remark and a short lemma stating this independence and confirming that the non-relaxed bound holds for any fixed softness parameter in (0,1).

    Revision: yes

  2. Referee: [§5.3] (finite-sample analysis of the estimator): The near-optimal error rate claim requires that the estimator incurs no extra bias term scaling worse than the stated rate. If the analysis relies on the softness parameter being chosen as a function of unknown quantities to keep the regret bound exact, this must be stated and shown not to degrade the rate beyond lower-order terms; the current sketch leaves open whether hidden relaxations are present.

    Authors: The finite-sample analysis in §5.3 fixes the softness parameter independently of unknown quantities (as clarified in the response to the first comment) and shows that any additive bias introduced by soft-binning is absorbed into lower-order terms that do not affect the leading near-optimal rate. The concentration inequalities are derived under no modeling assumptions on the outcome distribution, matching the information-theoretic lower bound up to lower-order factors. We will expand the proof sketch in §5.3 with an explicit bias calculation and a remark confirming the absence of hidden relaxations or distribution-dependent parameter choices.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: SCDL defined independently then proven to satisfy external properties

full rationale

The derivation introduces SCDL via soft-binning on the decision loss, then separately establishes the exact full-swap-regret bound and the finite-sample estimation rate. Neither property is obtained by re-labeling a fitted quantity or by a self-citation chain that substitutes for a proof. The central claims rest on explicit inequalities and concentration arguments that are not tautological with the definition itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard definitions of calibration and swap regret from prior literature; no new free parameters, invented entities, or non-standard axioms are mentioned in the abstract.

axioms (1)
  • [standard math] Standard definitions and properties of calibration error and swap regret from prior literature.
    The paper builds directly on existing notions of calibration and regret without re-deriving them.

pith-pipeline@v0.9.0 · 5768 in / 1145 out tokens · 47133 ms · 2026-05-20T12:17:16.875824+00:00 · methodology
