pith. sign in

arxiv: 2605.09273 · v2 · pith:JOIDNAFSnew · submitted 2026-05-10 · 💻 cs.LG

Instance-Adaptive Online Multicalibration

Pith reviewed 2026-05-22 09:37 UTC · model grok-4.3

classification 💻 cs.LG
keywords online multicalibrationadaptive algorithmsdyadic gridthreshold complexityonline learningcalibrationpiecewise stationary processes
0
0 comments X

The pith

A single efficient algorithm for online multicalibration adapts its error to the complexity of the predictable mean process via dyadic grid refinement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces one algorithm that maintains online multicalibration by dynamically refining a dyadic grid over possible prediction values. Its total error is bounded by a function of the number of leaves that appear in the final refinement tree. This bound automatically recovers the known worst-case rate of order T to the two-thirds power, yet improves to square-root rates when the underlying means are stationary or change only a few times. The rate further depends on a threshold-complexity measure of the mean process relative to the given collection of groups. A reader would care because the same procedure works without prior knowledge of whether the data will be hard or easy.

Core claim

There exists a single efficient algorithm whose multicalibration error is controlled by the number of leaves in an adaptively refined dyadic grid, recovering the Õ(T^{2/3}) worst-case rate while automatically achieving Õ(√T) in the marginal stochastic setting and Õ(√(JT)) for piecewise-stationary means with J segments; the dependence on a threshold-complexity measure of the predictable mean process is tight up to logarithmic factors.

What carries the argument

Adaptively refined dyadic grid of prediction values, whose leaf count directly controls the multicalibration error bound.

If this is right

  • The algorithm matches the known optimal worst-case rate of Õ(T^{2/3}) without any special tuning.
  • In fully stochastic settings the same procedure automatically improves to Õ(√T).
  • When the mean process changes only J times the error scales as Õ(√(JT)).
  • The dependence on threshold complexity is information-theoretically tight up to logarithmic factors.
  • The algorithm remains efficient even while producing these instance-adaptive guarantees.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same adaptive-refinement idea could be applied to other online calibration or regret problems where instance difficulty varies.
  • One could test the method on real data streams that mix stationary periods with occasional shifts to check whether observed error tracks leaf count.
  • The threshold-complexity measure might serve as a new way to quantify predictability in sequential decision problems.
  • Extensions to continuous outcome spaces would require replacing the dyadic grid with a different adaptive partition.

Load-bearing premise

The multicalibration error incurred by the algorithm is bounded by a function of the number of leaves that appear in the refinement tree.

What would settle it

A concrete counter-example sequence whose predictable means have low threshold complexity yet produce multicalibration error strictly larger than the claimed bound on the number of leaves.

Figures

Figures reproduced from arXiv: 2605.09273 by Aaron Roth, Claire Jie Zhang, Jamie Morgenstern, Zhiming Huang.

Figure 1
Figure 1. Figure 1: A typical state of the dynamic-bin data structure. The learner predicts using the midpoints of the current leaves, and an interval is refined only after the total mass assigned to it reaches the threshold L/w2 I . Why this split threshold? The threshold N(I) ≥ L/w2 I is the scale at which further refinement becomes worthwhile. An interval of width wI contributes deterministic discretization error on the or… view at source ↗
read the original abstract

We study online multicalibration beyond the worst-case. We give a single, efficient algorithm which dynamically interpolates between benign and worst-case sequences by adaptively refining a dyadic grid of prediction values. Its error is controlled by the number of leaves in the refinement tree. Our analysis recovers the known $\widetilde O(T^{2/3})$ worst-case-optimal rate for online multicalibration, while simultaneously automatically adapting to easier instances: in the marginal stochastic setting it obtains a rate of $\widetilde O(\sqrt T)$, and for piecewise-stationary means with $J$ segments its rate is $\widetilde O(\sqrt{JT})$. More generally, the rate depends on a threshold-complexity measure of the predictable mean process relative to the group family. We show that this dependence is tight up to logarithmic factors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims to provide a single efficient algorithm for online multicalibration that uses adaptive dyadic grid refinement to control the error by the number of leaves in the refinement tree. This allows it to recover the Õ(T^{2/3}) worst-case rate while achieving Õ(√T) in marginal stochastic settings and Õ(√(JT)) for piecewise-stationary means with J segments. The rate depends on a threshold-complexity measure of the predictable mean process relative to the group family, and this dependence is shown to be tight up to logarithmic factors.

Significance. If the results hold, this work is significant for introducing a unified adaptive algorithm that automatically interpolates between worst-case and easier regimes in online multicalibration via dyadic refinement and a threshold-complexity measure of the mean process. The matching lower bounds up to logs and the explicit interpolation between Õ(T^{2/3}), Õ(√T), and Õ(√(JT)) regimes strengthen the contribution to instance-adaptive online learning.

minor comments (2)
  1. The abstract states the rates and tightness but provides no derivation steps, error-bar discussion, or explicit handling of post-hoc choices; a one-sentence outline of the proof strategy in the introduction would improve accessibility without altering the technical content.
  2. The threshold-complexity measure is presented as an external property of the mean process; an early informal definition or example in Section 1 would help readers connect it to the group family before the formal analysis.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our work and for recommending minor revision. The referee's summary correctly captures the main contributions of the paper, including the single efficient algorithm that uses adaptive dyadic grid refinement to achieve instance-adaptive rates for online multicalibration.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The paper defines an adaptive dyadic refinement algorithm whose multicalibration error is bounded by a function of the number of leaves in the refinement tree. This leaf count is then shown to be controlled by an independently defined threshold-complexity measure of the predictable mean process relative to the given group family. The complexity measure is an external property of the input sequence, not constructed from the algorithm's outputs or fitted parameters. The resulting rates (worst-case Õ(T^{2/3}), stochastic Õ(√T), piecewise-stationary Õ(√(JT))) follow directly from this instance-dependent bound without reducing to self-definition or self-citation chains. Matching lower bounds are provided separately, confirming the analysis does not rely on circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The central claim appears to rest on an implicit modeling assumption that the multicalibration error is governed by leaf count in a dyadic refinement tree and on the existence of a threshold-complexity measure for the mean process.

pith-pipeline@v0.9.0 · 5661 in / 1330 out tokens · 26311 ms · 2026-05-22T09:37:52.359558+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Adaptive Calibration in Non-Stationary Environments

    cs.LG 2026-05 unverdicted novelty 7.0

    Online algorithms achieve adaptive calibration bounds of Õ(√T + (TC)^{1/3}) for ℓ1 error and Õ((1+C)^{1/3}) for ℓ2 and pseudo-KL error, matching stationary and adversarial extremes via epoch scheduling and non-uniform...

  2. Adaptive Calibration in Non-Stationary Environments

    cs.LG 2026-05 unverdicted novelty 7.0

    Algorithms achieve adaptive calibration bounds of order min{sqrt(T) + (T C)^{1/3}, sqrt(K T)} for l1 error and min{(1+C)^{1/3}, K} for l2 and pseudo-KL error, where K and C are unknown non-stationarity measures.

Reference graph

Works this paper leans on

132 extracted references · 132 canonical work pages · cited by 1 Pith paper · 6 internal anchors

  1. [1]

    Electronic Communications in Probability , publisher =

    Joel Tropp , title =. Electronic Communications in Probability , publisher =

  2. [2]

    Achieving all with no parameters: Adanormalhedge , author=. Proc. Conference on Learning Theory (COLT) , pages=. 2015 , organization=

  3. [3]

    Sample Complexity of Uniform Convergence for Multicalibration , author =. Proc. Advances in Neural Information Processing Systems (NeurIPS) , volume =

  4. [4]

    Distribution-free calibration guarantees for histogram binning without sample splitting , author=. Proc. International conference on machine learning (ICML) , pages=. 2021 , organization=

  5. [5]

    Games and Economic Behavior , volume=

    Calibrated learning and correlated equilibrium , author=. Games and Economic Behavior , volume=. 1997 , publisher=

  6. [6]

    Operations Research & Management Science in the Age of Analytics , pages=

    Wasserstein distributionally robust optimization: Theory and applications in machine learning , author=. Operations Research & Management Science in the Age of Analytics , pages=. 2019 , publisher=

  7. [7]

    Calibrating predictions to decisions: A novel approach to multi-class calibration , author=. Proc. Advances in Neural Information Processing Systems (NeurIPS) , volume=

  8. [8]

    Lunjia Hu and Yifan Wu , title =. 65th

  9. [9]

    arXiv preprint arXiv:2501.17205 , year=

    Near-optimal algorithms for omniprediction , author=. arXiv preprint arXiv:2501.17205 , year=

  10. [10]

    American Economic Review , volume=

    Robustness and linear contracts , author=. American Economic Review , volume=. 2015 , publisher=

  11. [11]

    Forecasting for Swap Regret for All Downstream Agents , booktitle =

    Aaron Roth and Mirah Shi , editor =. Forecasting for Swap Regret for All Downstream Agents , booktitle =

  12. [12]

    Bobby Kleinberg and Renato Paes Leme and Jon Schneider and Yifeng Teng , title =. Proc. Annual Conference on Learning Theory (COLT) , series =

  13. [13]

    14th Innovations in Theoretical Computer Science Conference, ITCS 2023 , pages=

    Decision-Making Under Miscalibration , author=. 14th Innovations in Theoretical Computer Science Conference, ITCS 2023 , pages=. 2023 , organization=

  14. [14]

    Machine learning: ECML 2002: 13th European conference on machine learning Helsinki, Finland, August 19--23, 2002 proceedings 13 , pages=

    Inductive confidence machines for regression , author=. Machine learning: ECML 2002: 13th European conference on machine learning Helsinki, Finland, August 19--23, 2002 proceedings 13 , pages=. 2002 , organization=

  15. [15]

    The Annals of Mathematical Statistics , volume=

    Determination of sample sizes for setting tolerance limits , author=. The Annals of Mathematical Statistics , volume=. 1941 , publisher=

  16. [16]

    Non-Parametric Estimation. I. Validation of Order Statistics , author =. Annals of Mathematical Statistics , volume =. 1945 , month =. doi:10.1214/aoms/1177731119 , url =

  17. [17]

    Algorithmic Learning in a Random World , journal =

    Vovk, Vladimir and Gammerman, Alex and Shafer, Glenn , year =. Algorithmic Learning in a Random World , journal =

  18. [18]

    and Gammerman, A

    Saunders, C. and Gammerman, A. and Vovk, V. , title =. Proceedings of the 16th International Joint Conference on Artificial Intelligence - Volume 2 , pages =. 1999 , publisher =

  19. [19]

    Transduction with confidence and credibility , author=

  20. [20]

    Machine-learning applications of algorithmic randomness , author=

  21. [21]

    2005 , publisher=

    Algorithmic learning in a random world , author=. 2005 , publisher=

  22. [22]

    2024 , eprint=

    Length Optimization in Conformal Prediction , author=. 2024 , eprint=

  23. [23]

    2022 , eprint=

    Learning Optimal Conformal Classifiers , author=. 2022 , eprint=

  24. [24]

    and Ramdas, Aaditya , year=

    Gupta, Chirag and Kuchibhotla, Arun K. and Ramdas, Aaditya , year=. Nested conformal prediction and quantile out-of-bag ensemble methods , volume=. doi:10.1016/j.patcog.2021.108496 , journal=

  25. [25]

    2025 , eprint=

    Conformal Risk Minimization with Variance Reduction , author=. 2025 , eprint=

  26. [26]

    2017 , eprint=

    Distribution-Free Predictive Inference For Regression , author=. 2017 , eprint=

  27. [27]

    Journal of mathematical economics , volume=

    Maxmin expected utility with non-unique prior , author=. Journal of mathematical economics , volume=. 1989 , publisher=

  28. [28]

    arXiv preprint arXiv:2502.17830 , year=

    Certified Decisions , author=. arXiv preprint arXiv:2502.17830 , year=

  29. [29]

    Mathematical programming , volume=

    Robust optimization--methodology and applications , author=. Mathematical programming , volume=. 2002 , publisher=

  30. [30]

    Softmax probabilities (mostly) predict large language model correctness on multiple-choice q&a

    Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A , author=. arXiv preprint arXiv:2402.13213 , year=

  31. [31]

    The Thirty Seventh Annual Conference on Learning Theory , pages=

    Omnipredictors for regression and the approximate rank of convex functions , author=. The Thirty Seventh Annual Conference on Learning Theory , pages=. 2024 , organization=

  32. [32]

    Sample Efficient Omniprediction and Downstream Swap Regret for Non-Linear Losses , booktitle =

    Jiuyao Lu and Aaron Roth and Mirah Shi , editor =. Sample Efficient Omniprediction and Downstream Swap Regret for Non-Linear Losses , booktitle =. 2025 , url =

  33. [33]

    International conference on machine learning , pages=

    On calibration of modern neural networks , author=. International conference on machine learning , pages=. 2017 , organization=

  34. [34]

    International Conference on Learning Representations , year=

    Top-label calibration and multiclass-to-binary reductions , author=. International Conference on Learning Representations , year=

  35. [35]

    Advances in neural information processing systems , volume=

    Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with dirichlet calibration , author=. Advances in neural information processing systems , volume=

  36. [36]

    The Thirty Seventh Annual Conference on Learning Theory , pages=

    On computationally efficient multi-class calibration , author=. The Thirty Seventh Annual Conference on Learning Theory , pages=. 2024 , organization=

  37. [37]

    Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing , pages=

    Outcome indistinguishability , author=. Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing , pages=

  38. [38]

    Multicalibration: Calibration for the (computationally-identifiable) masses , author=. Proc. International Conference on Machine Learning (ICML) , pages=. 2018 , organization=

  39. [39]

    The Annals of Statistics , volume=

    Learning models with uniform performance via distributionally robust optimization , author=. The Annals of Statistics , volume=. 2021 , publisher=

  40. [40]

    American Economic Review , volume=

    Robust control and model uncertainty , author=. American Economic Review , volume=. 2001 , publisher=

  41. [41]

    Breakthroughs in Statistics: Foundations and Basic Theory , pages=

    Statistical decision functions , author=. Breakthroughs in Statistics: Foundations and Basic Theory , pages=. 1950 , publisher=

  42. [42]

    2019 , eprint=

    Conformalized Quantile Regression , author=. 2019 , eprint=

  43. [43]

    2020 , eprint=

    Classification with Valid and Adaptive Coverage , author=. 2020 , eprint=

  44. [44]

    2022 , eprint=

    Uncertainty Sets for Image Classifiers using Conformal Prediction , author=. 2022 , eprint=

  45. [45]

    2025 , eprint=

    Conformal Risk Control , author=. 2025 , eprint=

  46. [46]

    2023 , eprint=

    Safe Planning in Dynamic Environments using Conformal Prediction , author=. 2023 , eprint=

  47. [47]

    2024 , eprint=

    Conformal Decision Theory: Safe Autonomous Decisions from Imperfect Predictions , author=. 2024 , eprint=

  48. [48]

    Lecture Notes , volume=

    Uncertain: Modern topics in uncertainty estimation , author=. Lecture Notes , volume=

  49. [49]

    2025 , eprint=

    Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents , author=. 2025 , eprint=

  50. [50]

    2024 , eprint=

    Calibrated Selective Classification , author=. 2024 , eprint=

  51. [51]

    2018 , eprint=

    Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer , author=. 2018 , eprint=

  52. [52]

    2021 , eprint=

    Consistent Estimators for Learning to Defer to an Expert , author=. 2021 , eprint=

  53. [53]

    Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems , articleno =

    Bansal, Gagan and Wu, Tongshuang and Zhou, Joyce and Fok, Raymond and Nushi, Besmira and Kamar, Ece and Ribeiro, Marco Tulio and Weld, Daniel , title =. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems , articleno =. 2021 , isbn =. doi:10.1145/3411764.3445717 , abstract =

  54. [54]

    Towards Human-AI Complementarity with Prediction Sets , url =

    De Toni, Giovanni and Okati, Nastaran and Thejaswi, Suhas and Straitouri, Eleni and Gomez-Rodriguez, Manuel , booktitle =. Towards Human-AI Complementarity with Prediction Sets , url =

  55. [55]

    Proceedings of the 40th International Conference on Machine Learning , pages =

    Improving Expert Predictions with Conformal Prediction , author =. Proceedings of the 40th International Conference on Machine Learning , pages =. 2023 , editor =

  56. [56]

    Controlling Counterfactual Harm in Decision Support Systems Based on Prediction Sets , url =

    Straitouri, Eleni and Thejaswi, Suhas and Rodriguez, Manuel Gomez , booktitle =. Controlling Counterfactual Harm in Decision Support Systems Based on Prediction Sets , url =

  57. [57]

    2021 , eprint=

    Adaptive Conformal Inference Under Distribution Shift , author=. 2021 , eprint=

  58. [58]

    2023 , eprint=

    Conformal PID Control for Time Series Prediction , author=. 2023 , eprint=

  59. [59]

    Journal of Econometrics , volume=

    Identification Problems and Decisions under Ambiguity , author=. Journal of Econometrics , volume=

  60. [60]

    Econometrica , volume=

    Statistical Treatment Rules for Heterogeneous Populations , author=. Econometrica , volume=

  61. [61]

    Econometrica , volume=

    Admissible Treatment Rules for a Risk-Averse Planner , author=. Econometrica , volume=

  62. [62]

    Annual Review of Economics , volume=

    Choosing Treatment Policies under Ambiguity , author=. Annual Review of Economics , volume=

  63. [63]

    Progress in Artificial Intelligence , volume=

    Event labeling combining ensemble detectors and background knowledge , author=. Progress in Artificial Intelligence , volume=. 2014 , publisher=

  64. [64]

    Statistics & Probability Letters , volume=

    Sparse spatial autoregressions , author=. Statistics & Probability Letters , volume=. 1997 , publisher=

  65. [65]

    The Annals of Probability , volume=

    Distribution function inequalities for martingales , author=. The Annals of Probability , volume=. 1973 , publisher=

  66. [66]

    High-Dimensional Prediction for Sequential Decision Making , author=. Proc. International Conference on Machine Learning (ICML) , year=

  67. [67]

    Supersimulators

    Supersimulators , author=. arXiv preprint arXiv:2509.17994 , year=

  68. [68]

    Breaking the

    Dagan, Yuval and Daskalakis, Constantinos and Fishelson, Maxwell and Golowich, Noah and Kleinberg, Robert and Okoroafor, Princewill , booktitle=. Breaking the

  69. [69]

    Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society , pages=

    Multiaccuracy: Black-box post-processing for fairness in classification , author=. Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society , pages=

  70. [70]

    Advances in Neural Information Processing Systems , volume=

    Truthfulness of calibration measures , author=. Advances in Neural Information Processing Systems , volume=

  71. [71]

    The Thirty Eighth Annual Conference on Learning Theory , pages=

    Truthfulness of Decision-Theoretic Calibration Measures , author=. The Thirty Eighth Annual Conference on Learning Theory , pages=. 2025 , organization=

  72. [72]

    A Perfectly Truthful Calibration Measure

    A Perfectly Truthful Calibration Measure , author=. arXiv preprint arXiv:2508.13100 , year=

  73. [73]

    2019 , publisher=

    Probability: Theory and Examples , author=. 2019 , publisher=

  74. [74]

    2013 , howpublished=

    Brownian Motion and Stochastic Calculus , author=. 2013 , howpublished=

  75. [75]

    Conference on Learning Theory , pages=

    Low-degree multicalibration , author=. Conference on Learning Theory , pages=. 2022 , organization=

  76. [76]

    2016 , eprint =

    Denisov, Denis and Sakhanenko, Alexander and Wachtel, Vitali , title =. 2016 , eprint =

  77. [77]

    Electronic Journal of Probability , volume=

    The First Hitting Time of a Single Point for Random Walks , author=. Electronic Journal of Probability , volume=. 2011 , publisher=

  78. [78]

    Stronger calibration lower bounds via sidestepping , author=. Proc. Annual ACM Symposium on Theory of Computing (STOC) , pages=

  79. [79]

    Innovations in Theoretical Computer Science Conference (ITCS) , volume=

    Advancing Subgroup Fairness via Sleeping Experts , author=. Innovations in Theoretical Computer Science Conference (ITCS) , volume=

  80. [80]

    The Twelfth International Conference on Learning Representations , year=

    Oracle Efficient Algorithms for Groupwise Regret , author=. The Twelfth International Conference on Learning Representations , year=

Showing first 80 references.