Obtaining well calibrated probabilities using Bayesian binning

Proceedings of the AAAI Conference on Artificial Intelligence

5 Pith papers cite this work. Polarity classification of these citations is still being indexed.

citation-role summary: background (1)

citation-polarity summary: background (1)

representative citing papers

The Minimax Rate of Second-Order Calibration

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

The minimax rate of estimating second-order calibration error is Õ(1/√n) with a matching Ω(1/√n) lower bound, enabled by analyticity from the sech kernel and yielding the first finite-sample guarantee for second-order Platt scaling.

Risk-Controlled Post-Processing of Decision Policies

stat.ML · 2026-05-07 · unverdicted · novelty 7.0

Risk-controlled post-processing yields a threshold-structured policy that follows the baseline except where an oracle fallback sharply reduces conditional violation risk, achieving O(log n/n) expected excess risk in i.i.d. settings and exact risk control under exchangeability.

Reading Calibrated Uncertainty from Language Model Trajectories

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Geometric features from per-layer MLP update trajectories fed to a sparse linear probe outperform maximum softmax probability for uncertainty quantification under selective abstention, with gains up to 21 AURC points.

When Evidence Conflicts: Uncertainty and Order Effects in Retrieval-Augmented Biomedical Question Answering

cs.CL · 2026-05-13 · conditional · novelty 6.0

Conflicting biomedical evidence triggers order-dependent prediction flips in RAG LLMs, and a new abstention score combining confidence with conflict detection raises selective accuracy by 7-33 points in the hardest conditions.

Calibrating Model-Based Evaluation Metrics for Summarization

cs.CL · 2026-04-19 · unverdicted · novelty 5.0

A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.

citing papers explorer


The Minimax Rate of Second-Order Calibration · cs.LG · 2026-05-08 · unverdicted · none · ref 18
Risk-Controlled Post-Processing of Decision Policies · stat.ML · 2026-05-07 · unverdicted · none · ref 237
Reading Calibrated Uncertainty from Language Model Trajectories · cs.LG · 2026-05-19 · unverdicted · none · ref 10
When Evidence Conflicts: Uncertainty and Order Effects in Retrieval-Augmented Biomedical Question Answering · cs.CL · 2026-05-13 · conditional · none · ref 7
Calibrating Model-Based Evaluation Metrics for Summarization · cs.CL · 2026-04-19 · unverdicted · none · ref 144