Uncertainty Quantification in Detection Transformers: Object-Level Calibration and Image-Level Reliability

Carson Sobolewski; Navid Azizan; Young-Jin Park

arXiv: 2412.01782 · v4 · submitted 2024-12-02 · cs.CV · cs.AI

Young-Jin Park, Carson Sobolewski, Navid Azizan

Pith reviewed 2026-05-23 07:45 UTC · model grok-4.3

keywords: object detection · DETR · calibration error · uncertainty quantification · Hungarian matching · transformer · post-processing

The pith

DETRs train one prediction per object to be well-calibrated while forcing the others to suppress their confidence scores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

DETR object detectors output hundreds of predictions per image, far more than the number of objects. The paper establishes that the Hungarian matching loss creates a specialist division of labor: one prediction per object learns to produce accurate confidence, while the remaining predictions are pushed to output near-zero foreground probability. This division is optimal for the loss but leaves the calibrated predictions unidentifiable at inference time. As a result, any post-processing selection risks mixing predictions of different calibration quality, and standard metrics cannot evaluate the combined system. The authors introduce an object-centric calibration error to address this gap and propose a framework for image-level reliability prediction.

Core claim

DETRs employ an optimal specialist strategy: one prediction per object is trained to be well-calibrated, while the remaining predictions are trained to suppress their foreground confidence to near zero, even when maintaining accurate localization. This strategy emerges as the loss-minimizing solution to the Hungarian matching, fundamentally shaping DETRs' outputs. While selecting the well-calibrated predictions is ideal, they are unidentifiable at inference time. This means that any post-processing algorithm poses a risk of outputting a set of predictions with mixed calibration levels.
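To make the matching step concrete, here is a minimal sketch of one-to-one assignment between predictions and ground-truth objects. It is an illustration under stated assumptions, not the paper's or DETR's implementation: the cost used here (one minus the foreground confidence, plus an L1 box distance) is a simplified stand-in for the actual matching cost, the brute-force search replaces the Hungarian algorithm and only suits tiny inputs, and the names `match_cost` and `brute_force_match` are hypothetical.

```python
from itertools import permutations

def match_cost(pred, obj):
    """Simplified matching cost (an assumption): low when the prediction is
    confident and its box is close to the ground-truth box."""
    l1 = sum(abs(p - g) for p, g in zip(pred["box"], obj["box"]))
    return (1.0 - pred["conf"]) + l1

def brute_force_match(preds, objs):
    """Return {object index: prediction index} minimizing the total cost.

    Exhaustive search over one-to-one assignments; the Hungarian algorithm
    reaches the same optimum in polynomial time.
    """
    best_cost, best_assign = float("inf"), None
    for perm in permutations(range(len(preds)), len(objs)):
        cost = sum(match_cost(preds[p], objs[o]) for o, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assign = cost, perm
    return {o: p for o, p in enumerate(best_assign)}

# Two ground-truth objects, four predictions (boxes as cx, cy, w, h).
# All numbers are made up for illustration.
objs = [{"box": (0.30, 0.30, 0.20, 0.20)}, {"box": (0.70, 0.60, 0.30, 0.40)}]
preds = [
    {"conf": 0.95, "box": (0.31, 0.29, 0.20, 0.21)},  # near object 0
    {"conf": 0.10, "box": (0.30, 0.31, 0.19, 0.20)},  # also near object 0
    {"conf": 0.90, "box": (0.69, 0.61, 0.30, 0.39)},  # near object 1
    {"conf": 0.05, "box": (0.10, 0.90, 0.05, 0.05)},  # background
]

matched = brute_force_match(preds, objs)
unmatched = sorted(set(range(len(preds))) - set(matched.values()))
print(matched, unmatched)
```

Under this toy cost, predictions 0 and 2 are matched and would receive the foreground target. Predictions 1 and 3 are left unmatched, and those are the ones the training loss pushes toward the no-object class, even though prediction 1 localizes object 0 well.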

What carries the argument

The specialist strategy that arises as the loss-minimizing solution to the Hungarian matching in DETR training.

Load-bearing premise

The well-calibrated specialist predictions remain unidentifiable from the model's output at inference time.

What would settle it

Finding a post-processing algorithm that can reliably isolate only the well-calibrated predictions for every image would demonstrate that the unidentifiability assumption does not hold in practice.
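The two most common post-processing rules, confidence thresholding and top-k selection, can be sketched as follows. This is a hypothetical illustration: the scores are made up, the function names are not from the paper, and the `is_specialist` flags stand in for training-time matching information that is not available at inference.

```python
def select_by_threshold(scores, tau):
    """Keep indices whose confidence is at least tau."""
    return [i for i, s in enumerate(scores) if s >= tau]

def select_top_k(scores, k):
    """Keep the indices of the k highest-confidence predictions."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Made-up confidences for six predictions on one image.
scores = [0.92, 0.35, 0.40, 0.03, 0.88, 0.02]
# Hypothetical oracle labels: which predictions are the matched specialists.
is_specialist = [True, True, False, False, True, False]

kept_thr = select_by_threshold(scores, tau=0.5)
kept_low = select_by_threshold(scores, tau=0.3)
kept_topk = select_top_k(scores, k=3)

# Specialists dropped by the stricter threshold.
missed = [i for i, s in enumerate(is_specialist) if s and i not in kept_thr]
# Suppressed predictions let through by the looser threshold.
mixed_in = [i for i in kept_low if not is_specialist[i]]
print(kept_thr, kept_low, kept_topk, missed, mixed_in)
```

With these made-up numbers, a threshold of 0.5 drops specialist 1, lowering it to 0.3 lets suppressed prediction 2 through, and top-3 keeps prediction 2 in place of specialist 1. Whether such overlap occurs in a real model is an empirical question; the example only shows what a mixed-calibration output would look like.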

Figures

Figures reproduced from arXiv: 2412.01782 by Carson Sobolewski, Navid Azizan, Young-Jin Park.

**Figure 1.** DETR generates hundreds of predictions for each image, resulting in multiple predictions per object, with at least one (i.e., … [PITH_FULL_IMAGE:figures/full_fig_p002_1.png]

**Figure 2.** A diagram of the DETR architecture. An input image is first processed … [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]

**Figure 3.** Visualizations of the predictions generated by Cal-DETR. The optimal positive prediction (indexed by 0 and … [PITH_FULL_IMAGE:figures/full_fig_p005_3.png]

**Figure 4.** A visualization of the difference in calibration between positive … [PITH_FULL_IMAGE:figures/full_fig_p006_4.png]

**Figure 5.** Impact of confidence threshold selection on various performance and … [PITH_FULL_IMAGE:figures/full_fig_p006_5.png]

**Figure 6.** Impact of the scaling factor (λ) on image-level UQ performance of ContrastiveConf (OCE). The Pearson correlation coefficient (PCC) using various scaling factors is reported. The optimal scaling factor lies within the range of 5.0 to 10.0, and this range generalizes well across out-of-distribution datasets. Furthermore, it shows the efficacy of ContrastiveConf over Conf+ (i.e., ContrastiveConf with λ = 0.0). …

**Figure 7.** Impact of parameter selection on OCE (y-axis inverted) and the Pearson correlation coefficient (PCC) between … [PITH_FULL_IMAGE:figures/full_fig_p010_7.png]

**Figure 8.** Exemplary visualization demonstrating the impact of parameter selection on the final subset of predictions in Cal-DETR for different post-processing … [PITH_FULL_IMAGE:figures/full_fig_p012_8.png]

**Figure 9.** A visualization of the difference in calibration between positive and negative predictions on the … [PITH_FULL_IMAGE:figures/full_fig_p014_9.png]

**Figure 10.** Impact of confidence threshold selection on various performance metrics in UP-DETR, Deformable-DETR, Cal-DETR, and DINO on … [PITH_FULL_IMAGE:figures/full_fig_p015_10.png]

**Figure 11.** Impact of parameter selection on OCE (y-axis inverted) and the Pearson correlation coefficient (PCC) between … [PITH_FULL_IMAGE:figures/full_fig_p016_11.png]
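Several captions above report a Pearson correlation coefficient (PCC). As a reminder of what that statistic computes, a minimal sketch follows. Pairing a per-image confidence score with a per-image accuracy value is an assumption based on the abstract's description of the image-level framework, and the numbers are made up.

```python
from math import sqrt

def pearson_corr(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-image reliability scores and measured per-image accuracy.
image_scores = [0.9, 0.7, 0.4, 0.2]
image_accuracy = [0.8, 0.75, 0.5, 0.1]
pcc = pearson_corr(image_scores, image_accuracy)
print(round(pcc, 3))
```

A PCC close to 1 would mean the score ranks and scales images in line with their measured accuracy; a value near 0 would mean the score carries little linear information about accuracy.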

Original abstract

DETR and its variants have emerged as promising architectures for object detection, offering an end-to-end prediction pipeline. In practice, however, DETRs generate hundreds of predictions that far outnumber the actual objects present in an image. This raises a critical question: which of these predictions could be trusted? This is particularly important for safety-critical applications, such as in autonomous vehicles. Addressing this concern, we provide empirical and theoretical evidence that predictions within the same image play distinct roles, resulting in varying reliability levels. Our analysis reveals that DETRs employ an optimal specialist strategy: one prediction per object is trained to be well-calibrated, while the remaining predictions are trained to suppress their foreground confidence to near zero, even when maintaining accurate localization. We show that this strategy emerges as the loss-minimizing solution to the Hungarian matching, fundamentally shaping DETRs' outputs. While selecting the well-calibrated predictions is ideal, they are unidentifiable at inference time. This means that any post-processing algorithm poses a risk of outputting a set of predictions with mixed calibration levels. Therefore, practical deployment necessitates a joint evaluation of both the model's calibration quality and the effectiveness of the post-processing algorithm. However, we demonstrate that existing metrics like average precision and expected calibration error are inadequate for this task. To address this issue, we further introduce Object-level Calibration Error (OCE). This object-centric design penalizes both retaining suppressed predictions and missed ground truth foreground objects, making OCE suitable for both evaluating models and identifying reliable prediction subsets. Finally, we present a post hoc uncertainty quantification framework that predicts per-image model accuracy.
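For reference, the expected calibration error (ECE) that the abstract calls inadequate is a per-prediction metric. The sketch below shows the standard equal-width-bin version only; it is not OCE, whose definition is not given in this text, and the function name, bin count, and example numbers are assumptions chosen for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: weighted mean of |accuracy - confidence| over bins.

    confidences: predicted probabilities in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # c == 1.0 goes in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece

# Made-up example: four confident predictions (three correct) and
# four suppressed predictions (none correct).
confidences = [0.95, 0.95, 0.95, 0.95, 0.05, 0.05, 0.05, 0.05]
correct = [1, 1, 1, 0, 0, 0, 0, 0]
ece = expected_calibration_error(confidences, correct)
print(round(ece, 3))
```

Because the result depends on which predictions are fed in, the same model can score differently under different post-processing choices. That dependence on the selected subset is the reason the abstract argues for evaluating the model and the post-processing step jointly.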

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

DETRs learn a specialist calibration split via Hungarian matching, but the unidentifiability claim conflicts with the near-zero suppression on unmatched predictions.

The paper observes that DETRs, via Hungarian matching, settle into a specialist pattern: one prediction per object becomes well-calibrated while the rest suppress foreground confidence to near zero, even with accurate boxes. This is presented as the loss-minimizing outcome. The authors argue that the well-calibrated predictions cannot be identified at inference, so any post-processing risks mixing calibration levels, and that AP and ECE are inadequate for evaluating the result. They introduce OCE plus a post hoc per-image accuracy predictor to handle the joint problem.

Referee Report

2 major / 0 minor

Summary. The manuscript claims that DETRs adopt an optimal specialist strategy induced by the Hungarian matching loss: exactly one prediction per ground-truth object is trained to be well-calibrated while the remaining predictions suppress foreground confidence to near zero (even with accurate localization). It asserts that these well-calibrated predictions are unidentifiable at inference time, rendering standard metrics (AP, ECE) inadequate for joint assessment of calibration and post-processing; it therefore introduces the Object-level Calibration Error (OCE) and a post-hoc per-image uncertainty quantification framework.

Significance. If the specialist strategy and unidentifiability claims hold with supporting derivations and experiments, the work would offer a mechanistic explanation for DETR output structure and motivate an object-centric calibration metric suited to detection pipelines, with relevance to safety-critical applications.

major comments (2)

[Abstract] The assertion that well-calibrated predictions are unidentifiable at inference time is placed in tension by the specialist strategy itself. If unmatched predictions are driven to near-zero foreground probability, a fixed confidence threshold or top-k selection would isolate the reliable subset without mixing calibration levels, undercutting the premise that identifiability is impossible and thereby weakening the necessity of the joint-evaluation argument and the OCE metric.

[Abstract] The manuscript states that it provides 'empirical and theoretical evidence' for the specialist strategy and the inadequacy of AP/ECE, yet the visible abstract contains no methods, loss derivations, or experimental details that would allow verification of these claims; the support for the central load-bearing assertions therefore cannot be assessed from the provided text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below with clarifications drawn from the full paper and indicate planned revisions where appropriate.

Referee: [Abstract] The assertion that well-calibrated predictions are unidentifiable at inference time is placed in tension by the specialist strategy itself. If unmatched predictions are driven to near-zero foreground probability, a fixed confidence threshold or top-k selection would isolate the reliable subset without mixing calibration levels, undercutting the premise that identifiability is impossible and thereby weakening the necessity of the joint-evaluation argument and the OCE metric.

Authors: The specialist strategy does drive unmatched predictions toward near-zero foreground probability as the loss-minimizing outcome of Hungarian matching. However, this does not resolve identifiability at inference. Because matching occurs only during training against ground truth, no equivalent signal exists at test time to designate which of the (often hundreds of) predictions per object is the specialist. Empirical analysis in the manuscript shows that confidence distributions of specialist and suppressed predictions exhibit overlap due to optimization dynamics, initialization, and image-specific factors; consequently, any fixed threshold or top-k selection risks retaining suppressed predictions (with poor calibration) or discarding well-calibrated ones. This mixing is precisely why standard post-processing cannot be assumed to isolate reliable subsets, motivating the joint calibration-plus-post-processing evaluation and the object-centric OCE metric. We will add a clarifying paragraph in Section 3.2 and the discussion to make this distinction explicit.

revision: partial

Referee: [Abstract] The manuscript states that it provides 'empirical and theoretical evidence' for the specialist strategy and the inadequacy of AP/ECE, yet the visible abstract contains no methods, loss derivations, or experimental details that would allow verification of these claims; the support for the central load-bearing assertions therefore cannot be assessed from the provided text.

Authors: Abstracts are intentionally concise summaries and do not contain derivations or experimental details; the full manuscript supplies both. Section 3 derives the specialist strategy as the unique loss-minimizing assignment under the Hungarian bipartite matching objective, while Section 4 presents extensive experiments (including per-object calibration histograms and comparisons against AP/ECE) demonstrating that standard metrics fail to capture the mixed-calibration risk. The abstract's phrasing is therefore supported by the body of the paper. No revision to the abstract itself is required, though we can expand the contribution statement in the introduction if the editor prefers.

revision: no

Circularity Check

0 steps flagged

No circularity; claims derive from loss analysis without reduction to inputs

The paper states that the specialist strategy 'emerges as the loss-minimizing solution to the Hungarian matching' and that well-calibrated predictions are unidentifiable, leading to the need for OCE. No equations or steps are shown that reduce this claim to a fitted parameter, self-definition, or self-citation chain by construction. The derivation is presented as an analysis of standard DETR training and post-processing, remaining self-contained against external benchmarks like the Hungarian algorithm itself. No load-bearing self-citations or ansatzes are quoted that would force the result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the specialist strategy is described as emerging from an existing loss rather than from new postulates.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 3 internal anchors

[1] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587.

[2] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 39, no. 06, pp. 1137–1149, Jun. 2017.

[3] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.

[4] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 2, pp. 386–397, 2020.

[5] P. Sun, R. Zhang, Y. Jiang, T. Kong, C. Xu, W. Zhan, M. Tomizuka, Z. Yuan, and P. Luo, “Sparse r-cnn: An end-to-end framework for object detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 12, pp. 15 650–15 664, 2023.

[6] Z. Cai and N. Vasconcelos, “Cascade r-cnn: High quality object detection and instance segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 5, pp. 1483–1498, 2021.

[7] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in European conference on computer vision. Springer, 2020, pp. 213–229.

[8] F. Kuppers, J. Kronenberger, A. Shantia, and A. Haselhoff, “Multivariate confidence calibration for object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2020, pp. 326–327.

[9] M. A. Munir, M. H. Khan, M. Sarfraz, and M. Ali, “Towards improving calibration in object detection under domain shift,” Advances in Neural Information Processing Systems, vol. 35, pp. 38 706–38 718, 2022.

[10] M. A. Munir, M. H. Khan, S. Khan, and F. S. Khan, “Bridging precision and confidence: A train-time loss for calibrating object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11 474–11 483.

[11] B. Pathiraja, M. Gunawardhana, and M. H. Khan, “Multiclass confidence and localization calibration for object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19 734–19 743.

[12] M. A. Munir, S. H. Khan, M. H. Khan, M. Ali, and F. Shahbaz Khan, “Cal-detr: calibrated detection transformer,” Advances in neural information processing systems, vol. 36, 2024.

[13] M. A. Munir, M. H. Khan, M. S. Sarfraz, and M. Ali, “Domain adaptive object detection via balancing between self-training and adversarial learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 12, pp. 14 353–14 365, 2023.

[14] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable detr: Deformable transformers for end-to-end object detection,” arXiv preprint arXiv:2010.04159, 2020.

[15] H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.-Y. Shum, “Dino: Detr with improved denoising anchor boxes for end-to-end object detection,” arXiv preprint arXiv:2203.03605, 2022.

[16] G. Salton, “Introduction to modern information retrieval,” McGraw-Hill Book Co, 1983.

[17] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” International journal of computer vision, vol. 88, pp. 303–338, 2010.

[18] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer, 2014, pp. 740–755.

[19] K. Oksuz, T. Joy, and P. K. Dokania, “Towards building self-aware object detectors via reliable uncertainty quantification and calibration,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 9263–9274.

[20] S. Kuzucu, K. Oksuz, J. Sadeghi, and P. K. Dokania, “On calibration of object detectors: Pitfalls, evaluation and baselines,” in European Conference on Computer Vision. Springer, 2025, pp. 185–204.

[21] K. Oksuz, B. C. Cam, E. Akbas, and S. Kalkan, “Localization recall precision (lrp): A new performance metric for object detection,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 504–519.

[22] R. Li, C. Zhang, H. Zhou, C. Shi, and Y. Luo, “Out-of-distribution identification: Let detector tell which i am not sure,” in European Conference on Computer Vision. Springer, 2022, pp. 638–654.

[23] X. Du, Z. Wang, M. Cai, and Y. Li, “Vos: Learning what you don’t know by virtual outlier synthesis,” arXiv preprint arXiv:2202.01197, 2022.

[24] X. Du, G. Gozum, Y. Ming, and Y. Li, “Siren: Shaping representations for detecting out-of-distribution objects,” Advances in Neural Information Processing Systems, vol. 35, pp. 20 434–20 449, 2022.

[25] S. Wilson, T. Fischer, F. Dayoub, D. Miller, and N. Sünderhauf, “Safe: Sensitivity-aware features for out-of-distribution object detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 23 565–23 576.

[26] A. Shelmanov, E. Tsymbalov, D. Puzyrev, K. Fedyanin, A. Panchenko, and M. Panov, “How certain is your transformer?” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021, pp. 1833–1840.

[27] A. Sharma, N. Azizan, and M. Pavone, “Sketching curvature for efficient out-of-distribution detection for deep neural networks,” in Uncertainty in artificial intelligence. PMLR, 2021, pp. 1958–1967.

[28] A. Vazhentsev, G. Kuzmin, A. Shelmanov, A. Tsvigun, E. Tsymbalov, K. Fedyanin, M. Panov, A. Panchenko, G. Gusev, M. Burtsev et al., “Uncertainty estimation of transformer predictions for misclassification detection,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, pp. 8237–8252.

[29] Y.-J. Park, H. Wang, S. Ardeshir, and N. Azizan, “Quantifying representation reliability in self-supervised learning models,” arXiv preprint arXiv:2306.00206, 2023.

[30] Y. Gal and Z. Ghahramani, “Dropout as a bayesian approximation: Representing model uncertainty in deep learning,” in international conference on machine learning. PMLR, 2016, pp. 1050–1059.

[31] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, “Generalized intersection over union: A metric and a loss for bounding box regression,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 658–666.

[32] H. W. Kuhn, “The hungarian method for the assignment problem,” Naval research logistics quarterly, vol. 2, no. 1-2, pp. 83–97, 1955.

[33] S. Ardeshir and N. Azizan, “Embedding reliability: On the predictability of downstream performance,” in NeurIPS ML Safety Workshop, 2022.

[34] Y.-S. Chuang, Y. Xie, H. Luo, Y. Kim, J. Glass, and P. He, “Dola: Decoding by contrasting layers improves factuality in large language models,” arXiv preprint arXiv:2309.03883, 2023.

[35] Z. Dai, B. Cai, Y. Lin, and J. Chen, “Up-detr: Unsupervised pre-training for object detection with transformers,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 1601–1610.

[36] K. Oksuz, B. C. Cam, S. Kalkan, and E. Akbas, “One metric to measure them all: Localisation recall precision (lrp) for evaluating visual detection tasks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 12, pp. 9446–9463, 2022.

[1] [1]

Rich feature hierarchies for accurate object detection and semantic segmentation,

R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587

work page 2014

[2] [2]

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks ,

S. Ren, K. He, R. Girshick, and J. Sun, “ Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks ,” IEEE Transactions on Pattern Analysis & Machine Intelligence , vol. 39, no. 06, pp. 1137–1149, Jun. 2017

work page 2017

[3] [3]

You only look once: Unified, real-time object detection,

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2016, pp. 779– 788

work page 2016

[4] [4]

Mask r-cnn,

K. He, G. Gkioxari, P. Doll ´ar, and R. Girshick, “Mask r-cnn,” IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 42, no. 2, pp. 386–397, 2020

work page 2020

[5] [5]

Sparse r-cnn: An end-to-end framework for object detection,

P. Sun, R. Zhang, Y . Jiang, T. Kong, C. Xu, W. Zhan, M. Tomizuka, Z. Yuan, and P. Luo, “Sparse r-cnn: An end-to-end framework for object detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 12, pp. 15 650–15 664, 2023

work page 2023

[6] [6]

Cascade r-cnn: High quality object detection and instance segmentation,

Z. Cai and N. Vasconcelos, “Cascade r-cnn: High quality object detection and instance segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 5, pp. 1483–1498, 2021

work page 2021

[7] [7]

End-to-end object detection with transformers,

N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in European conference on computer vision . Springer, 2020, pp. 213– 229

work page 2020

[8] [8]

Multi- variate confidence calibration for object detection,

F. Kuppers, J. Kronenberger, A. Shantia, and A. Haselhoff, “Multi- variate confidence calibration for object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2020, pp. 326–327

work page 2020

[9] [9]

Towards improving calibration in object detection under domain shift,

M. A. Munir, M. H. Khan, M. Sarfraz, and M. Ali, “Towards improving calibration in object detection under domain shift,” Advances in Neural Information Processing Systems , vol. 35, pp. 38 706–38 718, 2022

work page 2022

[10] [10]

Bridging precision and confidence: A train-time loss for calibrating object detection,

M. A. Munir, M. H. Khan, S. Khan, and F. S. Khan, “Bridging precision and confidence: A train-time loss for calibrating object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11 474–11 483

work page 2023

[11] [11]

Multiclass confidence and localization calibration for object detection,

B. Pathiraja, M. Gunawardhana, and M. H. Khan, “Multiclass confidence and localization calibration for object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2023, pp. 19 734–19 743

work page 2023

[12] [12]

Cal-detr: calibrated detection transformer,

M. A. Munir, S. H. Khan, M. H. Khan, M. Ali, and F. Shahbaz Khan, “Cal-detr: calibrated detection transformer,” Advances in neural infor- mation processing systems , vol. 36, 2024

work page 2024

[13] [13]

Domain adaptive object detection via balancing between self-training and adversarial learning,

M. A. Munir, M. H. Khan, M. S. Sarfraz, and M. Ali, “Domain adaptive object detection via balancing between self-training and adversarial learning,” IEEE Transactions on Pattern Analysis and Machine Intel- ligence, vol. 45, no. 12, pp. 14 353–14 365, 2023

work page 2023

[14] [14]

Deformable DETR: Deformable Transformers for End-to-End Object Detection

X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable detr: Deformable transformers for end-to-end object detection,”arXiv preprint arXiv:2010.04159, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010

[15] [15]

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.-Y . Shum, “Dino: Detr with improved denoising anchor boxes for end-to- end object detection,” arXiv preprint arXiv:2203.03605 , 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[16] [16]

Introduction to modern information retrieval,

G. Salton, “Introduction to modern information retrieval,” McGrawHill Book Co, 1983

work page 1983

[17] [17]

The pascal visual object classes (voc) challenge,

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisser- man, “The pascal visual object classes (voc) challenge,” International journal of computer vision , vol. 88, pp. 303–338, 2010

work page 2010

[18] [18]

Microsoft coco: Common objects in context,

T.-Y . Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll ´ar, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13 . Springer, 2014, pp. 740–755

work page 2014

[19] [19]

Towards building self-aware object detectors via reliable uncertainty quantification and calibration,

K. Oksuz, T. Joy, and P. K. Dokania, “Towards building self-aware object detectors via reliable uncertainty quantification and calibration,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 9263–9274

work page 2023

[20] [20]

On calibration of object detectors: Pitfalls, evaluation and baselines,

S. Kuzucu, K. Oksuz, J. Sadeghi, and P. K. Dokania, “On calibration of object detectors: Pitfalls, evaluation and baselines,” in European Conference on Computer Vision . Springer, 2025, pp. 185–204

work page 2025

[21] [21]

Localization recall precision (lrp): A new performance metric for object detection,

K. Oksuz, B. C. Cam, E. Akbas, and S. Kalkan, “Localization recall precision (lrp): A new performance metric for object detection,” in Proceedings of the European conference on computer vision (ECCV) , 2018, pp. 504–519


[22]

Out-of-distribution identification: Let detector tell which i am not sure,

R. Li, C. Zhang, H. Zhou, C. Shi, and Y. Luo, “Out-of-distribution identification: Let detector tell which i am not sure,” in European Conference on Computer Vision. Springer, 2022, pp. 638–654


[23]

Vos: Learning what you don’t know by virtual outlier synthesis,

X. Du, Z. Wang, M. Cai, and Y. Li, “Vos: Learning what you don’t know by virtual outlier synthesis,” arXiv preprint arXiv:2202.01197, 2022


[24]

Siren: Shaping representations for detecting out-of-distribution objects,

X. Du, G. Gozum, Y. Ming, and Y. Li, “Siren: Shaping representations for detecting out-of-distribution objects,” Advances in Neural Information Processing Systems, vol. 35, pp. 20 434–20 449, 2022


[25]

Safe: Sensitivity-aware features for out-of-distribution object detection,

S. Wilson, T. Fischer, F. Dayoub, D. Miller, and N. Sünderhauf, “Safe: Sensitivity-aware features for out-of-distribution object detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 23 565–23 576


[26]

How certain is your transformer?

A. Shelmanov, E. Tsymbalov, D. Puzyrev, K. Fedyanin, A. Panchenko, and M. Panov, “How certain is your transformer?” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021, pp. 1833–1840


[27]

Sketching curvature for efficient out-of-distribution detection for deep neural networks,

A. Sharma, N. Azizan, and M. Pavone, “Sketching curvature for efficient out-of-distribution detection for deep neural networks,” in Uncertainty in artificial intelligence. PMLR, 2021, pp. 1958–1967


[28]

Uncertainty estimation of transformer predictions for misclassification detection,

A. Vazhentsev, G. Kuzmin, A. Shelmanov, A. Tsvigun, E. Tsymbalov, K. Fedyanin, M. Panov, A. Panchenko, G. Gusev, M. Burtsev et al., “Uncertainty estimation of transformer predictions for misclassification detection,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, pp. 8237–8252


[29]

Quantifying representation reliability in self-supervised learning models,

Y.-J. Park, H. Wang, S. Ardeshir, and N. Azizan, “Quantifying representation reliability in self-supervised learning models,” arXiv preprint arXiv:2306.00206, 2023


[30]

Dropout as a bayesian approximation: Representing model uncertainty in deep learning,

Y. Gal and Z. Ghahramani, “Dropout as a bayesian approximation: Representing model uncertainty in deep learning,” in international conference on machine learning. PMLR, 2016, pp. 1050–1059


[31]

Generalized intersection over union: A metric and a loss for bounding box regression,

H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, “Generalized intersection over union: A metric and a loss for bounding box regression,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 658–666


[32]

The hungarian method for the assignment problem,

H. W. Kuhn, “The hungarian method for the assignment problem,” Naval research logistics quarterly, vol. 2, no. 1-2, pp. 83–97, 1955


[33]

Embedding reliability: On the predictability of downstream performance,

S. Ardeshir and N. Azizan, “Embedding reliability: On the predictability of downstream performance,” in NeurIPS ML Safety Workshop, 2022


[34]

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Y.-S. Chuang, Y. Xie, H. Luo, Y. Kim, J. Glass, and P. He, “Dola: Decoding by contrasting layers improves factuality in large language models,” arXiv preprint arXiv:2309.03883, 2023


[35]

Up-detr: Unsupervised pre-training for object detection with transformers,

Z. Dai, B. Cai, Y. Lin, and J. Chen, “Up-detr: Unsupervised pre-training for object detection with transformers,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 1601–1610


[36]

One metric to measure them all: Localisation recall precision (lrp) for evaluating visual detection tasks,

K. Oksuz, B. C. Cam, S. Kalkan, and E. Akbas, “One metric to measure them all: Localisation recall precision (lrp) for evaluating visual detection tasks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 12, pp. 9446–9463, 2022.
