Segment Anything with Robust Uncertainty-Accuracy Correlation
Pith reviewed 2026-05-21 08:08 UTC · model grok-4.3
The pith
RUAC adds an uncertainty head to SAM and trains it against joint style-deformation attacks so uncertainty reliably flags pixel errors even after domain shifts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RUAC adds a lightweight uncertainty head to the Segment Anything Model, trains it with a collaborative style-deformation attack that jointly perturbs texture and geometry, and applies Uncertainty-Accuracy Alignment to ensure uncertainty consistently highlights erroneous pixels even under adversarial perturbations, resulting in improved segmentation quality and stronger uncertainty-accuracy correlation across 23 zero-shot domains.
What carries the argument
The collaborative style-deformation attack that jointly perturbs texture and geometry, together with Uncertainty-Accuracy Alignment that enforces consistent correlation between uncertainty values and segmentation errors.
If this is right
- Segmentation quality rises in zero-shot domains that differ in both appearance and shape from the training data.
- Uncertainty maps more accurately mark pixels near erroneous boundaries rather than giving uniform mask-level scores.
- The correlation between uncertainty and accuracy remains high even when inputs undergo simultaneous texture and geometry changes.
- Downstream tasks can use the uncertainty values to reject or re-query unreliable regions with greater confidence.
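The rejection use case in the last bullet can be made concrete with a risk-coverage curve. The sketch below is illustrative only: the function names, the toy data, and the choice to average risk over all coverage levels are assumptions, not the paper's evaluation code. It assumes non-empty, equal-length lists of per-pixel uncertainty and per-pixel error indicators.

```python
def risk_coverage_curve(uncertainty, errors):
    """Error rate among the k least-uncertain pixels, for k = 1..N."""
    # Keep pixels in order of increasing uncertainty (most confident first).
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i])
    risks, cumulative_errors = [], 0.0
    for kept, idx in enumerate(order, start=1):
        cumulative_errors += errors[idx]
        risks.append(cumulative_errors / kept)
    return risks


def area_under_risk_coverage(uncertainty, errors):
    """Mean risk over all coverage levels (lower is better)."""
    risks = risk_coverage_curve(uncertainty, errors)
    return sum(risks) / len(risks)


# Hypothetical toy data: 1 marks a mis-segmented pixel.
errors = [0, 1, 0, 0, 1, 0]
informative = [0.1, 0.9, 0.1, 0.2, 0.8, 0.1]   # high where errors occur
misleading = [1.0 - u for u in informative]    # high where pixels are correct

print(area_under_risk_coverage(informative, errors))
print(area_under_risk_coverage(misleading, errors))
```

An uncertainty map that ranks erroneous pixels last keeps risk low until coverage is nearly complete, so its area is smaller. At full coverage both rankings reach the same overall error rate.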
Where Pith is reading between the lines
- The same uncertainty-head approach could be attached to other large segmentation or detection models that currently rely on single scalar scores.
- Real deployment in medical imaging or robotics might gain from using the improved uncertainty to trigger human review only on high-uncertainty regions.
- Further tests with deformation patterns absent from the current attack would clarify how broadly the joint-training idea generalizes.
Load-bearing premise
The collaborative style-deformation attack that jointly perturbs texture and geometry sufficiently models real out-of-domain appearance shifts and non-rigid deformations.
What would settle it
Measuring the uncertainty-accuracy correlation on a fresh collection of real images that contain appearance and deformation shifts not generated by the style-deformation attack; if the correlation drops to the level of the original SAM, the central claim does not hold.
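A minimal sketch of how such a check might be scored, assuming binary masks and an uncertainty map stored as nested lists of equal shape. The Pearson coefficient between uncertainty and a per-pixel error indicator is one possible choice of correlation measure; the abstract does not name the paper's exact metric, and all names and data below are hypothetical.

```python
import math


def pixel_errors(pred_mask, gt_mask):
    """Flat list with 1.0 where the predicted label differs from ground truth."""
    return [float(p != g)
            for pred_row, gt_row in zip(pred_mask, gt_mask)
            for p, g in zip(pred_row, gt_row)]


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def uncertainty_error_correlation(uncertainty, pred_mask, gt_mask):
    """Correlation between per-pixel uncertainty and per-pixel error."""
    flat_uncertainty = [u for row in uncertainty for u in row]
    return pearson(flat_uncertainty, pixel_errors(pred_mask, gt_mask))


# Hypothetical 2x3 example: two pixels are mis-segmented.
pred = [[1, 1, 0], [1, 0, 0]]
gt = [[1, 0, 0], [1, 1, 0]]
uncertainty = [[0.1, 0.9, 0.1], [0.2, 0.8, 0.1]]

print(uncertainty_error_correlation(uncertainty, pred, gt))
```

Running the same function on the original SAM and on RUAC over the fresh image collection would give the two numbers the test compares.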
Original abstract
Despite strong zero-shot performance, SAM is unreliable under domain shift due to Mask-level Confidence Confusion (MCC), where a single IoU-based mask score fails to reflect pixel-wise reliability near boundaries. Motivated by the contrast between texture-biased shortcuts in neural networks and shape-centric processing in human vision, we model out-of-domain variation as appearance shifts and non-rigid deformations that jointly stress calibration. We propose Segment Anything with Robust Uncertainty-Accuracy Correlation (RUAC) for robust pixel-wise uncertainty estimation under appearance and deformation shifts. RUAC adds a lightweight uncertainty head, trains it with a collaborative style-deformation attack that jointly perturbs texture and geometry, and applies Uncertainty-Accuracy Alignment to ensure uncertainty consistently highlights erroneous pixels even under adversarial perturbations. Across 23 zero-shot domains, RUAC improves segmentation quality and yields more faithful uncertainty with stronger uncertainty-accuracy correlation. Project page: https://hongyouzhou.github.io/ruac/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes RUAC as an extension to SAM that adds a lightweight uncertainty head. This head is trained via a collaborative style-deformation attack that jointly perturbs texture and geometry, together with an Uncertainty-Accuracy Alignment loss intended to ensure uncertainty maps consistently flag erroneous pixels under adversarial perturbations. The central claim is that RUAC improves segmentation quality and produces more faithful pixel-wise uncertainty estimates exhibiting stronger uncertainty-accuracy correlation across 23 zero-shot domains, thereby mitigating Mask-level Confidence Confusion (MCC) under appearance shifts and non-rigid deformations.
Significance. If the reported gains in correlation and segmentation quality hold under scrutiny, the work offers a practical route to more reliable uncertainty quantification for SAM in out-of-domain settings. The motivation contrasting texture-biased network shortcuts with shape-centric human vision is well-grounded. Credit is due for the public project page and the explicit focus on zero-shot evaluation across a broad set of domains.
major comments (2)
- [§ Experiments] § Experiments (results on 23 domains): the central claim of improved uncertainty-accuracy correlation and segmentation quality across all 23 zero-shot domains requires quantitative support (e.g., correlation coefficients, IoU deltas, error bars, and ablation tables). If these metrics are present only in supplementary material or figures without clear statistical testing against baselines, the generalization statement remains under-supported.
- [Method] Method section describing the collaborative style-deformation attack: the training distribution is defined by joint texture and geometry perturbations. The manuscript does not report any overlap metric, qualitative comparison, or ablation that verifies this synthetic distribution sufficiently covers the actual appearance and deformation shifts present in the 23 test domains (e.g., local elastic warps or sensor-specific artifacts). This assumption is load-bearing for the claim that the alignment loss yields generally stronger correlation rather than attack-specific behavior.
minor comments (2)
- [Abstract] Abstract: the statement of improvements across 23 domains would be strengthened by a single sentence citing the key quantitative gains (e.g., average correlation increase or IoU lift) rather than leaving all numbers to the body.
- [Introduction] Notation: ensure MCC and RUAC are defined on first use and used consistently; the current abbreviation list appears incomplete for readers unfamiliar with the prior SAM literature.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our manuscript. We have carefully considered the major comments and provide point-by-point responses below. We believe the revisions will strengthen the presentation of our results and methodological justifications.
- Referee: [§ Experiments] § Experiments (results on 23 domains): the central claim of improved uncertainty-accuracy correlation and segmentation quality across all 23 zero-shot domains requires quantitative support (e.g., correlation coefficients, IoU deltas, error bars, and ablation tables). If these metrics are present only in supplementary material or figures without clear statistical testing against baselines, the generalization statement remains under-supported.
  Authors: We thank the referee for this observation. While the main results on correlation and segmentation quality across the 23 domains are presented in Section 4 with supporting figures, we acknowledge that more explicit statistical testing would enhance the presentation. In the revised manuscript, we will include a dedicated table with correlation coefficients, IoU deltas, standard errors, and results of statistical significance tests against baselines. We will also bring key ablation results from the supplementary material into the main text for better visibility.
  Revision: yes
- Referee: [Method] Method section describing the collaborative style-deformation attack: the training distribution is defined by joint texture and geometry perturbations. The manuscript does not report any overlap metric, qualitative comparison, or ablation that verifies this synthetic distribution sufficiently covers the actual appearance and deformation shifts present in the 23 test domains (e.g., local elastic warps or sensor-specific artifacts). This assumption is load-bearing for the claim that the alignment loss yields generally stronger correlation rather than attack-specific behavior.
  Authors: We agree that additional verification would strengthen the claims. The attack is motivated by the need to cover joint appearance and deformation shifts observed in the test domains. In the revised manuscript, we will add qualitative comparisons of the perturbations with examples from the 23 domains, as well as an ablation analyzing the coverage through metrics such as distribution similarity and deformation statistics.
  Revision: yes
Circularity Check
No circularity: new components and empirical validation
The paper introduces a lightweight uncertainty head, a collaborative style-deformation attack for training, and an Uncertainty-Accuracy Alignment loss. These are presented as novel additions to SAM rather than quantities derived from previously fitted parameters or self-referential equations. Claims of improved segmentation quality and stronger uncertainty-accuracy correlation are supported by empirical results across 23 zero-shot domains, not by construction from the training distribution itself. No self-citations, uniqueness theorems, or ansatzes that reduce the central result to its inputs appear in the derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Out-of-domain variation can be modeled as appearance shifts and non-rigid deformations that jointly stress calibration.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "We propose Segment Anything with Robust Uncertainty-Accuracy Correlation (RUAC) ... trains it with a collaborative style-deformation attack that jointly perturbs texture and geometry, and applies Uncertainty-Accuracy Alignment"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, embed_strictMono_of_one_lt (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "We model out-of-domain variation as appearance shifts and non-rigid deformations ... bio-inspired perturbations"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
URL https://arxiv.org/abs/2106.02740. Bengio, Y., Léonard, N., and Courville, A. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013. Brooks, T., Holynski, A., and Efros, A. A. InstructPix2Pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conf...
doi:10.1038/s41592-019-0612-7 2013
-
[2]
Minervini, M., Fischbach, A., Scharr, H., and Tsaftaris, S
doi: 10.1145/307400.307435. Minervini, M., Fischbach, A., Scharr, H., and Tsaftaris, S. A. Finely-grained annotated datasets for image-based plant phenotyping. Pattern Recognition Letters, 81:80–89, 2016. doi: 10.1016/j.patrec.2015.10.013. Mukhoti, J. and Gal, Y. Evaluating Bayesian deep learning methods for semantic segmentation. arXiv preprint arXiv:1...
-
[3]
Walk in the cloud: Learning curves for point clouds shape analysis, pp
doi: 10.1109/ICCV48922.2021.01073. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684–10695, 2022. URL https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resol...
-
[4]
mask A is confident but mask B is uncertain
URL https://arxiv.org/abs/2109.15068. Ye, K., Chen, T., Wei, H., and Zhan, L. Uncertainty regularized evidential regression. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 38, pp. 16460–16468, 2024. doi: 10.1609/aaai.v38i15.29583. URL https://ojs.aaai.org/index.php/AAAI/article/view/29583. Yogamani, S., Hughes, C., Horg...
-
[5]
Rank preservation: The Pearson correlation between simplified and full variance exceeds 0.99 across diverse inputs. Since calibration metrics (AUROC, PAvPU) depend only on the ranking of uncertainty values, not their absolute magnitudes, the approximation preserves all task-relevant information
-
[6]
Constant scaling: The full variance is approximately 2–3× the simplified variance. This constant factor is absorbed by the subsequent MacKay approximation, which maps variance to probability through $\kappa = 1/\sqrt{1 + \pi v/8}$. A constant scaling of $v$ merely shifts the sigmoid operating point uniformly, without affecting relative ordering
-
[7]
This variance v is then used in MacKay’s probit approximation to compute analytic uncertainty
Analytic-sampling agreement: Comparing our analytic uncertainty against 100-sample Monte Carlo ground truth yields Pearson correlation > 0.90, confirming that the end-to-end pipeline (variance approximation + MacKay) produces well-ranked uncertainty estimates. This variance v is then used in MacKay’s probit approximation to compute analytic uncertainty. ...
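The MacKay step quoted in items [6] and [7] can be sketched in a few lines. This is a hedged illustration, not the paper's code: it assumes a Gaussian logit with mean mu and variance v, uses the standard form sigmoid(kappa * mu) with kappa = 1/sqrt(1 + pi*v/8), and takes binary entropy as the uncertainty measure (an assumption; the excerpt does not say which measure is used). The Monte Carlo routine, its sample count, and all function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mackay_probability(mu, v):
    """Approximate E[sigmoid(z)] for z ~ N(mu, v) via MacKay's probit trick."""
    kappa = 1.0 / math.sqrt(1.0 + math.pi * v / 8.0)
    return sigmoid(kappa * mu)


def binary_entropy(p):
    """Entropy of a Bernoulli(p) prediction, used here as the uncertainty (assumption)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def monte_carlo_probability(mu, v, n_samples=20000, seed=0):
    """Sampling-based reference for the same expectation."""
    rng = random.Random(seed)
    std = math.sqrt(v)
    total = sum(sigmoid(rng.gauss(mu, std)) for _ in range(n_samples))
    return total / n_samples


analytic = mackay_probability(2.0, 4.0)
sampled = monte_carlo_probability(2.0, 4.0)
print(analytic, sampled, binary_entropy(analytic))
```

Larger logit variance shrinks kappa, which pulls the probability toward 0.5 and raises the entropy; with zero variance the prediction reduces to the plain sigmoid.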
-
[8]
Spatial overlap (IoU): Objects with overlapping masks should have coordinated perturbations: $w^{\mathrm{IoU}}_{ij} = \mathrm{IoU}(M_i, M_j) \cdot \mathbb{1}[\mathrm{IoU}(M_i, M_j) > \tau_{\mathrm{IoU}}]$. (S14)
-
[9]
Geometric proximity: Nearby objects likely share lighting conditions: $w^{\mathrm{dist}}_{ij} = \max\left(0,\ 1 - \frac{d_{\mathrm{boundary}}(M_i, M_j)}{d_{\max}}\right) \cdot \mathbb{1}[d_{ij} < \tau_d]$, (S15) where $d_{\mathrm{boundary}}$ is the minimum boundary distance between masks
-
[10]
Semantic similarity: Visually similar objects should receive similar perturbations: $w^{\mathrm{sem}}_{ij} = \cos(f_i, f_j) \cdot \mathbb{1}[\cos(f_i, f_j) > \tau_{\mathrm{sim}}]$. (S16) The final edge weight combines all criteria: $w_{ij} = w^{\mathrm{IoU}}_{ij} + w^{\mathrm{dist}}_{ij} + w^{\mathrm{sem}}_{ij}$. Self-loops with unit weight are added to preserve node identity.
Node Feature Initialization. Each node is initialized with the concatenation of s...
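The three edge criteria (S14) to (S16) and their sum can be sketched as follows. Everything beyond the formulas is an assumption made for illustration: masks are represented as non-empty sets of (row, col) pixels, the boundary distance is approximated by the minimum distance between any two mask pixels, $d_{ij}$ is taken to be that same distance, and the threshold defaults are invented placeholders rather than values from the paper.

```python
import math


def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def min_pixel_distance(mask_a, mask_b):
    """Minimum Euclidean distance between pixels of two non-empty masks."""
    return min(math.dist(p, q) for p in mask_a for q in mask_b)


def cosine(f_a, f_b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(f_a, f_b))
    norm_a = math.sqrt(sum(x * x for x in f_a))
    norm_b = math.sqrt(sum(y * y for y in f_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def edge_weight(mask_a, mask_b, f_a, f_b,
                tau_iou=0.1, tau_d=5.0, d_max=10.0, tau_sim=0.5):
    """w_ij = w_IoU + w_dist + w_sem; threshold defaults are hypothetical."""
    overlap = iou(mask_a, mask_b)
    w_iou = overlap if overlap > tau_iou else 0.0              # (S14)

    d = min_pixel_distance(mask_a, mask_b)
    w_dist = max(0.0, 1.0 - d / d_max) if d < tau_d else 0.0   # (S15)

    sim = cosine(f_a, f_b)
    w_sem = sim if sim > tau_sim else 0.0                      # (S16)
    return w_iou + w_dist + w_sem


# Hypothetical toy objects: two overlapping masks and one far-away mask.
mask_a = {(0, 0), (0, 1)}
mask_b = {(0, 1), (0, 2)}
mask_c = {(100, 100)}
print(edge_weight(mask_a, mask_b, [1.0, 0.0], [1.0, 0.0]))
print(edge_weight(mask_a, mask_c, [1.0, 0.0], [0.0, 1.0]))
```

Because each term is gated by its own indicator, a pair of objects that fails all three tests gets weight zero and contributes no edge, while the unit self-loops mentioned in the text would be added separately.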
-
[11]
Uniform uncertainty collapse: the model could minimize the loss by raising u everywhere, ignoring whether e is actually large
-
[12]
The dual stop-gradient blocks both shortcuts
Attacker u-shortcut: the attacker could maximize the loss by manipulating u via BNDL rather than producing genuinely hard inputs. The dual stop-gradient blocks both shortcuts. sg[e] in the u-channel terms makes u-updates conditional on the actual error: u is only pushed up where e is observed to be high, preventing uniform collapse. sg[u] in the e-channel term...
-
[13]
demonstrate effective generalization, not memorization of source-domain GT patterns.
D. Dataset Details
Source domain: MOSE (Ding et al., 2023) contains 2,149 video clips with 5,200 objects across 36 categories. We use first frames only for both training and evaluation, operating SAM2 in single-frame mode without memory propagation. This isolates image s...
-
[14]
Standard Training (Sharp Minima): Standard ERM often converges to sharp minima where the Hessian $H_w$ has large eigenvalues (high curvature). To minimize the expected NLL term $\mathbb{E}_w[\mathrm{NLL}] \approx \mathrm{NLL}(\mu) + \frac{1}{2}\mathrm{Tr}(H(\Sigma))$, the optimizer is forced to reduce $\Sigma$ significantly. This results in small posterior variance, low logit variance, and consequently overconfident predicti...
-
[15]
AUE Training (Flat Minima): Optimizing against perturbations $\delta^*$ effectively minimizes the worst-case loss within a neighborhood, which necessitates finding a solution $\mu$ that is robust to local changes. This implies finding a region where the loss surface is flat (low curvature, small $H_w$). Because $H_w$ is smaller, the penalty term $\mathrm{Tr}(H(\Sigma))$ is reduced, allowing ...
-
[16]
At matched accuracy: When comparing predictions at the same coverage level (i.e., same effective accuracy), RUAC’s uncertainty ranking is more reliable
-
[17]
Actionable uncertainty: The improvement in AURC directly translates to better performance in downstream tasks that use uncertainty for rejection, active learning, or human-in-the-loop correction
-
[18]
Consistent across domains: The benefit is observed on the majority of OOD datasets, not just those where RUAC has higher accuracy.
J. Statistical Significance Analysis
To rigorously evaluate calibration improvements, we conduct Wilcoxon signed-rank tests across all 23 OOD datasets. This non-parametric test is appropriate for paired compar...
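For readers unfamiliar with the test named above, here is a minimal sketch of a paired Wilcoxon signed-rank comparison in plain Python. It is a simplified illustration and not the paper's analysis: zero differences are dropped, tied magnitudes receive average ranks, and the two-sided p-value uses a normal approximation without tie or continuity corrections. The per-dataset scores are invented placeholders.

```python
import math


def wilcoxon_signed_rank(scores_a, scores_b):
    """Return (W_plus, W_minus, approximate two-sided p) for paired samples."""
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied magnitudes their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # mean of 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation to the null distribution of W_plus.
    mean = n * (n + 1) / 4
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / std
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value


# Hypothetical per-dataset scores for two methods (placeholders, not paper results).
method_a = [10, 12, 15, 11, 14, 13, 16, 18]
method_b = [9, 10, 12, 7, 9, 7, 9, 10]
print(wilcoxon_signed_rank(method_a, method_b))
```

With only a handful of pairs an exact null distribution is preferable to the normal approximation; a statistics library would normally be used for the real analysis.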