Recognition: 2 theorem links
A Robust Out-of-Distribution Detection Framework via Synergistic Smoothing
Pith reviewed 2026-05-12 01:13 UTC · model grok-4.3
The pith
Median smoothing on OOD scores lets the same noise samples quantify local instability, producing a detector robust to both score-minimising and score-maximising attacks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is twofold: OOD samples exhibit higher local instability under perturbation than ID samples, and the noisy samples already generated by median smoothing of a base OOD score can be repurposed to measure this instability reliably. The resulting detector, ROSS, achieves symmetric robustness against both score-minimising and score-maximising adversarial attacks.
What carries the argument
The instability metric obtained by repurposing the perturbation samples from median smoothing: it quantifies how much the base OOD score fluctuates in a local neighbourhood and is observed to be systematically larger for OOD inputs.
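The smoothing-plus-instability idea can be sketched in a few lines. Everything here is illustrative: the name `ross_style_score`, the noise scale `sigma`, the sample count `n`, the weight `lam`, and the way the two terms are combined are assumptions for the sketch, not the paper's exact construction.

```python
import numpy as np

def ross_style_score(base_score, x, sigma=0.25, n=64, lam=1.0, rng=None):
    """Hypothetical sketch: median-smooth a base OOD score and reuse the
    same noisy samples to measure local instability (MAD)."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Evaluate the base score on n Gaussian-perturbed copies of the input;
    # these are the only forward passes needed for both terms.
    noisy = np.array([base_score(x + sigma * rng.standard_normal(x.shape))
                      for _ in range(n)])
    s_med = np.median(noisy)                        # median-smoothed score
    instability = np.median(np.abs(noisy - s_med))  # MAD of the same samples
    # Assumed convention: higher score = more ID-like, so instability is
    # subtracted to push unstable (OOD-like) inputs down.
    return s_med - lam * instability

# Toy base score (assumption): negative distance from the origin,
# so ID inputs sit near zero and OOD inputs far away.
toy_score = lambda z: -np.linalg.norm(z)
s_id = ross_style_score(toy_score, np.zeros(8))
s_ood = ross_style_score(toy_score, 5 * np.ones(8))
```

The point of the sketch is the cost accounting: the instability term consumes no extra queries beyond those already spent on smoothing.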
If this is right
- ROSS maintains competitive clean AUROC while delivering large gains under both attack directions.
- The method is post-hoc and can be applied on top of any existing OOD scoring function.
- Symmetric robustness holds across CIFAR-10, CIFAR-100 and ImageNet-scale experiments.
- The same perturbation budget used for smoothing supplies the instability signal at no extra cost.
Where Pith is reading between the lines
- The instability signal may prove useful for other safety tasks that must distinguish natural from anomalous inputs under perturbation.
- Because the defence reuses existing smoothing samples, it could be combined with other certified or empirical smoothing schemes without additional queries.
- If instability remains a reliable separator even under stronger adaptive attacks, it suggests that distribution shift and perturbation sensitivity are more tightly coupled than previously exploited.
Load-bearing premise
The assumption that OOD samples are reliably more unstable than ID samples under small perturbations and that the noise samples from median smoothing capture this difference without introducing fresh vulnerabilities.
What would settle it
Finding a set of OOD images whose base scores remain more stable under the median-smoothing perturbations than ID images from the same dataset, or constructing an attack that defeats ROSS while leaving the instability signal intact.
Original abstract
Reliable out-of-distribution (OOD) detection is a critical requirement for the safe deployment of machine learning systems. Despite recent progress, state-of-the-art OOD detectors are highly susceptible to adversarial attacks, which undermines their trustworthiness in automated systems. To address this vulnerability, we apply median smoothing to baseline OOD detection scores, balancing clean and adversarial accuracies. Our key insight is that the noisy samples generated for median smoothing can be repurposed to quantify the local instability of the base score. We observe that OOD samples exhibit higher instability under perturbation. Based on this, we propose ROSS, a novel and robust post-hoc OOD detector that leverages the instability of baseline scores to further distinguish between in-distribution (ID) and OOD samples. ROSS achieves symmetric robustness, performing strongly against both score-minimising and score-maximising attacks, unlike prior work. This symmetric defence leads to state-of-the-art robustness, outperforming prior methods by up to 40 AUROC points. We demonstrate ROSS's effectiveness on extensive experiments across CIFAR-10, CIFAR-100, and ImageNet. Code is available at: https://github.com/Abdu-Hekal/ROSS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ROSS, a post-hoc OOD detector that applies median smoothing to baseline OOD scores to balance clean and adversarial performance, then repurposes the generated noisy samples to quantify local instability of the base score. It observes that OOD samples exhibit higher instability under perturbation than ID samples and combines this signal with the smoothed score to achieve symmetric robustness against both score-minimizing and score-maximizing attacks, reporting gains of up to 40 AUROC points over prior methods on CIFAR-10, CIFAR-100, and ImageNet.
Significance. If the central empirical claims hold under rigorous adaptive evaluation, the work would meaningfully advance reliable OOD detection for safety-critical systems by addressing the documented vulnerability of existing detectors to adversarial perturbations. The approach is a lightweight post-hoc construction that reuses computation already performed for smoothing, and the public code release supports reproducibility.
major comments (3)
- [abstract, §3, experiments] The central claim of symmetric robustness and large AUROC gains rests on the assumption that OOD samples reliably show higher local instability than ID samples when quantified from the same median-smoothing perturbations (abstract and §3). No adaptive attack evaluation targeting the full ROSS metric (smoothed score + instability term) is described; an adversary jointly optimizing against both components could potentially suppress the distinction, undermining the reported gains.
- [experiments] Experimental details on attack implementations, baseline choices, statistical testing, and ablation of the instability component are insufficient to support the claimed improvements (abstract reports extensive experiments but lacks full attack protocols and variance reporting). This weakens the evidence for SOTA performance across CIFAR-10/100 and ImageNet.
- [§3] The paper does not provide a formal analysis or bound showing that the instability term is non-exploitable or that the synergistic combination preserves the observed separation under worst-case perturbations; the construction remains an empirical observation without reduction to a parameter-free or provably robust quantity.
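To make the adaptive-attack concern in the first comment concrete, here is a minimal sketch of a joint attack on a combined detector score. The score function, budgets, step sizes, and finite-difference gradient estimate are illustrative assumptions for the sketch, not the paper's attack protocol.

```python
import numpy as np

def adaptive_pgd(score_fn, x, eps=0.03, alpha=0.01, steps=20, fd_eps=1e-3,
                 direction=1.0):
    """Sketch of a score-maximising (direction=+1) or score-minimising
    (direction=-1) attack on a combined detector score; the gradient is
    estimated by central finite differences, so no autodiff is assumed."""
    x_adv = x.astype(float).copy()
    for _ in range(steps):
        # Central finite-difference gradient estimate, coordinate by coordinate.
        grad = np.zeros_like(x_adv)
        for i in range(x_adv.size):
            e = np.zeros_like(x_adv)
            e.flat[i] = fd_eps
            grad.flat[i] = (score_fn(x_adv + e) - score_fn(x_adv - e)) / (2 * fd_eps)
        # Signed-gradient step, projected back into the L-inf ball around x.
        x_adv = np.clip(x_adv + direction * alpha * np.sign(grad), x - eps, x + eps)
    return x_adv

# Toy deterministic combined score (assumption). A stochastic smoothed score
# would first need averaging over noise draws (expectation over transformation)
# to give usable gradient estimates.
toy_combined = lambda z: -np.sum(z ** 2)
x0 = 0.5 * np.ones(4)
x_atk = adaptive_pgd(toy_combined, x0, direction=1.0)
```

An evaluation along these lines, run jointly against the smoothed score and the instability term, is what the comment asks for.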
minor comments (2)
- [§3] Notation for the instability metric and its combination with the smoothed score should be defined more explicitly with equations to improve clarity.
- [figures/tables] Figure captions and table headers could more clearly distinguish clean vs. adversarial settings and report exact attack strengths used.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which has helped us identify areas for improvement in the manuscript. We address each major comment below and have revised the paper accordingly to strengthen the empirical evidence and clarity of our claims.
Point-by-point responses
-
Referee: The central claim of symmetric robustness and large AUROC gains rests on the assumption that OOD samples reliably show higher local instability than ID samples when quantified from the same median-smoothing perturbations (abstract and §3). No adaptive attack evaluation targeting the full ROSS metric (smoothed score + instability term) is described; an adversary jointly optimizing against both components could potentially suppress the distinction, undermining the reported gains.
Authors: We appreciate this observation on the evaluation scope. Our experiments demonstrate that OOD samples exhibit higher instability under the median-smoothing perturbations used for the smoothed score, leading to symmetric robustness against both score-minimizing and score-maximizing attacks. However, we agree that joint adaptive attacks on the combined ROSS metric would provide stronger validation. In the revised manuscript, we have added such adaptive attack evaluations (using PGD-style optimization targeting both terms simultaneously) on CIFAR-10/100 and ImageNet, confirming that the performance gains hold with only minor degradation. revision: yes
-
Referee: Experimental details on attack implementations, baseline choices, statistical testing, and ablation of the instability component are insufficient to support the claimed improvements (abstract reports extensive experiments but lacks full attack protocols and variance reporting). This weakens the evidence for SOTA performance across CIFAR-10/100 and ImageNet.
Authors: We acknowledge the need for greater detail to support reproducibility and the SOTA claims. The revised manuscript now includes: (i) full attack protocols with hyperparameters, step counts, and perturbation budgets; (ii) rationale for baseline selections with additional comparisons; (iii) statistical testing including mean AUROC with standard deviations over 5 random seeds; and (iv) a dedicated ablation study isolating the instability term's contribution. These additions are placed in the experiments section and supplementary material. revision: yes
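For reference, the AUROC figures discussed here equal the probability that a randomly drawn ID sample scores above a randomly drawn OOD sample, with ties counted as half. A minimal pairwise computation (function name and sign convention are illustrative):

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """AUROC treating higher scores as ID-like: P(ID score > OOD score),
    ties counted as 1/2 (Mann-Whitney U divided by n_id * n_ood)."""
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Pairwise score differences: one row per ID sample, one column per OOD sample.
    diff = id_scores[:, None] - ood_scores[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (id_scores.size * ood_scores.size)
```

Perfect separation gives 1.0, chance-level overlap gives 0.5, so a gain of "40 AUROC points" means a 0.40 increase on this scale.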
-
Referee: The paper does not provide a formal analysis or bound showing that the instability term is non-exploitable or that the synergistic combination preserves the observed separation under worst-case perturbations; the construction remains an empirical observation without reduction to a parameter-free or provably robust quantity.
Authors: We agree that the instability term is an empirical observation rather than a formally bounded quantity. The manuscript does not claim a parameter-free or provably robust construction; ROSS is presented as a practical, lightweight post-hoc method. In the revision, we have expanded §3 to discuss the empirical nature of the separation, added sensitivity analysis under varying perturbation strengths, and clarified limitations regarding worst-case exploitability. A rigorous theoretical bound remains an open direction for future work. revision: partial
Circularity Check
No significant circularity; empirical post-hoc construction with independent experimental validation.
full rationale
The paper's core construction applies median smoothing to baseline OOD scores and repurposes the generated noisy samples to measure local instability, observing (rather than deriving) that OOD samples exhibit higher instability. ROSS is then defined directly from this combined metric. No equations reduce the claimed AUROC gains or symmetric robustness to a fitted parameter, self-referential definition, or self-citation chain. The robustness results are presented as empirical outcomes across CIFAR-10/100 and ImageNet, not as predictions forced by the input assumptions. This matches the default expectation of a non-circular empirical method.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Linked passage: "We apply median smoothing to baseline OOD detection scores... repurposed to quantify the local instability of the base score... σ_med (MAD)... S_ROSS = min(S_95, S_med) + Δ_score · (1 + λ/σ_med)"
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · tag: unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Linked passage: "ROSS achieves symmetric robustness... outperforming prior methods by up to 40 AUROC points"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
- [2] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the International Conference on Machine Learning (ICML), 2018.
- [3] Mohammad Azizmalayeri, Arshia Soltani Moakar, Arman Zarei, Reihaneh Zohrabi, Mohammad Taghi Manzuri, and Mohammad Hossein Rohban. Your out-of-distribution detection method is not robust! In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- [4] Jiefeng Chen, Yixuan Li, Xi Wu, Yingyu Liang, and Somesh Jha. ATOM: Robustifying out-of-distribution detection using outlier mining. In Machine Learning and Knowledge Discovery in Databases, Research Track. Springer International Publishing, 2021.
- [5] Jiefeng Chen, Yixuan Li, Xi Wu, Yingyu Liang, and Somesh Jha. Robust out-of-distribution detection for neural networks. In The AAAI-22 Workshop on Adversarial Machine Learning and Beyond, 2021.
- [6] Wenxi Chen, Raymond A. Yeh, Shaoshuai Mou, and Yan Gu. Leveraging perturbation robustness to enhance out-of-distribution detection. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025.
- [7] Ping-yeh Chiang, Michael Curry, Ahmed Abdelkader, Aounon Kumar, John Dickerson, and Tom Goldstein. Detection as regression: Certified object detection with median smoothing. In Advances in Neural Information Processing Systems (NeurIPS). Curran Associates, Inc., 2020.
- [8] Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2014.
- [9] Jeremy Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified adversarial robustness via randomized smoothing. In Proceedings of the International Conference on Machine Learning (ICML), 2019.
- [10] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2009.
- [11] Andrija Djurisic, Nebojsa Bozanic, Arjun Ashok, and Rosanne Liu. Extremely simple activation shaping for out-of-distribution detection. In Proceedings of the International Conference on Learning Representations (ICLR), 2023.
- [12] Yue Gao, Ilia Shumailov, Kassem Fawaz, and Nicolas Papernot. On the limitations of stochastic pre-processing defenses. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- [13] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
- [14] Frank R. Hampel. The influence curve and its role in robust estimation. Journal of the American Statistical Association.
- [15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR). IEEE, 2016.
- [16] Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
- [17] Anna-Kathrin Kopetzki, Bertrand Charpentier, Daniel Zügner, Sandhya Giri, and Stephan Günnemann. Evaluating robustness of predictive uncertainty estimation: Are Dirichlet-based models reliable? In Proceedings of the International Conference on Machine Learning (ICML). PMLR.
- [18] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- [19] Yann Le and Xuan Yang. Tiny ImageNet visual recognition challenge. CS 231N, 7(7):3, 2015.
- [20]
- [21] Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
- [22] Shiyu Liang, Yixuan Li, and R. Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
- [23] Changliu Liu, Tomer Arnon, Christopher Lazarus, Christopher Strong, Clark Barrett, and Mykel J. Kochenderfer. Algorithms for verifying deep neural networks. Foundations and Trends in Optimization, 2021.
- [24] Litian Liu and Yao Qin. Fast decision boundary based out-of-distribution detector. In Proceedings of the International Conference on Machine Learning (ICML), 2024.
- [25] Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. Energy-based out-of-distribution detection. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [26] Xixi Liu, Yaroslava Lochman, and Christopher Zach. GEN: Pushing the limits of softmax-based out-of-distribution detection. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2023.
- [27] Peter Lorenz, Mario Ruben Fernandez, Jens Müller, and Ullrich Koethe. Deciphering the definition of adversarial robustness for post-hoc OOD detectors. In ICML 2024 Next Generation of AI Safety Workshop, 2024.
- [28] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
- [29] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
- [30] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2015.
- [31] Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, and Sébastien Bubeck. Provably robust deep learning via adversarially trained smoothed classifiers. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
- [32] Maria Stoica, Francesco Leofante, and Alessio Lomuscio. Out-of-distribution detection using counterfactual distance. arXiv preprint arXiv:2508.10148, 2025.
- [33] Yiyou Sun, Chuan Guo, and Yixuan Li. ReAct: Out-of-distribution detection with rectified activations. In Advances in Neural Information Processing Systems (NeurIPS). Curran Associates, Inc., 2021.
- [34] Yiyou Sun, Yifei Ming, Xiaojin Zhu, and Yixuan Li. Out-of-distribution detection with deep nearest neighbors. In Proceedings of the International Conference on Machine Learning (ICML). PMLR, 2022.
- [35] Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, et al. OpenOOD: Benchmarking generalized out-of-distribution detection. Advances in Neural Information Processing Systems (NeurIPS), 2022.
- [36] Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Yixuan Li, Ziwei Liu, et al. OpenOOD v1.5: Enhanced benchmark for out-of-distribution detection. arXiv preprint arXiv:2306.09301, 2024.
- [37] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2018.