Statistical Test for Diffusion-Based Anomaly Localization via Selective Inference
Pith reviewed 2026-05-24 04:14 UTC · model grok-4.3
The pith
Selective inference supplies p-values for regions flagged as anomalous by diffusion models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper's central claim is that applying a selective-inference procedure to the anomaly scores of a diffusion model yields p-values that are valid under the null hypothesis: rejecting at level α produces a false-positive detection with probability at most α. These p-values can therefore be used to control error rates in anomaly localization tasks.
What carries the argument
Selective inference procedure that conditions on the regions selected by the diffusion model and produces adjusted p-values for those regions.
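For orientation, the conditioning step can be written in the standard polyhedral or truncated-Gaussian form used in conditional selective inference. This is a sketch in assumed notation, not the paper's own derivation: the symbols x, X, Σ, η, the region map M, and the truncation set Z are illustrative and do not come from the source, and the Gaussian noise model is an assumption.

```latex
% Assumed notation (not taken from the paper):
%   x               observed image, a realization of X ~ N(\mu, \Sigma)
%   \mathcal{M}(x)  region flagged by the diffusion-based detector
%   \eta            contrast vector chosen after seeing \mathcal{M}(x)
% Null hypothesis H_0: \eta^\top \mu = 0.
\begin{align*}
  c &= \frac{\Sigma\eta}{\eta^\top\Sigma\eta}, \qquad
  a = (I_n - c\,\eta^\top)\,x, \\
  \mathcal{Z} &= \{\, z \in \mathbb{R} : \mathcal{M}(a + c z) = \mathcal{M}(x) \,\}, \\
  p_{\mathrm{selective}} &= \mathbb{P}_{H_0}\!\left( |Z| \ge |\eta^\top x| \;\middle|\; Z \in \mathcal{Z} \right),
  \qquad Z \sim \mathcal{N}(0,\ \eta^\top\Sigma\eta).
\end{align*}
```

In this form the p-value is valid only if the set Z is characterized exactly, which is the point the referee comments further down press on.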
If this is right
- Anomaly detections can be thresholded at any desired false-positive level while maintaining statistical control.
- The same p-value machinery can be used to compare the strength of evidence across different anomalous regions within one image.
- High-stakes applications such as medical diagnosis gain a quantitative reliability measure that was previously unavailable.
- The framework supplies a template that could be reused with other generative models once the selection event is properly characterized.
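The first two consequences above can be illustrated with a small sketch. The function name `flag_regions`, the region labels, and the p-values are hypothetical and serve only as an example. The threshold is applied per region; whether a further adjustment across several regions is needed is not addressed in the source.

```python
def flag_regions(p_values, alpha=0.05):
    """Keep regions whose selective p-value is at most alpha, strongest evidence first."""
    kept = [(name, p) for name, p in p_values.items() if p <= alpha]
    # Smaller p-value means stronger evidence against the null for that region.
    return sorted(kept, key=lambda item: item[1])


# Hypothetical selective p-values for three candidate regions in one image.
example = {"region_a": 0.003, "region_b": 0.21, "region_c": 0.04}
flagged = flag_regions(example, alpha=0.05)
print(flagged)
```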
Where Pith is reading between the lines
- The approach may generalize to other generative architectures if their selection mechanisms can be expressed in a form that selective inference can handle.
- In practice the method could be combined with existing anomaly-scoring pipelines to add post-hoc statistical filtering without retraining the diffusion model.
- Repeated application across many images would allow empirical calibration of the procedure's power under realistic anomaly sizes and contrasts.
Load-bearing premise
The selective inference procedure can be correctly applied to the outputs of a diffusion model to produce valid p-values without bias from the generative process itself.
What would settle it
A simulation or real-data check in which images known to contain no anomalies are processed; if the resulting p-values are not uniformly distributed between 0 and 1, or if the observed false-positive rate exceeds the nominal level, the validity claim is falsified.
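A check of this kind can be sketched on a toy problem. The code below is not the paper's method: a fixed pixel threshold `tau` stands in for the diffusion-based detector, the images are pure i.i.d. standard Gaussian noise, and all function names are hypothetical. Thresholding was chosen because its selection event is polyhedral, so the truncation point can be computed exactly. The sketch compares a naive p-value with the selective one on anomaly-free data.

```python
import math
import random


def normal_sf(x):
    """Upper-tail probability of the standard normal, P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def toy_selective_test(y, tau):
    """Return (naive_p, selective_p) for the thresholded region, or None if no split exists."""
    n = len(y)
    region = {i for i in range(n) if y[i] > tau}
    if not region or len(region) == n:
        return None
    k = len(region)
    # Contrast: mean inside the flagged region minus mean outside it.
    eta = [1.0 / k if i in region else -1.0 / (n - k) for i in range(n)]
    eta_sq = sum(e * e for e in eta)
    z_obs = sum(e * v for e, v in zip(eta, y))
    # Decompose y = a + c * z_obs, where a is independent of the statistic.
    c = [e / eta_sq for e in eta]
    a = [v - ci * z_obs for v, ci in zip(y, c)]
    # Each pixel's threshold condition gives a lower bound on z, so the
    # same region is selected exactly when z lies in [lower, infinity).
    lower = max((tau - ai) / ci for ai, ci in zip(a, c))
    s = math.sqrt(eta_sq)  # standard deviation of the statistic under unit noise
    naive_p = normal_sf(z_obs / s)
    selective_p = normal_sf(z_obs / s) / normal_sf(lower / s)
    return naive_p, selective_p


def simulate_null(n_images=4000, n_pixels=16, tau=1.0, alpha=0.05, seed=0):
    """Estimate false-positive rates of both p-values on anomaly-free noise images."""
    rng = random.Random(seed)
    naive_hits = selective_hits = tested = 0
    while tested < n_images:
        y = [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]
        result = toy_selective_test(y, tau)
        if result is None:
            continue  # nothing was flagged, so there is nothing to test
        naive_p, selective_p = result
        naive_hits += naive_p <= alpha
        selective_hits += selective_p <= alpha
        tested += 1
    return naive_hits / tested, selective_hits / tested


naive_rate, selective_rate = simulate_null()
print(f"naive false-positive rate:     {naive_rate:.3f}")
print(f"selective false-positive rate: {selective_rate:.3f}")
```

In this toy setting the selective rate should sit near the nominal level while the naive rate is inflated. For a real diffusion model the hard part is the analogue of `lower`: it requires an exact description of which inputs lead to the same flagged region.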
Original abstract
Anomaly localization in images -- identifying regions that deviate from normal patterns -- is vital in applications such as medical diagnosis and industrial inspection. A recent trend is the use of image generation models in anomaly localization, where these models generate normal-looking counterparts of anomalous images, thereby allowing flexible and adaptive anomaly localization. However, these methods inherit the uncertainty and bias implicitly embedded in the employed generative model, raising concerns about the reliability. To address this, we propose a statistical framework based on selective inference to quantify the significance of detected anomalous regions. Our method provides $p$-values to assess the false positive detection rates, providing a principled measure of reliability. As a proof of concept, we consider anomaly localization using a diffusion model and its applications to medical diagnoses and industrial inspections. The results indicate that the proposed method effectively controls the risk of false positive detection, supporting its use in high-stakes decision-making tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a selective-inference framework that attaches p-values to anomalous regions identified by comparing an observed image against a diffusion-generated normal counterpart. The method is presented as a way to quantify and control false-positive risk in diffusion-based anomaly localization, with proof-of-concept experiments on medical and industrial imaging tasks.
Significance. If the selective-inference construction is shown to be exact for the diffusion selection event, the work would supply a statistically grounded reliability measure for generative anomaly detectors, which is valuable in high-stakes settings where uncontrolled false positives carry real costs. The empirical claim of false-positive control, if rigorously validated, would strengthen the case for deploying such methods.
major comments (2)
- [Abstract] The central claim that the method 'provides p-values to assess the false positive detection rates' and 'effectively controls the risk of false positive detection' rests on the assumption that the diffusion-induced selection event can be exactly represented as a known constraint (linear, convex, or polyhedral) on the data. No such representation or conditioning-set derivation is supplied, leaving open the possibility that the p-values are biased by misspecification of the iterative denoising trajectory.
- [Abstract] The weakest assumption identified in the reader's report (that the selective-inference procedure can be applied to diffusion outputs without bias from the generative process) is load-bearing: standard selective-inference theory (truncated Gaussian or polyhedral conditioning) requires an exactly known selection region. Without an explicit construction or error bound on any approximation of that region, the uniformity of the p-values under the null is not guaranteed.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments correctly identify that the validity of our selective-inference p-values depends on an explicit characterization of the diffusion-induced selection event. We address each point below and indicate the revisions we will make.
Point-by-point responses
- Referee: [Abstract] The central claim that the method 'provides p-values to assess the false positive detection rates' and 'effectively controls the risk of false positive detection' rests on the assumption that the diffusion-induced selection event can be exactly represented as a known constraint (linear, convex, or polyhedral) on the data. No such representation or conditioning-set derivation is supplied, leaving open the possibility that the p-values are biased by misspecification of the iterative denoising trajectory.
  Authors: We agree that an explicit representation of the selection event is required for the p-values to be exactly valid. The manuscript introduces the selective-inference framework at a conceptual level but does not supply the step-by-step derivation that maps the iterative denoising trajectory to a polyhedral constraint on the observed image. We will add this derivation (including the linear operators corresponding to each denoising step under the null) in a new subsection of the revised manuscript.
  Revision: yes
- Referee: [Abstract] The weakest assumption identified in the reader's report (that the selective-inference procedure can be applied to diffusion outputs without bias from the generative process) is load-bearing: standard selective-inference theory (truncated Gaussian or polyhedral conditioning) requires an exactly known selection region. Without an explicit construction or error bound on any approximation of that region, the uniformity of the p-values under the null is not guaranteed.
  Authors: We concur that uniformity under the null is guaranteed only when the selection region is exactly known. The current version does not provide either the explicit polyhedral construction or a quantitative bound on any approximation error arising from the diffusion process. In the revision we will supply the exact construction and, if any approximation is retained for computational reasons, include a corresponding error analysis.
  Revision: yes
Circularity Check
No circularity: selective inference p-values derived from standard conditioning without reduction to fitted inputs or self-citations
Full rationale
The paper applies selective inference to diffusion-based anomaly localization to obtain p-values. The abstract and described framework invoke standard selective inference constructions on the outputs of a generative model. No equations or steps are shown that define the target p-value in terms of itself, rename a fitted quantity as a prediction, or rely on a load-bearing self-citation whose validity is unverified. The central claim (valid false-positive control) rests on the correctness of characterizing the selection event, which is an external modeling assumption rather than a definitional tautology. This is the common case of an independent statistical construction applied to a new domain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Selective-inference assumptions hold for the regions selected by the diffusion-based anomaly detector.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  selective p-value … conditional on the selection event induced by the diffusion model … piecewise-linear mapping … truncated Gaussian
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Theorem 1 … $\mathbb{P}_{H_0}(p_{\mathrm{selective}} \le \alpha \mid \ldots) = \alpha$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.