Extremal Contours: Gradient-driven contours for compact visual attribution
Pith reviewed 2026-05-18 01:22 UTC · model grok-4.3
The pith
Smooth star-convex contours optimized by gradients deliver compact, faithful visual attributions that match dense masks in fidelity while using far fewer parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A training-free method parameterizes attribution regions as star-convex contours via truncated Fourier series and optimizes them under an extremal preserve/delete objective driven by classifier gradients, producing single simply-connected masks that achieve extremal fidelity comparable to dense masks yet with substantially lower complexity and improved consistency.
What carries the argument
Truncated Fourier series parameterization of star-convex contours optimized under an extremal preserve/delete objective using classifier gradients.
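The parameterization named above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the sigmoid edge with softness `tau`, the fixed `center`, and the use of the raw series without a positivity transform are all assumptions made for the example.

```python
import numpy as np

def radius(theta, a0, a, b):
    """Truncated Fourier series r(theta) = a0 + sum_k a_k cos(k theta) + b_k sin(k theta)."""
    k = np.arange(1, len(a) + 1)
    # One trailing axis per harmonic: shape theta.shape + (len(a),).
    kt = np.multiply.outer(theta, k)
    return a0 + np.cos(kt) @ a + np.sin(kt) @ b

def soft_mask(h, w, center, a0, a, b, tau=1.0):
    """Soft mask: close to 1 inside the contour, close to 0 outside (hypothetical rasterizer)."""
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - center[0], xs - center[1]
    rho = np.hypot(dx, dy)        # pixel distance from the center
    theta = np.arctan2(dy, dx)    # pixel angle as seen from the center
    r = radius(theta, a0, np.asarray(a, float), np.asarray(b, float))
    # Sigmoid of the signed radial margin; tau (in pixels) sets the edge softness.
    return 1.0 / (1.0 + np.exp(-(r - rho) / tau))

# A gently deformed disk: 1 + 2*2 = 5 shape parameters plus the 2 center coordinates.
mask = soft_mask(64, 64, (32, 32), 20.0, [3.0, 0.0], [0.0, 2.0])
```

Because every ray from the center crosses the boundary exactly once, a mask built this way is a single star-convex region by construction, provided r(θ) stays positive. Every operation is differentiable in the coefficients, which is what would let classifier gradients move the boundary.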
Load-bearing premise
Restricting the solution space to low-dimensional smooth star-convex contours will preserve faithfulness to the model's decision while preventing adversarial masking artifacts during gradient-based optimization.
What would settle it
A side-by-side evaluation on the same ImageNet images where the contour method attains a worse preserve or delete score than an unconstrained dense mask baseline optimized under the same objective.
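A hedged sketch of how such a side-by-side comparison could be scored. The blending rule, the zero baseline, and `toy_score` (a stand-in for a real classifier's class score) are assumptions for illustration; the paper's exact perturbation and objective are not given on this page.

```python
import numpy as np

def preserve_delete_scores(score_fn, image, mask, baseline):
    """Return (preserve, delete) class scores for one mask.

    preserve: keep the masked region, replace the rest with the baseline.
    delete:   replace the masked region with the baseline, keep the rest.
    """
    kept = mask * image + (1.0 - mask) * baseline
    removed = (1.0 - mask) * image + mask * baseline
    return float(score_fn(kept)), float(score_fn(removed))

def toy_score(img):
    # Hypothetical classifier score: mean intensity of a fixed 8x8 patch,
    # so all evidence for the "class" lives inside that patch.
    return img[4:12, 4:12].mean()

image = np.zeros((16, 16))
image[4:12, 4:12] = 1.0
baseline = np.zeros_like(image)

compact_mask = np.zeros_like(image)
compact_mask[3:13, 3:13] = 1.0          # one connected region
fragmented_mask = np.zeros_like(image)
fragmented_mask[4:12, 4:12:2] = 1.0     # every other column of the patch

for name, m in [("compact", compact_mask), ("fragmented", fragmented_mask)]:
    p, d = preserve_delete_scores(toy_score, image, m, baseline)
    print(f"{name}: preserve={p:.2f} delete={d:.2f} area={m.mean():.2f}")
```

A real test would replace `toy_score` with the classifier under study and report both scores at matched mask area, since a larger mask trivially preserves more.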
Original abstract
Faithful yet compact explanations for vision models remain a challenge, as commonly used dense perturbation masks are often fragmented and overfitted, needing careful post-processing. Here, we present a training-free explanation method that replaces dense masks with smooth tunable contours. A star-convex region is parameterized by a truncated Fourier series and optimized under an extremal preserve/delete objective using the classifier gradients. The approach guarantees a single, simply connected mask, cuts the number of free parameters by orders of magnitude, and yields stable boundary updates without cleanup. Restricting solutions to low-dimensional, smooth contours makes the method robust to adversarial masking artifacts. On ImageNet classifiers, it matches the extremal fidelity of dense masks while producing compact, interpretable regions with improved run-to-run consistency. Explicit area control also enables importance contour maps, yielding transparent fidelity-area profiles. Finally, we extend the approach to multiple contours and show how it can localize multiple objects within the same framework. Across benchmarks, the method achieves higher relevance mass and lower complexity than gradient- and perturbation-based baselines, with especially strong gains on self-supervised DINO models where it improves relevance mass by over 15% and maintains positive faithfulness correlations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Extremal Contours, a training-free visual attribution method that parameterizes a star-convex region via a truncated Fourier series on the radius function r(θ) and optimizes the coefficients under an extremal preserve/delete objective driven by classifier gradients. It claims this yields compact, simply-connected masks that match the fidelity of dense perturbation masks on ImageNet classifiers, improve relevance mass and run-to-run consistency, reduce complexity relative to gradient and perturbation baselines, and deliver especially large gains (>15% relevance mass) on DINO models while supporting explicit area control and multi-contour extensions.
Significance. If the central claims hold, the work provides a principled way to obtain faithful yet low-complexity attributions by restricting the feasible set to smooth, low-dimensional star-convex contours. The reduction from thousands of mask pixels to ~2N Fourier coefficients, the built-in guarantee of a single connected region, and the transparent fidelity-area profiles constitute concrete advances over fragmented dense-mask baselines. Strong reported gains on self-supervised DINO models and the absence of post-processing cleanup are additional strengths.
major comments (1)
- [Abstract and experimental results] The central claim that the method 'matches the extremal fidelity of dense masks' (abstract) is load-bearing yet unsupported by a direct quantitative comparison. No table or section reports the final preserve/delete objective value (or relevance mass) achieved by the Fourier-contour optimizer versus an unconstrained dense mask optimized under identical loss, gradient steps, and initialization. Because the star-convex smooth parameterization is a strict subset of all possible masks, any reported fidelity match must be verified by showing that the restricted optimum reaches the same score; otherwise the restriction may silently incur a fidelity penalty when the true extremal region contains concavities or disconnected pixels.
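The subset argument in this comment can be written out explicitly. The symbols are introduced here for illustration and do not come from the paper: \mathcal{M} is the set of all admissible masks, \mathcal{C}_N \subset \mathcal{M} is the set of star-convex masks representable by an order-N truncated Fourier contour, and J is the preserve/delete objective, taken as something to be maximized.

```latex
\mathcal{C}_N \subset \mathcal{M}
\;\Longrightarrow\;
\max_{m \in \mathcal{C}_N} J(m) \;\le\; \max_{m \in \mathcal{M}} J(m),
\qquad
\Delta_N \;=\; \max_{m \in \mathcal{M}} J(m) \;-\; \max_{m \in \mathcal{C}_N} J(m) \;\ge\; 0 .
```

Under this reading, "matching the extremal fidelity of dense masks" amounts to the empirical claim that \Delta_N is close to zero on the evaluated images. The inequality holds for exact optima only; both sides are found by gradient descent in practice, so a dense optimizer stuck in a poor local optimum could report a lower value than the contour optimizer, which is one more reason to report both numbers under identical settings.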
minor comments (2)
- [Abstract] The abstract states 'improved run-to-run consistency' and relevance-mass gains of 'over 15%' on DINO models but supplies no error bars, number of runs, or exact baseline implementations; these quantitative details should be added to the experimental section.
- [Method] The truncation order N of the Fourier series and the area-control parameter are listed as free parameters; their chosen values and a sensitivity analysis should be reported explicitly (e.g., in a table or appendix) to allow reproduction.
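One narrow part of the requested sensitivity analysis can be illustrated without a classifier: how well a truncated series of order N can represent a given boundary at all. The sketch below is an assumption-laden toy (a square target boundary, a least-squares fit, and a hypothetical sweep over N), not the paper's experiment.

```python
import numpy as np

def fit_truncated_fourier(theta, r_target, order):
    """Least-squares fit of an order-`order` Fourier series to sampled radii."""
    k = np.arange(1, order + 1)
    kt = np.outer(theta, k)
    # Columns: constant term, then cos(k theta) and sin(k theta) for k = 1..order.
    design = np.hstack([np.ones((len(theta), 1)), np.cos(kt), np.sin(kt)])
    coeffs, *_ = np.linalg.lstsq(design, r_target, rcond=None)
    return design @ coeffs, coeffs

theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
# Hypothetical target: a square of half-width 30 px seen from its center.
# Its radius function has corners that low orders cannot follow exactly.
r_target = 30.0 / np.maximum(np.abs(np.cos(theta)), np.abs(np.sin(theta)))

errors = {}
for order in (1, 2, 4, 8, 16):
    r_fit, coeffs = fit_truncated_fourier(theta, r_target, order)
    errors[order] = float(np.sqrt(np.mean((r_fit - r_target) ** 2)))
    print(f"N={order:2d}  params={coeffs.size:2d}  RMSE={errors[order]:.3f} px")
```

A sweep like this only bounds the representational error. The sensitivity the referee asks for would additionally track the achieved objective and run-to-run variation as N and the area parameter change.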
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The point raised about verifying the fidelity claim with a direct comparison is well-taken, and we address it below along with plans for revision.
Point-by-point responses
Referee: [Abstract and experimental results] The central claim that the method 'matches the extremal fidelity of dense masks' (abstract) is load-bearing yet unsupported by a direct quantitative comparison. No table or section reports the final preserve/delete objective value (or relevance mass) achieved by the Fourier-contour optimizer versus an unconstrained dense mask optimized under identical loss, gradient steps, and initialization. Because the star-convex smooth parameterization is a strict subset of all possible masks, any reported fidelity match must be verified by showing that the restricted optimum reaches the same score; otherwise the restriction may silently incur a fidelity penalty when the true extremal region contains concavities or disconnected pixels.
Authors: We agree that the manuscript would be strengthened by an explicit head-to-head comparison of the final preserve/delete objective (and relevance mass) between the Fourier-contour optimizer and an unconstrained dense mask under identical loss, gradient steps, and initialization. The current version reports that Extremal Contours match the fidelity of dense perturbation baselines on ImageNet classifiers and achieve higher relevance mass than gradient/perturbation methods, but does not include this specific controlled experiment against an optimized dense mask. In the revised manuscript we will add a new table (and corresponding text in the experimental section) that performs this direct comparison, reporting the achieved objective values for both approaches. This will allow readers to assess whether the star-convex restriction incurs any measurable fidelity penalty in practice.
Revision: yes
Circularity Check
No significant circularity detected in derivation chain
The paper introduces a training-free method that parameterizes star-convex regions via truncated Fourier series and optimizes them directly under an extremal preserve/delete objective using standard classifier gradients. All load-bearing steps (parameterization, optimization, and reported fidelity gains) are defined from first principles and external benchmarks rather than reducing to self-fitted quantities, prior author results, or tautological redefinitions. No self-citation chains, ansatzes smuggled via citation, or uniqueness theorems imported from the same authors appear in the provided text; the central claims rest on empirical comparisons that remain independently falsifiable.
Axiom & Free-Parameter Ledger
free parameters (2)
- Fourier series truncation order
- Area control parameter
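The two free parameters interact in a simple way if the radius is taken to be the raw Fourier series, which is an assumption here (the paper may apply a positivity transform or control area through a penalty instead). In that case the enclosed area has a closed form, sketched below with hypothetical helper names.

```python
import numpy as np

def contour_area(a0, a, b):
    """Area enclosed by r(theta) = a0 + sum_k a_k cos(k theta) + b_k sin(k theta).

    Uses A = 1/2 * integral of r^2 over [0, 2*pi). By orthogonality of the
    Fourier basis this equals pi*a0^2 + (pi/2) * sum_k (a_k^2 + b_k^2).
    Valid only while r(theta) stays positive for every theta.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.pi * a0**2 + 0.5 * np.pi * np.sum(a**2 + b**2)

def rescale_to_area(a0, a, b, target_area):
    """Scale all coefficients uniformly so the enclosed area equals target_area."""
    s = np.sqrt(target_area / contour_area(a0, a, b))
    return s * a0, s * np.asarray(a, float), s * np.asarray(b, float)

# Example: an order-3 contour rescaled to cover 1000 square pixels.
a0, a, b = rescale_to_area(20.0, [3.0, 0.0, 1.5], [0.0, 2.0, 0.5], 1000.0)
print(round(contour_area(a0, a, b), 6))
```

Uniform rescaling changes size without changing shape, so an area target can be imposed exactly after each update. Whether the method does this or uses a soft area term is not stated on this page.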
axioms (2)
- domain assumption: Attribution regions can be adequately represented as star-convex sets.
- domain assumption: Classifier gradients supply reliable signals for boundary optimization under extremal preserve/delete objectives.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "A star-convex region is parameterized by a truncated Fourier series and optimized under an extremal preserve/delete objective using the classifier gradients."
- IndisputableMonolith/Foundation/AlexanderDuality.lean, alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "The approach guarantees a single, simply connected mask, cuts the number of free parameters by orders of magnitude"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.