
arXiv: 2511.01411 · v1 · submitted 2025-11-03 · cs.CV · cs.LG · eess.IV

Extremal Contours: Gradient-driven contours for compact visual attribution

Pith reviewed 2026-05-18 01:22 UTC · model grok-4.3

classification cs.CV · cs.LG · eess.IV
keywords visual attribution · explainable AI · contour optimization · Fourier parameterization · star-convex masks · gradient-based explanation · ImageNet classifiers

The pith

Smooth star-convex contours optimized by gradients deliver compact, faithful visual attributions that match dense masks in fidelity while using far fewer parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes replacing fragmented dense perturbation masks with smooth, tunable contours for explaining image classifier decisions. A star-convex region is defined by a truncated Fourier series and refined directly from classifier gradients under an extremal preserve-or-delete objective. This restriction to low-dimensional smooth shapes cuts the parameter count by orders of magnitude, guarantees a single connected region, and removes the need for post-processing cleanup. The resulting attributions maintain high relevance mass and run-to-run stability on ImageNet models, with larger gains reported on self-supervised DINO backbones. Explicit control over contour area further allows construction of importance profiles that trade fidelity against mask size.

Core claim

A training-free method parameterizes attribution regions as star-convex contours via truncated Fourier series and optimizes them under an extremal preserve/delete objective driven by classifier gradients, producing single simply-connected masks that achieve extremal fidelity comparable to dense masks yet with substantially lower complexity and improved consistency.

What carries the argument

Truncated Fourier series parameterization of star-convex contours optimized under an extremal preserve/delete objective using classifier gradients.
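
As a rough illustration of this parameterization, the sketch below builds a radius function r(θ) from a truncated Fourier series and turns it into a soft mask on a pixel grid. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the sigmoid edge with a `sharpness` parameter, and the use of NumPy are all choices made here for illustration, and the page does not say how the paper renders the mask or keeps r(θ) positive.

```python
import numpy as np

def fourier_radius(theta, a0, a, b):
    """r(theta) = a0 + sum_k a[k-1]*cos(k*theta) + b[k-1]*sin(k*theta)."""
    k = np.arange(1, len(a) + 1)
    # phase has shape theta.shape + (N,), one column per harmonic
    phase = theta[..., None] * k
    return a0 + np.cos(phase) @ a + np.sin(phase) @ b

def soft_contour_mask(height, width, center, a0, a, b, sharpness=1.0):
    """Soft mask in (0, 1): near 1 inside the contour, near 0 outside.

    Assumption: a sigmoid of (boundary radius - pixel radius) stands in
    for whatever differentiable rendering the paper actually uses.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    dy, dx = ys - center[0], xs - center[1]
    rho = np.hypot(dx, dy)          # distance of each pixel from the center
    theta = np.arctan2(dy, dx)      # polar angle of each pixel
    boundary = fourier_radius(theta, a0, a, b)
    return 1.0 / (1.0 + np.exp(-sharpness * (boundary - rho)))

# Hypothetical usage: a circle of radius 10 with a small two-lobed bump.
mask = soft_contour_mask(64, 64, center=(32, 32), a0=10.0,
                         a=np.array([0.0, 2.0]), b=np.array([0.0, 0.0]))
```

Because every pixel is classified by comparing its distance from the center with a single-valued r(θ), the region is star-convex about the center whenever r(θ) stays positive, which is what rules out fragmented masks by construction.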

Load-bearing premise

Restricting the solution space to low-dimensional smooth star-convex contours will preserve faithfulness to the model's decision while preventing adversarial masking artifacts during gradient-based optimization.

What would settle it

A side-by-side evaluation on the same ImageNet images where the contour method yields lower preserve or delete scores than an unconstrained dense mask baseline optimized to the same objective.
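
The comparison described above could be scored along the following lines. This is a hedged sketch: `score_fn` is a hypothetical callable that returns the classifier's score for the target class, the blending with a `baseline` image is one common perturbation convention, and the paper's exact objective is not given on this page.

```python
import numpy as np

def perturb(image, mask, baseline):
    """Keep the image where mask is 1 and show the baseline where mask is 0."""
    return mask * image + (1.0 - mask) * baseline

def preserve_delete_scores(score_fn, image, mask, baseline):
    """Return (preserve, delete) scores for one mask.

    preserve: score when only the masked region is kept (higher is better).
    delete:   score when the masked region is removed (lower is better).
    """
    preserve = score_fn(perturb(image, mask, baseline))
    delete = score_fn(perturb(image, 1.0 - mask, baseline))
    return preserve, delete

# Toy stand-in for a classifier: mean intensity of the central 4x4 patch.
toy_score = lambda x: float(x[2:6, 2:6].mean())

image = np.ones((8, 8))
baseline = np.zeros((8, 8))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

preserve, delete = preserve_delete_scores(toy_score, image, mask, baseline)
```

Running the same function on a contour mask and on a dense mask of equal area, for the same image and classifier, would give the paired numbers the test calls for.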

Figures

Figures reproduced from arXiv: 2511.01411 by Albert Alonso, Bulat Ibragimov, Frans Zdyb, Julius B. Kirkegaard, Reza Karimzadeh.

Figure 1: Comparison of explanation methods on ImageNet validation images for …
Figure 2: Qualitative results on ImageNet images. Each column: input image, our …
Figure 3: Robustness of the method. (Top) Red circles denote different initial positions …
Figure 4: Area-fidelity trade-off. (Left) Single closed contours at target areas …
Figure 5: Examples of multiple contour optimization with …
Figure 6: Convergence behavior of the Extremal Contour method across 10 images.
Original abstract

Faithful yet compact explanations for vision models remain a challenge, as commonly used dense perturbation masks are often fragmented and overfitted, needing careful post-processing. Here, we present a training-free explanation method that replaces dense masks with smooth tunable contours. A star-convex region is parameterized by a truncated Fourier series and optimized under an extremal preserve/delete objective using the classifier gradients. The approach guarantees a single, simply connected mask, cuts the number of free parameters by orders of magnitude, and yields stable boundary updates without cleanup. Restricting solutions to low-dimensional, smooth contours makes the method robust to adversarial masking artifacts. On ImageNet classifiers, it matches the extremal fidelity of dense masks while producing compact, interpretable regions with improved run-to-run consistency. Explicit area control also enables importance contour maps, yielding transparent fidelity-area profiles. Finally, we extend the approach to multiple contours and show how it can localize multiple objects within the same framework. Across benchmarks, the method achieves higher relevance mass and lower complexity than gradient- and perturbation-based baselines, with especially strong gains on self-supervised DINO models, where it improves relevance mass by over 15% and maintains positive faithfulness correlations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces Extremal Contours, a training-free visual attribution method that parameterizes a star-convex region via a truncated Fourier series on the radius function r(θ) and optimizes the coefficients under an extremal preserve/delete objective driven by classifier gradients. It claims this yields compact, simply-connected masks that match the fidelity of dense perturbation masks on ImageNet classifiers, improve relevance mass and run-to-run consistency, reduce complexity relative to gradient and perturbation baselines, and deliver especially large gains (>15% relevance mass) on DINO models while supporting explicit area control and multi-contour extensions.

Significance. If the central claims hold, the work provides a principled way to obtain faithful yet low-complexity attributions by restricting the feasible set to smooth, low-dimensional star-convex contours. The reduction from thousands of mask pixels to ~2N Fourier coefficients, the built-in guarantee of a single connected region, and the transparent fidelity-area profiles constitute concrete advances over fragmented dense-mask baselines. Strong reported gains on self-supervised DINO models and the absence of post-processing cleanup are additional strengths.
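
To make the parameter reduction concrete, the arithmetic can be sketched as below. The truncation order N = 10, the 224×224 input size, and the two extra parameters for a movable center are illustrative assumptions made here; this page does not report the values the paper uses.

```python
def contour_param_count(order, include_center=True):
    """One constant term, plus a cosine and a sine coefficient per harmonic."""
    count = 1 + 2 * order
    if include_center:
        count += 2  # assumed: the (row, col) of the contour center is also free
    return count

def dense_mask_param_count(height, width):
    """One free value per pixel for a full-resolution dense mask."""
    return height * width

n_contour = contour_param_count(order=10)       # 23
n_dense = dense_mask_param_count(224, 224)      # 50176
reduction = n_dense / n_contour
```

Under these assumptions the contour uses a few dozen numbers where a full-resolution dense mask uses tens of thousands, consistent with the "orders of magnitude" wording; dense-mask methods that optimize a coarser grid would narrow the gap.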

major comments (1)
  1. [Abstract and experimental results] The central claim that the method 'matches the extremal fidelity of dense masks' (abstract) is load-bearing yet unsupported by a direct quantitative comparison. No table or section reports the final preserve/delete objective value (or relevance mass) achieved by the Fourier-contour optimizer versus an unconstrained dense mask optimized under identical loss, gradient steps, and initialization. Because the star-convex smooth parameterization is a strict subset of all possible masks, any reported fidelity match must be verified by showing that the restricted optimum reaches the same score; otherwise the restriction may silently incur a fidelity penalty when the true extremal region contains concavities or disconnected pixels.
minor comments (2)
  1. [Abstract] The abstract states 'improved run-to-run consistency' and 'over 15% relevance mass' gains on DINO models but supplies no error bars, number of runs, or exact baseline implementations; these quantitative details should be added to the experimental section.
  2. [Method] The truncation order N of the Fourier series and the area-control parameter are listed as free parameters; their chosen values and sensitivity analysis should be reported explicitly (e.g., in a table or appendix) to allow reproduction.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. The point raised about verifying the fidelity claim with a direct comparison is well-taken, and we address it below along with plans for revision.

  1. Referee (major comment 1): [Abstract and experimental results] The claim that the method 'matches the extremal fidelity of dense masks' is load-bearing yet lacks a direct quantitative comparison against an unconstrained dense mask optimized under identical loss, gradient steps, and initialization. Because star-convex smooth contours are a strict subset of all possible masks, the restriction may incur a fidelity penalty when the true extremal region contains concavities or disconnected pixels.

    Authors: We agree that the manuscript would be strengthened by an explicit head-to-head comparison of the final preserve/delete objective (and relevance mass) between the Fourier-contour optimizer and an unconstrained dense mask under identical loss, gradient steps, and initialization. The current version reports that Extremal Contours match the fidelity of dense perturbation baselines on ImageNet classifiers and achieve higher relevance mass than gradient/perturbation methods, but does not include this specific controlled experiment against an optimized dense mask. In the revised manuscript we will add a new table (and corresponding text in the experimental section) that performs this direct comparison, reporting the achieved objective values for both approaches. This will allow readers to assess whether the star-convex restriction incurs any measurable fidelity penalty in practice.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain


The paper introduces a training-free method that parameterizes star-convex regions via truncated Fourier series and optimizes them directly under an extremal preserve/delete objective using standard classifier gradients. All load-bearing steps (parameterization, optimization, and reported fidelity gains) are defined from first principles and external benchmarks rather than reducing to self-fitted quantities, prior author results, or tautological redefinitions. No self-citation chains, ansatzes smuggled via citation, or uniqueness theorems imported from the same authors appear in the provided text; the central claims rest on empirical comparisons that remain independently falsifiable.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on a geometric constraint (star-convexity) and on classifier gradients being a useful signal for optimizing the new parameterization. This ledger was compiled from the abstract only, so details are limited.

free parameters (2)
  • Fourier series truncation order
    Controls the number of coefficients and thus the smoothness versus expressiveness of the contour boundary; chosen to achieve orders-of-magnitude parameter reduction.
  • Area control parameter
    Explicit scalar used to generate importance contour maps at varying region sizes.
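
One way to see how the two free parameters interact is through the area enclosed by a Fourier contour, which has a closed form in the coefficients. For r(θ) = a₀ + Σₖ (aₖ cos kθ + bₖ sin kθ) with r(θ) ≥ 0, the area ½∫r(θ)² dθ equals π·a₀² + (π/2)·Σₖ (aₖ² + bₖ²). The sketch below checks this numerically and rescales a contour to a target area. Rescaling is an assumption made here for illustration; the paper may enforce area through a penalty term or another mechanism.

```python
import numpy as np

def contour_area(a0, a, b):
    """Closed-form area of the region bounded by r(theta), assuming r >= 0."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.pi * a0**2 + 0.5 * np.pi * (np.sum(a**2) + np.sum(b**2))

def numeric_area(a0, a, b, samples=4096):
    """Riemann-sum estimate of 0.5 * integral of r(theta)^2 over [0, 2*pi)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    theta = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    k = np.arange(1, len(a) + 1)
    phase = theta[:, None] * k
    r = a0 + np.cos(phase) @ a + np.sin(phase) @ b
    return 0.5 * np.mean(r**2) * 2.0 * np.pi

def rescale_to_area(a0, a, b, target_area):
    """Scale all coefficients uniformly so the enclosed area hits the target."""
    s = np.sqrt(target_area / contour_area(a0, a, b))
    return s * a0, s * np.asarray(a, float), s * np.asarray(b, float)

# Hypothetical coefficients: a circle of radius 10 with two small harmonics.
a0, a, b = 10.0, [2.0, 0.0], [0.0, 1.0]
exact = contour_area(a0, a, b)
approx = numeric_area(a0, a, b)
scaled = rescale_to_area(a0, a, b, target_area=500.0)
```

Sweeping `target_area` over a range and recording the classifier score at each size is one plausible way to produce the fidelity-area profiles mentioned above.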
axioms (2)
  • Domain assumption: attribution regions can be adequately represented as star-convex sets
    Invoked to guarantee a single simply connected mask without fragmentation.
  • Domain assumption: classifier gradients supply reliable signals for boundary optimization under extremal preserve/delete objectives
    Used to drive the training-free contour updates.

pith-pipeline@v0.9.0 · 5755 in / 1529 out tokens · 49276 ms · 2026-05-18T01:22:21.845171+00:00 · methodology


