pith.

arxiv: 2605.16171 · v1 · pith:XL2UKIC6new · submitted 2026-05-15 · 💻 cs.CV

Res²CLIP: Few-Shot Generalist Anomaly Detection with Residual-to-Residual Alignment

Pith reviewed 2026-05-20 18:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords: few-shot anomaly detection, residual alignment, CLIP, generalist anomaly detection, multimodal alignment, residual space, visual prompts, cross-category generalization

The pith

Shifting multimodal alignment into CLIP's residual space resolves cross-granularity mismatch and domain shift for few-shot generalist anomaly detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the problem of detecting anomalies in novel categories using only a few samples and without retraining the model. Existing CLIP-based approaches struggle for two reasons: unified text prompts cannot handle fine foreground-background differences, and fine-tuning on auxiliary data erodes CLIP's ability to generalize across categories. The authors claim that moving all alignment operations into a single residual space lets residual features cancel out normal variations and class-specific biases while retaining anomaly signals. This insight leads to Res²CLIP, a symmetric residual-to-residual alignment framework built from text-prompt, visual-prompt, and residual-alignment branches. All learnable parts are kept inside the residual domain, and the objectives emphasize relative deviations rather than absolute class features. Experiments on multiple datasets are reported to show that the resulting model improves generalization under few-shot conditions.

Core claim

By conducting multimodal alignment entirely inside CLIP's residual space, residual representations eliminate fine-grained normal feature differences across regions and class-specific biases at once; the resulting Res²CLIP framework uses three branches (text prompt, visual prompt, residual-to-residual alignment) whose optimization forces attention to relative anomaly deviations instead of class-specific features, thereby avoiding both cross-granularity mismatch and cross-category generalization loss.
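The sketch below is a rough illustration of what a residual can look like in this few-shot setting. It assumes, in the spirit of earlier residual-based generalist methods, that a query patch's visual residual is its difference from the nearest patch among the few-shot normal references. Under that assumption, the length of the residual already behaves like a patch anomaly score: patches that some normal reference explains well get near-zero residuals. The helper names and the nearest-neighbour rule are hypothetical choices made for illustration. They are not Res²CLIP's published definition.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (CLIP features are usually compared after normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def visual_residual(query_patch: Vector, reference_patches: List[Vector]) -> List[float]:
    """Hypothetical residual: query patch minus its nearest normal reference patch (L2 distance)."""
    q = l2_normalize(query_patch)
    refs = [l2_normalize(r) for r in reference_patches]
    nearest = min(refs, key=lambda r: sum((a - b) ** 2 for a, b in zip(q, r)))
    return [a - b for a, b in zip(q, nearest)]


def residual_anomaly_map(query_patches: List[Vector], reference_patches: List[Vector]) -> List[float]:
    """Score each query patch by the length of its residual; well-explained patches score near 0."""
    scores = []
    for patch in query_patches:
        r = visual_residual(patch, reference_patches)
        scores.append(math.sqrt(sum(x * x for x in r)))
    return scores


# Toy example: two normal reference patches, and a query with one normal and one unusual patch.
refs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
query = [[2.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(residual_anomaly_map(query, refs))  # [0.0, 1.4142135623730951]
```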

What carries the argument

The residual-to-residual alignment branch that symmetrically connects visual and text residuals inside CLIP's residual space and whose objectives are designed to emphasize relative anomaly deviations.
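One plausible reading of "residual-to-residual alignment" is as follows. A visual residual is compared with a text residual, taken as the difference between an "abnormal" and a "normal" prompt embedding, and the comparison uses cosine similarity. Only the direction of deviation then matters, and the object-identity component shared by both prompts cancels. The sketch below assumes exactly that. The prompt pair, the cosine score, and every function name are hypothetical; none of them is taken from the paper's code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def subtract(a: Vector, b: Vector) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def text_residual(abnormal_prompt_emb: Vector, normal_prompt_emb: Vector) -> List[float]:
    """Hypothetical text residual: 'damaged <object>' embedding minus 'flawless <object>' embedding."""
    return subtract(abnormal_prompt_emb, normal_prompt_emb)


def residual_alignment_score(visual_res: Vector, text_res: Vector) -> float:
    """Cosine between visual and text residuals, in [-1, 1]; higher means more anomalous."""
    return cosine(visual_res, text_res)


# The object-identity component [5, 5, 0] is shared by both prompts and cancels in the text residual.
t_normal = [5.0, 5.0, 0.0]
t_abnormal = [5.0, 5.0, 1.0]
t_res = text_residual(t_abnormal, t_normal)  # -> [0.0, 0.0, 1.0]

print(residual_alignment_score([0.1, 0.0, 0.9], t_res))  # deviation along the anomaly direction
print(residual_alignment_score([0.9, 0.1, 0.0], t_res))  # deviation unrelated to the anomaly direction
```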

If this is right

  • Alignment adapts automatically to fine-grained foreground-background differences without custom prompts per region.
  • All learnable parameters remain inside the residual domain, preserving CLIP's open-world generalization.
  • Optimization objectives direct the model toward relative deviations rather than absolute class features.
  • The three-branch architecture supports symmetric bridging of visual and text modalities without domain shift from auxiliary training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The residual-space idea could be tested on other vision-language models beyond CLIP to check whether the bias-elimination effect is architecture-specific.
  • Combining the residual alignment branch with existing prompt-learning methods might further reduce the number of shots needed for new categories.
  • The focus on relative deviations suggests the framework could help in related tasks such as open-set recognition where class biases also limit generalization.

Load-bearing premise

Residual representations naturally remove fine-grained normal feature differences across regions and class-specific biases while keeping anomaly signals intact.
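The premise can be made concrete with a simple model, assumed purely for illustration; the paper's own definitions may differ. Suppose a CLIP patch feature decomposes additively into a class-identity term $c$, a normal within-class variation $n$, and an anomaly term $a$. A query patch $f_q$ and a same-class normal reference patch $f_{\mathrm{ref}}$ then give the following residual:

```latex
\begin{aligned}
f_q &= c + n_q + a, \qquad f_{\mathrm{ref}} = c + n_{\mathrm{ref}}, \\
r_v &= f_q - f_{\mathrm{ref}} = (n_q - n_{\mathrm{ref}}) + a .
\end{aligned}
```

The class term $c$ drops out exactly. The normal part of the residual is small only when $n_q \approx n_{\mathrm{ref}}$, that is, when some reference matches the query's normal appearance. Whether real CLIP features are close enough to additive for this to hold is what the experiments have to establish.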

What would settle it

A test set of completely novel anomaly categories where the residual-to-residual model shows no improvement in detection accuracy over standard CLIP prompt tuning or auxiliary fine-tuning baselines.

Figures

Figures reproduced from arXiv: 2605.16171 by Biao Leng, Jianyuan Wang, Shuo Zhang, Xinyue Liu.

Figure 1. Motivation and comparison with prior work. Existing CLIP-based methods suffer from …
Figure 2. Overall architecture of Res²CLIP, which consists of three branches in the residual domain.
    Excerpt (Sec. 3.1.2, Residual Representation in CLIP Feature Space): A core challenge in few-shot GAD lies in decoupling anomaly-state semantics from category-specific information. The original absolute feature space is dominated by object identity, making direct anomaly measurement prone to cross-domain failure. A relative represent…
Figure 3. Optimization objectives for the three branches. (a) Text branch: the adapted text residual is …
Figure 4. Anomaly map comparison with state-of-the-art methods under the 1-shot setting.
Figure 5. t-SNE visualization of the residual features of two modalities before and after optimization.
    Excerpt (Sec. 4.2, Main Results): We evaluate two settings of our method, Res²CLIP∗ and Res²CLIP†, corresponding respectively to the training-free mode, where the adapter A(·) is the identity mapping, and the fine-tune mode, where the adapters are optimized using the auxiliary training set. Quantitative Comparison. Tab. 1 compares…
Figure 6. Trade-off between inference speed, localization performance, and parameter efficiency. The …
Figure 7. Ablation on the selection of hyperparameter …
Figure 8. t-SNE visualization of text features on MVTecAD.
Figure 9. t-SNE visualization of visual features on MVTecAD.
Figure 10. More t-SNE visualizations of residual distributions in the residual branch before and after …
Figure 11. Anomaly map comparison between three branches under the 1-shot setting.
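The Main Results excerpt distinguishes two modes. In the training-free mode, the adapter A(·) is the identity mapping; in the fine-tune mode, the adapters are trained on auxiliary data. The adapter's architecture is not given here, so the sketch below assumes a hypothetical linear adapter of the form A(x) = x + Wx acting on residual vectors. With W = 0 (or no W at all) it is exactly the identity, which is one simple way a single code path could serve both modes.

```python
from typing import List, Optional, Sequence


class ResidualAdapter:
    """Hypothetical adapter A(x) = x + W x acting on residual vectors.

    weight=None (or an all-zero W) gives the identity, i.e. the training-free mode;
    a learned W would correspond to the fine-tune mode.
    """

    def __init__(self, weight: Optional[List[List[float]]] = None) -> None:
        self.weight = weight

    def __call__(self, x: Sequence[float]) -> List[float]:
        if self.weight is None:
            return list(x)  # identity mapping: training-free mode
        correction = [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]
        return [xi + ci for xi, ci in zip(x, correction)]


training_free = ResidualAdapter()
fine_tuned = ResidualAdapter(weight=[[0.0, 0.5], [0.0, 0.0]])  # toy stand-in for learned weights

r = [1.0, 2.0]
print(training_free(r))  # [1.0, 2.0]
print(fine_tuned(r))     # [2.0, 2.0]
```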
Original abstract

Few-shot Generalist Anomaly Detection requires models to generalize to novel categories without retraining, posing significant challenges in real-world scenarios with scarce samples and rapidly changing categories. Existing CLIP-based methods face two major challenges: coarse-grained unified text prompts struggle to adapt to fine-grained foreground-background differences, causing cross-granularity mismatch; and fine-tuning on auxiliary datasets disrupts CLIP's inherent open-world generalization due to domain shift, leading to cross-category generalization degradation. To address these, we propose to shift multimodal alignment entirely into a unified residual space, where residual representations naturally eliminate fine-grained normal feature differences across regions and class-specific biases, simultaneously resolving both problems. Based on this insight, Res$^2$CLIP, the first residual-to-residual alignment framework that symmetrically bridges visual and text modalities within CLIP's residual space, is designed. The framework is developed from a residual perspective into three branches: a text prompt-based branch, a visual prompt-based branch, and a novel residual-to-residual alignment branch. All learnable optimizations are constrained within the residual domain, and the residual alignment optimization objectives are designed to force the model to focus on relative anomaly deviations rather than optimizing class-specific features. Experiments on multiple datasets demonstrate the effectiveness of our architecture. The code is available at https://github.com/hito2448/Res2CLIP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Res²CLIP, a residual-to-residual alignment framework for few-shot generalist anomaly detection in CLIP. It identifies two challenges in prior CLIP-based methods—cross-granularity mismatch from coarse unified text prompts and loss of open-world generalization from fine-tuning on auxiliary data—and addresses them by shifting multimodal alignment into a unified residual space. The framework consists of three branches (text prompt-based, visual prompt-based, and residual-to-residual alignment) with all learnable optimizations constrained to the residual domain; residual alignment objectives are designed to emphasize relative anomaly deviations rather than class-specific features. Experiments on multiple datasets are reported to demonstrate effectiveness, with code released.

Significance. If the core premise holds—that operating in residual space selectively suppresses fine-grained normal variations and class biases while preserving anomaly signals—this would offer a principled way to improve generalization in few-shot anomaly detection without sacrificing CLIP’s open-vocabulary capabilities. The symmetric residual bridging and constrained optimization constitute a clean architectural contribution; public code is a positive factor for reproducibility.

major comments (2)
  1. [Abstract / Introduction] Abstract and introduction: The central claim that 'residual representations naturally eliminate fine-grained normal feature differences across regions and class-specific biases' is asserted without a formal derivation, proof sketch, or controlled ablation that isolates the residual operation from prompt engineering and training constraints. This premise directly motivates the three-branch design and residual alignment objectives; its lack of isolated validation is load-bearing for the paper’s novelty claim.
  2. [Method] Method section (three-branch framework): The optimization objectives are described as forcing focus on relative deviations, yet no equation or analysis shows how the residual-to-residual loss mathematically differs from standard contrastive alignment in suppressing normal intra-class variance while retaining anomaly signals. A concrete comparison (e.g., gradient flow or feature distribution analysis) is needed.
minor comments (2)
  1. [Experiments] The abstract states that experiments demonstrate effectiveness, but the provided text contains no quantitative results, tables, or ablation studies. Adding at least one table summarizing key metrics and a dedicated ablation on the residual branch would strengthen the manuscript.
  2. [Method] Notation for residual computation (e.g., how visual and text residuals are obtained and aligned) should be introduced with explicit equations early in the method section to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment below, indicating the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract / Introduction] Abstract and introduction: The central claim that 'residual representations naturally eliminate fine-grained normal feature differences across regions and class-specific biases' is asserted without a formal derivation, proof sketch, or controlled ablation that isolates the residual operation from prompt engineering and training constraints. This premise directly motivates the three-branch design and residual alignment objectives; its lack of isolated validation is load-bearing for the paper’s novelty claim.

    Authors: We acknowledge that the original manuscript presents this property as an empirical insight motivating the design rather than a formally derived result. In the revision we will add a dedicated subsection providing a proof sketch based on the centering effect of residual computation (subtracting a class- or region-agnostic prototype) and its impact on intra-class variance in the CLIP embedding space. We will also include a controlled ablation that isolates the residual operation while freezing prompt engineering and training constraints, directly addressing the load-bearing nature of this premise for the novelty claim. revision: yes

  2. Referee: [Method] Method section (three-branch framework): The optimization objectives are described as forcing focus on relative deviations, yet no equation or analysis shows how the residual-to-residual loss mathematically differs from standard contrastive alignment in suppressing normal intra-class variance while retaining anomaly signals. A concrete comparison (e.g., gradient flow or feature distribution analysis) is needed.

    Authors: We agree that an explicit mathematical comparison would improve clarity. In the revised manuscript we will add an analysis (main text or appendix) that contrasts the residual-to-residual loss with standard contrastive alignment, including a derivation of how the loss gradients preferentially attenuate normal intra-class directions while preserving anomaly signals. We will further support this with feature-distribution visualizations (e.g., t-SNE or variance statistics) before and after residual alignment to illustrate the selective suppression effect. revision: yes
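The "centering effect" invoked in the first response can be illustrated numerically. The sketch below uses a hypothetical one-dimensional toy with two object classes whose normal features differ by a large offset. It assumes that each query is compared with the mean of its own class's few-shot normal references. It illustrates the mechanism only; it is not the authors' promised proof sketch.

```python
from statistics import mean, pvariance
from typing import Dict, List

# Toy 1-D "features": class identity dominates (offsets 0 and 10); normal variation is small.
normal_refs: Dict[str, List[float]] = {"bottle": [0.0, 0.2, -0.2], "screw": [10.0, 10.1, 9.9]}
queries: Dict[str, List[float]] = {"bottle": [0.1, -0.1, 0.3], "screw": [10.2, 9.8, 10.0]}

# Absolute features pooled across classes: the variance is dominated by the class offset.
absolute = [x for xs in queries.values() for x in xs]

# Residuals against each class's own normal prototype (mean of its few-shot references).
prototypes = {c: mean(rs) for c, rs in normal_refs.items()}
residual = [x - prototypes[c] for c, xs in queries.items() for x in xs]

# A single prototype shared by all classes only shifts the data, so the variance is unchanged.
global_proto = mean(absolute)
shifted = [x - global_proto for x in absolute]

print(f"absolute variance:       {pvariance(absolute):.3f}")
print(f"global-shift variance:   {pvariance(shifted):.3f}")
print(f"class-residual variance: {pvariance(residual):.3f}")
```

Because variance is translation-invariant, subtracting one global prototype changes nothing. In this toy, the drop comes entirely from matching each prototype to the sample's own class.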

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper presents the shift to residual space and the claim that residual representations naturally eliminate fine-grained normal feature differences and class-specific biases as a motivating insight for the three-branch architecture, without any equations, derivations, or self-citations that reduce this premise or the resulting predictions back to the inputs by construction. The optimization objectives are described as independent constraints within the residual domain to focus on relative deviations, and effectiveness is validated through experiments on multiple datasets rather than tautological fits or renamed known results. No load-bearing self-citation chains, ansatzes smuggled via prior work, or uniqueness theorems imported from the authors appear in the provided text, making the approach self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the domain assumption that residual representations eliminate fine-grained normal differences and class biases; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption Residual representations naturally eliminate fine-grained normal feature differences across regions and class-specific biases
    This premise is invoked directly in the abstract as the basis for shifting alignment into residual space and resolving the two stated challenges.

pith-pipeline@v0.9.0 · 5784 in / 1135 out tokens · 40465 ms · 2026-05-20T18:52:22.638001+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    shift multimodal alignment entirely into a unified residual space, where residual representations naturally eliminate fine-grained normal feature differences across regions and class-specific biases

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  1. [1]

    T. Aota, L. T. T. Tong, and T. Okatani. Zero-shot versus many-shot: Unsupervised texture anomaly detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5564–5572, 2023.

  2. [2]

    P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger. MVTec AD – a comprehensive real-world dataset for unsupervised anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9592–9600, 2019.

  3. [3]

    Y. Cao, J. Zhang, L. Frittoli, Y. Cheng, W. Shen, and G. Boracchi. AdaCLIP: Adapting CLIP with hybrid learnable prompts for zero-shot anomaly detection. In European Conference on Computer Vision, pages 55–72. Springer, 2024.

  4. [4]

    X. Chen, Y. Han, and J. Zhang. A zero-/few-shot anomaly classification and segmentation method for CVPR 2023 VAND workshop challenge tracks 1&2: 1st place on zero-shot AD and 4th place on few-shot AD. arXiv preprint arXiv:2305.17382, 2023.

  5. [5]

    T. Defard, A. Setkov, A. Loesch, and R. Audigier. PaDiM: A patch distribution modeling framework for anomaly detection and localization. In International Conference on Pattern Recognition, pages 475–489. Springer, 2021.

  6. [6]

    H. Deng and X. Li. Anomaly detection via reverse distillation from one-class embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9737–9746, 2022.

  7. [7]

    Z. Fang, X. Wang, H. Li, J. Liu, Q. Hu, and J. Xiao. FastRecon: Few-shot industrial anomaly detection via fast feature reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17481–17490, 2023.

  8. [8]

    B.-B. Gao. MetaUAS: Universal anomaly segmentation with one-prompt meta-learning. Advances in Neural Information Processing Systems, 37:39812–39836, 2024.

  9. [9]

    B.-B. Gao, Y. Zhou, J. Yan, Y. Cai, W. Zhang, M. Wang, J. Liu, Y. Liu, L. Wang, and C. Wang. AdaptCLIP: Adapting CLIP for universal visual anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 4095–4103, 2026.

  10. [10]

    Z. Gu, L. Liu, X. Chen, R. Yi, J. Zhang, Y. Wang, C. Wang, A. Shu, G. Jiang, and L. Ma. Remembering normality: Memory-guided knowledge distillation for unsupervised anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16401–16409, 2023.

  11. [11]

    D. Gudovskiy, S. Ishizaka, and K. Kozuka. CFLOW-AD: Real-time unsupervised anomaly detection with localization via conditional normalizing flows. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 98–107, 2022.

  12. [12]

    H. Guo, L. Ren, J. Fu, Y. Wang, Z. Zhang, C. Lan, H. Wang, and X. Hou. Template-guided hierarchical feature restoration for anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6447–6458, 2023.

  13. [13]

    J. Guo, L. Jia, W. Zhang, H. Li, et al. ReContrast: Domain-specific anomaly detection via contrastive reconstruction. Advances in Neural Information Processing Systems, 36, 2024.

  14. [14]

    C. Huang, H. Guan, A. Jiang, Y. Zhang, M. Spratling, and Y.-F. Wang. Registration based few-shot anomaly detection. In European Conference on Computer Vision, pages 303–319. Springer, 2022.

  15. [15]

    J. Jeong, Y. Zou, T. Kim, D. Zhang, A. Ravichandran, and O. Dabeer. WinCLIP: Zero-/few-shot anomaly classification and segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19606–19616, 2023.

  16. [16]

    S. Jezek, M. Jonak, R. Burget, P. Dvorak, and M. Skotak. Deep learning-based defect detection of metal parts: Evaluating current methods in complex conditions. In 2021 13th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), pages 66–71. IEEE, 2021.

  17. [17]

    X. Li, Z. Zhang, X. Tan, C. Chen, Y. Qu, Y. Xie, and L. Ma. PromptAD: Learning prompts with only normal samples for few-shot anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16838–16848, 2024.

  18. [18]

    Y. Li, H. Wang, Y. Duan, and X. Li. CLIP Surgery for better explainability with enhancement in open-vocabulary tasks. arXiv e-prints, pages arXiv–2304, 2023.

  19. [19]

    X. Liu, J. Wang, B. Leng, and S. Zhang. Unlocking the potential of reverse distillation for anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 5640–5648,

  20. [20]

    W. Lv, Q. Su, and W. Xu. One-for-all few-shot anomaly detection via instance-induced prompt learning. In The Thirteenth International Conference on Learning Representations, 2025.

  21. [21]

    H. Ma, G. Yang, D. Zhao, Y. Ji, and W. Zuo. ReMP-AD: Retrieval-enhanced multi-modal prompt fusion for few-shot industrial visual anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20425–20434, 2025.

  22. [22]

    W. Ma, X. Zhang, Q. Yao, F. Tang, C. Wu, Y. Li, R. Yan, Z. Jiang, and S. K. Zhou. AA-CLIP: Enhancing zero-shot anomaly detection via anomaly-aware CLIP. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 4744–4754, 2025.

  23. [23]

    A. Martins and R. Astudillo. From softmax to sparsemax: A sparse model of attention and multi-label classification. In International Conference on Machine Learning, pages 1614–1623. PMLR, 2016.

  24. [24]

    P. Mishra, R. Verk, D. Fornasier, C. Piciarelli, and G. L. Foresti. VT-ADL: A vision transformer network for image anomaly detection and localization. In 2021 IEEE 30th International Symposium on Industrial Electronics (ISIE), pages 1–6. IEEE, 2021.

  25. [25]

    Z. Qu, X. Tao, M. Prasad, F. Shen, Z. Zhang, X. Gong, and G. Ding. VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation. In European Conference on Computer Vision, pages 301–317. Springer, 2024.

  26. [26]

    Z. Qu, X. Tao, X. Gong, S. Qu, Q. Chen, Z. Zhang, X. Wang, and G. Ding. Bayesian prompt flow learning for zero-shot anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 30398–30408, 2025.

  27. [27]

    Z. Qu, X. Tao, X. Gong, S. Qu, X. Zhang, X. Wang, F. Shen, Z. Zhang, M. Prasad, and G. Ding. DictAS: A framework for class-generalizable few-shot anomaly segmentation via dictionary lookup. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20519–20528, 2025.

  28. [28]

    A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.

  29. [29]

    K. Roth, L. Pemula, J. Zepeda, B. Schölkopf, T. Brox, and P. Gehler. Towards total recall in industrial anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14318–14328, 2022.

  30. [30]

    C. Xu, C. Lv, Q. Chen, F. Zhang, and Z. Zhang. MRAD: Zero-shot anomaly detection with memory-driven retrieval. arXiv preprint arXiv:2602.00522, 2026.

  31. [31]

    X. Yao, Z. Chen, C. Gao, G. Zhai, and C. Zhang. ResAD: A simple framework for class generalizable anomaly detection. Advances in Neural Information Processing Systems, 37:125287–125311, 2024.

  32. [32]

    Z. You, L. Cui, Y. Shen, K. Yang, X. Lu, Y. Zheng, and X. Le. A unified model for multi-class anomaly detection. Advances in Neural Information Processing Systems, 35:4571–4584, 2022.

  33. [33]

    G. Zhai, Y. Zhou, X. Deng, L. Heckler, N. Navab, and B. Busam. Foundation visual encoders are secretly few-shot anomaly detectors. arXiv preprint arXiv:2510.01934, 2025.

  34. [34]

    X. Zhang, N. Li, J. Li, T. Dai, Y. Jiang, and S.-T. Xia. Unsupervised surface anomaly detection with diffusion probabilistic model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6782–6791, 2023.

  35. [35]

    Q. Zhou, G. Pang, Y. Tian, S. He, and J. Chen. AnomalyCLIP: Object-agnostic prompt learning for zero-shot anomaly detection. arXiv preprint arXiv:2310.18961, 2023.

  36. [36]

    Y. Zhou, X. Xu, J. Song, F. Shen, and H. T. Shen. MSFlow: Multiscale flow-based framework for unsupervised anomaly detection. IEEE Transactions on Neural Networks and Learning Systems, 2024.

  37. [37]

    J. Zhu and G. Pang. Toward generalist anomaly detection via in-context residual learning with few-shot sample prompts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17826–17836, 2024.

  38. [38]

    Y. Zou, J. Jeong, L. Pemula, D. Zhang, and O. Dabeer. Spot-the-difference self-supervised pre-training for anomaly detection and segmentation. In European Conference on Computer Vision, pages 392–408. Springer, 2022.

The remaining extracted items ([39]–[42]) are not references. They are fragments of the paper's Appendix A, "Theoretical Derivations", which opens: "Notation. Throughout this appendix we adopt the following conventions consistent with the main..." The fragments describe the sparsemax operator of Martins and Astudillo [23] for a vector $z \in \mathbb{R}^M$:

  1. Sort the entries of $z$ in descending order: $z_{(1)} \ge z_{(2)} \ge \cdots \ge z_{(M)}$.
  2. Identify the support size $k(z) = \max\{\, m \in [M] : 1 + m\, z_{(m)} > \sum_{j=1}^{m} z_{(j)} \,\}$.
  3. Compute the threshold $\tau(z) = \big(\sum_{j=1}^{k(z)} z_{(j)} - 1\big) / k(z)$.
  4. Output the sparse weights $\mathrm{Sparsemax}(z)_i = \max\{0,\, z_i - \tau(z)\}$.

Entries with $z_i \le \tau(z)$ are mapped to exactly zero, while the remaining entries form a probability distribution that sums to 1. Application to reference retrieval. Applying the operator above row-wise to $S_l$ yields a sparse weight matrix $W_l \in \mathbb{R}^{N \times (K \cdot N)}$, where each query row contains a small number o...
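The sparsemax steps in the appendix fragment above map directly onto code. The sketch below is a plain-Python version of that algorithm, applied row-wise to a similarity matrix as in the retrieval fragment. The helper name sparsemax_rows and the toy inputs are illustrative. The truncated text does not say how the resulting weights are used afterwards, so the sketch stops at the weights.

```python
from typing import List, Sequence


def sparsemax(z: Sequence[float]) -> List[float]:
    """Euclidean projection of z onto the probability simplex (Martins & Astudillo, 2016)."""
    zs = sorted(z, reverse=True)               # step 1: z_(1) >= ... >= z_(M)
    cumsum = 0.0
    k, cumsum_k = 0, 0.0
    for m, zm in enumerate(zs, start=1):        # step 2: largest m with 1 + m*z_(m) > sum_{j<=m} z_(j)
        cumsum += zm
        if 1.0 + m * zm > cumsum:               # always true for m = 1, so k >= 1
            k, cumsum_k = m, cumsum
    tau = (cumsum_k - 1.0) / k                  # step 3: threshold tau(z)
    return [max(0.0, zi - tau) for zi in z]     # step 4: entries <= tau become exactly 0


def sparsemax_rows(similarity: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: apply sparsemax to each query row of a similarity matrix."""
    return [sparsemax(row) for row in similarity]


print(sparsemax([1.0, 0.5, -1.0]))  # [0.75, 0.25, 0.0]
print(sparsemax_rows([[3.0, 1.0, 0.0], [0.0, 0.0, 0.0]]))
```

Unlike softmax, a row dominated by one strong match (the first row of the toy matrix) puts all of its weight on that entry, so only a few references contribute per query.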