Dual Feature Decoupling for Fine-Grained OOD Detection

Qingji Guan; Xiaokun Li; Yaping Huang

arxiv: 2606.05536 · v1 · pith:BYXNOMDPnew · submitted 2026-06-04 · 💻 cs.CV

Xiaokun Li, Yaping Huang, Qingji Guan

Pith reviewed 2026-06-28 03:05 UTC · model grok-4.3

classification 💻 cs.CV

keywords fine-grained OOD detection · feature disentanglement · spatial-frequency decoupling · reconstruction-guided decoupling · out-of-distribution detection · image classification · DFDNet · feature decoupling

The pith

Dual feature decoupling separates content from style and low-level noise to improve out-of-distribution detection in fine-grained image tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Dual Feature Decoupling Network to address out-of-distribution detection in fine-grained tasks where classes differ only subtly. It argues that separating discriminative content from irrelevant style and background information through two specific modules allows models to better identify unknown samples. Standard methods struggle here because they assume large distributional differences between classes. If this holds, it would enable more reliable deployment of classifiers in domains like medical imaging and vehicle recognition where fine distinctions matter and unknowns are common.

Core claim

The central claim is that the Dual Feature Decoupling Network, consisting of a spatial-frequency decoupling module and a reconstruction-guided decoupling module, achieves improved out-of-distribution detection for fine-grained image classification by preserving content features while suppressing style and low-level non-discriminative information.

What carries the argument

The Dual Feature Decoupling Network (DFDNet) with its spatial-frequency decoupling module, which preserves discriminative content and suppresses style, and reconstruction-guided decoupling module, which uses pixel-level adversarial reconstruction to remove low-level information.
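The idea of separating content from style in the frequency domain can be illustrated generically. This is not DFDNet's SFD module: the page does not give the module's internals, so the sketch assumes the convention, common in frequency-based domain generalization work, that Fourier amplitude carries mostly style-like statistics and phase carries mostly content-like structure. All function names are hypothetical, and the naive transform is written out in plain Python only to keep the example dependency-free; a practical version would apply an FFT library to feature maps.

```python
import cmath
import math


def dft2(img):
    """Naive 2-D discrete Fourier transform of a small square grid (list of lists)."""
    n = len(img)
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for x in range(n):
                for y in range(n):
                    angle = -2.0 * math.pi * (u * x + v * y) / n
                    total += img[x][y] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def idft2(spec):
    """Inverse of dft2; returns the real part of the reconstructed grid."""
    n = len(spec)
    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            total = 0j
            for u in range(n):
                for v in range(n):
                    angle = 2.0 * math.pi * (u * x + v * y) / n
                    total += spec[u][v] * cmath.exp(1j * angle)
            out[x][y] = (total / (n * n)).real
    return out


def split_amplitude_phase(img):
    """Return (amplitude, phase) grids of the 2-D spectrum."""
    spec = dft2(img)
    amplitude = [[abs(c) for c in row] for row in spec]
    phase = [[cmath.phase(c) for c in row] for row in spec]
    return amplitude, phase


def recombine(amplitude, phase):
    """Rebuild a spatial grid from amplitude and phase grids."""
    spec = [
        [cmath.rect(a, p) for a, p in zip(a_row, p_row)]
        for a_row, p_row in zip(amplitude, phase)
    ]
    return idft2(spec)


def swap_style(content_img, style_img):
    """Keep the phase of content_img and take the amplitude of style_img (assumed convention)."""
    _, content_phase = split_amplitude_phase(content_img)
    style_amplitude, _ = split_amplitude_phase(style_img)
    return recombine(style_amplitude, content_phase)
```

Under this assumed convention, a style-suppressing module would normalize or replace the amplitude while leaving the phase untouched, which is the operation `swap_style` makes explicit.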

If this is right

Enhances OOD detection accuracy on fine-grained datasets with high visual similarity among classes.
Reduces the impact of background factors and task-irrelevant information on detection performance.
Provides competitive improvements over existing methods across multiple evaluation datasets.
Supports better handling of subtle variations in applications requiring fine-grained classification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying the same decoupling strategy to other modalities such as text or audio could extend the benefits to non-image fine-grained OOD tasks.
Integrating this network with existing OOD scoring functions might yield further gains without altering the core architecture.
Testing the method on datasets with even finer granularity or more complex backgrounds would reveal its scalability limits.
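For the second extension above, the scoring functions in question are post-hoc rules applied to a classifier's logits. A minimal sketch of two well-known ones follows: maximum softmax probability (the baseline of Hendrycks and Gimpel) and the energy score (Liu et al.), both of which appear in the paper's reference list. The function names, the threshold, and the idea of feeding DFDNet logits into them are illustrative assumptions, not something the paper is shown to do on this page.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability; higher suggests in-distribution."""
    return max(softmax(logits))


def energy_score(logits, temperature=1.0):
    """Negative free energy, T * logsumexp(logits / T); higher suggests in-distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    return temperature * (m + math.log(sum(math.exp(z - m) for z in scaled)))


def is_ood(score, threshold):
    """Flag a sample as OOD when its score falls below a chosen threshold (hypothetical rule)."""
    return score < threshold
```

Because both scores only consume logits, swapping one for the other would leave the network architecture unchanged, which is the point of the extension.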

Load-bearing premise

The spatial-frequency and reconstruction-guided modules achieve effective disentanglement of discriminative features from non-discriminative ones, and this disentanglement is what drives the OOD detection improvements.

What would settle it

An ablation study on fine-grained OOD benchmarks where removing either the spatial-frequency or reconstruction-guided module shows no significant drop in detection performance.

Figures

Figures reproduced from arXiv: 2606.05536 by Qingji Guan, Xiaokun Li, Yaping Huang.

**Figure 2.** Overview of the DFDNet. We employ ResNet as share-weight backbone for extracting features. The spatial-frequency decoupling (SFD) modules are …

**Figure 3.** Illustration of the proposed Spatial-Frequency Decoupling (SFD) …

**Figure 4.** Structure of the reconstruction-guided decoupling (RGD) module.

**Figure 5.** Experiments on the split 0 of North American Birds dataset. (a) …

**Figure 6.** The influence of varying thresholds τ in class activation mappings on model performance.

… #4) can effectively preserve content-relevant discriminative features while removing task-irrelevant features. In order to further remove the style information, we add the Fourier decoupling (model #5), which further improves the coarse-grained detection performance about 1.5%. To further enhance the discriminative ca…
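The Figure 6 caption refers to a threshold τ applied to class activation mappings. How the paper normalizes the map and which side of τ it keeps are not stated on this page, so the following is only a sketch under assumptions: the map is min-max normalized to [0, 1] and locations at or above τ are kept. All names are hypothetical.

```python
def normalize_cam(cam):
    """Min-max normalize a 2-D activation map (list of lists) to the range [0, 1]."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat map carries no localization signal; map it to zeros.
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


def cam_mask(cam, tau):
    """Binary mask that keeps locations whose normalized activation is at least tau."""
    norm = normalize_cam(cam)
    return [[1 if v >= tau else 0 for v in row] for row in norm]


def apply_mask(feature_map, mask):
    """Zero out feature-map locations that fall outside the mask."""
    return [
        [f * m for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(feature_map, mask)
    ]
```

Raising `tau` shrinks the retained region, which is the trade-off a threshold sweep like the one in Figure 6 would probe.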

Original abstract

Out-of-distribution detection (OOD) is an indispensable technique when applying machine learning models to real-world scenarios. Most existing OOD detection methods have been developed under the idealized assumption of large inter-class distributional differences, while largely overlooking fine-grained tasks characterized by subtle variations, such as medical image classification and vehicle recognition. The high visual similarity among fine-grained subcategories, together with the interference of background factors, makes OOD detection extremely challenging. To tackle this problem, we propose a novel Dual Feature Decoupling Network (DFDNet), which addresses fine-grained OOD detection from the perspective of feature disentanglement. The proposed DFDNet comprises two key components: a spatial-frequency decoupling module and a reconstruction-guided decoupling module. The spatial-frequency decoupling module is designed to preserve content features that are discriminative for classification while suppressing task-irrelevant style information. On the other hand, the reconstruction-guided decoupling module introduces a novel pixel-level adversarial reconstruction task to further remove low-level, non-discriminative information and enhance category-specific high-level semantic representations. Extensive experiments demonstrate that our method achieves competitive performance improvements on multiple datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

DFDNet pairs a spatial-frequency module with a reconstruction-guided one to tackle fine-grained OOD, but the abstract supplies no numbers or baselines so the gains stay unverified.

The one or two things your colleague should know: this paper puts forward DFDNet with a spatial-frequency decoupling module that keeps classification content while dropping style, plus a reconstruction-guided module that adds adversarial pixel-level reconstruction to strip low-level non-discriminative information. The goal is better OOD performance on fine-grained tasks where classes look alike and backgrounds add noise.

What is actually new is the particular dual-module combination aimed at this setting. It extends existing disentanglement ideas rather than deriving something from first principles, but the pairing for fine-grained OOD is the concrete step.

The paper does well at naming the gap. Standard OOD work often assumes large distributional differences, which does not hold for medical images or vehicle subtypes, and the modules are described clearly enough to see the intended mechanism.

The soft spots sit in the evidence. The abstract states that experiments show competitive improvements on multiple datasets, yet it contains no quantitative results, error bars, baseline tables, or dataset names. Without those, it is impossible to judge whether the modules produce the claimed disentanglement or whether any gains trace to other training details. The weakest assumption remains that the design choices directly cause better separation of discriminative from non-discriminative features. The stress-test found no internal contradictions in the argument structure, which is positive, but that does not replace seeing the actual numbers and controls.

This paper is for computer vision people who work on OOD in fine-grained classification. A reader already thinking about feature disentanglement for robustness might pick up the architecture idea.

It deserves a serious referee to examine the full experiments, ablations, and reproducibility details. I would recommend sending it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper proposes Dual Feature Decoupling Network (DFDNet) for fine-grained out-of-distribution (OOD) detection. It introduces a spatial-frequency decoupling module to preserve discriminative content features while suppressing task-irrelevant style information, and a reconstruction-guided decoupling module that uses a pixel-level adversarial reconstruction task to remove low-level non-discriminative information and enhance high-level semantic representations. The central claim is that this dual decoupling improves OOD detection performance under high visual similarity among fine-grained categories, with extensive experiments demonstrating competitive improvements on multiple datasets.

Significance. If the claimed disentanglement mechanism is shown to drive the gains (rather than ancillary training choices), the work could meaningfully extend OOD detection to fine-grained domains such as medical imaging and vehicle recognition, where existing methods assuming large inter-class gaps are insufficient. The introduction of explicit spatial-frequency and reconstruction-guided modules provides a concrete architectural direction that could be adopted or extended by others.

major comments (2)

[§4, Table 2] §4 (Experiments), Table 2: the reported AUROC improvements over baselines are presented without error bars or statistical significance tests across the N runs; this weakens the claim of 'competitive performance improvements' because it is impossible to assess whether the gains are robust or within variance of the baselines.
[§3.2] §3.2 (Reconstruction-guided decoupling module): the adversarial reconstruction loss is defined without an explicit analysis of how the pixel-level reconstruction objective guarantees removal of only non-discriminative low-level information rather than also discarding useful high-level cues; an ablation isolating the contribution of this loss versus standard reconstruction would be needed to support the mechanism.
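To make the ablation requested in the second comment concrete, the two objectives being compared can be written side by side. The notation is an assumption for illustration (encoder $E$, decoder $D$, classifier $C$, weight $\lambda$, squared-error reconstruction); the paper's actual loss in §3.2 is not reproduced on this page and may differ.

```latex
% Standard (cooperative) reconstruction: encoder and decoder both minimize the pixel error.
\min_{E,\,C,\,D}\; \mathcal{L}_{\mathrm{cls}}\bigl(C(E(x)),\,y\bigr)
  + \lambda\,\lVert D(E(x)) - x \rVert_2^2

% Adversarial reconstruction (hypothetical form): the decoder still minimizes the pixel error,
\min_{D}\; \lVert D(E(x)) - x \rVert_2^2

% while the encoder is rewarded when reconstruction fails.
\min_{E,\,C}\; \mathcal{L}_{\mathrm{cls}}\bigl(C(E(x)),\,y\bigr)
  - \lambda\,\lVert D(E(x)) - x \rVert_2^2
```

Under this reading, the ablation flips the sign of the reconstruction term for the encoder and checks whether OOD performance changes. The adversarial term by itself does not say which information the encoder discards, which is the gap the comment points at.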

minor comments (2)

[Abstract] The abstract and §1 claim 'competitive performance improvements' but do not specify the exact metrics or datasets in the opening paragraphs; moving a concise quantitative summary to the abstract would improve readability.
[§3] Notation for the two modules (SFD and RGD) is introduced in §3 but not used consistently in the experimental tables; uniform naming would reduce reader confusion.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive assessment. The comments highlight important aspects of experimental reporting and mechanistic analysis that we will address. Below we respond point-by-point to the major comments.

Referee: [§4, Table 2] §4 (Experiments), Table 2: the reported AUROC improvements over baselines are presented without error bars or statistical significance tests across the N runs; this weakens the claim of 'competitive performance improvements' because it is impossible to assess whether the gains are robust or within variance of the baselines.

Authors: We agree that the absence of error bars and statistical tests in Table 2 limits the ability to evaluate robustness. In the revised manuscript we will report results over multiple random seeds (N=5), including mean AUROC and standard deviation for all methods, and will add paired t-tests or Wilcoxon tests with p-values to assess significance of the observed improvements.

revision: yes

Referee: [§3.2] §3.2 (Reconstruction-guided decoupling module): the adversarial reconstruction loss is defined without an explicit analysis of how the pixel-level reconstruction objective guarantees removal of only non-discriminative low-level information rather than also discarding useful high-level cues; an ablation isolating the contribution of this loss versus standard reconstruction would be needed to support the mechanism.

Authors: We acknowledge that the current text provides limited mechanistic justification for why the adversarial reconstruction selectively removes low-level non-discriminative information. In the revision we will expand §3.2 with a theoretical discussion of the adversarial objective's effect on feature scales, supported by additional feature-map visualizations. We will also add an ablation study that directly compares the proposed adversarial reconstruction loss against a standard (non-adversarial) reconstruction loss to isolate its contribution to OOD performance.

revision: yes
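The reporting promised in the first response (mean and standard deviation of AUROC over seeds, plus a paired test) can be sketched with the standard library alone. The function names are hypothetical and no numbers from the paper are used. Only the paired t statistic is computed here; turning it into a p-value, or running the Wilcoxon alternative, would normally be left to a statistics library.

```python
import math
import statistics


def auroc(id_scores, ood_scores):
    """Probability that a random ID sample outscores a random OOD sample (ties count half)."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def mean_and_std(values):
    """Mean and sample standard deviation of per-seed results."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(method_a, method_b):
    """Paired t statistic for per-seed results of two methods run on the same seeds."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        raise ValueError("all paired differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
```

With N=5 seeds the statistic has 4 degrees of freedom, so a reported gain would need to be large relative to its seed-to-seed spread to count as significant.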

Circularity Check

0 steps flagged

No significant circularity detected

The paper proposes DFDNet with two novel modules (spatial-frequency decoupling and reconstruction-guided decoupling) for feature disentanglement in fine-grained OOD detection. Claims rest on architectural design choices and empirical results across datasets rather than any mathematical derivation chain. No equations reduce outputs to fitted parameters by construction, no self-citations serve as load-bearing uniqueness theorems, and no ansatzes or renamings of prior results are invoked to force the central claims. The argument is self-contained in its empirical validation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The central claim rests on the unverified effectiveness of the two invented modules and on the assumption that standard OOD benchmarks capture the fine-grained regime. No free parameters, axioms, or invented physical entities are stated in the abstract; the modules themselves function as new algorithmic components whose independent evidence is limited to the summarized experiments.

invented entities (2)

spatial-frequency decoupling module · no independent evidence
purpose: preserve content features while suppressing style information
New component introduced to address fine-grained OOD; no independent evidence outside the paper's experiments.

reconstruction-guided decoupling module · no independent evidence
purpose: remove low-level non-discriminative information via pixel-level adversarial reconstruction
New component introduced to enhance high-level semantic representations; no independent evidence outside the paper's experiments.

[1] [1]

Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,

K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034

2015

[2] [2]

A baseline for detecting misclassified and out-of-distribution examples in neural networks,

D. Hendrycks and K. Gimpel, “A baseline for detecting misclassified and out-of-distribution examples in neural networks,” in International Conference on Learning Representations, 2017. [Online]. Available: https://openreview.net/forum?id=Hkg4TI9xl

2017

[3] [3]

Mood: Multi-level out-of-distribution detection,

Z. Lin, S. D. Roy, and Y . Li, “Mood: Multi-level out-of-distribution detection,” in Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 2021, pp. 15 313–15 323

2021

[4] [4]

Mitigating neural network overconfidence with logit normalization,

H. Wei, R. Xie, H. Cheng, L. Feng, B. An, and Y . Li, “Mitigating neural network overconfidence with logit normalization,” inInternational Conference on Machine Learning. PMLR, 2022, pp. 23 631–23 644

2022

[5] [5]

A simple unified framework for detecting out-of-distribution samples and adversarial attacks,

K. Lee, K. Lee, H. Lee, and J. Shin, “A simple unified framework for detecting out-of-distribution samples and adversarial attacks,” Advances in neural information processing systems, vol. 31, 2018

2018

[6] [6]

Uncertainty estima- tion using a single deep deterministic neural network,

J. Van Amersfoort, L. Smith, Y . W. Teh, and Y . Gal, “Uncertainty estima- tion using a single deep deterministic neural network,” in International conference on machine learning. PMLR, 2020, pp. 9690–9700

2020

[7] [7]

Detecting out-of-distribution examples with gram matrices,

C. S. Sastry and S. Oore, “Detecting out-of-distribution examples with gram matrices,” in International Conference on Machine Learning. PMLR, 2020, pp. 8491–8501

2020

[8] [8]

Learning multiple layers of features from tiny images,

A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009

2009

[9] [9]

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

F. Yu, A. Seff, Y . Zhang, S. Song, T. Funkhouser, and J. Xiao, “Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop,” arXiv preprint arXiv:1506.03365, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[10] [10]

arXiv preprint arXiv:1706.02690 (2017)

S. Liang, Y . Li, and R. Srikant, “Enhancing the reliability of out- of-distribution image detection in neural networks,” arXiv preprint arXiv:1706.02690, 2017

work page arXiv 2017

[11] [11]

Likelihood regret: An out-of-distribution detection score for variational auto-encoder,

Z. Xiao, Q. Yan, and Y . Amit, “Likelihood regret: An out-of-distribution detection score for variational auto-encoder,” Advances in neural information processing systems, vol. 33, pp. 20 685–20 696, 2020

2020

[12] [12]

Out-of-distribution detection using union of 1- dimensional subspaces,

A. Zaeemzadeh, N. Bisagno, Z. Sambugaro, N. Conci, N. Rah- navard, and M. Shah, “Out-of-distribution detection using union of 1- dimensional subspaces,” in Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 2021, pp. 9452–9461

2021

[13] [13]

A simple fix to Mahalanobis distance for improving near-OOD detection,

J. Ren, S. Fort, J. Liu, A. G. Roy, S. Padhy, and B. Lakshminarayanan, “A simple fix to Mahalanobis distance for improving near-OOD detection,” arXiv preprint arXiv:2106.09022, 2021

2021

[14] [14]

Vim: Out-of-distribution with virtual-logit matching,

H. Wang, Z. Li, L. Feng, and W. Zhang, “Vim: Out-of-distribution with virtual-logit matching,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 4921–4930

2022

[15] [15]

Leveraging perturbation robustness to enhance out-of-distribution detection,

W. Chen, R. A. Yeh, S. Mou, and Y. Gu, “Leveraging perturbation robustness to enhance out-of-distribution detection,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 4724–4733

2025

[16] [16]

Mixture outlier exposure: Towards out-of-distribution detection in fine-grained environments,

J. Zhang, N. Inkawhich, R. Linderman, Y. Chen, and H. Li, “Mixture outlier exposure: Towards out-of-distribution detection in fine-grained environments,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 5531–5540

2023

[17] [17]

Energy-based out-of-distribution detection,

W. Liu, X. Wang, J. Owens, and Y. Li, “Energy-based out-of-distribution detection,” Advances in neural information processing systems, vol. 33, pp. 21 464–21 475, 2020

2020

[18] [18]

Decoupling maxlogit for out-of-distribution detection,

Z. Zhang and X. Xiang, “Decoupling maxlogit for out-of-distribution detection,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 3388–3397

2023

[19] [19]

Sle: Out-of-distribution detection with shallow layer-driven enhancement,

Z. Yang, C. Liu, and X. Qian, “Sle: Out-of-distribution detection with shallow layer-driven enhancement,” IEEE Transactions on Multimedia, 2025

2025

[20] [20]

Multimodal classification and out-of-distribution detection for multimodal intent understanding,

H. Zhang, Q. Zhou, H. Xu, J. Su, R. Evans, and K. Gao, “Multimodal classification and out-of-distribution detection for multimodal intent understanding,” IEEE Transactions on Multimedia, 2025

2025

[21] [21]

Class incremental learning for image classification with out-of-distribution task identification,

X. Cao, H. Lu, X. Liu, and M.-M. Cheng, “Class incremental learning for image classification with out-of-distribution task identification,” IEEE Transactions on Multimedia, 2025

2025

[22] [22]

Learning confidence for out-of-distribution detection in neural networks,

T. DeVries and G. W. Taylor, “Learning confidence for out-of-distribution detection in neural networks,” 2018

2018

[23] [23]

Musia: Exploiting multi-source information fusion with abnormal activations for out-of-distribution detection,

H. yang Lu, X. Guo, W. Jiang, C. Fan, Y. Du, Z. Shao, W. Fang, and X. Wu, “Musia: Exploiting multi-source information fusion with abnormal activations for out-of-distribution detection,” Neural Networks, vol. 188, p. 107427, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0893608025003065

2025

[24] [24]

Augmix: A simple data processing method to improve robustness and uncertainty,

D. Hendrycks, N. Mu, E. D. Cubuk, B. Zoph, J. Gilmer, and B. Lakshminarayanan, “Augmix: A simple data processing method to improve robustness and uncertainty,” arXiv preprint arXiv:1912.02781, 2019

2019

[25] [25]

Certifiably adversarially robust detection of out-of-distribution data,

J. Bitterwolf, A. Meinke, and M. Hein, “Certifiably adversarially robust detection of out-of-distribution data,” Advances in Neural Information Processing Systems, vol. 33, pp. 16 085–16 095, 2020

2020

[26] [26]

On mixup training: Improved calibration and predictive uncertainty for deep neural networks,

S. Thulasidasan, G. Chennupati, J. A. Bilmes, T. Bhattacharya, and S. Michalak, “On mixup training: Improved calibration and predictive uncertainty for deep neural networks,” Advances in Neural Information Processing Systems, vol. 32, 2019

2019

[27] [27]

Deep Anomaly Detection with Outlier Exposure

D. Hendrycks, M. Mazeika, and T. Dietterich, “Deep anomaly detection with outlier exposure,” arXiv preprint arXiv:1812.04606, 2018

2018

[28] [28]

Generalized outlier exposure: Towards a trustworthy out-of-distribution detector without sacrificing accuracy,

J. Koo, S. Choi, and S. Hwang, “Generalized outlier exposure: Towards a trustworthy out-of-distribution detector without sacrificing accuracy,” Neurocomputing, vol. 577, p. 127371, 2024

2024

[29] [29]

Atom: Robustifying out-of-distribution detection using outlier mining,

J. Chen, Y. Li, X. Wu, Y. Liang, and S. Jha, “Atom: Robustifying out-of-distribution detection using outlier mining,” 2021

2021

[30] [30]

Domain agnostic learning with disentangled representations,

X. Peng, Z. Huang, X. Sun, and K. Saenko, “Domain agnostic learning with disentangled representations,” in International Conference on Machine Learning. PMLR, 2019, pp. 5102–5112

2019

[31] [31]

Disentangled representation for age-invariant face recognition: A mutual information minimization perspective,

X. Hou, Y. Li, and S. Wang, “Disentangled representation for age-invariant face recognition: A mutual information minimization perspective,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3692–3701

2021

[32] [32]

Decompose, adjust, compose: Effective normalization by playing with frequency for domain generalization,

S. Lee, J. Bae, and H. Y. Kim, “Decompose, adjust, compose: Effective normalization by playing with frequency for domain generalization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11 776–11 785

2023

[33] [33]

Crada: Cross domain object detection with cyclic reconstruction and decoupling adaptation,

Y. Liu, J. Wang, W. Wang, Y. Hu, Y. Wang, and Y. Xu, “Crada: Cross domain object detection with cyclic reconstruction and decoupling adaptation,” IEEE Transactions on Multimedia, vol. 26, pp. 6250–6261, 2024

2024

[34] [34]

Multi-layer decoupling attention network for weakly supervised object localization,

A. Zhang, Z. Ling, and Y. Wang, “Multi-layer decoupling attention network for weakly supervised object localization,” IEEE Transactions on Multimedia, vol. 26, pp. 4469–4479, 2023

2023

[35] [35]

Batch normalization: Accelerating deep network training by reducing internal covariate shift,

S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning. PMLR, 2015, pp. 448–456

2015

[36] [36]

Improved texture networks: Maximizing quality and diversity in feed-forward stylization and texture synthesis,

D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Improved texture networks: Maximizing quality and diversity in feed-forward stylization and texture synthesis,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 6924–6932

2017

[37] [37]

Learning deep features for discriminative localization,

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning deep features for discriminative localization,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2921–2929

2016

[38] [38]

Fine-Grained Visual Classification of Aircraft

S. Maji, E. Rahtu, J. Kannala, M. Blaschko, and A. Vedaldi, “Fine-grained visual classification of aircraft,” arXiv preprint arXiv:1306.5151, 2013

2013

[39] [39]

3d object representations for fine-grained categorization,

J. Krause, M. Stark, J. Deng, and L. Fei-Fei, “3d object representations for fine-grained categorization,” in Proceedings of the IEEE international conference on computer vision workshops, 2013, pp. 554–561

2013

[40] [40]

Fine-grained representation learning and recognition by exploiting hierarchical semantic embedding,

T. Chen, W. Wu, Y. Gao, L. Dong, X. Luo, and L. Lin, “Fine-grained representation learning and recognition by exploiting hierarchical semantic embedding,” in Proceedings of the 26th ACM International Conference on Multimedia, ser. MM ’18. New York, NY, USA: Association for Computing Machinery, ...

doi:10.1145/3240508.3240523 2018

[41] [41]

Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection,

G. Van Horn, S. Branson, R. Farrell, S. Haber, J. Barry, P. Ipeirotis, P. Perona, and S. Belongie, “Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 595–604

2015

[42] [42]

WebVision Database: Visual Learning and Understanding from Web Data

W. Li, L. Wang, W. Li, E. Agustsson, and L. Van Gool, “Webvision database: Visual learning and understanding from web data,” arXiv preprint arXiv:1708.02862, 2017

2017

[43] [43]

Detecting semantic anomalies,

F. Ahmed and A. Courville, “Detecting semantic anomalies,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, 2020, pp. 3154–3162

2020

[44] [44]

Scaling for training time and post-hoc out-of-distribution detection enhancement,

K. Xu, R. Chen, G. Franchi, and A. Yao, “Scaling for training time and post-hoc out-of-distribution detection enhancement,” arXiv preprint arXiv:2310.00227, 2023

2023

[45] [45]

SGDR: Stochastic Gradient Descent with Warm Restarts

I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” arXiv preprint arXiv:1608.03983, 2016

2016