pith. sign in

arxiv: 2506.20263 · v2 · pith:BUZ4BAB6new · submitted 2025-06-25 · 💻 cs.CV

Hierarchical Mask-Enhanced Dual Reconstruction Network for Few-Shot Fine-Grained Image Classification

classification 💻 cs.CV
keywords mask-enhancedmethodsreconstructionfeaturefine-grainedhierarchicalhmdrninformation
0
0 comments X
read the original abstract

Few-shot fine-grained image classification (FS-FGIC) is challenging as it requires distinguishing visually similar subclasses with extremely limited labeled examples. Existing methods suffer from critical limitations: metric-based methods lose spatial information and misalign local features, while reconstruction-based methods underuse hierarchical feature information and lack selective focus on discriminative key regions. We propose the Hierarchical Mask-enhanced Dual Reconstruction Network (HMDRN), integrating dual-layer feature reconstruction with mask-enhanced feature processing. HMDRN leverages complementary visual information from different network hierarchies via learnable weights, balancing high-level semantic representations with mid-level structural details. It incorporates a spatial binary mask-enhanced transformer module that selectively enhances discriminative regions while filtering background noise. On three fine-grained datasets, HMDRN consistently outperforms state-of-the-art methods with both Conv-4 and ResNet-12 backbones. Ablation studies validate each component's effectiveness, showing dual-layer reconstruction enhances inter-class discrimination while mask-enhanced transformation reduces intra-class variations.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.