Geometry-Guided Self-Supervision for Ultra-Fine-Grained Recognition with Limited Data

Haojie Li; Mahsa Baktashmotlagh; Shijie Wang; Yadan Luo; Zi Huang; Zijian Wang

arXiv: 2604.19345 · v1 · submitted 2026-04-21 · cs.CV

Shijie Wang, Yadan Luo, Zijian Wang, Haojie Li, Zi Huang, Mahsa Baktashmotlagh

Pith reviewed 2026-05-10 02:44 UTC · model grok-4.3

classification: cs.CV

keywords: ultra-fine-grained visual categorization · self-supervised learning · geometric attributes · limited data · fine-grained recognition · geometric descriptors · polar coordinates

The pith

GAEor extracts category-specific geometric attributes to improve ultra-fine-grained recognition when labeled data is scarce.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Geometric Attribute Exploration Network (GAEor) as a self-supervised framework for ultra-fine-grained visual categorization in limited-data regimes. It generates geometric attributes by first amplifying geometry-relevant details through visual feedback from a backbone network, then embedding the relative polar coordinates of those details into the learned representation. The approach rests on the premise that each category possesses distinct geometric descriptors, such as vein patterns in leaves, which remain useful even when visual differences are minimal. A sympathetic reader would care because this supplies recognition cues beyond the subtle appearance variations that most prior methods target. The authors report that GAEor sets new state-of-the-art results on five standard Ultra-FGVC benchmarks.

Core claim

GAEor is a general self-supervised framework that generates geometric attributes as novel recognition cues for ultra-fine-grained visual categorization in data-limited scenarios. These attributes are determined by details aligned with an object's geometric patterns. The network discovers them by amplifying geometry-relevant details via visual feedback from a backbone network and then embedding the relative polar coordinates of these details into the final representation.
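A minimal sketch of what "relative polar coordinates" of details could look like, assuming each amplified detail has already been reduced to an (x, y) location and that coordinates are measured from the details' centroid. The reference point, the angle convention and the helper name `relative_polar` are illustrative assumptions; the summary does not say which reference GAEor uses.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) location of one amplified detail (assumed representation)


def relative_polar(points: List[Point]) -> List[Tuple[float, float]]:
    """Return (radius, angle) for each point, measured from the points' centroid.

    Hypothetical helper: using the centroid as the reference and atan2 angles
    in radians (range -pi..pi) are assumptions made for illustration.
    """
    if not points:
        return []
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    polar = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        polar.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return polar
```

Because every position is taken relative to the centroid, shifting the whole object within the frame leaves the (radius, angle) pairs unchanged; that translation invariance is one plausible reason to prefer relative over absolute coordinates.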

What carries the argument

The Geometric Attribute Exploration Network (GAEor), which amplifies geometry-relevant details through backbone feedback and embeds relative polar coordinates to capture distinct per-category geometric descriptors.

Load-bearing premise

Each ultra-fine-grained category possesses distinct geometric descriptors that can be reliably amplified from limited data without overfitting.
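One way to probe this premise independently of any classifier is to check whether per-image geometric descriptors cluster by category. The sketch below assumes descriptors are fixed-length numeric vectors and uses Euclidean distance; the ratio it computes (mean distance between class centroids over mean within-class spread) and the name `separability_ratio` are illustrative choices, not part of GAEor.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def _centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def separability_ratio(descriptors: List[Sequence[float]], labels: List[str]) -> float:
    """Mean distance between class centroids divided by mean within-class spread.

    Hypothetical diagnostic: values well above 1 suggest category-specific
    descriptors; values near or below 1 suggest the categories overlap.
    """
    groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for d, y in zip(descriptors, labels):
        groups[y].append(d)
    if len(groups) < 2:
        raise ValueError("need at least two categories")
    cents = {y: _centroid(vs) for y, vs in groups.items()}
    # Within-class spread: distance of every descriptor to its own class centroid.
    intra = [math.dist(v, cents[y]) for y, vs in groups.items() for v in vs]
    # Between-class separation: distance between every pair of class centroids.
    keys = sorted(cents)
    inter = [math.dist(cents[a], cents[b]) for i, a in enumerate(keys) for b in keys[i + 1:]]
    mean_intra = sum(intra) / len(intra)
    mean_inter = sum(inter) / len(inter)
    return math.inf if mean_intra == 0 else mean_inter / mean_intra
```

Computing the ratio on held-out images as well as on training images gives a rough overfitting check: a large gap between the two values would suggest the descriptors are not reliably category-specific.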

What would settle it

A controlled test showing that GAEor keeps its reported gains on the five Ultra-FGVC benchmarks even after its geometric details are randomized or removed would falsify the claim, since the gains could then not be attributed to geometric attributes.
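Such a control needs a way to scramble the spatial arrangement of details while holding their number and spatial extent fixed. A minimal sketch, assuming details are represented as (x, y) locations; `randomize_geometry` is a hypothetical helper, and uniform resampling inside the bounding box is only one possible randomization (permuting detail features across positions is another).

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) location of one detail (assumed representation)


def randomize_geometry(points: List[Point], seed: int = 0) -> List[Point]:
    """Resample every detail location uniformly inside the original bounding box.

    The number of details and the spatial extent are preserved; their relative
    arrangement is destroyed. Hypothetical control, not taken from the paper.
    """
    if not points:
        return []
    rng = random.Random(seed)  # seeded so each control run is reproducible
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    return [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in points]
```

Running the same evaluation with and without this step, across several seeds, would separate the contribution of the geometric arrangement from that of detail appearance.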

Figures

Figures reproduced from arXiv: 2604.19345 by Haojie Li, Mahsa Baktashmotlagh, Shijie Wang, Yadan Luo, Zi Huang, Zijian Wang.

**Figure 2.** Detailed illustration of the Geometric Attribute Exploration Network. Our framework is composed of three key …

**Figure 3.** An overview of the GAE process. … geometric associations between the learned embedding of amplified details, thereby obtaining geometric attributes. Self-supervised signal generation. Given the transformed image $I_D$, we input it into the shared backbone network (previously used in the classification branch) to extract the amplified detail representation $T \in \mathbb{R}^{C \times H \times W}$. Each spatial vector in $T$ encodes semantic …

**Figure 4.** Analyses of hyper-parameters α, β and γ in Eq. 14. The results denote Top-1 accuracy on Cotton80. … for objects within the same category. This instability significantly degrades performance in real-world scenarios with diverse orientations. Conversely, minimizing the standard deviation of angular differences imposes statistical stability, forcing the network to capture orientation-agnostic patterns rather …

**Figure 6.** Visualization of the impact of geometric attribute …
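The Figure 3 and Figure 4 excerpts describe two steps: reading amplified details off the representation $T$, and minimizing the standard deviation of angular differences so that the network captures orientation-agnostic patterns. The sketch below illustrates both under assumptions the excerpts do not confirm: that a "detail" is one of the k spatial positions of $T$ with the largest L2 norm, and that "angular differences" are the gaps between consecutive detail angles around their centroid. It computes a statistic rather than a trainable loss, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Tensor3 = List[List[List[float]]]  # T[c][h][w], shape C x H x W
Point = Tuple[float, float]        # (x, y) = (w, h) grid position


def top_k_details(T: Tensor3, k: int) -> List[Point]:
    """Positions of the k spatial vectors T[:, h, w] with the largest L2 norm.

    Hypothetical selection rule standing in for "amplified details".
    """
    C, H, W = len(T), len(T[0]), len(T[0][0])
    scored = []
    for h in range(H):
        for w in range(W):
            norm = math.sqrt(sum(T[c][h][w] ** 2 for c in range(C)))
            scored.append((norm, float(w), float(h)))
    scored.sort(key=lambda s: s[0], reverse=True)  # stable sort: ties stay in row-major order
    return [(x, y) for _, x, y in scored[:k]]


def angular_gap_std(points: List[Point]) -> float:
    """Population std of the angular gaps between details around their centroid.

    The n gaps (including the wrap-around gap) always sum to 2*pi, so only
    their spread carries information. Hypothetical reading of the
    angular-difference term; Eq. 14 itself is not reproduced here.
    """
    n = len(points)
    if n < 2:
        return 0.0
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    angles = sorted(math.atan2(y - cy, x - cx) for x, y in points)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    mean = 2 * math.pi / n
    return math.sqrt(sum((g - mean) ** 2 for g in gaps) / n)


def rotate(points: List[Point], theta: float) -> List[Point]:
    """Rotate every point about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]
```

Rotating every detail shifts all angles around the centroid by the same amount, so the multiset of gaps, and hence their standard deviation, is unchanged; this is consistent with the orientation-agnostic behaviour the Figure 4 excerpt describes.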

Original abstract

This paper investigates the intrinsic geometrical features of highly similar objects and introduces a general self-supervised framework called the Geometric Attribute Exploration Network (GAEor), which is designed to address the ultra-fine-grained visual categorization (Ultra-FGVC) task in data-limited scenarios. Unlike prior work that often captures subtle yet critical distinctions, GAEor generates geometric attributes as novel alternative recognition cues. These attributes are determined by various details within the object, aligned with its geometric patterns, such as the intricate vein structures in soybean leaves. Crucially, each category exhibits distinct geometric descriptors that serve as powerful cues, even among objects with minimal visual variation -- a factor largely overlooked in recent research. GAEor discovers these geometric attributes by first amplifying geometry-relevant details via visual feedback from a backbone network, then embedding the relative polar coordinates of these details into the final representation. Extensive experiments demonstrate that GAEor significantly sets new state-of-the-art records in five widely-used Ultra-FGVC benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

GAEor offers a geometry-focused self-supervision pipeline for ultra-fine-grained tasks in low-data regimes, but the SOTA claims rest on experiments that aren't shown here and could be sensitive to overfitting in the feedback loop.

The main takeaway is that this paper puts forward a self-supervised network called GAEor that amplifies geometry-relevant details through backbone visual feedback and then folds in relative polar coordinates as extra cues for distinguishing highly similar objects. It targets ultra-fine-grained visual categorization when labeled examples are scarce, such as telling apart soybean leaf types by vein structure rather than overall appearance. That framing is distinct from most prior fine-grained work, which tends to chase texture or local appearance differences instead of treating geometry as the primary signal. The authors make a reasonable case that each category carries its own geometric descriptors that remain useful even when visual variation is tiny, and the method tries to discover those without heavy supervision. This is a practical angle for applied domains like agriculture or biological imaging, where data collection is expensive. The approach itself looks conceptually clean on the surface, with no obvious circular definitions or invented entities that collapse under scrutiny. What is actually new is the specific two-step process of feedback-driven amplification followed by polar embedding, which isn't directly covered in the referenced self-supervised or fine-grained literature.

The soft spot is the experimental side. The abstract asserts new state-of-the-art results across five benchmarks, yet supplies no numbers, no baseline comparisons, no ablations on the feedback step, and no error analysis. Without those, it's impossible to tell whether the loop is reliably surfacing generalizable geometric patterns or simply reinforcing dataset-specific artifacts, especially in the limited-data regime where initial backbone features start weak. The stress-test concern about overfitting risk holds up based on what's described; nothing in the abstract indicates regularization or cross-checks that would guard against it.

This paper is aimed at computer vision researchers who work on fine-grained recognition or self-supervised methods for constrained data. A reader in that area would get value from the problem framing and the proposed pipeline, even if they have to wait for the full results to judge practicality. It deserves peer review so the experiments can be checked properly.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Geometric Attribute Exploration Network (GAEor), a self-supervised framework for ultra-fine-grained visual categorization (Ultra-FGVC) in limited-data regimes. It claims that each category possesses distinct geometric descriptors (e.g., leaf veins) that can be discovered by amplifying geometry-relevant details through visual feedback from a backbone network, followed by embedding relative polar coordinates of these details into the final representation. Extensive experiments are said to establish new state-of-the-art results across five widely-used Ultra-FGVC benchmarks.

Significance. If the central empirical claims hold after proper validation, the work offers a novel self-supervised route to exploit overlooked geometric patterns in data-scarce fine-grained tasks, with potential utility in domains such as botany or medical imaging. The approach is distinguished by its explicit use of polar-coordinate embedding and backbone-guided amplification rather than purely appearance-based cues.

major comments (2)

[Method section (GAEor architecture and feedback loop)] The method description of the backbone-feedback amplification loop (the step that discovers category-specific geometric descriptors) provides no explicit regularization, cross-validation protocol, or ablation that isolates whether the loop extracts generalizable geometry versus training-set artifacts. In limited-data Ultra-FGVC regimes this is load-bearing for the SOTA claim, as weak initial backbone features make reinforcement of noise plausible.
[Experiments section] The experimental section asserts new state-of-the-art records on five benchmarks yet supplies no quantitative tables, baseline comparisons, ablation results on the polar-embedding component, or error analysis in the provided manuscript text. Without these, it is impossible to verify that performance gains derive from the claimed geometric cues rather than post-hoc selection or implementation details.
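The first major comment asks for a protocol showing that the discovered descriptors generalize beyond the training split. With only a handful of images per category, a purely random split can leave a category out of a fold altogether. A minimal sketch of a stratified k-fold split, assuming labels are plain category names; `stratified_folds` is a hypothetical helper, not the authors' protocol.

```python
import random
from collections import defaultdict
from typing import Dict, List


def stratified_folds(labels: List[str], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class out round-robin.

    Every class with at least k samples appears in every fold. Hypothetical
    protocol sketch; the paper's actual split procedure is not specified here.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)  # seeded shuffle within each class
    folds: List[List[int]] = [[] for _ in range(k)]
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[j % k].append(idx)
    return folds
```

Each fold then serves once as the held-out set; comparing the full model with a variant that skips the feedback loop on every fold is one way to run the ablation the comment requests.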

minor comments (1)

[Abstract] The abstract introduces the acronym GAEor before its full expansion; moving the parenthetical definition to first use would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major comment point by point below and outline the revisions we will make to strengthen the paper.

Referee: [Method section (GAEor architecture and feedback loop)] The method description of the backbone-feedback amplification loop (the step that discovers category-specific geometric descriptors) provides no explicit regularization, cross-validation protocol, or ablation that isolates whether the loop extracts generalizable geometry versus training-set artifacts. In limited-data Ultra-FGVC regimes this is load-bearing for the SOTA claim, as weak initial backbone features make reinforcement of noise plausible.

Authors: We acknowledge the validity of this concern. The current manuscript describes the amplification loop as relying on iterative visual feedback from the backbone to emphasize geometry-relevant details, but it does not provide explicit regularization, a cross-validation protocol for the discovery process, or an ablation isolating the loop's contribution. To address this, we will revise the method section to include: (i) regularization terms in the self-supervised objective to mitigate artifact reinforcement, (ii) a cross-validation protocol applied to the attribute discovery across data splits, and (iii) an ablation study comparing the full model against a variant without the feedback loop. These additions will help demonstrate that the extracted descriptors are generalizable rather than training-set specific.
revision: yes
Referee: [Experiments section] The experimental section asserts new state-of-the-art records on five benchmarks yet supplies no quantitative tables, baseline comparisons, ablation results on the polar-embedding component, or error analysis in the provided manuscript text. Without these, it is impossible to verify that performance gains derive from the claimed geometric cues rather than post-hoc selection or implementation details.

Authors: We agree that the experimental presentation requires strengthening for full verifiability. While the manuscript reports new state-of-the-art results on the five Ultra-FGVC benchmarks, we will revise the experiments section to include: detailed quantitative tables with all baseline comparisons, a dedicated ablation study on the polar-embedding component, and error analysis (including per-category breakdowns and failure cases). These changes will make it possible to directly attribute performance gains to the geometric cues.
revision: yes
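The promised error analysis includes per-category breakdowns. A minimal sketch, assuming ground-truth labels and top-1 predictions are available as equal-length lists of category names; `per_class_accuracy` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def per_class_accuracy(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Top-1 accuracy computed separately for each ground-truth category."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have equal length")
    total = Counter(y_true)  # samples per ground-truth category
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # hits per category
    return {c: correct[c] / total[c] for c in sorted(total)}
```

Sorting the returned dictionary by value lists the categories where the model fails most often, which is where inspecting failure cases is most informative.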

Circularity Check

0 steps flagged

No significant circularity detected; claims rest on empirical benchmarks

full rationale

The paper describes a self-supervised GAEor framework that amplifies geometry-relevant details via backbone feedback and embeds relative polar coordinates to generate category-specific geometric attributes for Ultra-FGVC. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains are present in the abstract or described method that reduce any result to its inputs by construction. The premise that categories exhibit distinct geometric descriptors is an empirical assumption that the authors test on five external benchmarks, rather than a self-referential definition or tautology. The approach is self-contained against external data and does not invoke load-bearing self-citations or uniqueness theorems from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that geometric patterns yield distinct per-category descriptors even when visual appearance is nearly identical; this is not independently verified in the provided abstract and functions as an untested premise for the method's effectiveness.

axioms (1)

Domain assumption: geometric patterns such as vein structures provide distinct and powerful recognition cues for each category in ultra-fine-grained objects.
Invoked to justify why geometric attributes outperform standard visual features in minimal-variation cases.

invented entities (1)

Geometric Attribute Exploration Network (GAEor): no independent evidence
purpose: To discover and embed geometric attributes as alternative recognition cues.
Newly proposed architecture whose effectiveness depends on the domain assumption above.

pith-pipeline@v0.9.0 · 5490 in / 1254 out tokens · 33447 ms · 2026-05-10T02:44:54.376409+00:00 · methodology

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

[1] Chen Chen, Zhe Chen, Jing Zhang, and Dacheng Tao. 2022. SASA: Semantics-Augmented Set Abstraction for Point-Based 3D Object Detection. In Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelveth Symposium on Educational Advances in Artifici...

[2] Qiupu Chen, Lin Jiao, Fenmei Wang, Jianming Du, Haiyun Liu, Xue Wang, and Rujing Wang. 2024. Integrating foreground-background feature distillation and contrastive feature learning for ultra-fine-grained visual classification. Pattern Recognit. 150 (2024), 110339

[3] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey E. Hinton. 2020. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event (Proceedings of Machine Learning Research, Vol. 119). PMLR, 1597–1607

[4] Yue Chen, Yalong Bai, Wei Zhang, and Tao Mei. 2019. Destruction and Construction Learning for Fine-Grained Image Recognition. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. Computer Vision Foundation / IEEE, 5157–5166

[5] Junsuk Choe, Seungho Lee, and Hyunjung Shim. 2021. Attention-Based Dropout Layer for Weakly Supervised Single Object Localization and Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 43, 12 (2021), 4256–4271

[6] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA. IEEE Computer Society, 248–255

[7] Joachim Denzler and Heinrich Niemann. 1999. Active Rays: Polar-transformed Active Contours for Real-Time Contour Tracking. Real Time Imaging 5, 3 (1999), 203–213

[8] Pablo Diego-Simón, Stéphane d’Ascoli, Emmanuel Chemla, Yair Lakretz, and Jean-Remi King. 2024. A Polar coordinate system represents syntax in large language models. In Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024, Amir Globe...

[9] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In 9th International Conference on Learning Representations, IC...

[10] Ziye Fang, Xin Jiang, Hao Tang, and Zechao Li. 2024. Learning Contrastive Self-Distillation for Ultra-Fine-Grained Visual Categorization Targeting Limited Samples. IEEE Trans. Circuits Syst. Video Technol. 34, 8 (2024), 7135–7148

[11] Ju He, Jieneng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, and Changhu Wang. 2022. TransFG: A Transformer Architecture for Fine-Grained Recognition. In Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelveth Symposium on Educat...

[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR 2016, Las Vegas, NV, USA, June 27-30, ...

[13] Rong-Xiang Hu, Wei Jia, Haibin Ling, and Deshuang Huang. 2012. Multiscale Distance Matrix for Fast Plant Leaf Recognition. IEEE Trans. Image Process. 21, 11 (2012), 4667–4672

[14] Jianing Li, Yaowei Wang, and Shiliang Zhang. 2023. PolarPose: Single-Stage Multi-Person Pose Estimation in Polar Coordinates. IEEE Trans. Image Process. 32 (2023), 1108–1119

[15] Yanhong Li, Jack Xu, and David C. Anastasiu. 2024. Learning from Polar Representation: An Extreme-Adaptive Model for Long-Term Time Series Forecasting. In Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advan...

[16] Haibin Ling and David W. Jacobs. 2007. Shape Classification Using the Inner-Distance. IEEE Trans. Pattern Anal. Mach. Intell. 29, 2 (2007), 286–299

[17] Yu Liu, Yaqi Cai, Qi Jia, Binglin Qiu, Weimin Wang, and Nan Pu. 2024. Novel Class Discovery for Ultra-Fine-Grained Visual Categorization. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024. IEEE, 17679–17688

[18] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021. IEEE, 9992–10002

[19] Ming Nie, Yujing Xue, Chunwei Wang, Chaoqiang Ye, Hang Xu, Xinge Zhu, Qingqiu Huang, Michael Bi Mi, Xinchao Wang, and Li Zhang. 2023. PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection. In IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023. IEEE, 3778–3790

[20] Zicheng Pan, Xiaohan Yu, Miaohua Zhang, and Yongsheng Gao. 2023. SSFE-Net: Self-Supervised Feature Enhancement for Ultra-Fine-Grained Few-Shot Class Incremental Learning. In IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023, Waikoloa, HI, USA, January 2-7, 2023. IEEE, 6264–6273

[21] Edwin Arkel Rios, Femiloye Oyerinde, Min-Chun Tien, and Bo-Cheng Lai. 2024. Down-Sampling Inter-Layer Adapter for Parameter and Computation Efficient Ultra-Fine-Grained Image Recognition. CoRR abs/2409.11051 (2024)

[22] Hongbo Sun, Xiangteng He, Jinglin Xu, and Yuxin Peng. 2024. SIM-OFE: Structure Information Mining and Object-Aware Feature Enhancement for Fine-Grained Visual Categorization. IEEE Trans. Image Process. 33 (2024), 5312–5326

[23] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. 2021. Training data-efficient image transformers & distillation through attention. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139), Mar...

[24] Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Christian Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, and Ariel Shamir. 2022. CLIPasso: semantically-aware object sketching. ACM Trans. Graph. 41, 4 (2022), 86:1–86:11

[25] Bin Wang and Yongsheng Gao. 2014. Hierarchical String Cuts: A Translation, Rotation, Scale, and Mirror Invariant Descriptor for Fast Shape Retrieval. IEEE Trans. Image Process. 23, 9 (2014), 4101–4111

[26] Enze Xie, Wenhai Wang, Mingyu Ding, Ruimao Zhang, and Ping Luo. 2022. PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond. IEEE Trans. Pattern Anal. Mach. Intell. 44, 9 (2022), 5385–5400

[27] Xiaohan Yu, Jun Wang, and Yongsheng Gao. 2023. CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China. ijcai.org, 4531–4539

[28] Xiaohan Yu, Jun Wang, Yang Zhao, and Yongsheng Gao. 2023. Mix-ViT: Mixing attentive vision transformer for ultra-fine-grained visual categorization. Pattern Recognit. 135 (2023), 109131

[29] Xiaohan Yu, Yang Zhao, and Yongsheng Gao. 2022. SPARE: Self-supervised part erasing for ultra-fine-grained visual categorization. Pattern Recognit. 128 (2022), 108691

[30] Xiaohan Yu, Yang Zhao, Yongsheng Gao, and Shengwu Xiong. 2021. MaskCOV: A random mask covariance network for ultra-fine-grained visual categorization. Pattern Recognit. 119 (2021), 108067

[31] Xiaohan Yu, Yang Zhao, Yongsheng Gao, Shengwu Xiong, and Xiaohui Yuan. 2020. Patchy Image Structure Classification Using Multi-Orientation Region Transform. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educa...

[32] Xiaohan Yu, Yang Zhao, Yongsheng Gao, Xiaohui Yuan, and Shengwu Xiong. 2021. Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021. IEEE, 10265–10275
