pith. sign in

arxiv: 2606.22383 · v1 · pith:CEOX6CQXnew · submitted 2026-06-21 · 💻 cs.CV · cs.AI· cs.LG

Structured Hyperedge Adaptation for Parameter-Efficient Fine-Tuning of Vision Transformers

Pith reviewed 2026-06-26 10:41 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.LG
keywords parameter-efficient fine-tuningvision transformershypergraph adaptersstructured adaptationtoken relationshipsadapter methodsViT fine-tuninginductive bias
0
0 comments X

The pith

Vision transformers adapt more effectively when updates are computed over groups of related tokens rather than each token alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard parameter-efficient adapters refine each ViT token independently, which the paper claims ignores the spatial and semantic groupings that naturally exist in images. HyperAdapter instead builds a soft hypergraph by assigning tokens to prototypes, aggregates features into hyperedge representations, runs a lightweight bottleneck update at that level, and routes the changes back to the original tokens through the incidence matrix. This supplies an explicit structural bias while adding only a modest number of trainable parameters. Experiments on multiple visual benchmarks show higher accuracy than token-wise PEFT baselines at matched budgets, with larger margins on tasks that require reasoning about object parts or scene layout. The central move is therefore a change in adaptation space from tokens to hyperedges.

Core claim

HyperAdapter constructs a soft hypergraph over ViT tokens with prototype-based assignments, aggregates token features into latent hyperedge representations, applies bottleneck adaptation at the hyperedge level, and diffuses the resulting updates back to tokens via the hypergraph incidence structure, thereby injecting an explicit structural inductive bias into parameter-efficient fine-tuning.

What carries the argument

Soft hypergraph constructed via prototype-based token assignments, which enables group-aware adaptation by routing updates through aggregated hyperedge representations rather than individual tokens.

If this is right

  • Updates become spatially consistent because tokens assigned to the same hyperedge receive correlated refinements.
  • Redundant parameter changes decrease when adaptation occurs on aggregated hyperedge features instead of every token separately.
  • Gains are largest on tasks whose labels depend on relational structure among image regions.
  • The module stays modular and can be inserted into existing ViT pipelines without altering the backbone weights.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same hyperedge-level adaptation could be tested on language or multimodal transformers where token relations also matter.
  • Alternative hypergraph construction rules, such as attention-derived or spatial-grid rules, might be compared directly against the prototype method.
  • If the performance edge persists across many datasets, future PEFT designs may prioritize the choice of adaptation graph over further reductions in parameter count.

Load-bearing premise

Prototype-based soft hypergraph construction will produce meaningful hyperedges that reflect the actual structured relationships among tokens in visual scenes.

What would settle it

Replacing the learned prototype assignments with random token groupings of the same size and measuring whether accuracy falls back to standard adapter levels on the same benchmarks.

Figures

Figures reproduced from arXiv: 2606.22383 by Edwin Kwadwo Tenagyei, Jun Zhou, Lei Wang, Ugochukwu Ejike Akpudo, Yongsheng Gao.

Figure 1
Figure 1. Figure 1: HyperAdapter framework. We introduce HyperAdapter, a parameter-efficient adaptation module that operates in a structured interaction space. Given patch tokens from a frozen vision transformer, a routing mechanism softly groups tokens into hyper￾edges, capturing higher-order relationships among visually related regions. Each hy￾peredge aggregates information from multiple tokens and is refined using a light… view at source ↗
Figure 2
Figure 2. Figure 2: Top-1 accuracy on few-shot FGVC benchmarks using ViT-B/16. HyperAdapter consistently outperforms prior PEFT methods, with the largest gains observed in the ultra-low-shot (1-4) regime. Implementation details. We build upon a pretrained ViT-B/16 [7] back￾bone initialized with ImageNet-21k [41] weights. Unless otherwise specified, the backbone parameters are frozen, and only the classification head and Hyper… view at source ↗
Figure 3
Figure 3. Figure 3: (a) Effect of the number of hyperedges K on VTAB-1k, showing that a moderate K = 8 bal￾ances model capacity and efficiency. (b) Sensitivity to routing temperature τ on Caltech101, where per￾formance peaks at τ = 0.10, indicating optimal hy￾peredge assignment at moderate routing sharpness. Impact of the num￾ber of hyperedges. Fig. 3a shows how varying the num￾ber of hyperedges K af￾fects performance and pa￾… view at source ↗
Figure 4
Figure 4. Figure 4: Token-to-hyperedge routing entropy across transformer layers for CIFAR-100, EuroSAT, and KITTI. Higher entropy indicates more distributed assignments. At￾tention adapters maintain broader routing, while MLP adapters become increasingly specialized in deeper layers, reflecting progressive feature refinement. (a) CIFAR-100 (b) EuroSAT (c) KITTI [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Hyperedge usage distribution across transformer layers for CIFAR-100, Eu￾roSAT, and KITTI. Early layers use hyperedges more evenly, while deeper layers in￾creasingly concentrate on a subset of hyperedges, indicating progressive specialization. both). Parallel placement consistently achieves the best performance, reaching 77.6% average accuracy, as it preserves the residual pathway and allows adapters to pr… view at source ↗
Figure 6
Figure 6. Figure 6: DAAM [27] visualizations compar￾ing spatial attribution across PEFT meth￾ods. Columns show the original image, token-wise baseline, AdaptFormer, and Hy￾perAdapter. HyperAdapter produces more concentrated and semantically aligned acti￾vations, highlighting relevant object regions while reducing background noise, reflecting the benefits of hyperedge-based routing. Hyperedge usage across lay￾ers. We examine h… view at source ↗
Figure 7
Figure 7. Figure 7: Token-to-hyperedge membership heatmaps across representative transformer layers. Each heatmap visualizes the routing matrix M, where rows correspond to patch tokens and columns correspond to hyperedges. We show routing patterns from Blocks 1, 6, and 12 for both attention and MLP adapters across CIFAR100, EuroSAT, and KITTI. Early layers exhibit diffuse and distributed routing assignments, while deeper laye… view at source ↗
Figure 8
Figure 8. Figure 8: Patch-grid routing visualization. Each patch is colored according to the hyper￾edge receiving the highest routing probability. We show representative routing patterns from Blocks 1, 6, and 12 across datasets and modules. Early layers exhibit fragmented assignments, while deeper layers form more coherent spatial groups, indicating that HyperAdapter organizes tokens into semantically meaningful hyperedge clu… view at source ↗
Figure 9
Figure 9. Figure 9: DAAM [27] visualizations comparing spatial attribution across PEFT meth￾ods. Columns show the original image, token-wise baseline, AdaptFormer, and Hyper￾Adapter. HyperAdapter produces more concentrated and semantically aligned activa￾tions, highlighting relevant object regions while reducing background noise, reflecting the benefits of hyperedge-based routing [PITH_FULL_IMAGE:figures/full_fig_p028_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: DAAM [27] visualizations comparing HyperAdapter model with baseline and AdaptFormer models on VTAB-1K across all 12 transformer blocks [PITH_FULL_IMAGE:figures/full_fig_p029_10.png] view at source ↗
read the original abstract

Parameter-efficient fine-tuning (PEFT) has become a practical solution for adapting large pretrained vision transformers (ViTs) to downstream tasks while updating only a small subset of parameters. However, existing adapter-based methods perform adaptation independently for each token, implicitly assuming that token refinements should be learned in isolation. This token-wise formulation overlooks the structured relationships among tokens that naturally arise in visual scenes, potentially leading to redundant updates and spatially inconsistent feature refinement. In this work, we revisit the design of parameter-efficient adapters and propose to perform adaptation in hyperedge space rather than token space. We introduce HyperAdapter, a hypergraph-based adapter architecture that enables structured, group-aware adaptation through soft token routing. HyperAdapter constructs a soft hypergraph over ViT tokens using prototype-based assignments, aggregates token features into latent hyperedge representations, applies lightweight bottleneck adaptation at the hyperedge level, and diffuses the resulting updates back to tokens via the hypergraph incidence structure. This design injects an explicit structural inductive bias into PEFT while preserving the modularity and efficiency of standard adapters. Extensive experiments across diverse visual benchmarks demonstrate that structured hyperedge adaptation consistently outperforms strong PEFT baselines under comparable parameter budgets, with particularly pronounced gains on tasks requiring structured reasoning. Our results suggest that the choice of adaptation space is a critical yet underexplored dimension in parameter-efficient transfer for ViTs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes HyperAdapter, a hypergraph-based adapter for parameter-efficient fine-tuning of Vision Transformers. It constructs a soft hypergraph over ViT tokens via prototype-based assignments, aggregates features to hyperedge representations, performs bottleneck adaptation at the hyperedge level, and diffuses updates back to tokens via the incidence matrix. The central claim is that this structured adaptation in hyperedge space yields consistent outperformance over strong PEFT baselines (e.g., standard adapters) under comparable parameter budgets across visual benchmarks, with larger gains on tasks requiring structured reasoning.

Significance. If the results hold and the hypergraph construction indeed captures meaningful visual structure, the work highlights adaptation space (hyperedge vs. token) as an underexplored axis for injecting inductive bias in PEFT. This could inform future adapter designs for relational or part-based vision tasks while retaining modularity and efficiency. The explicit structural mechanism is a conceptual contribution, though its empirical grounding requires further verification as noted below.

major comments (2)
  1. [§4] §4 (HyperAdapter architecture), prototype-based soft hypergraph construction: the central claim that gains arise from 'structured' hyperedges reflecting visual relationships (objects, parts, spatial relations) is load-bearing, yet the manuscript provides no direct verification such as hyperedge visualizations, semantic coherence metrics on the learned prototypes, or ablations replacing prototype assignments with random soft assignments. Without these, it remains possible that performance improvements derive from the aggregation/diffusion routing or added parameters rather than genuine structural bias.
  2. [§5] §5 (Experiments), results tables: while the abstract states 'consistent outperformance' and 'pronounced gains on structured-reasoning tasks,' the reported comparisons lack error bars, multiple random seeds, or statistical significance tests against baselines. This weakens the ability to assess whether the hyperedge mechanism reliably drives the claimed advantages under comparable parameter budgets.
minor comments (2)
  1. [§4] Notation for the incidence matrix and diffusion step should be clarified with an explicit equation showing how updates are propagated back to tokens; current description in §4 is high-level.
  2. The manuscript should include a reference to prior hypergraph neural network work in vision (e.g., hypergraph convolutions for scene understanding) to better situate the novelty of the soft prototype construction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address the two major comments point-by-point below and will revise the manuscript accordingly to strengthen the empirical support for our claims.

read point-by-point responses
  1. Referee: [§4] §4 (HyperAdapter architecture), prototype-based soft hypergraph construction: the central claim that gains arise from 'structured' hyperedges reflecting visual relationships (objects, parts, spatial relations) is load-bearing, yet the manuscript provides no direct verification such as hyperedge visualizations, semantic coherence metrics on the learned prototypes, or ablations replacing prototype assignments with random soft assignments. Without these, it remains possible that performance improvements derive from the aggregation/diffusion routing or added parameters rather than genuine structural bias.

    Authors: We agree that direct evidence isolating the contribution of the prototype-based structure is needed to support the central claim. In the revision we will add (i) qualitative visualizations of selected hyperedges overlaid on input images to show semantic coherence (e.g., grouping tokens belonging to the same object or spatial region), and (ii) a controlled ablation that replaces the learned prototype assignments with random soft assignments while preserving the aggregation, bottleneck, and diffusion stages. These additions will clarify whether the observed gains are attributable to the structured routing rather than the routing mechanism or parameter count alone. revision: yes

  2. Referee: [§5] §5 (Experiments), results tables: while the abstract states 'consistent outperformance' and 'pronounced gains on structured-reasoning tasks,' the reported comparisons lack error bars, multiple random seeds, or statistical significance tests against baselines. This weakens the ability to assess whether the hyperedge mechanism reliably drives the claimed advantages under comparable parameter budgets.

    Authors: We acknowledge the absence of statistical reporting. In the revised manuscript we will rerun all main experiments with at least three random seeds, report mean performance together with standard deviation, and include paired statistical significance tests (e.g., t-tests) against the strongest baselines under matched parameter budgets. Updated tables and a short discussion of variability will be added to §5. revision: yes

Circularity Check

0 steps flagged

No significant circularity; new architecture with independent empirical evaluation

full rationale

The paper introduces HyperAdapter as an explicit new architecture involving prototype-based soft hypergraph construction over ViT tokens, aggregation to hyperedges, bottleneck adaptation, and incidence-based diffusion. These are presented as design choices injecting structural bias, with performance claims resting on external benchmark comparisons rather than any equation or parameter reducing the gains to quantities defined by the method's own fitted inputs. No self-citations, self-definitional steps, or fitted-input predictions appear in the abstract or described method. The derivation chain is self-contained against external baselines.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 1 invented entities

The central claim rests on the effectiveness of prototype-based soft hypergraph construction and the assumption that hyperedge-level updates will translate into spatially consistent token refinements; these are introduced without independent evidence in the provided abstract.

free parameters (2)
  • number of prototypes
    Used to construct soft assignments; value not stated in abstract but required for the hypergraph.
  • bottleneck dimension
    Standard adapter hyperparameter whose specific value affects the parameter budget comparison.
axioms (1)
  • domain assumption Tokens in visual scenes exhibit structured relationships that are better captured by hyperedges than by independent token updates.
    Invoked in the motivation for moving adaptation to hyperedge space.
invented entities (1)
  • soft hypergraph over ViT tokens no independent evidence
    purpose: To enable group-aware adaptation via incidence structure.
    New architectural construct introduced by the paper; no external falsifiable handle provided in abstract.

pith-pipeline@v0.9.1-grok · 5795 in / 1370 out tokens · 28111 ms · 2026-06-26T10:41:07.226873+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

53 extracted references · 3 linked inside Pith

  1. [1]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision

    Albert, P., Zhang, F.Z., Saratchandran, H., van den Hengel, A., Abbasnejad, E.: Towards higher effective rank in parameter-efficient fine-tuning using khatri-rao product. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1292–1302 (2025)

  2. [2]

    In: European Conference on Computer Vision (2014)

    Bossard, L., Guillaumin, M., Gool, L.V.: Food-101 - mining discriminative compo- nents with random forests. In: European Conference on Computer Vision (2014)

  3. [3]

    Advances in Neural Information Processing Systems35, 16664–16678 (2022)

    Chen, S., Ge, C., Tong, Z., Wang, J., Song, Y., Wang, J., Luo, P.: Adaptformer: Adapting vision transformers for scalable visual recognition. Advances in Neural Information Processing Systems35, 16664–16678 (2022)

  4. [4]

    In: Proceedings of the Computer Vision and Pattern Recognition Conference

    Chen, T., Chen, J., Zhang, B., Yu, Z., Chen, S., Ye, R., Li, X., Ye, Y.: Sensitivity- aware efficient fine-tuning via compact dynamic-rank adaptation. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 9655–9664 (2025)

  5. [5]

    Advances in Neural Information Processing Systems 37, 102056–102077 (2024)

    Dong, W., Sun, Y., Yang, Y., Zhang, X., Lin, Z., Yan, Q., Zhang, H., Wang, P., Yang, Y., Shen, H.: Efficient adaptation of pre-trained vision transformer via householder transformation. Advances in Neural Information Processing Systems 37, 102056–102077 (2024)

  6. [6]

    Advances in Neural Information Processing Sys- tems36, 52548–52567 (2023)

    Dong, W., Yan, D., Lin, Z., Wang, P.: Efficient adaptation of large vision trans- former via adapter re-composing. Advances in Neural Information Processing Sys- tems36, 52548–52567 (2023)

  7. [7]

    ArXiv abs/2010.11929(2020)

    Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. ArXiv abs/2010.11929(2020)

  8. [8]

    In: Proceedings of the AAAI conference on artificial intelligence

    Feng, Y., You, H., Zhang, Z., Ji, R., Gao, Y.: Hypergraph neural networks. In: Proceedings of the AAAI conference on artificial intelligence. vol. 33, pp. 3558– 3565 (2019)

  9. [9]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Fixelle, J.: Hypergraph vision transformers: Images are more than nodes, more than edges. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9751–9761 (2025)

  10. [10]

    Tenagyei et al

    Gao, Y., Wang, M., Tao, D., Ji, R., Dai, Q.: 3-d object retrieval and recognition withhypergraphanalysis.IEEEtransactionsonimageprocessing21(9),4290–4303 (2012) 16 E.K. Tenagyei et al

  11. [11]

    Advances in neural information processing systems30(2017)

    Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. Advances in neural information processing systems30(2017)

  12. [12]

    arXiv preprint arXiv:2307.13770 (2023)

    Han,C.,Wang,Q.,Cui,Y.,Cao,Z.,Wang,W.,Qi,S.,Liu,D.:Eˆ2vpt:Aneffective and efficient approach for visual prompt tuning. arXiv preprint arXiv:2307.13770 (2023)

  13. [13]

    Advances in neural information processing systems35, 8291–8303 (2022)

    Han, K., Wang, Y., Guo, J., Tang, Y., Wu, E.: Vision gnn: An image is worth graph of nodes. Advances in neural information processing systems35, 8291–8303 (2022)

  14. [14]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision

    Han, Y., Wang, P., Kundu, S., Ding, Y., Wang, Z.: Vision hgnn: An image is more than a graph of nodes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19878–19888 (2023)

  15. [15]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16000–16009 (2022)

  16. [16]

    In: Proceedings of the AAAI Conference on Artificial Intelligence

    He, X., Li, C., Zhang, P., Yang, J., Wang, X.E.: Parameter-efficient model adapta- tion for vision transformers. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 37, pp. 817–825 (2023)

  17. [17]

    In: International conference on machine learning

    Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Ges- mundo, A., Attariyan, M., Gelly, S.: Parameter-efficient transfer learning for nlp. In: International conference on machine learning. pp. 2790–2799. PMLR (2019)

  18. [18]

    Iclr1(2), 3 (2022)

    Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: Lora: Low-rank adaptation of large language models. Iclr1(2), 3 (2022)

  19. [19]

    In: 2009 IEEE conference on computer vision and pattern recognition

    Huang, Y., Liu, Q., Metaxas, D.: ] video object segmentation by hypergraph cut. In: 2009 IEEE conference on computer vision and pattern recognition. pp. 1738–

  20. [20]

    arXiv preprint arXiv:2403.19243 (2024)

    Ji, Y., Saratchandran, H., Gordon, C., Zhang, Z., Lucey, S.: Efficient learning with sine-activated low-rank matrices. arXiv preprint arXiv:2403.19243 (2024)

  21. [21]

    In: European conference on computer vision

    Jia, M., Tang, L., Chen, B.C., Cardie, C., Belongie, S., Hariharan, B., Lim, S.N.: Visual prompt tuning. In: European conference on computer vision. pp. 709–727. Springer (2022)

  22. [22]

    arXiv preprint arXiv:2207.07039 (2022)

    Jie, S., Deng, Z.H.: Convolutional bypasses are better vision transformer adapters. arXiv preprint arXiv:2207.07039 (2022)

  23. [23]

    In: Proceedings of the AAAI conference on artificial intelligence

    Jie, S., Deng, Z.H.: Fact: Factor-tuning for lightweight adaptation on vision trans- former. In: Proceedings of the AAAI conference on artificial intelligence. vol. 37, pp. 1060–1068 (2023)

  24. [24]

    Advances in neural information processing systems 34, 1022–1035 (2021)

    Karimi Mahabadi, R., Henderson, J., Ruder, S.: Compacter: Efficient low-rank hypercomplex adapter layers. Advances in neural information processing systems 34, 1022–1035 (2021)

  25. [25]

    2013 IEEE International Conference on Computer Vision Workshops pp

    Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine- grained categorization. 2013 IEEE International Conference on Computer Vision Workshops pp. 554–561 (2013)

  26. [26]

    Advances in Neural Information Processing Systems35, 109–123 (2022)

    Lian, D., Zhou, D., Feng, J., Wang, X.: Scaling & shifting your features: A new baseline for efficient model tuning. Advances in Neural Information Processing Systems35, 109–123 (2022)

  27. [27]

    Pattern Recognit.165, 111607 (2025)

    Liao, Y., Gao, Y., Zhang, W.: Dynamic accumulated attention map for interpreting evolution of decision-making in vision transformer. Pattern Recognit.165, 111607 (2025)

  28. [28]

    arXiv preprint arXiv:2311.06243 (2023) HyperAdapter: Structured Hyperedge Adaptation for ViTs 17

    Liu, W., Qiu, Z., Feng, Y., Xiu, Y., Xue, Y., Yu, L., Feng, H., Liu, Z., Heo, J., Peng, S., et al.: Parameter-efficient orthogonal finetuning via butterfly factoriza- tion. arXiv preprint arXiv:2311.06243 (2023) HyperAdapter: Structured Hyperedge Adaptation for ViTs 17

  29. [29]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Liu,Z.,Hu,H.,Lin,Y.,Yao,Z.,Xie,Z.,Wei,Y.,Ning,J.,Cao,Y.,Zhang,Z.,Dong, L., et al.: Swin transformer v2: Scaling up capacity and resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 12009–12019 (2022)

  30. [30]

    2021 IEEE/CVF International Conference on Computer Vision (ICCV) pp

    Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin trans- former: Hierarchical vision transformer using shifted windows. 2021 IEEE/CVF International Conference on Computer Vision (ICCV) pp. 9992–10002 (2021)

  31. [31]

    arXiv preprint arXiv:2302.08106 (2023)

    Luo, G., Huang, M., Zhou, Y., Sun, X., Jiang, G., Wang, Z., Ji, R.: To- wards efficient visual adaption via structural re-parameterization. arXiv preprint arXiv:2302.08106 (2023)

  32. [32]

    In: 2018 25th IEEE Interna- tional Conference on Image Processing (ICIP)

    Lv, X., Wang, L., Zhang, Q., Zheng, N., Hua, G.: Video object co-segmentation from noisy videos by a multi-level hypergraph model. In: 2018 25th IEEE Interna- tional Conference on Image Processing (ICIP). pp. 2207–2211. IEEE (2018)

  33. [33]

    arXiv preprint arXiv:2404.04316 (2024)

    Ma, X., Chu, X., Yang, Z., Lin, Y., Gao, X., Zhao, J.: Parameter efficient quasi- orthogonal fine-tuning via givens rotation. arXiv preprint arXiv:2404.04316 (2024)

  34. [34]

    ArXivabs/1306.5151(2013)

    Maji, S., Rahtu, E., Kannala, J., Blaschko, M.B., Vedaldi, A.: Fine-grained visual classification of aircraft. ArXivabs/1306.5151(2013)

  35. [35]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Munir, M., Avery, W., Marculescu, R.: Mobilevig: Graph-based sparse attention for mobile vision applications. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2211–2219 (2023)

  36. [36]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Munir, M., Avery, W., Rahman, M.M., Marculescu, R.: Greedyvig: Dynamic axial graph construction for efficient vision gnns. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6118–6127 (2024)

  37. [37]

    2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06)2, 1447–1454 (2006)

    Nilsback, M.E., Zisserman, A.: A visual vocabulary for flower classification. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06)2, 1447–1454 (2006)

  38. [38]

    In: 2012 IEEE conference on computer vision and pattern recognition

    Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.: Cats and dogs. In: 2012 IEEE conference on computer vision and pattern recognition. pp. 3498–3505. IEEE (2012)

  39. [39]

    In: Proceedings of the AAAI conference on artificial intel- ligence

    Pei, W., Xia, T., Chen, F., Li, J., Tian, J., Lu, G.: Sa2vp: Spatially aligned-and- adapted visual prompt. In: Proceedings of the AAAI conference on artificial intel- ligence. vol. 38, pp. 4450–4458 (2024)

  40. [40]

    In: Proceedings of the Computer Vision and Pattern Recognition Conference

    Ren, L., Chen, C., Wang, L., Hua, K.: Da-vpt: Semantic-guided visual prompt tuning for vision transformers. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 4353–4363 (2025)

  41. [41]

    International Journal of Computer Vision115, 211 – 252 (2014)

    Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.S., Berg, A.C., Fei-Fei, L.: Imagenet large scale visual recognition challenge. International Journal of Computer Vision115, 211 – 252 (2014)

  42. [42]

    AI magazine29(3), 93–93 (2008)

    Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., Eliassi-Rad, T.: Collec- tive classification in network data. AI magazine29(3), 93–93 (2008)

  43. [43]

    arXiv preprint arXiv:2408.11351 (2024)

    Srinivas, S.S., Sarkar, R.K., Gangasani, S., Runkana, V.: Vision hgnn: An electron- micrograph is worth hypergraph of hypernodes. arXiv preprint arXiv:2408.11351 (2024)

  44. [44]

    In: International conference on machine learning

    Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. In: International conference on machine learning. pp. 10347–10357. PMLR (2021)

  45. [45]

    Knowledge and Information Systems14(3), 347–375 (2008) 18 E.K

    Wale, N., Watson, I.A., Karypis, G.: Comparison of descriptor spaces for chemical compound retrieval and classification. Knowledge and Information Systems14(3), 347–375 (2008) 18 E.K. Tenagyei et al

  46. [46]

    In: Proceedings of the AAAI conference on artificial intelli- gence

    Wang, H., Chang, J., Zhai, Y., Luo, X., Sun, J., Lin, Z., Tian, Q.: Lion: Implicit vision prompt tuning. In: Proceedings of the AAAI conference on artificial intelli- gence. vol. 38, pp. 5372–5380 (2024)

  47. [47]

    arXiv preprint arXiv:2505.11235 (2025)

    Wu, F., Hu, J., Min, G., Wang, S.: Efficient orthogonal fine-tuning with principal subspace adaptation. arXiv preprint arXiv:2505.11235 (2025)

  48. [48]

    In: International Conference on Machine Learning

    Yoo, S., Kim, E., Jung, D., Lee, J., Yoon, S.: Improving visual prompt tuning for self-supervised vision transformers. In: International Conference on Machine Learning. pp. 40075–40092. PMLR (2023)

  49. [49]

    arXiv preprint arXiv:2210.00788 (2022)

    Yu,B.X.,Chang,J.,Liu,L.,Tian,Q.,Chen,C.W.:Towardsaunifiedviewonvisual parameter-efficient transfer learning. arXiv preprint arXiv:2210.00788 (2022)

  50. [50]

    In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

    Zaken, E.B., Goldberg, Y., Ravfogel, S.: Bitfit: Simple parameter-efficient fine- tuning for transformer-based masked language-models. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). pp. 1–9 (2022)

  51. [51]

    arXiv preprint arXiv:1910.04867 (2019)

    Zhai, X., Puigcerver, J., Kolesnikov, A., Ruyssen, P., Riquelme, C., Lucic, M., Djo- longa, J., Pinto, A.S., Neumann, M., Dosovitskiy, A., et al.: A large-scale study of representation learning with the visual task adaptation benchmark. arXiv preprint arXiv:1910.04867 (2019)

  52. [52]

    IEEE Transactions on Pattern Analysis and Machine Intelligence47(7), 5268–5280 (2024)

    Zhang, Y., Zhou, K., Liu, Z.: Neural prompt search. IEEE Transactions on Pattern Analysis and Machine Intelligence47(7), 5268–5280 (2024)

  53. [53]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Zhang, Z., Zhang, Q., Gao, Z., Zhang, R., Shutova, E., Zhou, S., Zhang, S.: Gradient-based parameter selection for efficient fine-tuning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 28566– 28577 (2024) HyperAdapter: Structured Hyperedge Adaptation for ViTs 19 T able 6:VTAB-1k datasets [51] categorized into Na...