View-Aware Semantic Alignment for Aerial-Ground Person Re-Identification
Pith reviewed 2026-05-20 11:11 UTC · model grok-4.3
The pith
View-aware semantic alignment using expert queries and graph fusion improves cross-view person re-identification over view-invariant methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a view-aware framework achieves better cross-view semantic consistency than view-invariant paradigms by constructing view-aware experts to generate adaptive semantic queries that capture viewpoint-specific patterns, then using graph reasoning in a dual-branch module to extract and align responsive local regions, thereby retaining more discriminative identity information.
What carries the argument
ViSA framework with an Expert-driven Token Generation Module that builds view-aware experts for adaptive semantic queries and a Dual-branch Local Fusion Module that applies graph reasoning to align local regions.
If this is right
- Superior matching accuracy on AG-ReID.v2, CARGO, and LAGPeR benchmarks for aerial-ground person pairs.
- A 10.06 percent mAP gain on the challenging CARGO cross-view evaluation protocol.
- Improved extraction of local identity cues that respond differently to drone versus ground viewpoints.
- Cross-view semantic consistency that still respects viewpoint-specific patterns rather than discarding them.
Where Pith is reading between the lines
- The same expert-query and graph-fusion pattern could be tested on vehicle re-identification across satellite and street views.
- If the modules generalize, they might reduce the need for massive viewpoint-augmented training sets in other multi-camera settings.
- Scaling the number of view-aware experts could be checked for diminishing returns when more than two camera platforms are involved.
Load-bearing premise
Preserving view-specific cues through adaptive expert queries and graph-based local alignment will retain more discriminative identity information than enforcing view-invariant alignment, without creating new inconsistencies in identity matching.
What would settle it
If a carefully tuned view-invariant baseline matches or exceeds ViSA's mAP on the CARGO cross-view protocol while using similar compute, the advantage of retaining view-specific cues would be called into question.
Figures
read the original abstract
Aerial-Ground Person Re-Identification (AGPReID) remains highly challenging due to drastic viewpoint variations between drones and fixed cameras. Existing methods typically follow a view-invariant paradigm, aligning shared features across views to achieve robustness. However, view-invariant inherently enforces part-level alignment, which ignores view-specific cues and discriminative identity information. To this end, this work proposes ViSA (View-aware Semantic Alignment), a view-aware framework that achieves cross-view semantic consistency containing an Expert-driven Token Generation Module (ETGM) and a Dual-branch Local Fusion Module (DLFM). Technically, the former constructs a set of view-aware experts to generate adaptive semantic queries that perceive viewpoint-specific patterns, while the latter leverages graph reasoning to extract and align local regions responsive to different experts. Extensive experiments on three AGPReID benchmarks including AG-ReID.v2, CARGO and LAGPeR demonstrate that ViSA consistently achieves superior performance, with a notable 10.06\% mAP improvement on the challenging CARGO cross-view protocol. The code is available at \href{https://github.com/Cat-Zero/ViSA}{https://github.com/Cat-Zero/ViSA}.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ViSA, a view-aware semantic alignment framework for Aerial-Ground Person Re-Identification (AGPReID). It consists of an Expert-driven Token Generation Module (ETGM) that uses view-aware experts to generate adaptive semantic queries for viewpoint-specific patterns, and a Dual-branch Local Fusion Module (DLFM) that employs graph reasoning to extract and align local regions. The method is evaluated on three benchmarks (AG-ReID.v2, CARGO, LAGPeR), reporting superior performance including a 10.06% mAP improvement on the CARGO cross-view protocol.
Significance. If the empirical results hold, this work challenges the conventional view-invariant paradigm in cross-view ReID by demonstrating that preserving view-specific cues can lead to better identity discriminability. The code release supports reproducibility. This could influence future designs in handling drastic viewpoint variations in person re-identification tasks.
major comments (2)
- [§3.1 (ETGM)] §3.1 (ETGM): The construction of view-aware experts and adaptive queries does not include an explicit mechanism to enforce that the generated tokens correspond to consistent identity features across aerial and ground views. This is load-bearing for the central claim, as the performance gains might arise from modeling view-specific variations rather than achieving true semantic alignment.
- [§3.2 (DLFM)] §3.2 (DLFM): The graph-based local fusion aligns regions responsive to different experts, but lacks a cross-view identity consistency constraint (such as a matching loss between corresponding local features of the same identity). Without this, the alignment may connect locally salient but identity-inconsistent patterns, undermining the claim of improved semantic consistency over view-invariant methods.
minor comments (2)
- [Abstract] The abstract mentions empirical improvements but does not specify the baselines used or report statistical significance of the results.
- [Experiments] Details on ablation studies for the number of view-aware experts and their impact on performance should be expanded for clarity.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major comment below with clarifications on the implicit mechanisms in ViSA and note planned revisions to strengthen the presentation of identity consistency.
read point-by-point responses
-
Referee: [§3.1 (ETGM)] §3.1 (ETGM): The construction of view-aware experts and adaptive queries does not include an explicit mechanism to enforce that the generated tokens correspond to consistent identity features across aerial and ground views. This is load-bearing for the central claim, as the performance gains might arise from modeling view-specific variations rather than achieving true semantic alignment.
Authors: We appreciate the referee highlighting this aspect of the ETGM. The view-aware experts share parameters across aerial and ground inputs and generate adaptive queries conditioned on the input features; end-to-end training with identity classification and triplet losses on cross-view pairs encourages the resulting tokens to emphasize identity-discriminative patterns that remain consistent despite viewpoint differences. The reported gains on cross-view protocols (e.g., 10.06% mAP on CARGO) are consistent with semantic alignment rather than isolated view-specific modeling. To make the consistency argument more explicit, we will add a clarifying paragraph in §3.1 and an ablation measuring token similarity for matched identities across views. revision: partial
-
Referee: [§3.2 (DLFM)] §3.2 (DLFM): The graph-based local fusion aligns regions responsive to different experts, but lacks a cross-view identity consistency constraint (such as a matching loss between corresponding local features of the same identity). Without this, the alignment may connect locally salient but identity-inconsistent patterns, undermining the claim of improved semantic consistency over view-invariant methods.
Authors: We thank the referee for this observation on the DLFM. Graph edges are formed between local patches from the two branches that exhibit high feature similarity under the same expert; the global identity and metric losses then back-propagate consistency signals to these locally aligned regions. While an explicit local matching loss is not present, the current design ties local alignment directly to identity supervision. We agree an additional local consistency term could further reinforce the claim and will incorporate either a lightweight local matching loss or a quantitative analysis of local feature correspondence in the revised manuscript and supplementary material. revision: partial
Circularity Check
No circularity: empirical performance on external benchmarks
full rationale
The paper introduces ViSA with two new modules (ETGM for view-aware expert queries and DLFM for graph-based local fusion) and reports mAP gains on public AGPReID datasets (AG-ReID.v2, CARGO, LAGPeR). These gains are measured against external test protocols rather than being algebraically forced by any fitted parameter or self-referential definition inside the method. No derivation chain reduces a claimed result to an input by construction, and no load-bearing step relies on a self-citation that itself lacks independent verification. The framework is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- number of view-aware experts
axioms (1)
- domain assumption View-specific cues contain discriminative identity information that should be preserved rather than discarded by invariance
invented entities (1)
-
View-aware experts
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Quan Zhang, Jianhuang Lai, Xiaohua Xie, Xiaofeng Jin, and Sien Huang. Separable spatial-temporal residual graph for cloth-changing group re-identification.IEEE TPAMI, 46(8): 5791–5805, 2024. 1
work page 2024
-
[2]
Yuhao Wang, Yongfeng Lv, Pingping Zhang, and Huchuan Lu. Idea: Inverted text with cooperative deformable aggrega- tion for multi-modal object re-identification. InCVPR, pages 29701–29710, 2025
work page 2025
-
[3]
Uncertainty modeling for group re-identification.IJCV, 132(8):3046–3066, 2024
Quan Zhang, Jianhuang Lai, Zhanxiang Feng, and Xiaohua Xie. Uncertainty modeling for group re-identification.IJCV, 132(8):3046–3066, 2024
work page 2024
-
[4]
Video-based person re-identification with long short-term representation learning
Xuehu Liu, Pingping Zhang, and Huchuan Lu. Video-based person re-identification with long short-term representation learning. InICIG, pages 55–67, 2023
work page 2023
-
[5]
Quan Zhang, Jianhuang Lai, Zhanxiang Feng, and Xiao- hua Xie. Seeing like a human: Asynchronous learning with dynamic progressive refinement for person re-identification. IEEE TIP, 31:352–365, 2021
work page 2021
-
[6]
Hu Lu, Xuezhang Zou, and Pingping Zhang. Learning pro- gressive modality-shared transformers for effective visible- infrared person re-identification. InAAAI, pages 1835–1843, 2023
work page 2023
-
[7]
Modeling 3d layout for group re- identification
Quan Zhang, Kaiheng Dang, Jian-Huang Lai, Zhanxiang Feng, and Xiaohua Xie. Modeling 3d layout for group re- identification. InCVPR, pages 7512–7520, 2022
work page 2022
-
[8]
Multi- memory matching for unsupervised visible-infrared person re-identification
Jiangming Shi, Xiangbo Yin, Yeyun Chen, Yachao Zhang, Zhizhong Zhang, Yuan Xie, and Yanyun Qu. Multi- memory matching for unsupervised visible-infrared person re-identification. InECCV, pages 456–474. Springer, 2024
work page 2024
-
[9]
Part representation learning with teacher-student decoder for occluded person re-identification
Shang Gao, Chenyang Yu, Pingping Zhang, and Huchuan Lu. Part representation learning with teacher-student decoder for occluded person re-identification. InICASSP, pages 2660–2664, 2024
work page 2024
-
[10]
A feature enhancement loss for person re-identification
Yao Peng, Yining Lin, Huajian Ni, Hua Gao, and Chenchen Hu. A feature enhancement loss for person re-identification. Systems Science & Control Engineering, 11(1):2220482, 2023
work page 2023
-
[11]
Quan Zhang, Jianhuang Lai, and Xiaohua Xie. Learning modal-invariant angular metric by cyclic projection network for vis-nir person re-identification.IEEE TIP, 30:8019– 8033, 2021
work page 2021
-
[12]
Liyao Han, Jiajia Liu, and Yanning Zhang. Dp-gan: A novel generative adversarial network-based drone pilot identifica- tion scheme.IEEE Sensors Journal, 23(24):31537–31548, 2023
work page 2023
-
[13]
Hat: Hierarchical aggregation transformers for person re-identification
Guowen Zhang, Pingping Zhang, Jinqing Qi, and Huchuan Lu. Hat: Hierarchical aggregation transformers for person re-identification. InACM MM, pages 516–525, 2021
work page 2021
-
[14]
Toward re-identifying any animal
Bingliang Jiao, Lingqiao Liu, Liying Gao, Ruiqi Wu, Gu- osheng Lin, PENG W ANG, and Yanning Zhang. Toward re-identifying any animal. InNeurIPS, pages 40042–40053. Curran Associates, Inc., 2023. 1
work page 2023
-
[15]
Huy Nguyen, Khanh Nguyen, Sridha Sridharan, and Clinton Fookes. Aerial-ground person re-id. InICME, pages 2585– 2590, 2023. 1, 2
work page 2023
-
[16]
Person re- identification in aerial imagery.IEEE TMM, 23:281–291, 2020
Shizhou Zhang, Qi Zhang, Yifei Yang, Xing Wei, Peng Wang, Bingliang Jiao, and Yanning Zhang. Person re- identification in aerial imagery.IEEE TMM, 23:281–291, 2020
work page 2020
-
[17]
Huy Nguyen, Kien Nguyen, Sridha Sridharan, and Clinton Fookes. Ag-reid.v2: Bridging aerial and ground views for person re-identification.IEEE Transactions on Information Forensics and Security, 19:2896–2908, 2024. 2, 6, 7
work page 2024
-
[18]
Ground- to-aerial person search: Benchmark dataset and approach
Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, and Yanning Zhang. Ground- to-aerial person search: Benchmark dataset and approach. In ACM MM, pages 789–799, 2023. 1
work page 2023
-
[19]
View-decoupled transformer for person re- identification under aerial-ground camera network
Quan Zhang, Lei Wang, Vishal M Patel, Xiaohua Xie, and Jianhaung Lai. View-decoupled transformer for person re- identification under aerial-ground camera network. InCVPR, pages 22000–22009, 2024. 1, 2, 4, 6, 7
work page 2024
-
[20]
Dissecting person re- identification from the viewpoint of viewpoint
Xiaoxiao Sun and Liang Zheng. Dissecting person re- identification from the viewpoint of viewpoint. InCVPR, pages 608–617, 2019
work page 2019
-
[21]
Shiqi Wang, Yifan Wang, Rui Wu, et al. Secap: Self- calibrating and adaptive prompts for cross-view person re- identification in aerial-ground networks. InCVPR, 2025. 2, 6, 7
work page 2025
-
[22]
Wajahat Khalid, Bin Liu, Xulin Li, Muhammad Waqas, and Muhammad Sher Afgan. Bridging the sky and ground: To- wards view-invariant feature learning for aerial-ground per- son re-identification. InICCV, pages 9749–9758, 2025. 1, 2
work page 2025
-
[23]
Latex: Leveraging attribute-based text knowledge for aerial- ground person re-identification
Pingping Zhang, Xiang Hu, Yuhao Wang, and Huchuan Lu. Latex: Leveraging attribute-based text knowledge for aerial- ground person re-identification. 2025. 2
work page 2025
-
[24]
Dynamic token selection for aerial-ground person re-identification
Yuhai Wang and Maryam Pishgar. Dynamic token selection for aerial-ground person re-identification. InICME, 2025. 2, 6
work page 2025
-
[25]
Adaptive mixtures of local experts.Neu- ral computation, 3(1):79–87, 1991
Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. Adaptive mixtures of local experts.Neu- ral computation, 3(1):79–87, 1991. 2
work page 1991
-
[26]
Hierarchical mixtures of experts and the em algorithm.Neural computation, 6(2): 181–214, 1994
Michael I Jordan and Robert A Jacobs. Hierarchical mixtures of experts and the em algorithm.Neural computation, 6(2): 181–214, 1994. 2
work page 1994
-
[27]
Tutel: Adaptive mixture-of-experts at scale
Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prab- hat Ram, et al. Tutel: Adaptive mixture-of-experts at scale. Proceedings of Machine Learning and Systems, 5:269–287,
-
[28]
Scaling vision with sparse mix- ture of experts
Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, Andr ´e Susano Pinto, Daniel Keysers, and Neil Houlsby. Scaling vision with sparse mix- ture of experts. InNeurIPS, pages 8583–8595, 2021
work page 2021
-
[29]
Mohammed Nowaz Rabbani Chowdhury, Shuai Zhang, Meng Wang, Sijia Liu, and Pin-Yu Chen. Patch-level routing in mixture-of-experts is provably sample-efficient for con- volutional neural networks. InICML, pages 6074–6114. PMLR, 2023. 2
work page 2023
-
[30]
Shaoxiang Chen, Zequn Jie, and Lin Ma. Llava-mole: Sparse mixture of lora experts for mitigating data con- flicts in instruction finetuning mllms.arXiv preprint arXiv:2401.16160, 2024. 2
-
[31]
Octavius: Mitigating task interference in mllms via moe, 2023
Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, and Jing Shao. Octavius: Mitigating task interference in mllms via moe, 2023
work page 2023
-
[32]
Multimodal contrastive learn- ing with limoe: the language-image mixture of experts
Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, and Neil Houlsby. Multimodal contrastive learn- ing with limoe: the language-image mixture of experts. NeurIPS, 35:9564–9576, 2022. 2
work page 2022
-
[33]
Kai Ren, Chuanping Hu, Hao Xi, Yongqiang Li, Jinhao Fan, and Lihua Liu. Mosce-reid: Mixture of semantic clustering experts for person re-identification.Neurocomputing, 626: 129587, 2025. 2
work page 2025
-
[34]
Hamobe: Hierarchical and adaptive mixture of biometric ex- perts for video-based person reid
Yiyang Su, Yunping Shi, Feng Liu, and Xiaoming Liu. Hamobe: Hierarchical and adaptive mixture of biometric ex- perts for video-based person reid. InICCV, pages 11525– 11536, 2025. 2
work page 2025
-
[35]
Yuhao Wang, Yang Liu, Aihua Zheng, and Pingping Zhang. Demo: Decoupled feature-based mixture of experts for multi-modal object re-identification.AAAI, 39(8):8141– 8149, 2025. 2
work page 2025
-
[36]
Yang Wang, Yixing Zhang, Xudie Ren, and Yuxin Deng. Moda: Mixture of domain adapters for parameter-efficient generalizable person re-identification.ACM Transactions on Multimedia Computing, Communications and Applications, 21(5):1–19, 2025. 2
work page 2025
-
[37]
Graph-based person sig- nature for person re-identifications
Binh X Nguyen, Binh D Nguyen, Tuong Do, Erman Tjiputra, Quang D Tran, and Anh Nguyen. Graph-based person sig- nature for person re-identifications. InCVPR, pages 3492– 3501, 2021. 3
work page 2021
-
[38]
High-order information matters: Learning relation and topology for occluded person re-identification
Guan’an Wang, Shuo Yang, Huanyu Liu, Zhicheng Wang, Yang Yang, Shuliang Wang, Gang Yu, Erjin Zhou, and Jian Sun. High-order information matters: Learning relation and topology for occluded person re-identification. InCVPR, pages 6449–6458, 2020. 3
work page 2020
-
[39]
Meiyan Huang, Chunping Hou, Qingyuan Yang, and Zhipeng Wang. Reasoning and tuning: Graph attention net- work for occluded person re-identification.IEEE TIP, 32: 1568–1582, 2023. 3
work page 2023
-
[40]
Guanshuo Wang, Yufeng Yuan, Xiong Chen, Jiwei Li, and Xi Zhou. Learning discriminative features with multiple granu- larities for person re-identification.ACM MM, 2018. 6, 2
work page 2018
-
[41]
Bag of tricks and a strong baseline for deep person re-identification
Hao Luo, Youzhi Gu, Xingyu Liao, Shenqi Lai, and Wei Jiang. Bag of tricks and a strong baseline for deep person re-identification. InCVPR, pages 0–0, 2019. 6, 2
work page 2019
-
[42]
Ratnesh Kumar, Edwin Weill, Farzin Aghdasi, and Parthasarathy Sriram. A strong and efficient baseline for ve- hicle re-identification using deep triplet embedding.Journal of Artificial Intelligence and Soft Computing Research, 10 (1):27–45, 2020. 6
work page 2020
-
[43]
Y . Sun, L. Zheng, Y . Li, Y . Yang, Q. Tian, and S. Wang. Learning part-based convolutional features for person re- identification.IEEE TPAMI, 43(3):902–917, 2021. 6
work page 2021
-
[44]
Deep learning for person re- identification: A survey and outlook.IEEE TPAMI, 44(6): 2872–2893, 2021
Mang Ye, Jianbing Shen, Gaojie Lin, Tao Xiang, Ling Shao, and Steven CH Hoi. Deep learning for person re- identification: A survey and outlook.IEEE TPAMI, 44(6): 2872–2893, 2021. 6
work page 2021
-
[45]
Fastreid: A pytorch toolbox for general instance re-identification
Lingxiao He, Xingyu Liao, Wu Liu, Xinchen Liu, Peng Cheng, and Tao Mei. Fastreid: A pytorch toolbox for general instance re-identification. InACM MM, pages 9664–9667,
-
[46]
An image is worth 16x16 words: Trans- formers for image recognition at scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Trans- formers for image recognition at scale. InICLR, 2021. 6, 7, 2
work page 2021
-
[47]
Prototypical contrastive learning-based clip fine-tuning for object re-identification
Jiachen Li and Xiaojin Gong. Prototypical contrastive learning-based clip fine-tuning for object re-identification. CoRR, 2023. 6, 2
work page 2023
-
[48]
Clip-reid: exploiting vision-language model for image re-identification without concrete text labels
Siyuan Li, Li Sun, and Qingli Li. Clip-reid: exploiting vision-language model for image re-identification without concrete text labels. InAAAI, pages 1405–1413, 2023. 6, 2
work page 2023
-
[49]
Swin transformer: Hierarchical vision transformer using shifted windows
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In ICCV, pages 10012–10022, 2021. 6, 2
work page 2021
-
[50]
Imagenet: A large-scale hierarchical image database
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InCVPR, pages 248–255. Ieee, 2009. 7
work page 2009
-
[51]
Large-scale machine learning with stochastic gradient descent
L ´eon Bottou. Large-scale machine learning with stochastic gradient descent. InICCS, pages 177–186. Springer, 2010. 7
work page 2010
-
[52]
Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, et al. Deep high-resolution represen- tation learning for visual recognition.IEEE TPAMI, 43(10): 3349–3364, 2020. 2
work page 2020
-
[53]
Swin transformer v2: Scaling up capacity and resolution
Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, et al. Swin transformer v2: Scaling up capacity and resolution. In CVPR, pages 12009–12019, 2022. 2
work page 2022
-
[54]
Transreid: Transformer-based object re- identification
Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, and Wei Jiang. Transreid: Transformer-based object re- identification. InICCV, pages 15013–15022, 2021. 2
work page 2021
-
[55]
Yuhao Wang, Pingping Zhang, Xuehu Liu, Zhengzheng Tu, and Huchuan Lu. Unity is strength: Unifying convolutional and transformeral features for better person re-identification. IEEE Transactions on Intelligent Transportation Systems,
-
[56]
Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Meng Liu, and Peng Wang. Enhancing visible-infrared person re-identification with modality-and instance-aware visual prompt learning. InICMR, pages 579–588, 2024. 2 View-Aware Semantic Alignment for Aerial-Ground Person Re-Identification Supplementary Material In this supplementary material, we provide additional an...
work page 2024
-
[57]
Visual Analysis 6.1. Retrieval Visualization In addition to the retrieval visualization on the AG-ReID.v2 dataset shown in Fig. 5, we further provide retrieval exam- ples on the CARGO dataset under both the A↔G protocol and the ALL protocol. As illustrated in Fig. 6, our method consistently retrieves correct cross-view matches across dif- ferent evaluatio...
-
[58]
Performance Tab. 3 and Tab. 4 report the performance of ViSA on AG-ReID.v2 and LAGPeR, respectively. On AG-ReID.v2, ViSA achieves the highest mAP across all methods and sets a new state of the art in every evaluation setting except the A→W protocol, where the viewpoint variation is rela- tively minor. We further observe that in this less challenging A→W s...
-
[59]
Parameter Analysis We further analyze the impact of hyperparameters under different evaluation protocols. In the study of the number of expertsE, we fix the number of selected expert to 1 (set- tingk= 1) to ensure controlled comparisons. As shown in Table 3. Performance comparison on AG-ReID.v2 dataset. C represents CCTV , W represents wearable devices an...
-
[60]
Efficiency ViSA achieves strong accuracy and efficiency despite in- troducing additional parameters (+107.6%). As shown in Tab. 5, ViSA enhances mAP by 10.06% while reducing GFLOPs by 3.3% and increasing inference speed by 55.1% FPS
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.