View-Aware Semantic Alignment for Aerial-Ground Person Re-Identification

Cailun Wu; Hongbo Chen; Jianhuang Lai; Jingze Wu; Peiming Zhao; Quan Zhang; Zeqiang Cai

arxiv: 2605.18192 · v1 · pith:3UYDIHQWnew · submitted 2026-05-18 · 💻 cs.CV

View-Aware Semantic Alignment for Aerial-Ground Person Re-Identification

Quan Zhang , Zeqiang Cai , Peiming Zhao , Jingze Wu , Cailun Wu , Hongbo Chen , Jianhuang Lai This is my paper

Pith reviewed 2026-05-20 11:11 UTC · model grok-4.3

classification 💻 cs.CV

keywords aerial-ground person re-identificationview-aware semantic alignmentexpert-driven token generationdual-branch local fusioncross-view consistencyviewpoint variationsgraph reasoningperson re-identification

0 comments

The pith

View-aware semantic alignment using expert queries and graph fusion improves cross-view person re-identification over view-invariant methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that standard view-invariant alignment in aerial-ground person re-identification discards useful viewpoint-specific details that help distinguish identities. It introduces ViSA to maintain cross-view semantic consistency instead, by generating adaptive queries from view-aware experts and fusing local regions through graph reasoning. This design is meant to keep more discriminative identity information intact despite large viewpoint shifts between drones and fixed cameras. Experiments on three benchmarks show consistent gains, including a 10.06 percent mAP rise on the hard CARGO cross-view setting. The result matters for applications like surveillance where the same person must be matched across aerial and ground footage.

Core claim

The central claim is that a view-aware framework achieves better cross-view semantic consistency than view-invariant paradigms by constructing view-aware experts to generate adaptive semantic queries that capture viewpoint-specific patterns, then using graph reasoning in a dual-branch module to extract and align responsive local regions, thereby retaining more discriminative identity information.

What carries the argument

ViSA framework with an Expert-driven Token Generation Module that builds view-aware experts for adaptive semantic queries and a Dual-branch Local Fusion Module that applies graph reasoning to align local regions.

If this is right

Superior matching accuracy on AG-ReID.v2, CARGO, and LAGPeR benchmarks for aerial-ground person pairs.
A 10.06 percent mAP gain on the challenging CARGO cross-view evaluation protocol.
Improved extraction of local identity cues that respond differently to drone versus ground viewpoints.
Cross-view semantic consistency that still respects viewpoint-specific patterns rather than discarding them.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same expert-query and graph-fusion pattern could be tested on vehicle re-identification across satellite and street views.
If the modules generalize, they might reduce the need for massive viewpoint-augmented training sets in other multi-camera settings.
Scaling the number of view-aware experts could be checked for diminishing returns when more than two camera platforms are involved.

Load-bearing premise

Preserving view-specific cues through adaptive expert queries and graph-based local alignment will retain more discriminative identity information than enforcing view-invariant alignment, without creating new inconsistencies in identity matching.

What would settle it

If a carefully tuned view-invariant baseline matches or exceeds ViSA's mAP on the CARGO cross-view protocol while using similar compute, the advantage of retaining view-specific cues would be called into question.

Figures

Figures reproduced from arXiv: 2605.18192 by Cailun Wu, Hongbo Chen, Jianhuang Lai, Jingze Wu, Peiming Zhao, Quan Zhang, Zeqiang Cai.

**Figure 2.** Figure 2: Overview of the proposed ViSA. The transformer backbone first separates intrinsic identity semantics from extrinsic viewpoint [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Modules detail of the Token Expert Module and Graph Learning Module. The Token Expert Module generates view-aware expert [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Analysis of hyper-parameters under ‘ALL’ protocol on the CARGO dataset. Rank-1 and mAP and their average are reported (%). [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Comparison of several retrieval visualizations on AG [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Comparison of several retrieval visualizations on [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

**Figure 7.** Figure 7: t-SNE visualization. Left: Baseline. Right: ViSA. Colors indicate different classes and shapes denote different views. 1 accuracy, demonstrating its strong generalization ability under larger cross-view variations. 8. Parameter Analysis We further analyze the impact of hyperparameters under different evaluation protocols. In the study of the number of experts E, we fix the number of selected expert to 1 (s… view at source ↗

**Figure 8.** Figure 8: Analysis of E, k, and λ across different protocols on the CARGO dataset. Rank-1 and mAP and their average are reported (%). E represents the number of experts. k represents number of selected experts. λ represents the balancing coefficient [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗

read the original abstract

Aerial-Ground Person Re-Identification (AGPReID) remains highly challenging due to drastic viewpoint variations between drones and fixed cameras. Existing methods typically follow a view-invariant paradigm, aligning shared features across views to achieve robustness. However, view-invariant inherently enforces part-level alignment, which ignores view-specific cues and discriminative identity information. To this end, this work proposes ViSA (View-aware Semantic Alignment), a view-aware framework that achieves cross-view semantic consistency containing an Expert-driven Token Generation Module (ETGM) and a Dual-branch Local Fusion Module (DLFM). Technically, the former constructs a set of view-aware experts to generate adaptive semantic queries that perceive viewpoint-specific patterns, while the latter leverages graph reasoning to extract and align local regions responsive to different experts. Extensive experiments on three AGPReID benchmarks including AG-ReID.v2, CARGO and LAGPeR demonstrate that ViSA consistently achieves superior performance, with a notable 10.06\% mAP improvement on the challenging CARGO cross-view protocol. The code is available at \href{https://github.com/Cat-Zero/ViSA}{https://github.com/Cat-Zero/ViSA}.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ViSA gets solid benchmark gains on aerial-ground re-id by keeping view-specific cues via expert tokens and graph fusion, but the identity consistency argument rests on an assumption the design does not explicitly enforce.

read the letter

ViSA's main contribution is shifting away from pure view-invariant alignment in aerial-ground person re-identification. Instead of forcing shared features across drone and ground views, it uses an Expert-driven Token Generation Module to create adaptive queries that pick up viewpoint-specific patterns, then a Dual-branch Local Fusion Module that applies graph reasoning to align locally responsive regions. This combination is new in this exact setting even if the underlying transformer and graph ideas are familiar. The paper does well at spelling out why view-invariance can throw away useful identity cues, and the results back the approach with consistent improvements on AG-ReID.v2, CARGO, and LAGPeR, including the 10% mAP lift on the hard CARGO cross-view split. Code release helps anyone who wants to inspect or extend the implementation. The soft spot is exactly the one the stress-test note flags. Nothing in the expert queries or the graph edges adds an explicit cross-view identity constraint, so the local alignments could connect salient but non-corresponding regions across the viewpoint gap. If that happens, the gains might come from extra modeling capacity rather than better semantic consistency. The abstract gives limited detail on ablations, baseline choices, or statistical checks, which leaves some room for doubt about how cleanly the improvements are isolated. This work is aimed at people already working on cross-view re-id or drone-based surveillance. A reader in that niche can extract the modules and the benchmark numbers for their own experiments. I would send it for peer review because the problem is real, the results are concrete, and the code makes verification feasible, even if the central identity-preservation claim would benefit from tighter evidence.

Referee Report

2 major / 2 minor

Summary. The paper introduces ViSA, a view-aware semantic alignment framework for Aerial-Ground Person Re-Identification (AGPReID). It consists of an Expert-driven Token Generation Module (ETGM) that uses view-aware experts to generate adaptive semantic queries for viewpoint-specific patterns, and a Dual-branch Local Fusion Module (DLFM) that employs graph reasoning to extract and align local regions. The method is evaluated on three benchmarks (AG-ReID.v2, CARGO, LAGPeR), reporting superior performance including a 10.06% mAP improvement on the CARGO cross-view protocol.

Significance. If the empirical results hold, this work challenges the conventional view-invariant paradigm in cross-view ReID by demonstrating that preserving view-specific cues can lead to better identity discriminability. The code release supports reproducibility. This could influence future designs in handling drastic viewpoint variations in person re-identification tasks.

major comments (2)

[§3.1 (ETGM)] §3.1 (ETGM): The construction of view-aware experts and adaptive queries does not include an explicit mechanism to enforce that the generated tokens correspond to consistent identity features across aerial and ground views. This is load-bearing for the central claim, as the performance gains might arise from modeling view-specific variations rather than achieving true semantic alignment.
[§3.2 (DLFM)] §3.2 (DLFM): The graph-based local fusion aligns regions responsive to different experts, but lacks a cross-view identity consistency constraint (such as a matching loss between corresponding local features of the same identity). Without this, the alignment may connect locally salient but identity-inconsistent patterns, undermining the claim of improved semantic consistency over view-invariant methods.

minor comments (2)

[Abstract] The abstract mentions empirical improvements but does not specify the baselines used or report statistical significance of the results.
[Experiments] Details on ablation studies for the number of view-aware experts and their impact on performance should be expanded for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major comment below with clarifications on the implicit mechanisms in ViSA and note planned revisions to strengthen the presentation of identity consistency.

read point-by-point responses

Referee: [§3.1 (ETGM)] §3.1 (ETGM): The construction of view-aware experts and adaptive queries does not include an explicit mechanism to enforce that the generated tokens correspond to consistent identity features across aerial and ground views. This is load-bearing for the central claim, as the performance gains might arise from modeling view-specific variations rather than achieving true semantic alignment.

Authors: We appreciate the referee highlighting this aspect of the ETGM. The view-aware experts share parameters across aerial and ground inputs and generate adaptive queries conditioned on the input features; end-to-end training with identity classification and triplet losses on cross-view pairs encourages the resulting tokens to emphasize identity-discriminative patterns that remain consistent despite viewpoint differences. The reported gains on cross-view protocols (e.g., 10.06% mAP on CARGO) are consistent with semantic alignment rather than isolated view-specific modeling. To make the consistency argument more explicit, we will add a clarifying paragraph in §3.1 and an ablation measuring token similarity for matched identities across views. revision: partial
Referee: [§3.2 (DLFM)] §3.2 (DLFM): The graph-based local fusion aligns regions responsive to different experts, but lacks a cross-view identity consistency constraint (such as a matching loss between corresponding local features of the same identity). Without this, the alignment may connect locally salient but identity-inconsistent patterns, undermining the claim of improved semantic consistency over view-invariant methods.

Authors: We thank the referee for this observation on the DLFM. Graph edges are formed between local patches from the two branches that exhibit high feature similarity under the same expert; the global identity and metric losses then back-propagate consistency signals to these locally aligned regions. While an explicit local matching loss is not present, the current design ties local alignment directly to identity supervision. We agree an additional local consistency term could further reinforce the claim and will incorporate either a lightweight local matching loss or a quantitative analysis of local feature correspondence in the revised manuscript and supplementary material. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical performance on external benchmarks

full rationale

The paper introduces ViSA with two new modules (ETGM for view-aware expert queries and DLFM for graph-based local fusion) and reports mAP gains on public AGPReID datasets (AG-ReID.v2, CARGO, LAGPeR). These gains are measured against external test protocols rather than being algebraically forced by any fitted parameter or self-referential definition inside the method. No derivation chain reduces a claimed result to an input by construction, and no load-bearing step relies on a self-citation that itself lacks independent verification. The framework is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that view-specific cues carry useful identity signal and on the modeling choice to implement that signal via expert tokens and graph reasoning; no explicit free parameters or invented physical entities are named in the abstract.

free parameters (1)

number of view-aware experts
Hyperparameter controlling how many adaptive semantic queries are generated; value not stated in abstract.

axioms (1)

domain assumption View-specific cues contain discriminative identity information that should be preserved rather than discarded by invariance
Explicitly stated as motivation for moving away from view-invariant paradigm.

invented entities (1)

View-aware experts no independent evidence
purpose: Generate adaptive semantic queries that perceive viewpoint-specific patterns
New component introduced in the ETGM module

pith-pipeline@v0.9.0 · 5755 in / 1229 out tokens · 34029 ms · 2026-05-20T11:11:46.236364+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages

[1]

Separable spatial-temporal residual graph for cloth-changing group re-identification.IEEE TPAMI, 46(8): 5791–5805, 2024

Quan Zhang, Jianhuang Lai, Xiaohua Xie, Xiaofeng Jin, and Sien Huang. Separable spatial-temporal residual graph for cloth-changing group re-identification.IEEE TPAMI, 46(8): 5791–5805, 2024. 1

work page 2024
[2]

Idea: Inverted text with cooperative deformable aggrega- tion for multi-modal object re-identification

Yuhao Wang, Yongfeng Lv, Pingping Zhang, and Huchuan Lu. Idea: Inverted text with cooperative deformable aggrega- tion for multi-modal object re-identification. InCVPR, pages 29701–29710, 2025

work page 2025
[3]

Uncertainty modeling for group re-identification.IJCV, 132(8):3046–3066, 2024

Quan Zhang, Jianhuang Lai, Zhanxiang Feng, and Xiaohua Xie. Uncertainty modeling for group re-identification.IJCV, 132(8):3046–3066, 2024

work page 2024
[4]

Video-based person re-identification with long short-term representation learning

Xuehu Liu, Pingping Zhang, and Huchuan Lu. Video-based person re-identification with long short-term representation learning. InICIG, pages 55–67, 2023

work page 2023
[5]

Seeing like a human: Asynchronous learning with dynamic progressive refinement for person re-identification

Quan Zhang, Jianhuang Lai, Zhanxiang Feng, and Xiao- hua Xie. Seeing like a human: Asynchronous learning with dynamic progressive refinement for person re-identification. IEEE TIP, 31:352–365, 2021

work page 2021
[6]

Learning pro- gressive modality-shared transformers for effective visible- infrared person re-identification

Hu Lu, Xuezhang Zou, and Pingping Zhang. Learning pro- gressive modality-shared transformers for effective visible- infrared person re-identification. InAAAI, pages 1835–1843, 2023

work page 2023
[7]

Modeling 3d layout for group re- identification

Quan Zhang, Kaiheng Dang, Jian-Huang Lai, Zhanxiang Feng, and Xiaohua Xie. Modeling 3d layout for group re- identification. InCVPR, pages 7512–7520, 2022

work page 2022
[8]

Multi- memory matching for unsupervised visible-infrared person re-identification

Jiangming Shi, Xiangbo Yin, Yeyun Chen, Yachao Zhang, Zhizhong Zhang, Yuan Xie, and Yanyun Qu. Multi- memory matching for unsupervised visible-infrared person re-identification. InECCV, pages 456–474. Springer, 2024

work page 2024
[9]

Part representation learning with teacher-student decoder for occluded person re-identification

Shang Gao, Chenyang Yu, Pingping Zhang, and Huchuan Lu. Part representation learning with teacher-student decoder for occluded person re-identification. InICASSP, pages 2660–2664, 2024

work page 2024
[10]

A feature enhancement loss for person re-identification

Yao Peng, Yining Lin, Huajian Ni, Hua Gao, and Chenchen Hu. A feature enhancement loss for person re-identification. Systems Science & Control Engineering, 11(1):2220482, 2023

work page 2023
[11]

Learning modal-invariant angular metric by cyclic projection network for vis-nir person re-identification.IEEE TIP, 30:8019– 8033, 2021

Quan Zhang, Jianhuang Lai, and Xiaohua Xie. Learning modal-invariant angular metric by cyclic projection network for vis-nir person re-identification.IEEE TIP, 30:8019– 8033, 2021

work page 2021
[12]

Dp-gan: A novel generative adversarial network-based drone pilot identifica- tion scheme.IEEE Sensors Journal, 23(24):31537–31548, 2023

Liyao Han, Jiajia Liu, and Yanning Zhang. Dp-gan: A novel generative adversarial network-based drone pilot identifica- tion scheme.IEEE Sensors Journal, 23(24):31537–31548, 2023

work page 2023
[13]

Hat: Hierarchical aggregation transformers for person re-identification

Guowen Zhang, Pingping Zhang, Jinqing Qi, and Huchuan Lu. Hat: Hierarchical aggregation transformers for person re-identification. InACM MM, pages 516–525, 2021

work page 2021
[14]

Toward re-identifying any animal

Bingliang Jiao, Lingqiao Liu, Liying Gao, Ruiqi Wu, Gu- osheng Lin, PENG W ANG, and Yanning Zhang. Toward re-identifying any animal. InNeurIPS, pages 40042–40053. Curran Associates, Inc., 2023. 1

work page 2023
[15]

Aerial-ground person re-id

Huy Nguyen, Khanh Nguyen, Sridha Sridharan, and Clinton Fookes. Aerial-ground person re-id. InICME, pages 2585– 2590, 2023. 1, 2

work page 2023
[16]

Person re- identification in aerial imagery.IEEE TMM, 23:281–291, 2020

Shizhou Zhang, Qi Zhang, Yifei Yang, Xing Wei, Peng Wang, Bingliang Jiao, and Yanning Zhang. Person re- identification in aerial imagery.IEEE TMM, 23:281–291, 2020

work page 2020
[17]

Ag-reid.v2: Bridging aerial and ground views for person re-identification.IEEE Transactions on Information Forensics and Security, 19:2896–2908, 2024

Huy Nguyen, Kien Nguyen, Sridha Sridharan, and Clinton Fookes. Ag-reid.v2: Bridging aerial and ground views for person re-identification.IEEE Transactions on Information Forensics and Security, 19:2896–2908, 2024. 2, 6, 7

work page 2024
[18]

Ground- to-aerial person search: Benchmark dataset and approach

Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, and Yanning Zhang. Ground- to-aerial person search: Benchmark dataset and approach. In ACM MM, pages 789–799, 2023. 1

work page 2023
[19]

View-decoupled transformer for person re- identification under aerial-ground camera network

Quan Zhang, Lei Wang, Vishal M Patel, Xiaohua Xie, and Jianhaung Lai. View-decoupled transformer for person re- identification under aerial-ground camera network. InCVPR, pages 22000–22009, 2024. 1, 2, 4, 6, 7

work page 2024
[20]

Dissecting person re- identification from the viewpoint of viewpoint

Xiaoxiao Sun and Liang Zheng. Dissecting person re- identification from the viewpoint of viewpoint. InCVPR, pages 608–617, 2019

work page 2019
[21]

Secap: Self- calibrating and adaptive prompts for cross-view person re- identification in aerial-ground networks

Shiqi Wang, Yifan Wang, Rui Wu, et al. Secap: Self- calibrating and adaptive prompts for cross-view person re- identification in aerial-ground networks. InCVPR, 2025. 2, 6, 7

work page 2025
[22]

Bridging the sky and ground: To- wards view-invariant feature learning for aerial-ground per- son re-identification

Wajahat Khalid, Bin Liu, Xulin Li, Muhammad Waqas, and Muhammad Sher Afgan. Bridging the sky and ground: To- wards view-invariant feature learning for aerial-ground per- son re-identification. InICCV, pages 9749–9758, 2025. 1, 2

work page 2025
[23]

Latex: Leveraging attribute-based text knowledge for aerial- ground person re-identification

Pingping Zhang, Xiang Hu, Yuhao Wang, and Huchuan Lu. Latex: Leveraging attribute-based text knowledge for aerial- ground person re-identification. 2025. 2

work page 2025
[24]

Dynamic token selection for aerial-ground person re-identification

Yuhai Wang and Maryam Pishgar. Dynamic token selection for aerial-ground person re-identification. InICME, 2025. 2, 6

work page 2025
[25]

Adaptive mixtures of local experts.Neu- ral computation, 3(1):79–87, 1991

Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. Adaptive mixtures of local experts.Neu- ral computation, 3(1):79–87, 1991. 2

work page 1991
[26]

Hierarchical mixtures of experts and the em algorithm.Neural computation, 6(2): 181–214, 1994

Michael I Jordan and Robert A Jacobs. Hierarchical mixtures of experts and the em algorithm.Neural computation, 6(2): 181–214, 1994. 2

work page 1994
[27]

Tutel: Adaptive mixture-of-experts at scale

Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prab- hat Ram, et al. Tutel: Adaptive mixture-of-experts at scale. Proceedings of Machine Learning and Systems, 5:269–287,

work page
[28]

Scaling vision with sparse mix- ture of experts

Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, Andr ´e Susano Pinto, Daniel Keysers, and Neil Houlsby. Scaling vision with sparse mix- ture of experts. InNeurIPS, pages 8583–8595, 2021

work page 2021
[29]

Patch-level routing in mixture-of-experts is provably sample-efficient for con- volutional neural networks

Mohammed Nowaz Rabbani Chowdhury, Shuai Zhang, Meng Wang, Sijia Liu, and Pin-Yu Chen. Patch-level routing in mixture-of-experts is provably sample-efficient for con- volutional neural networks. InICML, pages 6074–6114. PMLR, 2023. 2

work page 2023
[30]

Llava-mole: Sparse mixture of lora experts for mitigating data con- flicts in instruction finetuning mllms.arXiv preprint arXiv:2401.16160, 2024

Shaoxiang Chen, Zequn Jie, and Lin Ma. Llava-mole: Sparse mixture of lora experts for mitigating data con- flicts in instruction finetuning mllms.arXiv preprint arXiv:2401.16160, 2024. 2

work page arXiv 2024
[31]

Octavius: Mitigating task interference in mllms via moe, 2023

Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, and Jing Shao. Octavius: Mitigating task interference in mllms via moe, 2023

work page 2023
[32]

Multimodal contrastive learn- ing with limoe: the language-image mixture of experts

Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, and Neil Houlsby. Multimodal contrastive learn- ing with limoe: the language-image mixture of experts. NeurIPS, 35:9564–9576, 2022. 2

work page 2022
[33]

Mosce-reid: Mixture of semantic clustering experts for person re-identification.Neurocomputing, 626: 129587, 2025

Kai Ren, Chuanping Hu, Hao Xi, Yongqiang Li, Jinhao Fan, and Lihua Liu. Mosce-reid: Mixture of semantic clustering experts for person re-identification.Neurocomputing, 626: 129587, 2025. 2

work page 2025
[34]

Hamobe: Hierarchical and adaptive mixture of biometric ex- perts for video-based person reid

Yiyang Su, Yunping Shi, Feng Liu, and Xiaoming Liu. Hamobe: Hierarchical and adaptive mixture of biometric ex- perts for video-based person reid. InICCV, pages 11525– 11536, 2025. 2

work page 2025
[35]

Demo: Decoupled feature-based mixture of experts for multi-modal object re-identification.AAAI, 39(8):8141– 8149, 2025

Yuhao Wang, Yang Liu, Aihua Zheng, and Pingping Zhang. Demo: Decoupled feature-based mixture of experts for multi-modal object re-identification.AAAI, 39(8):8141– 8149, 2025. 2

work page 2025
[36]

Yang Wang, Yixing Zhang, Xudie Ren, and Yuxin Deng. Moda: Mixture of domain adapters for parameter-efficient generalizable person re-identification.ACM Transactions on Multimedia Computing, Communications and Applications, 21(5):1–19, 2025. 2

work page 2025
[37]

Graph-based person sig- nature for person re-identifications

Binh X Nguyen, Binh D Nguyen, Tuong Do, Erman Tjiputra, Quang D Tran, and Anh Nguyen. Graph-based person sig- nature for person re-identifications. InCVPR, pages 3492– 3501, 2021. 3

work page 2021
[38]

High-order information matters: Learning relation and topology for occluded person re-identification

Guan’an Wang, Shuo Yang, Huanyu Liu, Zhicheng Wang, Yang Yang, Shuliang Wang, Gang Yu, Erjin Zhou, and Jian Sun. High-order information matters: Learning relation and topology for occluded person re-identification. InCVPR, pages 6449–6458, 2020. 3

work page 2020
[39]

Reasoning and tuning: Graph attention net- work for occluded person re-identification.IEEE TIP, 32: 1568–1582, 2023

Meiyan Huang, Chunping Hou, Qingyuan Yang, and Zhipeng Wang. Reasoning and tuning: Graph attention net- work for occluded person re-identification.IEEE TIP, 32: 1568–1582, 2023. 3

work page 2023
[40]

Learning discriminative features with multiple granu- larities for person re-identification.ACM MM, 2018

Guanshuo Wang, Yufeng Yuan, Xiong Chen, Jiwei Li, and Xi Zhou. Learning discriminative features with multiple granu- larities for person re-identification.ACM MM, 2018. 6, 2

work page 2018
[41]

Bag of tricks and a strong baseline for deep person re-identification

Hao Luo, Youzhi Gu, Xingyu Liao, Shenqi Lai, and Wei Jiang. Bag of tricks and a strong baseline for deep person re-identification. InCVPR, pages 0–0, 2019. 6, 2

work page 2019
[42]

A strong and efficient baseline for ve- hicle re-identification using deep triplet embedding.Journal of Artificial Intelligence and Soft Computing Research, 10 (1):27–45, 2020

Ratnesh Kumar, Edwin Weill, Farzin Aghdasi, and Parthasarathy Sriram. A strong and efficient baseline for ve- hicle re-identification using deep triplet embedding.Journal of Artificial Intelligence and Soft Computing Research, 10 (1):27–45, 2020. 6

work page 2020
[43]

Y . Sun, L. Zheng, Y . Li, Y . Yang, Q. Tian, and S. Wang. Learning part-based convolutional features for person re- identification.IEEE TPAMI, 43(3):902–917, 2021. 6

work page 2021
[44]

Deep learning for person re- identification: A survey and outlook.IEEE TPAMI, 44(6): 2872–2893, 2021

Mang Ye, Jianbing Shen, Gaojie Lin, Tao Xiang, Ling Shao, and Steven CH Hoi. Deep learning for person re- identification: A survey and outlook.IEEE TPAMI, 44(6): 2872–2893, 2021. 6

work page 2021
[45]

Fastreid: A pytorch toolbox for general instance re-identification

Lingxiao He, Xingyu Liao, Wu Liu, Xinchen Liu, Peng Cheng, and Tao Mei. Fastreid: A pytorch toolbox for general instance re-identification. InACM MM, pages 9664–9667,

work page
[46]

An image is worth 16x16 words: Trans- formers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Trans- formers for image recognition at scale. InICLR, 2021. 6, 7, 2

work page 2021
[47]

Prototypical contrastive learning-based clip fine-tuning for object re-identification

Jiachen Li and Xiaojin Gong. Prototypical contrastive learning-based clip fine-tuning for object re-identification. CoRR, 2023. 6, 2

work page 2023
[48]

Clip-reid: exploiting vision-language model for image re-identification without concrete text labels

Siyuan Li, Li Sun, and Qingli Li. Clip-reid: exploiting vision-language model for image re-identification without concrete text labels. InAAAI, pages 1405–1413, 2023. 6, 2

work page 2023
[49]

Swin transformer: Hierarchical vision transformer using shifted windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In ICCV, pages 10012–10022, 2021. 6, 2

work page 2021
[50]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InCVPR, pages 248–255. Ieee, 2009. 7

work page 2009
[51]

Large-scale machine learning with stochastic gradient descent

L ´eon Bottou. Large-scale machine learning with stochastic gradient descent. InICCS, pages 177–186. Springer, 2010. 7

work page 2010
[52]

Deep high-resolution represen- tation learning for visual recognition.IEEE TPAMI, 43(10): 3349–3364, 2020

Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, et al. Deep high-resolution represen- tation learning for visual recognition.IEEE TPAMI, 43(10): 3349–3364, 2020. 2

work page 2020
[53]

Swin transformer v2: Scaling up capacity and resolution

Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, et al. Swin transformer v2: Scaling up capacity and resolution. In CVPR, pages 12009–12019, 2022. 2

work page 2022
[54]

Transreid: Transformer-based object re- identification

Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, and Wei Jiang. Transreid: Transformer-based object re- identification. InICCV, pages 15013–15022, 2021. 2

work page 2021
[55]

Unity is strength: Unifying convolutional and transformeral features for better person re-identification

Yuhao Wang, Pingping Zhang, Xuehu Liu, Zhengzheng Tu, and Huchuan Lu. Unity is strength: Unifying convolutional and transformeral features for better person re-identification. IEEE Transactions on Intelligent Transportation Systems,

work page
[56]

Enhancing visible-infrared person re-identification with modality-and instance-aware visual prompt learning

Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Meng Liu, and Peng Wang. Enhancing visible-infrared person re-identification with modality-and instance-aware visual prompt learning. InICMR, pages 579–588, 2024. 2 View-Aware Semantic Alignment for Aerial-Ground Person Re-Identification Supplementary Material In this supplementary material, we provide additional an...

work page 2024
[57]

Retrieval Visualization In addition to the retrieval visualization on the AG-ReID.v2 dataset shown in Fig

Visual Analysis 6.1. Retrieval Visualization In addition to the retrieval visualization on the AG-ReID.v2 dataset shown in Fig. 5, we further provide retrieval exam- ples on the CARGO dataset under both the A↔G protocol and the ALL protocol. As illustrated in Fig. 6, our method consistently retrieves correct cross-view matches across dif- ferent evaluatio...

work page
[58]

3 and Tab

Performance Tab. 3 and Tab. 4 report the performance of ViSA on AG-ReID.v2 and LAGPeR, respectively. On AG-ReID.v2, ViSA achieves the highest mAP across all methods and sets a new state of the art in every evaluation setting except the A→W protocol, where the viewpoint variation is rela- tively minor. We further observe that in this less challenging A→W s...

work page
[59]

In the study of the number of expertsE, we fix the number of selected expert to 1 (set- tingk= 1) to ensure controlled comparisons

Parameter Analysis We further analyze the impact of hyperparameters under different evaluation protocols. In the study of the number of expertsE, we fix the number of selected expert to 1 (set- tingk= 1) to ensure controlled comparisons. As shown in Table 3. Performance comparison on AG-ReID.v2 dataset. C represents CCTV , W represents wearable devices an...

work page arXiv
[60]

As shown in Tab

Efficiency ViSA achieves strong accuracy and efficiency despite in- troducing additional parameters (+107.6%). As shown in Tab. 5, ViSA enhances mAP by 10.06% while reducing GFLOPs by 3.3% and increasing inference speed by 55.1% FPS

work page

[1] [1]

Separable spatial-temporal residual graph for cloth-changing group re-identification.IEEE TPAMI, 46(8): 5791–5805, 2024

Quan Zhang, Jianhuang Lai, Xiaohua Xie, Xiaofeng Jin, and Sien Huang. Separable spatial-temporal residual graph for cloth-changing group re-identification.IEEE TPAMI, 46(8): 5791–5805, 2024. 1

work page 2024

[2] [2]

Idea: Inverted text with cooperative deformable aggrega- tion for multi-modal object re-identification

Yuhao Wang, Yongfeng Lv, Pingping Zhang, and Huchuan Lu. Idea: Inverted text with cooperative deformable aggrega- tion for multi-modal object re-identification. InCVPR, pages 29701–29710, 2025

work page 2025

[3] [3]

Uncertainty modeling for group re-identification.IJCV, 132(8):3046–3066, 2024

Quan Zhang, Jianhuang Lai, Zhanxiang Feng, and Xiaohua Xie. Uncertainty modeling for group re-identification.IJCV, 132(8):3046–3066, 2024

work page 2024

[4] [4]

Video-based person re-identification with long short-term representation learning

Xuehu Liu, Pingping Zhang, and Huchuan Lu. Video-based person re-identification with long short-term representation learning. InICIG, pages 55–67, 2023

work page 2023

[5] [5]

Seeing like a human: Asynchronous learning with dynamic progressive refinement for person re-identification

Quan Zhang, Jianhuang Lai, Zhanxiang Feng, and Xiao- hua Xie. Seeing like a human: Asynchronous learning with dynamic progressive refinement for person re-identification. IEEE TIP, 31:352–365, 2021

work page 2021

[6] [6]

Learning pro- gressive modality-shared transformers for effective visible- infrared person re-identification

Hu Lu, Xuezhang Zou, and Pingping Zhang. Learning pro- gressive modality-shared transformers for effective visible- infrared person re-identification. InAAAI, pages 1835–1843, 2023

work page 2023

[7] [7]

Modeling 3d layout for group re- identification

Quan Zhang, Kaiheng Dang, Jian-Huang Lai, Zhanxiang Feng, and Xiaohua Xie. Modeling 3d layout for group re- identification. InCVPR, pages 7512–7520, 2022

work page 2022

[8] [8]

Multi- memory matching for unsupervised visible-infrared person re-identification

Jiangming Shi, Xiangbo Yin, Yeyun Chen, Yachao Zhang, Zhizhong Zhang, Yuan Xie, and Yanyun Qu. Multi- memory matching for unsupervised visible-infrared person re-identification. InECCV, pages 456–474. Springer, 2024

work page 2024

[9] [9]

Part representation learning with teacher-student decoder for occluded person re-identification

Shang Gao, Chenyang Yu, Pingping Zhang, and Huchuan Lu. Part representation learning with teacher-student decoder for occluded person re-identification. InICASSP, pages 2660–2664, 2024

work page 2024

[10] [10]

A feature enhancement loss for person re-identification

Yao Peng, Yining Lin, Huajian Ni, Hua Gao, and Chenchen Hu. A feature enhancement loss for person re-identification. Systems Science & Control Engineering, 11(1):2220482, 2023

work page 2023

[11] [11]

Learning modal-invariant angular metric by cyclic projection network for vis-nir person re-identification.IEEE TIP, 30:8019– 8033, 2021

Quan Zhang, Jianhuang Lai, and Xiaohua Xie. Learning modal-invariant angular metric by cyclic projection network for vis-nir person re-identification.IEEE TIP, 30:8019– 8033, 2021

work page 2021

[12] [12]

Dp-gan: A novel generative adversarial network-based drone pilot identifica- tion scheme.IEEE Sensors Journal, 23(24):31537–31548, 2023

Liyao Han, Jiajia Liu, and Yanning Zhang. Dp-gan: A novel generative adversarial network-based drone pilot identifica- tion scheme.IEEE Sensors Journal, 23(24):31537–31548, 2023

work page 2023

[13] [13]

Hat: Hierarchical aggregation transformers for person re-identification

Guowen Zhang, Pingping Zhang, Jinqing Qi, and Huchuan Lu. Hat: Hierarchical aggregation transformers for person re-identification. InACM MM, pages 516–525, 2021

work page 2021

[14] [14]

Toward re-identifying any animal

Bingliang Jiao, Lingqiao Liu, Liying Gao, Ruiqi Wu, Gu- osheng Lin, PENG W ANG, and Yanning Zhang. Toward re-identifying any animal. InNeurIPS, pages 40042–40053. Curran Associates, Inc., 2023. 1

work page 2023

[15] [15]

Aerial-ground person re-id

Huy Nguyen, Khanh Nguyen, Sridha Sridharan, and Clinton Fookes. Aerial-ground person re-id. InICME, pages 2585– 2590, 2023. 1, 2

work page 2023

[16] [16]

Person re- identification in aerial imagery.IEEE TMM, 23:281–291, 2020

Shizhou Zhang, Qi Zhang, Yifei Yang, Xing Wei, Peng Wang, Bingliang Jiao, and Yanning Zhang. Person re- identification in aerial imagery.IEEE TMM, 23:281–291, 2020

work page 2020

[17] [17]

Ag-reid.v2: Bridging aerial and ground views for person re-identification.IEEE Transactions on Information Forensics and Security, 19:2896–2908, 2024

Huy Nguyen, Kien Nguyen, Sridha Sridharan, and Clinton Fookes. Ag-reid.v2: Bridging aerial and ground views for person re-identification.IEEE Transactions on Information Forensics and Security, 19:2896–2908, 2024. 2, 6, 7

work page 2024

[18] [18]

Ground- to-aerial person search: Benchmark dataset and approach

Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, and Yanning Zhang. Ground- to-aerial person search: Benchmark dataset and approach. In ACM MM, pages 789–799, 2023. 1

work page 2023

[19] [19]

View-decoupled transformer for person re- identification under aerial-ground camera network

Quan Zhang, Lei Wang, Vishal M Patel, Xiaohua Xie, and Jianhaung Lai. View-decoupled transformer for person re- identification under aerial-ground camera network. InCVPR, pages 22000–22009, 2024. 1, 2, 4, 6, 7

work page 2024

[20] [20]

Dissecting person re- identification from the viewpoint of viewpoint

Xiaoxiao Sun and Liang Zheng. Dissecting person re- identification from the viewpoint of viewpoint. InCVPR, pages 608–617, 2019

work page 2019

[21] [21]

Secap: Self- calibrating and adaptive prompts for cross-view person re- identification in aerial-ground networks

Shiqi Wang, Yifan Wang, Rui Wu, et al. Secap: Self- calibrating and adaptive prompts for cross-view person re- identification in aerial-ground networks. InCVPR, 2025. 2, 6, 7

work page 2025

[22] [22]

Bridging the sky and ground: To- wards view-invariant feature learning for aerial-ground per- son re-identification

Wajahat Khalid, Bin Liu, Xulin Li, Muhammad Waqas, and Muhammad Sher Afgan. Bridging the sky and ground: To- wards view-invariant feature learning for aerial-ground per- son re-identification. InICCV, pages 9749–9758, 2025. 1, 2

work page 2025

[23] [23]

Latex: Leveraging attribute-based text knowledge for aerial- ground person re-identification

Pingping Zhang, Xiang Hu, Yuhao Wang, and Huchuan Lu. Latex: Leveraging attribute-based text knowledge for aerial- ground person re-identification. 2025. 2

work page 2025

[24] [24]

Dynamic token selection for aerial-ground person re-identification

Yuhai Wang and Maryam Pishgar. Dynamic token selection for aerial-ground person re-identification. InICME, 2025. 2, 6

work page 2025

[25] [25]

Adaptive mixtures of local experts.Neu- ral computation, 3(1):79–87, 1991

Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. Adaptive mixtures of local experts.Neu- ral computation, 3(1):79–87, 1991. 2

work page 1991

[26] [26]

Hierarchical mixtures of experts and the em algorithm.Neural computation, 6(2): 181–214, 1994

Michael I Jordan and Robert A Jacobs. Hierarchical mixtures of experts and the em algorithm.Neural computation, 6(2): 181–214, 1994. 2

work page 1994

[27] [27]

Tutel: Adaptive mixture-of-experts at scale

Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prab- hat Ram, et al. Tutel: Adaptive mixture-of-experts at scale. Proceedings of Machine Learning and Systems, 5:269–287,

work page

[28] [28]

Scaling vision with sparse mix- ture of experts

Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, Andr ´e Susano Pinto, Daniel Keysers, and Neil Houlsby. Scaling vision with sparse mix- ture of experts. InNeurIPS, pages 8583–8595, 2021

work page 2021

[29] [29]

Patch-level routing in mixture-of-experts is provably sample-efficient for con- volutional neural networks

Mohammed Nowaz Rabbani Chowdhury, Shuai Zhang, Meng Wang, Sijia Liu, and Pin-Yu Chen. Patch-level routing in mixture-of-experts is provably sample-efficient for con- volutional neural networks. InICML, pages 6074–6114. PMLR, 2023. 2

work page 2023

[30] [30]

Llava-mole: Sparse mixture of lora experts for mitigating data con- flicts in instruction finetuning mllms.arXiv preprint arXiv:2401.16160, 2024

Shaoxiang Chen, Zequn Jie, and Lin Ma. Llava-mole: Sparse mixture of lora experts for mitigating data con- flicts in instruction finetuning mllms.arXiv preprint arXiv:2401.16160, 2024. 2

work page arXiv 2024

[31] [31]

Octavius: Mitigating task interference in mllms via moe, 2023

Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, and Jing Shao. Octavius: Mitigating task interference in mllms via moe, 2023

work page 2023

[32] [32]

Multimodal contrastive learn- ing with limoe: the language-image mixture of experts

Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, and Neil Houlsby. Multimodal contrastive learn- ing with limoe: the language-image mixture of experts. NeurIPS, 35:9564–9576, 2022. 2

work page 2022

[33] [33]

Mosce-reid: Mixture of semantic clustering experts for person re-identification.Neurocomputing, 626: 129587, 2025

Kai Ren, Chuanping Hu, Hao Xi, Yongqiang Li, Jinhao Fan, and Lihua Liu. Mosce-reid: Mixture of semantic clustering experts for person re-identification.Neurocomputing, 626: 129587, 2025. 2

work page 2025

[34] [34]

Hamobe: Hierarchical and adaptive mixture of biometric ex- perts for video-based person reid

Yiyang Su, Yunping Shi, Feng Liu, and Xiaoming Liu. Hamobe: Hierarchical and adaptive mixture of biometric ex- perts for video-based person reid. InICCV, pages 11525– 11536, 2025. 2

work page 2025

[35] [35]

Demo: Decoupled feature-based mixture of experts for multi-modal object re-identification.AAAI, 39(8):8141– 8149, 2025

Yuhao Wang, Yang Liu, Aihua Zheng, and Pingping Zhang. Demo: Decoupled feature-based mixture of experts for multi-modal object re-identification.AAAI, 39(8):8141– 8149, 2025. 2

work page 2025

[36] [36]

Yang Wang, Yixing Zhang, Xudie Ren, and Yuxin Deng. Moda: Mixture of domain adapters for parameter-efficient generalizable person re-identification.ACM Transactions on Multimedia Computing, Communications and Applications, 21(5):1–19, 2025. 2

work page 2025

[37] [37]

Graph-based person sig- nature for person re-identifications

Binh X Nguyen, Binh D Nguyen, Tuong Do, Erman Tjiputra, Quang D Tran, and Anh Nguyen. Graph-based person sig- nature for person re-identifications. InCVPR, pages 3492– 3501, 2021. 3

work page 2021

[38] [38]

High-order information matters: Learning relation and topology for occluded person re-identification

Guan’an Wang, Shuo Yang, Huanyu Liu, Zhicheng Wang, Yang Yang, Shuliang Wang, Gang Yu, Erjin Zhou, and Jian Sun. High-order information matters: Learning relation and topology for occluded person re-identification. InCVPR, pages 6449–6458, 2020. 3

work page 2020

[39] [39]

Reasoning and tuning: Graph attention net- work for occluded person re-identification.IEEE TIP, 32: 1568–1582, 2023

Meiyan Huang, Chunping Hou, Qingyuan Yang, and Zhipeng Wang. Reasoning and tuning: Graph attention net- work for occluded person re-identification.IEEE TIP, 32: 1568–1582, 2023. 3

work page 2023

[40] [40]

Learning discriminative features with multiple granu- larities for person re-identification.ACM MM, 2018

Guanshuo Wang, Yufeng Yuan, Xiong Chen, Jiwei Li, and Xi Zhou. Learning discriminative features with multiple granu- larities for person re-identification.ACM MM, 2018. 6, 2

work page 2018

[41] [41]

Bag of tricks and a strong baseline for deep person re-identification

Hao Luo, Youzhi Gu, Xingyu Liao, Shenqi Lai, and Wei Jiang. Bag of tricks and a strong baseline for deep person re-identification. InCVPR, pages 0–0, 2019. 6, 2

work page 2019

[42] [42]

A strong and efficient baseline for ve- hicle re-identification using deep triplet embedding.Journal of Artificial Intelligence and Soft Computing Research, 10 (1):27–45, 2020

Ratnesh Kumar, Edwin Weill, Farzin Aghdasi, and Parthasarathy Sriram. A strong and efficient baseline for ve- hicle re-identification using deep triplet embedding.Journal of Artificial Intelligence and Soft Computing Research, 10 (1):27–45, 2020. 6

work page 2020

[43] [43]

Y . Sun, L. Zheng, Y . Li, Y . Yang, Q. Tian, and S. Wang. Learning part-based convolutional features for person re- identification.IEEE TPAMI, 43(3):902–917, 2021. 6

work page 2021

[44] [44]

Deep learning for person re- identification: A survey and outlook.IEEE TPAMI, 44(6): 2872–2893, 2021

Mang Ye, Jianbing Shen, Gaojie Lin, Tao Xiang, Ling Shao, and Steven CH Hoi. Deep learning for person re- identification: A survey and outlook.IEEE TPAMI, 44(6): 2872–2893, 2021. 6

work page 2021

[45] [45]

Fastreid: A pytorch toolbox for general instance re-identification

Lingxiao He, Xingyu Liao, Wu Liu, Xinchen Liu, Peng Cheng, and Tao Mei. Fastreid: A pytorch toolbox for general instance re-identification. InACM MM, pages 9664–9667,

work page

[46] [46]

An image is worth 16x16 words: Trans- formers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Trans- formers for image recognition at scale. InICLR, 2021. 6, 7, 2

work page 2021

[47] [47]

Prototypical contrastive learning-based clip fine-tuning for object re-identification

Jiachen Li and Xiaojin Gong. Prototypical contrastive learning-based clip fine-tuning for object re-identification. CoRR, 2023. 6, 2

work page 2023

[48] [48]

Clip-reid: exploiting vision-language model for image re-identification without concrete text labels

Siyuan Li, Li Sun, and Qingli Li. Clip-reid: exploiting vision-language model for image re-identification without concrete text labels. InAAAI, pages 1405–1413, 2023. 6, 2

work page 2023

[49] [49]

Swin transformer: Hierarchical vision transformer using shifted windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In ICCV, pages 10012–10022, 2021. 6, 2

work page 2021

[50] [50]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InCVPR, pages 248–255. Ieee, 2009. 7

work page 2009

[51] [51]

Large-scale machine learning with stochastic gradient descent

L ´eon Bottou. Large-scale machine learning with stochastic gradient descent. InICCS, pages 177–186. Springer, 2010. 7

work page 2010

[52] [52]

Deep high-resolution represen- tation learning for visual recognition.IEEE TPAMI, 43(10): 3349–3364, 2020

Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, et al. Deep high-resolution represen- tation learning for visual recognition.IEEE TPAMI, 43(10): 3349–3364, 2020. 2

work page 2020

[53] [53]

Swin transformer v2: Scaling up capacity and resolution

Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, et al. Swin transformer v2: Scaling up capacity and resolution. In CVPR, pages 12009–12019, 2022. 2

work page 2022

[54] [54]

Transreid: Transformer-based object re- identification

Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, and Wei Jiang. Transreid: Transformer-based object re- identification. InICCV, pages 15013–15022, 2021. 2

work page 2021

[55] [55]

Unity is strength: Unifying convolutional and transformeral features for better person re-identification

Yuhao Wang, Pingping Zhang, Xuehu Liu, Zhengzheng Tu, and Huchuan Lu. Unity is strength: Unifying convolutional and transformeral features for better person re-identification. IEEE Transactions on Intelligent Transportation Systems,

work page

[56] [56]

Enhancing visible-infrared person re-identification with modality-and instance-aware visual prompt learning

Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Meng Liu, and Peng Wang. Enhancing visible-infrared person re-identification with modality-and instance-aware visual prompt learning. InICMR, pages 579–588, 2024. 2 View-Aware Semantic Alignment for Aerial-Ground Person Re-Identification Supplementary Material In this supplementary material, we provide additional an...

work page 2024

[57] [57]

Retrieval Visualization In addition to the retrieval visualization on the AG-ReID.v2 dataset shown in Fig

Visual Analysis 6.1. Retrieval Visualization In addition to the retrieval visualization on the AG-ReID.v2 dataset shown in Fig. 5, we further provide retrieval exam- ples on the CARGO dataset under both the A↔G protocol and the ALL protocol. As illustrated in Fig. 6, our method consistently retrieves correct cross-view matches across dif- ferent evaluatio...

work page

[58] [58]

3 and Tab

Performance Tab. 3 and Tab. 4 report the performance of ViSA on AG-ReID.v2 and LAGPeR, respectively. On AG-ReID.v2, ViSA achieves the highest mAP across all methods and sets a new state of the art in every evaluation setting except the A→W protocol, where the viewpoint variation is rela- tively minor. We further observe that in this less challenging A→W s...

work page

[59] [59]

In the study of the number of expertsE, we fix the number of selected expert to 1 (set- tingk= 1) to ensure controlled comparisons

Parameter Analysis We further analyze the impact of hyperparameters under different evaluation protocols. In the study of the number of expertsE, we fix the number of selected expert to 1 (set- tingk= 1) to ensure controlled comparisons. As shown in Table 3. Performance comparison on AG-ReID.v2 dataset. C represents CCTV , W represents wearable devices an...

work page arXiv

[60] [60]

As shown in Tab

Efficiency ViSA achieves strong accuracy and efficiency despite in- troducing additional parameters (+107.6%). As shown in Tab. 5, ViSA enhances mAP by 10.06% while reducing GFLOPs by 3.3% and increasing inference speed by 55.1% FPS

work page