Temporal Prototyping and Hierarchical Alignment for Unsupervised Video-based Visible-Infrared Person Re-Identification
Pith reviewed 2026-05-09 21:57 UTC · model grok-4.3
The pith
HiTPro uses hierarchical temporal prototypes to enable unsupervised matching of people across visible and infrared video tracklets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HiTPro constructs reliable intra-camera prototypes via temporal partitioning of tracklets and performs two-stage positive mining in Hierarchical Cross-Prototype Alignment, progressing from within-modality to cross-modality matching with dynamic thresholds and soft weights, then optimizes with hierarchical contrastive learning at intra-camera, cross-camera same-modality, and cross-modality levels, achieving state-of-the-art unsupervised performance on HITSZ-VCM and BUPTCampus.
What carries the argument
Hierarchical Temporal Prototyping, which aggregates features from temporally partitioned sub-tracklets into prototypes and aligns them across cameras and modalities without hard pseudo-labels.
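The prototyping idea can be sketched in a few lines. This is a minimal, dependency-free illustration under stated assumptions, not HiTPro's implementation. It assumes each tracklet is a list of frame-level feature vectors. It splits the tracklet into `num_parts` contiguous sub-tracklets, mean-pools and L2-normalises each chunk, and averages the chunk features into one prototype. The function names and the use of mean pooling are hypothetical.

```python
from typing import List

Vector = List[float]

def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)

def partition_tracklet(frames: List[Vector], num_parts: int) -> List[List[Vector]]:
    """Split frame features into contiguous temporal chunks.

    Chunk sizes differ by at most one frame. num_parts is capped at the
    number of frames so that no chunk is empty.
    """
    k = min(num_parts, len(frames))
    base, extra = divmod(len(frames), k)
    parts, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        parts.append(frames[start:start + size])
        start += size
    return parts

def sub_tracklet_prototype(frames: List[Vector], num_parts: int = 3) -> Vector:
    """Hypothetical prototype: normalised mean of normalised sub-tracklet means."""
    chunks = partition_tracklet(frames, num_parts)
    sub_feats = [l2_normalize(mean_vector(c)) for c in chunks]
    return l2_normalize(mean_vector(sub_feats))
```

Normalising each sub-tracklet mean before averaging keeps one long segment from dominating the prototype. The review does not say whether HiTPro makes this choice.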
If this is right
- Outperforms adapted baselines significantly on two benchmark datasets under fully unsupervised settings.
- Establishes a strong baseline for future research in unsupervised video-based VI-ReID.
- The temporal-aware feature encoder provides robust tracklet representations that support prototype construction.
- The two-stage alignment reduces the impact of modality gaps in positive mining.
- Multi-level contrastive learning progressively enforces discrimination and invariance.
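One way to read the multi-level objective is as the same prototype-contrastive loss applied against three prototype banks and summed with weights. The three banks are intra-camera, cross-camera same-modality, and cross-modality. The sketch below assumes a soft-weighted InfoNCE form with temperature `temperature` and hypothetical level weights `lambdas`. The review does not give the paper's exact loss terms.

```python
import math
from typing import List, Sequence

def soft_prototype_nce(query: Sequence[float],
                       prototypes: List[Sequence[float]],
                       pos_weights: Sequence[float],
                       temperature: float = 0.05) -> float:
    """Soft-weighted InfoNCE of one feature against a prototype bank.

    Assumes L2-normalised inputs, so the dot product is cosine similarity.
    pos_weights[j] is a (hypothetical) soft positive weight for prototype j,
    with zero marking a negative. Weights are renormalised to sum to one.
    """
    logits = [sum(q * p for q, p in zip(query, proto)) / temperature
              for proto in prototypes]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))  # stable log-sum-exp
    total = sum(pos_weights)
    return -sum(w * (l - log_z) for w, l in zip(pos_weights, logits)) / total

def hierarchical_objective(level_losses: Sequence[float],
                           lambdas: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the intra-camera, cross-camera same-modality and
    cross-modality terms (weights are illustrative)."""
    return sum(lam * loss for lam, loss in zip(lambdas, level_losses))
```

Each level supplies its own prototype bank and weight vector. The three averaged losses then feed `hierarchical_objective`. The soft weights are where confidence from the alignment stage would enter.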
Where Pith is reading between the lines
- If the prototypes reliably capture identities, similar prototyping could apply to other unlabeled multi-modal video tasks.
- The dynamic threshold strategy might generalize to adaptively handle varying data qualities in re-identification.
- Success here suggests that avoiding hard labels can mitigate error propagation in unsupervised cross-modal learning.
- Future work could test integration with other temporal aggregation techniques for even better robustness.
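The review does not say how the dynamic threshold is computed. A common pattern, assumed here only for illustration, recomputes the acceptance threshold each epoch as a quantile of the current similarity distribution. The bar then rises automatically as features improve. The function name and the quantile rule are hypothetical.

```python
from typing import Sequence

def dynamic_threshold(similarities: Sequence[float], quantile: float) -> float:
    """Hypothetical dynamic threshold: the q-quantile of the current
    similarity distribution, with linear interpolation between order statistics."""
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must lie in [0, 1]")
    s = sorted(similarities)
    pos = quantile * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])
```

With `quantile=0.9`, roughly the top 10% of candidate pairs pass in a given epoch. The absolute cutoff moves with the similarity distribution rather than staying fixed.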
Load-bearing premise
The constructed intra-camera prototypes and the hierarchical alignment process accurately identify true cross-modality identity matches from unlabeled tracklets without systematic errors from incorrect positives or large modality differences.
What would settle it
Running HiTPro on a dataset where ground-truth cross-modality correspondences are known, then checking whether the learned features achieve high matching accuracy or whether prototype errors cause many wrong identities to be aligned.
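This test amounts to standard retrieval evaluation against ground truth. The sketch below computes Rank-1 and mAP from a query-by-gallery similarity matrix. It is simplified: it omits the camera filtering used in official protocols. The function name is illustrative.

```python
from typing import List, Sequence, Tuple

def rank1_and_map(sim: List[List[float]], query_ids: Sequence[int],
                  gallery_ids: Sequence[int]) -> Tuple[float, float]:
    """Rank-1 accuracy and mean average precision from a query x gallery
    similarity matrix (higher = more similar). Simplified: no same-camera
    filtering, and queries without any true match are skipped."""
    hits_at_1, ap_sum, valid = 0, 0.0, 0
    for q, row in enumerate(sim):
        # Rank gallery items by descending similarity to this query.
        order = sorted(range(len(row)), key=lambda g: -row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue
        valid += 1
        hits_at_1 += int(matches[0])
        found, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)  # precision at each true match
        ap_sum += sum(precisions) / len(precisions)
    if valid == 0:
        raise ValueError("no query has a true match in the gallery")
    return hits_at_1 / valid, ap_sum / valid
```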
Original abstract
Visible-infrared person re-identification (VI-ReID) enables cross-modality identity matching for all-day surveillance, yet existing methods predominantly focus on the image level or rely heavily on costly identity annotations. While video-based VI-ReID has recently emerged to exploit temporal dynamics for improved robustness, existing studies remain limited to supervised settings. Crucially, the unsupervised video VI-ReID problem, where models must learn from RGB and infrared tracklets without identity labels, remains largely unexplored despite its practical importance in real-world deployment. To bridge this gap, we propose HiTPro (Hierarchical Temporal Prototyping), a prototype-driven framework without explicit hard pseudo-label assignment for unsupervised video-based VI-ReID. HiTPro begins with an efficient Temporal-aware Feature Encoder that first extracts discriminative frame-level features and then aggregates them into a robust tracklet-level representation. Building upon these features, HiTPro first constructs reliable intra-camera prototypes via Intra-Camera Tracklet Prototyping by aggregating features from temporally partitioned sub-tracklets. Through Hierarchical Cross-Prototype Alignment, we perform a two-stage positive mining process: progressing from within-modality associations to cross-modality matching, enhanced by Dynamic Threshold Strategy and Soft Weight Assignment. Finally, Hierarchical Contrastive Learning progressively optimizes feature-prototype alignment across three levels: intra-camera discrimination, cross-camera same-modality consistency, and cross-modality invariance. Extensive experiments on HITSZ-VCM and BUPTCampus demonstrate that HiTPro achieves state-of-the-art performance under fully unsupervised settings, significantly outperforming adapted baselines and establishing a strong baseline for future research.
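The two-stage positive mining can be illustrated as follows. This is a hypothetical reading, not the authors' code, and it assumes non-zero prototype vectors. Stage 1 links each prototype to same-modality prototypes from other cameras whose cosine similarity is at least `tau_within`. Stage 2 averages that group and matches the mean against the other modality at `tau_cross`. Every accepted positive then receives a softmax weight instead of a hard 0/1 label. The thresholds, the temperature, and the mean-anchor choice are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def softmax(scores: List[float], temperature: float) -> List[float]:
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def mine_positives(protos: List[Vector], cams: Sequence[int], mods: Sequence[str],
                   tau_within: float, tau_cross: float,
                   temperature: float = 0.1) -> Dict[int, List[Tuple[int, float]]]:
    """Illustrative two-stage mining over camera-level prototypes.

    Returns {i: [(j, soft_weight), ...]} for every prototype i.
    """
    result: Dict[int, List[Tuple[int, float]]] = {}
    for i, p in enumerate(protos):
        # Stage 1: same modality, different camera, above tau_within.
        stage1 = [(j, cosine(p, q)) for j, q in enumerate(protos)
                  if j != i and mods[j] == mods[i] and cams[j] != cams[i]
                  and cosine(p, q) >= tau_within]
        # Stage 2: match the mean of the stage-1 group against the other modality.
        group = [p] + [protos[j] for j, _ in stage1]
        anchor = [sum(col) / len(group) for col in zip(*group)]
        stage2 = [(j, cosine(anchor, q)) for j, q in enumerate(protos)
                  if mods[j] != mods[i] and cosine(anchor, q) >= tau_cross]
        cands = stage1 + stage2
        if cands:
            weights = softmax([s for _, s in cands], temperature)
            result[i] = [(j, w) for (j, _), w in zip(cands, weights)]
        else:
            result[i] = []
    return result
```

Averaging the stage-1 group before the cross-modality step lets several same-modality views vote together. This is one plausible motivation for mining within-modality first.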
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces HiTPro, a prototype-driven framework for unsupervised video-based visible-infrared person re-identification (VI-ReID). It consists of a Temporal-aware Feature Encoder that extracts frame-level features and aggregates them into tracklet representations, Intra-Camera Tracklet Prototyping to build reliable per-camera prototypes from temporally partitioned sub-tracklets, Hierarchical Cross-Prototype Alignment that performs two-stage positive mining (within-modality then cross-modality) using a Dynamic Threshold Strategy and Soft Weight Assignment, and Hierarchical Contrastive Learning that optimizes alignment at intra-camera, cross-camera same-modality, and cross-modality levels. Experiments on the HITSZ-VCM and BUPTCampus datasets report state-of-the-art unsupervised performance, significantly outperforming adapted baselines.
Significance. If the central empirical claims hold, the work is significant because it addresses the largely unexplored problem of fully unsupervised video VI-ReID, which is practically important for label-free all-day surveillance systems. The avoidance of hard pseudo-label assignment via soft prototype alignment and the explicit handling of temporal dynamics and modality gaps represent a promising technical direction. The paper establishes a reproducible baseline that future unsupervised video VI-ReID methods can build upon.
major comments (2)
- [Section 3.3] Section 3.3 (Hierarchical Cross-Prototype Alignment): the two-stage positive mining (within-modality followed by cross-modality) with Dynamic Threshold Strategy and Soft Weight Assignment is load-bearing for the identity-discovery claim, yet the manuscript provides no direct measurement of mining precision/recall or ablation on error propagation from residual modality gaps after the Temporal-aware Feature Encoder; without these, it is unclear whether the reported SOTA gains on HITSZ-VCM and BUPTCampus arise from correct correspondences or from other factors such as stronger tracklet aggregation.
- [Section 4] Section 4 (Experiments): the central claim of reliable unsupervised identity discovery rests on the Intra-Camera Tracklet Prototyping producing clean clusters, but no quantitative cluster-purity metrics, sensitivity analysis on the dynamic threshold, or ablation removing the hierarchical alignment step are presented; this leaves open the possibility that performance improvements are not attributable to the proposed alignment mechanism.
minor comments (2)
- [Section 4.1] The description of baseline adaptations (e.g., how supervised VI-ReID methods were converted to unsupervised settings) is only summarized; explicit implementation details or code references would improve reproducibility.
- [Section 3.4] Notation for the soft weight assignment and the three-level contrastive losses could be clarified with a single summary equation or table to avoid ambiguity when reading the hierarchical objective.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of empirical validation for our unsupervised video VI-ReID framework. We address each major comment below and will incorporate the suggested analyses in the revised manuscript to more clearly attribute performance gains to the proposed components.
Referee: [Section 3.3] Section 3.3 (Hierarchical Cross-Prototype Alignment): the two-stage positive mining (within-modality followed by cross-modality) with Dynamic Threshold Strategy and Soft Weight Assignment is load-bearing for the identity-discovery claim, yet the manuscript provides no direct measurement of mining precision/recall or ablation on error propagation from residual modality gaps after the Temporal-aware Feature Encoder; without these, it is unclear whether the reported SOTA gains on HITSZ-VCM and BUPTCampus arise from correct correspondences or from other factors such as stronger tracklet aggregation.
Authors: We agree that direct measurements of mining precision and recall, along with targeted ablations on residual modality gaps, would strengthen the evidence that the two-stage positive mining drives the identity discovery. In the revised version, we will add a quantitative evaluation of mining accuracy (precision/recall) by using ground-truth identities from the evaluation sets for post-hoc verification of the mined positives, while preserving the fully unsupervised training protocol. We will also include an ablation that isolates the effect of modality gaps after the Temporal-aware Feature Encoder (e.g., by comparing the full two-stage process against a single-stage cross-modality baseline). These additions will demonstrate that the SOTA gains on HITSZ-VCM and BUPTCampus arise from the hierarchical alignment mechanism rather than tracklet aggregation alone. revision: yes
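The post-hoc check promised in this response can be computed without touching training. It compares the set of mined cross-modality pairs against the pairs that share a ground-truth identity. Below is a minimal sketch with illustrative names, treating pairs as unordered and ignoring any same-modality pairs in the mined set.

```python
from typing import Iterable, Sequence, Tuple

def mining_precision_recall(mined: Iterable[Tuple[int, int]],
                            ids: Sequence[int],
                            mods: Sequence[str]) -> Tuple[float, float]:
    """Precision/recall of mined cross-modality pairs against ground-truth
    identities. Labels are used only for this evaluation, never for training."""
    mined_set = {tuple(sorted(p)) for p in mined if mods[p[0]] != mods[p[1]]}
    true_set = {(i, j)
                for i in range(len(ids)) for j in range(i + 1, len(ids))
                if ids[i] == ids[j] and mods[i] != mods[j]}
    tp = len(mined_set & true_set)
    precision = tp / len(mined_set) if mined_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    return precision, recall
```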
Referee: [Section 4] Section 4 (Experiments): the central claim of reliable unsupervised identity discovery rests on the Intra-Camera Tracklet Prototyping producing clean clusters, but no quantitative cluster-purity metrics, sensitivity analysis on the dynamic threshold, or ablation removing the hierarchical alignment step are presented; this leaves open the possibility that performance improvements are not attributable to the proposed alignment mechanism.
Authors: We acknowledge that cluster-purity metrics, sensitivity analysis, and a dedicated ablation removing the hierarchical alignment would provide clearer attribution of gains to the Intra-Camera Tracklet Prototyping and subsequent alignment steps. In the revision, we will report quantitative cluster purity metrics (e.g., purity score and normalized mutual information) for the prototypes generated from temporally partitioned sub-tracklets. We will also add a sensitivity analysis varying the dynamic threshold parameter across a range of values and report its impact on final Re-ID performance. Finally, we will include an ablation that removes the hierarchical cross-prototype alignment (retaining only intra-camera prototyping and basic contrastive learning) to isolate its contribution. These experiments will be conducted on both HITSZ-VCM and BUPTCampus to confirm that the reported improvements stem from the full proposed pipeline. revision: yes
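For reference, both clustering metrics named in this response are simple to compute from cluster assignments and ground-truth identities. The sketch below uses majority-label purity. For NMI, it normalises by the arithmetic mean of the two entropies, which is the default normalisation in scikit-learn's `normalized_mutual_info_score`. It is written in plain Python to stay dependency-free.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence

def purity(clusters: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of samples that carry their cluster's majority identity."""
    by_cluster: Dict[Hashable, Counter] = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values()) / len(labels)

def nmi(clusters: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Normalised mutual information, arithmetic-mean normalisation."""
    n = len(labels)
    pc, pl = Counter(clusters), Counter(labels)
    joint = Counter(zip(clusters, labels))
    mi = sum(nij / n * math.log(n * nij / (pc[c] * pl[y]))
             for (c, y), nij in joint.items())
    hc = -sum(v / n * math.log(v / n) for v in pc.values())
    hl = -sum(v / n * math.log(v / n) for v in pl.values())
    denom = (hc + hl) / 2
    return 1.0 if denom == 0 else mi / denom
```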
Circularity Check
No circularity: empirical framework with independent algorithmic content
The paper describes an algorithmic pipeline (Temporal-aware Feature Encoder, Intra-Camera Tracklet Prototyping, two-stage Hierarchical Cross-Prototype Alignment with Dynamic Threshold and Soft Weight Assignment, and Hierarchical Contrastive Learning) that generates its own pseudo-supervision from unlabeled tracklets. These steps constitute a standard self-supervised clustering-plus-contrastive loop rather than a mathematical derivation. No equations are shown that reduce claimed performance or identity correspondences to fitted parameters by construction, and no load-bearing self-citations or uniqueness theorems are invoked. Validation is purely empirical on separate test sets (HITSZ-VCM, BUPTCampus), leaving the central claims falsifiable and non-tautological.
Axiom & Free-Parameter Ledger
free parameters (2)
- Dynamic Threshold
- Soft Weight Assignment parameters
axioms (2)
- Domain assumption: the Temporal-aware Feature Encoder produces sufficiently discriminative frame-level features for subsequent aggregation.
- Domain assumption: intra-camera sub-tracklet aggregation yields reliable prototypes without identity supervision.
invented entities (2)
- Intra-Camera Tracklet Prototypes: no independent evidence
- Hierarchical Cross-Prototype Alignment: no independent evidence