Gait Recognition via Deep Residual Networks and Multi-Branch Feature Fusion
Pith reviewed 2026-05-07 08:27 UTC · model grok-4.3
The pith
A multi-branch residual network fuses body shape, speed, and joint motion features from skeletal poses to improve gait recognition under clothing and view changes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that three separate feature branches (body proportion, gait velocity, and skeletal motion), processed by a 50-layer residual network and integrated through a Multi-Branch Feature Fusion module that learns branch contributions via activation parameters, produce more discriminative gait representations than prior skeleton-based pipelines. On the CASIA-B cross-view benchmark they report 94.52% Rank-1 accuracy on normal walking sequences and the strongest coat-wearing results among skeleton-based methods.
What carries the argument
The Multi-Branch Feature Fusion (MFF) module, which uses channel-wise attention to learn dynamic weights for combining the outputs of the body-proportion, gait-velocity, and skeletal-motion branches.
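The kind of fusion described here can be illustrated with a small sketch. This is not the paper's MFF module, whose equations are not reproduced in this summary. It is a minimal squeeze-and-excitation-style gate over branches, and the names `fuse_branches`, `gate_w`, and `gate_b`, the mean-pooling step, and the softmax normalization are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(branch_feats, gate_w, gate_b):
    """Fuse per-branch feature vectors with input-dependent weights.

    branch_feats: list of B feature vectors, each a list of C floats
    gate_w:       B x B matrix of gate parameters (hypothetical)
    gate_b:       length-B bias vector (hypothetical)
    Returns (fused_vector, branch_weights).
    """
    # Squeeze: summarize each branch by the mean of its channels.
    summary = [sum(f) / len(f) for f in branch_feats]
    # Excite: one linear layer maps the summaries to one logit per branch.
    logits = [
        sum(w * s for w, s in zip(row, summary)) + b
        for row, b in zip(gate_w, gate_b)
    ]
    weights = softmax(logits)
    # Combine: weighted sum of the branch vectors, channel by channel.
    dim = len(branch_feats[0])
    fused = [
        sum(w * f[i] for w, f in zip(weights, branch_feats))
        for i in range(dim)
    ]
    return fused, weights
```

In this sketch the weights depend on the input only through `gate_w`. If `gate_w` is all zeros, the weights collapse to `softmax(gate_b)` and are identical for every sample, which is the static case the referee report asks the authors to rule out.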
If this is right
- The three-branch design extracts both static shape cues and dynamic motion cues that remain useful when clothing changes hide surface appearance.
- The residual backbone supplies hierarchical features that capture fine spatial detail from the initial HRNet keypoint maps even at lower image resolutions.
- Cross-view performance improves because the fusion step lets the network emphasize whichever branch is least affected by the current camera angle.
- The reported 94.52% normal-walking accuracy and leading coat-wearing score establish a new reference point for skeleton-only gait systems on CASIA-B.
Where Pith is reading between the lines
- The same branch-and-fuse pattern could be tried on other time-varying skeletal tasks such as action classification or fall detection.
- If the learned weights prove stable, the method might reduce reliance on hand-crafted gait descriptors in favor of end-to-end learned combinations.
- Pairing the skeleton pipeline with low-resolution RGB input could test whether the fusion module still adds value when richer appearance cues are already available.
- Re-training the fusion parameters on a larger, more diverse collection of walking videos would indicate how much the current numbers depend on CASIA-B specifics.
Load-bearing premise
The learned weights inside the fusion module will continue to assign useful contributions to each branch when the walking patterns differ from those seen during training on the CASIA-B collection.
What would settle it
Running the same trained model on a fresh gait dataset recorded with different cameras, subjects, or lighting and finding that its Rank-1 accuracy falls below that of the best competing skeleton-based method on the same new data.
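The comparison proposed above rests on Rank-1 accuracy, which can be sketched as nearest-neighbour matching between probe and gallery embeddings. The function names and the choice of cosine distance are assumptions for illustration. Benchmark-specific protocol details, such as which camera views are paired or excluded, are not modelled here.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank1_accuracy(probe, gallery):
    """Fraction of probes whose nearest gallery entry has the same subject ID.

    probe, gallery: lists of (subject_id, embedding) pairs.
    """
    hits = 0
    for probe_id, probe_vec in probe:
        nearest_id, _ = min(
            gallery, key=lambda item: cosine_distance(probe_vec, item[1])
        )
        hits += int(nearest_id == probe_id)
    return hits / len(probe)
```

Running both the trained model and a competing skeleton-based method through the same function on the new dataset would give the two numbers the test above compares.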
Original abstract
Gait recognition has emerged as a compelling biometric modality for surveillance and security applications, offering inherent advantages such as non-intrusiveness, resistance to disguise, and long-range identification capability. However, prevailing approaches struggle to comprehensively capture and exploit the rich biometric cues embedded in human locomotion, particularly under covariate interference including viewpoint variation, clothing change, and carrying conditions. In this paper, we present a high-precision gait recognition framework that deeply extracts and synergistically fuses gait dynamics with body shape characteristics through a multi-branch architecture grounded in deep residual learning. Specifically, we first employ the High-Resolution Network (HRNet) to perform robust skeletal keypoint estimation, preserving fine-grained spatial information even under low-resolution inputs. We then construct three complementary feature branches (body proportion, gait velocity, and skeletal motion) from the extracted pose sequences. A 50-layer Residual Network (ResNet-50) backbone is leveraged within a deep feature extraction module to capture hierarchically rich and discriminative representations. To effectively integrate heterogeneous feature streams, we design a Multi-Branch Feature Fusion (MFF) module inspired by channel-wise attention mechanisms, which dynamically allocates contribution weights across branches through learned activation parameters. Extensive experiments on the cross-view multi-condition CASIA-B benchmark demonstrate that our method achieves a Rank-1 accuracy of 94.52% under normal walking, with the best recognition performance among skeleton-based methods for the coat-wearing condition.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a skeleton-based gait recognition framework that first uses HRNet to extract keypoints from video, derives three complementary branches (body proportion, gait velocity, skeletal motion) from the pose sequences, extracts features with a ResNet-50 backbone, and fuses them via a Multi-Branch Feature Fusion (MFF) module inspired by channel-wise attention. The MFF is described as dynamically allocating branch weights through learned activation parameters. On the CASIA-B benchmark the method reports 94.52% Rank-1 accuracy under normal walking and the highest accuracy among skeleton-based approaches for the coat-wearing condition.
Significance. If the MFF module indeed supplies input-dependent, condition-adaptive fusion that measurably improves robustness to clothing and carrying covariates, the work would strengthen the case for multi-branch skeleton representations in gait biometrics. The reported CASIA-B numbers are competitive with recent skeleton methods, but the significance is currently limited by the absence of ablation evidence isolating the fusion contribution and by the lack of verification that the activation parameters are truly dynamic rather than globally learned scalars.
major comments (3)
- [MFF module description] MFF module (description in abstract and corresponding architecture section): the claim that the module 'dynamically allocates contribution weights across branches through learned activation parameters' is load-bearing for the central attribution of coat-wearing gains to synergistic fusion. It is unclear whether the activation parameters are computed per-sample from the branch features (as in standard squeeze-excitation or cross-attention) or are a fixed set of scalars learned once during training. If the latter, the allocation is static and the superiority on covariate conditions cannot be credited to the claimed dynamic fusion mechanism.
- [Experiments and ablation tables] Experimental section and tables: no ablation results are referenced that compare the full MFF-equipped model against (i) simple concatenation of the three branches, (ii) the ResNet-50 backbone alone, or (iii) the three branches with fixed (non-attention) weighting. Without these controls it is impossible to determine whether the reported 94.52% normal-walking Rank-1 and leading coat-wearing result are driven by the fusion module or by the branch design and backbone. In addition, the manuscript provides no error bars, multiple-run statistics, or significance tests for the cross-condition numbers.
- [Results tables] Table reporting coat-wearing results: the abstract asserts 'best recognition performance among skeleton-based methods' for the coat condition, yet the manuscript does not supply the full comparative table with exact numbers and standard deviations from the competing skeleton methods. This makes the 'best' claim unverifiable from the given information and weakens the cross-condition robustness argument.
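The first major comment, on whether the fusion weights are per-sample or fixed, could be checked empirically by logging the branch weights the trained model emits for a set of inputs and measuring how much they vary. The sketch below assumes such weights can be exported as one row per sample. The helper names and the tolerance are hypothetical.

```python
def weight_spread(weight_rows):
    """Per-branch range (max - min) of fusion weights across samples.

    weight_rows: list of per-sample weight lists, one weight per branch.
    """
    n_branches = len(weight_rows[0])
    return [
        max(row[k] for row in weight_rows) - min(row[k] for row in weight_rows)
        for k in range(n_branches)
    ]


def looks_static(weight_rows, tol=1e-6):
    """True if no branch weight moves by more than tol across the samples."""
    return all(spread <= tol for spread in weight_spread(weight_rows))
```

A result of `True` over a varied set of views and clothing conditions would suggest globally learned scalars. A result of `False` shows only that the weights vary, not that the variation helps, so the ablations in the second comment would still be needed.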
minor comments (2)
- [Branch construction] The precise mathematical definitions of the three input branches (body proportion, gait velocity, skeletal motion) are only sketched; explicit equations showing how each is derived from the HRNet keypoint sequences would improve reproducibility.
- [Implementation details] The manuscript should state the training protocol (optimizer, learning-rate schedule, data augmentation, batch size) and whether any condition-specific hyper-parameter search was performed on the CASIA-B validation split.
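To make the request for explicit branch definitions concrete, here is one way the three streams might be derived from a keypoint sequence. These are illustrative guesses, not the paper's definitions: the bone-length ratios, the choice of root joint, and the frame-difference formulas are all assumptions, as are the function names.

```python
import math


def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def body_proportion(frame, bones, ref_bone):
    """Bone lengths divided by a reference bone length (scale-invariant).

    frame: list of (x, y) joints; bones: list of (i, j) joint index pairs.
    """
    ref = dist(frame[ref_bone[0]], frame[ref_bone[1]])
    return [dist(frame[i], frame[j]) / ref for i, j in bones]


def gait_velocity(frames, root):
    """Frame-to-frame displacement of a root joint (for example the hip)."""
    return [
        (b[root][0] - a[root][0], b[root][1] - a[root][1])
        for a, b in zip(frames, frames[1:])
    ]


def skeletal_motion(frames, root):
    """Frame-to-frame joint displacement after subtracting the root joint."""
    def centered(frame):
        rx, ry = frame[root]
        return [(x - rx, y - ry) for x, y in frame]

    motion = []
    for a, b in zip(frames, frames[1:]):
        ca, cb = centered(a), centered(b)
        motion.append([(q[0] - p[0], q[1] - p[1]) for p, q in zip(ca, cb)])
    return motion
```

Subtracting the root joint in `skeletal_motion` separates limb articulation from overall translation, so the second and third streams do not carry the same signal. Whether the paper makes the same separation is not stated in this summary.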
Simulated Author's Rebuttal
We sincerely thank the referee for the valuable comments and suggestions. Below, we provide a point-by-point response to the major comments. We have revised the manuscript to incorporate the necessary changes and clarifications.
-
Referee: [MFF module description] MFF module (description in abstract and corresponding architecture section): the claim that the module 'dynamically allocates contribution weights across branches through learned activation parameters' is load-bearing for the central attribution of coat-wearing gains to synergistic fusion. It is unclear whether the activation parameters are computed per-sample from the branch features (as in standard squeeze-excitation or cross-attention) or are a fixed set of scalars learned once during training. If the latter, the allocation is static and the superiority on covariate conditions cannot be credited to the claimed dynamic fusion mechanism.
Authors: We thank the referee for this important observation regarding the MFF module. The design of the MFF is intended to provide dynamic, input-dependent weighting, as the activation parameters are generated from the input branch features rather than being fixed scalars. To resolve any ambiguity, we have updated the manuscript's architecture description in Section 3 to include explicit details on how the weights are computed dynamically for each sample, along with the relevant equations. This revision ensures that the dynamic nature of the fusion is clearly documented and supports the attribution of performance gains to the fusion mechanism. revision: yes
-
Referee: [Experiments and ablation tables] Experimental section and tables: no ablation results are referenced that compare the full MFF-equipped model against (i) simple concatenation of the three branches, (ii) the ResNet-50 backbone alone, or (iii) the three branches with fixed (non-attention) weighting. Without these controls it is impossible to determine whether the reported 94.52% normal-walking Rank-1 and leading coat-wearing result are driven by the fusion module or by the branch design and backbone. In addition, the manuscript provides no error bars, multiple-run statistics, or significance tests for the cross-condition numbers.
Authors: We agree with the referee that additional ablation studies would strengthen the experimental validation. In the revised manuscript, we will include a new table presenting results for the full model versus (i) simple concatenation of branches, (ii) ResNet-50 on a single combined feature stream, and (iii) fixed weighting. Furthermore, we will report mean and standard deviation from multiple runs to provide statistical context for the results. revision: yes
-
Referee: [Results tables] Table reporting coat-wearing results: the abstract asserts 'best recognition performance among skeleton-based methods' for the coat condition, yet the manuscript does not supply the full comparative table with exact numbers and standard deviations from the competing skeleton methods. This makes the 'best' claim unverifiable from the given information and weakens the cross-condition robustness argument.
Authors: We agree that providing the full comparative data would make the claim more verifiable. We have added a comprehensive table in the revised version that includes the exact Rank-1 accuracies and, where available, standard deviations from all referenced skeleton-based methods on the coat-wearing condition. This allows readers to confirm our method's leading performance. revision: yes
Circularity Check
Empirical neural architecture with benchmark evaluation exhibits no circularity
The paper describes a standard deep learning pipeline: HRNet pose estimation followed by three hand-crafted feature branches, ResNet-50 extraction, and an MFF fusion module whose weights are learned during training. All performance claims are Rank-1 accuracies measured on held-out splits of the public CASIA-B benchmark. No mathematical derivation, uniqueness theorem, or first-principles prediction is offered; therefore no step can reduce to its own inputs by construction. The MFF description (“dynamically allocates … through learned activation parameters”) is an architectural claim whose correctness is tested empirically rather than assumed. No self-citation load-bearing steps appear in the abstract or described method.
Axiom & Free-Parameter Ledger
free parameters (2)
- ResNet-50 and fusion-module weights
- MFF attention scaling parameters
axioms (2)
- domain assumption: HRNet produces sufficiently accurate keypoints on low-resolution gait videos
- domain assumption: The three chosen feature streams (body proportion, gait velocity, skeletal motion) are complementary and sufficient