ICPR 2026 Competition on Privacy-Preserving Person Re-Identification from Top-View RGB-Depth Camera (TVRID)
Pith reviewed 2026-05-08 18:31 UTC · model grok-4.3
The pith
A new top-view RGB-Depth dataset and a three-track competition show that cross-modal retrieval is harder than single-modality RGB or depth retrieval, yet modality-invariant methods still achieve strong results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The TVRID competition and its 86-identity dataset establish a benchmark in which RGB re-identification outperforms depth re-identification, which in turn outperforms RGB-to-Depth cross-modal retrieval, while modality-invariant learning still permits strong performance across all tracks under a unified mAP and CMC-1 protocol.
What carries the argument
The TVRID dataset of paired RGB and Depth streams captured by four overhead RealSense D455 cameras across structured geometric viewpoints, together with the three-track evaluation protocol (RGB, Depth, RGB↔Depth) scored by mAP and CMC-1.
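The unified mAP/CMC-1 protocol can be made concrete. Below is a minimal sketch of how these two retrieval metrics are typically computed from a query-by-gallery distance matrix; the function name and toy inputs are illustrative, not the competition's actual server-side evaluation code.

```python
import numpy as np

def evaluate_track(dist, query_ids, gallery_ids):
    """Compute (mAP, CMC-1) from a query x gallery distance matrix.

    dist[i, j] is the distance between query i and gallery item j;
    a lower distance means a better match.
    """
    dist = np.asarray(dist)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    aps, cmc1 = [], []
    for i, qid in enumerate(query_ids):
        order = np.argsort(dist[i])            # gallery ranked by distance
        matches = gallery_ids[order] == qid    # True where identity matches
        if not matches.any():
            continue                           # identity absent from gallery
        # CMC-1: is the top-ranked gallery item the correct identity?
        cmc1.append(float(matches[0]))
        # Average precision: precision at each rank where a match occurs
        ranks = np.flatnonzero(matches)
        precisions = (np.arange(ranks.size) + 1) / (ranks + 1)
        aps.append(float(precisions.mean()))
    return float(np.mean(aps)), float(np.mean(cmc1))
```

On a toy track with two queries and three gallery items, a query whose correct match is ranked first contributes AP = 1.0 and a CMC-1 hit, while a query whose matches sit at ranks 2 and 3 contributes AP = (1/2 + 2/3)/2 and a CMC-1 miss; mAP averages the per-query APs.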
Load-bearing premise
The 86-identity collection with its specific flat, ascent, descent, and oblique overhead viewpoints is representative enough of real-world top-view privacy constraints and appearance variation.
What would settle it
An experiment on a substantially larger or more varied set of identities and camera placements that either eliminates the observed RGB > Depth > Cross-Modal difficulty ordering or shows that modality-invariant methods no longer produce competitive scores.
Original abstract
This companion paper reports the ICPR 2026 TVRID competition on privacy-aware top-view person re-identification. We present the competition setting, the released RGB-Depth dataset, and a summary of final results with descriptions of the top entries. TVRID contains 86 identities captured by four synchronized overhead Intel RealSense D455 cameras, with paired RGB/Depth streams and structured geometric variation across flat, ascent, descent, and oblique viewpoints. The evaluation protocol includes three tracks: RGB Re-ID, Depth Re-ID, and RGB$\leftrightarrow$Depth cross-modal retrieval. Submissions are ranked using mAP and CMC-1 under a unified server-side evaluation. The final results show a clear difficulty ordering (RGB $>$ Depth $>$ Cross-Modal), highlighting both the challenge of modality-constrained retrieval and the feasibility of strong performance with modality-invariant learning. By releasing the dataset at https://zenodo.org/records/17909410, the evaluation scripts at https://github.com/RaphaelDel/ICPR-TVRID, and the accompanying documentation, TVRID establishes a reproducible benchmark for top-view, depth-based, and cross-modal person re-id.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports the outcomes of the ICPR 2026 TVRID competition on privacy-preserving person re-identification from top-view RGB-Depth cameras. It introduces the TVRID dataset with 86 identities captured by four synchronized overhead Intel RealSense D455 cameras providing paired RGB and depth streams across flat, ascent, descent, and oblique viewpoints. The competition defines three tracks (RGB Re-ID, Depth Re-ID, and RGB↔Depth cross-modal retrieval) evaluated under a unified server-side protocol using mAP and CMC-1. The central empirical finding is a clear difficulty ordering RGB > Depth > Cross-Modal, accompanied by descriptions of top entries; the paper releases the dataset, evaluation scripts, and documentation to support reproducibility.
Significance. If the reported ordering holds, the work establishes a reproducible benchmark for top-view, depth-based, and cross-modal person re-identification that emphasizes privacy through capture geometry rather than post-processing. The explicit release of the Zenodo dataset, GitHub evaluation scripts, and documentation is a clear strength that enables direct follow-on research and verification. The results usefully illustrate both the added challenge of modality-constrained retrieval and the viability of modality-invariant approaches, providing a concrete reference point for the community.
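The "modality-invariant learning" credited with strong cross-modal scores typically means mapping RGB and depth crops into one shared embedding space and penalizing distance between same-identity pairs across modalities. One common objective is a cross-modal triplet loss with hard mining in the spirit of Hermans et al. [6]; the sketch below is a numerical illustration under assumed toy embeddings and margin, not a reconstruction of any competitor's model.

```python
import numpy as np

def cross_modal_triplet_loss(rgb_emb, depth_emb, rgb_ids, depth_ids, margin=0.3):
    """Mean hinge loss pulling same-identity RGB/Depth embeddings together
    and pushing different-identity pairs at least `margin` apart.

    Embeddings are L2-normalized so distances are comparable across modalities.
    """
    rgb = np.asarray(rgb_emb, dtype=float)
    dep = np.asarray(depth_emb, dtype=float)
    rgb_ids = np.asarray(rgb_ids)
    depth_ids = np.asarray(depth_ids)
    rgb = rgb / np.linalg.norm(rgb, axis=1, keepdims=True)
    dep = dep / np.linalg.norm(dep, axis=1, keepdims=True)
    # Pairwise Euclidean distances between every RGB anchor and depth sample
    dist = np.linalg.norm(rgb[:, None, :] - dep[None, :, :], axis=2)
    losses = []
    for i, qid in enumerate(rgb_ids):
        pos = dist[i, depth_ids == qid]   # same identity, other modality
        neg = dist[i, depth_ids != qid]   # different identity
        if pos.size == 0 or neg.size == 0:
            continue
        # Hardest positive (farthest) vs. hardest negative (closest)
        losses.append(max(0.0, pos.max() - neg.min() + margin))
    return float(np.mean(losses))
```

When same-identity RGB and depth embeddings coincide, the loss is zero; when the modalities are misaligned (same-identity pairs far apart, different-identity pairs close), each anchor pays roughly the positive-negative distance gap plus the margin.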
Major comments (1)
- [Results] The central claim that the final results exhibit a 'clear difficulty ordering (RGB > Depth > Cross-Modal)' is load-bearing, yet the manuscript provides only a qualitative statement without a table or explicit numerical mAP and CMC-1 values for the top submissions in each track. Including such a summary table (with at least the top-3 entries per track) would allow readers to assess the magnitude of the gaps and the robustness of the ordering.
Minor comments (2)
- [Dataset] The dataset description mentions 'structured geometric variation across flat, ascent, descent, and oblique viewpoints' but does not report the number of images or identities per viewpoint category or per camera; adding these basic statistics would improve interpretability of the reported difficulty ordering.
- [Abstract] The abstract summarizes results without participant method details; while the full text includes descriptions of top entries, a concise comparison table of the key technical choices (e.g., backbone, fusion strategy, loss) across the top three submissions per track would aid readers.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the TVRID competition paper and for the constructive suggestion regarding the presentation of results. We address the single major comment below.
Point-by-point responses
-
Referee: [Results] The central claim that the final results exhibit a 'clear difficulty ordering (RGB > Depth > Cross-Modal)' is load-bearing, yet the manuscript provides only a qualitative statement without a table or explicit numerical mAP and CMC-1 values for the top submissions in each track. Including such a summary table (with at least the top-3 entries per track) would allow readers to assess the magnitude of the gaps and the robustness of the ordering.
Authors: We agree that a compact numerical summary would strengthen the manuscript and make the claimed difficulty ordering more transparent. In the revised version we will insert a new table (placed in the Results section) that reports mAP and CMC-1 for the top-three submissions in each of the three tracks. The table will be accompanied by a short paragraph quantifying the observed gaps, thereby supporting the qualitative statement already present in the text.
revision: yes
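The promised table could take a shape like the following LaTeX skeleton (assuming the booktabs package; the leaderboard scores are not reported in this text, so all values are left as placeholders rather than guessed):

```latex
\begin{table}[t]
  \centering
  \caption{Top-3 submissions per track (mAP / CMC-1, \%), to be filled
           from the server-side leaderboard.}
  \begin{tabular}{llcc}
    \toprule
    Track & Entry & mAP & CMC-1 \\
    \midrule
    RGB Re-ID                 & -- & -- & -- \\
    Depth Re-ID               & -- & -- & -- \\
    RGB$\leftrightarrow$Depth & -- & -- & -- \\
    \bottomrule
  \end{tabular}
\end{table}
```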
Circularity Check
No significant circularity: empirical competition report
full rationale
The paper is a standard competition report presenting the TVRID dataset, evaluation protocol, and externally evaluated submission results. No derivations, equations, predictions, fitted parameters, or load-bearing self-citations exist. The reported difficulty ordering (RGB > Depth > Cross-Modal) is a direct empirical observation from server-side mAP/CMC-1 scoring on the released benchmark; it does not reduce to any internal construction, ansatz, or prior author result. The work is self-contained against external benchmarks and reproducible via the provided Zenodo/GitHub links.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] Ahmed, E., Jones, M., Marks, T.K.: An improved deep learning architecture for person re-identification. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3908–3916 (Jun 2015)
- [2] Delécluse, R., Wannous, H., Guimas, L.: Privacy-Preserving Person Re-Identification from Temporal Sequences with Transformer and Hungarian Optimization. In: 2025 IEEE 19th International Conference on Automatic Face and Gesture Recognition (FG), pp. 1–10 (May 2025)
- [3] Deng, J., Guo, J., Xue, N., Zafeiriou, S.: ArcFace: Additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699 (2019)
- [4] Fuentes-Jimenez, D., Gutierrez, C.L., Guarasa, J.M., Luna, C., Pizarro, D.: Depth Person Detection Database (GFPD) (2020), https://www.kaggle.com/dsv/1664233
- [5] Gong, S., Xiang, T.: Person Re-identification. In: Gong, S., Xiang, T. (eds.) Visual Analysis of Behaviour: From Pixels to Semantics, pp. 301–313. Springer, London (2011)
- [6] Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737 (2017)
- [7] Jia, D., Hermans, A., Leibe, B.: 2D vs. 3D LiDAR-based Person Detection on Mobile Robots (Jul 2022), arXiv:2106.11239 [cs]
- [8] Lejbolle, A.R., Nasrollahi, K., Krogh, B., Moeslund, T.B.: Multimodal Neural Network for Overhead Person Re-Identification. In: 2017 International Conference of the Biometrics Special Interest Group (BIOSIG), pp. 1–5. IEEE, Darmstadt, Germany (Sep 2017)
- [9] Lejbølle, A.R., Nasrollahi, K., Krogh, B., Moeslund, T.B.: Person Re-Identification Using Spatial and Layer-Wise Attention. IEEE Transactions on Information Forensics and Security 15, 1216–1231 (2020)
- [10] Leng, Q., Ye, M., Tian, Q.: A Survey of Open-World Person Re-Identification. IEEE Transactions on Circuits and Systems for Video Technology 30(4), 1092–1108 (Apr 2020)
- [11] Liang, X., Rawat, Y.S.: DIFFER: Disentangling identity features via semantic cues for clothes-changing person re-id. In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 13980–13989 (2025)
- [12] Liciotti, D., Paolanti, M., Frontoni, E., Mancini, A., Zingaretti, P.: Person Re-identification Dataset with RGB-D Camera in a Top-View Configuration. In: Nasrollahi, K., Distante, C., Hua, G., Cavallaro, A., Moeslund, T.B., Battiato, S., Ji, Q. (eds.) Video Analytics. Face and Facial Expression Recognition and Audience Measurement, pp. 1–11. Springer ... (2017)
- [13] Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986 (2022)
- [14] Martini, M., Paolanti, M., Frontoni, E.: Open-World Person Re-Identification With RGBD Camera in Top-View Configuration for Retail Applications. IEEE Access 8, 67756–67765 (2020)
- [15] Mukhtar, H., Khan, M.U.G.: CMOT: A cross-modality transformer for RGB-D fusion in person re-identification with online learning capabilities. Knowledge-Based Systems 283, 111155 (Jan 2024)
- [16] Munaro, M., Fossati, A., Basso, A., Menegatti, E., Van Gool, L.: One-Shot Person Re-identification with a Consumer Depth Camera. In: Gong, S., Cristani, M., Yan, S., Loy, C.C. (eds.) Person Re-Identification, pp. 161–181. Springer, London (2014)
- [17] Paolanti, M., Pietrini, R., Mancini, A., Frontoni, E., Zingaretti, P.: Deep understanding of shopper behaviours and interactions using RGB-D vision. Machine Vision and Applications 31(7), 66 (Sep 2020)
- [18] Paolanti, M., Romeo, L., Liciotti, D., Pietrini, R., Cenci, A., Frontoni, E., Zingaretti, P.: Person Re-Identification with RGB-D Camera in Top-View Configuration through Multiple Nearest Neighbor Classifiers and Neighborhood Component Features Selection. Sensors 18(10), 3471 (Oct 2018)
- [19] Rao, H., Leung, C., Miao, C.: Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification. International Journal of Computer Vision 132(1), 238–260 (Jan 2024)
- [20] Wu, A., Zheng, W.S., Lai, J.H.: Robust Depth-Based Person Re-Identification. IEEE Transactions on Image Processing 26(6), 2588–2603 (Jun 2017)
- [21] Wu, J., Jiang, J., Qi, M., Chen, C., Zhang, J.: An End-to-end Heterogeneous Restraint Network for RGB-D Cross-modal Person Re-identification. ACM Trans. Multimedia Comput. Commun. Appl. 18(4), 109:1–109:22 (Mar 2022)
- [22] Xu, Z., Escalera, S., Pavão, A., Richard, M., Tu, W.W., Yao, Q., Zhao, H., Guyon, I.: Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform. Patterns 3(7) (2022)
- [23] Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., Hoi, S.C.H.: Deep Learning for Person Re-Identification: A Survey and Outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(6), 2872–2893 (Jun 2022)
- [24] Zhang, S., Luo, W., Cheng, D., Yang, Q., Ran, L., Xing, Y., Zhang, Y.: Cross-platform video person ReID: A new benchmark dataset and adaptation approach (2024)
- [25] Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable Person Re-identification: A Benchmark. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1116–1124. IEEE, Santiago, Chile (Dec 2015)
- [26] Zhong, Z., Zheng, L., Cao, D., Li, S.: Re-ranking person re-identification with k-reciprocal encoding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1318–1327 (2017)