pith. machine review for the scientific record.

arxiv: 2605.04977 · v1 · submitted 2026-05-06 · 💻 cs.CV

Recognition: 1 theorem link

ICPR 2026 Competition on Privacy-Preserving Person Re-Identification from Top-View RGB-Depth Camera (TVRID)

Authors on Pith · no claims yet

Pith reviewed 2026-05-08 18:31 UTC · model grok-4.3

classification 💻 cs.CV
keywords person re-identification · top-view · RGB-Depth · cross-modal retrieval · privacy-preserving · benchmark dataset · competition

The pith

A new top-view RGB-Depth dataset and three-track competition show that cross-modal retrieval is harder than RGB or depth alone, yet modality-invariant methods achieve strong results.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports an ICPR 2026 competition that releases a dataset of 86 identities filmed from overhead by four synchronized RGB-Depth cameras, with viewpoints deliberately varied across flat, ascent, descent, and oblique angles. Three evaluation tracks are defined: matching within RGB, within Depth, and across the RGB-Depth boundary. Participant results establish a consistent ordering of difficulty, with RGB easiest, Depth intermediate, and cross-modal hardest, while also demonstrating that methods learning features unchanged by modality can still reach competitive accuracy. The authors release the full dataset, evaluation code, and documentation to create a public, reproducible test bed for privacy-preserving overhead re-identification.
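The "features unchanged by modality" idea can be made concrete with a cross-modal triplet objective, in the spirit of the triplet-loss family the paper's references cite (Hermans et al.): an RGB anchor is pulled toward the depth embedding of the same identity and pushed away from depth embeddings of other identities. This is an illustrative sketch only; the embeddings, the 0.3 margin, and the function name are toy assumptions, not the competition entries' actual implementations.

```python
import numpy as np

def cross_modal_triplet_loss(anchor_rgb, pos_depth, neg_depth, margin=0.3):
    """Triplet loss with the anchor embedded from RGB and the positive/
    negative embedded from depth: same-identity cross-modal pairs are
    pulled together, different identities pushed apart by `margin`."""
    d_pos = np.linalg.norm(anchor_rgb - pos_depth)   # same identity, other modality
    d_neg = np.linalg.norm(anchor_rgb - neg_depth)   # different identity
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings (illustrative values only, not learned features).
anchor   = np.array([1.0, 0.0])   # RGB embedding of identity A
pos      = np.array([0.9, 0.1])   # depth embedding of identity A
far_neg  = np.array([0.0, 1.0])   # identity B, well separated: loss 0
near_neg = np.array([0.8, 0.2])   # identity B, too close: positive loss

print(cross_modal_triplet_loss(anchor, pos, far_neg))   # 0.0 (margin satisfied)
print(cross_modal_triplet_loss(anchor, pos, near_neg))  # ~0.159 (margin violated)
```

Minimizing such a loss over many cross-modal triplets drives RGB and depth embeddings of the same person toward a shared space, which is one plausible route to the competitive cross-modal scores the report describes.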

Core claim

The TVRID competition and its 86-identity dataset establish a benchmark in which RGB re-identification outperforms depth re-identification, which in turn outperforms RGB-to-Depth cross-modal retrieval, while modality-invariant learning still permits strong performance across all tracks under a unified mAP and CMC-1 protocol.

What carries the argument

The TVRID dataset of paired RGB and Depth streams captured by four overhead RealSense D455 cameras across structured geometric viewpoints, together with the three-track evaluation protocol (RGB, Depth, RGB↔Depth) scored by mAP and CMC-1.
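The mAP and CMC-1 scoring that all three tracks share can be sketched generically: rank the gallery by similarity for each query, check the top-ranked match (CMC-1), and average precision over the ranked relevant items (mAP). This is a standard retrieval-evaluation sketch with toy similarity scores, not the TVRID server-side code; the function name and values are illustrative.

```python
import numpy as np

def evaluate_track(sim, query_ids, gallery_ids):
    """Return (mAP, CMC-1) for a similarity matrix sim[query, gallery]."""
    aps, top1_hits = [], 0
    for q in range(sim.shape[0]):
        order = np.argsort(-sim[q])                 # gallery, most similar first
        matches = gallery_ids[order] == query_ids[q]
        if not matches.any():
            continue                                # no relevant item for this query
        top1_hits += int(matches[0])                # CMC-1: correct at rank 1?
        ranks = np.nonzero(matches)[0] + 1          # 1-based ranks of true matches
        precisions = np.arange(1, len(ranks) + 1) / ranks
        aps.append(precisions.mean())               # average precision per query
    return float(np.mean(aps)), top1_hits / sim.shape[0]

# Toy cross-modal setup: 3 RGB queries against a 4-item depth gallery.
query_ids = np.array([0, 1, 2])
gallery_ids = np.array([0, 1, 2, 1])
sim = np.array([[0.9, 0.1, 0.2, 0.0],
                [0.2, 0.8, 0.1, 0.7],
                [0.1, 0.5, 0.4, 0.2]])   # query 2 ranks a wrong identity first
mAP, cmc1 = evaluate_track(sim, query_ids, gallery_ids)
print(mAP, cmc1)   # 0.8333..., 0.6666...
```

In the cross-modal track the queries and gallery come from different modalities (RGB queries against a depth gallery, and vice versa), but the metric computation is identical, which is what makes the three-track difficulty comparison meaningful.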

Load-bearing premise

The 86-identity collection with its specific flat, ascent, descent, and oblique overhead viewpoints is representative enough of real-world top-view privacy constraints and appearance variation.

What would settle it

An experiment on a substantially larger or more varied set of identities and camera placements that either eliminates the observed RGB > Depth > Cross-Modal difficulty ordering or shows that modality-invariant methods no longer produce competitive scores.

Figures

Figures reproduced from arXiv: 2605.04977 by Hazem Wannous, Laurent Guimas, Raphaël Delécluse.

Figure 1. Acquisition schematic: top-view layout with four overhead cameras along a short path including a stepladder segment. Distances are chosen to obtain successive, partially overlapping observations with distinct geometry (flat, ascent, descent, oblique). RGB and Depth streams are synchronized at 15 FPS. For each passage, subjects are captured twice per camera (IN/OUT), producing eight observation sets per passage.
Figure 2. 2×2 camera layout: for each camera, an RGB row (top) and a Depth row (bottom), with IN/OUT columns. From the raw 640×480 streams, a depth-driven detector provides person crops, removing the need for a separate detection stage; raw data remain available for end-to-end pipelines.
Original abstract

This companion paper reports the ICPR 2026 TVRID competition on privacy-aware top-view person re-identification. We present the competition setting, the released RGB-Depth dataset, and a summary of final results with descriptions of the top entries. TVRID contains 86 identities captured by four synchronized overhead Intel RealSense D455 cameras, with paired RGB/Depth streams and structured geometric variation across flat, ascent, descent, and oblique viewpoints. The evaluation protocol includes three tracks: RGB Re-ID, Depth Re-ID, and RGB$\leftrightarrow$Depth cross-modal retrieval. Submissions are ranked using mAP and CMC-1 under a unified server-side evaluation. The final results show a clear difficulty ordering (RGB $>$ Depth $>$ Cross-Modal), highlighting both the challenge of modality-constrained retrieval and the feasibility of strong performance with modality-invariant learning. By releasing the dataset at https://zenodo.org/records/17909410, the evaluation scripts at https://github.com/RaphaelDel/ICPR-TVRID, and the accompanying documentation, TVRID establishes a reproducible benchmark for top-view, depth-based, and cross-modal person re-id.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript reports the outcomes of the ICPR 2026 TVRID competition on privacy-preserving person re-identification from top-view RGB-Depth cameras. It introduces the TVRID dataset with 86 identities captured by four synchronized overhead Intel RealSense D455 cameras providing paired RGB and depth streams across flat, ascent, descent, and oblique viewpoints. The competition defines three tracks (RGB Re-ID, Depth Re-ID, and RGB↔Depth cross-modal retrieval) evaluated under a unified server-side protocol using mAP and CMC-1. The central empirical finding is a clear difficulty ordering RGB > Depth > Cross-Modal, accompanied by descriptions of top entries; the paper releases the dataset, evaluation scripts, and documentation to support reproducibility.

Significance. If the reported ordering holds, the work establishes a reproducible benchmark for top-view, depth-based, and cross-modal person re-identification that emphasizes privacy through capture geometry rather than post-processing. The explicit release of the Zenodo dataset, GitHub evaluation scripts, and documentation is a clear strength that enables direct follow-on research and verification. The results usefully illustrate both the added challenge of modality-constrained retrieval and the viability of modality-invariant approaches, providing a concrete reference point for the community.

major comments (1)
  1. [Results] The central claim that the final results exhibit a 'clear difficulty ordering (RGB > Depth > Cross-Modal)' is load-bearing, yet the manuscript provides only a qualitative statement without a table or explicit numerical mAP and CMC-1 values for the top submissions in each track. Including such a summary table (with at least the top-3 entries per track) would allow readers to assess the magnitude of the gaps and the robustness of the ordering.
minor comments (2)
  1. [Dataset] The dataset description mentions 'structured geometric variation across flat, ascent, descent, and oblique viewpoints' but does not report the number of images or identities per viewpoint category or per camera; adding these basic statistics would improve interpretability of the reported difficulty ordering.
  2. [Abstract] The abstract summarizes results without participant method details; while the full text includes descriptions of top entries, a concise comparison table of the key technical choices (e.g., backbone, fusion strategy, loss) across the top three submissions per track would aid readers.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their positive assessment of the TVRID competition paper and for the constructive suggestion regarding the presentation of results. We address the single major comment below.

Point-by-point responses
  1. Referee: [Results] The central claim that the final results exhibit a 'clear difficulty ordering (RGB > Depth > Cross-Modal)' is load-bearing, yet the manuscript provides only a qualitative statement without a table or explicit numerical mAP and CMC-1 values for the top submissions in each track. Including such a summary table (with at least the top-3 entries per track) would allow readers to assess the magnitude of the gaps and the robustness of the ordering.

    Authors: We agree that a compact numerical summary would strengthen the manuscript and make the claimed difficulty ordering more transparent. In the revised version we will insert a new table (placed in the Results section) that reports mAP and CMC-1 for the top-three submissions in each of the three tracks. The table will be accompanied by a short paragraph quantifying the observed gaps, thereby supporting the qualitative statement already present in the text. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical competition report

Full rationale

The paper is a standard competition report presenting the TVRID dataset, evaluation protocol, and externally evaluated submission results. No derivations, equations, predictions, fitted parameters, or load-bearing self-citations exist. The reported difficulty ordering (RGB > Depth > Cross-Modal) is a direct empirical observation from server-side mAP/CMC-1 scoring on the released benchmark; it does not reduce to any internal construction, ansatz, or prior author result. The work is self-contained against external benchmarks and reproducible via the provided Zenodo/GitHub links.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a competition report and dataset release without mathematical models or derivations; therefore it introduces no free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5520 in / 1177 out tokens · 39760 ms · 2026-05-08T18:31:54.719880+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

26 extracted references · 2 canonical work pages

  1. Ahmed, E., Jones, M., Marks, T.K.: An improved deep learning architecture for person re-identification. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3908–3916 (Jun 2015)
  2. Delécluse, R., Wannous, H., Guimas, L.: Privacy-Preserving Person Re-Identification from Temporal Sequences with Transformer and Hungarian Optimization. In: 2025 IEEE 19th International Conference on Automatic Face and Gesture Recognition (FG), pp. 1–10 (May 2025)
  3. Deng, J., Guo, J., Xue, N., Zafeiriou, S.: ArcFace: Additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699 (2019)
  4. Fuentes-Jimenez, D., Gutierrez, C.L., Guarasa, J.M., Luna, C., Pizarro, D.: Depth Person Detection Database (GFPD) (2020), https://www.kaggle.com/dsv/1664233
  5. Gong, S., Xiang, T.: Person Re-identification. In: Gong, S., Xiang, T. (eds.) Visual Analysis of Behaviour: From Pixels to Semantics, pp. 301–313. Springer, London (2011)
  6. Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737 (2017)
  7. Jia, D., Hermans, A., Leibe, B.: 2D vs. 3D LiDAR-based Person Detection on Mobile Robots (Jul 2022), arXiv:2106.11239 [cs]
  8. Lejbølle, A.R., Nasrollahi, K., Krogh, B., Moeslund, T.B.: Multimodal Neural Network for Overhead Person Re-Identification. In: 2017 International Conference of the Biometrics Special Interest Group (BIOSIG), pp. 1–5. IEEE, Darmstadt, Germany (Sep 2017)
  9. Lejbølle, A.R., Nasrollahi, K., Krogh, B., Moeslund, T.B.: Person Re-Identification Using Spatial and Layer-Wise Attention. IEEE Transactions on Information Forensics and Security 15, 1216–1231 (2020)
  10. Leng, Q., Ye, M., Tian, Q.: A Survey of Open-World Person Re-Identification. IEEE Transactions on Circuits and Systems for Video Technology 30(4), 1092–1108 (Apr 2020)
  11. Liang, X., Rawat, Y.S.: Differ: Disentangling identity features via semantic cues for clothes-changing person re-id. In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 13980–13989 (2025)
  12. Liciotti, D., Paolanti, M., Frontoni, E., Mancini, A., Zingaretti, P.: Person Re-identification Dataset with RGB-D Camera in a Top-View Configuration. In: Nasrollahi, K., Distante, C., Hua, G., Cavallaro, A., Moeslund, T.B., Battiato, S., Ji, Q. (eds.) Video Analytics. Face and Facial Expression Recognition and Audience Measurement, pp. 1–11. Springer ...
  13. Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986 (2022)
  14. Martini, M., Paolanti, M., Frontoni, E.: Open-World Person Re-Identification With RGBD Camera in Top-View Configuration for Retail Applications. IEEE Access 8, 67756–67765 (2020)
  15. Mukhtar, H., Khan, M.U.G.: CMOT: A cross-modality transformer for RGB-D fusion in person re-identification with online learning capabilities. Knowledge-Based Systems 283, 111155 (Jan 2024)
  16. Munaro, M., Fossati, A., Basso, A., Menegatti, E., Van Gool, L.: One-Shot Person Re-identification with a Consumer Depth Camera. In: Gong, S., Cristani, M., Yan, S., Loy, C.C. (eds.) Person Re-Identification, pp. 161–181. Springer, London (2014)
  17. Paolanti, M., Pietrini, R., Mancini, A., Frontoni, E., Zingaretti, P.: Deep understanding of shopper behaviours and interactions using RGB-D vision. Machine Vision and Applications 31(7), 66 (Sep 2020)
  18. Paolanti, M., Romeo, L., Liciotti, D., Pietrini, R., Cenci, A., Frontoni, E., Zingaretti, P.: Person Re-Identification with RGB-D Camera in Top-View Configuration through Multiple Nearest Neighbor Classifiers and Neighborhood Component Features Selection. Sensors 18(10), 3471 (Oct 2018)
  19. Rao, H., Leung, C., Miao, C.: Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification. International Journal of Computer Vision 132(1), 238–260 (Jan 2024)
  20. Wu, A., Zheng, W.S., Lai, J.H.: Robust Depth-Based Person Re-Identification. IEEE Transactions on Image Processing 26(6), 2588–2603 (Jun 2017)
  21. Wu, J., Jiang, J., Qi, M., Chen, C., Zhang, J.: An End-to-end Heterogeneous Restraint Network for RGB-D Cross-modal Person Re-identification. ACM Trans. Multimedia Comput. Commun. Appl. 18(4), 109:1–109:22 (Mar 2022)
  22. Xu, Z., Escalera, S., Pavão, A., Richard, M., Tu, W.W., Yao, Q., Zhao, H., Guyon, I.: Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform. Patterns 3(7) (2022)
  23. Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., Hoi, S.C.H.: Deep Learning for Person Re-Identification: A Survey and Outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(6), 2872–2893 (Jun 2022)
  24. Zhang, S., Luo, W., Cheng, D., Yang, Q., Ran, L., Xing, Y., Zhang, Y.: Cross-platform video person re-id: A new benchmark dataset and adaptation approach (2024)
  25. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable Person Re-identification: A Benchmark. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1116–1124. IEEE, Santiago, Chile (Dec 2015)
  26. Zhong, Z., Zheng, L., Cao, D., Li, S.: Re-ranking person re-identification with k-reciprocal encoding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1318–1327 (2017)