arxiv: 1907.11366 · v1 · pith:H2DTIOYCnew · submitted 2019-07-26 · 💻 cs.CV

MVB: A Large-Scale Dataset for Baggage Re-Identification and Merged Siamese Networks

Pith reviewed 2026-05-24 16:11 UTC · model grok-4.3

classification 💻 cs.CV
keywords baggage re-identification, multi-view dataset, Siamese network, airport imaging, object re-identification, pose variation, surface material labels, large-scale dataset

The pith

MVB is the first large-scale public dataset for baggage re-identification, with 4519 identities captured via multi-view cameras across two real airports.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MVB as a new dataset tailored to baggage re-identification, which differs from person re-identification in its high inter-class similarity and sensitivity to real-world imaging changes. It supplies 4519 baggage identities and 22660 annotated images plus surface material labels, all collected with a multi-view camera setup meant to capture more complete surface information despite pose shifts and occlusions. A merged Siamese network serves as the baseline model whose performance is measured on the data. A sympathetic reader would care because baggage tracking in airports currently lacks large, realistic training resources that reflect actual environmental differences between collection sites.

Core claim

The authors release MVB, the first publicly available large-scale dataset for baggage ReID containing 4519 identities and 22660 images together with surface material labels; all images come from a specially-designed multi-view camera system deployed in two real airport environments that differ markedly in imaging factors, with the system intended to obtain baggage surface 3D information as completely as possible in the presence of pose variation and occlusion. They further introduce a merged Siamese network baseline and report its evaluation results on the dataset.
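
As a rough illustration of what a Siamese-style pair verifier does, the sketch below passes two feature vectors through one shared-weight branch, merges the two embeddings, and maps the result to a same-identity probability. It is not the paper's architecture: the class name `TinySiameseVerifier`, the linear-plus-ReLU branch, the absolute-difference merge, and the random untrained weights are all assumptions chosen to keep the example small.

```python
import math
import random
from typing import List

Vector = List[float]


class TinySiameseVerifier:
    """Hypothetical toy verifier: shared branch, merge step, binary head."""

    def __init__(self, in_dim: int, embed_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One weight matrix shared by both branches (the "Siamese" part).
        self.branch_weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
            for _ in range(embed_dim)
        ]
        # Binary head applied to the merged representation.
        self.head_weights = [rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
        self.head_bias = 0.0

    def embed(self, x: Vector) -> Vector:
        """Linear projection followed by ReLU."""
        return [
            max(0.0, sum(w * v for w, v in zip(row, x)))
            for row in self.branch_weights
        ]

    def same_identity_probability(self, a: Vector, b: Vector) -> float:
        """Score a pair; the merge rule here is an assumption."""
        emb_a, emb_b = self.embed(a), self.embed(b)
        # Assumed merge: element-wise absolute difference of the embeddings.
        merged = [abs(p - q) for p, q in zip(emb_a, emb_b)]
        logit = sum(w * m for w, m in zip(self.head_weights, merged))
        logit += self.head_bias
        return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    model = TinySiameseVerifier(in_dim=4, embed_dim=3, seed=1)
    probe = [0.2, 0.9, 0.1, 0.5]
    gallery = [0.8, 0.1, 0.4, 0.3]
    print(model.same_identity_probability(probe, gallery))
```

Because both inputs go through the same weights and the merge is symmetric, swapping probe and gallery gives the same score. The weights are untrained, so the printed value carries no meaning beyond showing the data flow.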

What carries the argument

The specially-designed multi-view camera system that captures baggage from multiple angles to assemble more complete surface information despite pose and occlusion.

If this is right

  • Re-identification models can now be trained and tested on baggage data that includes both inter-class similarity and cross-environment imaging differences.
  • Surface material labels become available as an auxiliary signal for distinguishing visually similar items.
  • Benchmarks exist for merged Siamese architectures on this specific object category.
  • The dataset supports evaluation of methods that must generalize across two distinct real-world capture conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same multi-view capture strategy could be applied to other rigid objects that must be tracked across camera networks.
  • Material labels might support downstream tasks such as automated sorting by surface type in logistics settings.
  • Performance gaps between the baseline and future models on MVB would quantify how much the baggage domain still differs from person re-identification.
  • The two-environment design supplies a ready test bed for domain-adaptation techniques without needing new data collection.

Load-bearing premise

The multi-view camera system actually gathers enough additional surface information to overcome the pose and occlusion problems that occur in the two airport environments.

What would settle it

A controlled experiment in which single-view subsets of MVB yield re-identification accuracy equal to or higher than the full multi-view version would falsify the claim that the multi-view capture is required to handle the stated variations.

Figures

Figures reproduced from arXiv: 1907.11366 by Dong Li, Jinhua Wu, Li Zhang, Yunda Sun, Zhulin Zhang.

Figure 1: Baggage ReID application and multi-view camera system at: (a) checkpoint (b) BHS.
Figure 2: Architecture of merged Siamese network.
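
The paper trains the merged Siamese network on image pairs, with negative pairs sampled at random across different identities so that positive and negative labels are balanced. A minimal sketch of that sampling step follows. The function name `build_balanced_pairs`, the image identifiers, and the choice to use every within-identity combination as a positive pair are assumptions, since the source does not spell out how positives are formed.

```python
import random
from itertools import combinations
from typing import Dict, List, Tuple

# (image_a, image_b, label); label 1 means same identity, 0 means different.
Pair = Tuple[str, str, int]


def build_balanced_pairs(
    images_by_id: Dict[str, List[str]], seed: int = 0
) -> List[Pair]:
    """Return a shuffled pair list with equal positive and negative counts."""
    ids = list(images_by_id)
    if len(ids) < 2:
        raise ValueError("need at least two identities to form negative pairs")
    rng = random.Random(seed)

    # Assumption: every pair of images sharing an identity is a positive.
    positives: List[Pair] = [
        (a, b, 1)
        for images in images_by_id.values()
        for a, b in combinations(images, 2)
    ]

    # Negatives: one random image from each of two different identities,
    # drawn until the count matches the positives.
    negatives: List[Pair] = []
    while len(negatives) < len(positives):
        id_a, id_b = rng.sample(ids, 2)  # two distinct identities
        negatives.append(
            (rng.choice(images_by_id[id_a]), rng.choice(images_by_id[id_b]), 0)
        )

    pairs = positives + negatives
    rng.shuffle(pairs)
    return pairs


if __name__ == "__main__":
    toy = {
        "bag_001": ["a1", "a2", "a3"],
        "bag_002": ["b1", "b2"],
        "bag_003": ["c1", "c2", "c3"],
    }
    for pair in build_balanced_pairs(toy):
        print(pair)
```

Sampling negatives with replacement keeps the sketch short; a real pipeline might deduplicate them or resample each epoch.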
Figure 3: Sample ReID results on MVB. Probe and Gallery images are not masked. Probe images are listed in the left in blue box. Gallery images are displayed in order of inferenced possibility. Gallery images with same identity as probe are bounded in green box, otherwise in red. (a) samples of baggage re-identified in top 3, (b) samples of baggage not re-identified in top 3.

Original abstract

In this paper, we present a novel dataset named MVB (Multi View Baggage) for baggage ReID task which has some essential differences from person ReID. The features of MVB are three-fold. First, MVB is the first publicly released large-scale dataset that contains 4519 baggage identities and 22660 annotated baggage images as well as its surface material labels. Second, all baggage images are captured by specially-designed multi-view camera system to handle pose variation and occlusion, in order to obtain the 3D information of baggage surface as complete as possible. Third, MVB has remarkable inter-class similarity and intra-class dissimilarity, considering the fact that baggage might have very similar appearance while the data is collected in two real airport environments, where imaging factors varies significantly from each other. Moreover, we proposed a merged Siamese network as baseline model and evaluated its performance. Experiments and case study are conducted on MVB.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce the MVB dataset for baggage re-identification, the first public large-scale collection with 4519 identities and 22660 annotated images plus surface material labels. Images are captured via a specially-designed multi-view camera system across two real airport environments to mitigate pose variation and occlusion. The work also proposes a merged Siamese network baseline and reports experiments and case studies on the dataset.

Significance. Release of a dataset at this scale with explicit multi-view capture and material labels would fill a gap in baggage ReID benchmarks, which differ from person ReID due to high inter-class similarity and environmental variation. The baseline provides an initial reference point for future models.

major comments (2)
  1. [Abstract] The statement that the authors proposed a merged Siamese network and 'evaluated its performance' is unsupported by any numerical results, rank-k accuracies, mAP values, or error bars; without these the baseline contribution cannot be assessed.
  2. [Data collection] The assertion that the multi-view system obtains '3D information of baggage surface as complete as possible' lacks quantitative support such as measured surface coverage percentages or occlusion rates across the two airport environments.
minor comments (1)
  1. Add explicit public release link, license, and download instructions in the camera-ready version.
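
For readers unfamiliar with the metrics named in the first major comment, here is a small sketch of how rank-k accuracy and mAP are commonly computed from ranked gallery lists. The function names and the toy rankings are illustrative assumptions; the numbers are not results from the paper.

```python
from typing import Sequence

# Each ranking is one probe's gallery list, best match first;
# True marks a gallery image with the same identity as the probe.
Ranking = Sequence[bool]


def rank_k_accuracy(rankings: Sequence[Ranking], k: int) -> float:
    """Fraction of probes with at least one correct match in the top k."""
    hits = sum(1 for matches in rankings if any(matches[:k]))
    return hits / len(rankings)


def average_precision(matches: Ranking) -> float:
    """Mean of the precision values at each position holding a true match."""
    hits = 0
    precision_sum = 0.0
    for position, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Ranking]) -> float:
    """Average precision averaged over all probes."""
    return sum(average_precision(m) for m in rankings) / len(rankings)


if __name__ == "__main__":
    toy_rankings = [
        [True, False, False, True],
        [False, False, True, False],
        [False, True, True, False],
    ]
    print("rank-1:", rank_k_accuracy(toy_rankings, 1))
    print("rank-3:", rank_k_accuracy(toy_rankings, 3))
    print("mAP:", mean_average_precision(toy_rankings))
```

Rank-k rewards getting any correct match near the top, while mAP also accounts for where the remaining correct matches fall, which is why reports usually give both.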

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and will revise the manuscript to improve clarity and support for the claims made.

Point-by-point responses
  1. Referee: [Abstract] The statement that the authors proposed a merged Siamese network and 'evaluated its performance' is unsupported by any numerical results, rank-k accuracies, mAP values, or error bars; without these the baseline contribution cannot be assessed.

    Authors: We agree that the abstract would be strengthened by including quantitative results. The full manuscript contains experimental results for the merged Siamese network (including rank-k accuracies and mAP), but these were not summarized in the abstract. In the revision we will add the key performance metrics to the abstract.
    Revision: yes

  2. Referee: [Data collection] The assertion that the multi-view system obtains '3D information of baggage surface as complete as possible' lacks quantitative support such as measured surface coverage percentages or occlusion rates across the two airport environments.

    Authors: The phrase was intended to describe the design objective of the multi-view camera rig. We did not perform explicit quantitative measurements of surface coverage or occlusion rates during data collection. We will revise the wording to remove the unsupported quantitative implication while retaining the description of the multi-view capture approach.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's central contribution is the release of the MVB dataset with stated scale (4519 identities, 22660 images), capture method, and material labels, plus a baseline merged Siamese network for evaluation. No derivation chain, equations, fitted parameters presented as predictions, or self-citations are invoked to support load-bearing claims. The dataset introduction stands as an independent empirical contribution without reduction to its own inputs or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Contribution centers on empirical data collection rather than mathematical derivation; relies on standard computer vision practices for annotation and evaluation.

axioms (1)
  • [domain assumption] Standard assumptions in computer vision for image annotation, multi-view capture, and ReID evaluation hold for baggage images.
    The paper builds on typical CV dataset practices without stating novel axioms.

pith-pipeline@v0.9.0 · 5701 in / 1161 out tokens · 22059 ms · 2026-05-24T16:11:46.251675+00:00 · methodology
