arxiv: 1907.03424 · v1 · pith:AK5HWXFZnew · submitted 2019-07-08 · 💻 cs.RO · cs.CV · eess.IV

Segway DRIVE Benchmark: Place Recognition and SLAM Data Collected by A Fleet of Delivery Robots

Pith reviewed 2026-05-25 01:29 UTC · model grok-4.3

classification 💻 cs.RO · cs.CV · eess.IV
keywords visual place recognition · SLAM · benchmark dataset · delivery robots · indoor navigation · ground truth · sensor data collection

The pith

Segway delivery robots collected a year-long benchmark dataset for SLAM and place recognition in real office and mall environments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Segway DRIVE benchmark to address the gap between academic SLAM datasets and real-world operations in tasks like food delivery. Data comes from a fleet of Segway robots performing routine indoor tasks over a year, equipped with a fisheye camera, IMU, wheel encoders, and a removable high-precision lidar for reference solutions. The sequences include planar motions, moving pedestrians, and changing lighting and environments that typically cause SLAM failures. New metrics are proposed to evaluate metric place recognition algorithms, and sample methods are tested on the data. A sympathetic reader would care because the benchmark supplies hundreds of sequences covering more than 50 km of indoor floors that better reflect deployment conditions.

Core claim

The Segway DRIVE benchmark is a dataset suite collected by a fleet of Segway delivery robots equipped with a global-shutter fisheye camera, consumer-grade IMU, low-cost wheel encoders, and a removable high-precision lidar for generating reference solutions. As the robots carry out tasks in office buildings and shopping malls, the data spanning a year features planar motions, moving pedestrians, and changing environment and lighting that pose severe challenges for SLAM algorithms, and the benchmark includes several metrics to evaluate metric place recognition algorithms.

What carries the argument

The Segway DRIVE benchmark dataset suite, collected via synchronized onboard sensors plus temporary high-precision lidar references, which supplies representative sequences and evaluation metrics for metric place recognition.

If this is right

  • SLAM algorithms can be tested against ground truth on sequences that include real moving pedestrians and lighting variations.
  • Metric place recognition methods can be assessed using the proposed metrics on data from actual indoor operations.
  • The benchmark provides access to hundreds of sequences covering more than 50 km of indoor floors for standardized comparisons.
  • Ongoing robot operations will add more data over time, expanding coverage of changing environments.
  • Algorithms shown to work on this data are more likely to handle conditions encountered in autonomous indoor navigation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Developers of delivery robots could use the benchmark to identify which existing SLAM components break first under pedestrian traffic before field trials.
  • The dataset's focus on planar indoor motion may encourage new algorithms that explicitly exploit floor-plan constraints rather than full 6-DoF estimation.
  • Repeated traversals of the same buildings under varying lighting could support research on long-term map maintenance without requiring new collection campaigns.

Load-bearing premise

The reference solutions generated by the removable high-precision lidar are sufficiently accurate and the collected sequences adequately capture the failure modes of existing SLAM algorithms.

What would settle it

Independent measurements in the same environments showing that the lidar-based reference trajectories contain large errors, or new sequences from the robots revealing that the dataset lacks the claimed dynamic challenges.

Figures

Figures reproduced from arXiv: 1907.03424 by Fumin Pang, Jianzhu Huai, Yusen Qin, Zichong Chen.

Figure 1: First row shows a Segway delivery robot doing tasks in …
Figure 2: A Segway delivery robot drawn with the sensor …
Figure 3: Visual odometry results on data 2018-08-02 …
Figure 4: The s_PRF indicates the regularity of localizations over distance. Each row shows the localizations in the pre-built map on the left and the histogram over traveled distance on the right for one test session. Except for the vertical axis of the histogram, all axes have a unit of meter.
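
The Figure 4 caption describes s_PRF as a measure of how regularly localizations occur over traveled distance, read off a histogram, and the paper's Table IV caption glosses it as the standard deviation of the place recognition frequencies. A minimal sketch of how such a statistic could be computed follows. The fixed bin width, the function names, and the choice of the population standard deviation are assumptions made for illustration and are not taken from the paper.

```python
import math
import statistics


def place_recognition_frequencies(loc_distances, total_distance, bin_width=5.0):
    """Count localizations in each fixed-width bin of traveled distance.

    loc_distances: traveled distance (m) at which each localization occurred.
    total_distance: total traveled distance of the session (m).
    bin_width: histogram bin width in meters (hypothetical default).
    """
    if total_distance <= 0 or bin_width <= 0:
        raise ValueError("total_distance and bin_width must be positive")
    n_bins = max(1, math.ceil(total_distance / bin_width))
    counts = [0] * n_bins
    for d in loc_distances:
        if 0 <= d <= total_distance:
            # Clamp so a localization exactly at the end falls in the last bin.
            idx = min(int(d // bin_width), n_bins - 1)
            counts[idx] += 1
    return counts


def s_prf(counts):
    """Population standard deviation of the per-bin localization counts."""
    return statistics.pstdev(counts)


# Evenly spread localizations versus localizations clustered at the start.
regular = place_recognition_frequencies([2.5, 7.5, 12.5, 17.5], 20.0)
clustered = place_recognition_frequencies([1.0, 2.0, 3.0, 4.0], 20.0)
print(s_prf(regular), s_prf(clustered))
```

Under this reading, a lower value means localizations are spread more evenly along the route, while a session that localizes only in a few spots yields a higher value.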
Original abstract

Visual place recognition and simultaneous localization and mapping (SLAM) have recently begun to be used in real-world autonomous navigation tasks like food delivery. Existing datasets for SLAM research are often not representative of in situ operations, leaving a gap between academic research and real-world deployment. In response, this paper presents the Segway DRIVE benchmark, a novel and challenging dataset suite collected by a fleet of Segway delivery robots. Each robot is equipped with a global-shutter fisheye camera, a consumer-grade IMU synced to the camera on chip, two low-cost wheel encoders, and a removable high-precision lidar for generating reference solutions. As they routinely carry out tasks in office buildings and shopping malls while collecting data, the dataset spanning a year is characterized by planar motions, moving pedestrians in scenes, and changing environment and lighting. Such factors typically pose severe challenges and may lead to failures for SLAM algorithms. Moreover, several metrics are proposed to evaluate metric place recognition algorithms. With these metrics, sample SLAM and metric place recognition methods were evaluated on this benchmark. The first release of our benchmark has hundreds of sequences, covering more than 50 km of indoor floors. More data will be added as the robot fleet continues to operate in real life. The benchmark is available at http://drive.segwayrobotics.com/#/dataset/download.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents the Segway DRIVE benchmark, a dataset suite collected by a fleet of Segway delivery robots equipped with a global-shutter fisheye camera, consumer-grade IMU synced to the camera, two low-cost wheel encoders, and a removable high-precision lidar for generating reference solutions. The dataset comprises hundreds of sequences spanning more than 50 km in indoor office and mall environments over a year, featuring planar motions, moving pedestrians, and varying lighting. The authors propose metrics for evaluating metric place recognition algorithms and report evaluations of sample SLAM and place recognition methods on the benchmark, with the data to be released publicly and expanded over time.

Significance. If the reference trajectories prove accurate, the benchmark would offer meaningful value by supplying long-term, real-deployment indoor data that captures operational challenges (pedestrians, lighting variation) not well represented in many existing SLAM datasets. The proposed evaluation metrics and sample results provide a concrete starting point for comparisons, and the ongoing data collection model is a practical strength for the community.

major comments (2)
  1. [sensor setup and reference generation] The description of reference trajectory generation (sensor setup and data collection sections) provides no quantitative error metrics, calibration procedure, cross-validation against wheel odometry or repeated passes, or accuracy assessment for the removable high-precision lidar. This directly undermines the central claim that the benchmark supplies reliable ground truth for evaluating SLAM and metric place recognition, as the accuracy needed to expose claimed failure modes remains unknown.
  2. [dataset characterization] The assertion that the collected sequences are representative of in situ operations and capture failure modes of existing algorithms (abstract and dataset characterization) rests solely on qualitative description without sequence selection criteria, quantitative coverage statistics, or comparison showing how the data exercises known SLAM weaknesses.
minor comments (2)
  1. [abstract] The abstract refers to 'several metrics' for place recognition without naming them; a forward reference to the specific section where they are defined would improve clarity.
  2. [sensor setup] Exact sensor models, resolutions, and synchronization details are mentioned at a high level but could be expanded in a table for full reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of the benchmark's presentation. We address each major comment below and will revise the manuscript to incorporate additional details where feasible.

point-by-point responses
  1. Referee: [sensor setup and reference generation] The description of reference trajectory generation (sensor setup and data collection sections) provides no quantitative error metrics, calibration procedure, cross-validation against wheel odometry or repeated passes, or accuracy assessment for the removable high-precision lidar. This directly undermines the central claim that the benchmark supplies reliable ground truth for evaluating SLAM and metric place recognition, as the accuracy needed to expose claimed failure modes remains unknown.

    Authors: We agree that the current manuscript lacks quantitative error metrics and detailed calibration procedures for the reference trajectories. The high-precision lidar is removable and used solely for ground-truth generation during data collection, with its accuracy relying on manufacturer specifications and standard SLAM-based fusion with the other sensors. In the revised manuscript, we will add a new subsection under sensor setup describing the calibration steps for the lidar, IMU, and encoders, along with any available cross-validation from repeated passes and wheel-odometry comparisons. We will also cite the lidar's specified accuracy and note the limitations of not having independent external validation (e.g., motion-capture). This addresses the concern without overstating the ground-truth precision. revision: yes

  2. Referee: [dataset characterization] The assertion that the collected sequences are representative of in situ operations and capture failure modes of existing algorithms (abstract and dataset characterization) rests solely on qualitative description without sequence selection criteria, quantitative coverage statistics, or comparison showing how the data exercises known SLAM weaknesses.

    Authors: The manuscript's characterization is primarily qualitative, based on the robots' year-long operational deployment. We will revise the dataset characterization section to include quantitative coverage statistics (e.g., total distance per environment type, number of sequences with documented pedestrian density or lighting changes) and explicit sequence selection criteria. A brief comparison to existing indoor SLAM datasets will be added to illustrate how the data targets specific weaknesses such as long-term appearance change and dynamic obstacles. These additions will be supported by the metadata already collected during operations. revision: yes
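
The cross-validation discussed in the first response could be quantified in several ways. One common option, shown here only as a sketch and not as the authors' procedure, is the absolute trajectory error (ATE): rigidly align one planar trajectory to another and report the root-mean-square position residual. The code assumes both trajectories are already time-associated lists of (x, y) positions in meters. The function names are hypothetical.

```python
import math


def align_2d(est, ref):
    """Least-squares rigid alignment (rotation + translation) of est onto ref.

    Returns (theta, tx, ty) such that R(theta) * p + t best matches ref.
    """
    n = len(est)
    if n != len(ref) or n < 2:
        raise ValueError("need two equally long trajectories with >= 2 points")
    ex = sum(p[0] for p in est) / n
    ey = sum(p[1] for p in est) / n
    rx = sum(q[0] for q in ref) / n
    ry = sum(q[1] for q in ref) / n
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(est, ref):
        ax, ay = px - ex, py - ey  # centered estimate
        bx, by = qx - rx, qy - ry  # centered reference
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    # Closed-form optimal rotation angle for the 2D case.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = rx - (c * ex - s * ey)
    ty = ry - (s * ex + c * ey)
    return theta, tx, ty


def ate_rmse(est, ref):
    """RMSE of position residuals after rigid alignment of est onto ref."""
    theta, tx, ty = align_2d(est, ref)
    c, s = math.cos(theta), math.sin(theta)
    sq_sum = 0.0
    for (px, py), (qx, qy) in zip(est, ref):
        x = c * px - s * py + tx
        y = s * px + c * py + ty
        sq_sum += (x - qx) ** 2 + (y - qy) ** 2
    return math.sqrt(sq_sum / len(est))


# A reference path and the same path expressed in a rotated, shifted frame.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
odometry = [(3.0, -1.0), (3.0, 0.0), (3.0, 1.0), (2.0, 1.0)]
print(ate_rmse(odometry, reference))
```

Because the alignment removes any global rotation and translation, the residual reflects only the disagreement in trajectory shape, which is what a comparison between lidar references, wheel odometry, or repeated passes would need to expose.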

Circularity Check

0 steps flagged

No circularity: dataset release with no derivations or fitted predictions

full rationale

The paper is a benchmark dataset release describing collection of sequences by delivery robots equipped with cameras, IMU, encoders, and a removable lidar for reference trajectories. No equations, parameter fitting, predictions, or derivation chains appear in the abstract or described content. Claims of novelty, challenge, and representativeness are descriptive of the data characteristics (planar motion, pedestrians, lighting changes) rather than reductions of any result to its own inputs. Self-citations, if present, are not load-bearing for any central claim. The contribution is self-contained as data collection and metric proposal without circular structure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset and benchmark paper with no mathematical derivations, free parameters, or invented entities; it relies on standard assumptions about sensor models and lidar-based ground truth that are not detailed in the abstract.

