Segway DRIVE Benchmark: Place Recognition and SLAM Data Collected by A Fleet of Delivery Robots
Pith reviewed 2026-05-25 01:29 UTC · model grok-4.3
The pith
Segway delivery robots collected a year-long benchmark dataset for SLAM and place recognition in real office and mall environments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Segway DRIVE benchmark is a dataset suite collected by a fleet of Segway delivery robots equipped with a global-shutter fisheye camera, consumer-grade IMU, low-cost wheel encoders, and a removable high-precision lidar for generating reference solutions. Because the robots collect data while carrying out tasks in office buildings and shopping malls, the year-long dataset features planar motion, moving pedestrians, and changing environments and lighting, all of which pose severe challenges for SLAM algorithms. The benchmark also includes several metrics for evaluating metric place recognition algorithms.
What carries the argument
The Segway DRIVE benchmark dataset suite, collected via synchronized onboard sensors plus temporary high-precision lidar references, which supplies representative sequences and evaluation metrics for metric place recognition.
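To make the idea of scoring metric place recognition concrete, the following minimal sketch counts a localization as a true positive when its planar pose falls within a tolerance of the lidar-based reference pose. The function names, the (x, y, yaw) pose layout, and the 0.5 m and 10 degree thresholds are illustrative assumptions; they are not the metrics defined in the paper.

```python
import math


def pose_error(est, ref):
    """Return (translation error in metres, absolute yaw error in radians).

    Both poses are hypothetical planar tuples (x, y, yaw).
    """
    dx, dy = est[0] - ref[0], est[1] - ref[1]
    # Wrap the yaw difference into [-pi, pi) before taking its magnitude.
    dyaw = (est[2] - ref[2] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(dx, dy), abs(dyaw)


def score_localizations(estimates, references,
                        max_trans=0.5, max_yaw=math.radians(10)):
    """Count localizations that agree with the reference within tolerance.

    The thresholds are assumed values chosen for illustration only.
    """
    true_positives = 0
    for est, ref in zip(estimates, references):
        t_err, yaw_err = pose_error(est, ref)
        if t_err <= max_trans and yaw_err <= max_yaw:
            true_positives += 1
    total = len(estimates)
    return {
        "localizations": total,
        "true_positives": true_positives,
        "outliers": total - true_positives,
        "precision": true_positives / total if total else 0.0,
    }
```

A stricter or looser tolerance changes the outlier count directly, so any comparison between methods should report the thresholds alongside the scores.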
If this is right
- SLAM algorithms can be tested against ground truth on sequences that include real moving pedestrians and lighting variations.
- Metric place recognition methods can be assessed using the proposed metrics on data from actual indoor operations.
- The benchmark provides access to hundreds of sequences covering more than 50 km of indoor floors for standardized comparisons.
- Ongoing robot operations will add more data over time, expanding coverage of changing environments.
- Algorithms shown to work on this data are more likely to handle conditions encountered in autonomous indoor navigation.
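Testing a SLAM estimate against a reference trajectory is commonly done with an absolute trajectory error. The sketch below is one way to compute it for planar data, assuming the estimated and reference positions have already been matched by timestamp and stored as N x 2 NumPy arrays. The rigid alignment (rotation and translation, no scale) and the function names are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np


def align_rigid_2d(est, ref):
    """Least-squares rotation R and translation t mapping est onto ref.

    Kabsch-style alignment without scale; est and ref are (N, 2) arrays
    of time-associated positions.
    """
    est_mean, ref_mean = est.mean(axis=0), ref.mean(axis=0)
    H = (est - est_mean).T @ (ref - ref_mean)  # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = ref_mean - R @ est_mean
    return R, t


def ate_rmse(est, ref):
    """Root-mean-square position error after rigid alignment."""
    R, t = align_rigid_2d(est, ref)
    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - ref) ** 2, axis=1))))
```

Because the alignment removes any global rotation and offset, the remaining error reflects the shape of the trajectory rather than the choice of starting frame.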
Where Pith is reading between the lines
- Developers of delivery robots could use the benchmark to identify which existing SLAM components break first under pedestrian traffic before field trials.
- The dataset's focus on planar indoor motion may encourage new algorithms that explicitly exploit floor-plan constraints rather than full 6-DoF estimation.
- Repeated traversals of the same buildings under varying lighting could support research on long-term map maintenance without requiring new collection campaigns.
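As a rough illustration of what exploiting planar motion could look like, the sketch below reduces a full 6-DoF pose to a planar pose and reports how much was discarded. It assumes a 4 x 4 homogeneous transform whose world z-axis is the floor normal; both function names are hypothetical and nothing here is taken from the paper.

```python
import numpy as np


def project_to_plane(T):
    """Reduce a 4x4 homogeneous pose to (x, y, yaw).

    Assumes the world z-axis is normal to the floor and that the pitch
    is not near +/-90 degrees.
    """
    R, p = T[:3, :3], T[:3, 3]
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return float(p[0]), float(p[1]), float(yaw)


def out_of_plane_residual(T):
    """Return (height, tilt angle in radians) dropped by the projection.

    The tilt is the angle between the body z-axis and the world z-axis.
    """
    R, p = T[:3, :3], T[:3, 3]
    tilt = np.arccos(np.clip(R[2, 2], -1.0, 1.0))
    return float(p[2]), float(tilt)
```

Large residuals on a sequence would suggest that the planar assumption does not hold there, for example on ramps or uneven flooring.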
Load-bearing premise
The reference solutions generated by the removable high-precision lidar are sufficiently accurate and the collected sequences adequately capture the failure modes of existing SLAM algorithms.
What would settle it
Independent measurements in the same environments showing that the lidar-based reference trajectories contain large errors, or new sequences from the robots revealing that the dataset lacks the claimed dynamic challenges.
Original abstract
Visual place recognition and simultaneous localization and mapping (SLAM) have recently begun to be used in real-world autonomous navigation tasks like food delivery. Existing datasets for SLAM research are often not representative of in situ operations, leaving a gap between academic research and real-world deployment. In response, this paper presents the Segway DRIVE benchmark, a novel and challenging dataset suite collected by a fleet of Segway delivery robots. Each robot is equipped with a global-shutter fisheye camera, a consumer-grade IMU synced to the camera on chip, two low-cost wheel encoders, and a removable high-precision lidar for generating reference solutions. As they routinely carry out tasks in office buildings and shopping malls while collecting data, the dataset spanning a year is characterized by planar motions, moving pedestrians in scenes, and changing environment and lighting. Such factors typically pose severe challenges and may lead to failures for SLAM algorithms. Moreover, several metrics are proposed to evaluate metric place recognition algorithms. With these metrics, sample SLAM and metric place recognition methods were evaluated on this benchmark. The first release of our benchmark has hundreds of sequences, covering more than 50 km of indoor floors. More data will be added as the robot fleet continues to operate in real life. The benchmark is available at http://drive.segwayrobotics.com/#/dataset/download.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the Segway DRIVE benchmark, a dataset suite collected by a fleet of Segway delivery robots equipped with a global-shutter fisheye camera, consumer-grade IMU synced to the camera, two low-cost wheel encoders, and a removable high-precision lidar for generating reference solutions. The dataset comprises hundreds of sequences spanning more than 50 km in indoor office and mall environments over a year, featuring planar motions, moving pedestrians, and varying lighting. The authors propose metrics for evaluating metric place recognition algorithms and report evaluations of sample SLAM and place recognition methods on the benchmark, with the data to be released publicly and expanded over time.
Significance. If the reference trajectories prove accurate, the benchmark would offer meaningful value by supplying long-term, real-deployment indoor data that captures operational challenges (pedestrians, lighting variation) not well represented in many existing SLAM datasets. The proposed evaluation metrics and sample results provide a concrete starting point for comparisons, and the ongoing data collection model is a practical strength for the community.
major comments (2)
- [sensor setup and reference generation] The description of reference trajectory generation (sensor setup and data collection sections) provides no quantitative error metrics, calibration procedure, cross-validation against wheel odometry or repeated passes, or accuracy assessment for the removable high-precision lidar. This directly undermines the central claim that the benchmark supplies reliable ground truth for evaluating SLAM and metric place recognition, as the accuracy needed to expose claimed failure modes remains unknown.
- [dataset characterization] The assertion that the collected sequences are representative of in situ operations and capture failure modes of existing algorithms (abstract and dataset characterization) rests solely on qualitative description without sequence selection criteria, quantitative coverage statistics, or comparison showing how the data exercises known SLAM weaknesses.
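A cross-check of the reference trajectory against wheel odometry, as raised in the first major comment, could start from something like the sketch below. It integrates differential-drive encoder increments into planar poses and reports the endpoint gap relative to the reference path length. The encoder resolution, wheel radius, and track width are hypothetical parameters, and both trajectories are assumed to share a starting frame; the paper does not describe such a procedure.

```python
import math


def integrate_odometry(ticks, ticks_per_rev, wheel_radius, track_width):
    """Integrate (left, right) encoder increments into poses (x, y, yaw).

    Standard differential-drive model with midpoint heading; all robot
    parameters are assumed values supplied by the caller.
    """
    x = y = yaw = 0.0
    poses = [(x, y, yaw)]
    metres_per_tick = 2 * math.pi * wheel_radius / ticks_per_rev
    for d_left, d_right in ticks:
        s_left, s_right = d_left * metres_per_tick, d_right * metres_per_tick
        ds = 0.5 * (s_left + s_right)
        dyaw = (s_right - s_left) / track_width
        x += ds * math.cos(yaw + 0.5 * dyaw)
        y += ds * math.sin(yaw + 0.5 * dyaw)
        yaw += dyaw
        poses.append((x, y, yaw))
    return poses


def relative_endpoint_drift(odom_poses, ref_poses):
    """Endpoint position gap divided by the reference path length."""
    length = sum(
        math.hypot(b[0] - a[0], b[1] - a[1])
        for a, b in zip(ref_poses, ref_poses[1:])
    )
    gap = math.hypot(
        odom_poses[-1][0] - ref_poses[-1][0],
        odom_poses[-1][1] - ref_poses[-1][1],
    )
    return gap / length if length > 0 else float("nan")
```

Wheel odometry drifts on its own, so a small gap over short segments is weak evidence of consistency rather than proof that the reference is accurate.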
minor comments (2)
- [abstract] The abstract refers to 'several metrics' for place recognition without naming them; a forward reference to the specific section where they are defined would improve clarity.
- [sensor setup] Exact sensor models, resolutions, and synchronization details are mentioned at a high level but could be expanded in a table for full reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of the benchmark's presentation. We address each major comment below and will revise the manuscript to incorporate additional details where feasible.
- Referee: [sensor setup and reference generation] The description of reference trajectory generation (sensor setup and data collection sections) provides no quantitative error metrics, calibration procedure, cross-validation against wheel odometry or repeated passes, or accuracy assessment for the removable high-precision lidar. This directly undermines the central claim that the benchmark supplies reliable ground truth for evaluating SLAM and metric place recognition, as the accuracy needed to expose claimed failure modes remains unknown.
  Authors: We agree that the current manuscript lacks quantitative error metrics and detailed calibration procedures for the reference trajectories. The high-precision lidar is removable and used solely for ground-truth generation during data collection, with its accuracy relying on manufacturer specifications and standard SLAM-based fusion with the other sensors. In the revised manuscript, we will add a new subsection under sensor setup describing the calibration steps for the lidar, IMU, and encoders, along with any available cross-validation from repeated passes and wheel-odometry comparisons. We will also cite the lidar's specified accuracy and note the limitations of not having independent external validation (e.g., motion-capture). This addresses the concern without overstating the ground-truth precision.
  revision: yes
- Referee: [dataset characterization] The assertion that the collected sequences are representative of in situ operations and capture failure modes of existing algorithms (abstract and dataset characterization) rests solely on qualitative description without sequence selection criteria, quantitative coverage statistics, or comparison showing how the data exercises known SLAM weaknesses.
  Authors: The manuscript's characterization is primarily qualitative, based on the robots' year-long operational deployment. We will revise the dataset characterization section to include quantitative coverage statistics (e.g., total distance per environment type, number of sequences with documented pedestrian density or lighting changes) and explicit sequence selection criteria. A brief comparison to existing indoor SLAM datasets will be added to illustrate how the data targets specific weaknesses such as long-term appearance change and dynamic obstacles. These additions will be supported by the metadata already collected during operations.
  revision: yes
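The coverage statistics promised in the second response, such as total distance per environment type, could be tallied along the lines of the sketch below. The per-sequence record layout (an "environment" label plus a list of planar "positions" in metres) is a hypothetical format chosen for illustration; the benchmark's actual metadata schema is not described here.

```python
import math
from collections import defaultdict


def path_length(points):
    """Sum of straight-line distances between consecutive (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def coverage_by_environment(sequences):
    """Aggregate sequence count and travelled distance per environment.

    Each sequence is a dict with hypothetical keys "environment" and
    "positions"; this is an assumed layout, not the dataset's format.
    """
    totals = defaultdict(lambda: {"sequences": 0, "distance_m": 0.0})
    for seq in sequences:
        entry = totals[seq["environment"]]
        entry["sequences"] += 1
        entry["distance_m"] += path_length(seq["positions"])
    return dict(totals)
```

The same loop could be extended with further counters, for example sequences flagged for lighting changes, if such labels exist in the metadata.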
Circularity Check
No circularity: dataset release with no derivations or fitted predictions
The paper is a benchmark dataset release describing collection of sequences by delivery robots equipped with cameras, IMU, encoders, and a removable lidar for reference trajectories. No equations, parameter fitting, predictions, or derivation chains appear in the abstract or described content. Claims of novelty, challenge, and representativeness are descriptive of the data characteristics (planar motion, pedestrians, lighting changes) rather than reductions of any result to its own inputs. Self-citations, if present, are not load-bearing for any central claim. The contribution is self-contained as data collection and metric proposal without circular structure.