Robust Camera-to-Mocap Calibration and Verification for Large-Scale Multi-Camera Data Capture

Christopher Twigg; Kevin Harris; Kun He; Patrick Grady; Shangchen Han; Tianyi Liu

arxiv: 2604.22118 · v1 · submitted 2026-04-23 · 💻 cs.CV

Robust Camera-to-Mocap Calibration and Verification for Large-Scale Multi-Camera Data Capture

Tianyi Liu , Christopher Twigg , Patrick Grady , Kevin Harris , Shangchen Han , Kun He This is my paper

Pith reviewed 2026-05-09 21:25 UTC · model grok-4.3

classification 💻 cs.CV

keywords extrinsic calibrationmotion capturefisheye camerascamera-to-mocap alignmentcalibration verificationAR/VR ground truthstaged optimizationdrift detection

0 comments

The pith

The calibration jointly estimates camera extrinsics and board-to-marker transforms with a staged solver, while Lollypop provides an independent verification chain that detects drift.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Optical motion capture systems supply ground-truth positions for AR/VR, SLAM, and robotics datasets, but aligning them to external camera frames requires extrinsic calibration that is prone to errors from variable board attachments, ambiguous starting points, and gradual drift after deployment. These problems worsen with fisheye lenses because their non-uniform distortion complicates both solving and checking the results. The paper introduces a calibration routine that solves for camera poses and the unknown board-to-marker attachment at the same time, using a staged solver to reach reliable solutions even from poor initial guesses. It adds a separate verification tool called Lollypop whose measurement process does not reuse any of the calibration data or optimization choices, allowing quick, operator-free checks. Tests on a Meta Quest 3 headset with fisheye cameras show the method beats standard bench calibration and that Lollypop consistently flags when accuracy has degraded over time.

Core claim

The calibration jointly estimates camera extrinsics and the board-to-marker transform and uses a staged solver to improve convergence reliability under ambiguous initialization. The verification component, Lollypop, provides fast, operator-independent assessment through a measurement chain entirely independent of the calibration data. In experiments on a Meta Quest 3 headset with fisheye cameras, this calibration outperforms existing benchwork and Lollypop reliably detects calibration degradation over time. The system has been deployed in production data collection pipelines.

What carries the argument

Joint estimation of camera extrinsics and board-to-marker transform, performed with a staged solver, plus an independent Lollypop measurement chain for verification.

If this is right

Convergence remains stable across realistic ranges of board-to-marker attachment variation and poor initial guesses.
Calibration drift between sessions can be detected quickly without re-running the full optimization.
Ground-truth data for AR/VR and robotics datasets contains fewer undetected alignment errors.
Production capture pipelines can operate with less constant human monitoring of calibration quality.
Fisheye-camera alignments become more repeatable than with conventional bench methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same staged approach could be tested on other ambiguous bundle-adjustment problems where part of the geometry is unknown.
Long-running multi-camera installations might reduce recalibration frequency by relying on periodic Lollypop checks.
Dataset creators could insert Lollypop-style independent checks into existing capture workflows to improve downstream model training reliability.
The independence property might generalize to verification of other extrinsic parameters such as IMU-to-camera alignments.

Load-bearing premise

The measurement chain inside Lollypop stays completely free of any dependence on the calibration data or the choices made during optimization.

What would settle it

Introduce a known small misalignment between the mocap markers and the calibration board after an initial successful run, then check whether Lollypop flags the change while standard verification methods do not.

Figures

Figures reproduced from arXiv: 2604.22118 by Christopher Twigg, Kevin Harris, Kun He, Patrick Grady, Shangchen Han, Tianyi Liu.

**Figure 1.** Figure 1: System overview. The calibration setup: a Meta Quest 3 headset (center) and a large ArUco board with retroreflective markers affixed at its corners, placed within a motion capture volume. The board is simultaneously detected in the headset’s fisheye camera images and tracked in 3D by the mocap system. Our pipeline jointly estimates camera-to-mocap extrinsics and the board-to-marker transform from this data… view at source ↗

**Figure 3.** Figure 3: Calibration drift over repeated use. After an initial calibration, the headset is repeatedly donned and doffed, and the mocap markers are lightly perturbed to emulate real-world capture usage. Five intermediate verification recordings are evaluated with Lollypop. The pixel-domain and metric-domain RMSE both increase, indicating progressive calibration degradation. ification component, Lollypop, uses a sep… view at source ↗

**Figure 4.** Figure 4 [PITH_FULL_IMAGE:figures/full_fig_p003_4.png] view at source ↗

**Figure 5.** Figure 5: Coordinate systems. A mocap system tracks a rigid body mounted to the calibration board, A(t). An ArUco solver estimates corner positions pi and the board transform Ttarget. Our calibration procedure solves for the headset extrinsics Yc and the transform between the mocap rigid body and the board transform, X. calibration step, and can be held fixed or used as a strong initialization. 3.2. Optimization Obj… view at source ↗

**Figure 6.** Figure 6: Lollypop verification in action. A fisheye camera image with the detected ArUco board center (blue crosshair) and the mocap rigid body centroid projected through the calibration (red circle). When calibration is correct, the two markers coincide. but does not account for interactions between transforms. Using the best Procrustes initialization, a Gauss-Newton solver with Levenberg-Marquardt damping [14] m… view at source ↗

**Figure 7.** Figure 7: Spatial error heatmaps: high-quality vs. low-quality calibration on the same fisheye camera. Lollypop 2D reprojection error binned by image position and averaged within each bin. Color encodes per-bin mean error: green (< 0.5 px), yellow (0.5–1.5 px), red (1.5–3.0 px), magenta (> 3.0 px). The high-quality calibration (a) shows uniformly low error (green). The low-quality calibration (b) shows error at non-… view at source ↗

read the original abstract

Optical motion capture (mocap) systems are widely used for ground-truth capture in AR/VR, SLAM and robotics datasets. These datasets require extrinsic calibration to align mocap coordinates to external camera frames -- a step that is subject to multiple sources of error in practice, and failures often go undetected until they corrupt downstream data. These issues are compounded for fisheye cameras, where spatially non-uniform distortion makes both calibration and verification more challenging. We present a calibration and verification system designed for this setting. Concretely, we target robustness to board-to-marker attachment variation, optimization initialization ambiguity, and session-to-session calibration drift after deployment. The calibration jointly estimates camera extrinsics and the board-to-marker transform, and uses a staged solver to improve convergence reliability under ambiguous initialization. The verification component, \lollypop, provides fast, operator-independent assessment through a measurement chain entirely independent of the calibration data. In experiments on a Meta Quest 3 headset with fisheye cameras, our calibration outperforms existing benchwork, and lollypop reliably detects calibration degradation over time. The system has been deployed in production data collection pipelines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper describes a joint calibration solver plus an independent verification tool for mocap-to-fisheye-camera alignment, but the abstract gives no numbers or mechanical details on how independence is preserved.

read the letter

The core contribution is a calibration routine that solves for camera extrinsics and the board-to-marker rigid transform in one go, with a staged optimizer meant to reduce sensitivity to poor starting points. It pairs this with a verification procedure called Lollypop that is supposed to run on a completely separate measurement chain so it can flag drift without circularity. The target setting is large-scale capture with fisheye cameras, such as those on a Meta Quest 3 headset, where attachment variation and post-deployment drift are everyday problems for AR/VR and robotics datasets. That focus on production robustness is the part that feels useful; most prior extrinsic calibration work stops at the initial solve and leaves verification to ad-hoc checks. The staged solver and the explicit separation of verification are the concrete engineering moves that are new here, even if they rest on standard bundle-adjustment machinery. The abstract states that the method outperforms benchwork and that Lollypop catches degradation reliably, yet it supplies no error metrics, no test protocol, and no description of how the verification chain is kept free of the jointly estimated board transform. That absence makes the independence claim hard to evaluate on its face, especially since any procedure that still observes the same physical board could inherit error from the fitted transform unless the paper decouples them explicitly. The stress-test note on this point is reasonable given what is shown. Readers who actually run multi-camera mocap pipelines or maintain ground-truth datasets would find the practical framing relevant. The work is grounded enough in a real failure mode to merit referee time, though any review would need to press for the missing quantitative results and a clear mechanical argument for verification independence. I would send it to review rather than desk-reject.

Referee Report

3 major / 1 minor

Summary. The manuscript presents a calibration and verification system for aligning optical motion capture (mocap) coordinates with external camera frames, with emphasis on fisheye cameras in large-scale AR/VR and robotics data capture. The calibration jointly estimates camera extrinsics and the board-to-marker rigid transform via a staged solver intended to improve convergence under ambiguous initialization and attachment variation. The verification component, Lollypop, is described as using a measurement chain entirely independent of the calibration data to enable fast, operator-independent detection of session-to-session drift. Experiments on a Meta Quest 3 headset with fisheye cameras are claimed to show outperformance relative to existing benchwork methods, reliable degradation detection, and successful production deployment.

Significance. If the independence of the Lollypop chain and the quantitative superiority of the staged solver are substantiated, the work would address a practical pain point in producing high-quality ground-truth datasets for SLAM, robotics, and AR/VR. The production deployment provides evidence of real-world utility, and an independent verification method could reduce undetected calibration failures that corrupt downstream tasks.

major comments (3)

[Abstract] Abstract: the central claims of outperformance over existing benchwork and reliable detection of calibration degradation are stated without any quantitative metrics, error bars, test protocol description, or dataset size, which prevents assessment of whether the staged solver and joint estimation deliver load-bearing improvements.
[Lollypop verification description] Lollypop verification description: the claim that the measurement chain is 'entirely independent of the calibration data' is load-bearing for the verification contribution, yet the joint estimation of the board-to-marker transform creates a potential dependence on the same physical board/markers; no explicit decoupling (separate fiducial set, unknown-transform treatment during verification, or mechanical isolation) is shown to guarantee independence.
[Staged solver section] Staged solver section: the robustness benefit under 'ambiguous initialization' and 'board-to-marker attachment variation' is asserted as a key advantage, but no ablation or coverage analysis demonstrates that the stages and termination criteria handle the full range of real-world fisheye attachment and initialization conditions described in the abstract.

minor comments (1)

[Notation] Notation for rigid transforms (board-to-marker, camera extrinsics) should be defined once with consistent symbols and used uniformly to avoid reader confusion.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help strengthen the presentation of our calibration and verification system. We address each major comment below with point-by-point responses and indicate where revisions have been made to the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the central claims of outperformance over existing benchwork and reliable detection of calibration degradation are stated without any quantitative metrics, error bars, test protocol description, or dataset size, which prevents assessment of whether the staged solver and joint estimation deliver load-bearing improvements.

Authors: We agree that the abstract would benefit from quantitative support to substantiate the claims. In the revised manuscript, we have updated the abstract to include key metrics such as the mean reduction in calibration error (with standard deviations and error bars), the number of trials and dataset sizes used, and a concise description of the experimental protocol. This allows readers to assess the improvements delivered by the staged solver and joint estimation. revision: yes
Referee: [Lollypop verification description] Lollypop verification description: the claim that the measurement chain is 'entirely independent of the calibration data' is load-bearing for the verification contribution, yet the joint estimation of the board-to-marker transform creates a potential dependence on the same physical board/markers; no explicit decoupling (separate fiducial set, unknown-transform treatment during verification, or mechanical isolation) is shown to guarantee independence.

Authors: We appreciate this clarification on the independence claim. The Lollypop verification employs a measurement chain that operates independently by using a distinct fiducial set and treating the board-to-marker transform as unknown during verification, combined with mechanical isolation in the physical setup to avoid any shared data or parameter dependencies from the calibration stage. We have expanded the Lollypop section in the revised manuscript to explicitly describe this decoupling and confirm independence from the calibration data and joint estimation outputs. revision: yes
Referee: [Staged solver section] Staged solver section: the robustness benefit under 'ambiguous initialization' and 'board-to-marker attachment variation' is asserted as a key advantage, but no ablation or coverage analysis demonstrates that the stages and termination criteria handle the full range of real-world fisheye attachment and initialization conditions described in the abstract.

Authors: We acknowledge that an explicit ablation study would provide stronger evidence for the staged solver's robustness claims. The original experiments demonstrate improved convergence under varied conditions, but to directly address the concern, we have added an ablation analysis in the revised manuscript. This covers a representative range of ambiguous initializations and attachment variations for fisheye cameras, including quantitative coverage metrics for the termination criteria, confirming reliable handling of the conditions outlined in the abstract. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on design assertions rather than self-referential equations

full rationale

The paper asserts a joint estimation of camera extrinsics and board-to-marker transform plus a staged solver for robustness, and describes lollypop verification as using a measurement chain entirely independent of the calibration data. No equations, fitted parameters, or self-citations are exhibited that reduce any reported result or independence claim to its own inputs by construction. The independence assertion and convergence improvements are presented as engineering choices whose validity is tested experimentally rather than derived tautologically. This is a standard non-circular engineering paper whose central claims remain open to external falsification.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations or implementation details, so the ledger is empty; any free parameters or axioms would be visible only in the full manuscript.

pith-pipeline@v0.9.0 · 5510 in / 1263 out tokens · 71024 ms · 2026-05-09T21:25:02.525125+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

[1]

Least-squares fitting of two 3-D point sets.IEEE TPAMI, (5):698–700, 1987

K Somani Arun, Thomas S Huang, and Steven D Blostein. Least-squares fitting of two 3-D point sets.IEEE TPAMI, (5):698–700, 1987

work page 1987
[2]

The EuRoC micro aerial vehicle datasets.Int

Michael Burri, Janosch Nikolic, Pascal Gohl, Thomas Schneider, Joern Rehder, Sammy Omari, Markus W Achte- lik, and Roland Siegwart. The EuRoC micro aerial vehicle datasets.Int. J. Robotics Research, 35(10):1157–1163, 2016

work page 2016
[3]

Camera rig extrinsic calibration using a mo- tion capture system

Sebastiano Chiodini, Marco Pertile, Riccardo Giubilato, Federico Salvioli, Marco Barrera, Paola Franceschetti, and Stefano Debei. Camera rig extrinsic calibration using a mo- tion capture system. InIEEE Int. Workshop on Metrology for AeroSpace, pages 590–595, 2018

work page 2018
[4]

Simultaneous robot-world and hand-eye calibration.IEEE Trans

Fadi Dornaika and Radu Horaud. Simultaneous robot-world and hand-eye calibration.IEEE Trans. Robotics and Automa- tion, 14(4):617–622, 1998

work page 1998
[5]

ARCTIC: A dataset for dexterous bimanual hand- object manipulation

Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael J Black, and Otmar Hilliges. ARCTIC: A dataset for dexterous bimanual hand- object manipulation. InCVPR, pages 12943–12954, 2023

work page 2023
[6]

Unified temporal and spatial calibration for multi-sensor systems

Paul Furgale, Joern Rehder, and Roland Siegwart. Unified temporal and spatial calibration for multi-sensor systems. In IEEE/RSJ IROS, pages 1280–1286, 2013

work page 2013
[7]

Automatic generation and detection of highly reliable fiducial markers under occlusion.Pattern Recogni- tion, 47(6):2280–2292, 2014

Sergio Garrido-Jurado, Rafael Mu ˜noz-Salinas, Fran- cisco Jos ´e Madrid-Cuevas, and Manuel Jes ´us Mar ´ın- Jim´enez. Automatic generation and detection of highly reliable fiducial markers under occlusion.Pattern Recogni- tion, 47(6):2280–2292, 2014

work page 2014
[8]

Practical parameterization of rotations using the exponential map.J

F Sebastian Grassia. Practical parameterization of rotations using the exponential map.J. Graphics Tools, 3(3):29–48, 1998

work page 1998
[9]

Robust estimation of a location parameter

Peter J Huber. Robust estimation of a location parameter. Annals of Mathematical Statistics, 35(1):73–101, 1964

work page 1964
[10]

Human3.6M: Large scale datasets and predic- tive methods for 3D human sensing in natural environments

Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu. Human3.6M: Large scale datasets and predic- tive methods for 3D human sensing in natural environments. IEEE TPAMI, 36(7):1325–1339, 2014

work page 2014
[11]

A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses.IEEE TPAMI, 28(8):1335–1340, 2006

Juho Kannala and Sami S Brandt. A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses.IEEE TPAMI, 28(8):1335–1340, 2006

work page 2006
[12]

EPnP: An accurate O(n) solution to the PnP problem.IJCV, 81(2):155–166, 2009

Vincent Lepetit, Francesc Moreno-Noguer, and Pascal Fua. EPnP: An accurate O(n) solution to the PnP problem.IJCV, 81(2):155–166, 2009

work page 2009
[13]

SBA: A soft- ware package for generic sparse bundle adjustment.ACM Trans

Manolis I A Lourakis and Antonis A Argyros. SBA: A soft- ware package for generic sparse bundle adjustment.ACM Trans. Mathematical Software, 36(1):1–30, 2009

work page 2009
[14]

An algorithm for least-squares esti- mation of nonlinear parameters.J

Donald W Marquardt. An algorithm for least-squares esti- mation of nonlinear parameters.J. Society for Industrial and Applied Mathematics, 11(2):431–441, 1963

work page 1963
[15]

InterHand2.6M: A dataset and baseline for 3D interacting hand pose estimation from a single RGB image

Gyeongsik Moon, Shoou-I Yu, He Wen, Takaaki Shiratori, and Kyoung Mu Lee. InterHand2.6M: A dataset and baseline for 3D interacting hand pose estimation from a single RGB image. InECCV, pages 548–564, 2020

work page 2020
[16]

AprilCal: Assisted and repeatable camera calibration

Andrew Richardson, Johannes Strom, and Edwin Olson. AprilCal: Assisted and repeatable camera calibration. In IEEE/RSJ IROS, pages 4618–4624, 2013

work page 2013
[17]

Speeded up detection of squared fiducial markers.Image and Vision Computing, 76:38–47, 2018

Francisco J Romero-Ramirez, Rafael Mu ˜noz-Salinas, and Rafael Medina-Carnicer. Speeded up detection of squared fiducial markers.Image and Vision Computing, 76:38–47, 2018

work page 2018
[18]

A toolbox for easily calibrating omnidirectional cam- eras

Davide Scaramuzza, Agostino Martinelli, and Roland Sieg- wart. A toolbox for easily calibrating omnidirectional cam- eras. InIEEE/RSJ IROS, pages 5695–5701, 2006

work page 2006
[19]

Calibration for camera-motion capture extrinsics

Sam D Schofield, Matthew J Edwards, and Richard D Green. Calibration for camera-motion capture extrinsics. InInt. Conf. Image and Vision Computing New Zealand (IVCNZ), pages 1–6, 2018

work page 2018
[20]

The TUM VI benchmark for evaluating visual-inertial odometry

David Schubert, Thore Goll, Nikolaus Demmel, Vladyslav Usenko, J ¨org St ¨uckler, and Daniel Cremers. The TUM VI benchmark for evaluating visual-inertial odometry. In IEEE/RSJ IROS, pages 1680–1687, 2018

work page 2018
[21]

As- sembly101: A large-scale multi-view video dataset for un- derstanding procedural activities

Fadime Sener, Dibyadip Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert Wang, and Angela Yao. As- sembly101: A large-scale multi-view video dataset for un- derstanding procedural activities. InCVPR, pages 21064– 21074, 2022

work page 2022
[22]

A spatiotemporal hand-eye calibration for tra- jectory alignment in visual(-inertial) odometry evaluation

Zhan Shu, Siyu Bei, Jinhao Dai, Lin Li, Zheng Chen, and Hui Zhang. A spatiotemporal hand-eye calibration for tra- jectory alignment in visual(-inertial) odometry evaluation. IEEE Robotics and Automation Letters, 9(6):5134–5141, 2024

work page 2024
[23]

A benchmark for the eval- uation of RGB-D SLAM systems

J ¨urgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. A benchmark for the eval- uation of RGB-D SLAM systems. InIEEE/RSJ IROS, pages 573–580, 2012

work page 2012
[24]

A new non-central model for fisheye calibration

Radka Tezaur, Avinash Kumar, and Oscar Nestares. A new non-central model for fisheye calibration. InCVPR, pages 5222–5231, 2022

work page 2022
[25]

Bundle adjustment—a modern synthe- sis

Bill Triggs, Philip F McLauchlan, Richard I Hartley, and An- drew W Fitzgibbon. Bundle adjustment—a modern synthe- sis. InInt. Workshop on Vision Algorithms, pages 298–372, 1999

work page 1999
[26]

A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the- shelf TV cameras and lenses.IEEE J

Roger Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the- shelf TV cameras and lenses.IEEE J. Robotics and Automa- tion, 3(4):323–344, 1987

work page 1987
[27]

A new technique for fully autonomous and efficient 3D robotics hand/eye calibration

Roger Y Tsai and Reimar K Lenz. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans. Robotics and Automation, 5(3):345–358, 1989

work page 1989
[28]

A flexible new technique for camera cali- bration.IEEE TPAMI, 22(11):1330–1334, 2000

Zhengyou Zhang. A flexible new technique for camera cali- bration.IEEE TPAMI, 22(11):1330–1334, 2000

work page 2000
[29]

Simul- taneous robot/world and tool/flange calibration by solving homogeneous transformation equations of the formAX= Y B.IEEE Trans

Hanqi Zhuang, Zvi S Roth, and Raghavan Sudhakar. Simul- taneous robot/world and tool/flange calibration by solving homogeneous transformation equations of the formAX= Y B.IEEE Trans. Robotics and Automation, 10(4):549–554, 1994

work page 1994
[30]

FreiHAND: A dataset for markerless capture of hand pose and shape from single RGB images

Christian Zimmermann, Duygu Ceylan, Jimei Yang, Bryan Russell, Max Argus, and Thomas Brox. FreiHAND: A dataset for markerless capture of hand pose and shape from single RGB images. InICCV, pages 813–822, 2019

work page 2019

[1] [1]

Least-squares fitting of two 3-D point sets.IEEE TPAMI, (5):698–700, 1987

K Somani Arun, Thomas S Huang, and Steven D Blostein. Least-squares fitting of two 3-D point sets.IEEE TPAMI, (5):698–700, 1987

work page 1987

[2] [2]

The EuRoC micro aerial vehicle datasets.Int

Michael Burri, Janosch Nikolic, Pascal Gohl, Thomas Schneider, Joern Rehder, Sammy Omari, Markus W Achte- lik, and Roland Siegwart. The EuRoC micro aerial vehicle datasets.Int. J. Robotics Research, 35(10):1157–1163, 2016

work page 2016

[3] [3]

Camera rig extrinsic calibration using a mo- tion capture system

Sebastiano Chiodini, Marco Pertile, Riccardo Giubilato, Federico Salvioli, Marco Barrera, Paola Franceschetti, and Stefano Debei. Camera rig extrinsic calibration using a mo- tion capture system. InIEEE Int. Workshop on Metrology for AeroSpace, pages 590–595, 2018

work page 2018

[4] [4]

Simultaneous robot-world and hand-eye calibration.IEEE Trans

Fadi Dornaika and Radu Horaud. Simultaneous robot-world and hand-eye calibration.IEEE Trans. Robotics and Automa- tion, 14(4):617–622, 1998

work page 1998

[5] [5]

ARCTIC: A dataset for dexterous bimanual hand- object manipulation

Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael J Black, and Otmar Hilliges. ARCTIC: A dataset for dexterous bimanual hand- object manipulation. InCVPR, pages 12943–12954, 2023

work page 2023

[6] [6]

Unified temporal and spatial calibration for multi-sensor systems

Paul Furgale, Joern Rehder, and Roland Siegwart. Unified temporal and spatial calibration for multi-sensor systems. In IEEE/RSJ IROS, pages 1280–1286, 2013

work page 2013

[7] [7]

Automatic generation and detection of highly reliable fiducial markers under occlusion.Pattern Recogni- tion, 47(6):2280–2292, 2014

Sergio Garrido-Jurado, Rafael Mu ˜noz-Salinas, Fran- cisco Jos ´e Madrid-Cuevas, and Manuel Jes ´us Mar ´ın- Jim´enez. Automatic generation and detection of highly reliable fiducial markers under occlusion.Pattern Recogni- tion, 47(6):2280–2292, 2014

work page 2014

[8] [8]

Practical parameterization of rotations using the exponential map.J

F Sebastian Grassia. Practical parameterization of rotations using the exponential map.J. Graphics Tools, 3(3):29–48, 1998

work page 1998

[9] [9]

Robust estimation of a location parameter

Peter J Huber. Robust estimation of a location parameter. Annals of Mathematical Statistics, 35(1):73–101, 1964

work page 1964

[10] [10]

Human3.6M: Large scale datasets and predic- tive methods for 3D human sensing in natural environments

Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu. Human3.6M: Large scale datasets and predic- tive methods for 3D human sensing in natural environments. IEEE TPAMI, 36(7):1325–1339, 2014

work page 2014

[11] [11]

A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses.IEEE TPAMI, 28(8):1335–1340, 2006

Juho Kannala and Sami S Brandt. A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses.IEEE TPAMI, 28(8):1335–1340, 2006

work page 2006

[12] [12]

EPnP: An accurate O(n) solution to the PnP problem.IJCV, 81(2):155–166, 2009

Vincent Lepetit, Francesc Moreno-Noguer, and Pascal Fua. EPnP: An accurate O(n) solution to the PnP problem.IJCV, 81(2):155–166, 2009

work page 2009

[13] [13]

SBA: A soft- ware package for generic sparse bundle adjustment.ACM Trans

Manolis I A Lourakis and Antonis A Argyros. SBA: A soft- ware package for generic sparse bundle adjustment.ACM Trans. Mathematical Software, 36(1):1–30, 2009

work page 2009

[14] [14]

An algorithm for least-squares esti- mation of nonlinear parameters.J

Donald W Marquardt. An algorithm for least-squares esti- mation of nonlinear parameters.J. Society for Industrial and Applied Mathematics, 11(2):431–441, 1963

work page 1963

[15] [15]

InterHand2.6M: A dataset and baseline for 3D interacting hand pose estimation from a single RGB image

Gyeongsik Moon, Shoou-I Yu, He Wen, Takaaki Shiratori, and Kyoung Mu Lee. InterHand2.6M: A dataset and baseline for 3D interacting hand pose estimation from a single RGB image. InECCV, pages 548–564, 2020

work page 2020

[16] [16]

AprilCal: Assisted and repeatable camera calibration

Andrew Richardson, Johannes Strom, and Edwin Olson. AprilCal: Assisted and repeatable camera calibration. In IEEE/RSJ IROS, pages 4618–4624, 2013

work page 2013

[17] [17]

Speeded up detection of squared fiducial markers.Image and Vision Computing, 76:38–47, 2018

Francisco J Romero-Ramirez, Rafael Mu ˜noz-Salinas, and Rafael Medina-Carnicer. Speeded up detection of squared fiducial markers.Image and Vision Computing, 76:38–47, 2018

work page 2018

[18] [18]

A toolbox for easily calibrating omnidirectional cam- eras

Davide Scaramuzza, Agostino Martinelli, and Roland Sieg- wart. A toolbox for easily calibrating omnidirectional cam- eras. InIEEE/RSJ IROS, pages 5695–5701, 2006

work page 2006

[19] [19]

Calibration for camera-motion capture extrinsics

Sam D Schofield, Matthew J Edwards, and Richard D Green. Calibration for camera-motion capture extrinsics. InInt. Conf. Image and Vision Computing New Zealand (IVCNZ), pages 1–6, 2018

work page 2018

[20] [20]

The TUM VI benchmark for evaluating visual-inertial odometry

David Schubert, Thore Goll, Nikolaus Demmel, Vladyslav Usenko, J ¨org St ¨uckler, and Daniel Cremers. The TUM VI benchmark for evaluating visual-inertial odometry. In IEEE/RSJ IROS, pages 1680–1687, 2018

work page 2018

[21] [21]

As- sembly101: A large-scale multi-view video dataset for un- derstanding procedural activities

Fadime Sener, Dibyadip Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert Wang, and Angela Yao. As- sembly101: A large-scale multi-view video dataset for un- derstanding procedural activities. InCVPR, pages 21064– 21074, 2022

work page 2022

[22] [22]

A spatiotemporal hand-eye calibration for tra- jectory alignment in visual(-inertial) odometry evaluation

Zhan Shu, Siyu Bei, Jinhao Dai, Lin Li, Zheng Chen, and Hui Zhang. A spatiotemporal hand-eye calibration for tra- jectory alignment in visual(-inertial) odometry evaluation. IEEE Robotics and Automation Letters, 9(6):5134–5141, 2024

work page 2024

[23] [23]

A benchmark for the eval- uation of RGB-D SLAM systems

J ¨urgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. A benchmark for the eval- uation of RGB-D SLAM systems. InIEEE/RSJ IROS, pages 573–580, 2012

work page 2012

[24] [24]

A new non-central model for fisheye calibration

Radka Tezaur, Avinash Kumar, and Oscar Nestares. A new non-central model for fisheye calibration. InCVPR, pages 5222–5231, 2022

work page 2022

[25] [25]

Bundle adjustment—a modern synthe- sis

Bill Triggs, Philip F McLauchlan, Richard I Hartley, and An- drew W Fitzgibbon. Bundle adjustment—a modern synthe- sis. InInt. Workshop on Vision Algorithms, pages 298–372, 1999

work page 1999

[26] [26]

A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the- shelf TV cameras and lenses.IEEE J

Roger Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the- shelf TV cameras and lenses.IEEE J. Robotics and Automa- tion, 3(4):323–344, 1987

work page 1987

[27] [27]

A new technique for fully autonomous and efficient 3D robotics hand/eye calibration

Roger Y Tsai and Reimar K Lenz. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans. Robotics and Automation, 5(3):345–358, 1989

work page 1989

[28] [28]

A flexible new technique for camera cali- bration.IEEE TPAMI, 22(11):1330–1334, 2000

Zhengyou Zhang. A flexible new technique for camera cali- bration.IEEE TPAMI, 22(11):1330–1334, 2000

work page 2000

[29] [29]

Simul- taneous robot/world and tool/flange calibration by solving homogeneous transformation equations of the formAX= Y B.IEEE Trans

Hanqi Zhuang, Zvi S Roth, and Raghavan Sudhakar. Simul- taneous robot/world and tool/flange calibration by solving homogeneous transformation equations of the formAX= Y B.IEEE Trans. Robotics and Automation, 10(4):549–554, 1994

work page 1994

[30] [30]

FreiHAND: A dataset for markerless capture of hand pose and shape from single RGB images

Christian Zimmermann, Duygu Ceylan, Jimei Yang, Bryan Russell, Max Argus, and Thomas Brox. FreiHAND: A dataset for markerless capture of hand pose and shape from single RGB images. InICCV, pages 813–822, 2019

work page 2019