Safe Human-to-Humanoid Motion Imitation Using Control Barrier Functions

Anthony Tzes; John Abanes; Nikolaos Evangeliou; Wenqi Cai

arxiv: 2604.11447 · v1 · submitted 2026-04-13 · 💻 cs.RO · cs.SY · eess.SY

Wenqi Cai, John Abanes, Nikolaos Evangeliou, Anthony Tzes

Pith reviewed 2026-05-10 15:39 UTC · model grok-4.3

classification 💻 cs.RO · cs.SY · eess.SY

keywords humanoid robots · motion imitation · control barrier functions · collision avoidance · quadratic programming · vision-based retargeting · human-robot safety


The pith

A control barrier function layer can filter human motion commands to let humanoid robots imitate safely without collisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes a framework that captures human motion with a single camera, converts the detected keypoints into target joint angles for a humanoid robot, and then applies a safety layer to those targets. The safety layer is a control barrier function expressed as a quadratic program that modifies the commands minimally to keep distances above collision thresholds for both the robot's own body parts and the human. A reader would care because direct retargeting often produces unsafe trajectories when the human and robot share space or when the robot folds in on itself, and this method adds protection without changing the core imitation pipeline. The approach is validated through simulations that show real-time operation and collision prevention.
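As a rough illustration of the keypoint-to-angle step, the sketch below computes one joint angle (an elbow) from three 3D keypoints. The function name `joint_angle` and the sample coordinates are hypothetical; the paper's actual retargeting method is not described in this summary and may differ.

```python
import numpy as np

def joint_angle(parent, joint, child):
    """Interior angle (radians) at `joint` between the two adjacent limb segments."""
    u = np.asarray(parent, dtype=float) - np.asarray(joint, dtype=float)
    v = np.asarray(child, dtype=float) - np.asarray(joint, dtype=float)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against round-off pushing the cosine slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Hypothetical shoulder, elbow, and wrist keypoints (metres) forming a right angle.
shoulder = [0.0, 0.0, 0.0]
elbow = [0.3, 0.0, 0.0]
wrist = [0.3, 0.25, 0.0]
elbow_angle = joint_angle(shoulder, elbow, wrist)
```

A real pipeline would also have to map such angles onto the robot's joint conventions and limits, which this sketch leaves out.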

Core claim

The central claim is that formulating safety constraints as control barrier functions inside a quadratic program allows the robot to follow retargeted human joint angles while provably avoiding self-collisions and human-robot collisions. The QP solves for the smallest adjustment to the desired velocities or positions that keeps the system in the safe set defined by the barrier functions.
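For orientation, a CBF-QP safety filter is commonly written in the generic form below. This is the textbook construction, assuming the joint velocities are commanded directly (dq/dt = u). The symbols q, u_nom, h_i, d_i, d_min, and alpha are introduced here for illustration, and the paper's exact formulation may differ.

```latex
\begin{aligned}
u^{*} = \arg\min_{u}\ & \tfrac{1}{2}\,\lVert u - u_{\text{nom}} \rVert^{2} \\
\text{s.t.}\ & \nabla h_i(q)^{\top} u + \alpha\, h_i(q) \ge 0, \quad i = 1, \dots, m, \\
& h_i(q) = d_i(q) - d_{\min}
\end{aligned}
```

Here d_i(q) is the distance between the i-th pair of collision bodies, d_min is the clearance threshold, and alpha > 0 sets how quickly the robot may approach the boundary of the safe set {q : h_i(q) >= 0}. For human-robot pairs, h_i also depends on the human's pose, which would add a term for the human's motion; that term is omitted in this sketch.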

What carries the argument

The argument rests on the control barrier function (CBF) layer, formulated as a quadratic program (QP), which acts as a filter on the imitation commands.
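To make the filtering idea concrete: when there is only one barrier constraint of the form grad_h · u + alpha · h >= 0, the minimum-norm QP has a closed-form solution, namely a projection of the nominal command onto the constraint half-space. The sketch below assumes velocity-level commands and a nonzero gradient; `cbf_filter`, `alpha`, and the numbers are hypothetical, and a multi-constraint problem like the one in the paper would need a QP solver instead.

```python
import numpy as np

def cbf_filter(u_nom, grad_h, h, alpha=5.0):
    """Minimally modify u_nom so that grad_h @ u + alpha * h >= 0.

    Closed-form solution of the QP with a single affine constraint:
    the nominal command is projected onto the half-space only when
    it violates the constraint. Assumes grad_h is not the zero vector.
    """
    u_nom = np.asarray(u_nom, dtype=float)
    grad_h = np.asarray(grad_h, dtype=float)
    slack = grad_h @ u_nom + alpha * h
    if slack >= 0.0:
        return u_nom  # nominal command already satisfies the constraint
    return u_nom - slack * grad_h / (grad_h @ grad_h)

# Hypothetical numbers: a 2-joint command that reduces the clearance.
u_nom = np.array([1.0, 0.0])
grad_h = np.array([-1.0, 0.0])  # clearance shrinks as joint 0 moves forward
h = 0.1                         # remaining clearance above the threshold
u_safe = cbf_filter(u_nom, grad_h, h)
```

In this example the filter slows joint 0 until the constraint holds with equality and leaves joint 1 untouched, which is the "minimal modification" behaviour the summary describes.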

If this is right

The robot can imitate human movements in real time while guaranteeing no collisions occur.
Only a single camera is needed for the vision input, simplifying the setup.
Safety is enforced at the command level rather than requiring full replanning.
In the reported simulations, the method runs in real time across the tested human motions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Extending the CBF filter to include dynamic obstacles or environmental constraints would broaden its use beyond human interaction.
Deploying this on physical hardware would test whether the QP remains real-time under sensor noise and model mismatch.
Integrating the safety filter with learning-based retargeting could handle more complex or uncertain human motions.
The approach separates imitation from safety, so it could apply to other command sources like teleoperation.

Load-bearing premise

Single-camera keypoint detection is accurate enough to provide reliable human pose data, and the quadratic program can be solved fast enough on the robot's hardware to not degrade the imitation.

What would settle it

A test run where the humanoid robot makes physical contact with itself or the human despite the CBF-QP filter being enabled, or where the filter causes noticeable delays in motion tracking.
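One way such a test could be scored from logs is sketched below. It assumes each control step records the minimum body-to-body distance and the QP solve time; `check_run`, the field names, the 2 cm threshold, and the 100 Hz loop rate are all hypothetical choices, not values from the paper.

```python
import numpy as np

def check_run(min_distances, solve_times_s, d_min, control_period_s):
    """Summarize whether a logged run contradicts the safety or real-time claim."""
    min_distances = np.asarray(min_distances, dtype=float)
    solve_times_s = np.asarray(solve_times_s, dtype=float)
    return {
        # Steps where the closest pair of bodies was inside the threshold.
        "collision_steps": int(np.sum(min_distances < d_min)),
        # Steps where the solver took longer than one control period.
        "late_steps": int(np.sum(solve_times_s > control_period_s)),
        "worst_clearance": float(min_distances.min() - d_min),
        "worst_solve_time_s": float(solve_times_s.max()),
    }

# Hypothetical log: 5 control steps, 2 cm threshold, 100 Hz control loop.
report = check_run(
    min_distances=[0.10, 0.06, 0.03, 0.021, 0.05],
    solve_times_s=[0.002, 0.003, 0.012, 0.002, 0.004],
    d_min=0.02,
    control_period_s=0.01,
)
```

A nonzero `collision_steps` would count against the safety claim, and a nonzero `late_steps` against the real-time claim.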

Figures

Figures reproduced from arXiv: 2604.11447 by Anthony Tzes, John Abanes, Nikolaos Evangeliou, Wenqi Cai.

**Figure 2.** Self-collision avoidance with and without CBF-QP.

**Figure 3.** Human-humanoid collision avoidance with and without

**Figure 4.** Benchmarked collision geometry. From left to right:

Original abstract

Ensuring operational safety is critical for human-to-humanoid motion imitation. This paper presents a vision-based framework that enables a humanoid robot to imitate human movements while avoiding collisions. Human skeletal keypoints are captured by a single camera and converted into joint angles for motion retargeting. Safety is enforced through a Control Barrier Function (CBF) layer formulated as a Quadratic Program (QP), which filters imitation commands to prevent both self-collisions and human-robot collisions. Simulation results validate the effectiveness of the proposed framework for real-time collision-aware motion imitation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper applies a standard CBF-QP safety filter to single-camera human-to-humanoid imitation and shows it working in simulation, but adds no new technique or quantitative evidence.


The main point is that the authors take the usual control barrier function quadratic program and use it to filter retargeted joint commands so the humanoid avoids self-collisions and collisions with the human. They start from single-camera keypoints, convert them to angles, and let the QP adjust the nominal imitation input when needed. That combination is laid out clearly enough that someone could implement the safety layer on top of an existing retargeting pipeline. The simulation examples illustrate the filter kicking in to keep distances above the barrier thresholds. That is the useful part: a concrete, if unsurprising, way to add a safety wrapper around imitation for humanoids.

The rest of the paper does not introduce new math or a broader framework. The evaluation stays at the level of showing that the QP finds feasible solutions in the simulated scenarios. No numbers appear on how much the filter alters the original motion, how close the robot gets to collisions, success rates across repeated trials, or timing of the solver. There is also no test of what happens when the single-camera keypoints have realistic noise or when the system runs on actual hardware. CBF guarantees depend on accurate state and dynamics; the vision step is the weakest link here and it is not stress-tested.

The paper is aimed at robotics groups that already work on humanoid control or imitation and want a ready safety module to try. A reader could borrow the QP formulation and the retargeting steps without much trouble. It is not the kind of result I would cite for a new safety method or for claims about real-world deployment. I would send it to peer review. The application is timely and the description is straightforward, so referees could ask for the missing metrics, noise analysis, and hardware timing without the paper being a non-starter.
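The missing numbers named in the letter (how much the filter alters the motion, how close the robot gets to a collision) could be computed from logged trajectories along the lines of the sketch below. The function `filter_metrics`, its inputs, and the sample data are hypothetical; nothing here comes from the paper's experiments.

```python
import numpy as np

def filter_metrics(u_nom, u_safe, h_values, tol=1e-9):
    """Deviation and clearance statistics for one filtered trajectory."""
    u_nom = np.asarray(u_nom, dtype=float)
    u_safe = np.asarray(u_safe, dtype=float)
    h_values = np.asarray(h_values, dtype=float)
    # Per-step size of the correction applied by the safety filter.
    step_dev = np.linalg.norm(u_safe - u_nom, axis=1)
    return {
        "rms_deviation": float(np.sqrt(np.mean(step_dev ** 2))),
        "active_fraction": float(np.mean(step_dev > tol)),
        "min_barrier": float(h_values.min()),
    }

# Hypothetical 4-step log in which the filter intervenes once.
u_nom = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
u_safe = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 1.0]]
metrics = filter_metrics(u_nom, u_safe, h_values=[0.30, 0.10, 0.12, 0.25])
```

Reporting these three values per scenario, alongside an unfiltered baseline, would address most of the quantitative gaps the letter lists.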

Referee Report

2 major / 1 minor

Summary. The manuscript presents a vision-based framework for safe human-to-humanoid motion imitation. Human skeletal keypoints are captured by a single camera and converted into joint angles for motion retargeting. Safety is enforced by a Control Barrier Function (CBF) layer formulated as a Quadratic Program (QP) that filters imitation commands to prevent self-collisions and human-robot collisions. The central claim is that this approach enables real-time collision-aware motion imitation, with effectiveness shown via simulation results.

Significance. If the simulation results hold under more rigorous testing, the work demonstrates a practical integration of established CBF-QP safety filters with vision-based motion retargeting for humanoids. This could support safer close-proximity human-robot tasks by providing a modular safety layer on top of imitation commands. The use of standard CBF techniques without invented parameters is a methodological strength.

major comments (2)

[Abstract] The statement that 'simulation results validate the effectiveness' is not supported by any quantitative metrics, baselines, error analysis, collision measurement details, or timing data. Without these, it is impossible to determine whether the CBF-QP layer maintains h(x) > 0 while keeping deviation from nominal imitation inputs within acceptable bounds.
[Simulation validation section] The central safety claim relies on accurate 3D states from single-camera keypoints to define valid barrier functions and on QP solutions being computed in real time. No noise-injection experiments, feasibility analysis under keypoint uncertainty, or hardware timing results are reported, leaving the CBF safety certificate unverified when these assumptions are relaxed.
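A noise-injection experiment of the kind requested could start from something like the sketch below. It perturbs a human keypoint with isotropic Gaussian noise and counts how often the estimated barrier value looks safe while the true one is not. The point-to-point distance model, `false_safe_rate`, and all numbers are assumptions made for illustration.

```python
import numpy as np

def false_safe_rate(p_robot, p_human, d_min, sigma, n_samples=10_000, seed=0):
    """Fraction of noisy human-position estimates that report h >= 0
    although the true configuration violates the barrier (h < 0)."""
    rng = np.random.default_rng(seed)
    p_robot = np.asarray(p_robot, dtype=float)
    p_human = np.asarray(p_human, dtype=float)
    h_true = np.linalg.norm(p_robot - p_human) - d_min
    if h_true >= 0.0:
        return 0.0  # truly safe, so no false-safe report is possible
    # Isotropic Gaussian noise on the human keypoint (an assumed noise model).
    noisy = p_human + sigma * rng.standard_normal((n_samples, p_human.size))
    h_est = np.linalg.norm(p_robot - noisy, axis=1) - d_min
    return float(np.mean(h_est >= 0.0))

# Hypothetical case: true distance 0.18 m with a 0.20 m threshold (unsafe).
rate_clean = false_safe_rate([0, 0, 0], [0.18, 0, 0], d_min=0.20, sigma=0.0)
rate_noisy = false_safe_rate([0, 0, 0], [0.18, 0, 0], d_min=0.20, sigma=0.05)
```

Sweeping `sigma` over realistic keypoint error levels would show how much extra margin on the threshold is needed before the false-safe rate becomes negligible.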

minor comments (1)

[Abstract] The pipeline description would benefit from an explicit block diagram showing the flow from keypoint detection through retargeting to the CBF-QP filter.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating where we agree and will revise the paper to improve clarity and support for the claims.


Referee: [Abstract] The statement that 'simulation results validate the effectiveness' is not supported by any quantitative metrics, baselines, error analysis, collision measurement details, or timing data. Without these, it is impossible to determine whether the CBF-QP layer maintains h(x) > 0 while keeping deviation from nominal imitation inputs within acceptable bounds.

Authors: We agree that the abstract overstates the support provided by the simulations. The manuscript shows qualitative demonstrations of collision-free imitation in simulation but lacks the quantitative metrics, baselines, or detailed analysis mentioned. We will revise the abstract to state that the simulations illustrate the framework's ability to enforce safety constraints in real time, without claiming broad validation. We will also augment the simulation section with available quantitative details on barrier function satisfaction and QP solve times to better substantiate the results.

revision: yes

Referee: [Simulation validation section] The central safety claim relies on accurate 3D states from single-camera keypoints to define valid barrier functions and on QP solutions being computed in real time. No noise-injection experiments, feasibility analysis under keypoint uncertainty, or hardware timing results are reported, leaving the CBF safety certificate unverified when these assumptions are relaxed.

Authors: The simulations use ideal keypoint data to compute 3D states for the barrier functions, as described in the motion retargeting pipeline, and demonstrate real-time QP performance under these conditions. We acknowledge the absence of noise-injection experiments or uncertainty analysis, which means the safety certificate is verified only under perfect state assumptions. We will add a dedicated limitations paragraph discussing these assumptions and how the CBF-QP provides guarantees only when they hold, along with any feasible analysis from existing data. Hardware timing is not reported because the work is simulation-based; we will include simulation solve-time statistics but note that hardware deployment remains future work.

revision: partial

Circularity Check

0 steps flagged

Standard CBF-QP safety filter with no circular derivation or self-referential steps


The paper describes a vision-based human-to-humanoid imitation framework where safety is enforced by formulating a Control Barrier Function (CBF) layer as a Quadratic Program (QP) that filters imitation commands to avoid self-collisions and human-robot collisions. This is presented as a direct application of established CBF-QP techniques to the retargeted joint angles from single-camera keypoints, with no derivations, fitted parameters, or predictions that reduce to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and the simulation validation does not involve renaming known results or smuggling ansatzes. The central claim remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the CBF-QP construction is treated as standard background.

pith-pipeline@v0.9.0 · 5391 in / 943 out tokens · 39909 ms · 2026-05-10T15:39:03.039137+00:00 · methodology


Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages

[1] V. Gonçalves, N. Giakoumidis, M. Moore, and A. Tzes, “Leader-follower human-cobot improvised dance using motion capture systems,” in International Conference on ArtsIT, Interactivity and Game Creation. Springer, 2024, pp. 125–137.

[2] M. Boldo, M. De Marchi, E. Martini, S. Aldegheri, D. Quaglia, F. Fummi, and N. Bombieri, “Real-time multi-camera 3d human pose estimation at the edge for industrial applications,” Expert Systems with Applications, vol. 252, p. 124089, 2024.

[3] A. K. Ramasubramanian, M. Kazasidis, B. Fay, and N. Papakostas, “On the evaluation of diverse vision systems towards detecting human pose in collaborative robot applications,” Sensors, vol. 24, no. 2, p. 578, 2024.

[4] B. Dai, R. Khorrambakht, P. Krishnamurthy, V. Gonçalves, A. Tzes, and F. Khorrami, “Safe navigation and obstacle avoidance using differentiable optimization based control barrier functions,” IEEE Robotics and Automation Letters, vol. 8, no. 9, pp. 5376–5383, 2023.

[5] D. Chaikalis, H. U. Unlu, A. Tzes, and F. Khorrami, “Vision-aided Leader-Follower Collaborative Mobile Manipulation with Control Barrier Functions,” Journal of Intelligent & Robotic Systems, 2026.

[6] D. Morton and M. Pavone, “Safe, task-consistent manipulation with operational space control barrier functions,” arXiv preprint arXiv:2503.06736, 2025, accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hangzhou, 2025.

[7] K. Tracy, T. A. Howell, and Z. Manchester, “Diffpills: Differentiable collision detection for capsules and padded polygons,” arXiv preprint arXiv:2207.00202, 2022.

[8] K. Tracy, T. A. Howell, and Z. Manchester, “Differentiable collision detection for a set of convex primitives,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 3663–3670.

[9] J. Carpentier, G. Saurel, G. Buondonno, J. Mirabel, F. Lamiraux, O. Stasse, and N. Mansard, “The pinocchio c++ library – a fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives,” in IEEE International Symposium on System Integrations (SII), 2019.

[10] J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, Y. Katariya, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang, “JAX: composable transformations of Python+NumPy programs,” 2018. [Online]. Available: http://github.com/jax-ml/jax
