GenerativeMPC: VLM-RAG-guided Whole-Body MPC with Virtual Impedance for Bimanual Mobile Manipulation

Dzmitry Tsetserukou; Jeffrin Sam; Konstantin Gubernatorov; Marcelino Julio Fernando; Miguel Altamirano Cabrera; Yara Mahmoud

arxiv: 2604.19522 · v1 · submitted 2026-04-21 · 💻 cs.RO

Marcelino Julio Fernando, Miguel Altamirano Cabrera, Jeffrin Sam, Yara Mahmoud, Konstantin Gubernatorov, Dzmitry Tsetserukou

Pith reviewed 2026-05-10 01:52 UTC · model grok-4.3

classification 💻 cs.RO

keywords VLM-RAG, Whole-Body MPC, Virtual Impedance, Bimanual Manipulation, Semantic Grounding, Human-Robot Interaction, Model Predictive Control, Robotics

The pith

GenerativeMPC uses VLM-RAG to translate semantic context into MPC constraints and impedance parameters for safe bimanual manipulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GenerativeMPC as a way to connect high-level vision-language understanding with low-level robot control. A VLM with retrieval-augmented generation turns visual and language inputs into specific velocity limits and safety margins for a whole-body model predictive controller. It also adjusts virtual stiffness and damping for compliant interactions. An experience database keeps the parameters consistent across uses without retraining the model. Tests in simulators and on a physical robot show the system reduces speed by 60 percent near humans while enabling safe navigation and manipulation.

Core claim

GenerativeMPC is a hierarchical cyber-physical framework in which a Vision-Language Model with Retrieval-Augmented Generation converts visual and linguistic context into two sets of control parameters: dynamic velocity limits and safety margins for Whole-Body Model Predictive Control, and virtual stiffness and damping gains for a unified impedance-admittance controller. An experience-driven vector database keeps the semantic-to-physical parameter grounding consistent. The paper reports that this yields safe and socially-aware bimanual mobile manipulation, validated in MuJoCo, IsaacSim, and physical experiments.
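To make the retrieval step concrete, here is a minimal sketch of looking up control parameters from stored experiences by embedding similarity. It is an in-memory stand-in, not the paper's implementation: the `ControlParams` fields, the three-dimensional embeddings, and every numeric value are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlParams:
    v_max: float          # base velocity limit, m/s (hypothetical value)
    safety_margin: float  # clearance around obstacles, m (hypothetical value)
    stiffness: float      # virtual stiffness, N/m (hypothetical value)
    damping: float        # virtual damping, N*s/m (hypothetical value)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, experiences):
    """Return the parameters of the stored experience closest to the query."""
    best = max(experiences, key=lambda e: cosine(query, e[0]))
    return best[1]


# Toy experience store: (scene embedding, parameters used in that scene).
EXPERIENCES = [
    ([1.0, 0.0, 0.0], ControlParams(0.50, 0.30, 400.0, 40.0)),  # empty corridor
    ([0.0, 1.0, 0.0], ControlParams(0.20, 0.80, 100.0, 30.0)),  # human nearby
]

# A query embedding that resembles the "human nearby" scene.
params = retrieve([0.1, 0.9, 0.0], EXPERIENCES)
```

Because the parameters come from a lookup rather than from model weights, adding or correcting an entry changes behavior without retraining, which is the property the core claim relies on.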

What carries the argument

The VLM-RAG module paired with an experience-driven vector database, which maps semantic scene understanding to physical control parameters like velocity limits for MPC and gains for impedance control.

If this is right

Dynamic velocity limits allow 60% speed reduction near humans for safer interaction.
Virtual impedance modulation enables context-aware compliance during human-robot tasks.
Experience-driven database provides consistent parameter grounding without retraining.
Semantic-to-physical grounding supports socially-aware navigation and manipulation.
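As a worked reading of the 60% figure, assume the reduction is measured against a nominal velocity limit (the summary does not state the reference speed, so both the symbol and the example value of 0.50 m/s below are assumptions):

```latex
v_{\text{near}} = (1 - 0.60)\, v_{\text{nom}} = 0.40\, v_{\text{nom}},
\qquad
v_{\text{nom}} = 0.50~\text{m/s} \;\Rightarrow\; v_{\text{near}} = 0.20~\text{m/s}.
```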

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This method may allow robots to adapt behavior in new environments using stored experiences rather than retraining.
Similar grounding could apply to other control systems beyond bimanual manipulation.
Real-world deployment could test the reliability of VLM outputs in varied lighting or occlusion conditions.

Load-bearing premise

The VLM-RAG can consistently produce control parameters that are safe and do not cause instability in the high-frequency MPC and impedance controllers.
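One way to reduce dependence on this premise is to sanitize every VLM-RAG output before it reaches the controllers. The sketch below assumes the module emits JSON with four keys; the key names, the bounds, and the fallback values are hypothetical and would in practice come from the robot's specifications, not from the paper.

```python
import json
import math

# Hypothetical safe envelope for each parameter: (lower bound, upper bound).
BOUNDS = {
    "v_max": (0.05, 0.60),          # m/s
    "safety_margin": (0.20, 1.50),  # m
    "stiffness": (50.0, 800.0),     # N/m
    "damping": (5.0, 120.0),        # N*s/m
}

# Conservative values used whenever the output cannot be trusted.
FALLBACK = {"v_max": 0.05, "safety_margin": 1.50,
            "stiffness": 50.0, "damping": 120.0}


def sanitize(raw_text):
    """Parse VLM output, reject malformed values, clamp the rest to BOUNDS."""
    try:
        data = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        return dict(FALLBACK)
    if not isinstance(data, dict):
        return dict(FALLBACK)
    out = {}
    for key, (lo, hi) in BOUNDS.items():
        value = data.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            return dict(FALLBACK)  # missing, non-numeric, NaN, or infinite
        out[key] = min(max(float(value), lo), hi)
    return out
```

Clamping keeps a single bad retrieval inside a known envelope, but it does not show that every point in the envelope is stable; that still needs the analysis the referee asks for.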

What would settle it

Observation of the robot exceeding proposed safety margins or exhibiting unstable behavior when the VLM-RAG suggests specific velocity limits or impedance gains during human proximity tests.
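Such a test could be run as a scan over logged trials. The following sketch assumes each log sample records the commanded speed, the active velocity limit, the measured distance to the nearest human, and the active safety margin; the `Sample` fields and the tolerance are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float               # timestamp, s
    speed: float           # measured base speed, m/s
    v_limit: float         # velocity limit active at time t, m/s
    human_distance: float  # distance to nearest human, m
    safety_margin: float   # safety margin active at time t, m


def violations(log, tol=1e-6):
    """Return (timestamp, kind) for every sample that breaks a limit."""
    found = []
    for s in log:
        if s.speed > s.v_limit + tol:
            found.append((s.t, "speed"))
        if s.human_distance < s.safety_margin - tol:
            found.append((s.t, "margin"))
    return found


log = [
    Sample(0.0, 0.40, 0.50, 2.0, 0.3),  # within both limits
    Sample(0.1, 0.25, 0.20, 1.0, 0.8),  # too fast for the active limit
    Sample(0.2, 0.15, 0.20, 0.6, 0.8),  # closer than the active margin
]
report = violations(log)
```

A non-empty report on trials where the VLM-RAG set the limits would count against the premise; an empty one over many varied trials would support it.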

Figures

Figures reproduced from arXiv: 2604.19522 by Dzmitry Tsetserukou, Jeffrin Sam, Konstantin Gubernatorov, Marcelino Julio Fernando, Miguel Altamirano Cabrera, Yara Mahmoud.

**Figure 1.** Left: Base trajectory from (0, 0) to (3.0, 2.0) m. The APF cost embedded in the whole-body MPC produces a smooth curved path around both obstacles. Right: real-world counterpart showing the robot navigating around a human as a dynamic obstacle in an indoor environment. [4]. While recent state-of-the-art models like MoManipVLA [5] and FALCON [6] have pushed the boundaries of end-to-end mobile manipulation, …

**Figure 2.** Bimanual manipulation in IsaacSim. Left: the robot performs a pick

**Figure 3.** GenerativeMPC three-layer system architecture. Layer 1 (VLM-RAG) processes camera images and natural language instructions, outputting

**Figure 4.** The GenerativeMPC hardware platform: differential-drive base

**Figure 6.** MuJoCo warehouse simulation environment with cylindrical

**Figure 7.** Position error (left) and heading error (right) convergence. The

Original abstract

Bimanual mobile manipulation requires a seamless integration between high-level semantic reasoning and safe, compliant physical interaction - a challenge that end-to-end models approach opaquely and classical controllers lack the context to address. This paper presents GenerativeMPC, a hierarchical cyber-physical framework that explicitly bridges semantic scene understanding with physical control parameters for bimanual mobile manipulators. The system utilizes a Vision-Language Model with Retrieval-Augmented Generation (VLM-RAG) to translate visual and linguistic context into grounded control constraints, specifically outputting dynamic velocity limits and safety margins for a Whole-Body Model Predictive Controller (MPC). Simultaneously, the VLM-RAG module modulates virtual stiffness and damping gains for a unified impedance-admittance controller, enabling context-aware compliance during human-robot interaction. Our framework leverages an experience-driven vector database to ensure consistent parameter grounding without retraining. Experimental results in MuJoCo, IsaacSim, and on a physical bimanual platform confirm a 60% speed reduction near humans and safe, socially-aware navigation and manipulation through semantic-to-physical parameter grounding. This work advances the field of human-centric cybernetics by grounding large-scale cognitive models into predictable, high-frequency physical control loops.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper sketches a VLM-RAG layer that feeds velocity limits and impedance gains into whole-body MPC for bimanual robots, but the 60% speed reduction and safety claims rest on descriptions without baselines or stability checks.

The core idea is a hierarchical setup that lets a vision-language model with retrieval from an experience database generate dynamic constraints for model predictive control and virtual stiffness/damping values for impedance control. This is aimed at making bimanual mobile manipulators move more slowly and compliantly around people based on visual and linguistic context. The specific mapping from RAG outputs to MPC velocity limits plus impedance modulation on a bimanual platform is presented as a new combination, and they test it across MuJoCo, IsaacSim, and a physical robot, which is a reasonable spread of validation environments. The separation of slow semantic reasoning from fast control loops is a practical choice that keeps the high-frequency loops intact.

The main shortcoming is in the results. The abstract states a 60% speed reduction near humans and safe socially-aware behavior, yet supplies no baseline controller for comparison, no statistical details, and no account of how safety or stability was measured. There is also no derivation or filter showing that the retrieved parameters keep the MPC quadratic program feasible or the closed-loop impedance matrix positive definite. A single poor retrieval could inject values that violate those assumptions, and the paper does not address failure modes or guardrails for that case.

This work is for researchers who want concrete examples of grounding language-model outputs into compliant control parameters for human-robot settings. A reader looking for architectural patterns could extract the RAG-to-constraint flow even if the numbers need more support. It deserves peer review because the integration direction is worth developing, though any referee would need to press for proper baselines, statistical reporting, and explicit stability analysis on the generated parameters. I would send it out with those specific requests rather than desk reject.
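The separation of slow semantic reasoning from fast control can be illustrated with a deterministic two-rate loop. This is a sketch only: the tick counts, the speeds, and the function names are hypothetical, and a real system would run the two layers in separate threads or processes.

```python
def run(ticks, slow_period, slow_layer, fast_step):
    """Run fast_step every tick; refresh params only every slow_period ticks."""
    params = slow_layer(0)
    history = []
    for k in range(ticks):
        if k > 0 and k % slow_period == 0:
            params = slow_layer(k)  # slow semantic update
        history.append(fast_step(k, params))  # fast loop uses latest params
    return history


HUMAN_APPEARS_AT = 120  # tick at which a human enters the scene (made up)


def slow_layer(k):
    """Stand-in for the semantic layer: lower the limit once a human is seen."""
    return {"v_max": 0.2 if k >= HUMAN_APPEARS_AT else 0.5}


def fast_step(k, params):
    """Stand-in for the controller: track 0.4 m/s, capped by the limit."""
    return min(0.4, params["v_max"])


speeds = run(ticks=200, slow_period=50, slow_layer=slow_layer, fast_step=fast_step)
```

In this toy run the human appears at tick 120 but the limit only drops at tick 150, the next slow update. That stale window is the kind of failure mode the letter says the paper should address.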

Referee Report

2 major / 1 minor

Summary. The manuscript presents GenerativeMPC, a hierarchical cyber-physical framework that employs a Vision-Language Model with Retrieval-Augmented Generation (VLM-RAG) to translate visual and linguistic context into grounded control parameters for bimanual mobile manipulators. It generates dynamic velocity limits and safety margins for a whole-body Model Predictive Controller (MPC) while modulating virtual stiffness and damping gains for a unified impedance-admittance controller. An experience-driven vector database is used to ensure consistent parameter grounding without retraining. Experiments in MuJoCo, IsaacSim, and on a physical platform are claimed to confirm a 60% speed reduction near humans along with safe, socially-aware navigation and manipulation.

Significance. If the results hold, the work is significant for explicitly bridging high-level semantic reasoning from VLMs into high-frequency physical control loops in a modular way that avoids retraining. The experience-driven RAG component for consistent grounding and the multi-environment validation (including real hardware) are strengths that could advance human-centric cybernetics and safe HRI in manipulation tasks. The central claim of reliable semantic-to-physical parameter transfer, however, requires stronger supporting analysis to realize this potential.

major comments (2)

[Abstract] Abstract: the experimental confirmation of a 60% speed reduction near humans is stated without baselines, statistical details, error bars, or description of how safety was quantified, leaving the central performance claim difficult to evaluate.
[Framework] Framework description: no derivation, bounds, or feasibility analysis is provided showing that VLM-RAG outputs (velocity limits, safety margins, stiffness/damping) are constrained to regions where the whole-body MPC quadratic program remains feasible and the closed-loop impedance matrix remains positive definite.
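The second comment could be partly addressed by a numerical check on each generated parameter set. The sketch below tests whether a gain matrix is symmetric positive definite with a plain Cholesky factorization and whether a velocity box constraint is non-empty. It is an illustration in pure Python under assumed matrix shapes, not a substitute for the feasibility analysis requested.

```python
import math


def is_positive_definite(m, sym_tol=1e-9):
    """True if square matrix m (list of rows) is symmetric positive definite."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    for i in range(n):
        for j in range(i):
            if abs(m[i][j] - m[j][i]) > sym_tol:
                return False
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - s
                if pivot <= 0.0:
                    return False  # factorization fails: not positive definite
                chol[i][j] = math.sqrt(pivot)
            else:
                chol[i][j] = (m[i][j] - s) / chol[j][j]
    return True


def box_is_feasible(lower, upper):
    """True if every per-axis velocity bound satisfies lower <= upper."""
    return len(lower) == len(upper) and all(lo <= hi for lo, hi in zip(lower, upper))
```

A non-empty velocity box is necessary but not sufficient for the full quadratic program to be feasible, since dynamics and obstacle constraints also restrict the solution set.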

minor comments (1)

[Abstract] The abstract could specify the exact VLM model, vector database implementation, and quantitative metrics for 'socially-aware' behavior to improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment point by point below and indicate the revisions we will make to strengthen the manuscript.

Referee: [Abstract] Abstract: the experimental confirmation of a 60% speed reduction near humans is stated without baselines, statistical details, error bars, or description of how safety was quantified, leaving the central performance claim difficult to evaluate.

Authors: We agree that the abstract would benefit from additional context to allow independent evaluation of the central claim. In the revised version we will expand the abstract to briefly note the baselines (standard whole-body MPC without VLM-RAG grounding), the total number of trials across MuJoCo, IsaacSim and hardware, and the safety metrics used (minimum human-robot distance and collision-free rate). Detailed statistics, error bars and significance tests remain in the experimental section; the abstract revision will be kept concise by tightening other sentences.
revision: yes

Referee: [Framework] Framework description: no derivation, bounds, or feasibility analysis is provided showing that VLM-RAG outputs (velocity limits, safety margins, stiffness/damping) are constrained to regions where the whole-body MPC quadratic program remains feasible and the closed-loop impedance matrix remains positive definite.

Authors: The referee correctly notes the absence of explicit feasibility analysis. In the revision we will insert a new subsection that derives the admissible parameter ranges: velocity limits are clipped to values that keep the reference trajectory inside the MPC feasible set (ensuring the QP remains solvable), while stiffness and damping are retrieved only from database entries that satisfy k > 0 and d > 2 sqrt(k m) to guarantee positive-definiteness of the impedance matrix. We will also state that the experience-driven RAG database contains only parameters validated in prior safe interactions, providing an empirical feasibility envelope, and will reference standard MPC and impedance stability results.
revision: yes
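The database filter described in this response can be sketched for the scalar case as follows. The entry layout (`k`, `d`) and the example mass are hypothetical. Note that for a scalar mass-spring-damper, d > 2 sqrt(k m) is the overdamping condition, which is stricter than what positivity of the gains alone requires (k > 0 and d > 0).

```python
import math


def admissible(k, d, m):
    """True if k > 0 and d > 2*sqrt(k*m) for a positive virtual mass m."""
    # The short-circuit on m > 0 and k > 0 keeps sqrt from seeing a negative.
    return m > 0 and k > 0 and d > 2.0 * math.sqrt(k * m)


def filter_entries(entries, m):
    """Keep only database entries whose gains pass the admissibility test."""
    return [e for e in entries if admissible(e["k"], e["d"], m)]


VIRTUAL_MASS = 2.0  # kg, made-up value for illustration

entries = [
    {"k": 100.0, "d": 30.0},  # 2*sqrt(200) is about 28.3, so this passes
    {"k": 400.0, "d": 40.0},  # 2*sqrt(800) is about 56.6, so this fails
    {"k": -10.0, "d": 5.0},   # negative stiffness, rejected
]
kept = filter_entries(entries, VIRTUAL_MASS)
```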

Circularity Check

0 steps flagged

No circularity: framework description relies on external VLM-RAG and database without self-referential derivations

The paper presents a hierarchical cyber-physical framework that integrates an external Vision-Language Model with Retrieval-Augmented Generation (VLM-RAG) and an experience-driven vector database to generate control parameters for Whole-Body MPC and impedance-admittance control. No equations, derivations, or first-principles results are shown that reduce by construction to fitted inputs, self-citations, or renamed known results. The central claims (e.g., 60% speed reduction and semantic-to-physical grounding) are supported by experimental results in MuJoCo, IsaacSim, and hardware rather than internal loops, making the approach self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the reliability of VLM-RAG for safety-critical parameter generation and on standard assumptions of MPC stability and impedance control passivity.

axioms (2)

domain assumption: VLM-RAG outputs can be directly used as dynamic constraints and gains without introducing instability or safety violations in the closed-loop controller
Invoked when the paper states the VLM-RAG module outputs velocity limits and impedance parameters for the MPC and admittance controller.
domain assumption: Whole-body MPC with virtual impedance remains stable under the time-varying parameters supplied by the VLM-RAG
Required for the claim of safe, compliant interaction.
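The second assumption concerns parameters that change while the controller is running. One common mitigation, which the summary does not attribute to the paper, is to rate-limit each parameter so that a new retrieval cannot cause a step change in gains. The sketch below shows such a slew limiter; the step size and gain values are hypothetical, and rate limiting by itself does not prove stability.

```python
def slew_limit(current, target, max_step):
    """Move current toward target by at most max_step per call."""
    delta = target - current
    if delta > max_step:
        return current + max_step
    if delta < -max_step:
        return current - max_step
    return target


# Example: stiffness commanded from 400 N/m down to 100 N/m,
# limited to 50 N/m per control update (made-up numbers).
k = 400.0
trace = []
while k != 100.0:
    k = slew_limit(k, 100.0, 50.0)
    trace.append(k)
```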

invented entities (1)

GenerativeMPC hierarchical framework (no independent evidence)
purpose: To explicitly bridge semantic scene understanding with physical control parameters
New system architecture introduced to combine VLM-RAG with whole-body MPC and virtual impedance.

pith-pipeline@v0.9.0 · 5538 in / 1453 out tokens · 62757 ms · 2026-05-10T01:52:43.205764+00:00 · methodology

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1] J. Wang, R. Dai, W. Wang, L. Rossini, F. Ruscelli, and N. Tsagarakis, “Hypermotion: Learning hybrid behavior planning for autonomous loco-manipulation,” 2024, arXiv:2406.14655
[2] C. Hou, K. Wu, J. Liu, Z. Che, D. Wu, F. Liao, G. Li, J. He, Q. Feng, Z. Jin et al., “Robomind 2.0: A multimodal, bimanual mobile manipulation dataset for generalizable embodied intelligence,” 2025, arXiv:2512.24653
[3] Z. Jin, D. Qin, A. Liu, W.-a. Zhang, and L. Yu, “Model predictive variable impedance control of manipulators for adaptive precision-compliance tradeoff,” IEEE/ASME Transactions on Mechatronics, vol. 28, no. 2, pp. 1174–1186, 2023
[4] X. Jing, L. Roveda, J. Li, Y. Wang, and H. Gao, “An adaptive impedance control for dual-arm manipulators incorporated with the virtual decomposition control,” Journal of Vibration and Control, vol. 30, no. 11-12, pp. 2647–2660, 2024
[5] Z. Wu, Y. Zhou, X. Xu, Z. Wang, and H. Yan, “MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation,” in Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 1714–1723
[6] C. He, G. Sun, Y. Bai, J. Lu, J. Zhao, and G. Sartoretti, “Falcon: Actively decoupled visuomotor policies for loco-manipulation with foundation-model-based coordination,” arXiv preprint arXiv:2512.04381, 2025
[7] I. Dadiotis, A. Laurenzi, and N. Tsagarakis, “Whole-body mpc for highly redundant legged manipulators: Experimental evaluation with a 37 dof dual-arm quadruped,” in Proc. IEEE-RAS Int. Conf. on Humanoid Robots (Humanoids). IEEE, Dec. 2023, pp. 1–8
[8] Y. Wang, R. Chen, and M. Zhao, “Whole-body model predictive control for mobile manipulation with task priority transition,” in Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2025, pp. 13 356–13 362
[9] J.-R. Chiu, J.-P. Sleiman, M. Mittal, F. Farshidian, and M. Hutter, “A collision-free mpc for whole-body dynamic locomotion and manipulation,” in Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2022, pp. 4686–4693
[10] Z. Zhuang, L. Zheng, W. Li, R. Liu, P. Lu, and H. Cheng, “Rm-planner: Integrating reinforcement learning with whole-body model predictive control for mobile manipulation,” in Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2025, pp. 7263–7269
[11] Y. Mahmoud, J. Sam, K. Nguyen, M. J. Fernando, I. Tokmurziyev, M. Altamirano Cabrera, M. H. Khan, A. Lykov, and D. Tsetserukou, “Safehumanoid: Vlm-rag-driven impedance control of humanoid robot,” in Proc. ACM/IEEE Int. Conf. on Human-Robot Interaction. New York, NY, USA: Association for Computing Machinery, 2026, pp. 974–978. [Online]. Available: https:/... doi:10.1145/3776734.3794539
[12] Z. Chen, J. Wu, W. Wang, W. Su, G. Chen, S. Xing, M. Zhong, Q. Liu, L. Lu, B. Li et al., “Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks,” in Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 24 185–24 198
[13] J. Anton et al., “Chroma: The ai-native open-source embedding database,” https://github.com/chroma-core/chroma, 2022
[14] I.-C. A. Liu, S. He, D. Seita, and G. Sukhatme, “Voxact-b: Voxel-based acting and stabilizing policy for bimanual manipulation,” 2024, arXiv:2407.04152
[15] N. Hogan, “Impedance control: An approach to manipulation,” in Proc. American Control Conf., 1984, pp. 304–313
[16] M. F. Karim, S. Bollimuntha, M. S. Hashmi, A. Das, G. Singh, S. Sridhar, A. K. Singh, N. Govindan, and K. M. Krishna, “Da-vil: Adaptive dual-arm manipulation with reinforcement learning and variable impedance control,” in Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2025, pp. 11 896–11 903
[17] R. Cadene, S. Alibert, A. Soare, Q. Gallouedec, A. Zouitine, S. Palma, P. Kooijmans, M. Aractingi, M. Shukor, D. Aubakirova, M. Russi, F. Capuano, C. Pascal, J. Choghari, J. Moss, and T. Wolf, “Lerobot: State-of-the-art machine learning for real-world robotics in pytorch,” https://github.com/huggingface/lerobot, 2024
[18] O. Khatib, Real-Time Obstacle Avoidance for Manipulators and Mobile Robots. New York, NY: Springer New York, 1990, pp. 396–404
