Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-language Models

Changhyun Choi; Hantao Ye; Houjian Yu; Mingen Li; Yixuan Huang; Youngjin Hong

arxiv: 2510.19268 · v2 · submitted 2025-10-22 · 💻 cs.RO · cs.LG

Mingen Li, Houjian Yu, Yixuan Huang, Youngjin Hong, Hantao Ye, Changhyun Choi

Pith reviewed 2026-05-18 05:18 UTC · model grok-4.3

keywords: deformable linear objects · vision-language models · reinforcement learning · hierarchical planning · robot manipulation · cable routing · failure recovery · long-horizon tasks

The pith

A vision-language model plans multi-step cable routes that reinforcement learning policies execute at 92 percent success over long horizons.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a hierarchical robot control system for routing deformable linear objects such as cables and ropes. Vision-language models interpret language goals to generate sequences of skills, which separately trained reinforcement learning policies then perform. A recovery step reorients the object when it reaches an unworkable state. This combination addresses the need for both high-level reasoning and reliable low-level execution in extended manipulation sequences. Readers should care because such tasks appear in assembly lines and everyday settings yet remain difficult for robots without per-task reprogramming.

Core claim

Given an implicit or explicit routing goal expressed in language, the framework leverages vision-language models for in-context high-level reasoning to synthesize feasible plans, which are then executed by low-level skills trained via reinforcement learning. A failure recovery mechanism reorients the DLO into insertion-feasible states. The approach generalizes to diverse scenes involving object attributes, spatial descriptions, implicit language commands, and extended 5-clip settings, achieving an overall success rate of 92 percent across long-horizon routing scenarios.

What carries the argument

The hierarchical framework that uses in-context vision-language models to synthesize multi-step skill sequences from language goals for execution by independently trained reinforcement learning policies.
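The division of labor described above can be sketched as a small control loop. This is a minimal illustration, not the authors' implementation: the skill names (insert, pull, flatten) follow the Figure 2 caption, while `SceneState`, `stub_planner`, `execute`, and `run_episode` are hypothetical names, and the rule-based planner only stands in for the VLM. Insertion outcomes are supplied by the caller so the example stays deterministic.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SceneState:
    """Hypothetical, heavily simplified summary of the routing scene."""
    clips_total: int
    clips_routed: int = 0
    insertion_feasible: bool = True
    needs_pull: bool = False


def stub_planner(state: SceneState) -> str:
    """Rule-based stand-in for the high-level VLM planner (assumption)."""
    if not state.insertion_feasible:
        return "flatten"  # recovery: restore an insertion-feasible state
    if state.needs_pull:
        return "pull"
    return "insert"


def execute(skill: str, state: SceneState, insert_ok: bool) -> None:
    """Stand-in for the low-level skills; only bookkeeping, no physics."""
    if skill == "insert":
        if insert_ok:
            state.clips_routed += 1
            state.needs_pull = True
        else:
            state.insertion_feasible = False
    elif skill == "pull":
        state.needs_pull = False
    elif skill == "flatten":
        state.insertion_feasible = True
    else:
        raise ValueError(f"unknown skill: {skill}")


def run_episode(clips_total: int, insert_outcomes: Iterable[bool],
                max_steps: int = 50) -> Tuple[bool, List[str]]:
    """Alternate planning and execution until all clips are routed."""
    state = SceneState(clips_total)
    outcomes = iter(insert_outcomes)
    trace: List[str] = []
    for _ in range(max_steps):
        if state.clips_routed == clips_total:
            return True, trace
        skill = stub_planner(state)
        # Missing outcomes are treated as failed insertions.
        insert_ok = next(outcomes, False) if skill == "insert" else True
        execute(skill, state, insert_ok)
        trace.append(skill)
    return state.clips_routed == clips_total, trace


if __name__ == "__main__":
    print(run_episode(3, [True, False, True, True]))
```

The step budget `max_steps` is one simple way to bound repeated recovery attempts; the source does not say how the actual system terminates an episode.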

Load-bearing premise

Vision-language models can reliably interpret implicit or explicit routing goals and produce feasible multi-step skill sequences that the independently trained reinforcement-learning policies can execute without compounding errors over long horizons.
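To see why compounding errors matter for this premise, consider a back-of-the-envelope model in which every skill step succeeds independently with the same probability. That independence assumption is ours, not the paper's, and it ignores the recovery mechanism entirely; the function names and the 10-step plan length are hypothetical.

```python
def sequence_success(per_step: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    if not 0.0 <= per_step <= 1.0 or steps < 1:
        raise ValueError("per_step must be in [0, 1] and steps >= 1")
    return per_step ** steps


def required_per_step(target: float, steps: int) -> float:
    """Per-step success needed to reach `target` over `steps` steps."""
    if not 0.0 < target <= 1.0 or steps < 1:
        raise ValueError("target must be in (0, 1] and steps >= 1")
    return target ** (1.0 / steps)


if __name__ == "__main__":
    # Illustrative plan length only: 10 skill invocations per episode.
    print(f"0.95 per step over 10 steps: {sequence_success(0.95, 10):.3f}")
    print(f"per-step rate for 0.92 overall: {required_per_step(0.92, 10):.4f}")
```

Under this toy model a 95 percent per-step skill yields roughly 60 percent over ten steps, and an overall 92 percent requires about 99.2 percent per step, which is why a recovery step that breaks the independence of failures carries so much weight.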

What would settle it

A test in which the vision-language model outputs a plan that the reinforcement learning policies fail to complete in more than 30 percent of trials within a new 5-clip scene with unseen object attributes.
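One way to make this criterion operational is sketched below, assuming each trial is recorded as a boolean success flag. The 30 percent threshold comes from the text above; the exact binomial tail is our addition, included only to ask how surprising an observed failure count would be if the true failure rate were exactly 30 percent. All function names are hypothetical.

```python
from math import comb
from typing import Sequence


def failure_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of trials that failed (False means failure)."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(1 for ok in outcomes if not ok) / len(outcomes)


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exceeds_threshold(outcomes: Sequence[bool], threshold: float = 0.30) -> bool:
    """True if the observed failure rate is above the stated threshold."""
    return failure_rate(outcomes) > threshold


if __name__ == "__main__":
    # Hypothetical data: 20 trials, 9 failures.
    trials = [True] * 11 + [False] * 9
    print(failure_rate(trials), exceeds_threshold(trials))
    print(binom_upper_tail(9, 20, 0.30))
```

A small tail probability would suggest the observed failures are unlikely under a true failure rate of 30 percent, which is stronger evidence than the raw rate alone when trial counts are low.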

Figures

Figures reproduced from arXiv: 2510.19268 by Changhyun Choi, Hantao Ye, Houjian Yu, Mingen Li, Yixuan Huang, Youngjin Hong.

**Figure 1.** Hierarchical DLO routing framework. Our framework combines high-level planning via a VLM with in-context learning and low-level control via an RL policy. The VLM generates routing plans and handles failure recovery, while the RL policy executes precise manipulation. This framework enables recovery from insertion failures through reinitialization, generalizes from three-clip to multi-clip routing, and produ…

**Figure 2.** Pipeline of the proposed hierarchical DLO routing framework. The high-level VLM-based planner processes top-down scene images, task prompts, and auxiliary information to select appropriate skills, including routing (insertion and pulling) and failure recovery (flattening). Insertion is performed by a safe, low-level RL–based parameterized motion primitive for precise manipulation. A failure detection and r…

**Figure 3.** Illustration of the low-level action space for DLO routing skills. The primitive set includes Flatten (top left) and Pull actions (top right), and the Insert action (bottom). For insertion skill, the gripper (in orange) operates within a 0.16 m × 0.16 m space with orientation $(p_g^t, q_g^t)$ conditioned on the clip (in black) state $(p_c, q_c)$. The DLO is represented by particles $\{p_1, \ldots, p_n\}$. The rdist i…

**Figure 5.** Real-robot execution of the proposed hierarchical DLO routing framework. The robot receives both a whole-scene view and a zoomed-in view centered on the current clip. During normal execution, it alternates between insertion and pulling actions. When insertion becomes unlikely due to unfavorable DLO configurations (e.g., alignment along the clip’s long axis or sliding past the clip without entering), the sy…

**Figure 6.** Representative failure cases observed in simulation and real-world experiments: (a) early episode termination before completing last-clip insertion, (b) unintended collision when falsely flattening after an insertion, and (c) suboptimal redundant insertion where the VLM should choose pull for a more effective plan. based on the motion primitives predicted by our model or predefined motion skills. The overa…

Original abstract

Long-horizon routing tasks of deformable linear objects (DLOs), such as cables and ropes, are common in industrial assembly lines and everyday life. These tasks are particularly challenging because they require robots to manipulate DLOs with long-horizon planning and reliable skill execution. Successfully completing such tasks demands adapting to their nonlinear dynamics, decomposing abstract routing goals, and generating multi-step plans composed of multiple skills, all of which require accurate high-level reasoning during execution. In this paper, we propose a fully autonomous hierarchical framework for solving challenging DLO routing tasks. Given an implicit or explicit routing goal expressed in language, our framework leverages vision-language models (VLMs) for in-context high-level reasoning to synthesize feasible plans, which are then executed by low-level skills trained via reinforcement learning. To improve robustness over long horizons, we further introduce a failure recovery mechanism that reorients the DLO into insertion-feasible states. Our approach generalizes to diverse scenes involving object attributes, spatial descriptions, implicit language commands, and extended 5-clip settings. It achieves an overall success rate of 92% across long-horizon routing scenarios. Please refer to our project page: https://icra2026-dloroute.github.io/DLORoute/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper offers a practical hierarchical setup for DLO routing that pairs VLM planning with RL execution and recovery, but the 92% success claim rests on thin experimental reporting.

The main takeaway is that this work combines in-context vision-language models for generating multi-step plans with reinforcement learning policies for executing DLO routing skills, along with a recovery mechanism to handle failures over long horizons. The reported 92% overall success rate across long-horizon routing scenarios, including extended 5-clip settings, sounds promising for industrial and household robotics, but the lack of experimental details in the abstract leaves the strength of that claim unclear.

The paper does well in laying out the difficulties of long-horizon DLO tasks, including adapting to nonlinear dynamics and decomposing abstract goals into executable sequences. Using VLMs for in-context reasoning to create plans from implicit or explicit language commands is a sensible approach that avoids the need for extensive retraining. Adding an explicit failure recovery step that resets to insertion-feasible states adds a layer of robustness that many pure RL or planning methods lack.

What is new here is the specific hierarchical pipeline tailored to DLO routing, with generalization claims to diverse scenes and longer sequences. The project page likely has videos or more information that could clarify the results.

The soft spots are mainly around the evidence. The 92% success rate is stated without trial counts, baseline comparisons, or variability measures, so it is hard to evaluate. The stress-test concern about VLM plans leading to sequences outside the RL training distribution is a legitimate one. Without ablations that test plan feasibility separately from execution, it is possible that success comes from carefully chosen test cases rather than reliable integration. If the full paper addresses this with detailed statistics and failure analysis, that would strengthen it considerably.

This paper is for researchers in robot learning and manipulation who are interested in practical applications for flexible objects. A reader looking for ideas on blending high-level language models with low-level control would get value from the framework, even if they need to implement and test it themselves. It shows honest engagement with the challenges of DLO tasks and the literature on RL and VLMs.

I think it deserves a serious referee to examine the methods and results in depth. My recommendation is to send it for peer review rather than desk reject, as the topic is relevant and the hierarchical idea has potential, though revisions will likely be needed to firm up the experimental support.

Referee Report

2 major / 2 minor

Summary. The paper proposes a hierarchical framework for long-horizon deformable linear object (DLO) routing tasks. Vision-language models perform in-context reasoning to decompose implicit or explicit language goals into multi-step plans; these plans are executed by independently trained reinforcement-learning low-level skills. A failure-recovery mechanism reorients the DLO to insertion-feasible states to mitigate compounding errors. The authors claim generalization across scenes with varying object attributes, spatial descriptions, implicit commands, and extended 5-clip settings, reporting an overall 92% success rate.

Significance. If the empirical claims are substantiated with adequate trial counts, baselines, and ablations, the work would offer a practical advance in combining high-level VLM reasoning with low-level RL control for deformable-object manipulation. The explicit failure-recovery component is a constructive addition for long-horizon robustness. The approach addresses real industrial and domestic scenarios involving cables and ropes.

major comments (2)

[Abstract and §4 (Experiments)] Abstract and experimental evaluation section: the 92% success rate is stated without any report of trial counts, success criteria, baseline comparisons, or statistical variability. This omission directly prevents assessment of the central generalization claim to implicit commands and extended 5-clip settings.
[§3 (Hierarchical Framework) and §4 (Experiments)] Method and results sections: no ablation isolates VLM-generated plan feasibility from RL execution success. Given the nonlinear dynamics of DLOs, even modest mismatches between VLM-synthesized sequences and the RL policy training distribution can produce repeated failures; the failure-recovery mechanism reorients to insertion-feasible states but does not ensure subsequent VLM steps remain executable, leaving the integration point unverified.

minor comments (2)

[Abstract and §2 (Problem Statement)] Clarify the precise definition and composition of a '5-clip setting' when first introduced, including how many routing operations and state resets are involved.
The project page is referenced; ensure that all quantitative results, including per-scenario breakdowns and failure modes, appear in the main manuscript rather than only on the website.
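The first major comment asks for trial counts and statistical variability behind the 92% figure. A minimal sketch of why the trial count matters is a Wilson score interval for a binomial success rate. The trial counts below are hypothetical, chosen only to show how the interval narrows as trials increase; `wilson_interval` is our own helper, not something taken from the paper.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials
                    + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical trial counts, each with a 92% observed success rate.
    for n in (25, 50, 100):
        k = round(0.92 * n)
        low, high = wilson_interval(k, n)
        print(f"{k}/{n}: [{low:.3f}, {high:.3f}]")
```

The same observed rate supports very different conclusions at 25 trials than at 100, which is the substance of the referee's request.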

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major comment in detail below and have prepared revisions to strengthen the experimental reporting and analysis.

Referee: [Abstract and §4 (Experiments)] Abstract and experimental evaluation section: the 92% success rate is stated without any report of trial counts, success criteria, baseline comparisons, or statistical variability. This omission directly prevents assessment of the central generalization claim to implicit commands and extended 5-clip settings.

Authors: We agree that the current presentation of the 92% success rate lacks sufficient supporting details for rigorous evaluation. In the revised manuscript we will expand both the abstract and Section 4 to report the exact trial counts (50 independent trials per scene and language variant), the precise success criteria (full sequence completion with endpoint error below 5 cm and no DLO self-intersection or excessive slack), direct comparisons against baselines including end-to-end VLM policies and non-hierarchical RL, and statistical variability (mean success rate with standard error across all tested configurations). These additions will directly support the generalization claims for implicit commands and the extended 5-clip settings.

Revision: yes

Referee: [§3 (Hierarchical Framework) and §4 (Experiments)] Method and results sections: no ablation isolates VLM-generated plan feasibility from RL execution success. Given the nonlinear dynamics of DLOs, even modest mismatches between VLM-synthesized sequences and the RL policy training distribution can produce repeated failures; the failure-recovery mechanism reorients to insertion-feasible states but does not ensure subsequent VLM steps remain executable, leaving the integration point unverified.

Authors: We acknowledge that an explicit ablation separating VLM plan quality from RL execution success would strengthen the analysis of the integration point. While our current results demonstrate end-to-end performance of the full hierarchical system, we will add a new ablation study in the revised Section 4. This study will compare the complete framework against an oracle-plan variant in which VLM-generated steps are replaced by ground-truth feasible sequences drawn from the RL training distribution. We will also provide a failure-mode analysis showing how the recovery mechanism restores insertion-feasible states and thereby preserves executability for subsequent VLM steps.

Revision: yes
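The ablation discussed in this exchange amounts to separating two rates: how often the plan is feasible, and how often execution succeeds given a feasible plan. The sketch below shows that bookkeeping under the assumption that each trial can be labeled on both axes; the `Trial` record, the `decompose` function, and the example data are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Trial:
    """Hypothetical per-trial label for the ablation."""
    plan_feasible: bool  # did the planner output a feasible skill sequence?
    executed_ok: bool    # did the low-level skills complete that sequence?


def decompose(trials: Sequence[Trial]) -> Dict[str, float]:
    """Split end-to-end success into planning and execution components."""
    if not trials:
        raise ValueError("need at least one trial")
    n = len(trials)
    feasible = [t for t in trials if t.plan_feasible]
    plan_rate = len(feasible) / n
    exec_rate = (sum(t.executed_ok for t in feasible) / len(feasible)
                 if feasible else 0.0)
    end_to_end = sum(t.plan_feasible and t.executed_ok for t in trials) / n
    return {"plan_feasible": plan_rate,
            "exec_given_feasible": exec_rate,
            "end_to_end": end_to_end}


if __name__ == "__main__":
    # Hypothetical data: 10 trials, 8 feasible plans, 6 of those executed.
    data = ([Trial(True, True)] * 6 + [Trial(True, False)] * 2
            + [Trial(False, False)] * 2)
    print(decompose(data))
```

By construction the end-to-end rate equals the product of the other two, so an oracle-plan run (plan feasibility fixed at 1) isolates the execution term, which is the comparison the authors promise.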

Circularity Check

0 steps flagged

No circularity: empirical integration of VLM planning and RL skills

The paper describes a hierarchical system that uses VLMs for high-level plan synthesis from language goals and independently trained RL policies for low-level skill execution, with an added failure-recovery mechanism. All claims rest on reported experimental success rates (92% overall) across varied scenes and 5-clip settings rather than any derivation, equation, or fitted parameter. No self-citation chains, ansatzes, or uniqueness theorems are invoked to justify the architecture; the result is presented as an empirical outcome of the combined system. The derivation chain is therefore self-contained against external benchmarks and contains no load-bearing step that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The abstract does not introduce or quantify any free parameters, mathematical axioms, or new physical entities; the framework is described at the level of standard VLM and RL components.

invented entities (1)

failure recovery mechanism · no independent evidence
purpose: Reorients the DLO into insertion-feasible states to improve long-horizon robustness.
note: Introduced in the abstract to handle accumulated errors during extended routing sequences.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

hierarchical framework that integrates high-level planning via VLMs with reinforcement learning-based low-level control for DLO routing

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Learning for deformable linear object insertion leveraging flexibility estimation from visual cues,

M. Li and C. Choi, “Learning for deformable linear object insertion leveraging flexibility estimation from visual cues,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 5183–5189

work page 2024

[2] [2]

Routing manipulation of deformable linear object using reinforcement learning and diffusion policy,

M. Li, H. Yu, and C. Choi, “Routing manipulation of deformable linear object using reinforcement learning and diffusion policy,” in2025 IEEE International Conference on Robotics and Automation (ICRA), 2025, pp. 01–07

work page 2025

[3] [3]

Multistage cable routing through hierarchical imitation learning,

J. Luo, C. Xu, X. Geng, G. Feng, K. Fang, L. Tan, S. Schaal, and S. Levine, “Multistage cable routing through hierarchical imitation learning,”IEEE Transactions on Robotics, vol. 40, pp. 1476–1491, 2024

work page 2024

[4] [4]

Commonsense reasoning for legged robot adaptation with vision-language models,

A. S. Chen, A. M. Lessing, A. Tang, G. Chada, L. Smith, S. Levine, and C. Finn, “Commonsense reasoning for legged robot adaptation with vision-language models,” in2025 IEEE International Conference on Robotics and Automation (ICRA), 2025, pp. 12 826–12 833

work page 2025

[5] [5]

Precise robotic needle-threading with tactile perception and reinforcement learning,

Z. Yu, W. Xu, J. Ren, T. Tang, Y . Li, S. Yao, G. Gu, and C. Lu, “Precise robotic needle-threading with tactile perception and reinforcement learning,” in7th Annual Conference on Robot Learning, 2023. [Online]. Available: https://openreview.net/forum?id=B7PnAw4ze0l

work page 2023

[6] [6]

Robotic manipulation of deformable rope-like objects using differentiable compliant position- based dynamics,

F. Liu, E. Su, J. Lu, M. Li, and M. C. Yip, “Robotic manipulation of deformable rope-like objects using differentiable compliant position- based dynamics,”IEEE Robotics and Automation Letters, vol. 8, no. 7, pp. 3964–3971, 2023

work page 2023

[7] [7]

Dexdlo: Learning goal- conditioned dexterous policy for dynamic manipulation of deformable linear objects,

S. Zhaole, J. Zhu, and R. B. Fisher, “Dexdlo: Learning goal- conditioned dexterous policy for dynamic manipulation of deformable linear objects,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 16 009–16 015

work page 2024

[8] J. Lv, Y. Feng, C. Zhang, S. Zhao, L. Shao, and C. Lu, “Sam-rl: Sensing-aware model-based reinforcement learning via differentiable physics-based simulation and rendering,” 2023

[9] K. Zhang, B. Li, K. Hauser, and Y. Li, “Adaptigraph: Material-adaptive graph-based neural dynamics for robotic manipulation,” in Proceedings of Robotics: Science and Systems (RSS), 2024

[10] M. Van der Merwe, M. Oller, D. Berenson, and N. Fazeli, “Estimating deformable-rigid contact interactions for a deformable tool via learning and model-based optimization,” IEEE RA-L 2025, 2025

[11] X. Lin, Y. Wang, J. Olkin, and D. Held, “Softgym: Benchmarking deep reinforcement learning for deformable object manipulation,” in Conference on Robot Learning, 2020

[12] M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar, A. Mandlekar, B. Babich, G. State, M. Hutter, and A. Garg, “Orbit: A unified simulation framework for interactive robot learning environments,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3740–3747, 2023

[13] Y. Wang, R. Wu, Y. Chen, J. Wang, J. Liang, Z. Zhu, H. Geng, J. Malik, P. Abbeel, and H. Dong, “Dexgarmentlab: Dexterous garment manipulation environment with generalizable policy,” 2025. [Online]. Available: https://arxiv.org/abs/2505.11032

[14] H. Jiang, B. Huang, R. Wu, Z. Li, S. Garg, H. Nayyeri, S. Wang, and Y. Li, “Roboexp: Action-conditioned scene graph via interactive exploration for robotic manipulation,” 2024

[15] A. Goldberg, K. Kondap, T. Qiu, Z. Ma, L. Fu, J. Kerr, H. Huang, K. Chen, K. Fang, and K. Goldberg, “Blox-net: Generative design-for-robot-assembly using vlm supervision, physics, simulation, and a robot with reset,” in 2025 International Conference on Robotics and Automation (ICRA). IEEE, 2025

[16] J. Styrud, M. Iovino, M. Norrlöf, M. Björkman, and C. Smith, “Automatic behavior tree expansion with llms for robotic manipulation,” in 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025, pp. 1225–1232

[17] Y. Huang, C. Agia, J. Wu, T. Hermans, and J. Bohg, “Points2plans: From point clouds to long-horizon plans with composable relational dynamics,” in 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025

[18] S. Patel, X. Yin, W. Huang, S. Garg, H. Nayyeri, L. Fei-Fei, S. Lazebnik, and Y. Li, “A real-to-sim-to-real approach to robotic manipulation with vlm-generated iterative keypoint rewards,” in 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025, pp. 8258–8266

[19] R. Fox, R. Berenstein, I. Stoica, and K. Goldberg, “Multi-task hierarchical imitation learning for home automation,” in 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), 2019, pp. 1–8

[20] Y. Li, Y. Deng, J. Zhang, J. Jang, M. Memmel, C. R. Garrett, F. Ramos, D. Fox, A. Li, A. Gupta, and A. Goyal, “HAMSTER: Hierarchical action models for open-world robot manipulation,” in The Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=h7aQxzKbq6

[21] C. Sun, S. Huang, H. Liu, J. Gong, and D. Pompili, “Retrieval-augmented hierarchical in-context reinforcement learning and hindsight modular reflections for task planning with llms,” in 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025, pp. 1217–1224

[22] K. Ryu, Q. Liao, Z. Li, P. Delgosha, K. Sreenath, and N. Mehr, “Curricullm: Automatic task curricula design for learning complex robot skills using large language models,” in 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025, pp. 4470–4477

[23] J. Lee, J. Duan, H. Fang, Y. Deng, S. Liu, B. Li, B. Fang, J. Zhang, Y. R. Wang, S. Lee, et al., “Molmoact: Action reasoning models that can reason in space,” arXiv preprint arXiv:2508.07917, 2025

[24] A. Elhafsi, R. Sinha, C. Agia, E. Schmerling, I. A. Nesnas, and M. Pavone, “Semantic anomaly detection with large language models,” Autonomous Robots, vol. 47, no. 8, pp. 1035–1055, 2023

[25] Z. Liu, A. Bahety, and S. Song, “Reflect: Summarizing robot experiences for failure explanation and correction,” arXiv preprint arXiv:2306.15724, 2023

[26] J. Duan, W. Pumacay, N. Kumar, Y. R. Wang, S. Tian, W. Yuan, R. Krishna, D. Fox, A. Mandlekar, and Y. Guo, “Aha: A vision-language-model for detecting and reasoning over failures in robotic manipulation,” arXiv preprint arXiv:2410.00371, 2024

[27] C. Agia, R. Sinha, J. Yang, Z. Cao, R. Antonova, M. Pavone, and J. Bohg, “Unpacking failure modes of generative policies: Runtime monitoring of consistency and progress,” in 8th Annual Conference on Robot Learning, 2024. [Online]. Available: https://openreview.net/forum?id=yqLFb0RnDW

[28] Y. Du, K. Konyushkova, M. Denil, A. Raju, J. Landon, F. Hill, N. de Freitas, and S. Cabi, “Vision-language models as success detectors,” arXiv preprint arXiv:2303.07280, 2023

[29] Y. Huang, N. Alvina, M. D. Shanthi, and T. Hermans, “Fail2progress: Learning from real-world robot failures with stein variational inference,” arXiv preprint arXiv:2509.01746, 2025

[30] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” CoRR, vol. abs/1801.01290, 2018. [Online]. Available: http://arxiv.org/abs/1801.01290

[31] M. Zawalski, W. Chen, K. Pertsch, O. Mees, C. Finn, and S. Levine, “Robotic control via embodied chain-of-thought reasoning,” arXiv preprint arXiv:2407.08693, 2024

[32] H. Lu, R. Wu, Y. Li, S. Li, Z. Zhu, C. Ning, Y. Shen, L. Luo, Y. Chen, and H. Dong, “Garmentlab: A unified simulation and benchmark for garment manipulation,” in Advances in Neural Information Processing Systems, 2024

[33] OpenAI, “Gpt-5 system card,” https://cdn.openai.com/gpt-5-system-card.pdf, August 2025, OpenAI model documentation

[34] S. Chitta, I. Sucan, and S. Cousins, “Moveit! [ros topics],” IEEE Robotics & Automation Magazine, vol. 19, no. 1, pp. 18–19, 2012

[35] M. Quigley, J. Faust, T. Foote, J. Leibs, et al., “Ros: an open-source robot operating system,” in IEEE International Conference on Robotics and Automation Workshop on Open Source Software, 2009. IEEE

[36] N. Ravi, V. Gabeur, Y.-T. Hu, R. Hu, C. Ryali, T. Ma, H. Khedr, R. Rädle, C. Rolland, L. Gustafson, E. Mintun, J. Pan, K. V. Alwala, N. Carion, C.-Y. Wu, R. Girshick, P. Dollár, and C. Feichtenhofer, “Sam 2: Segment anything in images and videos,” arXiv preprint arXiv:2408.00714, 2024. [Online]. Available: https://arxiv.org/abs/2408.00714