
arxiv: 2606.07999 · v1 · pith:EMQCNV3Hnew · submitted 2026-06-06 · 💻 cs.AI

Efficient Skill Grounding via Code Refactoring with Small Language Models

Pith reviewed 2026-06-27 19:53 UTC · model grok-4.3

classification 💻 cs.AI
keywords skill grounding · code refactoring · small language models · embodied agents · Code-as-Policies · long-horizon control

The pith

RECENT lets small language models ground skills by refactoring code bindings instead of regenerating entire programs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces RECENT as a framework that represents reusable skills as executable code so that small language models can adapt them to new robot bodies and environments. It does this by keeping the skill's control structure fixed and using localized refactoring to change only the execution bindings that tie the skill to a particular embodiment or setting. This approach is tested across multiple robot platforms in dynamic, partially observable environments, where RECENT with small models outperforms other small-model Code-as-Policies baselines and reaches the same task success rates as large-model versions. A sympathetic reader would care because large language models are often unavailable at deployment time, so a method that makes small models sufficient for reliable long-horizon control directly widens the range of agents that can use pre-learned skills.

Core claim

By representing skills as executable code, RECENT preserves semantic intent in the control structure while grounding the skill through localized refactoring of only the embodiment- and environment-specific execution bindings; this enables small language models to achieve the best performance among sLM-based Code-as-Policies methods and to match the task performance of LLM-based Code-as-Policies across diverse robot embodiments and dynamic environments.

What carries the argument

Localized refactoring of executable code that changes only execution bindings while leaving the skill's control structure intact.
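As a minimal sketch of what "changing only execution bindings" can look like, the Python below rewrites the primitive calls of a skill while leaving its loop and branch structure untouched. Everything in it is an assumption made for illustration: the primitive names (`panda_move_to`, `robotiq_grasp`, and so on) are invented, and a fixed dictionary stands in for the edits that the paper has a small language model propose. It is not the authors' implementation.

```python
import ast

# Hypothetical skill written for a source embodiment (names are invented).
SOURCE_SKILL = '''
def pick_and_place(obj, target):
    for attempt in range(3):
        panda_move_to(obj)
        if franka_grasp(obj):
            panda_move_to(target)
            franka_release()
            return True
    return False
'''

# Hypothetical binding map: source primitive -> target primitive.
# In RECENT an sLM proposes such edits; a dictionary stands in for it here.
BINDINGS = {
    "panda_move_to": "ur5_move_to",
    "franka_grasp": "robotiq_grasp",
    "franka_release": "robotiq_release",
}


class BindingRewriter(ast.NodeTransformer):
    """Rename calls to bound primitives; leave every other node as it is."""

    def __init__(self, bindings):
        self.bindings = bindings

    def visit_Call(self, node):
        self.generic_visit(node)  # also handle calls nested in arguments
        if isinstance(node.func, ast.Name) and node.func.id in self.bindings:
            node.func.id = self.bindings[node.func.id]
        return node


def refactor_bindings(source, bindings):
    """Return source with only the bound call names replaced."""
    tree = BindingRewriter(bindings).visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+


refactored = refactor_bindings(SOURCE_SKILL, BINDINGS)
print(refactored)
```

The retry loop, the branch on grasp success, and the return values survive unchanged; only the three call names differ between the source and refactored skill.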

If this is right

  • Small language models become sufficient for reliable long-horizon control in partially observable embodied settings.
  • Skills transfer across robot embodiments by editing bindings rather than rewriting entire programs.
  • Code-as-Policies agents avoid the cost and latency of full code regeneration at each grounding step.
  • Performance parity with large-model methods is reached without requiring access to those models at runtime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same refactoring pattern could reduce the need to store separate skill versions for every possible robot body.
  • If refactoring errors remain low, incremental skill updates become practical without full re-verification.
  • The method might extend to other code-based agent domains where environment bindings change frequently.

Load-bearing premise

Small language models can perform localized refactoring that keeps the original semantic intent of a skill without introducing errors that break long-horizon execution.

What would settle it

A long-horizon task in which a skill refactored by a small model produces incorrect behavior due to unintended changes in control flow, while the same skill executed without refactoring succeeds.
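One way to look for the "unintended changes in control flow" described above is to compare the control-flow shape of a skill before and after refactoring. The sketch below is an assumption about how such a check could be built with Python's standard `ast` module; the paper as summarized here does not describe this check, and the helper names `control_skeleton` and `control_flow_preserved` are hypothetical.

```python
import ast

# Node types treated as control structure in this sketch.
CONTROL_NODES = (ast.FunctionDef, ast.For, ast.While, ast.If, ast.Try,
                 ast.With, ast.Return, ast.Break, ast.Continue)


def control_skeleton(source):
    """List (depth, field, node type) for each control-flow node, in order."""
    def walk(node, depth):
        out = []
        for field, value in ast.iter_fields(node):
            children = value if isinstance(value, list) else [value]
            for child in children:
                if not isinstance(child, ast.AST):
                    continue
                if isinstance(child, CONTROL_NODES):
                    out.append((depth, field, type(child).__name__))
                    out.extend(walk(child, depth + 1))
                else:
                    out.extend(walk(child, depth))
        return out
    return walk(ast.parse(source), 0)


def control_flow_preserved(before, after):
    """True when both versions share the same control-flow skeleton."""
    return control_skeleton(before) == control_skeleton(after)


original = (
    "def skill(obj):\n"
    "    for _ in range(3):\n"
    "        if grasp_a(obj):\n"
    "            return True\n"
    "    return False\n"
)
rebound = original.replace("grasp_a", "grasp_b")   # binding change only
drifted = (
    "def skill(obj):\n"
    "    if grasp_b(obj):\n"                        # retry loop was dropped
    "        return True\n"
    "    return False\n"
)

print(control_flow_preserved(original, rebound))
print(control_flow_preserved(original, drifted))
```

This check is deliberately coarse: it compares only the nesting and order of control statements, so it would not notice a changed loop bound, condition, or timing constant. Catching those would need a stricter comparison or execution-level tests.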

Figures

Figures reproduced from arXiv: 2606.07999 by Chaeun Lee, Daehee Lee, Honguk Woo, Jooyoung Kim, Saehun Chun, Sera Choi, Wonje Choi.

Figure 1: Key concept comparing (top) the skill grounding procedure used in existing approaches with (bottom) our refactoring-centric skill grounding procedure.
… embodiment mismatches through lightweight code modifications without the extensive reasoning associated with regenerating code from scratch. Environmental variations are handled through in-situ adaptation, where execution-time feedback is incorporated to p…
Figure 2: Overview of RECENT. (i) Offline skill repository stores reusable skill code with ontology-based metadata specifying functional intent, robot embodiment, and semantic relations. (ii) Ontology-based reasoning maps each skill to the target robot, diagnosing determined conflicts and undetermined warnings. (iii) In-situ adaptation patches code at execution time using environment feedback.
… a skill’s functional r…
Figure 3: Evaluation settings. (left) Kinematic variations, transferring from the source Panda robot to the target robots UR5 and Sawyer. (right) End-effector variations, transferring from the Franka Hand to the Robotiq 2F-85 and vacuum grippers.
… Franka Emika Panda as the source embodiment and evaluate deployment on UR5 and Sawyer manipulators, which differ in kinematic structure and joint configuration. All robots…
Figure 4: Schema of the skill ontology, which connects robots, capabilities, skills, and primitives used for deployment-time reasoning.
Figure 5: Partial instance graph of the skill ontology, illustrating ontology-based reasoning, where a capability mismatch is detected and resolved by substituting an embodiment-compatible primitive.
… groups and end-effectors are assigned motion-planning capabilities, while gripper-equipped embodiments are assigned grasp-related capabilities. Optionally, an sLM may provide auxiliary capability suggestions using primit…
Figure 6: Example scene of kinematic variation scenarios.
Figure 7: Example scene of end-effector variation scenarios.
Figure 8: Real-world robot platforms for deployment. The embodiments differ in robot morphology and gripper hardware.
Safety safeguards for real-robot execution. RECENT performs code refactoring at the skill level and does not directly bypass low-level robot safety mechanisms. Ontology-based unit tests validate not only API-level compatibility but also embodiment-level execution consistency, so unsafe or infeasible …
Figure 9: Real-world checkout-counter setup for the scan-and-bag task. The unscanned area (where items awaiting scanning are placed), scanning station, barcode scanner, and bagging basket are annotated in the scene.
Figure 10: Task sequence for the real-world scan-and-bag task with pick-and-place obstacle handling. The label N/5 indicates that the robot has completed scan-and-bag for N out of five checkout items. The second row provides a detailed view of the obstacle-handling condition during the scan-and-bag sequence for the second item. After picking the second item, the madeleine cake pouch, a graspable teddy bear is introd…
Figure 11: Task sequence for the real-world scan-and-bag task with sweeping obstacle handling. The label N/5 indicates that scan-and-bag has been completed for N out of five checkout items. The second row details the obstacle-handling condition during scan-and-bag for the second item. After picking the second item, the vanilla wafer pouch, a non-graspable camping stove box is introduced into the scanning station. Si…
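The captions for Figures 2 and 5 describe ontology-based reasoning that detects a capability mismatch and resolves it by substituting an embodiment-compatible primitive. A rough Python sketch of that idea follows. The data model, the primitive and capability names, and the `ground` helper are all assumptions made for illustration, and the distinction the paper draws between determined conflicts and undetermined warnings is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    name: str
    intent: str            # functional intent, e.g. "grasp"
    requires: frozenset    # capabilities the robot must provide


# Hypothetical ontology entries; names are illustrative only.
PRIMITIVES = [
    Primitive("finger_grasp", "grasp", frozenset({"parallel_gripper"})),
    Primitive("suction_grasp", "grasp", frozenset({"vacuum"})),
    Primitive("plan_motion", "move", frozenset({"motion_planning"})),
]
BY_NAME = {p.name: p for p in PRIMITIVES}


def ground(skill_primitives, robot_capabilities):
    """Map each primitive a skill uses onto the target robot.

    Returns (bindings, unresolved). bindings maps a source primitive to a
    primitive the robot supports; unresolved lists primitives for which no
    substitute with the same intent was found.
    """
    bindings, unresolved = {}, []
    for name in skill_primitives:
        prim = BY_NAME[name]
        if prim.requires <= robot_capabilities:
            bindings[name] = name          # already compatible
            continue
        candidates = [
            p for p in PRIMITIVES
            if p.intent == prim.intent and p.requires <= robot_capabilities
        ]
        if candidates:
            bindings[name] = candidates[0].name   # substitute by intent
        else:
            unresolved.append(name)
    return bindings, unresolved


# A robot with a vacuum gripper cannot run finger_grasp directly.
vacuum_robot = {"motion_planning", "vacuum"}
bindings, unresolved = ground(["plan_motion", "finger_grasp"], vacuum_robot)
print(bindings, unresolved)
```

Under these assumptions, the resulting binding map is the kind of input that a localized refactoring step could consume, while unresolved primitives would need to be surfaced to the user or handled at execution time.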
Abstract

Effective skill grounding is essential for deploying reusable skills in embodied agents, as even minor embodiment or environmental differences can render an entire skill incompatible. This challenge is particularly pronounced in embodied settings, where agents must operate in dynamic, partially observable environments without access to large language models (LLMs). In this setting, reliance on LLMs is impractical, while small language models (sLMs) remain insufficient for the effective skill grounding required for reliable long-horizon control. We present RECENT, a refactoring-centric agent framework that enables efficient skill grounding with sLMs by decoupling skill semantics from embodiment- and environment-specific execution binding. By representing skills as executable code, RECENT preserves the semantic intent encoded in a skill's control structure while grounding it by modifying only execution bindings through localized refactoring, rather than regenerating code from scratch. We evaluate RECENT across diverse skill grounding scenarios spanning multiple robot embodiments in dynamic environments, demonstrating robust long-horizon performance when deployed with an sLM. Across all scenarios, RECENT achieves the best performance among sLM-based Code-as-Policies (CaP) methods and matches the task performance of LLM-based CaP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces RECENT, a refactoring-centric framework for skill grounding in embodied agents that uses small language models (sLMs) to decouple skill semantics (preserved in control structure) from embodiment- and environment-specific execution bindings via localized code refactoring rather than full regeneration. It claims that, when deployed with sLMs, RECENT achieves the best performance among sLM-based Code-as-Policies (CaP) methods and matches the task performance of LLM-based CaP across diverse scenarios involving multiple robot embodiments in dynamic, partially observable environments.

Significance. If the empirical claims hold under rigorous verification, the work would be significant for enabling reliable long-horizon control with resource-efficient sLMs in embodied settings where LLMs are impractical, by providing a practical mechanism for skill reuse across embodiments without full re-planning.

major comments (2)
  1. [Abstract] The headline claim that 'RECENT achieves the best performance among sLM-based CaP methods and matches the task performance of LLM-based CaP' is presented without any supporting data, error bars, statistical tests, or even high-level method details (e.g., how localization of refactoring is enforced or how success is measured in long-horizon tasks), rendering the central performance result impossible to assess from the provided text.
  2. [Method description (framework section)] The core premise that sLM-based localized refactoring 'preserves the semantic intent encoded in a skill's control structure' while only modifying execution bindings is load-bearing for all long-horizon claims, yet no mechanism, invariant check, or trajectory-level verification is described that would detect or prevent semantic drift (e.g., altered timing constants or sensor mappings) in partially observable settings; a single undetected binding error can invalidate multi-step plans without being caught by short-horizon metrics.
minor comments (1)
  1. [Abstract] The phrase 'across all scenarios' is vague; a parenthetical listing of the robot embodiments and environment types would improve readability without lengthening the paragraph.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and indicate where revisions will be made to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The headline claim that 'RECENT achieves the best performance among sLM-based CaP methods and matches the task performance of LLM-based CaP' is presented without any supporting data, error bars, statistical tests, or even high-level method details (e.g., how localization of refactoring is enforced or how success is measured in long-horizon tasks), rendering the central performance result impossible to assess from the provided text.

    Authors: We agree that the abstract presents headline claims without supporting data or method details, which limits standalone assessment. The full manuscript contains the supporting experimental results, error bars, statistical comparisons, and descriptions of success metrics for long-horizon tasks. To address the concern, we will revise the abstract to include concise high-level information on the evaluation setup, how localization is enforced via prompting, and the definition of task success. revision: yes

  2. Referee: [Method description (framework section)] The core premise that sLM-based localized refactoring 'preserves the semantic intent encoded in a skill's control structure' while only modifying execution bindings is load-bearing for all long-horizon claims, yet no mechanism, invariant check, or trajectory-level verification is described that would detect or prevent semantic drift (e.g., altered timing constants or sensor mappings) in partially observable settings; a single undetected binding error can invalidate multi-step plans without being caught by short-horizon metrics.

    Authors: The framework section explains that RECENT uses targeted prompts to restrict the sLM to modifying only execution bindings while leaving control structure unchanged. We acknowledge that the manuscript does not describe explicit invariant checks or trajectory-level verification to detect semantic drift. We will add a dedicated paragraph detailing the localization enforcement strategy and will include additional verification experiments or analysis in the evaluation section to examine potential drift in partially observable settings. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper introduces RECENT as an independent refactoring-centric framework that decouples skill semantics from execution bindings, with claims supported by empirical evaluation across robot embodiments rather than any derivation chain. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the abstract or description that would reduce the central contribution to its own inputs by construction. The approach is presented as a self-contained methodological contribution without load-bearing references to prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only input provides no equations, parameters, or background assumptions to audit.

pith-pipeline@v0.9.1-grok · 5745 in / 844 out tokens · 18928 ms · 2026-06-27T19:53:41.557208+00:00 · methodology

