ShapeGen: Robotic Data Generation for Category-Level Manipulation
Pith reviewed 2026-05-10 10:14 UTC · model grok-4.3
The pith
ShapeGen generates diverse manipulation demonstrations by training spatial warpings that map functionally corresponding points across 3D object shapes within a category.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ShapeGen decomposes data generation into two stages: Shape Library curation, in which spatial warpings are trained to map points to functionally corresponding locations across shapes, and Function-Aware Generation, which leverages the established libraries to produce physically plausible and functionally correct novel demonstrations from minimal human annotation.
What carries the argument
The Shape Library, which stores 3D models together with trained spatial warpings that map functionally corresponding points between shapes.
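As a concrete reading, the library can be pictured as 3D models stored alongside pairwise point-to-point warps. The sketch below is a minimal illustration of that idea only; the names (`ShapeLibrary`, `transfer_points`) and the rigid-translation warp are hypothetical stand-ins for the paper's learned warpings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A point in R^3, and a spatial warping that maps a point on a source
# shape to its functionally corresponding location on a target shape.
Point = Tuple[float, float, float]
Warping = Callable[[Point], Point]

@dataclass
class ShapeLibrary:
    """Illustrative data structure: meshes plus trained pairwise warpings.
    Structure and names are assumptions, not the authors' implementation."""
    meshes: Dict[str, object] = field(default_factory=dict)                 # shape_id -> 3D model
    warpings: Dict[Tuple[str, str], Warping] = field(default_factory=dict)  # (src, tgt) -> warp

    def add_shape(self, shape_id: str, mesh: object) -> None:
        self.meshes[shape_id] = mesh

    def add_warping(self, src: str, tgt: str, warp: Warping) -> None:
        self.warpings[(src, tgt)] = warp

    def transfer_points(self, src: str, tgt: str, points: List[Point]) -> List[Point]:
        """Map annotated points on the source shape onto the target shape."""
        warp = self.warpings[(src, tgt)]
        return [warp(p) for p in points]

# Toy usage: a rigid translation stands in for a learned warping.
lib = ShapeLibrary()
lib.add_shape("mug_a", None)
lib.add_shape("mug_b", None)
lib.add_warping("mug_a", "mug_b", lambda p: (p[0] + 0.1, p[1], p[2]))
transferred = lib.transfer_points("mug_a", "mug_b", [(0.0, 0.0, 0.0)])
```

The "plug-and-play" property then amounts to the warpings being queryable for any stored (source, target) pair without retraining.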
If this is right
- Policies trained on ShapeGen data exhibit higher success rates on unseen shapes within the same category during real-world deployment.
- Large-scale shape-diversified datasets can be produced without simulators or exhaustive object collections.
- Each new demonstration set requires only minimal additional human annotation while preserving functional correctness.
- The generated demonstrations transfer directly to physical robots without further simulation-to-real adaptation.
Where Pith is reading between the lines
- The same Shape Library could be reused across multiple manipulation skills for one category, amortizing the initial warping training cost.
- Functional point mappings might support automatic transfer of annotations between categories that share abstract part structures.
- Combining the library with existing simulation pipelines could further increase data volume while retaining the 3D functional alignment.
Load-bearing premise
The trained spatial warpings reliably map points to functionally corresponding locations across shapes, and the resulting generated demonstrations remain physically plausible and functionally correct after only minimal human annotation.
What would settle it
A controlled real-world trial measuring success rates on novel shapes for policies trained with versus without ShapeGen-generated data; if no significant gap appears between the two conditions, the claim fails.
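Such a trial reduces to comparing two success proportions. A minimal sketch of that comparison, with hypothetical counts (40/50 vs. 22/50) standing in for real results:

```python
import math

def success_gap_z(success_a, n_a, success_b, n_b):
    """Two-proportion z-statistic for comparing policy success rates
    (e.g., trained with vs. without ShapeGen data). Counts are hypothetical."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p = (success_a + success_b) / (n_a + n_b)          # pooled success rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # pooled standard error
    return (p_a - p_b) / se

# Hypothetical trial: 40/50 successes with ShapeGen vs. 22/50 without.
z = success_gap_z(40, 50, 22, 50)  # z ≈ 3.71, well above the 1.96 threshold
```

A z-statistic below roughly 1.96 (at the 5% level) would be the "gap disappears" outcome that falsifies the claim.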
Original abstract
Manipulation policies deployed in uncontrolled real-world scenarios are faced with great in-category geometric diversity of everyday objects. In order to function robustly under such variations, policies need to work in a category-level manner, i.e. knowing how to interact with any object in a certain category, instead of only a specific one seen during training. This in-category generalizability is usually nurtured with shape-diversified training data; however, manually collecting such a corpus of data is infeasible due to the requirement of intense human labor and large collections of divergent objects at hand. In this paper, we propose ShapeGen, a data generation method that aims at generating shape-variated manipulation data in a simulator-free and 3D manner. ShapeGen decomposes the process into two stages: Shape Library curation and Function-Aware Generation. In the first stage, we train spatial warpings between shapes mapping points to points that correspond functionally, and aggregate 3D models along with the warpings into a plug-and-play Shape Library. In the second stage, we design a pipeline that, leveraging established Libraries, requires only minimal human annotation to generate physically plausible and functionally correct novel demonstrations. Experiments in the real world demonstrate the effectiveness of ShapeGen to boost policies' in-category shape generalizability. Project page: https://wangyr22.github.io/ShapeGen/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ShapeGen, a simulator-free method for generating shape-diverse manipulation demonstrations to improve category-level policy generalization. It consists of two stages: (1) curating a Shape Library by training spatial warpings that map functionally corresponding points across 3D models, and (2) a Function-Aware Generation pipeline that transfers demonstration trajectories using these warpings with only minimal human annotation to produce novel, plausible data. The central claim is that real-world experiments show this boosts policies' in-category shape generalizability.
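Stage (2) can be pictured as pushing demonstrated waypoints through the learned warping and then applying the minimal human corrections. The sketch below is an illustrative reading, not the paper's pipeline; `transfer_trajectory`, the scaling warp, and the offsets are assumptions.

```python
def transfer_trajectory(waypoints, warp, offsets=None):
    """Sketch of Function-Aware Generation: map each demonstrated 3D
    waypoint through the source->target warping, then apply optional
    human-annotated corrections. Interfaces are illustrative."""
    warped = [warp(p) for p in waypoints]
    if offsets is not None:
        warped = [tuple(w_c + o_c for w_c, o_c in zip(w, o))
                  for w, o in zip(warped, offsets)]
    return warped

# Toy warping: uniform 1.2x scaling, as if the target mug were 20% larger.
scale_warp = lambda p: tuple(1.2 * c for c in p)
demo = [(0.0, 0.05, 0.10), (0.0, 0.05, 0.02)]  # approach, then grasp point
new_traj = transfer_trajectory(demo, scale_warp)
```

Under this reading, the method's load-bearing step is the warping itself; the annotation step only corrects residual errors.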
Significance. If the generated demonstrations are shown to be physically plausible and functionally correct at scale, ShapeGen could meaningfully reduce the human labor and object-collection costs of building diverse training corpora for category-level manipulation, a persistent bottleneck in real-world robotics. The simulator-free, 3D warping approach is a distinctive technical choice that avoids simulation-to-real gaps, but its assessed significance is limited by the absence of supporting quantitative evidence.
Major comments (3)
- [Abstract] Abstract: the claim that 'Experiments in the real world demonstrate the effectiveness of ShapeGen to boost policies' in-category shape generalizability' is unsupported by any quantitative metrics, baselines, success rates, or failure-case analysis. This is load-bearing for the central contribution and prevents evaluation of whether the method actually improves generalization.
- [§3 (Function-Aware Generation)] Function-Aware Generation pipeline (described in the abstract and §3): the assertion that warped trajectories remain 'physically plausible and functionally correct' after minimal human annotation lacks any quantitative validation, such as penetration-depth statistics, torque-limit checks, contact-constraint preservation metrics, or success-rate comparisons when replayed on target shapes. Without these, it is impossible to confirm that purely geometric warpings preserve dynamics and stability.
- [§4 (Experiments)] Experimental validation: no details are supplied on how physical plausibility or functional correctness of the generated demonstrations was verified (e.g., via motion capture, force sensing, or policy rollout success), nor are any ablation studies or comparisons to alternative data-generation methods presented. This directly undermines the real-world effectiveness claim.
Minor comments (2)
- [§2 (Shape Library curation)] The description of how spatial warpings are trained and aggregated into the plug-and-play Shape Library would benefit from explicit notation for the warping function and any regularization terms used to enforce functional correspondence.
- Figure captions and the project-page reference could more clearly indicate which qualitative results correspond to the quantitative claims (once added) to improve readability.
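One plausible way to meet the notation request above, assuming a Deep Implicit Templates-style formulation [46]; the symbols $W_\theta$, $z_{i \to j}$, $\mathcal{C}$, and $\lambda$ are illustrative, not taken from the paper:

```latex
% Hypothetical notation: W_theta warps a point on shape S_i to its
% functional correspondent on shape S_j, conditioned on a pair code z.
W_\theta : \mathbb{R}^3 \times \mathcal{Z} \to \mathbb{R}^3,
\qquad \hat{x}_j = W_\theta(x_i;\, z_{i \to j})

% Training could combine a correspondence term over annotated pairs C
% with a smoothness regularizer keeping the warp close to the identity:
\mathcal{L}(\theta) =
\sum_{(x_i, x_j) \in \mathcal{C}}
  \big\| W_\theta(x_i;\, z_{i \to j}) - x_j \big\|_2^2
\;+\; \lambda \, \mathbb{E}_{x}
  \big\| \nabla_x W_\theta(x;\, z_{i \to j}) - I \big\|_F^2
```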
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which identifies key areas where additional quantitative evidence can strengthen the presentation of our results. We address each major comment below and will revise the manuscript to incorporate the suggested details and metrics.
Point-by-point responses
-
Referee: [Abstract] Abstract: the claim that 'Experiments in the real world demonstrate the effectiveness of ShapeGen to boost policies' in-category shape generalizability' is unsupported by any quantitative metrics, baselines, success rates, or failure-case analysis. This is load-bearing for the central contribution and prevents evaluation of whether the method actually improves generalization.
Authors: We agree that the abstract claim would be more convincing with explicit quantitative support. In the revised manuscript, we will update the abstract to reference specific success rates (e.g., policy performance on held-out shapes with and without ShapeGen data) and will add a results table plus failure-case analysis in Section 4 to enable direct evaluation of the generalization gains. revision: yes
-
Referee: [§3 (Function-Aware Generation)] Function-Aware Generation pipeline (described in the abstract and §3): the assertion that warped trajectories remain 'physically plausible and functionally correct' after minimal human annotation lacks any quantitative validation, such as penetration-depth statistics, torque-limit checks, contact-constraint preservation metrics, or success-rate comparisons when replayed on target shapes. Without these, it is impossible to confirm that purely geometric warpings preserve dynamics and stability.
Authors: We acknowledge that quantitative checks would better substantiate the plausibility claim. Although the minimal-annotation step allows targeted corrections for functional correctness, we will add supporting metrics in the revision, including average penetration-depth statistics obtained via post-generation collision checks, contact-constraint preservation rates, and success rates when the warped trajectories are replayed on the target objects. revision: yes
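The proposed penetration-depth check can be sketched with a signed distance function (negative inside the object). The sphere SDF and the trajectory below are toy stand-ins, not the paper's geometry:

```python
import math

def max_penetration(points, sdf):
    """Worst-case penetration depth of trajectory points into an object,
    given a signed distance function that is negative inside the surface.
    A stand-in for the post-generation collision checks proposed above."""
    depths = [max(0.0, -sdf(p)) for p in points]
    return max(depths)

# Unit sphere at the origin as a toy object.
sphere_sdf = lambda p: math.sqrt(sum(c * c for c in p)) - 1.0

# Two clearance waypoints and one that dips 0.1 below the surface.
traj = [(1.5, 0.0, 0.0), (1.01, 0.0, 0.0), (0.9, 0.0, 0.0)]
depth = max_penetration(traj, sphere_sdf)  # ≈ 0.1
```

Reporting this statistic over all generated demonstrations (e.g., mean and maximum penetration) would directly quantify the "physically plausible" claim.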
-
Referee: [§4 (Experiments)] Experimental validation: no details are supplied on how physical plausibility or functional correctness of the generated demonstrations was verified (e.g., via motion capture, force sensing, or policy rollout success), nor are any ablation studies or comparisons to alternative data-generation methods presented. This directly undermines the real-world effectiveness claim.
Authors: We will expand Section 4 to describe the verification procedure in detail: all generated demonstrations were executed on the physical robot, with success defined by task completion without object drops or functional failures. We will also include ablation studies isolating the contribution of ShapeGen data and, where data permits, comparisons against alternative generation approaches to address the lack of supporting details. revision: yes
Circularity Check
No circularity: pipeline relies on external training, annotation, and real-world validation
Full rationale
The paper trains spatial warpings on 3D models to establish functional point correspondences, aggregates them into a Shape Library, and applies a pipeline with minimal human annotation to produce new demonstrations. Policy performance is then measured in separate real-world experiments. None of these steps reduce a claimed result to a fitted parameter or self-citation by construction; the final effectiveness claim is an external empirical outcome rather than a renaming or tautological reuse of the warping fit itself.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, et al. Cosmos-Transfer1: Conditional world generation with adaptive multimodal control. arXiv preprint arXiv:2503.14492, 2025
-
[2]
World Simulation with Video Foundation Models for Physical AI
Arslan Ali, Junjie Bai, Maciej Bala, Yogesh Balaji, Aaron Blakeman, Tiffany Cai, Jiaxin Cao, Tianshi Cao, Elizabeth Cha, Yu-Wei Chao, et al. World simulation with video foundation models for physical AI. arXiv preprint arXiv:2511.00062, 2025
-
[3]
$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control
Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. π0: A vision-language-action flow model for general robot control. arXiv preprint arXiv:2410.24164, 2024
-
[4]
Agibot World Colosseo: A large-scale manipulation platform for scalable and intelligent embodied systems
Qingwen Bu, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao, Xindong He, Xuan Hu, Xu Huang, et al. Agibot World Colosseo: A large-scale manipulation platform for scalable and intelligent embodied systems. In IROS, 2025
-
[5]
RoVi-Aug: Robot and viewpoint augmentation for cross-embodiment robot learning
Lawrence Yunliang Chen, Chenfeng Xu, Karthik Dharmarajan, Muhammad Zubair Irshad, Richard Cheng, Kurt Keutzer, Masayoshi Tomizuka, Quan Vuong, and Ken Goldberg. RoVi-Aug: Robot and viewpoint augmentation for cross-embodiment robot learning. In CoRL, 2024
-
[6]
SAM 3D: 3Dfy Anything in Images
Xingyu Chen, Fu-Jen Chu, Pierre Gleize, Kevin J Liang, Alexander Sax, Hao Tang, Weiyao Wang, Michelle Guo, Thibaut Hardin, Xiang Li, et al. SAM 3D: 3Dfy anything in images. arXiv preprint arXiv:2511.16624, 2025
-
[7]
NOD-TAMP: Generalizable long-horizon planning with neural object descriptors
Shuo Cheng, Caelan Garrett, Ajay Mandlekar, and Danfei Xu. NOD-TAMP: Generalizable long-horizon planning with neural object descriptors. In CoRL, 2024
-
[8]
Diffusion Policy: Visuomotor policy learning via action diffusion
Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. Diffusion Policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, 44(10-11):1684–1704, 2025
-
[9]
ReBot: Scaling robot learning with real-to-sim-to-real robotic video synthesis
Yu Fang, Yue Yang, Xinghao Zhu, Kaiyuan Zheng, Gedas Bertasius, Daniel Szafir, and Mingyu Ding. ReBot: Scaling robot learning with real-to-sim-to-real robotic video synthesis. In IROS, 2025
-
[10]
Generate, Transfer, Adapt: Learning Functional Dexterous Grasping from a Single Human Demonstration
Xingyi He, Adhitya Polavaram, Yunhao Cao, Om Deshmukh, Tianrui Wang, Xiaowei Zhou, and Kuan Fang. Generate, transfer, adapt: Learning functional dexterous grasping from a single human demonstration. arXiv preprint arXiv:2601.05243, 2026
-
[11]
Data scaling laws in imitation learning for robotic manipulation
Yingdong Hu, Fanqi Lin, Pingyue Sheng, Chuan Wen, Jiacheng You, and Yang Gao. Data scaling laws in imitation learning for robotic manipulation. In ICLR, 2025
-
[12]
$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, et al. π0.5: A vision-language-action model with open-world generalization. arXiv preprint arXiv:2504.16054, 2025
-
[13]
Robo-ABC: Affordance generalization beyond categories via semantic correspondence for robot manipulation
Yuanchen Ju, Kaizhe Hu, Guowei Zhang, Gu Zhang, Mingrun Jiang, and Huazhe Xu. Robo-ABC: Affordance generalization beyond categories via semantic correspondence for robot manipulation. In ECCV, 2024
-
[14]
3D Gaussian Splatting for real-time radiance field rendering
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian Splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139:1–139:14, 2023
-
[15]
DROID: A large-scale in-the-wild robot manipulation dataset
Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, et al. DROID: A large-scale in-the-wild robot manipulation dataset. In RSS, 2024
-
[16]
OpenVLA: An open-source vision-language-action model
Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, et al. OpenVLA: An open-source vision-language-action model. In CoRL, 2024
-
[17]
RAM: Retrieval-based affordance transfer for generalizable zero-shot robotic manipulation
Yuxuan Kuang, Junjie Ye, Haoran Geng, Jiageng Mao, Congyue Deng, Leonidas Guibas, He Wang, and Yue Wang. RAM: Retrieval-based affordance transfer for generalizable zero-shot robotic manipulation. In CoRL, 2024
-
[18]
Constraint-preserving data generation for visuomotor policy learning
Kevin Lin, Varun Ragunath, Andrew McAlinden, Aaditya Prasad, Jimmy Wu, Yuke Zhu, and Jeannette Bohg. Constraint-preserving data generation for visuomotor policy learning. arXiv preprint arXiv:2508.03944, 2025
-
[19]
RoboTransfer: Geometry-consistent video diffusion for robotic visual policy transfer
Liu Liu, Xiaofeng Wang, Guosheng Zhao, Keyu Li, Wenkang Qin, Jiaxiong Qiu, Zheng Zhu, Guan Huang, and Zhizhong Su. RoboTransfer: Geometry-consistent video diffusion for robotic visual policy transfer. arXiv preprint arXiv:2505.23171, 2025
-
[20]
Geometry-aware 4D video generation for robot manipulation
Zeyi Liu, Shuang Li, Eric Cousineau, Siyuan Feng, Benjamin Burchfiel, and Shuran Song. Geometry-aware 4D video generation for robot manipulation. arXiv preprint arXiv:2507.01099, 2025
-
[21]
MimicGen: A data generation system for scalable robot learning using human demonstrations
Ajay Mandlekar, Soroush Nasiriany, Bowen Wen, Iretiayo Akinola, Yashraj Narang, Linxi Fan, Yuke Zhu, and Dieter Fox. MimicGen: A data generation system for scalable robot learning using human demonstrations. In CoRL, 2023
-
[22]
NeRF: Representing scenes as neural radiance fields for view synthesis
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021
-
[23]
Instant neural graphics primitives with a multiresolution hash encoding
Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (TOG), 41(4):1–15, 2022
-
[24]
Open X-Embodiment: Robotic learning datasets and RT-X models
Abby O'Neill, Abdul Rehman, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, et al. Open X-Embodiment: Robotic learning datasets and RT-X models. In ICRA, 2024
-
[25]
DeepSDF: Learning continuous signed distance functions for shape representation
Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. DeepSDF: Learning continuous signed distance functions for shape representation. In CVPR, 2019
-
[26]
WristWorld: Generating wrist-views via 4D world models for robotic manipulation
Zezhong Qian, Xiaowei Chi, Yuming Li, Shizun Wang, Zhiyuan Qin, Xiaozhu Ju, Sirui Han, and Shanghang Zhang. WristWorld: Generating wrist-views via 4D world models for robotic manipulation. arXiv preprint arXiv:2510.07313, 2025
-
[27]
SAM 2: Segment anything in images and videos
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, et al. SAM 2: Segment anything in images and videos. In ICLR, 2025
-
[28]
What matters in learning from large-scale datasets for robot manipulation
Vaibhav Saxena, Matthew Bronars, Nadun Ranawaka Arachchige, Kuancheng Wang, Woo Chul Shin, Soroush Nasiriany, Ajay Mandlekar, and Danfei Xu. What matters in learning from large-scale datasets for robot manipulation. In ICLR, 2025
-
[29]
DINOv3
Oriane Siméoni, Huy V Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, et al. DINOv3. arXiv preprint arXiv:2508.10104, 2025
-
[30]
Neural Descriptor Fields: SE(3)-equivariant object representations for manipulation
Anthony Simeonov, Yilun Du, Andrea Tagliasacchi, Joshua B Tenenbaum, Alberto Rodriguez, Pulkit Agrawal, and Vincent Sitzmann. Neural Descriptor Fields: SE(3)-equivariant object representations for manipulation. In ICRA, 2022
-
[31]
MimicFunc: Imitating tool manipulation from a single human video via functional correspondence
Chao Tang, Anxing Xiao, Yuhong Deng, Tianrun Hu, Wenlong Dong, Hanbo Zhang, David Hsu, and Hong Zhang. MimicFunc: Imitating tool manipulation from a single human video via functional correspondence. arXiv preprint arXiv:2508.13534, 2025
-
[32]
VGGT: Visual geometry grounded transformer
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. VGGT: Visual geometry grounded transformer. In CVPR, 2025
-
[33]
SparseDFF: Sparse-view feature distillation for one-shot dexterous manipulation
Qianxu Wang, Haotong Zhang, Congyue Deng, Yang You, Hao Dong, Yixin Zhu, and Leonidas Guibas. SparseDFF: Sparse-view feature distillation for one-shot dexterous manipulation. In ICLR, 2024
-
[34]
D³Fields: Dynamic 3D descriptor fields for zero-shot generalizable rearrangement
Yixuan Wang, Mingtong Zhang, Zhuoran Li, Tarik Kelestemur, Katherine Driggs-Campbell, Jiajun Wu, Li Fei-Fei, and Yunzhu Li. D³Fields: Dynamic 3D descriptor fields for zero-shot generalizable rearrangement. In CoRL, 2024
-
[35]
FoundationPose: Unified 6D pose estimation and tracking of novel objects
Bowen Wen, Wei Yang, Jan Kautz, and Stan Birchfield. FoundationPose: Unified 6D pose estimation and tracking of novel objects. In CVPR, 2024
-
[36]
R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation
Xiuwei Xu, Angyuan Ma, Hankun Li, Bingyao Yu, Zheng Zhu, Jie Zhou, and Jiwen Lu. R2RGen: Real-to-real 3D data generation for spatially generalized manipulation. arXiv preprint arXiv:2510.08547, 2025
-
[37]
DemoGen: Synthetic demonstration generation for data-efficient visuomotor policy learning
Zhengrong Xue, Shuying Deng, Zhenyang Chen, Yixuan Wang, Zhecheng Yuan, and Huazhe Xu. DemoGen: Synthetic demonstration generation for data-efficient visuomotor policy learning. In RSS, 2025
-
[38]
ManiFlow: A general robot manipulation policy via consistency flow training
Ge Yan, Jiyue Zhu, Yuquan Deng, Shiqi Yang, Ri-Zhao Qiu, Xuxin Cheng, Marius Memmel, Ranjay Krishna, Ankit Goyal, Xiaolong Wang, et al. ManiFlow: A general robot manipulation policy via consistency flow training. arXiv preprint arXiv:2509.01819, 2025
-
[39]
Novel demonstration generation with Gaussian splatting enables robust one-shot manipulation
Sizhe Yang, Wenye Yu, Jia Zeng, Jun Lv, Kerui Ren, Cewu Lu, Dahua Lin, and Jiangmiao Pang. Novel demonstration generation with Gaussian splatting enables robust one-shot manipulation. arXiv preprint arXiv:2504.13175, 2025
-
[40]
Real2Render2Real: Scaling robot data without dynamics simulation or robot hardware
Justin Yu, Letian Fu, Huang Huang, Karim El-Refai, Rares Andrei Ambrus, Richard Cheng, Muhammad Zubair Irshad, and Ken Goldberg. Real2Render2Real: Scaling robot data without dynamics simulation or robot hardware. In CoRL, 2025
-
[41]
Scaling robot learning with semantically imagined experience
Tianhe Yu, Ted Xiao, Austin Stone, Jonathan Tompson, Anthony Brohan, Su Wang, Jaspiar Singh, Clayton Tan, Jodilyn Peralta, Brian Ichter, et al. Scaling robot learning with semantically imagined experience. In RSS, 2023
-
[42]
RoboEngine: Plug-and-play robot data augmentation with semantic robot segmentation and background generation
Chengbo Yuan, Suraj Joshi, Shaoting Zhu, Hang Su, Hang Zhao, and Yang Gao. RoboEngine: Plug-and-play robot data augmentation with semantic robot segmentation and background generation. In IROS, 2025
-
[43]
Generalizable humanoid manipulation with improved 3D diffusion policies
Yanjie Ze, Zixuan Chen, Wenhao Wang, Tianyi Chen, Xialin He, Ying Yuan, Xue Bin Peng, and Jiajun Wu. Generalizable humanoid manipulation with improved 3D diffusion policies. arXiv preprint arXiv:2410.10803, 2024
-
[44]
3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, and Huazhe Xu. 3D Diffusion Policy: Generalizable visuomotor policy learning via simple 3D representations. arXiv preprint arXiv:2403.03954, 2024
-
[45]
Real2edit2real: Generating robotic demonstrations via a 3d control interface
Yujie Zhao, Hongwei Fan, Di Chen, Shengcong Chen, Liliang Chen, Xiaoqi Li, Guanghui Ren, and Hao Dong. Real2Edit2Real: Generating robotic demonstrations via a 3D control interface. In CVPR, 2026
-
[46]
Deep implicit templates for 3d shape representation
Zerong Zheng, Tao Yu, Qionghai Dai, and Yebin Liu. Deep implicit templates for 3D shape representation. In CVPR, 2021
-
[47]
RoboDreamer: Learning compositional world models for robot imagination
Siyuan Zhou, Yilun Du, Jiaben Chen, Yandong Li, Dit-Yan Yeung, and Chuang Gan. RoboDreamer: Learning compositional world models for robot imagination. In ICML, 2024
-
[48]
DenseMatcher: Learning 3D semantic correspondence for category-level manipulation from a single demo
Junzhe Zhu, Yuanchen Ju, Junyi Zhang, Muhan Wang, Zhecheng Yuan, Kaizhe Hu, and Huazhe Xu. DenseMatcher: Learning 3D semantic correspondence for category-level manipulation from a single demo. In ICLR, 2025
Discussion (0)