Beyond Pixels: Learning Invariant Rewards for Real-World Robotics From a Few Demonstrations
Pith reviewed 2026-05-22 05:27 UTC · model grok-4.3
The pith
Learning invariant symbolic rewards from few demonstrations enables zero-shot generalization across visual changes in robot manipulation tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that invariant symbolic reward functions can be learned from as few as five demonstrations by shifting focus to task-level properties that remain constant across visual instantiations. This is realized through two coupled components: a structural reward formulation that encodes task-level strategies and physical constraints while preserving optimal policy invariance, and a hybrid symbolic-numerical procedure that distills these invariants from demonstrations. Experiments show stronger process alignment and policy rollout ranking on eight Meta-World tasks and three Franka tasks, faster downstream learning, and zero-shot transfer in three real-world out-of-distribution tests.
What carries the argument
The structural reward formulation that encodes task-level strategies and physical constraints while preserving optimal policy invariance, coupled with a hybrid symbolic-numerical procedure to distill invariants from demonstrations.
If this is right
- The method produces stronger process alignment and better policy rollout ranking than baselines on eight Meta-World tasks and three Franka manipulation tasks.
- Downstream policy learning is accelerated when using the learned reward.
- A single reward transfers zero-shot to new positions, viewpoints, and objects in real-world experiments without retraining or online interaction.
Where Pith is reading between the lines
- The same invariants could let a robot reuse one reward model for a family of related manipulation problems that differ only in surface appearance.
- Extending the structural constraints to include additional physical rules might handle tasks with deformable objects or partial observability.
- If the distillation procedure scales, it could reduce reliance on hand-crafted rewards when deploying robots in unstructured environments.
Load-bearing premise
Task-level properties and the ranking of optimal policies stay the same even when object instances, positions, and viewpoints change substantially.
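To make the premise concrete, here is a minimal sketch of what "staying the same" can mean for one kind of task-level property. It assumes a hypothetical toy reward built from the relative distance between gripper and goal (the rebuttal below names relative goal distances as an example predicate) and checks that its value does not change when both points are re-expressed in a rotated and translated frame. The function names, the 2D setting, and the numbers are illustrative assumptions, not the paper's formulation, and the check covers only rigid frame changes, not perspective projection or changes in object geometry.

```python
import math


def rigid_transform(point, angle_rad, shift):
    """Re-express a 2D point in a frame rotated by angle_rad and translated by shift."""
    x, y = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1])


def relational_reward(gripper, goal):
    """Hypothetical reward: negative gripper-to-goal distance (a relational quantity)."""
    return -math.dist(gripper, goal)


def raw_coordinate_reward(gripper):
    """Contrast case: a reward tied to one absolute coordinate of the observation frame."""
    return -gripper[0]


# Illustrative positions and frame change; none of these values come from the paper.
gripper, goal = (0.10, 0.25), (0.40, 0.65)
angle, shift = math.radians(37.0), (1.5, -0.8)
gripper_moved = rigid_transform(gripper, angle, shift)
goal_moved = rigid_transform(goal, angle, shift)

relational_before = relational_reward(gripper, goal)
relational_after = relational_reward(gripper_moved, goal_moved)
raw_before = raw_coordinate_reward(gripper)
raw_after = raw_coordinate_reward(gripper_moved)
```

The relational reward is the same before and after the frame change, while the reward tied to a raw coordinate is not. Whether the paper's learned predicates behave like the first case under real camera and object changes is exactly what the premise asserts and what this toy check cannot establish.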
What would settle it
A demonstration that the learned reward ranks unsuccessful policies higher than successful ones or fails to produce working rollouts under new object, position, or viewpoint conditions beyond the three tested real-world variations.
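One way to run such a check is a pairwise ranking score over rollouts. The sketch below is an assumption about how the test could be set up, not the paper's "policy rollout ranking" metric, whose exact computation is not given here. It takes one return per rollout under the learned reward plus a ground-truth success flag, and reports the fraction of (successful, failed) pairs in which the successful rollout scores higher. The function name and the toy numbers are hypothetical.

```python
from itertools import product


def pairwise_ranking_accuracy(returns, successes):
    """Fraction of (successful, failed) rollout pairs ranked correctly.

    A pair counts as 1 when the successful rollout has the strictly higher
    return, 0.5 on a tie, and 0 otherwise.
    """
    good = [r for r, ok in zip(returns, successes) if ok]
    bad = [r for r, ok in zip(returns, successes) if not ok]
    if not good or not bad:
        raise ValueError("need at least one successful and one failed rollout")
    score = 0.0
    for g, b in product(good, bad):
        if g > b:
            score += 1.0
        elif g == b:
            score += 0.5
    return score / (len(good) * len(bad))


# Placeholder values for illustration only; they are not results from the paper.
toy_returns = [4.0, 3.5, 1.0, 3.8]
toy_successes = [True, True, False, False]
toy_accuracy = pairwise_ranking_accuracy(toy_returns, toy_successes)
```

A score near 1.0 under a new object, position, or viewpoint would be consistent with the claim; a score at or below 0.5 would mean the reward orders failures above successes about as often as not, which is the falsifying outcome described above.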
Original abstract
Designing reward functions that generalize beyond controlled laboratory settings remains a fundamental challenge in reinforcement learning for robotics. In open-world manipulation problems, a single task can appear in numerous variants through different object instances, positions, and camera viewpoints. Recent vision-based reward models tend to memorize specific pixel distributions and fail to generalize beyond their training conditions. To address this, we propose a framework that learns invariant symbolic reward functions from as few as five demonstrations. The insight is to shift from visual feature-fitting to the discovery of behavioral invariants: task-level properties that remain constant across diverse visual instantiations. The framework has two coupled components: a structural reward formulation that encodes task-level strategies and physical constraints while preserving optimal policy invariance, and a hybrid symbolic-numerical procedure that distills these invariants from demonstrations without online interaction. Experiments on eight Meta-World tasks and three Franka manipulation tasks demonstrate that our method achieves stronger process alignment and policy rollout ranking abilities compared to baselines, accelerating downstream policy learning. Three real-world out-of-distribution experiments further show that the same learned reward generalizes zero-shot to position, viewpoint, and object variations, enabling a single reward representation to be reused across diverse task variants in practice.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a framework to learn invariant symbolic reward functions from as few as five demonstrations for robotic manipulation. It introduces two coupled components: a structural reward formulation encoding task-level strategies and physical constraints while preserving optimal policy invariance, and a hybrid symbolic-numerical distillation procedure that extracts these invariants without online interaction. Experiments on eight Meta-World tasks and three Franka tasks show improved process alignment and rollout ranking over baselines, with three real-world out-of-distribution (OOD) tests demonstrating zero-shot generalization to position, viewpoint, and object variations.
Significance. If the invariance properties and generalization results hold under scrutiny, the work could meaningfully advance reward learning in robotics by moving beyond pixel-memorization approaches, enabling reusable rewards across visual variants and reducing demonstration requirements for policy learning in open-world settings.
major comments (2)
- [§3.2, Structural Reward Formulation] The claim that the structural reward encodes task-level properties while preserving optimal policy invariance across visual instantiations (camera parameters, object geometry) is central to the zero-shot OOD transfer results. No derivation, invariance proof, or counterexample analysis is provided to establish that the chosen form remains invariant or optimality-preserving under these changes; if invariance fails for even one variant, the reported real-world generalization does not follow from the construction.
- [§5.3, Real-World OOD Experiments] The three real-world experiments report zero-shot transfer, but without access to full error analysis, variance across trials, or explicit checks that the structural form was not post-hoc adjusted to the test variants, it is difficult to confirm that the results support the invariance claim rather than task-specific fitting.
minor comments (2)
- [Abstract and §5] The metrics 'process alignment' and 'policy rollout ranking' are referenced in the abstract and results but lack a concise definition or pointer to their exact computation in the main text; adding this would improve readability.
- [Figures 4-6] Figure captions for the real-world setups could more explicitly label the variations (position, viewpoint, object) tested in each OOD case to make the generalization evidence easier to parse.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and outline the revisions we will make to strengthen the presentation of the invariance properties and experimental details.
- Referee: [§3.2, Structural Reward Formulation] The claim that the structural reward encodes task-level properties while preserving optimal policy invariance across visual instantiations (camera parameters, object geometry) is central to the zero-shot OOD transfer results. No derivation, invariance proof, or counterexample analysis is provided to establish that the chosen form remains invariant or optimality-preserving under these changes; if invariance fails for even one variant, the reported real-world generalization does not follow from the construction.
  Authors: We agree that an explicit derivation would clarify the central claim. The structural reward is defined over a symbolic state space consisting of task predicates (e.g., contact relations, relative goal distances) that are invariant to camera intrinsics, extrinsics, and object geometry by construction; the numerical component is used only for grounding and does not alter the symbolic structure. Because the reward depends solely on these invariant predicates, any policy that is optimal with respect to the original task remains optimal under visual transformations that preserve the symbolic state. We will add a concise derivation of this invariance property together with a short counterexample analysis in the revised Section 3.2.
  Revision: yes
- Referee: [§5.3, Real-World OOD Experiments] The three real-world experiments report zero-shot transfer, but without access to full error analysis, variance across trials, or explicit checks that the structural form was not post-hoc adjusted to the test variants, it is difficult to confirm that the results support the invariance claim rather than task-specific fitting.
  Authors: We acknowledge that additional statistical reporting and clarification on experimental procedure would increase confidence in the results. The real-world trials were performed with a fixed structural reward form determined exclusively from the five training demonstrations; no post-hoc modification occurred. We will expand Section 5.3 to include the complete per-variant success rates, standard deviations across repeated trials, and an explicit statement confirming that the symbolic structure was not adjusted after seeing the OOD test outcomes.
  Revision: yes
Circularity Check
No load-bearing circularity; invariance stated as property of formulation without reduction to inputs
The visible abstract and context describe a structural reward formulation that 'encodes task-level strategies and physical constraints while preserving optimal policy invariance' and a hybrid procedure that distills invariants from five demonstrations. No equations, fitted parameters, or self-citations are shown that would reduce a claimed prediction or zero-shot generalization to a fitted input or self-definition by construction. Experimental results on Meta-World, Franka, and real-world OOD cases function as independent benchmarks rather than tautological outputs. This matches the default expectation of no significant circularity for a high-level framework description.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Task-level properties remain constant across diverse visual instantiations.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We adopt a Potential-Based Reward Shaping (PBRS) formulation... the final reward signal is derived via a PBRS-style post-processing module, which mathematically guarantees optimal policy invariance."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "The structural reward formulation that encodes task-level strategies and physical constraints while preserving optimal policy invariance"
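The quoted PBRS passage rests on a standard result (Ng, Harada, and Russell, 1999): adding a shaping term of the form gamma * Phi(s') - Phi(s) to a reward leaves the set of optimal policies unchanged. The sketch below illustrates the mechanism behind that result on a toy trajectory: the shaping terms telescope, so the discounted return changes by an amount that depends only on the start and end states, not on the path taken. The potential function, the one-dimensional states, and the rewards are hypothetical placeholders; the paper's actual potential and post-processing module are not shown in this document.

```python
def potential(state):
    """Hypothetical potential: negative distance to a goal located at 1.0 on a line."""
    return -abs(1.0 - state)


def shaped_reward(reward, state, next_state, gamma):
    """Base reward plus the potential-based shaping term gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential(next_state) - potential(state)


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over the reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


gamma = 0.9
# Toy trajectory with one reward per transition; values are illustrative only.
states = [0.0, 0.3, 0.2, 0.7, 1.0]
base_rewards = [0.0, 0.0, 0.0, 1.0]

shaped_rewards = [
    shaped_reward(r, s, s_next, gamma)
    for r, s, s_next in zip(base_rewards, states, states[1:])
]

# Difference in discounted return introduced by shaping along this trajectory.
gap = discounted_return(shaped_rewards, gamma) - discounted_return(base_rewards, gamma)
# Telescoped closed form: gamma**T * Phi(s_T) - Phi(s_0).
expected_gap = gamma ** len(base_rewards) * potential(states[-1]) - potential(states[0])
```

Here gap and expected_gap agree up to floating-point error. Note that this guarantee concerns the shaping step only: it says the shaped and unshaped rewards share optimal policies, not that the underlying reward is invariant across visual variants, which is the separate question the referee raises about §3.2.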
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.