Recognition: 3 theorem links
· Lean TheoremTAVIS: A Benchmark for Egocentric Active Vision and Anticipatory Gaze in Imitation Learning
Pith reviewed 2026-05-11 02:59 UTC · model grok-4.3
The pith
TAVIS benchmark shows active vision improves imitation learning in a task-dependent manner while imitation alone produces anticipatory gaze matching human timing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TAVIS establishes that active-vision generally helps imitation learning for manipulation but benefits are task-conditional rather than uniform, that multi-task policies degrade sharply under controlled distribution shifts on both suites, and that imitation alone yields anticipatory gaze with median lead times comparable to the human teleoperator reference.
What carries the argument
TAVIS benchmark infrastructure, consisting of the paired headcam-vs-fixedcam protocol on identical demonstrations, the GALT (Gaze-Action Lead Time) metric, and procedural ID/OOD splits applied to the TAVIS-Head and TAVIS-Hands task suites.
If this is right
- Active vision provides performance gains that depend on task type rather than applying uniformly across manipulation settings.
- Multi-task imitation policies experience sharp degradation when encountering controlled distribution shifts in active-vision conditions.
- Imitation training from demonstrations alone is sufficient to produce anticipatory gaze whose timing matches human teleoperator references.
- The paired protocol and GALT metric together allow direct quantification of how much active vision contributes on each task.
Where Pith is reading between the lines
- If gaze anticipation emerges from imitation, then policies may implicitly learn useful viewpoint prediction as part of action forecasting in embodied settings.
- The task-conditional nature of benefits suggests that future policies could incorporate mechanisms to decide dynamically whether to move the camera.
- Testing the same primitives on physical hardware rather than simulation would reveal whether latency and sensor noise alter the observed advantages.
- Adding tasks that require longer-horizon planning could show whether anticipatory gaze scales beyond the current short-horizon manipulation suites.
Load-bearing premise
The selected tasks, embodiments, and distribution shifts in TAVIS-Head and TAVIS-Hands sufficiently represent the real challenges and benefits of active vision in imitation learning for manipulation.
What would settle it
Running the same baselines on a new set of manipulation tasks outside TAVIS and finding either uniform benefits or no benefits at all from active vision would falsify the task-conditional claim.
Figures
read the original abstract
Active vision -- where a policy controls its own gaze during manipulation -- has emerged as a key capability for imitation learning, with multiple independent systems demonstrating its benefits in the past year. Yet there is no shared benchmark to compare approaches or quantify what active vision contributes, on which task types, and under what conditions. We introduce TAVIS, evaluation infrastructure for active-vision imitation learning, with two complementary task suites -- TAVIS-Head (5 tasks, global search via pan/tilt necks) and TAVIS-Hands (3 tasks, local occlusion via wrist cameras) -- on two humanoid torso embodiments (GR1T2, Reachy2), built on IsaacLab. TAVIS provides three evaluation primitives: a paired headcam-vs-fixedcam protocol on identical demonstrations; GALT (Gaze-Action Lead Time), a novel metric grounded in cognitive science and HRI that quantifies anticipatory gaze in learned policies; and procedural ID/OOD splits. Baseline experiments with Diffusion Policy and $\pi_0$ reveal that (i) active-vision generally helps, but benefits are task-conditional rather than uniform; (ii) multi-task policies degrade sharply under controlled distribution shifts on both suites; and (iii) imitation alone yields anticipatory gaze, with median lead times comparable to the human teleoperator reference. Code, evaluation scripts, demonstrations (LeRobot v3.0; ~2200 episodes) and trained baselines are released at https://github.com/spiglerg/tavis and https://huggingface.co/tavis-benchmark.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TAVIS, a benchmark for egocentric active vision and anticipatory gaze in imitation learning. It features two task suites: TAVIS-Head (5 tasks with global search via pan/tilt necks) and TAVIS-Hands (3 tasks with local occlusion via wrist cameras) on GR1T2 and Reachy2 humanoids in IsaacLab. The benchmark includes a paired headcam-vs-fixedcam protocol on identical demonstrations, the novel GALT metric for quantifying anticipatory gaze lead times, and procedural ID/OOD splits. Baseline experiments using Diffusion Policy and π0 show that active vision provides task-conditional benefits, multi-task policies degrade under distribution shifts, and imitation learning produces anticipatory gaze with median lead times similar to human teleoperators. Code, data, and models are released.
Significance. If the results hold, TAVIS offers a much-needed standardized evaluation platform for active-vision approaches in imitation learning, filling a gap in the field. The open release of ~2200 episodes, evaluation scripts, and baselines promotes reproducibility and comparison. The GALT metric, grounded in cognitive science and HRI, provides a new way to measure anticipatory behavior. The findings on task-conditional benefits and multi-task degradation highlight important considerations for policy design. The limited task set means broader significance depends on representativeness of these scenarios.
major comments (2)
- The claim that active-vision 'generally helps' is based on experiments with 8 tasks. This may overstate the generality given the specific embodiments and procedural splits; the task-conditional benefits are interesting but their broader implications require more qualification in the abstract.
- Limited details are provided on the number of runs, statistical tests, and data exclusion rules supporting the three findings. This weakens the strength of the empirical claims and should be expanded for reproducibility and confidence in the results.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for minor revision. We address the major comments below and have updated the manuscript to improve clarity and reproducibility.
read point-by-point responses
-
Referee: The claim that active-vision 'generally helps' is based on experiments with 8 tasks. This may overstate the generality given the specific embodiments and procedural splits; the task-conditional benefits are interesting but their broader implications require more qualification in the abstract.
Authors: We agree that the abstract should more explicitly qualify the generality of the findings. Although the original text already notes that benefits are 'task-conditional rather than uniform', we have revised the abstract to state: 'active vision provides task-conditional benefits to imitation learning' and removed the 'generally helps' phrasing to avoid any overstatement. We have also added a qualification in the introduction and discussion sections emphasizing that these results are based on the specific 8 tasks, two embodiments, and procedural splits, and that broader implications would require further validation. The task-dependent nature remains the key insight supported by the data. revision: yes
-
Referee: Limited details are provided on the number of runs, statistical tests, and data exclusion rules supporting the three findings. This weakens the strength of the empirical claims and should be expanded for reproducibility and confidence in the results.
Authors: We thank the referee for pointing this out. The original manuscript did not include sufficient experimental details. We have now expanded the 'Experiments' section and added a dedicated 'Reproducibility' subsection detailing: (1) all results are averaged over 5 independent runs with different random seeds; (2) statistical comparisons between headcam and fixedcam use paired t-tests with p < 0.05 for significance; (3) no episodes were excluded from the analysis—all ~2200 demonstrations were utilized. These details are provided to support the three main findings and enhance confidence in the results. revision: yes
Circularity Check
No circularity; empirical benchmark with direct comparisons
full rationale
This is an empirical benchmark paper introducing TAVIS task suites, paired headcam-vs-fixedcam protocols, the GALT metric, and procedural ID/OOD splits, followed by baseline experiments on Diffusion Policy and π0. No derivations, equations, fitted parameters, or predictions appear in the provided text or abstract; all claims rest on released code, data (~2200 episodes), and direct experimental measurements rather than any self-definitional, fitted-input, or self-citation reduction. The central observations (task-conditional benefits, multi-task degradation, and human-comparable lead times) are presented as outcomes of those comparisons, with no load-bearing step that reduces by construction to the paper's own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The five TAVIS-Head and three TAVIS-Hands tasks, along with the chosen distribution shifts, represent meaningful and generalizable challenges for egocentric active vision in manipulation.
invented entities (1)
-
GALT (Gaze-Action Lead Time) metric
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
GALT (Gaze-Action Lead Time), a novel metric grounded in cognitive science and HRI that quantifies anticipatory gaze in learned policies
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.leanJ_uniquely_calibrated_via_higher_derivative unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Baseline experiments with Diffusion Policy and π0 reveal that (i) active-vision generally helps, but benefits are task-conditional rather than uniform
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanabsolute_floor_iff_bare_distinguishability unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
procedural ID/OOD splits
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Cheng, Xuxin and Li, Jialong and Yang, Shiqi and Yang, Ge and Wang, Xiaolong , booktitle=. 2025 , organization=
work page 2025
-
[2]
2025 IEEE International Conference on Robotics and Automation (ICRA) , pages=
Active vision might be all you need: Exploring active vision in bimanual robotic manipulation , author=. 2025 IEEE International Conference on Robotics and Automation (ICRA) , pages=. 2025 , organization=
work page 2025
-
[3]
Conference on Robot Learning , pages=
Vision in Action: Learning Active Perception from Human Demonstrations , author=. Conference on Robot Learning , pages=. 2025 , organization=
work page 2025
-
[4]
Eye, Robot: Learning to Look to Act with a
Kerr, Justin and Hari, Kush and Weber, Ethan and Kim, Chung Min and Yi, Brent and Bonnen, Tyler and Goldberg, Ken and Kanazawa, Angjoo , booktitle=. Eye, Robot: Learning to Look to Act with a. 2025 , organization=
work page 2025
-
[5]
Look, Focus, Act: Efficient and Robust Robot Learning via Human Gaze and Foveated Vision Transformers , author=. arXiv e-prints , pages=
-
[6]
Yu, Justin and Shentu, Yide and Wu, Di and Abbeel, Pieter and Goldberg, Ken and Wu, Philipp , journal=
-
[8]
Liu, Yushan and Mu, Shilong and Chao, Xintao and Li, Zizhen and Mu, Yao and Chen, Tianxing and Li, Shoujie and Lyu, Chuqiao and Zhang, Xiao-ping and Ding, Wenbo , journal=
-
[9]
Hong Kong Journal of Occupational Therapy , volume=
Temporal differences in eye--hand coordination between children and adults during manual action on objects , author=. Hong Kong Journal of Occupational Therapy , volume=. 2018 , publisher=
work page 2018
-
[10]
Saccadic eye movements in a high-speed bimanual stacking task: Changes of attentional control during learning and automatization , author=. Journal of vision , volume=. 2011 , publisher=
work page 2011
-
[11]
Journal of neuroscience , volume=
Eye--hand coordination in object manipulation , author=. Journal of neuroscience , volume=. 2001 , publisher=
work page 2001
-
[12]
The roles of vision and eye movements in the control of activities of daily living , author=. Perception , volume=. 1999 , publisher=
work page 1999
-
[13]
The 23rd IEEE International Symposium on robot and human interactive communication , pages=
Legible robot pointing , author=. The 23rd IEEE International Symposium on robot and human interactive communication , pages=. 2014 , organization=
work page 2014
-
[14]
2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI) , pages=
Legibility and predictability of robot motion , author=. 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI) , pages=. 2013 , organization=
work page 2013
-
[15]
Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction , pages=
Deliberate delays during robot-to-human handovers improve compliance with gaze communication , author=. Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction , pages=
work page 2014
-
[16]
Biological cybernetics , volume=
Learning robotic eye--arm--hand coordination from human demonstration: a coupled dynamical systems approach , author=. Biological cybernetics , volume=. 2014 , publisher=
work page 2014
-
[17]
Spatiotemporal characteristics of eye-hand coordination among different skill levels in laparoscopic surgery , author=. Surgical Endoscopy , volume=. 2026 , publisher=
work page 2026
-
[18]
Prime and Reach: Synthesising Body Motion for Gaze-Primed Object Reach , author=. arXiv e-prints , pages=
-
[19]
Robots can be perceived as goal-oriented agents , author=. Interaction Studies , volume=. 2013 , publisher=
work page 2013
-
[20]
RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track , year=
-
[21]
Liu, Bo and Zhu, Yifeng and Gao, Chongkai and Feng, Yihao and Liu, Qiang and Zhu, Yuke and Stone, Peter , journal=
-
[22]
Zhou, Xueyang and Xu, Yangming and Tie, Guiyao and Chen, Yongchao and Zhang, Guowen and Chu, Duanfeng and Zhou, Pan and Sun, Lichao , journal=
-
[23]
Fei, Senyu and Wang, Siyin and Shi, Junhao and Dai, Zihao and Cai, Jikun and Qian, Pengfang and Ji, Li and He, Xinzhe and Zhang, Shiduo and Fei, Zhaoye and others , journal=
-
[24]
Distracted Robot: How Visual Clutter Undermine Robotic Manipulation , author=. arXiv e-prints , pages=
-
[25]
2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids) , pages=
Imitation of human motion achieves natural head movements for humanoid robots in an active-speaker detection task , author=. 2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids) , pages=. 2024 , organization=
work page 2024
-
[26]
Proceedings of Robotics: Science and Systems (RSS) , year=
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion , author=. Proceedings of Robotics: Science and Systems (RSS) , year=
-
[27]
Proceedings of Robotics: Science and Systems (RSS) , year=
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware , author=. Proceedings of Robotics: Science and Systems (RSS) , year=
-
[30]
Isaac Lab Arena: Composable Environment Creation and Policy Evaluation for Robotics , author =. 2025 , url =
work page 2025
-
[31]
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning , author=. arXiv:2009.12293 , year=
work page internal anchor Pith review arXiv 2009
-
[32]
Communications of the ACM , volume=
Datasheets for datasets , author=. Communications of the ACM , volume=. 2021 , publisher=
work page 2021
-
[33]
Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Palma, Steven and Kooijmans, Pepijn and Aractingi, Michel and Shukor, Mustafa and Aubakirova, Dana and Russi, Martino and Capuano, Francesco and Pascal, Caroline and Choghari, Jade and Moss, Jess and Wolf, Thomas , title =
-
[34]
Mandlekar, Ajay and Nasiriany, Soroush and Wen, Bowen and Akinola, Iretiayo and Narang, Yashraj and Fan, Linxi and Zhu, Yuke and Fox, Dieter , journal=
-
[35]
Jiang, Zhenyu and Xie, Yuqi and Lin, Kevin and Xu, Zhenjia and Wan, Weikang and Mandlekar, Ajay and Fan, Linxi Jim and Zhu, Yuke , booktitle=. 2025 , organization=
work page 2025
-
[36]
Proceedings of the IEEE , volume=
Active perception , author=. Proceedings of the IEEE , volume=. 1988 , publisher=
work page 1988
-
[37]
International journal of computer vision , volume=
Active vision , author=. International journal of computer vision , volume=. 1988 , publisher=
work page 1988
-
[38]
Artificial intelligence , volume=
Animate vision , author=. Artificial intelligence , volume=. 1991 , publisher=
work page 1991
-
[39]
Revisiting active perception , author=. Autonomous Robots , volume=. 2018 , publisher=
work page 2018
-
[40]
James, Stephen and Ma, Zicong and Arrojo, David Rovick and Davison, Andrew J , journal=. 2020 , publisher=
work page 2020
-
[41]
Mees, Oier and Hermann, Lukas and Rosete-Beas, Erick and Burgard, Wolfram , journal=. 2022 , publisher=
work page 2022
-
[42]
Nasiriany, Soroush and Maddukuri, Abhiram and Zhang, Lance and Parikh, Adeet and Lo, Aaron and Joshi, Abhishek and Mandlekar, Ajay and Zhu, Yuke , year=
-
[43]
Walke, Homer Rich and Black, Kevin and Zhao, Tony Z and Vuong, Quan and Zheng, Chongyi and Hansen-Estruch, Philippe and He, Andre Wang and Myers, Vivek and Kim, Moo Jin and Du, Max and others , booktitle=. 2023 , organization=
work page 2023
-
[45]
Deliberate delays during robot-to-human handovers improve compliance with gaze communication
Henny Admoni, Anca Dragan, Siddhartha S Srinivasa, and Brian Scassellati. Deliberate delays during robot-to-human handovers improve compliance with gaze communication. In Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction, pages 49--56, 2014
work page 2014
-
[46]
John Aloimonos, Isaac Weiss, and Amit Bandyopadhyay. Active vision. International journal of computer vision, 1 0 (4): 0 333--356, 1988
work page 1988
-
[47]
Ruzena Bajcsy. Active perception. Proceedings of the IEEE, 76 0 (8): 0 966--1005, 1988
work page 1988
-
[48]
Dana H Ballard. Animate vision. Artificial intelligence, 48 0 (1): 0 57--86, 1991
work page 1991
-
[49]
$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control
Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Lucy Xiaoyang Shi, James Tanner, Quan Vuong, Anna Walling, Haohuan Wang, and Ury Zhilinsky. _0 : A vis...
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[50]
Lerobot: State-of-the-art machine learning for real-world robotics in pytorch
Remi Cadene, Simon Alibert, Alexander Soare, Quentin Gallouedec, Adil Zouitine, Steven Palma, Pepijn Kooijmans, Michel Aractingi, Mustafa Shukor, Dana Aubakirova, Martino Russi, Francesco Capuano, Caroline Pascal, Jade Choghari, Jess Moss, and Thomas Wolf. Lerobot: State-of-the-art machine learning for real-world robotics in pytorch. https://github.com/hu...
work page 2024
-
[51]
Open-TeleVision : Teleoperation with immersive active visual feedback
Xuxin Cheng, Jialong Li, Shiqi Yang, Ge Yang, and Xiaolong Wang. Open-TeleVision : Teleoperation with immersive active visual feedback. In Conference on Robot Learning, pages 2729--2749. PMLR, 2025
work page 2025
-
[52]
Diffusion policy: Visuomotor policy learning via action diffusion
Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Proceedings of Robotics: Science and Systems (RSS), 2023
work page 2023
-
[53]
Active vision might be all you need: Exploring active vision in bimanual robotic manipulation
Ian Chuang, Andrew Lee, Dechen Gao, M-Mahdi Naddaf-Sh, and Iman Soltani. Active vision might be all you need: Exploring active vision in bimanual robotic manipulation. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 7952--7959. IEEE, 2025 a
work page 2025
-
[54]
Ian Chuang, Andrew Lee, Dechen Gao, Jinyu Zou, and Iman Soltani. Look, focus, act: Efficient and robust robot learning via human gaze and foveated vision transformers. arXiv e-prints, pages arXiv--2507, 2025 b
work page 2025
-
[55]
Bosong Ding, Murat Kirtay, and Giacomo Spigler. Imitation of human motion achieves natural head movements for humanoid robots in an active-speaker detection task. In 2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids), pages 645--652. IEEE, 2024
work page 2024
-
[56]
Legibility and predictability of robot motion
Anca D Dragan, Kenton CT Lee, and Siddhartha S Srinivasa. Legibility and predictability of robot motion. In 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 301--308. IEEE, 2013
work page 2013
-
[57]
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Senyu Fei, Siyin Wang, Junhao Shi, Zihao Dai, Jikun Cai, Pengfang Qian, Li Ji, Xinzhe He, Shiduo Zhang, Zhaoye Fei, et al. LIBERO-Plus : In-depth robustness analysis of Vision-Language-Action models. arXiv preprint arXiv:2510.13626, 2025
work page internal anchor Pith review arXiv 2025
-
[58]
Rebecca M Foerster, Elena Carbone, Hendrik Koesling, and Werner X Schneider. Saccadic eye movements in a high-speed bimanual stacking task: Changes of attentional control during learning and automatization. Journal of vision, 11 0 (7): 0 9--9, 2011
work page 2011
-
[59]
Robocerebra: A large-scale benchmark for long-horizon robotic manipulation evaluation
Songhao Han, Boxiang Qiu, Yue Liao, Siyuan Huang, Chen Gao, Shuicheng Yan, and Si Liu. Robocerebra: A large-scale benchmark for long-horizon robotic manipulation evaluation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2025
work page 2025
-
[60]
Prime and reach: Synthesising body motion for gaze-primed object reach
Masashi Hatano, Saptarshi Sinha, Jacob Chalk, Wei-Hong Li, Hideo Saito, and Dima Damen. Prime and reach: Synthesising body motion for gaze-primed object reach. arXiv e-prints, pages arXiv--2512, 2025
work page 2025
-
[61]
Yuxin He, Ruihao Zhang, Tianao Shen, Cheng Liu, and Qiang Nie. Towards exploratory and focused manipulation with bimanual active perception: A new problem, benchmark and strategy. arXiv preprint arXiv:2602.01939, 2026
-
[62]
Rachel M Holladay, Anca D Dragan, and Siddhartha S Srinivasa. Legible robot pointing. In The 23rd IEEE International Symposium on robot and human interactive communication, pages 217--223. IEEE, 2014
work page 2014
-
[63]
RLBench : The robot learning benchmark & learning environment
Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J Davison. RLBench : The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters, 5 0 (2): 0 3019--3026, 2020
work page 2020
-
[64]
Roland S Johansson, G \"o ran Westling, Anders B \"a ckstr \"o m, and J Randall Flanagan. Eye--hand coordination in object manipulation. Journal of neuroscience, 21 0 (17): 0 6917--6932, 2001
work page 2001
-
[65]
Eye, robot: Learning to look to act with a BC-RL perception-action loop
Justin Kerr, Kush Hari, Ethan Weber, Chung Min Kim, Brent Yi, Tyler Bonnen, Ken Goldberg, and Angjoo Kanazawa. Eye, robot: Learning to look to act with a BC-RL perception-action loop. In Conference on Robot Learning, pages 3647--3664. PMLR, 2025
work page 2025
-
[66]
Hye Jin Kim, Cho Hee Lee, and Eun Young Kim. Temporal differences in eye--hand coordination between children and adults during manual action on objects. Hong Kong Journal of Occupational Therapy, 31 0 (2): 0 106--114, 2018
work page 2018
-
[67]
The roles of vision and eye movements in the control of activities of daily living
Michael Land, Neil Mennie, and Jennifer Rusted. The roles of vision and eye movements in the control of activities of daily living. Perception, 28 0 (11): 0 1311--1328, 1999
work page 1999
-
[68]
LIBERO : Benchmarking knowledge transfer for lifelong robot learning
Bo Liu, Yifeng Zhu, Chongkai Gao, Yihao Feng, Qiang Liu, Yuke Zhu, and Peter Stone. LIBERO : Benchmarking knowledge transfer for lifelong robot learning. Advances in Neural Information Processing Systems, 36: 0 44776--44791, 2023
work page 2023
-
[69]
Yushan Liu, Shilong Mu, Xintao Chao, Zizhen Li, Yao Mu, Tianxing Chen, Shoujie Li, Chuqiao Lyu, Xiao-ping Zhang, and Wenbo Ding. AVR : Active vision-driven robotic precision manipulation with viewpoint and focal length optimization. arXiv e-prints, pages arXiv--2503, 2025
work page 2025
-
[70]
Oier Mees, Lukas Hermann, Erick Rosete-Beas, and Wolfram Burgard. CALVIN : A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks. IEEE Robotics and Automation Letters, 7 0 (3): 0 7327--7334, 2022
work page 2022
-
[71]
Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning
Mayank Mittal, Pascal Roth, James Tigue, Antoine Richard, Octi Zhang, Peter Du, Antonio Serrano-Muñoz, Xinjie Yao, René Zurbrügg, Nikita Rudin, Lukasz Wawrzyniak, Milad Rakhsha, Alain Denzler, Eric Heiden, Ales Borovicka, Ossama Ahmed, Iretiayo Akinola, Abrar Anwar, Mark T. Carlson, Ji Yuan Feng, Animesh Garg, Renato Gasoto, Lionel Gulich, Yijie Guo, M. G...
work page internal anchor Pith review arXiv 2025
-
[72]
RoboCasa : Large-scale simulation of household tasks for generalist robots
Soroush Nasiriany, Abhiram Maddukuri, Lance Zhang, Adeet Parikh, Aaron Lo, Abhishek Joshi, Ajay Mandlekar, and Yuke Zhu. RoboCasa : Large-scale simulation of household tasks for generalist robots. In Robotics: Science and Systems Foundation, 2024
work page 2024
-
[73]
Isaac lab arena: Composable environment creation and policy evaluation for robotics, 2025
NVIDIA Isaac Lab Arena Contributors . Isaac lab arena: Composable environment creation and policy evaluation for robotics, 2025. URL https://github.com/isaac-sim/IsaacLab-Arena
work page 2025
-
[74]
Tactile mnist: Benchmarking active tactile perception
Tim Schneider, Guillaume Duret, Cristiana de Farias, Roberto Calandra, Liming Chen, and Jan Peters. Tactile mnist: Benchmarking active tactile perception. arXiv preprint arXiv:2506.06361, 2025
-
[75]
Robots can be perceived as goal-oriented agents
Alessandra Sciutti, Ambra Bisio, Francesco Nori, Giorgio Metta, Luciano Fadiga, and Giulio Sandini. Robots can be perceived as goal-oriented agents. Interaction Studies, 14 0 (3): 0 329--350, 2013
work page 2013
-
[76]
Vision in action: Learning active perception from human demonstrations
Haoyu Xiong, Xiaomeng Xu, Jimmy Wu, Yifan Hou, Jeannette Bohg, and Shuran Song. Vision in action: Learning active perception from human demonstrations. In Conference on Robot Learning, pages 5450--5463. PMLR, 2025
work page 2025
-
[77]
EgoMI : Learning active vision and whole-body manipulation from egocentric human demonstrations
Justin Yu, Yide Shentu, Di Wu, Pieter Abbeel, Ken Goldberg, and Philipp Wu. EgoMI : Learning active vision and whole-body manipulation from egocentric human demonstrations. arXiv e-prints, pages arXiv--2511, 2025
work page 2025
-
[78]
LIBERO-PRO : Towards robust and fair evaluation of Vision-Language-Action models beyond memorization
Xueyang Zhou, Yangming Xu, Guiyao Tie, Yongchao Chen, Guowen Zhang, Duanfeng Chu, Pan Zhou, and Lichao Sun. LIBERO-PRO : Towards robust and fair evaluation of Vision-Language-Action models beyond memorization. arXiv e-prints, pages arXiv--2510, 2025
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.