Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets
Pith reviewed 2026-05-13 19:51 UTC · model grok-4.3
The pith
A shared multi-task multi-domain robot dataset doubles success rates for new tasks in new environments when added to just 50 demonstrations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By collecting a large multi-domain multi-task dataset with 7200 demonstrations of 71 tasks across 10 environments, the authors demonstrate that jointly training with this dataset plus 50 demonstrations of a never-before-seen task in a new domain leads to a 2x improvement in success rate compared to using target domain data alone. Data for only a few tasks in a new domain can bridge the domain gap and make it possible for a robot to perform a variety of prior tasks that were only seen in other domains.
What carries the argument
The Bridge Data collection, which supplies cross-task and cross-domain demonstrations so that end-to-end policies trained on it generalize to unseen tasks and environments.
If this is right
- Robots can acquire new skills with far less per-project data collection.
- A small amount of data from a new environment allows reuse of many previously learned skills in that environment.
- Shared datasets become a practical way to bootstrap learning instead of starting from scratch each time.
- Generalization improves without exhaustive data collection in every new setting.
Where Pith is reading between the lines
- Growing the dataset with additional domains would likely further reduce the number of demonstrations needed for new tasks.
- The same bridging approach could extend to different robot hardware or sensor suites.
- If the dataset continues to expand, reliance on simulation for initial training may decrease.
Load-bearing premise
The collected tasks and domains are representative enough that cross-domain data produces positive transfer rather than interference for arbitrary new tasks and environments.
What would settle it
A new task and new domain in which adding the Bridge Data to the 50 target demonstrations lowers success rate below the level achieved with the 50 demonstrations alone.
read the original abstract
Robot learning holds the promise of learning policies that generalize broadly. However, such generalization requires sufficiently diverse datasets of the task of interest, which can be prohibitively expensive to collect. In other fields, such as computer vision, it is common to utilize shared, reusable datasets, such as ImageNet, to overcome this challenge, but this has proven difficult in robotics. In this paper, we ask: what would it take to enable practical data reuse in robotics for end-to-end skill learning? We hypothesize that the key is to use datasets with multiple tasks and multiple domains, such that a new user that wants to train their robot to perform a new task in a new domain can include this dataset in their training process and benefit from cross-task and cross-domain generalization. To evaluate this hypothesis, we collect a large multi-domain and multi-task dataset, with 7,200 demonstrations constituting 71 tasks across 10 environments, and empirically study how this data can improve the learning of new tasks in new environments. We find that jointly training with the proposed dataset and 50 demonstrations of a never-before-seen task in a new domain on average leads to a 2x improvement in success rate compared to using target domain data alone. We also find that data for only a few tasks in a new domain can bridge the domain gap and make it possible for a robot to perform a variety of prior tasks that were only seen in other domains. These results suggest that reusing diverse multi-task and multi-domain datasets, including our open-source dataset, may pave the way for broader robot generalization, eliminating the need to re-collect data for each new robot learning project.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Bridge Data, a multi-domain multi-task robotic dataset of 7,200 demonstrations spanning 71 tasks across 10 environments. Its central empirical claim is that jointly training on this dataset together with 50 demonstrations of a previously unseen task in a new domain produces an average 2x improvement in success rate relative to training on the 50 target-domain demonstrations alone; it further reports that limited data in a new domain can enable a robot to perform tasks previously observed only in other domains.
Significance. If the reported gains are robust, the work supplies concrete evidence that large-scale, reusable cross-domain datasets can materially reduce per-task data collection costs in robot learning, mirroring the role of ImageNet-style resources in vision. The open release of the dataset itself constitutes a reusable asset for the community.
major comments (2)
- [Experimental Evaluation] Experimental section: the manuscript reports an average 2x success-rate gain but supplies insufficient detail on training procedures, baseline implementations, number of independent runs per condition, observed variance, and whether statistical tests were used to establish significance of the improvement over the target-only baseline. These omissions make it difficult to rule out post-hoc selection effects or implementation differences.
- [§5] §5 (held-out evaluation): all reported test tasks are drawn from the same overall collection protocol and visual regimes as the training environments. This limits the strength of the claim that the dataset produces positive transfer for arbitrary new domains; the current results do not yet demonstrate robustness to substantial changes in lighting, object appearance, robot kinematics, or task structure outside the 10 environments.
minor comments (2)
- [Abstract] Abstract: the phrase 'on average leads to a 2x improvement' should be accompanied by the precise mean and a measure of spread (standard deviation or range) across the evaluated tasks.
- [Dataset Description] Dataset description: the selection criteria for the 10 environments and 71 tasks should be stated more explicitly so readers can assess how representative they are of typical manipulation scenarios.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive recommendation for minor revision. We address each major comment below and will revise the manuscript to improve experimental transparency and clarify the scope of our claims.
read point-by-point responses
-
Referee: [Experimental Evaluation] Experimental section: the manuscript reports an average 2x success-rate gain but supplies insufficient detail on training procedures, baseline implementations, number of independent runs per condition, observed variance, and whether statistical tests were used to establish significance of the improvement over the target-only baseline. These omissions make it difficult to rule out post-hoc selection effects or implementation differences.
Authors: We agree that additional experimental details are required for reproducibility and to strengthen confidence in the results. In the revised manuscript we will expand the experimental section to provide: a full description of training procedures including all hyperparameters, network architectures, and optimization settings; explicit implementation details for each baseline; the number of independent runs per condition (five runs were performed); observed variance reported as standard deviations; and results from statistical significance tests (paired t-tests) confirming the 2x improvement over the target-only baseline. These additions will directly address concerns about implementation differences and selection effects. revision: yes
-
Referee: [§5] §5 (held-out evaluation): all reported test tasks are drawn from the same overall collection protocol and visual regimes as the training environments. This limits the strength of the claim that the dataset produces positive transfer for arbitrary new domains; the current results do not yet demonstrate robustness to substantial changes in lighting, object appearance, robot kinematics, or task structure outside the 10 environments.
Authors: We acknowledge that the held-out tasks share the same overall collection protocol and visual regimes as the training environments. While the ten environments already include meaningful diversity in settings, objects, and lighting, the results do not demonstrate robustness to arbitrary new domains involving major shifts such as different robot kinematics or extreme lighting changes outside the collected data. In the revision we will update §5 and the discussion to more precisely scope our claims to positive transfer across the diversity present in Bridge Data, while explicitly noting this limitation for broader generalization. This clarification will better contextualize the empirical findings. revision: partial
Circularity Check
No circularity: empirical success rates are measured outcomes, not reductions to fitted inputs
full rationale
The paper collects a multi-task multi-domain dataset of 7200 demonstrations and reports measured success rates on held-out tasks when training with the dataset plus 50 target demos. These results are direct experimental measurements rather than predictions derived from equations or parameters fitted inside the work. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain; the central claim rests on independent robot trials whose outcomes are not tautological with the data collection protocol.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Standard policy learning algorithms can effectively utilize demonstrations from multiple tasks and domains without negative interference.
Forward citations
Cited by 48 Pith papers
-
From Imagined Futures to Executable Actions: Mixture of Latent Actions for Robot Manipulation
MoLA infers a mixture of latent actions from generated future videos via modality-aware inverse dynamics models to improve robot manipulation policies.
-
MolmoAct2: Action Reasoning Models for Real-world Deployment
MolmoAct2 delivers an open VLA model with new specialized components, datasets, and techniques that outperforms baselines on benchmarks while releasing all weights, code, and data for real-world robot use.
-
BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination
BiCoord is a new benchmark for long-horizon tightly coordinated bimanual manipulation that includes quantitative metrics and shows existing policies like DP, RDT, Pi0 and OpenVLA-OFT struggle on such tasks.
-
Large Video Planner Enables Generalizable Robot Control
A video foundation model trained on human demonstrations generates zero-shot plans that convert to executable robot actions on novel scenes and tasks.
-
RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation
RoboCOIN is a large multi-embodiment bimanual manipulation dataset with hierarchical annotations and an open processing pipeline that improves model performance across robotic platforms.
-
RoboDreamer: Learning Compositional World Models for Robot Imagination
RoboDreamer factorizes video generation using language primitives to achieve compositional generalization in robot world models, outperforming monolithic baselines on unseen goals in RT-X.
-
Learning Interactive Real-World Simulators
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
-
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
Low-cost imprecise robots achieve 80-90% success on six fine bimanual manipulation tasks using imitation learning with a new Action Chunking with Transformers algorithm trained on only 10 minutes of demonstrations.
-
VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training
VIP learns a visual embedding from human videos whose distance defines dense, smooth rewards for arbitrary goal-image robot tasks without task-specific fine-tuning.
-
Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning
Target-Aligned Bellman Backup (TABB) improves cross-domain offline RL by selecting source transitions according to their contribution to accurate target-domain Bellman target estimation.
-
RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data
A co-evolutionary VLM-VGM loop on 500 unlabeled images raises planner success by 30 points and simulator success by 48 percent while beating fully supervised baselines.
-
BEACON: Cross-Domain Co-Training of Generative Robot Policies via Best-Effort Adaptation
BEACON uses discrepancy-aware importance reweighting to co-train generative robot policies from abundant source and limited target demonstrations, yielding better robustness and implicit feature alignment.
-
BEACON: Cross-Domain Co-Training of Generative Robot Policies via Best-Effort Adaptation
BEACON uses discrepancy-aware importance reweighting to jointly train diffusion-based robot policies and source sample weights, improving performance over target-only and fixed-ratio baselines in cross-domain manipula...
-
MolmoAct2: Action Reasoning Models for Real-world Deployment
MolmoAct2 is an open VLA model that outperforms baselines like Pi-05 on 7 benchmarks and whose backbone surpasses GPT-5 on 13 embodied-reasoning tasks through new datasets, specialized training, and architecture chang...
-
Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation
A video transfer pipeline augments simulated VLA data into realistic videos while preserving actions, yielding consistent performance gains on robot benchmarks such as 8% on Robotwin 2.0.
-
Ego-InBetween: Generating Object State Transitions in Ego-Centric Videos
EgoIn uses a fine-tuned vision-language model to infer transition steps and a conditioning module plus auxiliary supervision to generate coherent egocentric video sequences of object state changes.
-
PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation
PALM improves long-horizon robotic manipulation success by distilling affordance representations for object interaction and predicting within-subtask progress in a VLA model.
-
IGen: Scalable Data Generation for Robot Learning from Open-World Images
IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.
-
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
villa-X enhances latent action modeling in VLA models to support zero-shot action planning for unseen robot embodiments and open-vocabulary instructions, yielding better manipulation results in simulation and real-wor...
-
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
DreamVLA uses dynamic-region-guided world knowledge prediction, block-wise attention to disentangle information types, and a diffusion transformer for actions, reaching 76.7% success on real robot tasks and 4.44 avera...
-
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
RoboTwin 2.0 automates diverse synthetic data creation for dual-arm robots via MLLMs and five-axis domain randomization, leading to 228-367% gains in manipulation success.
-
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
SmolVLA is a small efficient VLA model that achieves performance comparable to 10x larger models while training on one GPU and deploying on consumer hardware via community data and chunked asynchronous action prediction.
-
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
VLA-RL applies online RL to pretrained VLAs, yielding a 4.5% gain over strong baselines on 40 LIBERO manipulation tasks and matching commercial models like π₀-FAST.
-
$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
π_{0.5} is a VLA model that achieves long-horizon dexterous manipulation in entirely new homes through co-training on heterogeneous tasks and multi-source data including web and semantic predictions.
-
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
CoT-VLA is a 7B VLA that generates future visual frames autoregressively as planning goals before actions, outperforming prior VLAs by 17% on real-world tasks and 6% in simulation.
-
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
HybridVLA unifies diffusion and autoregression in a single VLA model via collaborative training and ensemble to raise robot manipulation success rates by 14% in simulation and 19% in real-world tasks.
-
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.
-
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
Visual trace prompting improves spatial-temporal awareness in VLA models, delivering 10% gains on SimplerEnv and 3.5x on real-robot tasks.
-
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
CogACT is a new VLA model that uses a conditioned diffusion action transformer to achieve over 35% higher average success rates than OpenVLA in simulation and 55% in real-robot experiments while generalizing to new ro...
-
$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control
π₀ is a vision-language-action flow model trained on diverse multi-platform robot data that supports zero-shot task performance, language instruction following, and efficient fine-tuning for dexterous tasks.
-
OpenVLA: An Open-Source Vision-Language-Action Model
OpenVLA achieves 16.5% higher task success than the 55B RT-2-X model across 29 tasks with 7x fewer parameters while enabling effective fine-tuning and quantization without performance loss.
-
Octo: An Open-Source Generalist Robot Policy
Octo is an open-source transformer-based generalist robot policy pretrained on 800k trajectories that serves as an effective initialization for finetuning across diverse robotic platforms.
-
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.
-
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
A low-cost whole-body teleoperation system enables effective imitation learning for complex bimanual mobile manipulation by co-training on mobile and static demonstration datasets.
-
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
A GPT-style model pre-trained on large video datasets achieves 94.9% success on CALVIN multi-task manipulation and 85.4% zero-shot generalization, outperforming prior baselines.
-
Scaling Robot Learning with Semantically Imagined Experience
Augmenting robot datasets via diffusion-based semantic inpainting enables manipulation policies to solve unseen tasks with new objects and improves robustness to novel distractors.
-
R3M: A Universal Visual Representation for Robot Manipulation
A visual encoder pre-trained on diverse human videos with contrastive and language objectives improves simulated robot manipulation success by over 20% versus training from scratch and enables real Franka arm tasks fr...
-
DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization
DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.
-
Cortex 2.0: Grounding World Models in Real-World Industrial Deployment
Cortex 2.0 introduces world-model-based planning that generates and scores future trajectories to outperform reactive vision-language-action baselines on industrial robotic tasks including pick-and-place, sorting, and...
-
StableIDM: Stabilizing Inverse Dynamics Model against Manipulator Truncation via Spatio-Temporal Refinement
StableIDM stabilizes inverse dynamics models under manipulator truncation by combining robot-centric masking, directional spatial feature aggregation, and temporal dynamics refinement, yielding 12.1% higher strict act...
-
ReFineVLA: Multimodal Reasoning-Aware Generalist Robotic Policies via Teacher-Guided Fine-Tuning
ReFineVLA adds teacher-generated reasoning steps to VLA training and reports state-of-the-art success rates on SimplerEnv WidowX and Google Robot benchmarks.
-
Lightweight Learning from Actuation-Space Demonstrations via Flow Matching for Whole-Body Soft Robotic Grasping
A rectified flow model trained on 30 actuation-space demonstrations produces control sequences that yield 97.5% grasp success across the workspace, with generalization to object size changes of ±33% and execution spee...
-
GR-3 Technical Report
GR-3 is a VLA model that generalizes to novel objects, environments, and abstract instructions, outperforms the π0 baseline, and integrates with the new ByteMini bi-manual mobile robot.
-
A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation
Multi-task pretraining of diffusion policies on diverse robot data produces more successful, robust, and data-efficient policies for dexterous manipulation than single-task baselines, with performance scaling with pre...
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
-
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
NORA is a compact 3B-parameter VLA model trained on 970k robot demonstrations that outperforms larger VLA models in embodied tasks while using significantly less computational resources.
-
MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations
MimicGen creates over 50K robot demonstrations from roughly 200 human ones, allowing imitation learning to achieve strong performance on complex long-horizon tasks like assembly and coffee preparation.
-
World Action Models: The Next Frontier in Embodied AI
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
Reference graph
Works this paper leans on
-
[1]
Imagenet classifica- tion with deep convolutional neural networks,
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classifica- tion with deep convolutional neural networks,” Advances in neural information processing systems , vol. 25, pp. 1097–1105, 2012
work page 2012
-
[2]
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[3]
Imagenet: A large-scale hierarchical image database,
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in Conference on Computer Vision and Pattern Recognition , 2009
work page 2009
-
[4]
Gradient surgery for multi-task learning,
T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn, “Gradient surgery for multi-task learning,” arXiv preprint arXiv:2001.06782, 2020
-
[5]
Mt-opt: Continuous multi-task robotic reinforcement learning at scale,
D. Kalashnikov, J. Varley, Y . Chebotar, B. Swanson, R. Jon- schkowski, C. Finn, S. Levine, and K. Hausman, “Mt-opt: Continuous multi-task robotic reinforcement learning at scale,” arXiv preprint arXiv:2104.08212, 2021
-
[6]
RoboNet: Large-Scale Multi-Robot Learning
S. Dasari, F. Ebert, S. Tian, S. Nair, B. Bucher, K. Schmeckpeper, S. Singh, S. Levine, and C. Finn, “Robonet: Large-scale multi-robot learning,” arXiv preprint arXiv:1910.11215 , 2019
work page internal anchor Pith review arXiv 1910
-
[7]
One-shot visual imitation learning via meta-learning,
C. Finn, T. Yu, T. Zhang, P. Abbeel, and S. Levine, “One-shot visual imitation learning via meta-learning,” in Conference on Robot Learning. PMLR, 2017, pp. 357–368
work page 2017
-
[8]
Y . Duan, M. Andrychowicz, B. C. Stadie, J. Ho, J. Schneider, I. Sutskever, P. Abbeel, and W. Zaremba, “One-shot imitation learn- ing,” arXiv preprint arXiv:1703.07326 , 2017
work page Pith review arXiv 2017
-
[9]
Generative Adversarial Imitation Learning
J. Ho and S. Ermon, “Generative adversarial imitation learning,” arXiv preprint arXiv:1606.03476, 2016
work page Pith review arXiv 2016
-
[10]
One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
T. Yu, C. Finn, A. Xie, S. Dasari, T. Zhang, P. Abbeel, and S. Levine, “One-shot imitation from observing humans via domain-adaptive meta-learning,” arXiv preprint arXiv:1802.01557 , 2018
work page Pith review arXiv 2018
-
[11]
Imitation from ob- servation: Learning to imitate behaviors from raw video via context translation,
Y . Liu, A. Gupta, P. Abbeel, and S. Levine, “Imitation from ob- servation: Learning to imitate behaviors from raw video via context translation,” in International Conference on Robotics and Automation (ICRA), 2018
work page 2018
-
[12]
Time-contrastive networks: Self-supervised learning from video,
P. Sermanet, C. Lynch, Y . Chebotar, J. Hsu, E. Jang, S. Schaal, S. Levine, and G. Brain, “Time-contrastive networks: Self-supervised learning from video,” in 2018 IEEE International Conference on Robotics and Automation (ICRA) . IEEE, 2018, pp. 1134–1141
work page 2018
-
[13]
Human-centered collaborative robots with deep reinforcement learn- ing,
A. Ghadirzadeh, X. Chen, W. Yin, Z. Yi, M. Bjorkman, and D. Kragic, “Human-centered collaborative robots with deep reinforcement learn- ing,” IEEE Robotics and Automation Letters , 2020
work page 2020
-
[14]
Model-based visual planning with self-supervised func- tional distances,
S. Tian, S. Nair, F. Ebert, S. Dasari, B. Eysenbach, C. Finn, and S. Levine, “Model-based visual planning with self-supervised func- tional distances,” arXiv preprint arXiv:2012.15373 , 2020
-
[15]
Deep imitation learning for complex manipulation tasks from virtual reality teleoperation,
T. Zhang, Z. McCarthy, O. Jow, D. Lee, X. Chen, K. Goldberg, and P. Abbeel, “Deep imitation learning for complex manipulation tasks from virtual reality teleoperation,” in 2018 IEEE International Conference on Robotics and Automation (ICRA) . IEEE, 2018, pp. 5628–5635
work page 2018
-
[16]
Multiple interactions made easy (mime): Large scale demonstrations data for imitation,
P. Sharma, L. Mohan, L. Pinto, and A. Gupta, “Multiple interactions made easy (mime): Large scale demonstrations data for imitation,” in Conference on robot learning . PMLR, 2018, pp. 906–915
work page 2018
-
[17]
Roboturk: A crowdsourcing platform for robotic skill learning through imitation,
A. Mandlekar, Y . Zhu, A. Garg, J. Booher, M. Spero, A. Tung, J. Gao, J. Emmons, A. Gupta, E. Orbay, S. Savarese, and L. Fei- Fei, “Roboturk: A crowdsourcing platform for robotic skill learning through imitation,” in Conference on Robot Learning , 2018
work page 2018
-
[18]
A. Mandlekar, J. Booher, M. Spero, A. Tung, A. Gupta, Y . Zhu, A. Garg, S. Savarese, and L. Fei-Fei, “Scaling robot supervision to hundreds of hours with roboturk: Robotic manipulation dataset through human reasoning and dexterity,” arXiv:1911.04052, 2019
-
[19]
Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours,
L. Pinto and A. Gupta, “Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours,” in international conference on robotics and automation (ICRA) . IEEE, 2016
work page 2016
-
[20]
Deep visual foresight for planning robot motion,
C. Finn and S. Levine, “Deep visual foresight for planning robot motion,” in 2017 IEEE International Conference on Robotics and Automation (ICRA) . IEEE, 2017, pp. 2786–2793
work page 2017
-
[21]
S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, “Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection,” The International Journal of Robotics Research, vol. 37, no. 4-5, pp. 421–436, 2018
work page 2018
-
[22]
Scalable deep reinforcement learning for vision-based robotic manipulation,
D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V . Vanhoucke,et al., “Scalable deep reinforcement learning for vision-based robotic manipulation,” in Conference on Robot Learning . PMLR, 2018, pp. 651–673
work page 2018
-
[23]
Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control
F. Ebert, C. Finn, S. Dasari, A. Xie, A. Lee, and S. Levine, “Visual foresight: Model-based deep reinforcement learning for vision-based robotic control,” arXiv preprint arXiv:1812.00568 , 2018
work page Pith review arXiv 2018
-
[24]
Tossing- bot: Learning to throw arbitrary objects with residual physics,
A. Zeng, S. Song, J. Lee, A. Rodriguez, and T. Funkhouser, “Tossing- bot: Learning to throw arbitrary objects with residual physics,” IEEE Transactions on Robotics , vol. 36, no. 4, pp. 1307–1319, 2020
work page 2020
-
[25]
S. Young, D. Gandhi, S. Tulsiani, A. Gupta, P. Abbeel, and L. Pinto, “Visual imitation made easy,” arXiv e-prints , pp. arXiv–2008, 2020
work page 2008
-
[26]
Deep residual learning for image recognition,
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Conference on Computer Vision and Pattern Recognition, 2016
work page 2016
-
[27]
Deep spatial autoencoders for visuomotor learning,
C. Finn, X. Y . Tan, Y . Duan, T. Darrell, S. Levine, and P. Abbeel, “Deep spatial autoencoders for visuomotor learning,” in International Conference on Robotics and Automation (ICRA) , 2016
work page 2016
-
[28]
End-to-end training of deep visuomotor policies,
S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 1334–1373, 2016
work page 2016
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.