hub

arXiv preprint arXiv:1805.01954 , year=

Behavioral cloning from observation , author= · 2018 · cs.AI · arXiv 1805.01954

19 Pith papers cite this work. Polarity classification is still indexing.

19 Pith papers citing it

open full Pith review browse 19 citing papers arXiv PDF

abstract

Humans often learn how to perform tasks via imitation: they observe others perform a task, and then very quickly infer the appropriate actions to take based on their observations. While extending this paradigm to autonomous agents is a well-studied problem in general, there are two particular aspects that have largely been overlooked: (1) that the learning is done from observation only (i.e., without explicit action information), and (2) that the learning is typically done very quickly. In this work, we propose a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO), that aims to provide improved performance with respect to both of these aspects. First, we allow the agent to acquire experience in a self-supervised fashion. This experience is used to develop a model which is then utilized to learn a particular task by observing an expert perform that task without the knowledge of the specific actions taken. We experimentally compare BCO to imitation learning methods, including the state-of-the-art, generative adversarial imitation learning (GAIL) technique, and we show comparable task performance in several different simulation domains while exhibiting increased learning speed after expert trajectories become available.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2 method 1

citation-polarity summary

background 2 use method 1

representative citing papers

Demo-JEPA: Joint-Embedding Predictive Architecture for One-shot Cross-Embodiment Imitation

cs.RO · 2026-05-20 · unverdicted · novelty 7.0

Demo-JEPA enables one-shot cross-embodiment imitation by mapping visual demonstrations to shared latent future trajectories that serve as subgoals for the target agent's own forward dynamics planning.

ASH: Agents that Self-Hone via Embodied Learning

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

ASH reaches 11.2/12 milestones in Pokemon Emerald and 9.9/12 in Zelda by self-improving via an IDM trained on its own trajectories to label internet video, while baselines plateau at roughly 6/12.

Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

cs.LG · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.

Goal-Conditioned Agents that Learn Everything All at Once

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.

HITL-D: Human In The Loop Diffusion Assisted Shared Control

cs.RO · 2026-05-20 · unverdicted · novelty 6.0

HITL-D combines diffusion policies with human input for shared robotic control, reducing required joystick axes and improving speed and workload in manipulation tasks per a 12-participant study.

TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

cs.AI · 2025-09-30 · unverdicted · novelty 6.0

TimeRewarder derives step-wise progress rewards from frame-wise temporal distances in passive videos and uses them to guide RL, achieving high success rates on Meta-World tasks with fewer interactions than prior methods or hand-designed rewards.

DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

DAWM introduces a modular diffusion world model with an inverse dynamics model to produce complete synthetic transitions that improve conservative offline RL algorithms like TD3BC and IQL on D4RL tasks.

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model

cs.CV · 2025-03-13 · unverdicted · novelty 6.0

HybridVLA unifies diffusion and autoregression in a single VLA model via collaborative training and ensemble to raise robot manipulation success rates by 14% in simulation and 19% in real-world tasks.

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models

cs.AI · 2024-08-01 · conditional · novelty 6.0

Empirical analysis shows scaling inference compute via strategies like tree search can be more efficient than scaling model parameters, with 7B models plus novel search outperforming 34B models.

On the Role of Language Representations in Auto-Bidding: Findings and Implications

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

SemBid injects LLM-encoded Task, History, and Strategy semantics as tokens into offline bidding trajectories and uses self-attention to outperform numerical-only baselines in performance, constraint satisfaction, and robustness.

Learning from Demonstration with Failure Awareness for Safe Robot Navigation

cs.RO · 2026-04-25 · unverdicted · novelty 6.0

A framework decouples failure data for value estimation and success data for policy learning in offline RL to reduce collisions in robot navigation while maintaining success rates.

Hybrid Adaptive Tuning for Tiered Memory Systems

cs.OS · 2026-04-14 · unverdicted · novelty 6.0

PTMT is a lightweight framework that automates parameter tuning for memory tiering via hybrid offline database building and online customized reinforcement learning, delivering 14-30% gains over defaults and 32% over prior art on four systems.

Veo-Act: How Far Can Frontier Video Models Advance Generalizable Robot Manipulation?

cs.RO · 2026-04-06 · unverdicted · novelty 6.0

Veo-3 video predictions enable approximate task-level robot trajectories in zero-shot settings but require hierarchical integration with low-level VLA policies for reliable manipulation performance.

Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition

cs.RO · 2026-05-20 · unverdicted · novelty 5.0

PACTS jointly model action trajectories and predicate belief trajectories in a single generative policy, enabling zero-shot skill composition via symbolic planning without retraining.

COOPO: Cyclic Offline-Online Policy Optimization Algorithm

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.

UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics

cs.LG · 2026-02-11 · unverdicted · novelty 5.0

UI-Oceanus shows that continual pre-training on forward dynamics predictions from synthetic GUI exploration improves agent success rates by 7% offline and 16.8% online, with gains scaling by data volume.

When a Robot is More Capable than a Human: Learning from Constrained Demonstrators

cs.RO · 2025-10-10 · unverdicted · novelty 5.0

Robots outperform constrained human demonstrations by inferring state-only rewards from demos and using temporal interpolation to label and explore better trajectories, achieving 10x faster task completion on a real robotic arm than behavioral cloning.

Treatment, evidence, imitation, and chat

stat.OT · 2025-06-29 · unverdicted · novelty 4.0

LLMs cannot solve the medical treatment problem through imitation alone because it requires evidence from experiments or observations, posing ethical challenges for training such systems.

Scalable Hierarchical Reinforcement Learning for Hyper Scale Multi-Robot Task Planning

cs.RO · 2024-12-27 · unverdicted · novelty 4.0

A centralized HRL planner with HTAN, multi-stage curricula, and counterfactual baseline scales multi-robot task planning to 200 robots and 1000 racks on unlearned maps in RMFS.

citing papers explorer

Showing 19 of 19 citing papers.

Demo-JEPA: Joint-Embedding Predictive Architecture for One-shot Cross-Embodiment Imitation cs.RO · 2026-05-20 · unverdicted · none · ref 1 · internal anchor
Demo-JEPA enables one-shot cross-embodiment imitation by mapping visual demonstrations to shared latent future trajectories that serve as subgoals for the target agent's own forward dynamics planning.
ASH: Agents that Self-Hone via Embodied Learning cs.AI · 2026-05-14 · unverdicted · none · ref 13 · internal anchor
ASH reaches 11.2/12 milestones in Pokemon Emerald and 9.9/12 in Zelda by self-improving via an IDM trained on its own trajectories to label internet video, while baselines plateau at roughly 6/12.
Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift cs.LG · 2026-05-09 · unverdicted · none · ref 14 · 2 links · internal anchor
SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.
Goal-Conditioned Agents that Learn Everything All at Once cs.LG · 2026-05-22 · unverdicted · none · ref 24 · internal anchor
LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.
HITL-D: Human In The Loop Diffusion Assisted Shared Control cs.RO · 2026-05-20 · unverdicted · none · ref 27 · internal anchor
HITL-D combines diffusion policies with human input for shared robotic control, reducing required joystick axes and improving speed and workload in manipulation tasks per a 12-participant study.
TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance cs.AI · 2025-09-30 · unverdicted · none · ref 14 · internal anchor
TimeRewarder derives step-wise progress rewards from frame-wise temporal distances in passive videos and uses them to guide RL, achieving high success rates on Meta-World tasks with fewer interactions than prior methods or hand-designed rewards.
DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions cs.LG · 2025-09-23 · unverdicted · none · ref 28 · internal anchor
DAWM introduces a modular diffusion world model with an inverse dynamics model to produce complete synthetic transitions that improve conservative offline RL algorithms like TD3BC and IQL on D4RL tasks.
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model cs.CV · 2025-03-13 · unverdicted · none · ref 36 · internal anchor
HybridVLA unifies diffusion and autoregression in a single VLA model via collaborative training and ensemble to raise robot manipulation success rates by 14% in simulation and 19% in real-world tasks.
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models cs.AI · 2024-08-01 · conditional · none · ref 74 · internal anchor
Empirical analysis shows scaling inference compute via strategies like tree search can be more efficient than scaling model parameters, with 7B models plus novel search outperforming 34B models.
On the Role of Language Representations in Auto-Bidding: Findings and Implications cs.AI · 2026-05-07 · unverdicted · none · ref 36
SemBid injects LLM-encoded Task, History, and Strategy semantics as tokens into offline bidding trajectories and uses self-attention to outperform numerical-only baselines in performance, constraint satisfaction, and robustness.
Learning from Demonstration with Failure Awareness for Safe Robot Navigation cs.RO · 2026-04-25 · unverdicted · none · ref 3
A framework decouples failure data for value estimation and success data for policy learning in offline RL to reduce collisions in robot navigation while maintaining success rates.
Hybrid Adaptive Tuning for Tiered Memory Systems cs.OS · 2026-04-14 · unverdicted · none · ref 64
PTMT is a lightweight framework that automates parameter tuning for memory tiering via hybrid offline database building and online customized reinforcement learning, delivering 14-30% gains over defaults and 32% over prior art on four systems.
Veo-Act: How Far Can Frontier Video Models Advance Generalizable Robot Manipulation? cs.RO · 2026-04-06 · unverdicted · none · ref 41
Veo-3 video predictions enable approximate task-level robot trajectories in zero-shot settings but require hierarchical integration with low-level VLA policies for reliable manipulation performance.
Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition cs.RO · 2026-05-20 · unverdicted · none · ref 6 · internal anchor
PACTS jointly model action trajectories and predicate belief trajectories in a single generative policy, enabling zero-shot skill composition via symbolic planning without retraining.
COOPO: Cyclic Offline-Online Policy Optimization Algorithm cs.LG · 2026-05-18 · unverdicted · none · ref 50 · internal anchor
COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.
UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics cs.LG · 2026-02-11 · unverdicted · none · ref 40 · internal anchor
UI-Oceanus shows that continual pre-training on forward dynamics predictions from synthetic GUI exploration improves agent success rates by 7% offline and 16.8% online, with gains scaling by data volume.
When a Robot is More Capable than a Human: Learning from Constrained Demonstrators cs.RO · 2025-10-10 · unverdicted · none · ref 13 · internal anchor
Robots outperform constrained human demonstrations by inferring state-only rewards from demos and using temporal interpolation to label and explore better trajectories, achieving 10x faster task completion on a real robotic arm than behavioral cloning.
Treatment, evidence, imitation, and chat stat.OT · 2025-06-29 · unverdicted · none · ref 107 · internal anchor
LLMs cannot solve the medical treatment problem through imitation alone because it requires evidence from experiments or observations, posing ethical challenges for training such systems.
Scalable Hierarchical Reinforcement Learning for Hyper Scale Multi-Robot Task Planning cs.RO · 2024-12-27 · unverdicted · none · ref 44 · internal anchor
A centralized HRL planner with HTAN, multi-stage curricula, and counterfactual baseline scales multi-robot task planning to 200 robots and 1000 racks on unlearned maps in RMFS.

arXiv preprint arXiv:1805.01954 , year=

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer