arXiv preprint arXiv:1805.01954

19 Pith papers cite this work. Polarity classification is still indexing.

abstract

Humans often learn how to perform tasks via imitation: they observe others perform a task, and then very quickly infer the appropriate actions to take based on their observations. While extending this paradigm to autonomous agents is a well-studied problem in general, there are two particular aspects that have largely been overlooked: (1) that the learning is done from observation only (i.e., without explicit action information), and (2) that the learning is typically done very quickly. In this work, we propose a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO), that aims to provide improved performance with respect to both of these aspects. First, we allow the agent to acquire experience in a self-supervised fashion. This experience is used to develop a model which is then utilized to learn a particular task by observing an expert perform that task without the knowledge of the specific actions taken. We experimentally compare BCO to imitation learning methods, including the state-of-the-art, generative adversarial imitation learning (GAIL) technique, and we show comparable task performance in several different simulation domains while exhibiting increased learning speed after expert trajectories become available.
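The two phases described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D chain environment, the tabular inverse dynamics model, the majority-vote cloning step, and every function name below are assumptions made for illustration. The agent first gathers its own transitions and learns which action explains each state change, then uses that model to label a state-only expert demonstration and clones the resulting state-action pairs.

```python
import random
from collections import Counter, defaultdict

N_STATES = 5      # hypothetical 1-D chain with states 0..4
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Toy deterministic dynamics, clipped at both ends of the chain."""
    delta = 1 if action == 1 else -1
    return min(max(state + delta, 0), N_STATES - 1)


def collect_experience(n_samples, rng):
    """Phase 1a: self-supervised interaction using uniformly random actions."""
    transitions = []
    for _ in range(n_samples):
        s = rng.randrange(N_STATES)
        a = rng.choice(ACTIONS)
        transitions.append((s, a, step(s, a)))
    return transitions


def fit_inverse_dynamics(transitions):
    """Phase 1b: map each observed (s, s') pair to its most frequent action."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[(s, s_next)][a] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def infer_actions(inverse_model, states):
    """Phase 2a: label a state-only demonstration with inferred actions."""
    return [(s, inverse_model[(s, s_next)])
            for s, s_next in zip(states, states[1:])]


def behavioral_cloning(labeled_pairs):
    """Phase 2b: tabular cloning, picking the majority action per state."""
    counts = defaultdict(Counter)
    for s, a in labeled_pairs:
        counts[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def run_bco_sketch(seed=0):
    """Run both phases on a demonstration that walks right along the chain."""
    rng = random.Random(seed)
    inverse_model = fit_inverse_dynamics(collect_experience(2000, rng))
    expert_states = [0, 1, 2, 3, 4]  # states only, no action labels
    return behavioral_cloning(infer_actions(inverse_model, expert_states))


if __name__ == "__main__":
    print(run_bco_sketch())
```

In this sketch the expert's actions are never observed; the policy recovers "step right" purely from the inverse model learned on the agent's own experience. A practical version would presumably replace the lookup tables with learned function approximators.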

citation-role summary

background: 2 · method: 1

representative citing papers

ASH: Agents that Self-Hone via Embodied Learning

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

ASH reaches 11.2/12 milestones in Pokemon Emerald and 9.9/12 in Zelda by self-improving via an IDM trained on its own trajectories to label internet video, while baselines plateau at roughly 6/12.

Goal-Conditioned Agents that Learn Everything All at Once

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.

HITL-D: Human In The Loop Diffusion Assisted Shared Control

cs.RO · 2026-05-20 · unverdicted · novelty 6.0

HITL-D combines diffusion policies with human input for shared robotic control, reducing required joystick axes and improving speed and workload in manipulation tasks per a 12-participant study.

Hybrid Adaptive Tuning for Tiered Memory Systems

cs.OS · 2026-04-14 · unverdicted · novelty 6.0

PTMT is a lightweight framework that automates parameter tuning for memory tiering via hybrid offline database building and online customized reinforcement learning, delivering 14-30% gains over defaults and 32% over prior art on four systems.

COOPO: Cyclic Offline-Online Policy Optimization Algorithm

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.

Treatment, evidence, imitation, and chat

stat.OT · 2025-06-29 · unverdicted · novelty 4.0

LLMs cannot solve the medical treatment problem through imitation alone because it requires evidence from experiments or observations, posing ethical challenges for training such systems.
