
arxiv: 2505.09603 · v2 · pith:L47S3427new · submitted 2025-05-14 · 💻 cs.RO · cs.LG

DataMIL: Selecting Data for Robot Imitation Learning with Datamodels

classification 💻 cs.RO cs.LG
keywords data · selection · datamil · datasets · performance · policies · prior · task-specific

Recently, the robotics community has amassed ever larger and more diverse datasets to train generalist policies. However, while these policies achieve strong mean performance across a variety of tasks, they often underperform on individual, specialized tasks and require further tuning on newly acquired task-specific data. Combining task-specific data with carefully curated subsets of large prior datasets via co-training can produce better specialized policies, but selecting data naively may actually harm downstream performance. To address this, we introduce DataMIL, a data selection framework built on the datamodels paradigm that reasons about data selection in an end-to-end manner, using the policy itself to identify which data points will most improve performance. Unlike standard practices that filter data using human notions of quality (e.g., based on semantic or visual similarity), DataMIL directly optimizes data selection for task success, allowing us to select data that improve the policy while dropping data that degrade it. To avoid performing expensive rollouts in the environment during selection, we introduce a surrogate loss function on task-specific data, allowing us to use DataMIL in the real world without degrading performance. We validate our approach on 60+ simulation and real-world manipulation tasks, notably showing successful data selection from the largest open collection of robot datasets (OXE) and demonstrating consistent gains in success rates over prior work. Our results underscore the importance of end-to-end, performance-aware data selection for unlocking the potential of large prior datasets in robotics. More information at https://robin-lab.cs.utexas.edu/datamodels4imitation/
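
To make the datamodels idea in the abstract concrete, here is a minimal sketch of how a linear datamodel can drive selection. It is not the authors' implementation: the function names (`fit_linear_datamodel`, `select_top_k`, `toy_surrogate_loss`) are hypothetical, and the toy loss is a stand-in for the expensive step of training a policy on a subset of prior data and measuring a surrogate loss on held-out task-specific demonstrations. The assumption is that the loss is roughly linear in which prior data points are included.

```python
import numpy as np

def fit_linear_datamodel(masks, losses, ridge=1e-3):
    """Ridge-regularized least squares: loss ~ bias + masks @ weights.

    masks:  (n_subsets, n_items) boolean inclusion matrix
    losses: (n_subsets,) surrogate loss measured for each subset
    """
    X = masks.astype(float)
    Xc = X - X.mean(axis=0)          # centering absorbs the bias term
    yc = losses - losses.mean()
    A = Xc.T @ Xc + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, Xc.T @ yc)

def select_top_k(weights, k):
    """Indices of the k data points whose inclusion lowers the loss most."""
    return np.argsort(weights)[:k]

def toy_surrogate_loss(mask, true_effect, rng):
    # ASSUMPTION: stands in for "train a policy on this subset, then
    # evaluate its loss on held-out task-specific demonstrations".
    return 1.0 + mask.astype(float) @ true_effect + 0.01 * rng.standard_normal()

rng = np.random.default_rng(0)
n_items, n_subsets = 20, 400

# Synthetic ground truth: items 0-4 help, items 5-9 hurt, the rest are neutral.
true_effect = np.zeros(n_items)
true_effect[:5] = -0.1
true_effect[5:10] = 0.1

# Sample random subsets of the prior data and record the loss for each one.
masks = rng.random((n_subsets, n_items)) < 0.5
losses = np.array([toy_surrogate_loss(m, true_effect, rng) for m in masks])

weights = fit_linear_datamodel(masks, losses)
selected = select_top_k(weights, k=5)
print(sorted(selected.tolist()))
```

In this toy setting the fitted weights separate helpful from harmful items, so selection keeps the former and drops the latter. In a real pipeline each loss evaluation would require training a policy, which is why a rollout-free surrogate loss matters for cost.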



Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. An Efficient Metric for Data Quality Measurement in Imitation Learning

    cs.RO 2026-05 unverdicted novelty 6.0

    Power spectral density of trajectories ranks demonstration quality for imitation learning, enabling rollout-free curation that improves fine-tuned policy success.

  2. World Value Models for Robotic Manipulation

    cs.RO 2026-06 unverdicted novelty 5.0

    World Value Model (WVM) integrates world models with value estimation to achieve SOTA Value-Order Correlation on expert and suboptimal robotic data and improves downstream policy performance.

  3. TSD: A Physics-Inspired Trajectory Saliency Detector for Efficient Imitation Learning

    cs.RO 2026-06 unverdicted novelty 5.0

    TSD applies two physics metrics to identify salient trajectory segments for dataset compression and expansion in robotic imitation learning, yielding comparable performance with 25% less data on average.

  4. RoboLineage: Agent-Native Data Lifecycle Governance Across Robot Policy Iterations

    cs.RO 2026-06 unverdicted novelty 5.0

    RoboLineage introduces an agent-native data lifecycle governance system that represents robot policy iteration steps as typed lineage artifacts to improve speed and auditability in real-robot workflows.

  5. Closing the Loop in Teleoperation: Episode-Level Data Quality Assessment and Feedback for High-Quality Demonstration Collection

    cs.RO 2026-05 unverdicted novelty 4.0

    The DQAF framework automates post-episode quality assessment and natural-language feedback in teleoperation to help novice operators produce higher-quality robot demonstration data faster.