{"paper":{"title":"From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Video-based robot manipulation methods are limited most by how predictions connect to reliable physical actions.","cross_cats":[],"primary_cat":"cs.RO","authors_text":"Chen Wang, Jia Pan, Linfang Zheng, Wei Zhang, Zikai Ouyang","submitted_at":"2026-04-04T15:37:11Z","abstract_excerpt":"Video is a scalable observation of physical dynamics: it captures how objects move, how contact unfolds, and how scenes evolve under interaction -- all without requiring robot action labels. Yet translating this temporal structure into reliable robotic control remains an open challenge, because video lacks action supervision and differs from robot experience in embodiment, viewpoint, and physical constraints. This survey reviews methods that exploit non-action-annotated temporal video to learn control interfaces for robotic manipulation. We introduce an interface-centric taxonomy organized by "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"A cross-family synthesis reveals that the most pressing open challenges center on the robotics integration layer -- the mechanisms that connect video-derived predictions to dependable robot behavior.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the three-family interface-centric taxonomy captures the essential distinctions among existing methods without leaving out important approaches or creating artificial boundaries.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A survey introduces an interface-centric taxonomy for video-to-control methods in robotic manipulation and identifies the robotics integration layer as the central open challenge.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Video-based robot manipulation methods are limited most by how predictions connect to reliable physical actions.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"df3efa0e624f8b1ec6bae0016d95378f1949f9353e3fcbab7f33e2efbd09f6e0"},"source":{"id":"2604.04974","kind":"arxiv","version":2},"verdict":{"id":"215e3a58-8752-4bbe-9029-4f3a4a62776d","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-13T16:59:51.886561Z","strongest_claim":"A cross-family synthesis reveals that the most pressing open challenges center on the robotics integration layer -- the mechanisms that connect video-derived predictions to dependable robot behavior.","one_line_summary":"A survey introduces an interface-centric taxonomy for video-to-control methods in robotic manipulation and identifies the robotics integration layer as the central open challenge.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the three-family interface-centric taxonomy captures the essential distinctions among existing methods without leaving out important approaches or creating artificial boundaries.","pith_extraction_headline":"Video-based robot manipulation methods are limited most by how predictions connect to reliable physical actions."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.04974/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":1,"snapshot_sha256":"3e9f75289b0f7452321a30e9013b8bb30629acaeff0188546ad0da27f533bdcf"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}