ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data

2024 · arXiv 2406.19464

4 Pith papers cite this work.

citation-role summary: background (1)

citation-polarity summary: background (1)

representative citing papers

A-SLIP: Acoustic Sensing for Continuous In-hand Slip Estimation

cs.RO · 2026-04-09 · unverdicted · novelty 7.0

A four-microphone acoustic system with a CNN achieves a 14.1-degree mean directional error for continuous in-hand slip estimation and outperforms single-channel baselines.

Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction

cs.RO · 2025-03-07 · unverdicted · novelty 7.0

Introduces the Kaiwu multimodal dataset and framework, with 11,664 synchronized assembly demonstrations that include hand motions, pressures, sounds, multi-view videos, motion capture, eye gaze, and EMG signals, along with timestamp-based and semantic annotations.

You're Pushing My Buttons: Instrumented Learning of Gentle Button Presses

cs.RO · 2026-04-07 · unverdicted · novelty 5.0

Training-time instrumentation with audio and privileged button-state signals produces contact policies that match success rates while applying lower forces, using only vision and audio at inference.

Causal World Modeling for Robot Control

cs.CV · 2026-01-29 · unverdicted · novelty 5.0

LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.

citing papers explorer


A-SLIP: Acoustic Sensing for Continuous In-hand Slip Estimation cs.RO · 2026-04-09 · unverdicted · none · ref 36
Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction cs.RO · 2025-03-07 · unverdicted · none · ref 11
You're Pushing My Buttons: Instrumented Learning of Gentle Button Presses cs.RO · 2026-04-07 · unverdicted · none · ref 5
Causal World Modeling for Robot Control cs.CV · 2026-01-29 · unverdicted · none · ref 51