
arXiv: 1906.07298 · v1 · pith:KC4XCR3J · submitted 2019-06-17 · eess.AS · cs.SD · eess.IV

Weighted delay-and-sum beamforming guided by visual tracking for human-robot interaction

keywords: beamforming, visual, robot, delay-and-sum, servoing, source, sources, tracking

This paper describes the integration of weighted delay-and-sum beamforming with speech source localization based on image processing and with robot head visual servoing for source tracking. We take into account that the directivity gain provided by the beamforming depends on the angular distance between its main lobe and the main response axis of the microphone array. A visual servoing scheme is used to reduce the angular distance between the center of the video frame of a robot camera and a target object. Additionally, the beamforming strategy combines two information sources: the direction of the target object obtained with image processing and the audio signals provided by a microphone array. These sources of information were integrated by means of a weighted delay-and-sum beamforming method. Experiments were carried out on a real mobile robotic testbed built with a PR2 robot. Conditions with a static and with a dynamic robot head, each with one and with two external noise sources, were considered. The results show that the appropriate integration of visual source tracking with visual servoing and a beamforming method can reduce the word error rate (WER) by as much as 34% compared to beamforming alone.
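
To make the idea of steering a beamformer with a visually obtained direction concrete, here is a minimal sketch in Python. It is not the method from the paper, whose weighting scheme the abstract does not specify. It assumes a uniform linear array, a far-field source, a pinhole camera model, and fixed per-microphone weights. The names `pixel_to_azimuth` and `weighted_delay_and_sum` and all parameters are hypothetical.

```python
import numpy as np


def pixel_to_azimuth(u, cx, fx):
    """Horizontal angle (rad) of image column u relative to the optical axis.

    Assumes an ideal pinhole camera with principal point cx and focal
    length fx, both in pixels (hypothetical calibration values).
    """
    return np.arctan2(u - cx, fx)


def weighted_delay_and_sum(signals, mic_positions, angle_rad, fs,
                           weights=None, c=343.0):
    """Weighted delay-and-sum beamformer for a linear array (far field).

    signals:       array of shape (n_mics, n_samples)
    mic_positions: microphone coordinates along the array axis, in metres
    angle_rad:     steering direction measured from broadside, e.g. the
                   angle returned by a visual tracker
    weights:       per-microphone weights (assumed fixed here); they are
                   normalised to sum to one
    """
    signals = np.asarray(signals, dtype=float)
    n_mics, n_samples = signals.shape
    if weights is None:
        weights = np.ones(n_mics)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()

    # Relative arrival delay of a plane wave at each microphone
    # (sign convention chosen for this sketch).
    delays = np.asarray(mic_positions, dtype=float) * np.sin(angle_rad) / c

    # Apply fractional delays in the frequency domain. A channel delayed by
    # tau has spectrum S(f) * exp(-2j*pi*f*tau), so multiplying by
    # exp(+2j*pi*f*tau) re-aligns it (circularly, within the block).
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])

    # Weighted sum across microphones, then back to the time domain.
    out = (weights[:, None] * aligned).sum(axis=0)
    return np.fft.irfft(out, n=n_samples)
```

In a setup like the one described, the angle returned by something like `pixel_to_azimuth` would be passed as `angle_rad`, after compensating for the current head pan. Visual servoing keeps the target near the image center, so that angle stays small and the main lobe stays close to the array's main response axis.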
