Multi-Source Position and Direction-of-Arrival Estimation Based on Euclidean Distance Matrices

Klaus Brümann; Simon Doclo

arxiv: 2510.02556 · v2 · submitted 2025-10-02 · 📡 eess.AS · eess.SP


Pith reviewed 2026-05-18 10:34 UTC · model grok-4.3


keywords: multi-source localization · Euclidean distance matrices · time difference of arrival · direction of arrival estimation · microphone arrays · Gram matrix eigenvalues · Procrustes mapping


The pith

Euclidean distance matrix methods estimate the positions of multiple sound sources by optimizing only one continuous variable per source, and estimate their directions of arrival without any continuous optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces techniques for multi-source sound localization from microphone arrays that build Euclidean distance matrices directly from estimated time differences of arrival. Position estimation then reduces to tuning a single distance parameter per source, with the best value selected by minimizing an eigenvalue-based cost on the Gram matrix before an orthogonal Procrustes step recovers absolute coordinates. Direction-of-arrival estimation avoids any continuous search by operating on a rank-reduced Gram matrix and choosing the lowest-cost candidate set. In tests with two sources, six microphones, and realistic noise plus reverberation, these steps produce lower error and shorter run times than steered-response power beamforming across varied geometries.

Core claim

By forming Euclidean distance matrices from TDOA estimates and inspecting the eigenvalues of the corresponding Gram matrices, the method identifies the optimal distance to a reference microphone for each source or the best candidate TDOA set for directions of arrival; relative source positions are subsequently aligned to absolute coordinates via an orthogonal Procrustes solution. This yields higher accuracy and lower computation than conventional joint optimization over multiple continuous variables.

What carries the argument

Euclidean distance matrices constructed from TDOA estimates, together with eigenvalue costs on their Gram matrices and a final orthogonal Procrustes mapping that converts relative positions to absolute ones.
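As a minimal sketch of the algebra this rests on (not the authors' implementation), the Python snippet below double-centres a squared-distance matrix into a Gram matrix and inspects its eigenvalues. The point set, the random seed, and the helper name `gram_from_edm` are illustrative assumptions. For points in three dimensions, only three eigenvalues should differ noticeably from zero.

```python
import numpy as np

def gram_from_edm(D):
    """Double-centre a squared-distance matrix D (N x N) into a Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # geometric centring matrix
    return -0.5 * J @ D @ J

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(6, 3))  # six hypothetical 3-D points
diff = points[:, None, :] - points[None, :, :]
D = np.sum(diff**2, axis=-1)                  # squared pairwise distances

# Eigenvalues in descending order; a valid 3-D EDM gives rank 3.
eigvals = np.linalg.eigvalsh(gram_from_edm(D))[::-1]
print(np.round(eigvals, 6))
```

Distances derived from noisy TDOAs would generally break this clean rank-3 structure, which is presumably why eigenvalue-based costs can act as a geometric consistency check.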

If this is right

Position estimation requires optimization of only one continuous distance variable per source rather than three.
Direction-of-arrival estimation requires no continuous variable optimization at all.
The eigenvalue cost functions select the best candidate TDOA set directly from the Gram matrix properties.
An orthogonal Procrustes problem converts the relative source positions obtained from the distance matrices into absolute coordinates.
Experimental comparisons with two sources and six microphones show improved accuracy and reduced run time relative to steered-response power methods.
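The Procrustes step in the list above can be sketched as follows. This is a generic orthogonal Procrustes alignment via the SVD, under the assumption that the known microphone positions serve as anchors; the function name `procrustes_align`, the synthetic geometry, and the row layout (microphones first, source last) are hypothetical and not taken from the paper.

```python
import numpy as np

def procrustes_align(rel, anchors_abs, anchor_idx):
    """Map relative coordinates to absolute ones using known anchor points.

    rel         : (N, 3) coordinates known only up to rotation/reflection/shift
    anchors_abs : (K, 3) absolute positions of the anchors (e.g. microphones)
    anchor_idx  : indices of the anchor rows inside `rel`
    """
    A = rel[anchor_idx]
    mu_a = A.mean(axis=0)
    mu_b = anchors_abs.mean(axis=0)
    # Orthogonal R minimising ||(A - mu_a) R - (anchors_abs - mu_b)||_F
    U, _, Vt = np.linalg.svd((A - mu_a).T @ (anchors_abs - mu_b))
    R = U @ Vt
    return (rel - mu_a) @ R + mu_b

rng = np.random.default_rng(1)
truth = rng.uniform(-2.0, 2.0, size=(7, 3))      # rows 0..5: microphones, row 6: source
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # random orthogonal matrix
rel = truth @ Q + rng.normal(size=3)             # unknown rotation/reflection and shift
est = procrustes_align(rel, truth[:6], np.arange(6))
print(np.abs(est - truth).max())
```

Because the orthogonal matrix is estimated from the microphone rows only, the source row is carried along by the same transform, which is the sense in which relative positions become absolute.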

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Improved upstream TDOA estimators could allow the same framework to handle three or more sources without increasing the per-source optimization burden.
The eigenvalue selection step may transfer to other distance-geometry problems in array signal processing where only partial distance information is available.
Because the method separates the distance search from the angular alignment, it could be paired with existing robust TDOA trackers to extend operation into more reverberant rooms.

Load-bearing premise

The input time-difference-of-arrival estimates must be accurate enough that the derived distance matrices preserve the true geometric relationships among microphones and sources.

What would settle it

An experiment that injects progressively larger errors into the TDOA estimates until the position and DOA errors of the EDM method exceed those of the SRP baseline for the same six-microphone setups.
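A perturbation sweep of this kind could be organised as in the sketch below. The estimator used here is a stand-in (a least-squares far-field direction fit), not the EDM method or the SRP baseline from the paper; the array geometry, noise levels, trial count, and the names `ls_doa` and `sweep` are all assumptions chosen only to show the structure of the loop.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def ls_doa(mics, tdoas):
    """Stand-in far-field estimator: least-squares unit direction from TDOAs."""
    B = mics[1:] - mics[0]                       # baselines to reference microphone 0
    u, *_ = np.linalg.lstsq(B, -C * tdoas, rcond=None)
    return u / np.linalg.norm(u)

def sweep(mics, u_true, sigmas, trials=200, seed=0):
    """Mean angular error (degrees) for each TDOA noise standard deviation."""
    rng = np.random.default_rng(seed)
    clean = -(mics[1:] - mics[0]) @ u_true / C   # noise-free far-field TDOAs
    mean_errors = []
    for sigma in sigmas:
        errs = []
        for _ in range(trials):
            noisy = clean + rng.normal(scale=sigma, size=clean.shape)
            u_hat = ls_doa(mics, noisy)
            cos_angle = np.clip(u_hat @ u_true, -1.0, 1.0)
            errs.append(np.degrees(np.arccos(cos_angle)))
        mean_errors.append(np.mean(errs))
    return np.array(mean_errors)

rng = np.random.default_rng(5)
mics = rng.uniform(-0.5, 0.5, size=(6, 3))       # hypothetical six-microphone array
u_true = np.array([1.0, 1.0, 0.5])
u_true /= np.linalg.norm(u_true)
sigmas = np.array([0.0, 1e-6, 1e-5, 1e-4])       # TDOA error std in seconds
errs = sweep(mics, u_true, sigmas)
print(dict(zip(sigmas.tolist(), np.round(errs, 3).tolist())))
```

Running the same loop with both the EDM pipeline and an SRP baseline in place of `ls_doa` would locate the noise level, if any, at which their error curves cross.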

Figures

Figures reproduced from arXiv: 2510.02556 by Klaus Brümann, Simon Doclo.

**Figure 2.** (a) Cost functions J(α,q) for all combinations of TDOA estimates q. (b) Corresponding minimum cost function values.

**Figure 3.** Exemplary two-dimensional configuration, consisti…

**Figure 4.** Cost function values I(q) for all combinations of candidate TDOA estimates q.

A. SRP-Based Multi-Source Position Estimation. Similarly to the GCC-PHAT function in (20), the SRP-PHAT functional for the 3-dimensional position estimation [22] is defined as $\Psi(\mathbf{p}) = \sum_{i>j} \int_{-\omega_0}^{\omega_0} \psi_{ij}(\omega)\, e^{-\jmath\omega\tau_{ij}(\mathbf{p})}\, d\omega$ (57), where $\tau_{ij}(\mathbf{p})$ denotes the TDOA (8) corresponding to position $\mathbf{p} = [p_x, p_y, p_z]^T$, and the summation considers …

**Figure 5.** Box plots of position estimation errors for both sour…

**Figure 6.** Box plots of DOA estimation errors for both sources. T…

Original abstract

A popular method to estimate the positions or directions-of-arrival (DOAs) of multiple sound sources using an array of microphones is based on steered-response power (SRP) beamforming. For a three-dimensional scenario, SRP-based methods require joint optimization of three continuous variables for position estimation or two continuous variables for DOA estimation, which can be computationally expensive when high localization accuracy is desired. In this paper, we propose novel methods for multi-source position and DOA estimation by exploiting properties of Euclidean distance matrices (EDMs) and their respective Gram matrices. All methods require estimated time-differences of arrival (TDOAs) between the microphones. In the proposed multi-source position estimation method, only a single continuous variable per source, representing the distance to a reference microphone, needs to be optimized. For each source, the optimal distance variable and set of candidate TDOA estimates are determined by minimizing a cost function defined using the eigenvalues of the Gram matrix. The estimated relative source positions are then mapped to absolute source positions by solving an orthogonal Procrustes problem. The proposed multi-source DOA estimation method eliminates the need for continuous variable optimization. The optimal set of candidate TDOA estimates is determined by minimizing a cost function defined using the eigenvalues of a rank-reduced Gram matrix. For two sources in a noisy and reverberant environment, experimental results for different source and microphone configurations with six microphones show that the proposed EDM-based method consistently outperforms the SRP-based method in terms of position and DOA estimation accuracy and run time.
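To make the single-variable search concrete, here is a hedged sketch for one source with noise-free TDOAs. It is not the paper's cost function: the cost used below (sum of absolute Gram eigenvalues beyond the three largest), the grid search, the speed of sound, and the geometry are assumptions picked for illustration, and `gram_cost` and `search_alpha` are hypothetical names.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def gram_cost(mics, tdoas, alpha, dim=3):
    """Illustrative eigenvalue cost for a candidate source-to-reference distance."""
    n = mics.shape[0] + 1
    dist = alpha + C * tdoas                      # source-to-microphone distances
    D = np.zeros((n, n))
    diff = mics[:, None, :] - mics[None, :, :]
    D[:-1, :-1] = np.sum(diff**2, axis=-1)        # known microphone block of the EDM
    D[-1, :-1] = D[:-1, -1] = dist**2             # source row and column
    J = np.eye(n) - np.ones((n, n)) / n
    eig = np.sort(np.linalg.eigvalsh(-0.5 * J @ D @ J))[::-1]
    return np.sum(np.abs(eig[dim:]))              # eigenvalue mass outside the top `dim`

def search_alpha(mics, tdoas, alphas):
    """Grid search over the single distance variable."""
    valid = alphas[alphas + C * tdoas.min() > 0]  # keep every implied distance positive
    costs = np.array([gram_cost(mics, tdoas, a) for a in valid])
    return valid[np.argmin(costs)]

rng = np.random.default_rng(2)
mics = rng.uniform(-0.5, 0.5, size=(6, 3))        # hypothetical six-microphone array
source = np.array([1.0, 0.8, 0.6])                # hypothetical source position
d_true = np.linalg.norm(mics - source, axis=1)
tdoas = (d_true - d_true[0]) / C                  # TDOAs relative to microphone 0
alpha_hat = search_alpha(mics, tdoas, np.linspace(0.2, 4.0, 381))
print(alpha_hat, d_true[0])
```

With several sources and several candidate TDOAs per microphone pair, the same cost would additionally have to be evaluated per candidate set, which this single-source sketch leaves out.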

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The EDM approach reduces multi-source localization to one distance variable per source via Gram-matrix eigenvalues and reports runtime plus accuracy gains over SRP, but those gains rest on TDOA estimates whose error statistics are not separately checked.


This paper gives a method that turns multi-source position estimation into a single-variable search per source. From TDOA estimates it builds an EDM, forms the Gram matrix, and minimizes a cost based on the eigenvalues to pick both the distance to a reference mic and the best TDOA set. An orthogonal Procrustes step then maps the relative positions to absolute ones. For DOA the same idea is applied to a rank-reduced Gram matrix so no continuous optimization is required at all.

Referee Report

2 major / 2 minor

Summary. The paper proposes EDM-based methods for multi-source position and DOA estimation that rely on pre-estimated TDOAs. Position estimation optimizes one continuous distance variable per source by minimizing an eigenvalue cost function on the Gram matrix, followed by orthogonal Procrustes mapping to absolute coordinates. DOA estimation avoids continuous optimization by minimizing an eigenvalue cost on a rank-reduced Gram matrix. Experiments with six microphones in noisy reverberant conditions for two sources report consistent gains over SRP baselines in both accuracy and runtime across varied source/microphone geometries.

Significance. If the TDOA precondition holds, the reduction from three (or two) continuous variables to one per source, together with the algebraic EDM/Gram-matrix route, offers a potentially faster and more scalable alternative to SRP for multi-source localization. The eigenvalue-minimization and Procrustes steps are standard tools applied in a novel combination here, and the reported runtime and accuracy improvements are concrete. The work would be strengthened by isolating the contribution of the TDOA front-end.

major comments (2)

[Abstract and experimental results] The central claim that the EDM pipeline 'consistently outperforms' SRP in position/DOA accuracy and runtime rests on the assumption that the input TDOAs are sufficiently accurate. The manuscript provides no separate TDOA error statistics, no sensitivity sweep of RMSE versus TDOA perturbation, and no description of the TDOA estimator used under the same reverberant conditions as the six-microphone trials. Because SRP operates directly on raw signals, any reported advantage is conditional on an untested upstream error source.
[Experimental results] The reported gains lack visible error bars, confidence intervals, or exact statistical tests (e.g., paired t-tests or Wilcoxon tests) across the different source and microphone configurations. Without these, it is difficult to assess whether the observed improvements are statistically reliable or merely consistent in direction.
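The paired test requested in the second comment could look like the sketch below, which uses `scipy.stats.wilcoxon`. The error values are synthetic placeholders generated from a random seed, not results from the paper; the variable names, sample size, and distributions are assumptions made only so the example runs.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(3)

# Synthetic placeholder errors (metres) for 30 paired trials.
# These are NOT measurements from the paper.
err_a = rng.gamma(shape=2.0, scale=0.05, size=30)            # method A
err_b = err_a + rng.normal(loc=0.05, scale=0.02, size=30)    # method B, same trials

# Wilcoxon signed-rank test on the paired differences (two-sided by default).
result = wilcoxon(err_a, err_b)
print(f"statistic={result.statistic:.1f}, p={result.pvalue:.3g}")
```

Pairing matters here: both methods must be evaluated on the same noise realisations and geometries for the signed-rank test to apply.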

minor comments (2)

[DOA estimation method] Notation: the precise construction of the rank-reduced Gram matrix used for the DOA cost function should be given explicitly (including the rank-reduction step) to allow independent reproduction.
[Experimental setup] The paper should state the exact TDOA estimator and its hyper-parameters so that the experimental conditions are fully reproducible.
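For orientation, one widely used TDOA front-end is GCC-PHAT (cf. the generalized correlation method in reference [34]). The sketch below is a generic textbook-style version and is not claimed to be the estimator or the settings used in the paper; the sampling rate, signal length, and the sign convention (a positive result means `x` arrives later than `y`) are assumptions.

```python
import numpy as np

def gcc_phat(x, y, fs):
    """Estimate the delay of x relative to y (seconds) with PHAT weighting."""
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12           # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n)
    max_shift = n // 2
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(cc) - max_shift) / fs

rng = np.random.default_rng(4)
fs = 16000                                   # assumed sampling rate in Hz
y = rng.normal(size=4096)                    # synthetic broadband signal
x = np.concatenate((np.zeros(5), y[:-5]))    # x lags y by 5 samples
tau = gcc_phat(x, y, fs)
print(tau * fs)
```

In a multi-source setting one would keep several peaks of the correlation per microphone pair rather than only the maximum, which yields the candidate TDOA sets the report refers to.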

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address the two major comments point by point below, agreeing where the manuscript is incomplete and outlining the planned revisions.


Referee: [Abstract and experimental results] The central claim that the EDM pipeline 'consistently outperforms' SRP in position/DOA accuracy and runtime rests on the assumption that the input TDOAs are sufficiently accurate. The manuscript provides no separate TDOA error statistics, no sensitivity sweep of RMSE versus TDOA perturbation, and no description of the TDOA estimator used under the same reverberant conditions as the six-microphone trials. Because SRP operates directly on raw signals, any reported advantage is conditional on an untested upstream error source.

Authors: We agree that the reported gains are conditional on TDOA accuracy, as already noted in the abstract and Section II. The current manuscript does not report separate TDOA error statistics, a sensitivity analysis, or the precise TDOA estimator used in the reverberant trials. In the revised version we will (i) describe the TDOA estimator and its parameters, (ii) tabulate the observed TDOA RMSE for each geometry, and (iii) add a sensitivity study that perturbs the input TDOAs and plots the resulting position/DOA RMSE. These additions will make the dependence on the TDOA front-end explicit. revision: yes

Referee: [Experimental results] The reported gains lack visible error bars, confidence intervals, or exact statistical tests (e.g., paired t-tests or Wilcoxon tests) across the different source and microphone configurations. Without these, it is difficult to assess whether the observed improvements are statistically reliable or merely consistent in direction.

Authors: We concur that statistical reliability should be quantified. In the revision we will augment all accuracy and runtime plots with error bars (standard deviation over repeated trials with independent noise realizations) and include the results of paired statistical tests (Wilcoxon signed-rank or paired t-tests) for each source/microphone configuration to confirm that the observed improvements are statistically significant. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation uses independent EDM algebraic properties


The paper constructs EDMs and Gram matrices directly from estimated TDOAs, then minimizes eigenvalue-based costs to optimize per-source distances and select TDOA sets before applying a standard Procrustes orthogonal mapping. These steps invoke general mathematical properties of EDMs (rank deficiency and non-negative eigenvalues of the Gram matrix) that hold independently of the final position or DOA estimates. No step defines the target result in terms of itself, renames a fitted quantity as a prediction, or relies on a self-citation chain for uniqueness. The reported experimental outperformance is therefore not forced by construction but rests on the validity of the upstream TDOA estimates, which the paper treats as given inputs rather than deriving circularly.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the algebraic properties of EDMs and Gram matrices plus the assumption that TDOA estimates can be treated as distance measurements; one free parameter (distance to reference microphone) is introduced per source.

free parameters (1)

distance to reference microphone
Single continuous variable per source that is optimized by minimizing the eigenvalue-based cost function of the Gram matrix.

axioms (1)

Domain assumption: Estimated TDOAs between microphones can be converted into a Euclidean distance matrix whose Gram matrix eigenvalues reliably indicate geometric consistency.
Invoked when the cost function is defined using the eigenvalues of the Gram matrix for both position and DOA variants.
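One way to write the link between the free parameter and this assumption is sketched below. The notation is an assumption for illustration and may differ from the paper's: $\alpha$ is the candidate distance from a source to the reference microphone, $c$ the speed of sound, $\hat{\tau}_i$ the estimated TDOA between microphone $i$ and the reference, $\mathbf{m}_i$ the known microphone positions, and the index $s$ marks the source row of the distance matrix.

```latex
d_i(\alpha) = \alpha + c\,\hat{\tau}_i, \qquad
D_{i,s}(\alpha) = d_i(\alpha)^2, \qquad
D_{i,j} = \lVert \mathbf{m}_i - \mathbf{m}_j \rVert^2
```

Only the source row and column of the matrix depend on $\alpha$; the microphone block is fixed by the array geometry, so errors in $\hat{\tau}_i$ enter the distance matrix through the source entries alone.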



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "For each source, the optimal distance variable and set of candidate TDOA estimates are determined by minimizing a cost function defined using the eigenvalues of the Gram matrix."

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: "the rank of the positive semi-definite matrix Gs is at most P"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages

[1] N. Madhu, R. Martin, U. Heute, and C. Antweiler, “Acoustic source localization with microphone arrays,” in Advances in Digital Speech Transmission, R. Martin, U. Heute, and C. Antweiler, Eds. Chichester, UK: Wiley, 2008, pp. 135–170.
[2] P. Pertilä, A. Brutti, P. Svaizer, and M. Omologo, “Multichannel source activity detection, localization, and tracking,” in Audio source separation and speech enhancement, E. Vincent, T. Virtanen, and S. Gannot, Eds. Wiley, 2018, pp. 47–64.
[3] A. Aroudi and S. Doclo, “Cognitive-driven binaural beamforming using EEG-based auditory attention decoding,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 28, pp. 862–875, 2020.
[4] K. Tesch and T. Gerkmann, “Multi-channel speech separation using spatially selective deep non-linear filters,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 32, pp. 542–553, 2023.
[5] M. Omologo and P. Svaizer, “Use of the crosspower-spectrum phase in acoustic event location,” IEEE Trans. Speech Audio Process., vol. 5, no. 3, pp. 288–292, 1997.
[6] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, “Robust localization in reverberant rooms,” in Microphone arrays: signal processing techniques and applications, M. Brandstein and D. Ward, Eds. Springer, 2001, pp. 157–180.
[7] A. Brutti, M. Omologo, and P. Svaizer, “Multiple source localization based on acoustic map de-emphasis,” EURASIP J. Audio, Speech & Music Process., vol. 2010, 2010, art. no. 147495.
[8] R. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE Trans. Antennas Propag., vol. 34, no. 3, pp. 276–280, 1986.
[9] Y. A. Huang, J. Benesty, and J. Chen, “Time delay estimation and source localization,” in Springer Handbook of Speech Processing, J. Benesty, Y. A. Huang, and M. Christensen, Eds. Springer, 2008, pp. 1043–1063.
[10] N. Yalta, K. Nakadai, and T. Ogata, “Sound source localization using deep learning models,” Journal of Robotics and Mechatronics, vol. 29, no. 1, pp. 37–48, 2017.
[11] S. Adavanne, A. Politis, and T. Virtanen, “Differentiable tracking-based training of deep learning sound source localizers,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 2021, pp. 211–215.
[12] M. J. Bianco, S. Gannot, E. Fernandez-Grande, and P. Gerstoft, “Semi-supervised source localization in reverberant environments with deep generative modeling,” IEEE Access, vol. 9, pp. 84 956–84 970, 2021.
[13] P.-A. Grumiaux, S. Kitić, L. Girin, and A. Guérin, “A survey of sound source localization with deep learning methods,” J. Acoust. Soc. Am., vol. 152, no. 1, pp. 107–151, 2022.
[14] B. Yang, H. Liu, and X. Li, “SRP-DNN: Learning direct-path phase difference for multiple moving sound source localization,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 721–725.
[15] E. Grinstein, C. M. Hicks, T. van Waterschoot, M. Brookes, and P. A. Naylor, “The neural-SRP method for universal robust multi-source tracking,” IEEE Open J. Signal Process., vol. 5, pp. 19–28, 2024.
[16] R. Varzandeh, S. Doclo, and V. Hohmann, “Improving multi-talker binaural DOA estimation by combining periodicity and spatial features in convolutional neural networks,” EURASIP J. Audio, Speech & Music Process., vol. 2025, no. 1, 2025, art. no. 5.
[17] M. Cobos, A. Marti, and J. J. Lopez, “A modified SRP-PHAT functional for robust real-time sound source localization with scalable spatial sampling,” IEEE Signal Process. Lett., vol. 18, no. 1, pp. 71–74, 2010.
[18] L. O. Nunes, W. A. Martins, M. V. S. Lima, L. W. P. Biscainho, M. V. M. Costa, F. M. Gonçalves, A. Said, and B. Lee, “A steered-response power algorithm employing hierarchical search for acoustic source localization using microphone arrays,” IEEE Trans. Signal Process., vol. 62, no. 19, pp. 5171–5183, 2014.
[19] D. Salvati, C. Drioli, and G. L. Foresti, “Exploiting a geometrically sampled grid in the steered response power algorithm for localization improvement,” J. Acoust. Soc. Am., vol. 141, no. 1, pp. 586–601, 2017.
[20] T. Long, J. Chen, G. Huang, J. Benesty, and I. Cohen, “Acoustic source localization based on geometric projection in reverberant and noisy environments,” IEEE Selected Topics in Signal Processing, vol. 13, no. 1, pp. 143–155, 2018.
[21] G. García-Barrios, J. M. Gutiérrez-Arriola, N. Sáenz-Lechón, V. J. Osma-Ruiz, and R. Fraile, “Analytical model for the relation between signal bandwidth and spatial resolution in steered-response power phase transform (SRP-PHAT) maps,” IEEE Access, vol. 9, pp. 121 549–121 560, 2021.
[22] E. Grinstein, E. Tengan, B. Çakmak, T. Dietzen, L. Nunes, T. van Waterschoot, M. Brookes, and P. A. Naylor, “Steered response power for sound source localization: A tutorial review,” EURASIP J. Audio, Speech & Music Process., vol. 2024, no. 1, 2024, art. no. 59.
[23] T. Dietzen, E. De Sena, and T. van Waterschoot, “Scalable-complexity steered response power based on low-rank and sparse interpolation,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 32, pp. 5024–5039, 2024.
[24] Y. Huang, J. Tong, X. Hu, and M. Bao, “A robust steered response power localization method for wireless acoustic sensor networks in an outdoor environment,” Sensors, vol. 21, no. 5, 2021, art. no. 1591.
[25] K. Brümann and S. Doclo, “3D single source localization based on Euclidean distance matrices,” in Proc. IEEE International Workshop on Acoustic Echo and Noise Control (IWAENC), Bamberg, Germany, 2022, pp. 1–5.
[26] Y. Oualil, F. Faubel, and D. Klakow, “A fast cumulative steered response power for multiple speaker detection and localization,” in Proc. European Signal Processing Conference (EUSIPCO), Marrakech, Morocco, 2013, pp. 1–5.
[27] D. Salvati and S. Canazza, “Incident signal power comparison for localization of concurrent multiple acoustic sources,” The Scientific World Journal, vol. 2014, no. 1, 2014, art. no. 582397.
[28] S. Gerlach, J. Bitzer, S. Goetze, and S. Doclo, “Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios,” EURASIP J. Audio, Speech & Music Process., vol. 2014, no. 1, 2014, art. no. 31.
[29] D. Fejgin, E. Hadad, S. Gannot, Z. Koldovsky, and S. Doclo, “Comparison of frequency-fusion mechanisms for binaural direction-of-arrival estimation for multiple speakers,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea: IEEE, 2024, pp. 731–735.
[30] H. Liu, Y. Chen, Y. Lin, and Q. Xiao, “A multiple sources localization method based on TDOA without association ambiguity for near and far mixed field sources,” Circuits, Systems, and Signal Processing, vol. 40, no. 8, pp. 4018–4046, 2021.
[31] X. Dang and H. Zhu, “A feature-based data association method for multiple acoustic source localization in a distributed microphone array,” J. Acoust. Soc. Am., vol. 149, no. 1, pp. 612–628, 2021.
[32] J. C. Gower, “Euclidean distance geometry,” Mathematical Scientist, vol. 7, no. 1, pp. 1–14, 1982.
[33] I. Dokmanić, R. Parhizkar, J. Ranieri, and M. Vetterli, “Euclidean distance matrices: essential theory, algorithms, and applications,” IEEE Signal Process. Mag., vol. 32, no. 6, pp. 12–30, 2015.
[34] C. Knapp and G. Carter, “The generalized correlation method for estimation of time delay,” IEEE Trans. Acoust., Speech, Signal Process., vol. 24, no. 4, pp. 320–327, 1976.
[35] J. Chen, J. Benesty, and Y. Huang, “Time delay estimation in room acoustic environments: An overview,” EURASIP J. Adv. Signal Process., vol. 2006, 2006, art. no. 26503.
[36] J. Velasco, C. J. Martin-Arguedas, J. Macias-Guarasa, D. Pizarro, and M. Mazo, “Proposal and validation of an analytical generative model of SRP-PHAT power maps in reverberant scenarios,” Signal Process., vol. 119, pp. 209–228, 2016.
[37] C. Zhang, D. Florêncio, and Z. Zhang, “Why does PHAT work well in low noise, reverberative environments?” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Las Vegas, NV, USA, 2008, pp. 2565–2568.
[38] P. H. Schönemann, “A generalized solution of the orthogonal Procrustes problem,” Psychometrika, vol. 31, no. 1, pp. 1–10, 1966.
[39] J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Am., vol. 65, no. 4, pp. 943–950, 1979.
[40] E. A. P. Habets, “RIR-generator,” available: https://github.com/ehabets/RIR-Generator. Accessed: Aug. 01, 2025.
[41] I. Solak, “M-ailabs speech dataset,” available: https://www.caito.de/2019/01/03/the-m-ailabs-speech-dataset/. Accessed: Aug. 01, 2025.
[42] E. A. P. Habets, I. Cohen, and S. Gannot, “Generating nonstationary multisensor signals under a spatial coherence constraint,” J. Acoust. Soc. Am., vol. 124, no. 5, pp. 2911–2917, 2008.
[43] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 3rd ed. Prentice Hall, 2009.
[44] K. Brümann, K. Yamaoka, N. Ono, and S. Doclo, “Incremental averaging method to improve graph-based time-difference-of-arrival estimation,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Tahoe City, CA, USA, 2025.
[45] K. Brümann and S. Doclo, “Steered response power-based direction-of-arrival estimation exploiting an auxiliary microphone,” in Proc. European Signal Processing Conference (EUSIPCO), Lyon, France, 2024, pp. 917–921.
[46] H. Do and H. F. Silverman, “A fast microphone array SRP-PHAT source location implementation using coarse-to-fine region contraction (CFRC),” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 2007, pp. 295–298.
[47] H. E. Romeijn and D. R. Morales, “A class of greedy algorithms for the generalized assignment problem,” Discrete Appl. Math., vol. 103, no. 1-3, pp. 209–235, 2000.

[1] [1]

Acousti c source localization with microphone arrays,

N. Madhu, R. Martin, U. Heute, and C. Antweiler, “Acousti c source localization with microphone arrays,” in Advances in Digital Speech Transmission, R. Martin, U. Heute, and C. Antweiler, Eds. Chichester, UK: Wiley, 2008, pp. 135–170

work page 2008

[2] [2]

Mult ichannel source activity detection, localization, and tracking,

P . Pertil¨ a, A. Brutti, P . Svaizer, and M. Omologo, “Mult ichannel source activity detection, localization, and tracking,” in Audio source separation and speech enhancement , E. Vincent, T. Virtanen, and S. Gannot, Eds. Wiley, 2018, pp. 47–64

work page 2018

[3] [3]

Cognitive-driven binaural beam forming us- ing EEG-based auditory attention decoding,

A. Aroudi and S. Doclo, “Cognitive-driven binaural beam forming us- ing EEG-based auditory attention decoding,” IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol. 28, pp. 862–875, 2020

work page 2020

[4]

Multi-channel speech separation using spatially selective deep non-linear filters,

K. Tesch and T. Gerkmann, “Multi-channel speech separation using spatially selective deep non-linear filters,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 32, pp. 542–553, 2023

work page 2023

[5]

Use of the crosspower-spectrum phase in acoustic event location,

M. Omologo and P. Svaizer, “Use of the crosspower-spectrum phase in acoustic event location,” IEEE Trans. Speech Audio Process., vol. 5, no. 3, pp. 288–292, 1997

work page 1997

[6]

Robust localization in reverberant rooms,

J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, “Robust localization in reverberant rooms,” in Microphone arrays: signal processing techniques and applications, M. Brandstein and D. Ward, Eds. Springer, 2001, pp. 157–180

work page 2001

[7]

Multiple source localization based on acoustic map de-emphasis,

A. Brutti, M. Omologo, and P. Svaizer, “Multiple source localization based on acoustic map de-emphasis,” EURASIP J. Audio, Speech & Music Process., vol. 2010, 2010, art. no. 147495

work page 2010

[8]

Multiple emitter location and signal parameter estimation,

R. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE Trans. Antennas Propag., vol. 34, no. 3, pp. 276–280, 1986

work page 1986

[9]

Time delay estimation and source localization,

Y. A. Huang, J. Benesty, and J. Chen, “Time delay estimation and source localization,” in Springer Handbook of Speech Processing, J. Benesty, Y. A. Huang, and M. Christensen, Eds. Springer, 2008, pp. 1043–1063

work page 2008

[10]

Sound source localization using deep learning models,

N. Yalta, K. Nakadai, and T. Ogata, “Sound source localization using deep learning models,” Journal of Robotics and Mechatronics, vol. 29, no. 1, pp. 37–48, 2017

work page 2017

[11]

Differentiable tracking-based training of deep learning sound source localizers,

S. Adavanne, A. Politis, and T. Virtanen, “Differentiable tracking-based training of deep learning sound source localizers,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 2021, pp. 211–215

work page 2021

[12]

Semi-supervised source localization in reverberant environments with deep generative modeling,

M. J. Bianco, S. Gannot, E. Fernandez-Grande, and P. Gerstoft, “Semi-supervised source localization in reverberant environments with deep generative modeling,” IEEE Access, vol. 9, pp. 84 956–84 970, 2021

work page 2021

[13]

A survey of sound source localization with deep learning methods,

P.-A. Grumiaux, S. Kitić, L. Girin, and A. Guérin, “A survey of sound source localization with deep learning methods,” J. Acoust. Soc. Am., vol. 152, no. 1, pp. 107–151, 2022

work page 2022

[14]

SRP-DNN: Learning direct-path phase difference for multiple moving sound source localization,

B. Yang, H. Liu, and X. Li, “SRP-DNN: Learning direct-path phase difference for multiple moving sound source localization,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 721–725

work page 2022

[15]

The neural-SRP method for universal robust multi-source tracking,

E. Grinstein, C. M. Hicks, T. van Waterschoot, M. Brookes, and P. A. Naylor, “The neural-SRP method for universal robust multi-source tracking,” IEEE Open J. Signal Process., vol. 5, pp. 19–28, 2024

work page 2024

[16]

Improving multi-talker binaural DOA estimation by combining periodicity and spatial features in convolutional neural networks,

R. Varzandeh, S. Doclo, and V. Hohmann, “Improving multi-talker binaural DOA estimation by combining periodicity and spatial features in convolutional neural networks,” EURASIP J. Audio, Speech & Music Process., vol. 2025, no. 1, 2025, art. no. 5

work page 2025

[17]

A modified SRP-PHAT functional for robust real-time sound source localization with scalable spatial sampling,

M. Cobos, A. Marti, and J. J. Lopez, “A modified SRP-PHAT functional for robust real-time sound source localization with scalable spatial sampling,” IEEE Signal Process. Lett., vol. 18, no. 1, pp. 71–74, 2010

work page 2010

[18]

A steered-response power algorithm employing hierarchical search for acoustic source localization using microphone arrays,

L. O. Nunes, W. A. Martins, M. V. S. Lima, L. W. P. Biscainho, M. V. M. Costa, F. M. Gonçalves, A. Said, and B. Lee, “A steered-response power algorithm employing hierarchical search for acoustic source localization using microphone arrays,” IEEE Trans. Signal Process., vol. 62, no. 19, pp. 5171–5183, 2014

work page 2014

[19]

Exploiting a geometrically sampled grid in the steered response power algorithm for localization improvement,

D. Salvati, C. Drioli, and G. L. Foresti, “Exploiting a geometrically sampled grid in the steered response power algorithm for localization improvement,” J. Acoust. Soc. Am., vol. 141, no. 1, pp. 586–601, 2017

work page 2017

[20]

Acoustic source localization based on geometric projection in reverberant and noisy environments,

T. Long, J. Chen, G. Huang, J. Benesty, and I. Cohen, “Acoustic source localization based on geometric projection in reverberant and noisy environments,” IEEE Selected Topics in Signal Processing, vol. 13, no. 1, pp. 143–155, 2018

work page 2018

[21]

Analytical model for the relation between signal bandwidth and spatial resolution in steered-response power phase transform (SRP-PHAT) maps,

G. García-Barrios, J. M. Gutiérrez-Arriola, N. Sáenz-Lechón, V. J. Osma-Ruiz, and R. Fraile, “Analytical model for the relation between signal bandwidth and spatial resolution in steered-response power phase transform (SRP-PHAT) maps,” IEEE Access, vol. 9, pp. 121 549–121 560, 2021

work page 2021

[22]

Steered response power for sound source localization: A tutorial review,

E. Grinstein, E. Tengan, B. Çakmak, T. Dietzen, L. Nunes, T. van Waterschoot, M. Brookes, and P. A. Naylor, “Steered response power for sound source localization: A tutorial review,” EURASIP J. Audio, Speech & Music Process., vol. 2024, no. 1, 2024, art. no. 59

work page 2024

[23]

Scalable-complexity steered response power based on low-rank and sparse interpolation,

T. Dietzen, E. De Sena, and T. van Waterschoot, “Scalable-complexity steered response power based on low-rank and sparse interpolation,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 32, pp. 5024–5039, 2024

work page 2024

[24]

A robust steered response power localization method for wireless acoustic sensor networks in an outdoor environment,

Y. Huang, J. Tong, X. Hu, and M. Bao, “A robust steered response power localization method for wireless acoustic sensor networks in an outdoor environment,” Sensors, vol. 21, no. 5, 2021, art. no. 1591

work page 2021

[25]

3D single source localization based on Euclidean distance matrices,

K. Brümann and S. Doclo, “3D single source localization based on Euclidean distance matrices,” in Proc. IEEE International Workshop on Acoustic Echo and Noise Control (IWAENC), Bamberg, Germany, 2022, pp. 1–5

work page 2022

[26]

A fast cumulative steered response power for multiple speaker detection and localization,

Y. Oualil, F. Faubel, and D. Klakow, “A fast cumulative steered response power for multiple speaker detection and localization,” in Proc. European Signal Processing Conference (EUSIPCO), Marrakech, Morocco, 2013, pp. 1–5

work page 2013

[27]

Incident signal power comparison for localization of concurrent multiple acoustic sources,

D. Salvati and S. Canazza, “Incident signal power comparison for localization of concurrent multiple acoustic sources,” The Scientific World Journal, vol. 2014, no. 1, 2014, art. no. 582397

work page 2014

[28]

Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios,

S. Gerlach, J. Bitzer, S. Goetze, and S. Doclo, “Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios,” EURASIP J. Audio, Speech & Music Process., vol. 2014, no. 1, 2014, art. no. 31

work page 2014

[29]

Comparison of frequency-fusion mechanisms for binaural direction-of-arrival estimation for multiple speakers,

D. Fejgin, E. Hadad, S. Gannot, Z. Koldovsky, and S. Doclo, “Comparison of frequency-fusion mechanisms for binaural direction-of-arrival estimation for multiple speakers,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea: IEEE, 2024, pp. 731–735

work page 2024

[30]

A multiple sources localization method based on TDOA without association ambiguity for near and far mixed field sources,

H. Liu, Y. Chen, Y. Lin, and Q. Xiao, “A multiple sources localization method based on TDOA without association ambiguity for near and far mixed field sources,” Circuits, Systems, and Signal Processing, vol. 40, no. 8, pp. 4018–4046, 2021

work page 2021

[31]

A feature-based data association method for multiple acoustic source localization in a distributed microphone array,

X. Dang and H. Zhu, “A feature-based data association method for multiple acoustic source localization in a distributed microphone array,” J. Acoust. Soc. Am., vol. 149, no. 1, pp. 612–628, 2021

work page 2021

[32]

Euclidean distance geometry,

J. C. Gower, “Euclidean distance geometry,” Mathematical Scientist, vol. 7, no. 1, pp. 1–14, 1982

work page 1982

[33]

Euclidean distance matrices: essential theory, algorithms, and applications,

I. Dokmanić, R. Parhizkar, J. Ranieri, and M. Vetterli, “Euclidean distance matrices: essential theory, algorithms, and applications,” IEEE Signal Process. Mag., vol. 32, no. 6, pp. 12–30, 2015

work page 2015

[34]

The generalized correlation method for estimation of time delay,

C. Knapp and G. Carter, “The generalized correlation method for estimation of time delay,” IEEE Trans. Acoust., Speech, Signal Process., vol. 24, no. 4, pp. 320–327, 1976

work page 1976

[35]

Time delay estimation in room acoustic environments: An overview,

J. Chen, J. Benesty, and Y. Huang, “Time delay estimation in room acoustic environments: An overview,” EURASIP J. Adv. Signal Process., vol. 2006, 2006, art. no. 26503

work page 2006

[36]

Proposal and validation of an analytical generative model of SRP-PHAT power maps in reverberant scenarios,

J. Velasco, C. J. Martin-Arguedas, J. Macias-Guarasa, D. Pizarro, and M. Mazo, “Proposal and validation of an analytical generative model of SRP-PHAT power maps in reverberant scenarios,” Signal Process., vol. 119, pp. 209–228, 2016

work page 2016

[37]

Why does PHAT work well in low noise, reverberative environments?

C. Zhang, D. Florêncio, and Z. Zhang, “Why does PHAT work well in low noise, reverberative environments?” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Las Vegas, NV, USA, 2008, pp. 2565–2568

work page 2008

[38]

A generalized solution of the orthogonal Procrustes problem,

P. H. Schönemann, “A generalized solution of the orthogonal Procrustes problem,” Psychometrika, vol. 31, no. 1, pp. 1–10, 1966

work page 1966

[39]

Image method for efficiently simulating small-room acoustics,

J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Am., vol. 65, no. 4, pp. 943–950, 1979

work page 1979

[40]

RIR-generator,

E. A. P. Habets, “RIR-generator,” available: https://github.com/ehabets/RIR-Generator. Accessed: Aug. 01, 2025

work page 2025

[41]

M-ailabs speech dataset,

I. Solak, “M-ailabs speech dataset,” available: https://www.caito.de/2019/01/03/the-m-ailabs-speech-dataset/. Accessed: Aug. 01, 2025

work page 2019

[42]

Generating nonstationary multisensor signals under a spatial coherence constraint,

E. A. P. Habets, I. Cohen, and S. Gannot, “Generating nonstationary multisensor signals under a spatial coherence constraint,” J. Acoust. Soc. Am., vol. 124, no. 5, pp. 2911–2917, 2008

work page 2008

[43]

A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 3rd ed. Prentice Hall, 2009

work page 2009

[44]

Incremental averaging method to improve graph-based time-difference-of-arrival estimation,

K. Brümann, K. Yamaoka, N. Ono, and S. Doclo, “Incremental averaging method to improve graph-based time-difference-of-arrival estimation,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Tahoe City, CA, USA, 2025

work page 2025

[45]

Steered response power-based direction-of-arrival estimation exploiting an auxiliary microphone,

K. Brümann and S. Doclo, “Steered response power-based direction-of-arrival estimation exploiting an auxiliary microphone,” in Proc. European Signal Processing Conference (EUSIPCO), Lyon, France, 2024, pp. 917–921

work page 2024

[46]

A fast microphone array SRP-PHAT source location implementation using coarse-to-fine region contraction (CFRC),

H. Do and H. F. Silverman, “A fast microphone array SRP-PHAT source location implementation using coarse-to-fine region contraction (CFRC),” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 2007, pp. 295–298

work page 2007

[47]

A class of greedy algorithms for the generalized assignment problem,

H. E. Romeijn and D. R. Morales, “A class of greedy algorithms for the generalized assignment problem,” Discrete Appl. Math., vol. 103, no. 1-3, pp. 209–235, 2000

work page 2000