{"paper":{"title":"Flexible Multi-Channel Target Speaker Extraction Using Geometry-Conditioned Spatially Selective Non-linear Filters","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"Geometry conditioning lets a spatially selective filter generalize target speaker extraction across different microphone array shapes.","cross_cats":[],"primary_cat":"eess.AS","authors_text":"Jiatong Li, Simon Doclo, Wiebke Middelberg","submitted_at":"2026-05-18T14:11:37Z","abstract_excerpt":"Recently, a spatially selective non-linear filter (SSF) has been proposed for target speaker extraction, using the target direction-of-arrival (DOA) as a spatial cue. Since learned intermediate features are tied to the microphone geometry, the performance of the SSF degrades significantly when evaluated on mismatched array geometries. In this paper, we propose a geometry-conditioned SSF (GC-SSF), which incorporates a geometry-conditioning branch based on FiLM layers. Furthermore, we propose a feature that jointly encodes the DOA and the microphone positions (DOA-MPE). 
The conditioning branch m"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"The proposed GC-SSF generalizes better to mismatched geometries while maintaining high spatial selectivity, as demonstrated by experimental results across circular, uniform linear, and random microphone arrays.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the geometry-conditioning branch using FiLM layers and the DOA-MPE feature can effectively capture and apply the spatial relationship between microphone positions and target speaker direction to adapt the SSF filtering process.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"GC-SSF with DOA-MPE feature generalizes target speaker extraction to mismatched microphone array geometries while preserving spatial selectivity.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Geometry conditioning lets a spatially selective filter generalize target speaker extraction across different microphone array shapes.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"adbfe154ff1408db7c12e0d75b68b2f25c8a0c21204ec20a5b92e4ef3ba008a9"},"source":{"id":"2605.18442","kind":"arxiv","version":1},"verdict":{"id":"325bdb3a-f113-4232-92aa-43be9f2cdf1e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T23:34:37.366810Z","strongest_claim":"The proposed GC-SSF generalizes better to mismatched geometries while maintaining high spatial selectivity, as demonstrated by experimental results across circular, uniform linear, and random microphone arrays.","one_line_summary":"GC-SSF with DOA-MPE feature generalizes target speaker extraction to mismatched microphone array geometries while 
preserving spatial selectivity.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the geometry-conditioning branch using FiLM layers and the DOA-MPE feature can effectively capture and apply the spatial relationship between microphone positions and target speaker direction to adapt the SSF filtering process.","pith_extraction_headline":"Geometry conditioning lets a spatially selective filter generalize target speaker extraction across different microphone array shapes."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.18442/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-20T00:01:20.209277Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"citation_quote_validity","ran_at":"2026-05-19T23:49:44.751993Z","status":"completed","version":"0.1.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T23:40:55.267593Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T23:33:25.268095Z","status":"skipped","version":"1.0.0","findings_count":0},{"name":"external_links","ran_at":"2026-05-19T23:31:24.644715Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T23:21:58.630334Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"cited_work_retraction","ran_at":"2026-05-19T23:21:57.862490Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"17637044a0095525754fd3a94fdfc67aa8ed5b4cfe7df0164aa02d8e9b261978"},"references":{"count":33,"sample":[{"doi":"","year":null,"title":"INTRODUCTION Extracting a target speaker from a mixture of speakers and background noise remains a fundamental challenge in acoustic signal processing [1]. 
To discriminate the target speaker from the ","work_id":"838b9174-b2aa-46c0-b229-49e056076a03","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2026,"title":"Flexible Multi-Channel Target Speaker Extraction Using Geometry-Conditioned Spatially Selective Non-linear Filters","work_id":"6b8f43e4-5a3f-4bd1-99ac-f2b63a99bb12","ref_index":2,"cited_arxiv_id":"2605.18442","is_internal_anchor":true},{"doi":"","year":null,"title":"SPATIALLY SELECTIVE NON-LINEAR FILTER In this section, we review the spatially selective non-linear filter (SSF) for target speaker extraction [8], which serves as the baseline system. In the shor","work_id":"7cb93d62-027e-4f54-a7c4-1f1bb251a9a6","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"PROPOSED GEOMETRY-CONDITIONED SPATIALLY SELECTIVE NON-LINEAR FILTER To improve the generalization ability of the SSF system across different microphone array geometries for a fixed number of micr","work_id":"9b41481c-2f49-4274-ae58-e3ce0c5767d6","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"EXPERIMENTS This section first presents the experimental setup, including the training and evaluation datasets, the network structure, and the training procedure. Then, the experimental results are pr","work_id":"5a9808c9-eba0-4e4b-8b58-14840e72577f","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":33,"snapshot_sha256":"a3e6ab924cca1483657d6601fc49536bac38249227ffbea3a74470f3d743ddea","internal_anchors":1},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}