FXplorer: A Map-Based Interface for Exploratory Audio Effect Design

Annie Chu; Bryan Pardo; Jason Brent Smith

arxiv: 2606.08286 · v1 · pith:U6WME7BFnew · submitted 2026-06-06 · 💻 cs.SD

FXplorer: A Map-Based Interface for Exploratory Audio Effect Design

Annie Chu , Jason Brent Smith , Bryan Pardo This is my paper

Pith reviewed 2026-06-27 19:06 UTC · model grok-4.3

classification 💻 cs.SD

keywords audio effectsexploratory interface2D mappingmachine learning embeddingssound designDAW controlspreset interpolationperceptual space

0 comments

The pith

FXplorer places audio effect presets in a perceptually informed 2D space so users can browse transformations as a continuous landscape and interpolate between them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FXplorer as an interface that arranges audio effects in a two-dimensional map derived from perceptual similarity. This map lets users move through sound transformations fluidly instead of selecting from separate modules. The system merges spatial navigation, standard DAW parameter controls, and machine learning embeddings that capture how effects relate to each other. By keeping exploration and refinement in the same workspace, it aims to help composers and producers develop intuition about the full range of possible sound changes. If successful, the approach would reduce the separation between searching for effects and adjusting their details.

Core claim

FXplorer organizes audio effects within a perceptually informed 2D space, allowing sound transformations to be browsed as a continuous landscape rather than as isolated presets. By combining established spatial interaction approaches and interpretable DAW-style controls with recent embedding-based machine learning methods for similarity and semantic search, the system brings exploration and parameter refinement into a single workspace. FXplorer supports composition, production, or performance by allowing users to edit and interpolate between effect presets interactively.

What carries the argument

The perceptually informed 2D space that embeds audio effects via machine learning similarity measures, enabling spatial browsing and interpolation alongside DAW-style controls.

If this is right

Sound transformations become continuous rather than discrete, so users can glide between presets instead of switching modules.
Editing and searching occur in one view, removing the need to alternate between separate search and adjustment panels.
Real-time interpolation in the map supports live performance or iterative composition without resetting parameters.
Semantic search integrated into the spatial view lets users locate effects by description while staying in the same workspace.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same 2D embedding approach could be tested on other parameter-rich creative tools such as synthesizer patches or visual filter banks.
Dynamic updating of the map from user listening data might further reduce the gap between individual taste and the displayed layout.
If the map proves stable across genres, it could serve as a shared reference for teaching sound design concepts to beginners.

Load-bearing premise

Embedding methods for audio similarity will produce a navigable 2D map that combines effectively with spatial interaction and existing DAW controls without creating new usability problems.

What would settle it

A controlled user study in which participants using FXplorer complete fewer successful sound designs or report higher cognitive load than participants using a conventional list-based DAW interface with the same effects.

Figures

Figures reproduced from arXiv: 2606.08286 by Annie Chu, Bryan Pardo, Jason Brent Smith.

**Figure 1.** Figure 1: FXplorer’s Explore Mode: Users primarily navigate via mouse (hover, click, pan, zoom) and keyboard shortcuts. Inspector panel (i) for parameter editing, assigning interpolation endpoints, and viewing exact parameters for the currently selected effect variant. Floating control panel (ii) for embedding mode switching (CLAP/AFx-Rep), hover playback settings, and semantic search inputs. Waveform monitor (iii) … view at source ↗

**Figure 2.** Figure 2: Overview of interaction workflows in FXplorer. From an input audio source, the system constructs a semantic variant space that users navigate to explore alternatives, select and interpolate between variants, iteratively edit outcomes, and save results, supporting both exploratory (divergent) and refinement (convergent) processes. 4.1 Preparing the Exploration Space: Generating Variants (DR1) Users may beg… view at source ↗

**Figure 5.** Figure 5: Interpolation Mode interface. Two samples from [PITH_FULL_IMAGE:figures/full_fig_p004_5.png] view at source ↗

**Figure 3.** Figure 3: Sample generation in FXplorer. Users upload dry audio, select an exploration mode (single, effect chain, or macro groups), choose effects to include and specify 𝑛 samples to generate for each configuration [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Semantic Entry Point Example (text): Results [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

read the original abstract

Audio effects (FX) shape sound in contemporary music practice. However, most interfaces present them as discrete modules and parameters that favor targeted adjustment over exploratory listening. This separation can make it difficult to build intuition about the broader space of possible transformations or to move fluidly between searching and refinement. We present FXplorer, an interface that organizes audio effects within a perceptually informed 2D space, allowing sound transformations to be browsed as a continuous landscape rather than as isolated presets. By combining established spatial interaction approaches and interpretable DAW-style controls with recent embedding-based machine learning methods for similarity and semantic search, the system brings exploration and parameter refinement into a single workspace. FXplorer supports composition, production, or performance by allowing users to edit and interpolate between effect presets interactively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FXplorer is a clean system concept for mapping audio FX via embeddings into a 2D explorable space, but the paper offers no data or implementation details to show the map or interpolation actually helps users.

read the letter

The paper's main contribution is a proposed interface that places audio effect presets into a 2D perceptual map using embedding-based ML for similarity and semantic search, then lets users browse, edit, and interpolate between them with familiar DAW-style controls. This aims to collapse the usual split between searching for effects and refining parameters.

It does a solid job naming the real workflow friction in current tools: discrete modules push users toward targeted tweaks rather than open listening. The integration of spatial navigation with parameter interpolation is a reasonable next step from existing work on embeddings and map interfaces.

The weakness is that none of the central claims are tested. There are no user studies, no perceptual validation that the 2D neighborhoods match how musicians actually hear similarity, and no checks on whether linear interpolation between presets produces usable audio or just artifacts. The abstract and description stay at the level of system concept without technical specifics on the embedding model, training data, or how the map is built and updated.

This leaves the usability assumptions unexamined: whether the map reduces disorientation, whether interpolation feels musical, or whether the combined controls create new problems. The work is internally consistent as a proposal, but the evidence is missing.

It would interest people building creative audio tools or studying music HCI. A reader looking for evaluated interfaces or reproducible methods will not get much. I would not bring it to a reading group or cite it as is. It does not yet deserve peer review; the authors would need at least a working prototype with some form of validation before it makes sense to send out.

Referee Report

2 major / 0 minor

Summary. The paper presents FXplorer, a map-based interface for exploratory audio effect design. It organizes audio effects within a perceptually informed 2D space using embedding-based machine learning methods for similarity and semantic search. The system combines this spatial organization with spatial interaction techniques and DAW-style controls to enable users to browse effects as a continuous landscape, edit presets, and interpolate between them interactively, with the goal of integrating exploration and parameter refinement into a single workspace to support composition, production, or performance.

Significance. If the perceptual embedding produces musically meaningful neighborhoods and the interface proves usable without introducing disorientation or loss of control, the work could meaningfully advance creative audio tools by bridging discrete preset browsing with continuous exploration. The combination of ML embeddings with established spatial and DAW interaction paradigms is a coherent synthesis that targets a documented limitation in current interfaces. However, the manuscript contains no implementation details, perceptual validation, or usage evidence, so any significance remains prospective rather than demonstrated.

major comments (2)

[Abstract] Abstract: The claim that FXplorer 'supports composition, production, or performance by allowing users to edit and interpolate between effect presets interactively' is load-bearing for the paper's contribution yet is unsupported; the manuscript supplies no user studies, perceptual validation of the 2D embedding, quantitative measures of interpolation quality, or comparisons against existing DAW workflows.
[Abstract] Abstract: The assertion that the space is 'perceptually informed' via 'embedding-based machine learning methods' is central to the interface rationale but receives no technical elaboration on model choice, training corpus, similarity metric, or any validation that neighborhoods correspond to audible similarity rather than abstract embedding proximity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The manuscript presents the design of FXplorer as a novel interface concept. We address each major comment below, acknowledging where the abstract overstates the current evidence and indicating the revisions we will make.

read point-by-point responses

Referee: [Abstract] Abstract: The claim that FXplorer 'supports composition, production, or performance by allowing users to edit and interpolate between effect presets interactively' is load-bearing for the paper's contribution yet is unsupported; the manuscript supplies no user studies, perceptual validation of the 2D embedding, quantitative measures of interpolation quality, or comparisons against existing DAW workflows.

Authors: We agree that the manuscript contains no user studies, perceptual validations, quantitative measures, or workflow comparisons. The paper's contribution is the interface design that integrates spatial organization with DAW-style controls. We will revise the abstract to state that FXplorer is designed to support composition, production, or performance by enabling interactive editing and interpolation, framing the claim as prospective rather than demonstrated. revision: yes
Referee: [Abstract] Abstract: The assertion that the space is 'perceptually informed' via 'embedding-based machine learning methods' is central to the interface rationale but receives no technical elaboration on model choice, training corpus, similarity metric, or any validation that neighborhoods correspond to audible similarity rather than abstract embedding proximity.

Authors: The current abstract does not elaborate on these technical aspects. We will revise the abstract to briefly specify the embedding model, training corpus, and similarity metric employed. We will also qualify 'perceptually informed' to indicate that it derives from embeddings trained on audio similarity data rather than claiming direct perceptual validation, which is not present in the manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity; system proposal with no derivations or fitted results

full rationale

The paper is a high-level description of an interface (FXplorer) that combines spatial interaction, DAW controls, and embedding-based ML methods. No equations, parameter fitting, predictions, or derivation chains are present in the provided text. The central claim is a design proposal rather than a mathematical result that reduces to its inputs. No self-citations or uniqueness theorems are invoked in a load-bearing way. This matches the default expectation of no circularity for non-derivational papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no technical details on parameters, axioms, or entities are available.

pith-pipeline@v0.9.1-grok · 5656 in / 1014 out tokens · 16296 ms · 2026-06-27T19:06:55.785682+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 3 canonical work pages

[1]

Antoine Caillon and Philippe Esling. 2021. RAVE: A variational autoen- coder for fast and high-quality neural audio synthesis.arXiv preprint arXiv:2111.05011(2021)

arXiv 2021
[2]

Annie Chu, Patrick O’Reilly, Julia Barnett, and Bryan Pardo. 2025. Text2fx: Harnessing clap embeddings for text-guided audio effects. InICASSP 2025- 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1–5

2025
[3]

Francesco Ardan Dal Rí, Domenico Stefani, Luca Turchet, and Nicola Conci
[4]

InProceedings of the 28-th Int

MorphDrive: Latent Conditioning for Cross-Circuit Effect Modeling and a Parametric Audio Dataset of Analog Overdrive Pedals. InProceedings of the 28-th Int. Conf. on Digital Audio Effects (DAFx25)(Ancona, Italy), L. Gabrielli and S. Cecchi (Eds.)
[5]

Stefano Delle Monache, Nicolas Misdariis, and Elif Özcan. 2022. Semantic models of sound-driven design: Designing with listening in mind.Design Studies83 (2022), 101134

2022
[6]

Seungheon Doh, Junghyun Koo, Marco A Martínez-Ramírez, Wei-Hsiang Liao, Juhan Nam, and Yuki Mitsufuji. 2025. Can Large Language Models Predict Audio Effects Parameters from Natural Language?arXiv preprint arXiv:2505.20770(2025)

arXiv 2025
[7]

Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Mohammad Norouzi, Douglas Eck, and Karen Simonyan. 2017. Neural audio synthesis of musical notes with wavenet autoencoders. InInternational conference on machine learning. PMLR, 1068–1077

2017
[8]

Frederic Font and Giuseppe Bandiera. 2017. Freesound explorer: make music while discovering freesound!. InProceedings of the 3rd Web Audio Conference

2017
[9]

Ohad Fried, Zeyu Jin, Reid Oda, and Adam Finkelstein. 2014. AudioQuilt: 2D Arrangements of Audio Samples using Metric Learning and Kernelized Sorting. InProceedings of the International Conference on New Interfaces for Musical Expression. Goldsmiths, University of London, London, United Kingdom, 281–

2014
[10]

https://doi.org/10.5281/zenodo.1178766

work page doi:10.5281/zenodo.1178766
[11]

Leandro Garber, Tomás Ciccola, and Juan Cruz Amusategui. 2020. AudioStellar, an open source corpus-based musical instrument for latent sound structure discovery and sonic experimentation. InProceedings of ICMC

2020
[12]

L. H. Hantrakul. 2017. lamtharnhantrakul/klustr. [Online]. https://github.c om/lamtharnhantrakul/klustr

2017
[13]

Aaron Hertzmann. 2022. Toward Modeling Creative Processes for Algorithmic Painting. InICCC. https://api.semanticscholar.org/CorpusId:248506151

2022
[14]

Hojoon Ki, Jongsuk Kim, Minchan Kwon, and Junmo Kim. 2026. FxSearcher: gradient-free text-driven audio transformation. InICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 15462–15466

2026
[15]

Alexander Lunt and Sebastian Trump. 2023. Latent Space Explorer.AIMC 2023(aug 29 2023). https://aimc2023.pubpub.org/pub/zgc5j7ha

2023
[16]

Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE.Journal of Machine Learning Research9, Nov (2008), 2579–2605

2008
[17]

Leland McInnes, John Healy, and James Melville. 2018. Umap: Uniform mani- fold approximation and projection for dimension reduction.arXiv preprint arXiv:1802.03426(2018)

Pith/arXiv arXiv 2018
[18]

Jason Naradowsky. 2021. Amp-space: A large-scale dataset for fine-grained timbre transformation. In2021 24th International Conference on Digital Audio Effects (DAFx). IEEE, 57–64

2021
[19]

Bryan Pardo, Mark Cartwright, Prem Seetharaman, and Bongjun Kim. 2019. Learning to build natural audio production interfaces. InArts, Vol. 8. MDPI, 110

2019
[20]

Karl Pearson. 1901. LIII. On lines and planes of closest fit to systems of points in space.The London, Edinburgh, and Dublin philosophical magazine and journal of science2, 11 (1901), 559–572

1901
[21]

Gerard Roma, Owen Green, and Pierre Alexandre Tremblay. 2019. Adaptive Mapping of Sound Collections for Data-driven Musical Interfaces. InProceed- ings of the International Conference on New Interfaces for Musical Expression, Marcelo Queiroz and Anna Xambó Sedó (Eds.). UFRGS, Porto Alegre, Brazil, 313–318. https://doi.org/10.5281/zenodo.3672976

work page doi:10.5281/zenodo.3672976 2019
[22]

Diemo Schwarz, Grégory Beller, Bruno Verbrugghe, and Sam Britton. 2006. Real-time corpus-based concatenative synthesis with catart. In9th Interna- tional Conference on Digital Audio Effects (DAFx). 279–282

2006
[23]

Prem Seetharaman and Bryan Pardo. 2016. Audealize: Crowdsourced audio production tools.Journal of the Audio Engineering Society64, 9 (2016), 683–695

2016
[24]

Spyridon Stasis, Nicholas Jillings, Sean Enderby, and Ryan Stables. 2017. Audio processing chain recommendation. InProceedings of the 20th International Conference on Digital Audio Effects,(Edinburgh, UK)

2017
[25]

Christian J Steinmetz, Shubhr Singh, Marco Comunità, Ilias Ibnyahya, Shanxin Yuan, Emmanouil Benetos, and Joshua D Reiss. 2024. St-ito: Controlling audio effects for style transfer with inference-time optimization.arXiv preprint arXiv:2410.21233(2024)

arXiv 2024
[26]

Manny Tan and Kyle McDonald. 2017. Infinite Drum Machine. [Online] https://experiments.withgoogle.com/ai/drum-machine/

2017
[27]

Robert Tubb and Simon Dixon. 2014. The Divergent Interface: Supporting Creative Exploration of Parameter Spaces. InProceedings of the International Conference on New Interfaces for Musical Expression. Goldsmiths, University of London, London, United Kingdom, 227–232. https://doi.org/10.5281/zenodo .1178967

work page doi:10.5281/zenodo 2014
[28]

Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, and Shlomo Dubnov. 2023. Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation. InICASSP 2023- 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1–5

2023
[29]

XLN Audio. 2021. XO - XLN Audio. [Online]. https://www.xlnaudio.com/pro ducts/xo

2021
[30]

Ruihan Yang, Tianyao Chen, Yiyi Zhang, and Gus Xia. 2019. Inspecting and interacting with meaningful music representations using VAE.arXiv preprint arXiv:1904.08842(2019)

Pith/arXiv arXiv 2019
[31]

Shuoyang Jasper Zheng, Anna Xambó Sedó, and Nick Bryan-Kinns. 2025. Exploring gestural affordances in audio latent space navigation.Frontiers in Computer Science7 (2025), 1575202

2025

[1] [1]

Antoine Caillon and Philippe Esling. 2021. RAVE: A variational autoen- coder for fast and high-quality neural audio synthesis.arXiv preprint arXiv:2111.05011(2021)

arXiv 2021

[2] [2]

Annie Chu, Patrick O’Reilly, Julia Barnett, and Bryan Pardo. 2025. Text2fx: Harnessing clap embeddings for text-guided audio effects. InICASSP 2025- 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1–5

2025

[3] [3]

Francesco Ardan Dal Rí, Domenico Stefani, Luca Turchet, and Nicola Conci

[4] [4]

InProceedings of the 28-th Int

MorphDrive: Latent Conditioning for Cross-Circuit Effect Modeling and a Parametric Audio Dataset of Analog Overdrive Pedals. InProceedings of the 28-th Int. Conf. on Digital Audio Effects (DAFx25)(Ancona, Italy), L. Gabrielli and S. Cecchi (Eds.)

[5] [5]

Stefano Delle Monache, Nicolas Misdariis, and Elif Özcan. 2022. Semantic models of sound-driven design: Designing with listening in mind.Design Studies83 (2022), 101134

2022

[6] [6]

Seungheon Doh, Junghyun Koo, Marco A Martínez-Ramírez, Wei-Hsiang Liao, Juhan Nam, and Yuki Mitsufuji. 2025. Can Large Language Models Predict Audio Effects Parameters from Natural Language?arXiv preprint arXiv:2505.20770(2025)

arXiv 2025

[7] [7]

Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Mohammad Norouzi, Douglas Eck, and Karen Simonyan. 2017. Neural audio synthesis of musical notes with wavenet autoencoders. InInternational conference on machine learning. PMLR, 1068–1077

2017

[8] [8]

Frederic Font and Giuseppe Bandiera. 2017. Freesound explorer: make music while discovering freesound!. InProceedings of the 3rd Web Audio Conference

2017

[9] [9]

Ohad Fried, Zeyu Jin, Reid Oda, and Adam Finkelstein. 2014. AudioQuilt: 2D Arrangements of Audio Samples using Metric Learning and Kernelized Sorting. InProceedings of the International Conference on New Interfaces for Musical Expression. Goldsmiths, University of London, London, United Kingdom, 281–

2014

[10] [10]

https://doi.org/10.5281/zenodo.1178766

work page doi:10.5281/zenodo.1178766

[11] [11]

Leandro Garber, Tomás Ciccola, and Juan Cruz Amusategui. 2020. AudioStellar, an open source corpus-based musical instrument for latent sound structure discovery and sonic experimentation. InProceedings of ICMC

2020

[12] [12]

L. H. Hantrakul. 2017. lamtharnhantrakul/klustr. [Online]. https://github.c om/lamtharnhantrakul/klustr

2017

[13] [13]

Aaron Hertzmann. 2022. Toward Modeling Creative Processes for Algorithmic Painting. InICCC. https://api.semanticscholar.org/CorpusId:248506151

2022

[14] [14]

Hojoon Ki, Jongsuk Kim, Minchan Kwon, and Junmo Kim. 2026. FxSearcher: gradient-free text-driven audio transformation. InICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 15462–15466

2026

[15] [15]

Alexander Lunt and Sebastian Trump. 2023. Latent Space Explorer.AIMC 2023(aug 29 2023). https://aimc2023.pubpub.org/pub/zgc5j7ha

2023

[16] [16]

Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE.Journal of Machine Learning Research9, Nov (2008), 2579–2605

2008

[17] [17]

Leland McInnes, John Healy, and James Melville. 2018. Umap: Uniform mani- fold approximation and projection for dimension reduction.arXiv preprint arXiv:1802.03426(2018)

Pith/arXiv arXiv 2018

[18] [18]

Jason Naradowsky. 2021. Amp-space: A large-scale dataset for fine-grained timbre transformation. In2021 24th International Conference on Digital Audio Effects (DAFx). IEEE, 57–64

2021

[19] [19]

Bryan Pardo, Mark Cartwright, Prem Seetharaman, and Bongjun Kim. 2019. Learning to build natural audio production interfaces. InArts, Vol. 8. MDPI, 110

2019

[20] [20]

Karl Pearson. 1901. LIII. On lines and planes of closest fit to systems of points in space.The London, Edinburgh, and Dublin philosophical magazine and journal of science2, 11 (1901), 559–572

1901

[21] [21]

Gerard Roma, Owen Green, and Pierre Alexandre Tremblay. 2019. Adaptive Mapping of Sound Collections for Data-driven Musical Interfaces. InProceed- ings of the International Conference on New Interfaces for Musical Expression, Marcelo Queiroz and Anna Xambó Sedó (Eds.). UFRGS, Porto Alegre, Brazil, 313–318. https://doi.org/10.5281/zenodo.3672976

work page doi:10.5281/zenodo.3672976 2019

[22] [22]

Diemo Schwarz, Grégory Beller, Bruno Verbrugghe, and Sam Britton. 2006. Real-time corpus-based concatenative synthesis with catart. In9th Interna- tional Conference on Digital Audio Effects (DAFx). 279–282

2006

[23] [23]

Prem Seetharaman and Bryan Pardo. 2016. Audealize: Crowdsourced audio production tools.Journal of the Audio Engineering Society64, 9 (2016), 683–695

2016

[24] [24]

Spyridon Stasis, Nicholas Jillings, Sean Enderby, and Ryan Stables. 2017. Audio processing chain recommendation. InProceedings of the 20th International Conference on Digital Audio Effects,(Edinburgh, UK)

2017

[25] [25]

Christian J Steinmetz, Shubhr Singh, Marco Comunità, Ilias Ibnyahya, Shanxin Yuan, Emmanouil Benetos, and Joshua D Reiss. 2024. St-ito: Controlling audio effects for style transfer with inference-time optimization.arXiv preprint arXiv:2410.21233(2024)

arXiv 2024

[26] [26]

Manny Tan and Kyle McDonald. 2017. Infinite Drum Machine. [Online] https://experiments.withgoogle.com/ai/drum-machine/

2017

[27] [27]

Robert Tubb and Simon Dixon. 2014. The Divergent Interface: Supporting Creative Exploration of Parameter Spaces. InProceedings of the International Conference on New Interfaces for Musical Expression. Goldsmiths, University of London, London, United Kingdom, 227–232. https://doi.org/10.5281/zenodo .1178967

work page doi:10.5281/zenodo 2014

[28] [28]

Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, and Shlomo Dubnov. 2023. Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation. InICASSP 2023- 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1–5

2023

[29] [29]

XLN Audio. 2021. XO - XLN Audio. [Online]. https://www.xlnaudio.com/pro ducts/xo

2021

[30] [30]

Ruihan Yang, Tianyao Chen, Yiyi Zhang, and Gus Xia. 2019. Inspecting and interacting with meaningful music representations using VAE.arXiv preprint arXiv:1904.08842(2019)

Pith/arXiv arXiv 2019

[31] [31]

Shuoyang Jasper Zheng, Anna Xambó Sedó, and Nick Bryan-Kinns. 2025. Exploring gestural affordances in audio latent space navigation.Frontiers in Computer Science7 (2025), 1575202

2025