Quality-Diversity Search in Sound Generation: Investigating Innovation Engines for Audio Exploration
Pith reviewed 2026-06-27 14:58 UTC · model grok-4.3
The pith
MAP-Elites combined with CPPNs, DSP graphs, and a classifier produces diverse, innovative synthetic sounds across durations and contexts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that CPPN and DSP graphs coupled with MAP-Elites and a deep learning classifier generate a substantial variety of synthetic sounds that are diverse and innovative across temporal and contextual dimensions.
What carries the argument
The MAP-Elites algorithm, which fills a multi-dimensional archive of phenotypic elites. Its behavior dimensions include sound duration and musical versus non-musical context, and the classifier supplies the quality score for each candidate.
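To make the archive mechanics concrete, here is a minimal sketch of a MAP-Elites loop whose cells are indexed by (duration, context). It is not the paper's implementation: the genome layout, the duration bins, the context labels, and `toy_classifier` are all hypothetical stand-ins, and the choice to test each candidate against every context cell is an assumption borrowed from the general Innovation Engine idea.

```python
import random

DURATION_BINS = [0.5, 1.0, 2.0, 5.0]    # seconds; hypothetical bin choices
CONTEXTS = ["musical", "non_musical"]    # hypothetical context labels


def toy_classifier(genome, context):
    """Stand-in for the deep-learning classifier: a confidence in [0, 1]."""
    target = 0.8 if context == "musical" else 0.2
    mean = sum(genome["params"]) / len(genome["params"])
    return max(0.0, 1.0 - abs(mean - target))


def random_genome(rng):
    """Toy genome: a duration plus four synthesis parameters in [0, 1]."""
    return {"duration": rng.choice(DURATION_BINS),
            "params": [rng.random() for _ in range(4)]}


def mutate(genome, rng):
    """Perturb one parameter; occasionally jump to another duration bin."""
    child = {"duration": genome["duration"], "params": list(genome["params"])}
    i = rng.randrange(len(child["params"]))
    child["params"][i] = min(1.0, max(0.0, child["params"][i] + rng.gauss(0, 0.1)))
    if rng.random() < 0.1:
        child["duration"] = rng.choice(DURATION_BINS)
    return child


def try_insert(archive, genome):
    """Score the genome for every context; keep it wherever it beats the elite."""
    for context in CONTEXTS:
        score = toy_classifier(genome, context)
        cell = (genome["duration"], context)
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, genome)


def map_elites(iterations=2000, n_init=20, seed=0):
    """Fill the archive from random genomes, then mutate randomly chosen elites."""
    rng = random.Random(seed)
    archive = {}
    for _ in range(n_init):
        try_insert(archive, random_genome(rng))
    for _ in range(iterations):
        _, parent = archive[rng.choice(sorted(archive))]
        try_insert(archive, mutate(parent, rng))
    return archive


if __name__ == "__main__":
    for cell, (score, _) in sorted(map_elites().items()):
        print(cell, round(score, 3))
```

Because an offspring of a "musical" elite can land in a "non_musical" cell (and the reverse), even this toy loop allows the kind of cross-context lineage the paper analyses as stepping stones.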
If this is right
- Solutions specialize in separate temporal niches when sound duration is included in the behavior space.
- Lineages reach musical sounds by traversing non-musical stepping stones.
- Multiple specialized CPPNs achieve performance comparable to single larger networks.
- The generated sounds can be used directly in composition experiments across varied durations and contexts.
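The stepping-stone observation in the list above can be illustrated by counting goal switches along an elite's ancestry. The sketch below assumes each individual has at most one recorded parent and one recorded context; the `parents` and `context_of` records are invented toy data, not output from the paper's system.

```python
def lineage(individual_id, parents):
    """Walk parent links from an elite back to its root ancestor (oldest first)."""
    chain = []
    current = individual_id
    while current is not None:
        chain.append(current)
        current = parents.get(current)
    return chain[::-1]


def count_goal_switches(chain, context_of):
    """Count changes of context between consecutive members of a lineage."""
    contexts = [context_of[i] for i in chain]
    return sum(1 for a, b in zip(contexts, contexts[1:]) if a != b)


# Hypothetical toy records: child id -> parent id, and id -> context of its cell.
parents = {"e": "d", "d": "c", "c": "b", "b": "a", "a": None}
context_of = {
    "a": "non_musical",
    "b": "non_musical",
    "c": "musical",
    "d": "non_musical",
    "e": "musical",
}

chain = lineage("e", parents)
switches = count_goal_switches(chain, context_of)
print(chain, switches)  # ['a', 'b', 'c', 'd', 'e'] 3
```

A lineage with a nonzero switch count is one where the current elite descends from ancestors that were elites for a different context.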
Where Pith is reading between the lines
- Composers could treat the resulting archive as a source of starting material rather than building every sound from scratch.
- The observed goal-switching paths might suggest initialization strategies that accelerate discovery in other generative domains.
- Further expansion of the behavior space could expose additional niches that current single-context searches miss.
Load-bearing premise
The supervised classifier supplies a quality signal that reliably matches human notions of musical usefulness or innovation.
What would settle it
If side-by-side listening tests show that the sounds archived by the QD system are rated no more diverse or innovative than sounds produced by the same synthesis methods without the archive or classifier, the central claim would be falsified.
Original abstract
This study addresses the challenges composers and sound designers face in creating and refining tools to achieve their musical goals. Using evolutionary processes to promote diversity and foster serendipitous discoveries, we automate the search through uncharted sonic spaces for sound discovery, arguing that diversity-promoting algorithms can bridge the gap between the theoretical realisation and practical accessibility of sounds. We describe a system for generative sound synthesis combining Quality Diversity (QD) algorithms with a supervised discriminative model, inspired by the Innovation Engine algorithm, and explore different configurations and the interplay between the chosen synthesis approach and the discriminative model. We examine the interaction between Compositional Pattern Producing Networks (CPPNs) and Digital Signal Processing (DSP) graphs, introducing a novel approach that uses multiple specialised CPPNs for different frequency ranges; this yields simpler networks while maintaining performance comparable to single-CPPN setups. We also investigate evolutionary stepping stones by analysing goal switches between musical and non-musical contexts, revealing how lineages traverse unlikely paths to current elites. Expanding the behaviour space of a previous study to include various sound durations, we uncover specialisation within temporal niches. Results indicate that CPPN and DSP graphs coupled with a Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) and a deep learning classifier can generate a substantial variety of synthetic sounds, diverse and innovative across temporal and contextual dimensions. We present the generated sound objects through an online explorer and as rendered sound files, and, in the context of music composition, an experimental application that showcases their creative potential across various durations and contexts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper describes a Quality-Diversity system for generative sound synthesis that couples Compositional Pattern Producing Networks (CPPNs) and DSP graphs with MAP-Elites and a supervised deep-learning classifier, inspired by Innovation Engines. It introduces a multi-CPPN architecture specialized by frequency range, analyzes evolutionary stepping-stone trajectories across musical/non-musical contexts, and demonstrates temporal niche specialization when the behaviour space is expanded to include variable sound durations. The central empirical claim is that the resulting archives produce a substantial variety of synthetic sounds that are diverse and innovative across temporal and contextual dimensions, with outputs released via an online explorer and rendered files for compositional use.
Significance. If the empirical results hold, the work supplies a practical exploration tool that automates serendipitous discovery in audio spaces while making the generated objects directly accessible. The multi-CPPN frequency specialization and the stepping-stone analysis constitute concrete methodological contributions that could be adopted in other QD audio applications. The public release of the explorer and sound files is a clear strength for reproducibility and creative uptake.
major comments (1)
- [Method (discriminative model) and Results] The quality signal supplied by the supervised discriminative model is load-bearing for the claim that the generated sounds are 'innovative' in a musically useful sense, yet the manuscript provides no human listening tests or correlation analysis between classifier scores and perceptual judgments of musical quality or novelty. This assumption is stated in the abstract and method description but is not empirically tested.
minor comments (2)
- [Abstract] The abstract refers to 'Innovation Engines' without a brief parenthetical definition or citation; this should be clarified for readers outside the QD community.
- [Figures and Tables] Figure captions and table headers should explicitly state the number of independent runs and any statistical tests used to support claims of 'comparable performance' between single- and multi-CPPN configurations.
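As one illustration of the kind of test the second minor comment asks for, the sketch below runs a two-sided permutation test on a per-run summary metric for two configurations. The function name and the idea of using one score per independent run are assumptions made here; the paper does not specify this procedure. A large p-value only fails to show a difference and does not by itself establish that the configurations are comparable.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10000, seed=0):
    """Two-sided permutation test on the difference of means of two samples."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Invented per-run scores for illustration only; not results from the paper.
    single_cppn = [0.71, 0.69, 0.74, 0.70, 0.72]
    multi_cppn = [0.70, 0.73, 0.68, 0.72, 0.71]
    print(permutation_test(single_cppn, multi_cppn))
```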
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address the single major comment below.
- Referee: [Method (discriminative model) and Results] The quality signal supplied by the supervised discriminative model is load-bearing for the claim that the generated sounds are 'innovative' in a musically useful sense, yet the manuscript provides no human listening tests or correlation analysis between classifier scores and perceptual judgments of musical quality or novelty. This assumption is stated in the abstract and method description but is not empirically tested.
  Authors: We agree that the manuscript does not include human listening tests or a correlation analysis validating that classifier scores align with perceptual judgments of musical quality or novelty. The discriminative model serves as a proxy quality signal, trained on labeled musical versus non-musical audio, following the Innovation Engine approach. This constitutes an untested assumption in the current work. We will revise the manuscript to explicitly acknowledge this limitation in the method and discussion sections and identify perceptual validation as future work.
  Revision: yes
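The correlation analysis discussed in this exchange could take the form of a rank correlation between classifier confidences and listener ratings for the same sounds. The following is a minimal sketch using Spearman's coefficient in plain Python; the helper names and the example numbers are hypothetical, and no such data appears in the paper.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Invented values for illustration only; not measurements from the paper.
    classifier_scores = [0.91, 0.40, 0.75, 0.62, 0.88]
    listener_ratings = [4, 2, 5, 3, 4]
    print(spearman(classifier_scores, listener_ratings))
```

A coefficient near zero on real listening data would indicate that the proxy quality signal does not track perceptual judgments, which is the risk the referee raises.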
Circularity Check
No significant circularity detected
This paper is an empirical demonstration study that applies established algorithms (MAP-Elites, CPPNs, DSP graphs, supervised deep learning classifier) to sound generation and reports observed outcomes such as archive coverage, stepping-stone lineages, and temporal specialisation. It presents no derivation chain, equations, or predictions that reduce by construction to fitted parameters, self-definitions, or premises that rest on self-citation. The central claims rest on experimental results and external benchmarks rather than internal redefinitions.