
arXiv: 1907.06637 · v1 · pith:PVUAJCO7new · submitted 2019-07-14 · 💻 cs.SD · cs.HC · cs.LG · eess.AS · stat.ML

The Bach Doodle: Approachable music composition with machine learning at scale

Pith reviewed 2026-05-24 21:17 UTC · model grok-4.3

classification 💻 cs.SD · cs.HC · cs.LG · eess.AS · stat.ML
keywords Bach Doodle · Coconet · music harmonization · TensorFlow.js · browser deployment · user dataset · interactive AI · music composition

The pith

An optimized Coconet model lets users harmonize melodies in Bach style directly in the browser.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the Bach Doodle as the first AI-powered Google Doodle: users enter a melody through a simplified sheet-music interface and receive a harmonization in Bach's style from the Coconet model. The authors re-implemented the model in TensorFlow.js for browser execution, cutting runtime from 40 seconds to 2 seconds via dilated depth-wise separable convolutions and operation fusing, and shrinking the download to approximately 400 KB with post-training weight quantization. A speed test based on partial model evaluation time decides whether a harmonization request runs locally or is sent to remote TPU servers. In three days the system received more than 55 million queries, and users spent 350 years' worth of collective time with the doodle. The work also releases a dataset of user compositions and ratings. A sympathetic reader would care because it shows how to deliver interactive music AI to a global audience without special hardware.

Core claim

We designed the first AI-powered Google Doodle, the Bach Doodle, where users can create their own melody and have it harmonized by a machine learning model Coconet in the style of Bach. For users to input melodies, we designed a simplified sheet-music based interface. To support an interactive experience at scale, we re-implemented Coconet in TensorFlow.js to run in the browser and reduced its runtime from 40s to 2s by adopting dilated depth-wise separable convolutions and fusing operations. We also reduced the model download size to approximately 400KB through post-training weight quantization. We calibrated a speed test based on partial model evaluation time to determine if the harmonization request should be performed locally or sent to remote TPU servers.
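
Post-training weight quantization, as named in the claim above, can be illustrated with a minimal sketch. It assumes 8-bit affine (min/max) quantization, which is one common scheme; the source does not state the bit width or scheme used for the Bach Doodle, and the function names `quantize` and `dequantize` are hypothetical.

```python
def quantize(weights, bits=8):
    """Map floats to integers in [0, 2**bits - 1] with an affine min/max scheme.

    Assumption: 8-bit affine quantization; the paper's exact scheme is not given here.
    """
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    # Guard against a constant tensor, where hi == lo would divide by zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo


def dequantize(q, scale, lo):
    """Recover approximate floats from the stored integers."""
    return [v * scale + lo for v in q]


# Toy weights, purely illustrative.
weights = [-0.51, -0.1, 0.0, 0.23, 0.74]
q, scale, lo = quantize(weights)
restored = dequantize(q, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Under this assumed scheme each weight is stored in one byte rather than a four-byte float, and the rounding error per weight is bounded by half a quantization step (`scale / 2`).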

What carries the argument

Coconet model re-implemented in TensorFlow.js using dilated depth-wise separable convolutions, operation fusion, post-training weight quantization, and a speed-test-based switch between local browser and remote TPU execution.
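
The speed-test-based switch can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the threshold value, the stand-in workload, and the names `time_partial_eval` and `choose_backend` are all hypothetical.

```python
import time


def time_partial_eval(run_partial_model, repeats=3):
    """Return the best wall-clock time in seconds over a few partial evaluations."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_partial_model()
        best = min(best, time.perf_counter() - start)
    return best


def choose_backend(partial_seconds, threshold_seconds=0.1):
    """Pick local inference when the device is fast enough, else the remote server.

    The 0.1 s threshold is a placeholder; the paper calibrates its own value.
    """
    return "local" if partial_seconds <= threshold_seconds else "remote"


def dummy_partial_model():
    # Stand-in workload; a real test would run part of the actual model.
    sum(i * i for i in range(10_000))


backend = choose_backend(time_partial_eval(dummy_partial_model))
```

Taking the best of several timings is one way to reduce noise from a cold start; whether the deployed test does this is not stated in the source.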

If this is right

  • The system processed more than 55 million harmonization queries in three days.
  • Users collectively spent 350 years of time interacting with the doodle.
  • A public dataset of user compositions and ratings is released for research.
  • The optimizations demonstrate how to run generative music models interactively at internet scale.
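
As a rough sanity check on the engagement figures above, the two headline numbers can be combined into per-query and per-second averages. The sketch assumes a 365.25-day year and treats "more than 55 million" as exactly 55 million, so the results are approximations only.

```python
# Assumption: a Julian year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

total_seconds = 350 * SECONDS_PER_YEAR   # collective time reported
queries = 55_000_000                     # lower bound reported
window_seconds = 3 * 24 * 3600           # three days

seconds_per_query = total_seconds / queries
queries_per_second = queries / window_seconds

print(f"{seconds_per_query:.1f} s of user time per query")
print(f"{queries_per_second:.1f} queries per second on average")
```

This works out to roughly 200 seconds of user time per harmonization query and an average load of about 212 queries per second. Peak load would be higher than the average, and the per-query figure includes time spent composing, not only waiting for the model.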

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The hybrid local-remote inference approach could extend to other real-time creative AI tools on the web.
  • The released dataset may support studies of how non-musicians approach melody writing.
  • Similar quantization and convolution changes might enable browser deployment of other sequence models.

Load-bearing premise

That the quantized and re-implemented Coconet model keeps enough musical quality and coherence to produce an engaging experience for users.

What would settle it

If most users gave low ratings to their harmonized outputs or if total engagement time stayed far below 350 years across millions of queries.

Original abstract

To make music composition more approachable, we designed the first AI-powered Google Doodle, the Bach Doodle, where users can create their own melody and have it harmonized by a machine learning model Coconet (Huang et al., 2017) in the style of Bach. For users to input melodies, we designed a simplified sheet-music based interface. To support an interactive experience at scale, we re-implemented Coconet in TensorFlow.js (Smilkov et al., 2019) to run in the browser and reduced its runtime from 40s to 2s by adopting dilated depth-wise separable convolutions and fusing operations. We also reduced the model download size to approximately 400KB through post-training weight quantization. We calibrated a speed test based on partial model evaluation time to determine if the harmonization request should be performed locally or sent to remote TPU servers. In three days, people spent 350 years worth of time playing with the Bach Doodle, and Coconet received more than 55 million queries. Users could choose to rate their compositions and contribute them to a public dataset, which we are releasing with this paper. We hope that the community finds this dataset useful for applications ranging from ethnomusicological studies, to music education, to improving machine learning models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper describes the Bach Doodle, the first AI-powered Google Doodle, in which users input melodies through a simplified sheet-music interface and receive Bach-style harmonizations from a re-implemented Coconet model running in TensorFlow.js. The authors detail engineering optimizations (dilated depth-wise separable convolutions, operation fusion, post-training quantization to ~400 KB) that reduce runtime from 40 s to 2 s, a speed-test-based hybrid local/remote inference strategy, the resulting engagement (55 M queries, 350 user-years of interaction over three days), and the release of a public dataset of rated user compositions.
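
To make the compute saving from depth-wise separable convolutions concrete, the following sketch compares weight counts for a standard and a separable 2D convolution. The kernel size and channel count are hypothetical values chosen for illustration, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depth-wise k x k convolution followed by a 1 x 1 point-wise one."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer shape, not taken from the paper.
k, channels = 3, 128
std = standard_conv_params(k, channels, channels)
sep = separable_conv_params(k, channels, channels)
ratio = std / sep
```

For this assumed shape the standard layer has 147,456 weights and the separable layer 17,536, a reduction of about 8.4x. The multiply-accumulate count per output position shrinks by the same factor.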

Significance. If the optimizations preserve output quality, the work shows that neural music models can be deployed at massive consumer scale via browser engineering, while the released dataset offers a new resource for ethnomusicology, music education, and ML research. The engagement numbers provide concrete evidence of public interest in approachable AI composition tools.

major comments (1)
  1. [Abstract] Abstract (paragraph on model re-implementation and optimizations): the manuscript reports no objective or subjective evaluation of whether the TensorFlow.js re-implementation, dilated depth-wise separable convolutions, operation fusion, or post-training quantization preserved the musical coherence or stylistic fidelity of the original Coconet model (Huang et al., 2017). No negative log-likelihood on held-out chorales, no listening tests, and no side-by-side comparison are provided. This is load-bearing for the central claim that the deployed system delivers coherent Bach-style harmonizations at scale; usage statistics alone cannot isolate model quality from interface novelty.
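
The objective check the referee asks for could take the form sketched below: the mean negative log-likelihood that a model assigns to the true notes of held-out chorales. The probabilities here are made-up placeholders, and `mean_nll` is a hypothetical helper, not part of any Coconet codebase.

```python
import math


def mean_nll(true_note_probs):
    """Mean negative log-likelihood (in nats) of the ground-truth notes.

    Each entry is the probability the model assigned to the note that
    actually occurs in the held-out chorale.
    """
    return -sum(math.log(p) for p in true_note_probs) / len(true_note_probs)


# Placeholder probabilities for illustration only.
original_model = [0.5, 0.25, 0.125]
optimized_model = [0.5, 0.25, 0.0625]

nll_original = mean_nll(original_model)
nll_optimized = mean_nll(optimized_model)
degradation = nll_optimized - nll_original
```

A `degradation` near zero on real held-out data would support the premise that the optimizations preserved quality; a large positive value would not.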

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful and constructive review. We address the major comment on model quality evaluation below.

Point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph on model re-implementation and optimizations): the manuscript reports no objective or subjective evaluation of whether the TensorFlow.js re-implementation, dilated depth-wise separable convolutions, operation fusion, or post-training quantization preserved the musical coherence or stylistic fidelity of the original Coconet model (Huang et al., 2017). No negative log-likelihood on held-out chorales, no listening tests, and no side-by-side comparison are provided. This is load-bearing for the central claim that the deployed system delivers coherent Bach-style harmonizations at scale; usage statistics alone cannot isolate model quality from interface novelty.

    Authors: We acknowledge that the manuscript does not include new objective or subjective evaluations (e.g., NLL on held-out data or listening tests) of whether the TensorFlow.js re-implementation and optimizations preserved output quality relative to the original Coconet. The paper's central contributions concern the engineering steps that enabled interactive, browser-based inference at massive scale and the resulting public dataset; model performance was established in the 2017 Coconet work. The architectural modifications (dilated depth-wise separable convolutions) were selected to retain receptive field while lowering compute, and post-training quantization plus operation fusion are standard methods expected to incur only minor fidelity loss. We agree, however, that the absence of explicit verification leaves the quality claim under-supported. In revision we will add a short discussion of these design choices and their expected impact on coherence, update the abstract to clarify the paper's scope, and note that the released user ratings offer only indirect, uncontrolled evidence of perceived quality. revision: yes
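
The rebuttal's point that dilation retains receptive field can be illustrated with a short calculation. The sketch assumes stride-1 convolutions with a fixed kernel size; the dilation schedules shown are hypothetical and not taken from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation.
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical four-layer stacks with 3-wide kernels.
plain = receptive_field(3, [1, 1, 1, 1])
dilated = receptive_field(3, [1, 2, 4, 8])
```

Here four undilated layers cover 9 positions while four layers with doubling dilation cover 31, so a shallower or cheaper stack can see as much context as a deeper plain one.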

Circularity Check

0 steps flagged

No significant circularity: empirical deployment report with no derivations or fitted predictions

Full rationale

This paper reports on the design, optimization, and deployment of the Bach Doodle using a pre-existing Coconet model (cited from Huang et al. 2017). It describes engineering changes (dilated depth-wise separable convolutions, operation fusion, post-training quantization) and usage statistics (55M queries, 350 years of playtime) as direct observations. No equations, parameter fitting, predictions, or uniqueness theorems are presented that could reduce to inputs by construction. The self-citation is for the original model definition only and is not load-bearing for any new claim. The paper is self-contained as an engineering case study against external benchmarks of runtime and size.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the effectiveness of the pre-existing Coconet model and the assumption that the described engineering changes do not degrade musical quality.

axioms (1)
  • Domain assumption: The Coconet model generates musically appropriate Bach-style harmonizations for arbitrary user melodies.
    The entire user-facing system depends on this capability of the 2017 model, which is invoked without re-validation in the abstract.



Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 3 internal anchors

  1. [1]

    For users to input melodies, we designed a simplified sheet-music based interface

    ABSTRACT To make music composition more approachable, we designed the first AI-powered Google Doodle, the Bach Doodle [1], where users can create their own melody and have it harmonized by a machine learning model (Coconet [22]) in the style of Bach. For users to input melodies, we designed a simplified sheet-music based interface. To support an int...

  2. [2]

    The Bach Doodle: Approachable music composition with machine learning at scale

    INTRODUCTION Machine learning can extend our creative abilities by offering generative models that can rapidly fill in missing parts of our composition, allowing us to see a prototype of how a piece could sound. To celebrate J.S. Bach’s 334th birthday, we designed the Bach Doodle to create an interactive experience where users can rapidly explore diffe...

  3. [3]

    The Bach Doodle: Approachable music composition with machine learning at scale

    RELATED WORK Machine learning has been used in algorithmic music composition to support a wide range of musical tasks [5, 13, 19, 28, 29]. Melody harmonization is one of the canonical tasks [7, 11, 20, 26], encourages human-computer interaction [3, 14, 21, 25, 33], and is particularly approachable for novices. Different interfaces and tools have been ...

  4. [4]

    star” button on the left-hand side of the sheet music, they can enter “advanced mode

    THE BACH DOODLE 4.1 A walk through of the user experience The Bach Doodle user experience begins by demonstrating 4-part harmony using two measures of a Bach chorale, Ach wie flüchtig, ach wie nichtig, BWV 26. By playing the soprano line alone, followed by soprano and alto, and then all four voices, users are shown how the harmony enhances the melody. U...

  5. [5]

    TECHNICAL CHALLENGES In order for users to interact with Coconet via a web interface, we needed to either port it to run client-side on the user’s device or host the model on a server with sufficient speed and capacity to support the number of requests we were expecting. In fact, we did both: we ported the model to TensorFlow.js (TF.js) so that it could ...

  6. [6]

    We make this entire dataset available at https://g.co/magenta/bach-doodle-dataset under a Creative Commons license

    DATASET RELEASE AND ANALYSIS 6.1 Data structure Every user who interacted with the Bach Doodle had the opportunity to add their composition to a dataset. We make this entire dataset available at https://g.co/magenta/bach-doodle-dataset under a Creative Commons license. Of more than 55 million requests, the user contributed dataset contains over 21.6 mill...

  7. [7]

    We hope this encourages more creative apps that allow novices and artists to interact with music composition and machine learning in approachable ways

    CONCLUSION The Bach Doodle enabled large-scale participation in baroque-style counterpoint composition through an intuitive sheet music interface assisted by machine learning. We hope this encourages more creative apps that allow novices and artists to interact with music composition and machine learning in approachable ways. With this paper, we are r...

  8. [8]

    A big shoutout to Pedro Vergani, Rebecca Thomas, Jordan Thompson and others on the Doodle team for their contribution to the core components of the Doodle

    ACKNOWLEDGEMENTS Many thanks to Ann Yuan, Daniel Smilkov and Nikhil Thorat from TensorFlow.js for their expert assistance. A big shoutout to Pedro Vergani, Rebecca Thomas, Jordan Thompson and others on the Doodle team for their contribution to the core components of the Doodle. Thank you Lauren Hannah-Murphy and Chris Han for keeping us on track. Thank y...

  9. [9]

    https://www.google.com/doodles/celebrating-johann-sebastian-bach

    Celebrating Johann Sebastian Bach. https://www.google.com/doodles/celebrating-johann-sebastian-bach. Accessed: 2019-04-04

  10. [10]

    Harmonising chorales by probabilistic inference

    Moray Allan and Christopher KI Williams. Harmonising chorales by probabilistic inference. Advances in neural information processing systems, 17:25–32, 2005

  11. [11]

    Omax brothers: a dynamic topology of agents for improvization learning

    Gérard Assayag, Georges Bloch, Marc Chemillier, Arshia Cont, and Shlomo Dubnov. Omax brothers: a dynamic topology of agents for improvization learning. In Proceedings of the 1st ACM workshop on Audio and music computing multimedia, pages 125–132. ACM, 2006

  12. [12]

    Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription

    Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. International Conference on Machine Learning, 2012

  13. [13]

    Deep learning techniques for music generation - a survey

    Jean-Pierre Briot, Gaëtan Hadjeres, and François Pachet. Deep learning techniques for music generation - a survey. arXiv preprint arXiv:1709.01620, 2017

  14. [14]

    Xception: Deep learning with depthwise separable convolutions

    François Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1251–1258, 2017

  15. [15]

    A hybrid system for automatic generation of style-specific accompaniment

    Ching-Hua Chuan and Elaine Chew. A hybrid system for automatic generation of style-specific accompaniment. In Proceedings of the 4th International Joint Workshop on Computational Creativity, pages 57–64. Goldsmiths, University of London London, 2007

  16. [16]

    Computers and musical style

    David Cope. Computers and musical style. Oxford University Press, 1991

  17. [17]

    music21: A toolkit for computer-aided musicology and symbolic music data

    Michael Scott Cuthbert and Christopher Ariza. music21: A toolkit for computer-aided musicology and symbolic music data. In International Society for Music Information Retrieval, 2010

  18. [18]

    Consecutive 5ths and octaves in bach chorales

    Luke Dahn. Consecutive 5ths and octaves in bach chorales. https://lukedahn.wordpress.com/2016/04/15/consecutive-5ths-and-octaves-in-bach-chorales/. Accessed: 2019-04-12

  19. [19]

    Analysis and synthesis of palestrina-style counterpoint using markov chains

    Mary Farbood and Bernd Schöner. Analysis and synthesis of palestrina-style counterpoint using markov chains. In Proceedings of the International Computer Music Conference, 2001

  20. [20]

    Hyperscore: a graphical sketchpad for novice composers

    Morwaread M Farbood, Egon Pasztor, and Kevin Jennings. Hyperscore: a graphical sketchpad for novice composers. IEEE Computer Graphics and Applications, 24(1):50–54, 2004

  21. [21]

    Ai methods in algorithmic composition: A comprehensive survey

    Jose D Fernández and Francisco Vico. Ai methods in algorithmic composition: A comprehensive survey. Journal of Artificial Intelligence Research, 48:513–582, 2013

  22. [22]

    Real-time human interaction with supervised learning algorithms for music composition and performance

    Rebecca Anne Fiebrink. Real-time human interaction with supervised learning algorithms for music composition and performance. PhD dissertation, Princeton University, 2011

  23. [23]

    Parallel successions of perfect fifths in the bach chorales

    George Fitsioris and Darrell Conklin. Parallel successions of perfect fifths in the bach chorales. MUSICAL STRUCTURE, page 52, 2008

  24. [24]

    Polyphonic music generation by modeling temporal dependencies using a RNN-DBN

    Kratarth Goel, Raunaq Vohra, and JK Sahoo. Polyphonic music generation by modeling temporal dependencies using a RNN-DBN. In International Conference on Artificial Neural Networks, 2014

  25. [25]

    Deepbach: a steerable model for bach chorales generation

    Gaëtan Hadjeres, François Pachet, and Frank Nielsen. Deepbach: a steerable model for bach chorales generation. In International Conference on Machine Learning, pages 1362–1371, 2017

  26. [26]

    Style Imitation and Chord Invention in Polyphonic Music with Exponential Families

    Gaëtan Hadjeres, Jason Sakellariou, and François Pachet. Style imitation and chord invention in polyphonic music with exponential families. arXiv preprint arXiv:1609.05152, 2016

  27. [27]

    A functional taxonomy of music generation systems

    Dorien Herremans, Ching-Hua Chuan, and Elaine Chew. A functional taxonomy of music generation systems. ACM Computing Surveys (CSUR), 50(5):69, 2017

  28. [28]

    Composing fifth species counterpoint music with a variable neighborhood search algorithm

    Dorien Herremans and Kenneth Sörensen. Composing fifth species counterpoint music with a variable neighborhood search algorithm. Expert systems with applications, 40(16):6427–6437, 2013

  29. [29]

    Mixed-initiative generation of multi-channel sequential structures

    Cheng-Zhi Anna Huang, Sherol Chen, Mark Nelson, and Doug Eck. Mixed-initiative generation of multi-channel sequential structures. In International Conference on Learning Representations Workshop Track, 2018

  30. [30]

    Counterpoint by convolution

    Cheng-Zhi Anna Huang, Tim Cooijmans, Adam Roberts, Aaron Courville, and Douglas Eck. Counterpoint by convolution. ISMIR, 2017

  31. [31]

    Sequence tutor: Conservative fine-tuning of sequence generation models with kl-control

    Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E Turner, and Douglas Eck. Sequence tutor: Conservative fine-tuning of sequence generation models with kl-control. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1645–1654. JMLR.org, 2017

  32. [32]

    Bachbot: Automatic composition in the style of bach chorales

    Feynman Liang. Bachbot: Automatic composition in the style of bach chorales. Masters thesis, University of Cambridge, 2016

  33. [33]

    The continuator: Musical interaction with style

    Francois Pachet. The continuator: Musical interaction with style. Journal of New Music Research, 32(3):333–341, 2003

  34. [34]

    Musical harmonization with constraints: A survey

    François Pachet and Pierre Roy. Musical harmonization with constraints: A survey. Constraints, 6(1):7–19, 2001

  35. [35]

    Assisted lead sheet composition using flowcomposer

    Alexandre Papadopoulos, Pierre Roy, and François Pachet. Assisted lead sheet composition using flowcomposer. In International Conference on Principles and Practice of Constraint Programming, pages 769–785. Springer, 2016

  36. [36]

    Ai methods for algorithmic composition: A survey, a critical view and future prospects

    George Papadopoulos and Geraint Wiggins. Ai methods for algorithmic composition: A survey, a critical view and future prospects. In AISB Symposium on Musical Creativity, volume 124, pages 110–117. Edinburgh, UK, 1999

  37. [37]

    An introduction to musical metacreation

    Philippe Pasquier, Arne Eigenfeldt, Oliver Bown, and Shlomo Dubnov. An introduction to musical metacreation. Computers in Entertainment (CIE), 14(2):2, 2016

  38. [38]

    Magenta.js: A JavaScript API for augmenting creativity with deep learning

    Adam Roberts, Curtis Hawthorne, and Ian Simon. Magenta.js: A JavaScript API for augmenting creativity with deep learning. 2018

  39. [39]

    Mysong: automatic accompaniment generation for vocal melodies

    Ian Simon, Dan Morris, and Sumit Basu. Mysong: automatic accompaniment generation for vocal melodies. In Proceedings of the SIGCHI conference on human factors in computing systems, pages 725–734. ACM, 2008

  40. [40]

    TensorFlow.js: Machine Learning for the Web and Beyond

    Daniel Smilkov, Nikhil Thorat, Yannick Assogba, Ann Yuan, Nick Kreeger, Ping Yu, Kangyi Zhang, Shanqing Cai, Eric Nielsen, David Soergel, et al. TensorFlow.js: Machine learning for the web and beyond. arXiv preprint arXiv:1901.05350, 2019

  41. [41]

    Machine learning research that matters for music creation: A case study

    Bob L Sturm, Oded Ben-Tal, Una Monaghan, Nick Collins, Dorien Herremans, Elaine Chew, Gaëtan Hadjeres, Emmanuel Deruty, and François Pachet. Machine learning research that matters for music creation: A case study. Journal of New Music Research, 48(1):36–55, 2019

  42. [42]

    Neural autoregressive distribution estimation

    Benigno Uria, Marc-Alexandre Côté, Karol Gregor, Iain Murray, and Hugo Larochelle. Neural autoregressive distribution estimation. The Journal of Machine Learning Research, 17(1):7184–7220, 2016

  43. [43]

    A deep and tractable density estimator

    Benigno Uria, Iain Murray, and Hugo Larochelle. A deep and tractable density estimator. In Proceedings of the International Conference on Machine Learning, 2014

  44. [44]

    Wavenet: A generative model for raw audio

    Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio

  45. [45]

    On the equivalence between deep nade and generative stochastic networks

    Li Yao, Sherjil Ozair, Kyunghyun Cho, and Yoshua Bengio. On the equivalence between deep nade and generative stochastic networks. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2014