arXiv preprint arXiv:2412.07948
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.SD 3
years
2026 3
representative citing papers
StreamMUSE performs frame-synchronous streaming inference for language models by having a client send high-frequency requests and a server return outputs aligned to an external clock, shown on live music accompaniment with open-source code.
citing papers explorer
- Can LLMs understand LilyPond? A benchmark for symbolic music generation and understanding
  LilyBench evaluates open-weight LLMs on zero-shot LilyPond generation (achievable) and structural understanding tasks (challenging), with metric disagreements noted and code released.
- Real-Time Language Model Jamming: A Case Study for Live Music Accompaniment Generation
  StreamMUSE performs frame-synchronous streaming inference for language models by having a client send high-frequency requests and a server return outputs aligned to an external clock, shown on live music accompaniment with open-source code.
- BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps