Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training

Hong-Jie You; Jie-Jing Shao; Lan-Zhe Guo; Lin-Han Jia; Xiao-Wen Yang; Yu-Feng Li

arxiv: 2512.02652 · v2 · pith:K72GOYMSnew · submitted 2025-12-02 · 💻 cs.SD · cs.AI · cs.MM

keywords: music, performance, rendering, transformer, expressive, model, pianist, data

Expressive music performance rendering is a conditional generation task that aims to generate a human-like performance from a symbolic score. Existing methods rely on supervised learning over small labeled datasets, which limits scaling of both data volume and model size even though vast unlabeled music is available, much as unlabeled data is in vision and language. To address this gap, we introduce Pianist Transformer, with three key contributions: 1) large-scale self-supervised learning for expressive piano performance rendering through a unified Musical Instrument Digital Interface (MIDI) representation, enabling pre-training on 10B tokens of unlabeled MIDI data; 2) an efficient asymmetric Transformer with note-level compression, which substantially improves training efficiency, memory usage, and inference speed for long-context music modeling; 3) a state-of-the-art rendering model with an editable workflow, which achieves strong objective and subjective results and can be integrated into real-world music production workflows. Overall, Pianist Transformer outlines a scalable path toward human-like performance synthesis in the music domain. Code, audio samples, and model checkpoints are available on our project page: https://yhj137.github.io/pianist-transformer-demo/.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AnchorSteer: Self-Discovered Concept Injection for Structure-Preserving Music Editing
cs.SD · 2026-05 · unverdicted · novelty 6.0

AnchorSteer couples self-discovered semantic concept vectors with structural anchoring in diffusion models to achieve controllable music editing with preserved structure.