Zipformer: A faster and better encoder for automatic speech recognition
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- SketchSong: Hierarchical Song Generation with Sketch Planning and Fine-Grained Multi-Track Modeling
SketchSong uses temporal sketch planning with high-level tokens and explicit modeling of four tracks (vocals, bass, drums, other) to generate more coherent songs than baselines.
- From Objectives to Applications: Aligning Architectural Biases in Audio Self-Supervised Learning
A survey that organizes audio SSL into five objective paradigms, relates their demands to architectural biases, and interprets downstream applications as tests of generalization.