pith:S4Y3ZWUW
DiffWave: A Versatile Diffusion Model for Audio Synthesis
A diffusion model converts white noise into high-quality audio waveforms through a fixed-step Markov chain, matching WaveNet vocoder quality while running orders of magnitude faster.
arxiv:2009.09761 v3 · 2020-09-21 · eess.AS · cs.CL · cs.LG · cs.SD · stat.ML
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{S4Y3ZWUWEX5I3563BY74JTVKPQ}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
DiffWave matches a strong WaveNet vocoder in terms of speech quality (MOS: 4.44 versus 4.43), while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models in the challenging unconditional generation task in terms of audio quality and sample diversity from various automatic and human evaluations.
The underlying assumption is that a neural network can accurately predict the noise to remove at each step of the reverse diffusion Markov chain, so that the resulting waveform matches the statistical structure of real audio data across conditional and unconditional tasks.
DiffWave is a non-autoregressive diffusion model that generates high-fidelity audio waveforms from noise in a constant number of steps, matching WaveNet vocoder quality while running orders of magnitude faster and outperforming prior models in unconditional generation.
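The reverse chain described in the claims above can be sketched in a few lines. This is a minimal illustration of the standard DDPM-style sampling update, not the authors' implementation: the schedule values, the function names, and `zero_noise_predictor` (a stand-in for the trained network) are all assumptions made for the example, and plain Python lists are used only to keep it dependency-free.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Linearly spaced noise variances beta_1..beta_T (illustrative values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def zero_noise_predictor(x, t):
    """Hypothetical stand-in for the trained network eps_theta(x_t, t)."""
    return [0.0] * len(x)


def reverse_diffusion(predict_noise, length, betas, rng):
    """Run the fixed-length reverse chain x_T -> x_0 with the DDPM update."""
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    # x_T is white Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]

    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Mean of the reverse transition: remove the predicted noise, then rescale.
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Posterior variance of the forward process; the last step adds no noise.
            var = (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            sigma = math.sqrt(var)
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


betas = linear_beta_schedule(num_steps=50)
waveform = reverse_diffusion(zero_noise_predictor, length=16, betas=betas, rng=random.Random(0))
print(len(waveform))  # 16
```

The number of loop iterations depends only on the length of the schedule, not on the length of the waveform, which is what makes the generation non-autoregressive. With a real model, `predict_noise` would also receive any conditioning signal, such as a mel spectrogram in the vocoder setting.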
Receipt and verification
| First computed | 2026-05-17T23:38:52.485912Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
9731bcda9625fa8df7db0e3fc4ceaa7c1567322b3a582da7a2505b30f4e028e9
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/S4Y3ZWUWEX5I3563BY74JTVKPQ \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9731bcda9625fa8df7db0e3fc4ceaa7c1567322b3a582da7a2505b30f4e028e9
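The hashing step of the pipeline above can also be written as a small Python function, which may be easier to reuse in scripts. This is a sketch that mirrors the one-liner's serialization settings (sorted keys, compact separators, no ASCII escaping); the function name `canonical_hash` and the toy records are hypothetical and are not part of any Pith API.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the record as compact, key-sorted JSON."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical toy records: same content, different key order.
a = {"source": {"id": "2009.09761", "kind": "arxiv", "version": 3}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"version": 3, "kind": "arxiv", "id": "2009.09761"}}
print(canonical_hash(a) == canonical_hash(b))  # True
```

Because the keys are sorted and whitespace is stripped before hashing, the digest depends only on the record's content, not on how the JSON was formatted or ordered when it was received.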
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "19873d12221f4a74bda25d7330b4259d32db0ca87acfa2ff2b93e55a6392a6f0",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG",
      "cs.SD",
      "stat.ML"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "eess.AS",
    "submitted_at": "2020-09-21T11:20:38Z",
    "title_canon_sha256": "848a2e8d4a5ddbf0e6eaeb3880f031f2c0cfa2f0510837aca0ac098a54697ee9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2009.09761",
    "kind": "arxiv",
    "version": 3
  }
}