{"paper":{"title":"ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"A one-step block diffusion model generates clinically accurate chest X-ray reports eight times faster than autoregressive methods by distilling joint token dependencies.","cross_cats":["cs.AI","eess.IV"],"primary_cat":"cs.LG","authors_text":"Hao Liu, Jile Jiao, Lifeng Chen, Tao Sun, Tianqi You, Xiaofeng Mou, Xiao Han, Xiaojie Jin, Yi Xu, Zhicai Ou, Zhimin Bao","submitted_at":"2026-04-10T16:07:14Z","abstract_excerpt":"Chest X-ray report generation (CXR-RG) has the potential to substantially alleviate radiologists' workload. However, conventional autoregressive vision--language models (VLMs) suffer from high inference latency due to sequential token decoding. Diffusion-based models offer a promising alternative through parallel generation, but they still require multiple denoising iterations. Compressing multi-step denoising to a single step could further reduce latency, but often degrades textual coherence due to the mean-field bias introduced by token-factorized denoisers. To address this challenge, we pro"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"ECHO surpasses state-of-the-art autoregressive methods, improving RaTE and SemScore by 64.33% and 60.58% respectively, while achieving an 8× inference speedup without compromising clinical accuracy.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the Direct Conditional Distillation framework successfully encodes joint token dependencies from on-policy trajectories to overcome mean-field bias in one-step generation, without introducing new coherence failures not captured by the reported metrics.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ECHO is a one-step block diffusion VLM for chest X-ray reports that improves RaTE and SemScore by over 60% while delivering 8x faster inference than autoregressive baselines.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A one-step block diffusion model generates clinically accurate chest X-ray reports eight times faster than autoregressive methods by distilling joint token dependencies.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"0368b12deb3cc1fc660fbc314a48e5d2b13dc69a39d0d86cf4b7a821adaa67f1"},"source":{"id":"2604.09450","kind":"arxiv","version":2},"verdict":{"id":"c0edf919-dd39-4141-94bf-9f28d67de499","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-10T17:51:46.575571Z","strongest_claim":"ECHO surpasses state-of-the-art autoregressive methods, improving RaTE and SemScore by 64.33% and 60.58% respectively, while achieving an 8× inference speedup without compromising clinical accuracy.","one_line_summary":"ECHO is a one-step block diffusion VLM for chest X-ray reports that improves RaTE and SemScore by over 60% while delivering 8x faster inference than autoregressive baselines.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the Direct Conditional Distillation framework successfully encodes joint token dependencies from on-policy trajectories to overcome mean-field bias in one-step generation, without introducing new coherence failures not captured by the reported metrics.","pith_extraction_headline":"A one-step block diffusion model generates clinically accurate chest X-ray reports eight times faster than autoregressive methods by distilling joint token dependencies."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.09450/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}