{"paper":{"title":"Hyper-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Robot action trajectories are mostly low-frequency, so diffusion policies need only two denoising steps for strong performance.","cross_cats":[],"primary_cat":"cs.RO","authors_text":"Haoming Song, Huizhe Li, Jie Mei, Jinhao Zhang, Wenlong Xia, Yichen Lai, Youmin Gong, Zhexuan Zhou","submitted_at":"2026-05-02T19:07:09Z","abstract_excerpt":"Diffusion-based visuomotor policies perform well in robotic manipulation, yet current methods still inherit image-generation-style decoders and multi-step sampling. We revisit this design from a frequency-domain perspective. Robot action trajectories are highly smooth, with most energy concentrated in a few low-frequency discrete cosine transform modes. Under this structure, we show that the error of the optimal denoiser is bounded by the low-frequency subspace dimension and residual high-frequency energy, implying that denoising error saturates after very few reverse steps. This also suggests"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"across RoboTwin2.0, Adroit, MetaWorld, and real-world tasks, HDP3 achieves state-of-the-art performance with fewer than 1% of the parameters of prior 3D diffusion-based policies and substantially lower inference latency.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the low-frequency concentration observed in action trajectories and the derived error bound directly translate to sufficiency of exactly two denoising steps across the diverse tasks and action spaces tested, without hidden performance loss.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Hydra-DP3 achieves SOTA visuomotor performance with under 1% of prior 3D diffusion policy parameters by using frequency analysis to justify a lightweight decoder and two-step DDIM inference.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Robot action trajectories are mostly low-frequency, so diffusion policies need only two denoising steps for strong performance.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"e76e4958bb2b5bcba3608fe11e43ddefcc0da986f7466892aa56d3761ce5984f"},"source":{"id":"2605.01581","kind":"arxiv","version":4},"verdict":{"id":"af103e7b-f403-4e67-888e-52e50ddfa5b2","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-12T03:42:35.839063Z","strongest_claim":"across RoboTwin2.0, Adroit, MetaWorld, and real-world tasks, HDP3 achieves state-of-the-art performance with fewer than 1% of the parameters of prior 3D diffusion-based policies and substantially lower inference latency.","one_line_summary":"Hydra-DP3 achieves SOTA visuomotor performance with under 1% of prior 3D diffusion policy parameters by using frequency analysis to justify a lightweight decoder and two-step DDIM inference.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the low-frequency concentration observed in action trajectories and the derived error bound directly translate to sufficiency of exactly two denoising steps across the diverse tasks and action spaces tested, without hidden performance loss.","pith_extraction_headline":"Robot action trajectories are mostly low-frequency, so diffusion policies need only two denoising steps for strong performance."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.01581/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"ai_meta_artifact","ran_at":"2026-05-20T17:39:33.128516Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T17:09:26.935487Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"b1be61bd77754442fcff2837c21e8ebbcc771cbbadfbf7225553080dfefe1b48"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}