pith. the verified trust layer for science.
Pith Number

pith:QM2I3SS6

pith:2024:QM2I3SS6PUQRPIEMTEONHQMQON
not attested · not anchored · not stored · refs resolved

Diffusion Policy Policy Optimization

Allen Z. Ren, Anirudha Majumdar, Anthony Simeonov, Benjamin Burchfiel, Hongkai Dai, Justin Lidard, Lars L. Ankile, Max Simchowitz, Pulkit Agrawal

DPPO fine-tunes diffusion-based policies with policy gradients to reach stronger performance than prior RL methods on robot tasks.

arxiv:2409.00588 v3 · 2024-09-01 · cs.RO · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{QM2I3SS6PUQRPIEMTEONHQMQON}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
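
The page does not specify the merge algorithm, so the following is only a minimal sketch of how a deterministic merge over a bundle of events could work. The event fields (`kind`, `by`, `at`), the placeholder event values, the state layout, and the ordering rule (timestamp, then content hash) are all assumptions made for illustration, not Pith's actual schema. Signature checking is left out entirely.

```python
import hashlib
import json
import random


def event_id(event: dict) -> str:
    """Content-address an event: SHA-256 over its canonical JSON form."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def merge(record: dict, events: list) -> dict:
    """Fold events into a state in an order that ignores arrival order.

    Hypothetical rule: drop exact duplicates, then apply events sorted by
    (timestamp, content hash). Any mirror holding the same set of events
    therefore computes the same state.
    """
    state = {"record": record, "citations": [], "replications": []}
    unique = {event_id(e): e for e in events}
    for eid in sorted(unique, key=lambda k: (unique[k]["at"], k)):
        event = unique[eid]
        if event["kind"] == "citation":
            state["citations"].append(event["by"])
        elif event["kind"] == "replication":
            state["replications"].append(event["by"])
    return state


# Placeholder data: the source block is copied from the canonical record,
# the events are invented purely to exercise the function.
record = {"source": {"id": "2409.00588", "kind": "arxiv", "version": 3}}
events = [
    {"kind": "citation", "by": "paper-A", "at": "2026-01-02T00:00:00Z"},
    {"kind": "replication", "by": "lab-B", "at": "2026-01-03T00:00:00Z"},
    {"kind": "citation", "by": "paper-C", "at": "2026-01-02T00:00:00Z"},
]

shuffled = events[:]
random.shuffle(shuffled)

# A second mirror sees the events in another order, with one duplicate.
same_state = merge(record, events) == merge(record, shuffled + events[:1])
print(same_state)
```

The content hash as a tie-breaker is what makes the order total: two events with the same timestamp still sort the same way on every mirror.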

Claims

C1 · strongest claim

DPPO achieves the strongest overall performance and efficiency for fine-tuning on common benchmarks, compared both to other RL methods for diffusion-based policies and to policy gradient (PG) fine-tuning of other policy parameterizations.

C2 · weakest assumption

That the observed performance gains arise from unique synergies between the diffusion parameterization and policy-gradient updates rather than from unstated hyperparameter tuning or benchmark-specific implementation details.

C3 · one line summary

DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.

References

114 extracted · 114 resolved · 24 Pith anchors

Formal links

3 machine-checked theorem links

Cited by

26 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:48.459195Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

83348dca5e7d2117a08c991cd3c19073458a4f7b0e1e2d3d3f0abf5fdd9e0872

Aliases

arxiv: 2409.00588 · arxiv_version: 2409.00588v3 · doi: 10.48550/arxiv.2409.00588 · pith_short_12: QM2I3SS6PUQR · pith_short_16: QM2I3SS6PUQRPIEM · pith_short_8: QM2I3SS6
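
For this record, the identifier and its short aliases appear to be prefixes of the base32 encoding (RFC 4648 alphabet) of the canonical hash. The sketch below checks that relationship using only values shown on this page. That Pith defines its identifiers this way in general is an assumption, and the function name `pith_id_from_hash` is hypothetical.

```python
import base64

# Canonical hash and full identifier exactly as listed on this page.
CANONICAL_HASH = "83348dca5e7d2117a08c991cd3c19073458a4f7b0e1e2d3d3f0abf5fdd9e0872"
PITH_NUMBER = "QM2I3SS6PUQRPIEMTEONHQMQON"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: the 26-character length is read off this record, not taken
    from a published specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full_id = pith_id_from_hash(CANONICAL_HASH)

# The short aliases are the first 8, 12, and 16 characters of the identifier.
aliases = {f"pith_short_{n}": full_id[:n] for n in (8, 12, 16)}

print(full_id == PITH_NUMBER)
print(aliases)
```

If the relationship holds more widely, a client could recompute the identifier from the canonical hash alone instead of trusting the alias list.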
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/QM2I3SS6PUQRPIEMTEONHQMQON \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 83348dca5e7d2117a08c991cd3c19073458a4f7b0e1e2d3d3f0abf5fdd9e0872
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "9d3d0304a1b74a8d746cb2da96dde3219bf7b97e22907935b53ce1aa1ef2b15c",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2024-09-01T02:47:50Z",
    "title_canon_sha256": "d1ea95de8d3a4b7a2518acbc7b245a34effaa8d966a90740b5a52f4d94d4825f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2409.00588",
    "kind": "arxiv",
    "version": 3
  }
}
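
The same check as the shell one-liner can be run offline against the record shown above. This is a minimal sketch that reuses the canonicalization from that one-liner (sorted keys, compact separators, UTF-8, no ASCII escaping); the function name `canonical_sha256` is hypothetical. It assumes the JSON displayed on this page is byte-for-byte what the API returns as `canonical_record`, which has not been confirmed here, so the script reports a match or mismatch rather than presuming one.

```python
import hashlib
import json

# Expected digest as listed under "Canonical hash" on this page.
EXPECTED = "83348dca5e7d2117a08c991cd3c19073458a4f7b0e1e2d3d3f0abf5fdd9e0872"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash with SHA-256."""
    blob = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Record copied from the "Canonical record JSON" section.
record = {
    "metadata": {
        "abstract_canon_sha256": "9d3d0304a1b74a8d746cb2da96dde3219bf7b97e22907935b53ce1aa1ef2b15c",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2024-09-01T02:47:50Z",
        "title_canon_sha256": "d1ea95de8d3a4b7a2518acbc7b245a34effaa8d966a90740b5a52f4d94d4825f",
    },
    "schema_version": "1.0",
    "source": {"id": "2409.00588", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the order in which the record's fields are written does not affect the digest; only the values and key names do.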