Pith Number

pith:QM2I3SS6

pith:2024:QM2I3SS6PUQRPIEMTEONHQMQON

not attested · not anchored · not stored · refs resolved

Diffusion Policy Policy Optimization

Allen Z. Ren, Anirudha Majumdar, Anthony Simeonov, Benjamin Burchfiel, Hongkai Dai, Justin Lidard, Lars L. Ankile, Max Simchowitz, Pulkit Agrawal

DPPO fine-tunes diffusion-based policies with policy gradients to reach stronger performance than prior RL methods on robot tasks.

arxiv:2409.00588 v3 · 2024-09-01 · cs.RO · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{QM2I3SS6PUQRPIEMTEONHQMQON}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
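As a rough illustration of what a deterministic merge can mean here, the sketch below replays events in a canonical order so that any mirror reaches the same state regardless of the order in which it received them. The event fields (`id`, `ts`, `key`, `value`), the sample values, and the last-writer-wins rule are all assumptions made for this example; the bundle's actual event schema and merge rules are not described on this page.

```python
from typing import Any


def merge_state(record: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state dict in a canonical order (hypothetical schema)."""
    state = dict(record)
    # Sort by (timestamp, event id) so arrival order cannot affect the result.
    for ev in sorted(events, key=lambda e: (e["ts"], e["id"])):
        state[ev["key"]] = ev["value"]  # last-writer-wins, an assumed rule
    return state


# Made-up sample data, for illustration only.
record = {"source": "arxiv:2409.00588"}
events = [
    {"id": "e2", "ts": "2026-05-18T00:00:00Z", "key": "citations", "value": "open"},
    {"id": "e1", "ts": "2026-05-17T00:00:00Z", "key": "citations", "value": "pending"},
]

# Two mirrors that received the events in different orders agree on the state.
state_a = merge_state(record, events)
state_b = merge_state(record, list(reversed(events)))
print(state_a == state_b)
```

The total order on `(ts, id)` is what makes the result independent of delivery order; any tie-breaking key would do as long as every mirror uses the same one.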

Claims

C1 · strongest claim

DPPO achieves the strongest overall performance and efficiency for fine-tuning in common benchmarks compared to other RL methods for diffusion-based policies and also compared to PG fine-tuning of other policy parameterizations.

C2 · weakest assumption

That the observed performance gains arise from unique synergies between the diffusion parameterization and policy-gradient updates rather than from unstated hyperparameter tuning or benchmark-specific implementation details.

C3 · one-line summary

DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.

References

114 extracted · 114 resolved · 24 Pith anchors

[1] J. Achiam. Spinning Up in Deep Reinforcement Learning. 2018

[2] A. Ajay, Y. Du, A. Gupta, J. B. Tenenbaum, T. S. Jaakkola, and P. Agrawal. Is conditional generative modeling all you need for decision making? In The Eleventh International Conference on Learning Re 2023

[3] M. Alakuijala, G. Dulac-Arnold, J. Mairal, J. Ponce, and C. Schmid. Residual reinforcement learning from demonstrations. arXiv preprint arXiv:2106.08050, 2021

[4] O. M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, et al. Learning dexterous in-hand manipulation. The International Journal of 2020

[5] L. Ankile, A. Simeonov, I. Shenfeld, and P. Agrawal. Juicer: Data-efficient imitation learning for robotic assembly. arXiv, 2024

Formal links

3 machine-checked theorem links

Cited by

26 papers in Pith

Reinforcement Learning with Action Chunking

EXPO: Stable Reinforcement Learning with Expressive Policies

AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation

Steering Your Diffusion Policy with Latent Space Reinforcement Learning

AID: Agent Intent from Diffusion for Multi-Agent Informative Path Planning

Receipt and verification

First computed	2026-05-17T23:38:48.459195Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

83348dca5e7d2117a08c991cd3c19073458a4f7b0e1e2d3d3f0abf5fdd9e0872

Aliases

arxiv: 2409.00588 · arxiv_version: 2409.00588v3 · doi: 10.48550/arxiv.2409.00588 · pith_short_12: QM2I3SS6PUQR · pith_short_16: QM2I3SS6PUQRPIEM · pith_short_8: QM2I3SS6
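The identifiers listed here are related in a way that can be checked locally. In the values shown on this page, the 26-character identifier matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of that identifier. The sketch below reproduces this with Python's standard library; the relationship is inferred from this record's values rather than from a published specification, and the helper names are hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "83348dca5e7d2117a08c991cd3c19073458a4f7b0e1e2d3d3f0abf5fdd9e0872"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters (assumed rule)."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


def short_aliases(pith_id: str) -> dict[str, str]:
    """Derive the pith_short_N aliases as prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}


pith_id = pith_id_from_hash(CANONICAL_HASH)
aliases = short_aliases(pith_id)
print(pith_id)
print(aliases)
```

If the inferred rule holds for other records, a mirror could recompute the identifier from the canonical hash alone instead of trusting the alias list.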

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

# Fetch the record, extract canonical_record, re-serialize it canonically, and print its SHA-256.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/QM2I3SS6PUQRPIEMTEONHQMQON \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 83348dca5e7d2117a08c991cd3c19073458a4f7b0e1e2d3d3f0abf5fdd9e0872

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "9d3d0304a1b74a8d746cb2da96dde3219bf7b97e22907935b53ce1aa1ef2b15c",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2024-09-01T02:47:50Z",
    "title_canon_sha256": "d1ea95de8d3a4b7a2518acbc7b245a34effaa8d966a90740b5a52f4d94d4825f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2409.00588",
    "kind": "arxiv",
    "version": 3
  }
}
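
The verification command earlier on this page can also be run offline against the record printed above. The sketch below applies the same canonicalization as that one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) and hashes the result; the function name `canonical_sha256` is hypothetical. Compare the printed digest with the canonical hash listed on this page rather than assuming it matches.

```python
import hashlib
import json

# The canonical record exactly as printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "9d3d0304a1b74a8d746cb2da96dde3219bf7b97e22907935b53ce1aa1ef2b15c",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2024-09-01T02:47:50Z",
        "title_canon_sha256": "d1ea95de8d3a4b7a2518acbc7b245a34effaa8d966a90740b5a52f4d94d4825f",
    },
    "schema_version": "1.0",
    "source": {"id": "2409.00588", "kind": "arxiv", "version": 3},
}


def canonical_sha256(obj: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, the order in which a mirror stores or transmits the fields does not change the digest.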