pith. machine review for the scientific record.
Pith Number

pith:SOUJZI63

pith:2025:SOUJZI63CLDKUJMYDPLWER3S2K
not attested · not anchored · not stored · refs resolved

Steering Your Diffusion Policy with Latent Space Reinforcement Learning

Abhishek Gupta, Andrew Wagenmaker, Anusha Nagabandi, Mitsuhiko Nakamoto, Seohong Park, Sergey Levine, Waleed Yagoub, Yunchu Zhang

Optimizing in a diffusion policy's latent noise space enables sample-efficient autonomous robotic adaptation without altering model weights.

arxiv:2506.15799 v2 · 2025-06-18 · cs.RO · cs.LG

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open

Claims

C1 strongest claim

We show that DSRL is highly sample efficient, requires only black-box access to the BC policy, and enables effective real-world autonomous policy improvement.

C2 weakest assumption

That optimizing actions via RL in the diffusion model's latent noise space will produce meaningful policy improvements without access to model gradients or internal weights, and that this optimization remains stable across real-world robotic tasks.

C3 one line summary

DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.
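The idea in the summary above, steering a frozen policy only through its latent noise input, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `frozen_policy`, `reward`, and the linear noise rule are hypothetical stand-ins, and simple hill climbing is swapped in for the RL algorithm. It is not the paper's method or code; it only shows that the noise-to-action map can be treated as a black box whose weights are never touched.

```python
import math
import random


def frozen_policy(state: float, noise: float) -> float:
    """Hypothetical stand-in for a pretrained diffusion policy.

    Maps (state, latent noise) to an action. It is only ever called,
    never differentiated or modified, which mimics black-box access.
    """
    return math.tanh(0.5 * state + noise)


def reward(state: float, action: float) -> float:
    """Hypothetical task reward, highest when the action matches a target."""
    target_action = math.tanh(0.5 * state + 0.7)
    return -(action - target_action) ** 2


def linear_noise(params):
    """Noise-selection rule w = a * state + b (an assumed, toy parameterization)."""
    a, b = params
    return lambda state: a * state + b


def average_reward(noise_fn, states) -> float:
    """Mean reward when the frozen policy is fed noise chosen by noise_fn."""
    return sum(reward(s, frozen_policy(s, noise_fn(s))) for s in states) / len(states)


def steer(states, iterations=300, step=0.1, seed=0):
    """Hill climbing over the noise rule; a simple stand-in for an RL update."""
    rng = random.Random(seed)
    params = (0.0, 0.0)
    best = average_reward(linear_noise(params), states)
    for _ in range(iterations):
        candidate = (params[0] + rng.gauss(0, step), params[1] + rng.gauss(0, step))
        score = average_reward(linear_noise(candidate), states)
        if score > best:  # keep a perturbation only if it improves reward
            params, best = candidate, score
    return params, best


rng = random.Random(1)
states = [rng.uniform(-1.0, 1.0) for _ in range(50)]

# Unsteered baseline: noise drawn from a standard normal, as in ordinary sampling.
baseline = average_reward(lambda s: rng.gauss(0.0, 1.0), states)
initial = average_reward(linear_noise((0.0, 0.0)), states)
params, steered = steer(states)

print(f"baseline={baseline:.3f} steered={steered:.3f} params={params}")
```

Only the rule that picks the noise changes during the search; `frozen_policy` stays fixed throughout, which is the property the summary attributes to DSRL.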

References

98 extracted · 98 resolved · 30 Pith anchors

Formal links

2 machine-checked theorem links

Cited by

19 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:12.935251Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

93a89ca3db12c6aa25981bd7624772d299dd180391dde7212029dd62572bc52f

Aliases

arxiv: 2506.15799 · arxiv_version: 2506.15799v2 · doi: 10.48550/arxiv.2506.15799 · pith_short_12: SOUJZI63CLDK · pith_short_16: SOUJZI63CLDKUJMY · pith_short_8: SOUJZI63
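
The record does not say how the identifier is derived from the canonical hash. One reading that is consistent with the values shown on this page is that the identifier is the first 26 characters of the Base32 encoding of the hash bytes, and that the `pith_short_*` aliases are prefixes of it. The sketch below checks that reading; the function name `pith_id_from_hash` and the 26-character length are assumptions, not part of any documented Pith API.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "93a89ca3db12c6aa25981bd7624772d299dd180391dde7212029dd62572bc52f"
FULL_ID = "SOUJZI63CLDKUJMYDPLWER3S2K"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw hash bytes and keep the first `length` characters.

    This derivation is inferred from the record, not documented by Pith.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


derived = pith_id_from_hash(CANONICAL_HASH)

# Short aliases are assumed to be plain prefixes of the full identifier.
aliases = {n: derived[:n] for n in (8, 12, 16)}

print(derived == FULL_ID)
print(aliases)
```

If the inferred rule holds, a client could recompute the identifier locally from the hash instead of trusting the alias list.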
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/SOUJZI63CLDKUJMYDPLWER3S2K \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 93a89ca3db12c6aa25981bd7624772d299dd180391dde7212029dd62572bc52f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "663649781b6ec4cce2ed62242e79588a55edf61a4d2087d98728585e10f0b98a",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2025-06-18T18:35:57Z",
    "title_canon_sha256": "123eb9f732b785ef2fd3accc2cdd00a4ee02ede57d79f226477b9bc4cf77dffd"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2506.15799",
    "kind": "arxiv",
    "version": 2
  }
}
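
The same canonicalization used in the shell command above (sorted keys, compact separators, UTF-8, SHA-256) can be applied offline to the record printed here. This is a minimal sketch: the helper name `canonical_sha256` is hypothetical, and whether the record as displayed reproduces the published hash byte for byte should be confirmed by running it rather than assumed.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "663649781b6ec4cce2ed62242e79588a55edf61a4d2087d98728585e10f0b98a",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2025-06-18T18:35:57Z",
        "title_canon_sha256": "123eb9f732b785ef2fd3accc2cdd00a4ee02ede57d79f226477b9bc4cf77dffd",
    },
    "schema_version": "1.0",
    "source": {"id": "2506.15799", "kind": "arxiv", "version": 2},
}

EXPECTED = "93a89ca3db12c6aa25981bd7624772d299dd180391dde7212029dd62572bc52f"

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields were written down, which is what makes the check reproducible across clients.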