Pith Number

pith:P4HUNZAR

pith:2026:P4HUNZARFD5CNASISNWE63AYQN
not attested · not anchored · not stored · refs resolved

FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control

Danica Kragic, Daniel Palenicek, Donghu Kim, Florian Vogt, Hojoon Lee, I Made Aswin Nahendra, Jaegul Choo, Jan Peters, Kinam Kim, Minho Park, Sehee Min, Takuma Seno, Youngdo Lee

FlashSAC stabilizes off-policy RL for high-dimensional robot control by performing fewer gradient updates and bounding weight, feature, and gradient norms to limit critic error accumulation.

arxiv:2604.04539 v2 · 2026-04-06 · cs.LG · cs.RO

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{P4HUNZARFD5CNASISNWE63AYQN}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Across over 60 tasks in 10 simulators, FlashSAC consistently outperforms PPO and strong off-policy baselines in both final performance and training efficiency, with the largest gains on high-dimensional tasks such as dexterous manipulation. In sim-to-real humanoid locomotion, FlashSAC reduces training time from hours to minutes.

C2 weakest assumption

That explicitly bounding weight, feature, and gradient norms will sufficiently curb critic error accumulation in high-dimensional spaces without removing the capacity needed for accurate value estimation or policy improvement.

C3 one line summary

FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
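
This record does not say how the weight, feature, and gradient norms are bounded. As a generic illustration only, and not FlashSAC's actual implementation, the sketch below shows two common ways to bound an L2 norm: rescaling a vector when it exceeds a threshold, and projecting a vector onto the unit sphere. The helper names `clip_to_norm` and `unit_normalize`, the threshold, and the toy vectors are assumptions made for this example.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clip_to_norm(vec, max_norm):
    """Rescale vec so its L2 norm is at most max_norm (direction unchanged)."""
    norm = l2_norm(vec)
    if norm <= max_norm:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]


def unit_normalize(vec, eps=1e-8):
    """Project vec onto the unit sphere; eps guards against division by zero."""
    norm = l2_norm(vec)
    return [x / (norm + eps) for x in vec]


grad = [3.0, 4.0]                      # toy gradient with L2 norm 5
clipped = clip_to_norm(grad, 1.0)      # rescaled so its norm is about 1
features = unit_normalize([2.0, 0.0])  # toy feature vector, norm about 1
print(l2_norm(clipped), l2_norm(features))
```

Both operations keep the direction of the vector and change only its magnitude, which is the usual reason they are considered when large updates are a concern.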

References

94 extracted · 94 resolved · 18 Pith anchors

Formal links

2 machine-checked theorem links

Cited by

1 paper in Pith

Receipt and verification
First computed: 2026-05-20T00:00:37.674180Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

7f0f46e41128fa268248936c4f6c188351bd41ef445d3cfd1aa858d1d78ab95f

Aliases

arxiv: 2604.04539 · arxiv_version: 2604.04539v2 · doi: 10.48550/arxiv.2604.04539 · pith_short_12: P4HUNZARFD5C · pith_short_16: P4HUNZARFD5CNASI · pith_short_8: P4HUNZAR
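
The values on this page are consistent with the Pith Number being the first 26 characters of the standard Base32 encoding of the canonical SHA-256 hash, with each `pith_short_*` alias being a prefix of that string. The page does not state this rule, so treat the sketch below as an observation about this one record and not as a specification. `pith_number_from_hash` is a hypothetical helper name.

```python
import base64

CANONICAL_HASH = "7f0f46e41128fa268248936c4f6c188351bd41ef445d3cfd1aa858d1d78ab95f"


def pith_number_from_hash(hex_digest, length=26):
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)
print(full)       # P4HUNZARFD5CNASISNWE63AYQN
print(full[:8])   # P4HUNZAR (pith_short_8)
print(full[:12])  # P4HUNZARFD5C (pith_short_12)
print(full[:16])  # P4HUNZARFD5CNASI (pith_short_16)
```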
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/P4HUNZARFD5CNASISNWE63AYQN \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7f0f46e41128fa268248936c4f6c188351bd41ef445d3cfd1aa858d1d78ab95f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ae46796841d76b3575eba32104b569b38dee8c7c75a0de345a25ac3d84abef1a",
    "cross_cats_sorted": [
      "cs.RO"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-04-06T09:03:41Z",
    "title_canon_sha256": "f94ebc986218d4761adc4446fc388da88b050823e912340b7a2eef21558e682f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.04539",
    "kind": "arxiv",
    "version": 2
  }
}
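
The canonicalization step of the verification command above can also be sketched as a standalone Python function. It mirrors the `json.dumps` arguments used in the one-liner (sorted keys, compact separators, non-ASCII characters kept as-is) and hashes the UTF-8 bytes. The function name `canonical_sha256` and the toy records are assumptions for illustration; fetching the record from the API is left out.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a record using the same canonicalization as the shell one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


# Toy records (not the full Pith record): same content, different key order.
a = {"source": {"id": "2604.04539", "kind": "arxiv", "version": 2}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"version": 2, "kind": "arxiv", "id": "2604.04539"}}

print(canonical_sha256(a) == canonical_sha256(b))  # True: key order does not matter
```

To check a real record, pass the parsed `canonical_record` object to the function and compare the result with the canonical hash listed on this page.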