Pith Number

pith:INA3ZRYW

pith:2023:INA3ZRYWGUTA6PUQHZ6SAOKHQD
not attested · not anchored · not stored · refs resolved

TD-MPC2: Scalable, Robust World Models for Continuous Control

Nicklas Hansen, Hao Su, Xiaolong Wang

TD-MPC2 achieves significantly better performance than baselines on 104 continuous control tasks using one fixed set of hyperparameters.

arxiv:2310.16828 v2 · 2023-10-25 · cs.LG · cs.AI · cs.CV · cs.RO

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{INA3ZRYWGUTA6PUQHZ6SAOKHQD}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
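This page does not specify the merge algorithm, so the following is only a minimal sketch of what "deterministic" means here: any mirror that holds the same set of events should reach the same state, regardless of the order in which the events arrived. The event fields (`id`, `at`, `field`, `value`), the last-writer-wins rule, and the function name `merge_state` are all hypothetical and are not taken from the Pith bundle format.

```python
from typing import Iterable


def merge_state(base: dict, events: Iterable[dict]) -> dict:
    """Fold events into a state in a total order that ignores arrival order.

    Hypothetical rule: sort by timestamp, break ties by event id, and let the
    last event in that order win for each field.
    """
    # Deduplicate by id (assumes one id always carries the same content).
    unique = {e["id"]: e for e in events}
    ordered = sorted(unique.values(), key=lambda e: (e["at"], e["id"]))
    state = dict(base)
    for event in ordered:
        state[event["field"]] = event["value"]
    return state


# Illustrative events only; none of these come from the real bundle.
base = {"citations": "open", "replications": "open"}
events = [
    {"id": "e2", "at": "2026-05-18T09:00:00Z", "field": "citations", "value": "closed"},
    {"id": "e1", "at": "2026-05-18T08:00:00Z", "field": "citations", "value": "review"},
    {"id": "e3", "at": "2026-05-18T09:00:00Z", "field": "replications", "value": "claimed"},
]

forward = merge_state(base, events)
backward = merge_state(base, list(reversed(events)) + events[:1])  # reordered, one duplicate
print(forward == backward)
```

Sorting on a key that every mirror can compute from the events themselves is what makes the result independent of network order; the real algorithm may use a different key or conflict rule.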

Claims

C1 strongest claim

We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces.

C2 weakest assumption

The reported gains rely on the assumption that the chosen 104 tasks and four domains are representative enough that a single hyperparameter set will continue to work when the method is applied to new, unseen continuous-control problems.

C3 one line summary

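TD-MPC2 is an implicit world-model RL method that outperforms baselines on 104 tasks across four domains with a single hyperparameter configuration, and it scales to a 317M-parameter agent that performs 80 tasks across multiple domains, embodiments, and action spaces.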

References

162 extracted · 162 resolved · 12 Pith anchors

[1] Layer normalization 2016
[2] Video pretraining (VPT): Learning to act by watching unlabeled online videos 2022
[3] A distributional perspective on reinforcement learning 2017
[4] A Markovian decision process 1957
[7] Language models are few-shot learners 2020

Formal links

3 machine-checked theorem links

Cited by

37 papers in Pith

Receipt and verification
First computed 2026-05-17T23:39:22.321510Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

4341bcc71635260f3e903e7d20394780cfa0e88df256df3aa5350dab84d495b6

Aliases

arxiv: 2310.16828 · arxiv_version: 2310.16828v2 · doi: 10.48550/arxiv.2310.16828 · pith_short_12: INA3ZRYWGUTA · pith_short_16: INA3ZRYWGUTA6PUQ · pith_short_8: INA3ZRYW
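The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the Base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of that number. This rule is inferred from the values shown in this record, not from a published specification, so treat the sketch below as an observation rather than the official derivation. The function names are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "4341bcc71635260f3e903e7d20394780cfa0e88df256df3aa5350dab84d495b6"
PITH_NUMBER = "INA3ZRYWGUTA6PUQHZ6SAOKHQD"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: inferred from this page's values, not from a documented rule.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_N aliases as plain prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


derived = pith_number_from_hash(CANONICAL_HASH)
print(derived == PITH_NUMBER)
print(short_aliases(derived))
```

If the relation holds for other records too, a short alias can be checked against a canonical hash without contacting the server; collisions become more likely as the prefix gets shorter.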
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/INA3ZRYWGUTA6PUQHZ6SAOKHQD \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4341bcc71635260f3e903e7d20394780cfa0e88df256df3aa5350dab84d495b6
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a2f737d999efdb1fbcc13897d2bbd2b8b944905f8270acce0ed123d3b92c7024",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CV",
      "cs.RO"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2023-10-25T17:57:07Z",
    "title_canon_sha256": "f2ad3264774b571271338ae467cb30dc4224dc960e43751b19fe7de5d69db4b3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.16828",
    "kind": "arxiv",
    "version": 2
  }
}
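The shell one-liner under "Verify this Pith Number yourself" can also be run offline against the record printed above. The sketch below applies the same serialization settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and compares the digest with the listed canonical hash. It assumes the JSON shown on this page is byte-for-byte what the server returns as `canonical_record`; the function name `canonical_hash` is hypothetical.

```python
import hashlib
import json

# Canonical hash as listed in this record.
EXPECTED = "4341bcc71635260f3e903e7d20394780cfa0e88df256df3aa5350dab84d495b6"


def canonical_hash(record: dict) -> str:
    """SHA-256 over compact, key-sorted UTF-8 JSON, mirroring the one-liner."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The canonical record as printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "a2f737d999efdb1fbcc13897d2bbd2b8b944905f8270acce0ed123d3b92c7024",
        "cross_cats_sorted": ["cs.AI", "cs.CV", "cs.RO"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2023-10-25T17:57:07Z",
        "title_canon_sha256": "f2ad3264774b571271338ae467cb30dc4224dc960e43751b19fe7de5d69db4b3",
    },
    "schema_version": "1.0",
    "source": {"id": "2310.16828", "kind": "arxiv", "version": 2},
}

digest = canonical_hash(record)
print(digest)
print("matches listed hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which fields appear in the downloaded JSON does not affect the digest; any change to a value, including whitespace inside a string, does.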