Pith Number

pith:GTXFFVUW

pith:2022:GTXFFVUW5IAGGYDDGCDKIZJSH5
not attested · not anchored · not stored · refs resolved

R3M: A Universal Visual Representation for Robot Manipulation

Abhinav Gupta, Aravind Rajeswaran, Chelsea Finn, Suraj Nair, Vikash Kumar

Pre-trained visual features from human videos enable more data-efficient robot manipulation.

arxiv:2203.12601 v3 · 2022-03-23 · cs.RO · cs.AI · cs.CV · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{GTXFFVUW5IAGGYDDGCDKIZJSH5}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Across a suite of 12 simulated robot manipulation tasks, we find that R3M improves task success by over 20% compared to training from scratch and by over 10% compared to state-of-the-art visual representations like CLIP and MoCo. Furthermore, R3M enables a Franka Emika Panda arm to learn a range of manipulation tasks in a real, cluttered apartment given just 20 demonstrations.

C2 · weakest assumption

That visual features learned from human video data will transfer effectively to robotic camera inputs and task distributions without any robot-specific fine-tuning or domain adaptation.

C3 · one-line summary

A visual encoder pre-trained on diverse human videos with contrastive and language objectives improves simulated robot manipulation success by over 20% versus training from scratch and enables real Franka arm tasks from 20 demonstrations.

References

71 extracted · 71 resolved · 8 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

38 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:52.454622Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

34ee52d696ea006360633086a465323f42504e698f7a5184bb4ac4f8e17a2bd4

Aliases

arxiv: 2203.12601 · arxiv_version: 2203.12601v3 · doi: 10.48550/arxiv.2203.12601 · pith_short_12: GTXFFVUW5IAG · pith_short_16: GTXFFVUW5IAGGYDD · pith_short_8: GTXFFVUW
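
The identifiers above appear to be related to the canonical hash. A minimal sketch, assuming (this is an observation from this one record, not a documented rule) that the Pith Number is the first 26 characters of the RFC 4648 Base32 encoding of the canonical SHA-256 digest, and that the short aliases are plain prefixes of it. The function names `pith_number_from_hash` and `short_aliases` are hypothetical helpers introduced for illustration.

```python
import base64

# Canonical hash copied from this record.
CANONICAL_HASH = "34ee52d696ea006360633086a465323f42504e698f7a5184bb4ac4f8e17a2bd4"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: the 26-character length and the Base32 alphabet are inferred
    from this record only, not taken from a published specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_N aliases as prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


pith_number = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(pith_number)
print(pith_number)
print(aliases)
```

For the hash in this record, the derived value matches the listed Pith Number GTXFFVUW5IAGGYDDGCDKIZJSH5 and the three short aliases. Whether the same rule holds for other records is not stated on this page.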
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GTXFFVUW5IAGGYDDGCDKIZJSH5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 34ee52d696ea006360633086a465323f42504e698f7a5184bb4ac4f8e17a2bd4
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2e12f3f0f536b483f3c663217324ecee07bd9809f73caf8399664ec0c02148d6",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CV",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2022-03-23T17:55:09Z",
    "title_canon_sha256": "923da15624e8a8f8846da5becb700e1523a5a0058db29deff7fbab2dd3adcab3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2203.12601",
    "kind": "arxiv",
    "version": 3
  }
}
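
The verification command above can also be run offline against a saved copy of the canonical record. The following is a minimal sketch that mirrors the serialization used in that command (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and hashes the result with SHA-256. The function name `canonical_sha256` is a hypothetical helper, and the sketch assumes the JSON printed above is exactly the `canonical_record` the endpoint returns.

```python
import hashlib
import json

# Hash the record is expected to produce, copied from "Canonical hash".
EXPECTED = "34ee52d696ea006360633086a465323f42504e698f7a5184bb4ac4f8e17a2bd4"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Canonical record as shown on this page (assumed to be complete).
record = {
    "metadata": {
        "abstract_canon_sha256": "2e12f3f0f536b483f3c663217324ecee07bd9809f73caf8399664ec0c02148d6",
        "cross_cats_sorted": ["cs.AI", "cs.CV", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2022-03-23T17:55:09Z",
        "title_canon_sha256": "923da15624e8a8f8846da5becb700e1523a5a0058db29deff7fbab2dd3adcab3",
    },
    "schema_version": "1.0",
    "source": {"id": "2203.12601", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which the record's fields are stored or transmitted does not affect the digest; only the values and the key names do.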