Pith Number

pith:3NSFLYB5

pith:2024:3NSFLYB5XM5L4DIY4H65GBVNE4

not attested · not anchored · not stored · refs resolved

Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks

Haoran Liu, He Wang, Jiazhao Zhang, Kunyu Wang, Minghan Li, Shaoan Wang, Songlin Wei, Zhizheng Zhang, Zhongyuan Wang

A single video-based model unifies multiple robot navigation tasks by standardizing their data formats.

arxiv:2412.06224 v2 · 2024-12-09 · cs.RO · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{3NSFLYB5XM5L4DIY4H65GBVNE4}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Uni-NaVid is the first video-based vision-language-action model designed to unify diverse embodied navigation tasks and enable seamless navigation for mixed long-horizon tasks in unseen real-world environments.

C2 · weakest assumption

Harmonizing input and output data configurations across tasks allows effective integration and positive synergy in learning without loss of performance on individual tasks or introduction of negative interference.

C3 · one-line summary

Uni-NaVid unifies diverse embodied navigation tasks into one video-based vision-language-action model trained on 3.6 million samples from four sub-tasks, achieving state-of-the-art performance on benchmarks and real-world tests.

References

126 extracted · 126 resolved · 13 Pith anchors

[1] ETPNav: Evolving topological planning for vision-language navigation in continuous environments 2023

[3] On Evaluation of Embodied Navigation Agents 2018 · arXiv:1807.06757

[4] Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments 2018

[5] Sim-to-real transfer for vision-and-language navigation 2021

[6] Human memory: A proposed system and its control processes (vol 1968

Formal links

2 machine-checked theorem links

Cited by

27 papers in Pith

GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation

OpenFrontier: General Navigation with Visual-Language Grounded Frontiers

SEDualVLN: A Spatially-Enhanced Dual-System for Vision-Language Navigation

EventPrune: Cascaded Event-Assisted Token Pruning for Efficient First-Person Dynamic Spatial Reasoning

LASAR: Towards Spatio-temporal Reasoning with Latent Cognitive Map

Receipt and verification

First computed	2026-05-17T23:38:46.784204Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

db6455e03dbb3abe0d18e1fdd306ad272fa57104b1f13a1816e9da3eaae1b047

Aliases

arxiv: 2412.06224 · arxiv_version: 2412.06224v2 · doi: 10.48550/arxiv.2412.06224 · pith_short_12: 3NSFLYB5XM5L · pith_short_16: 3NSFLYB5XM5L4DIY · pith_short_8: 3NSFLYB5
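
The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and each `pith_short_N` alias is its first N characters. This is an observation from the values shown here, not a documented rule of the schema, so treat the following Python as a sketch under that assumption. The function names `pith_from_hash` and `short_aliases` are hypothetical.

```python
import base64

def pith_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest, without '=' padding.

    Assumption: this mirrors how the identifier on this page relates to
    the canonical hash; the actual builder may differ.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")

def short_aliases(pith_id: str, lengths=(8, 12, 16)) -> dict:
    """Derive prefix aliases such as pith_short_8 from the full identifier."""
    return {f"pith_short_{n}": pith_id[:n] for n in lengths}

CANONICAL_HASH = "db6455e03dbb3abe0d18e1fdd306ad272fa57104b1f13a1816e9da3eaae1b047"

pith_id = pith_from_hash(CANONICAL_HASH)
print(pith_id)                 # 3NSFLYB5XM5L4DIY4H65GBVNE4
print(short_aliases(pith_id))
```

For this record, the derived aliases agree with the `pith_short_8`, `pith_short_12`, and `pith_short_16` values listed above.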

Agent API


Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/3NSFLYB5XM5L4DIY4H65GBVNE4 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: db6455e03dbb3abe0d18e1fdd306ad272fa57104b1f13a1816e9da3eaae1b047

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "9678bece30c07e9689b0428da3a3ad5864662f0e3e9873458f46b86eb5418661",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2024-12-09T05:55:55Z",
    "title_canon_sha256": "6de18cdeccb65d161bb2f2cf81f80abd312ccf44a0191d91de55ac49a7636abb"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2412.06224",
    "kind": "arxiv",
    "version": 2
  }
}
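
The verification one-liner above can also be run offline against a saved copy of this record. The Python below is a minimal sketch that applies the same serialization settings as that command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and hashes the result with SHA-256. The helper names `canonical_bytes` and `canonical_sha256` are hypothetical, and the sketch assumes the JSON shown on this page is identical to what the resolver returns; the printed digest should be compared by hand with the canonical hash listed earlier.

```python
import hashlib
import json

def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no extra whitespace, then UTF-8 encode."""
    text = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return text.encode("utf-8")

def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()

# Copy of the canonical record shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "9678bece30c07e9689b0428da3a3ad5864662f0e3e9873458f46b86eb5418661",
        "cross_cats_sorted": ["cs.CV"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2024-12-09T05:55:55Z",
        "title_canon_sha256": "6de18cdeccb65d161bb2f2cf81f80abd312ccf44a0191d91de55ac49a7636abb",
    },
    "schema_version": "1.0",
    "source": {"id": "2412.06224", "kind": "arxiv", "version": 2},
}

print(canonical_sha256(record))
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores the fields, which is what lets independent copies be compared.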