Pith Number

pith:VFRBDQCO

pith:2025:VFRBDQCOQYO33TYOOTL5AVJWZJ
not attested · not anchored · not stored · refs resolved

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

Bolei Zhou, Jiaqi Ma, Seth Z. Zhao, Tianhui Cai, Yun Zhang, Zewei Zhou, Zhiyu Huang

AutoVLA unifies semantic reasoning and trajectory planning inside one autoregressive model that reads raw images and language instructions for end-to-end driving.

arxiv:2506.13757 v3 · 2025-06-16 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{VFRBDQCOQYO33TYOOTL5AVJWZJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
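The merge algorithm itself is not described on this page. As a generic illustration of how a merge can be made order-independent, the sketch below deduplicates events by identifier and then applies them in a fixed sorted order. The event fields (`id`, `field`, `value`) and the function name `merge_events` are hypothetical, not Pith's actual schema or algorithm.

```python
def merge_events(*event_sets):
    """Union events by id, then apply them in a fixed order (sorted by id).

    Hypothetical sketch: because the apply order depends only on the event
    ids, any mirror holding the same events computes the same state.
    """
    unique = {}
    for events in event_sets:
        for event in events:
            unique[event["id"]] = event  # duplicate ids collapse to one entry

    state = {}
    for event_id in sorted(unique):
        event = unique[event_id]
        state[event["field"]] = event["value"]
    return state


# Two mirrors that received the same events in a different order.
mirror_a = [{"id": "e1", "field": "citations", "value": 1}]
mirror_b = [
    {"id": "e2", "field": "citations", "value": 2},
    {"id": "e1", "field": "citations", "value": 1},
]
```

Calling `merge_events(mirror_a, mirror_b)` and `merge_events(mirror_b, mirror_a)` gives the same state, which is the property the bundle description relies on.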

Claims

C1 · strongest claim

AutoVLA performs semantic reasoning and trajectory planning directly from raw visual inputs and language instructions, achieving competitive performance across real-world and simulated datasets in both open-loop and closed-loop settings.

C2 · weakest assumption

That discretizing continuous trajectories into a fixed vocabulary of feasible actions preserves sufficient information for safe and precise control without introducing unacceptable discretization errors or limiting expressiveness.
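The assumption can be made concrete with a toy quantizer. The sketch below is not AutoVLA's tokenizer: the codebook, the per-step `(dx, dy)` representation, and all function names are hypothetical. It maps each continuous motion step to the nearest entry in a small fixed vocabulary, rebuilds the path from the tokens, and measures the largest waypoint error that results.

```python
import math

# Hypothetical action vocabulary: each token is a (dx, dy) displacement in
# metres for one time step. A real vocabulary would be far larger.
CODEBOOK = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 0.2), (1.0, -0.2)]


def tokenize(steps):
    """Map each continuous (dx, dy) step to the index of the nearest codebook entry."""
    return [
        min(range(len(CODEBOOK)), key=lambda i: math.dist(CODEBOOK[i], step))
        for step in steps
    ]


def integrate(steps, start=(0.0, 0.0)):
    """Accumulate per-step displacements into a list of (x, y) waypoints."""
    x, y = start
    waypoints = []
    for dx, dy in steps:
        x, y = x + dx, y + dy
        waypoints.append((x, y))
    return waypoints


def max_discretization_error(steps):
    """Largest distance between original and token-reconstructed waypoints."""
    tokens = tokenize(steps)
    original = integrate(steps)
    rebuilt = integrate([CODEBOOK[t] for t in tokens])
    return max(math.dist(a, b) for a, b in zip(original, rebuilt))


# Example continuous trajectory, expressed as per-step displacements.
example_steps = [(0.9, 0.05), (1.0, 0.15), (0.6, 0.0)]
```

A coarser codebook shortens the token sequence the model has to learn but raises this error, which is the trade-off the assumption rests on.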

C3 · one line summary

AutoVLA unifies semantic reasoning and trajectory planning in one autoregressive VLA model for end-to-end autonomous driving by tokenizing trajectories into discrete actions and using GRPO reinforcement fine-tuning to adaptively reduce unnecessary reasoning.
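GRPO scores each sampled output relative to the other samples drawn for the same input. A minimal sketch of that group normalization follows. The reward shaping with a per-token reasoning cost is an illustrative assumption rather than the paper's reward function, and the names `shaped_reward` and `length_penalty` are hypothetical. The population standard deviation is used here; implementations differ on this detail.

```python
from statistics import mean, pstdev


def shaped_reward(driving_score, reasoning_tokens, length_penalty=0.001):
    """Hypothetical reward: task score minus a small cost per reasoning token."""
    return driving_score - length_penalty * reasoning_tokens


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two samples for the same scene with equal driving quality but different
# amounts of reasoning: 400 tokens versus 50 tokens.
group_rewards = [shaped_reward(0.8, 400), shaped_reward(0.8, 50)]
advantages = group_relative_advantages(group_rewards)
```

Under this assumed reward, the shorter sample receives the positive advantage, so the policy update favors skipping reasoning that does not improve the driving score.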

References

104 extracted · 104 resolved · 17 Pith anchors

[1] Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
[2] Detr3d: 3d object detection from multi-view images via 3d-to-2d queries, 2022
[3] Bevfusion: A simple and robust lidar-camera fusion framework. Advances in Neural Information Processing Systems, 35:10421–10434, 2022
[4] QCNeXt: A Next-Generation Framework For Joint Multi-Agent Trajectory Prediction, 2023
[5] Shaoshuai Shi, Li Jiang, Dengxin Dai, and Bernt Schiele. Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying. IEEE Transactions on Pattern Analysis and Mach 2024

Formal links

3 machine-checked theorem links

Cited by

42 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:39:21.582612Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

a96211c04e861dbdcf0e74d7d05536ca4ea4bbc13e430411e89cb8f369a35575

Aliases

arxiv: 2506.13757 · arxiv_version: 2506.13757v3 · doi: 10.48550/arxiv.2506.13757 · pith_short_12: VFRBDQCOQYO3 · pith_short_16: VFRBDQCOQYO33TYO · pith_short_8: VFRBDQCO
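The identifiers on this page appear to be related to one another. The values shown are consistent with the Pith Number being the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and with each `pith_short_N` alias being its first N characters. This relationship is inferred from the values listed here, not from a published specification, and the function names below are hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "a96211c04e861dbdcf0e74d7d05536ca4ea4bbc13e430411e89cb8f369a35575"


def pith_number_from_hash(hex_digest, length=26):
    """Base32-encode the leading bytes of the digest and keep `length` characters.

    Assumption inferred from this page: 26 characters cover the first 130 bits.
    """
    raw = bytes.fromhex(hex_digest)
    # 20 bytes encode to exactly 32 Base32 characters with no padding.
    encoded = base64.b32encode(raw[:20]).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number, lengths=(8, 12, 16)):
    """Build the pith_short_N aliases as prefixes of the full Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


pith_number = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(pith_number)
```

If the inference holds, a mirror could recompute every alias on this page from the canonical hash alone.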
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VFRBDQCOQYO33TYOOTL5AVJWZJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a96211c04e861dbdcf0e74d7d05536ca4ea4bbc13e430411e89cb8f369a35575
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "9a36de3787df00219683c404cc33cbbae62d8ec50651aadc0f9f1572c2193769",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-06-16T17:58:50Z",
    "title_canon_sha256": "ca090c459a0f88849e29ba22c06c394d862e227e03ca8396b152bffd666d68e6"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2506.13757",
    "kind": "arxiv",
    "version": 3
  }
}
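The shell one-liner above re-serializes the canonical record with sorted keys and compact separators before hashing it. The same canonicalization can be written as a small Python function, which may be easier to embed in a verification script. The function name `canonical_sha256` is hypothetical; the serialization options mirror those in the command above.

```python
import hashlib
import json

# Canonical record copied from the JSON shown above.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "9a36de3787df00219683c404cc33cbbae62d8ec50651aadc0f9f1572c2193769",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-06-16T17:58:50Z",
        "title_canon_sha256": "ca090c459a0f88849e29ba22c06c394d862e227e03ca8396b152bffd666d68e6",
    },
    "schema_version": "1.0",
    "source": {"id": "2506.13757", "kind": "arxiv", "version": 3},
}


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Compare the printed digest with the canonical hash listed on this page.
    print(canonical_sha256(CANONICAL_RECORD))
```

Sorting keys and fixing the separators matters because two JSON documents with the same content but different key order or whitespace would otherwise hash differently.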