Pith Number

pith:MZ2Q6ZWW

pith:2025:MZ2Q6ZWWEBZ2M4XGYHF72E5SOH
not attested · not anchored · not stored · refs resolved

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

Bo Jiang, Qian Zhang, Shaoyu Chen, Wenyu Liu, Xinggang Wang

Reinforcement learning with tailored rewards and a two-stage strategy improves vision-language models for autonomous driving planning.

arxiv:2503.07608 v1 · 2025-03-10 · cs.CV · cs.RO

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MZ2Q6ZWWEBZ2M4XGYHF72E5SOH}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1: strongest claim

AlphaDrive significantly improves both planning performance and training efficiency compared with training that uses only SFT or omits reasoning and, after RL training, exhibits emergent multimodal planning capabilities.

C2: weakest assumption

That the four GRPO-based RL rewards and two-stage training strategy produce generalizable, safe improvements on real-world driving data rather than overfitting to the training distribution.

C3: one-line summary

AlphaDrive uses GRPO-based RL rewards and two-stage SFT+RL training on VLMs to improve autonomous driving planning performance and efficiency while producing emergent multimodal capabilities.

References

49 extracted · 49 resolved · 16 Pith anchors

[1] GPT-4 Technical Report · arXiv:2303.08774
[2] Flamingo: a visual language model for few-shot learning 2022
[3] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond 2023 · arXiv:2308.12966
[4] Meteor: An automatic metric for MT evaluation with improved correlation with human judgments 2005
[5] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control 2023 · arXiv:2307.15818

Formal links

2 machine-checked theorem links

Cited by

26 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.738477Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

66750f66d62073a672e6c1cbfd13b271d5568d74846ccc3c164489e69622a1e9

Aliases

arxiv: 2503.07608 · arxiv_version: 2503.07608v1 · doi: 10.48550/arxiv.2503.07608 · pith_short_12: MZ2Q6ZWWEBZ2 · pith_short_16: MZ2Q6ZWWEBZ2M4XG · pith_short_8: MZ2Q6ZWW
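
The short aliases above are plain prefixes of the full Pith Number. The values on this page also suggest that the number itself comes from the canonical hash: the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 digest reproduce it for this record. That relationship is an observation from this one record, not a documented rule, and the helper names `pith_number_from_hash` and `short_aliases` are hypothetical. A minimal Python sketch under those assumptions:

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "66750f66d62073a672e6c1cbfd13b271d5568d74846ccc3c164489e69622a1e9"
PITH_NUMBER = "MZ2Q6ZWWEBZ2M4XGYHF72E5SOH"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: base32-encode the raw digest and keep `length` chars."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_N aliases as prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


derived = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(PITH_NUMBER)

print("derived matches record:", derived == PITH_NUMBER)
for name, value in aliases.items():
    print(name, value)
```

If the comparison holds for other records too, a client could cross-check a Pith Number against its canonical hash without any network access. Treat that as a convenience check only, not a substitute for the signature.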
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MZ2Q6ZWWEBZ2M4XGYHF72E5SOH \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 66750f66d62073a672e6c1cbfd13b271d5568d74846ccc3c164489e69622a1e9
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "251251be26a88dc4ba247abe36d2d3fb3589f84bb6d3ebad2a96ee22e776a2fb",
    "cross_cats_sorted": [
      "cs.RO"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-03-10T17:59:42Z",
    "title_canon_sha256": "ffbef9e71aead583bc8fd0ecc333658dd73e98f7124c6a20764b1750183a9904"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2503.07608",
    "kind": "arxiv",
    "version": 1
  }
}
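
The verification command can also be run offline against a saved copy of the record. The sketch below repeats the serialization used in the one-liner above (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) and hashes the record shown here. It assumes that the JSON displayed on this page is the complete canonical record. The function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "251251be26a88dc4ba247abe36d2d3fb3589f84bb6d3ebad2a96ee22e776a2fb",
        "cross_cats_sorted": ["cs.RO"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-03-10T17:59:42Z",
        "title_canon_sha256": "ffbef9e71aead583bc8fd0ecc333658dd73e98f7124c6a20764b1750183a9904",
    },
    "schema_version": "1.0",
    "source": {"id": "2503.07608", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed above
```

Because `sort_keys=True` fixes the key order, the digest does not depend on how the dictionary was built or how the JSON was pretty-printed. A mirror only needs the parsed record to recompute it.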