Pith Number

pith:ZJBNQPUO

pith:2025:ZJBNQPUONLDWV5Z3AE6W474BSZ
not attested · not anchored · not stored · refs resolved

FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving

Mengwei Xie, Mu Xu, Ning Guo, Shuang Zeng, Xing Wei, Xinran Liu, Xinyuan Chang, Yifan Bai, Zheng Pan

Generating one future visual frame lets driving models plan trajectories by preserving spatial and temporal details that text chains of thought discard.

arxiv:2505.17685 v3 · 2025-05-23 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ZJBNQPUONLDWV5Z3AE6W474BSZ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

This imagined scene serves as the visual spatio-temporal CoT, capturing both spatial structure and temporal evolution in a single representation. ... our visual spatio-temporal CoT bridges the perception-planning gap, enabling safer, more anticipatory autonomous driving.

C2 weakest assumption

The generated future frame is physically plausible and contains the exact spatio-temporal cues needed for accurate inverse-dynamics planning; if the predicted lanes or boxes are systematically wrong, the planning step will inherit those errors.

C3 one-line summary

FSDrive uses a generated future scene frame as visual spatio-temporal CoT to improve VLA models for safer autonomous driving trajectory prediction.

References

98 extracted · 98 resolved · 4 Pith anchors

[1] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom. nuScenes: A multimodal dataset for autonomous driving. CVPR, 2020
[2] X. Chang, M. Xue, X. Liu, Z. Pan, and X. Wei. Driving by the rules: A benchmark for integrating traffic sign regulations into vectorized HD map. CVPR, 2025
[3] VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning. 2024 · arXiv:2402.13243
[4] Y. Chen and R. Greer. Technical report for Argoverse2 scenario mining challenges on iterative error correction and spatially-aware prompting. arXiv preprint arXiv:2506.11124, 2025
[5] Y. Chen, Y.-Q. Wang, and Z. Zhang. DrivingGPT: Unifying driving world modeling and planning with multi-modal autoregressive transformers. ICCV, 2025

Formal links

2 machine-checked theorem links

Cited by

30 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:50.443630Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

ca42d83e8e6ac76af73b013d6e7f8196479dcf2303c0a3983b4480d1ef4374d8

Aliases

arxiv: 2505.17685 · arxiv_version: 2505.17685v3 · doi: 10.48550/arxiv.2505.17685 · pith_short_12: ZJBNQPUONLDW · pith_short_16: ZJBNQPUONLDWV5Z3 · pith_short_8: ZJBNQPUO
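The identifiers on this page are consistent with a simple derivation: the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical SHA-256 digest, and each pith_short_N alias is its first N characters. The sketch below reproduces that relationship from the values shown here. It is an observation about this record, not a documented specification, and the helper name `pith_number_from_hash` is hypothetical.

```python
import base64

# Canonical hash and Pith Number as printed on this page.
CANONICAL_HASH = "ca42d83e8e6ac76af73b013d6e7f8196479dcf2303c0a3983b4480d1ef4374d8"
PITH_NUMBER = "ZJBNQPUONLDWV5Z3AE6W474BSZ"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: Base32-encode the raw digest and keep a prefix."""
    digest = bytes.fromhex(hex_digest)
    return base64.b32encode(digest).decode("ascii")[:length]


full = pith_number_from_hash(CANONICAL_HASH)

# Short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

if __name__ == "__main__":
    print(full == PITH_NUMBER)
    for name, value in aliases.items():
        print(name, value)
```

If the derivation holds in general, a client could check that a Pith Number and a canonical hash belong together without any network access.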
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ZJBNQPUONLDWV5Z3AE6W474BSZ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ca42d83e8e6ac76af73b013d6e7f8196479dcf2303c0a3983b4480d1ef4374d8
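The same check can be written as a small Python script, which may be easier to embed in an agent or test suite than the shell pipeline. This is a minimal sketch that mirrors the one-liner above: it assumes the endpoint returns a JSON document with a `canonical_record` field, as the `jq` filter implies, and the function names are hypothetical.

```python
import hashlib
import json
import urllib.request


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same serialization as the shell one-liner:
    sorted keys, compact separators, non-ASCII characters left unescaped."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def fetch_canonical_record(pith_number: str) -> dict:
    """Fetch the JSON-LD document and return its canonical_record field.
    The response shape is an assumption taken from the jq filter above."""
    request = urllib.request.Request(
        f"https://pith.science/pith/{pith_number}",
        headers={"Accept": "application/ld+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["canonical_record"]


def verify(pith_number: str, expected_hash: str) -> bool:
    """Return True when the recomputed hash matches the published one."""
    return canonical_sha256(fetch_canonical_record(pith_number)) == expected_hash


if __name__ == "__main__":
    ok = verify(
        "ZJBNQPUONLDWV5Z3AE6W474BSZ",
        "ca42d83e8e6ac76af73b013d6e7f8196479dcf2303c0a3983b4480d1ef4374d8",
    )
    print("match" if ok else "mismatch")
```

Sorting keys and fixing the separators is what makes the hash independent of the key order and whitespace in the served JSON.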
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f88370d0b0b46e96089028241b155b25bbcc1577ea814812f1027878ddc78371",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-05-23T09:55:32Z",
    "title_canon_sha256": "9c96c70b800a60535f0848889775a5e46a9008d67b45de463b8b7b2d2da1b814"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2505.17685",
    "kind": "arxiv",
    "version": 3
  }
}