Pith Number

pith:AB5OYI3Q

pith:2025:AB5OYI3QWC6J5DBHFP374UCGTC
not attested · not anchored · not stored · refs resolved

Video models are zero-shot learners and reasoners

Been Kim, Kevin Swersky, Nick Matarese, Paul Vicol, Priyank Jaini, Robert Geirhos, Shixiang Shane Gu, Thaddäus Wiedemer, Yuxuan Li

Generative video models like Veo 3 perform zero-shot object segmentation, edge detection, physics understanding, affordance recognition, tool simulation, and early visual reasoning such as maze and symmetry solving.

arxiv:2509.20328 v2 · 2025-09-24 · cs.LG · cs.AI · cs.CV · cs.RO

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{AB5OYI3QWC6J5DBHFP374UCGTC}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
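The page does not describe the merge algorithm itself, so the following is only a minimal sketch of one way a deterministic merge can work, not Pith's actual implementation. The event fields (`id`, `ts`, `field`, `value`), the sample events, and the last-writer-wins rule are all assumptions made for illustration. The idea is that replaying events in a fixed total order yields the same state no matter how a mirror received them.

```python
import random
from typing import Any

# Hypothetical event shape: the page does not document the bundle format.
Event = dict[str, Any]


def merge_state(events: list[Event]) -> dict[str, Any]:
    """Fold events into a state dict by replaying them in a fixed total order."""
    # Deduplicate by event id so receiving the same event twice changes nothing.
    unique = {event["id"]: event for event in events}
    # Sort by (timestamp, id); the id breaks timestamp ties, making the order total.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict[str, Any] = {}
    for event in ordered:
        state[event["field"]] = event["value"]  # assumed rule: last writer wins
    return state


# Made-up events, for illustration only.
events = [
    {"id": "e1", "ts": "2026-05-18T02:40:55Z", "field": "citations", "value": "open"},
    {"id": "e2", "ts": "2026-05-18T03:00:00Z", "field": "author_claim", "value": "open"},
    {"id": "e3", "ts": "2026-05-18T03:00:00Z", "field": "citations", "value": "closed"},
]

shuffled = events[:]
random.shuffle(shuffled)

# A mirror that saw the events in another order, with duplicates, reaches the same state.
same = merge_state(events) == merge_state(shuffled + events)
print(same)
```

The signed events in a real bundle would also need signature checks before being merged; that step is left out here.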

Claims

C1 · strongest claim

Veo 3 can solve a broad variety of tasks it wasn't explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities enable early forms of visual reasoning like maze and symmetry solving.

C2 · weakest assumption

That the demonstrated capabilities are genuinely zero-shot and not the result of implicit task information in the prompts, data contamination, or post-hoc selection of successful examples.

C3 · one line summary

Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.

References

98 extracted · 98 resolved · 15 Pith anchors

[1] A Survey on Large Language Models for Code Generation 2024 · arXiv:2406.00515
[2] Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities 2025 · arXiv:2507.06261
[3] Weaver: Foundation models for creative writing 2024
[4] Multilingual machine translation with large language models: Empirical results and analysis 2023
[5] The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 2024 · arXiv:2408.06292

Formal links

1 machine-checked theorem link

Cited by

48 papers in Pith

Receipt and verification
First computed: 2026-05-18T02:40:55.632229Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

007aec2370b0bc9e8c272bf7fe504698857211fc963f2d5295a18ea4842ad671

Aliases

arxiv: 2509.20328 · arxiv_version: 2509.20328v2 · doi: 10.48550/arxiv.2509.20328 · pith_short_12: AB5OYI3QWC6J · pith_short_16: AB5OYI3QWC6J5DBH · pith_short_8: AB5OYI3Q
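For this record, the identifiers above line up with the canonical hash: the first 26 characters of the RFC 4648 Base32 encoding of the SHA-256 digest spell out the Pith Number, and the `pith_short_*` aliases are its 8-, 12-, and 16-character prefixes. The page does not state this as a rule, so treat the sketch below as an observation about this one record rather than a specification. The helper name `pith_number` is hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "007aec2370b0bc9e8c272bf7fe504698857211fc963f2d5295a18ea4842ad671"
PAGE_PITH_NUMBER = "AB5OYI3QWC6J5DBHFP374UCGTC"
PAGE_ALIASES = {8: "AB5OYI3Q", 12: "AB5OYI3QWC6J", 16: "AB5OYI3QWC6J5DBH"}


def pith_number(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: inferred from this record only, not from a published spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


derived = pith_number(CANONICAL_HASH)
print(derived, derived == PAGE_PITH_NUMBER)

# Each short alias is a prefix of the full identifier.
for length, alias in PAGE_ALIASES.items():
    print(length, derived[:length] == alias)
```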
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/AB5OYI3QWC6J5DBHFP374UCGTC \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 007aec2370b0bc9e8c272bf7fe504698857211fc963f2d5295a18ea4842ad671
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "21c914ce64dfc70a482581f938246ca5411da0dccdb7629a162004911adf78f4",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CV",
      "cs.RO"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-09-24T17:17:27Z",
    "title_canon_sha256": "71d3cd4322b4002941d6b5b5741e8571148bd602b0cb7ea08d2b9a2ffff8c90e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2509.20328",
    "kind": "arxiv",
    "version": 2
  }
}
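The shell one-liner above can also be written as a small Python script, which makes the canonicalization step easier to read: keys sorted, no whitespace between tokens, non-ASCII left unescaped, UTF-8 bytes hashed with SHA-256. This is a sketch that mirrors those `json.dumps` settings; the function names are illustrative, and the record is typed in from this page, so a mismatch against the expected hash may only mean that the record served by the API differs from what is displayed here.

```python
import hashlib
import json

# Expected value as printed on this page.
EXPECTED = "007aec2370b0bc9e8c272bf7fe504698857211fc963f2d5295a18ea4842ad671"

# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "21c914ce64dfc70a482581f938246ca5411da0dccdb7629a162004911adf78f4",
        "cross_cats_sorted": ["cs.AI", "cs.CV", "cs.RO"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-09-24T17:17:27Z",
        "title_canon_sha256": "71d3cd4322b4002941d6b5b5741e8571148bd602b0cb7ea08d2b9a2ffff8c90e",
    },
    "schema_version": "1.0",
    "source": {"id": "2509.20328", "kind": "arxiv", "version": 2},
}


def canonical_bytes(obj: dict) -> bytes:
    """Serialize with sorted keys and compact separators, as in the one-liner."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(obj: dict) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Because the keys are sorted before hashing, the order in which a mirror stores or transmits the fields does not affect the result.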