VLMs show partial alignment with children's performance on six cognitive tasks, with stronger models matching better at task and item levels but struggling on matrix reasoning and mental rotation.
Sparks, Zi Yin, Virginia A
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Current VLMs depend on tightly aligned curated data and cannot exploit the weakly-aligned egocentric video signals that dominate naturalistic infant input.
A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.
citing papers explorer
-
Zero-shot World Models Are Developmentally Efficient Learners
A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.