VLMs show partial alignment with children's performance on six cognitive tasks, with stronger models matching better at task and item levels but struggling on matrix reasoning and mental rotation.
Sparks, Zi Yin, Virginia A
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Current VLMs depend on tightly aligned curated data and cannot exploit the weakly-aligned egocentric video signals that dominate naturalistic infant input.
A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.
citing papers explorer
-
LEVANTE-bench: Multi-Scale Comparison of VLMs to Children Using Cognitive Tasks (or, "Is Your VLM Smarter Than a 5th Grader?")
VLMs show partial alignment with children's performance on six cognitive tasks, with stronger models matching better at task and item levels but struggling on matrix reasoning and mental rotation.
-
EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data
Current VLMs depend on tightly aligned curated data and cannot exploit the weakly-aligned egocentric video signals that dominate naturalistic infant input.