AstroMMBench: A benchmark for evaluating multimodal large language models capabilities in astronomy
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: ACCEPT (1)
Representative citing papers
A systematic evaluation of vision-language models for observational astronomical reasoning tasks
Vision-language models underperform specialized astronomical methods on real observational data, with accuracy improving when physical explanations are provided in prompts and when raw numerical measurements replace rendered plots.