Large language model for science: A study on P vs

Qingxiu Dong, Li Dong, Ke Xu, Guangyan Zhou, Yaru Hao, Zhifang Sui, Furu Wei · 2023 · arXiv 2309.05689

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

cs.CV · 2023-10-03 · accept · novelty 8.0

MathVista benchmark shows GPT-4V achieves 49.9% accuracy on visual mathematical reasoning tasks, outperforming other models but trailing humans by 10.4%.

Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark

cs.AI · 2024-10-06 · unverdicted · novelty 7.0

PolyMATH is a new 5,000-image benchmark where top MLLMs reach at most 41 percent accuracy on multi-modal mathematical reasoning, with ablation showing minimal gain from text over images.

citing papers explorer

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts cs.CV · 2023-10-03 · accept · none · ref 2
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark cs.AI · 2024-10-06 · unverdicted · none · ref 12
