SciBench shows current LLMs reach at most 43.22% accuracy on curated collegiate scientific problems and reveals no prompting strategy dominates across all required skills.
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CL (1)
years: 2023 (1)
verdicts: UNVERDICTED (1)
representative citing papers
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models