Educational data augmentation in physics education research using ChatGPT
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: physics.ed-ph (1)
Years: 2026 (1)
Verdicts: ACCEPT (1)
Representative citing papers
Performance and failure modes of AI chatbots on a novel concept inventory on relativity in classical mechanics
Frontier LLMs score 73-97% on a novel relativity concept inventory but fail entirely on a few items due to visual misinterpretation, with more consistent errors than students.