none"), used for the qualitative error analysis in sections 4.2 and 4.3. The columns (med) and (high) report GPT-5.2 accuracy at reasoning_effort =

etufino · 2026

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Performance and failure modes of AI chatbots on a novel concept inventory on relativity in classical mechanics

physics.ed-ph · 2026-05-10 · accept · novelty 8.0

Frontier LLMs score 73-97% on a novel relativity concept inventory but fail entirely on a few items due to visual misinterpretation, with more consistent errors than students.

citing papers explorer

Showing 1 of 1 citing paper.

Performance and failure modes of AI chatbots on a novel concept inventory on relativity in classical mechanics

physics.ed-ph · 2026-05-10 · accept · none · ref 19

Frontier LLMs score 73-97% on a novel relativity concept inventory but fail entirely on a few items due to visual misinterpretation, with more consistent errors than students.