LLMs show heterogeneous robustness to five types of chain-of-thought perturbations, with MathError causing 50-60% accuracy loss in small models but scaling benefits, UnitConversion remaining hard across sizes, and ExtraSteps causing minimal degradation.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
4
Pith papers citing it
representative citing papers
LLMs achieve higher accuracy than humans on compositional imagery tasks previously argued to require pictorial representations, supporting emergent propositional mental imagery in AI.
FACET is a multi-agent AI system developed with educational stakeholders that coordinates four agents in a teacher-in-the-loop design to enable differentiated learning materials for heterogeneous classrooms.