4 Pith papers cite this work.
citing papers explorer
- Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
  LLMs show heterogeneous robustness to five types of chain-of-thought perturbations, with MathError causing 50-60% accuracy loss in small models but scaling benefits, UnitConversion remaining hard across sizes, and ExtraSteps causing minimal degradation.
- Artificial Phantasia: Emergent Mental Imagery in Large Language Models
  LLMs achieve higher accuracy than humans on compositional imagery tasks previously argued to require pictorial representations, supporting emergent propositional mental imagery in AI.
- FACET: Multi-Agent AI Supporting Teachers in Scaling Differentiated Learning for Diverse Students
  FACET is a multi-agent AI system developed with educational stakeholders that coordinates four agents in a teacher-in-the-loop design to enable differentiated learning materials for heterogeneous classrooms.
- Human Psychometric Questionnaires Mischaracterize LLM Behavior