Turning large language models into cognitive models

Marcel Binz, Eric Schulz · 2023 · arXiv 2306.03917

4 Pith papers cite this work.

representative citing papers

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection

cs.CL · 2024-10-06 · unverdicted · novelty 8.0

ErrorRadar is a new benchmark of 2,500 multimodal K-12 math problems for MLLM error step identification and categorization, where GPT-4o trails human experts by ~10%.

Simulating Word Suggestion Usage in Mobile Typing to Guide Intelligent Text Entry Design

cs.HC · 2026-02-06 · unverdicted · novelty 7.0

WSTypist is a new RL-based simulation model that reproduces human-like word suggestion strategies, individual differences, and adaptation to design changes in mobile text entry.

Using Cognitive Models to Improve Language Model Simulation of Human Persuasion Games

cs.AI · 2026-06-16 · unverdicted · novelty 6.0

Equation-to-Behavior Prompting lets large LLMs match cognitive models like Bayesian updating in persuasion games; RL training cuts small-model belief error by 26.5% and improves diverse training outcomes by 2.5-12%.

Teaching Values to Machines: Simulating Human-Like Behavior in LLMs

cs.AI · 2026-05-28 · unverdicted · novelty 5.0

Value-prompted LLMs align with human value structures and value-behavior relationships, and incorporating human value distributions improves population-level simulations.

citing papers explorer

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection

cs.CL · 2024-10-06 · unverdicted · none · ref 8

ErrorRadar is a new benchmark of 2,500 multimodal K-12 math problems for MLLM error step identification and categorization, where GPT-4o trails human experts by ~10%.

Simulating Word Suggestion Usage in Mobile Typing to Guide Intelligent Text Entry Design

cs.HC · 2026-02-06 · unverdicted · none · ref 8

WSTypist is a new RL-based simulation model that reproduces human-like word suggestion strategies, individual differences, and adaptation to design changes in mobile text entry.

Using Cognitive Models to Improve Language Model Simulation of Human Persuasion Games

cs.AI · 2026-06-16 · unverdicted · none · ref 49

Equation-to-Behavior Prompting lets large LLMs match cognitive models like Bayesian updating in persuasion games; RL training cuts small-model belief error by 26.5% and improves diverse training outcomes by 2.5-12%.

Teaching Values to Machines: Simulating Human-Like Behavior in LLMs

cs.AI · 2026-05-28 · unverdicted · none · ref 2

Value-prompted LLMs align with human value structures and value-behavior relationships, and incorporating human value distributions improves population-level simulations.
