Will the real Linda please stand up

Pengda Wang, Zilin Xiao, Hanjie Chen, Frederick L Oswald · arXiv 2404.01461

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Understanding the Mechanism of Altruism in Large Language Models

econ.GN · 2026-04-21 · unverdicted · novelty 6.0

A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.

The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

cs.CL · 2026-03-30 · unverdicted · novelty 6.0

LLMs prioritize surface heuristics such as distance cues over implicit constraints in reasoning tasks, with the new HOB benchmark showing no model exceeds 75% strict accuracy and hints recovering performance.

citing papers explorer

Understanding the Mechanism of Altruism in Large Language Models econ.GN · 2026-04-21 · unverdicted · none · ref 117
The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning cs.CL · 2026-03-30 · unverdicted · none · ref 19
