In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, pages 14017–14046, Bangkok, Thailand
3 Pith papers cite this work.
citing papers explorer
- StratMem-Bench: Evaluating Strategic Memory Use in Virtual Character Conversation Beyond Factual Recall
  StratMem-Bench reveals that state-of-the-art LLMs distinguish required from irrelevant memories effectively but struggle to integrate supportive memories in character conversations.
- Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation
  Introduces PAS and FAS task abstractions plus the LLM-S^3 benchmark to evaluate LLMs on generating sociodemographic survey responses across 11 real datasets and multiple models.
- Inertia in Moral and Value Judgments of Large Language Models
  LLMs exhibit persistent inertia in value orientations, with harm avoidance and fairness remaining skewed across persona prompts.