Fine-tuning LLMs on essays reduces variance in IPIP-NEO responses across models but does not raise full five-trait profile accuracy above near-chance levels from unguided text.
Jessica L Maples, Li Guan, Nathan T Carter, and Joshua D Miller
3 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (3)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers:
Introduces a French OSCE dialogue dataset of 240 interactions and a modular LLM-based controllable virtual patient generation system with multi-level LLM-as-Judge evaluation for clinical skills training.
The work establishes an evaluation framework for personality induction and switching in MLLMs, reporting improved captioning but impaired VQA performance plus balancing and residual effects during multi-trait and dynamic conditions.
citing papers explorer
- Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?
Fine-tuning LLMs on essays reduces variance in IPIP-NEO responses across models but does not raise full five-trait profile accuracy above near-chance levels from unguided text.
- A French OSCE Dialogue Dataset and Controllable Virtual Patient System for Clinical Training
Introduces a French OSCE dialogue dataset of 240 interactions and a modular LLM-based controllable virtual patient generation system with multi-level LLM-as-Judge evaluation for clinical skills training.
- Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models
The work establishes an evaluation framework for personality induction and switching in MLLMs, reporting improved captioning but impaired VQA performance plus balancing and residual effects during multi-trait and dynamic conditions.