When Do LLM Personas Support Visualization Design? A Cross-Model Study of Color Assignment and Chart Choice
Pith reviewed 2026-07-03 05:44 UTC · model grok-4.3
The pith
LLM personas produce personality-linked color assignments only for certain models while chart choices depend mainly on task context.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Personality-color coupling is highly model-configuration dependent: absent in GPT-4o-mini for all six concepts, consistent in GPT-4.1-mini across all six, and partial in GPT-5-mini for two of six. Concept type further shapes the signal: for abstract concepts, personality explains more hue variance than model identity, while concrete concepts show smaller and comparable effects. In chart choice, trait-aligned cluster aggregation produces stable top-idiom rankings across all nine cluster-context combinations, but a no-persona baseline recovers the same top choice in 8 of 9 model-context cells, indicating that task context drives rank-1 selection more than personality.
What carries the argument
Big Five personality profiles conditioned into LLM prompts for color assignment and chart preference tasks.
If this is right
- Multi-model testing is required before treating LLM persona outputs as reliable for visualization design.
- Abstract concepts produce clearer personality signals in color choices than concrete concepts.
- No-persona baselines must be included to isolate whether personality actually changes chart recommendations.
- LLM personas function as exploratory probes rather than substitutes for human participants in visualization studies.
Where Pith is reading between the lines
- Design workflows could benefit from choosing specific model versions depending on whether personality variation is wanted in early color explorations.
- The results raise the possibility that prompt refinements might reduce the observed model-to-model differences in personality effects.
- If task context dominates chart selection, persona conditioning may add little value for idiom recommendation systems.
Load-bearing premise
The chosen Big Five profiles and task framings accurately capture personality effects relevant to visualization design without being artifacts of the specific prompt structures or model versions tested.
What would settle it
Re-running the color assignment tasks on GPT-4o-mini and observing consistent personality-color coupling across the six concepts would undermine the finding of high model dependence.
Figures
read the original abstract
Large language model personas are increasingly used to approximate diverse users during early-stage visualization design, but it remains unclear whether persona-conditioned outputs reflect stable personality effects or artifacts of model choice and task framing. We examine this question across two visualization-relevant tasks: color assignment for abstract and concrete concepts, and chart-idiom preference ratings across task contexts. Using 43 Big Five profiles across GPT-4o-mini, GPT-4.1-mini, and GPT-5-mini, we find that personality-color coupling is highly model-configuration dependent: absent in GPT-4o-mini for all six concepts, consistent in GPT-4.1-mini across all six, and partial in GPT-5-mini for two of six. Concept type further shapes the signal: for abstract concepts, personality explains more hue variance than model identity, while concrete concepts show smaller and comparable effects. In chart choice, trait-aligned cluster aggregation produces stable top-idiom rankings across all nine cluster-context combinations, but a no-persona baseline recovers the same top choice in 8 of 9 model-context cells, indicating that task context drives rank-1 selection more than personality. These findings position LLM personas as exploratory probes for visualization design, not substitutes for human participants, and motivate multi-model testing, concept-type disaggregation, and no-persona baselines in future studies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLM personas derived from Big Five personality profiles produce highly model-dependent effects in two visualization design tasks (color assignment to abstract/concrete concepts and chart-idiom preference). Across GPT-4o-mini, GPT-4.1-mini, and GPT-5-mini with 43 profiles, personality-color coupling is absent in one model, consistent in another, and partial in the third; abstract concepts show stronger personality effects than concrete ones. In chart choice, trait-aligned clusters yield stable top rankings, yet a no-persona baseline matches the top choice in 8/9 cells, indicating task context dominates personality. The work positions LLM personas as exploratory probes rather than human substitutes and calls for multi-model testing and baselines.
Significance. If the empirical patterns hold after methodological clarification, the result is significant for HCI and visualization research because it supplies concrete evidence that persona effects are unstable across models and often weaker than task framing. This directly informs the growing practice of using LLMs to simulate user diversity in early design and supplies actionable guidance (no-persona baselines, concept-type disaggregation) that can be adopted immediately in follow-on studies.
major comments (1)
- [Abstract] Abstract: the directional claims (model-dependent color coupling, task-context dominance in chart choice) are presented without any information on experimental design, statistical tests, sample sizes per cell, prompt templates, or data-exclusion criteria. This absence is load-bearing because it prevents evaluation of whether the reported patterns are supported by the data.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential significance of the findings for HCI and visualization research. We address the single major comment below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the directional claims (model-dependent color coupling, task-context dominance in chart choice) are presented without any information on experimental design, statistical tests, sample sizes per cell, prompt templates, or data-exclusion criteria. This absence is load-bearing because it prevents evaluation of whether the reported patterns are supported by the data.
Authors: We agree that the abstract omits these details, as is conventional for length-constrained abstracts (typically 150-250 words). The full manuscript provides them in Section 3 (Methods): experimental design uses 43 Big Five personality profiles (derived from standard inventories) applied to three models (GPT-4o-mini, GPT-4.1-mini, GPT-5-mini) across two tasks; statistical tests include variance partitioning (personality vs. model identity) for color assignments and rank-stability analysis for chart choices; sample size is 43 profiles per model-task cell with no data exclusion (all LLM outputs retained); prompt templates appear in Appendix A; and the no-persona baseline is explicitly described. The directional claims are directly supported by the quantitative results in Sections 4 and 5. We are willing to revise the abstract to include a one-sentence methods summary (e.g., "Across 43 Big Five profiles and three models...") if the editor prefers, though this would require trimming other content. revision: partial
Circularity Check
No significant circularity; empirical observations only
full rationale
This is an empirical comparison study that reports direct outputs from LLM queries across models, personas, and tasks. No equations, derivations, fitted parameters, or self-citation chains are used to support the central claims. All reported patterns (model-dependent color coupling, concept-type effects, baseline comparisons) are presented as observations from the experimental runs rather than reductions to prior inputs or definitions. The study is self-contained against its own data collection protocol.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Big Five personality model provides a valid basis for constructing LLM personas that approximate human trait effects in design tasks
Reference graph
Works this paper leans on
- [1]
-
[2]
Alves, B
T. Alves, B. Ramalho, D. Gonc ¸alves, S. Gama, and J. Henriques- Calado. Exploring how personality models information visualization preferences. In2020 IEEE Visualization Conference (VIS), pp. 201–
-
[3]
L. P. Argyle, E. C. Busby, N. Fulda, J. R. Gubler, C. Rytting, and D. Wingate. Out of one, many: Using language models to simulate human samples.Political Analysis, 31(3):337–351, 2023. doi: 10. 1017/pan.2023.2 1
2023
-
[4]
R. Beran. Minimum Hellinger distance estimates for parametric mod- els.The Annals of Statistics, 5(3):445–463, 1977. doi: 10.1214/aos/ 1176343842 2
-
[5]
Cheng, E
M. Cheng, E. Durmus, and D. Jurafsky. Marked personas: Using natural language prompts to measure stereotypes in language mod- els. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), pp. 1504–1532, 2023. doi: 10. 18653/v1/2023.acl-long.84 1, 5
2023
-
[6]
de Borda.M ´emoire sur les ´elections au scrutin
J.-C. de Borda.M ´emoire sur les ´elections au scrutin. Histoire de l’Acad´emie Royale des Sciences, Paris, 1781. 3
-
[7]
D. Dillion, N. Tandon, Y . Gu, and K. Gray. Can AI language models replace human participants?Trends in Cognitive Sciences, 27(7):597– 600, 2023. doi: 10.1016/j.tics.2023.04.008 1
-
[8]
Gupta, V
S. Gupta, V . Shrivastava, A. Deshpande, A. Kalyan, P. Clark, A. Sab- harwal, and T. Khot. Bias runs deep: Implicit reasoning biases in persona-assigned LLMs. InThe Twelfth International Conference on Learning Representations (ICLR), 2024. 1, 5
2024
-
[9]
Jiang, M
G. Jiang, M. Xu, S.-C. Zhu, W. Han, C. Zhang, and Y . Zhu. Evaluating and inducing personality in pre-trained language models.Advances in Neural Information Processing Systems, 36, 2024. 1
2024
-
[10]
O. P. John and S. Srivastava. The big five trait taxonomy: History, measurement, and theoretical perspectives. 1999. 1
1999
-
[11]
Character Sequence Models for ColorfulWords
K. Kawakami, C. Dyer, B. R. Routledge, and N. A. Smith. Character sequence models for colorful words.ArXiv, abs/1609.08777, 2016. 1
work page internal anchor Pith review Pith/arXiv arXiv 2016
- [12]
-
[13]
N. Mantel. The detection of disease clustering and a generalized re- gression approach.Cancer Research, 27(2):209–220, Feb. 1967. 2
1967
-
[14]
Marjieh, I
R. Marjieh, I. Sucholutsky, P. van Rijn, N. Jacoby, and T. L. Griffiths. Large language models predict human sensory judgments across six modalities.Scientific Reports, 14(1):21445, 2024. 1
2024
-
[15]
Colourful Language: Measuring Word-Colour Associations
S. Mohammad. Colourful language: Measuring word-colour associa- tions.arXiv preprint arXiv:1309.5942, 2013. 1
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[16]
Niszczota, M
P. Niszczota, M. Janczak, and M. Misiak. Large language models can replicate cross-cultural differences in personality.Journal of Research in Personality, p. 104584, 2025. 1
2025
-
[17]
J. S. Park, C. Q. Zou, A. Shaw, B. M. Hill, C. Cai, M. R. Morris, R. Willer, P. Liang, and M. S. Bernstein. Generative agent simulations of 1,000 people.arXiv preprint arXiv:2411.10109, 2024. 1
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[18]
P. J. Rentfrow, S. D. Gosling, M. Jokela, D. J. Stillwell, M. Kosin- ski, and J. Potter. Divided we stand: three psychological regions of the united states and their political, economic, social, and health cor- relates.Journal of Personality and Social Psychology, 105(6):996,
-
[19]
S. Salim, T. Pial, and K. Mueller. What is the color of serendipity? in- vestigating the use of language models for semantically resonant color generation.IEEE Transactions on Visualization and Computer Graph- ics, 32(1):670–680, 2026. doi: 10.1109/TVCG.2025.3634243 1, 2
-
[20]
G. Serapio-Garc ´ıa, M. Safdari, C. Crepy, L. Sun, S. Fitz, M. Abdulhai, A. Faust, and M. Matari´c. Personality traits in large language models. arXiv preprint arXiv:2307.00184, 2023. 1
- [21]
- [22]
-
[23]
Zheng, W.-L
L. Zheng, W.-L. Chiang, Y . Sheng, S. Zhuang, Z. Wu, Y . Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing, H. Zhang, J. E. Gonzalez, and I. Sto- ica. Judging LLM-as-a-judge with MT-bench and chatbot arena. In Advances in Neural Information Processing Systems (NeurIPS), 2023. 1 5
2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.