Sycamore: Characterizing Synthetic Personas for Evaluating Genomics Visualization Retrieval
Pith reviewed 2026-06-30 23:22 UTC · model grok-4.3
The pith
Grounding synthetic personas in user study artifacts aligns their feedback with real expert concerns in genomics visualization retrieval, though grounded and ungrounded personas alike miss the experts' image-modality preference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using Sycamore's three-condition probe on Geranium, the study finds that grounding synthetic personas in voice-of-customer artifacts from prior interviews shifts their feedback toward the language and concerns of documented users, whereas ungrounded personas drift toward operational specifics that real participants did not mention. Both synthetic conditions converge on a find-and-adapt frame and miss the image-modality preference observed in the expert study.
What carries the argument
The three-condition probe design that compares outputs from ungrounded synthetic personas, grounded synthetic personas constrained by voice-of-customer artifacts, and real expert baselines to characterize differences in evaluation feedback.
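The comparison this design supports can be sketched with plain set operations. The following is a minimal illustration, assuming each condition's feedback has already been reduced to a set of theme labels; the labels and the `compare_conditions` helper are hypothetical placeholders that echo the summary above, not the paper's coded data or method.

```python
# Hypothetical theme labels per condition; illustrative only, not the paper's data.
themes = {
    "ungrounded": {"find-and-adapt", "operational specifics"},
    "grounded": {"find-and-adapt", "documented user concerns"},
    "expert": {"documented user concerns", "image-modality preference"},
}


def compare_conditions(themes):
    """Return the three contrasts the probe design is meant to surface."""
    ungrounded = themes["ungrounded"]
    grounded = themes["grounded"]
    expert = themes["expert"]
    return {
        # Themes both synthetic conditions raise (convergence).
        "shared_by_synthetic": ungrounded & grounded,
        # Themes experts raise that neither synthetic condition raises (misses).
        "missed_by_synthetic": expert - (ungrounded | grounded),
        # Themes raised only by ungrounded personas (drift).
        "ungrounded_only": ungrounded - grounded - expert,
    }


result = compare_conditions(themes)
for name, labels in result.items():
    print(name, sorted(labels))
```

With real data, the theme sets would come from whatever qualitative coding the authors applied; the set arithmetic only makes the three contrasts explicit.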
If this is right
- Grounding synthetic personas improves their alignment with real user concerns and language.
- Synthetic personas tend to converge on a find-and-adapt frame regardless of grounding.
- Real expert studies reveal preferences, such as the image-modality preference, that synthetic evaluators miss.
- Voice-of-customer artifacts from interviews can constrain synthetic personas effectively.
Where Pith is reading between the lines
- Synthetic personas could reduce the need for initial expert recruitment in visualization system evaluation by providing preliminary insights.
- Future designs might combine synthetic and real feedback to capture both common frames and modality-specific preferences.
- The convergence on find-and-adapt suggests this is a robust user need in genomics visualization retrieval worth prioritizing in system design.
Load-bearing premise
The published baseline study of real domain experts provides an unbiased and complete reference standard against which synthetic outputs can be compared.
What would settle it
Replicating the expert study with a new cohort of genomics domain experts and finding that their concerns differ substantially from the published baseline or that synthetic personas also identify image-modality preferences.
Abstract
Evaluating visualization systems in niche domains such as genomics is challenging due to scarcity of domain experts and difficulty recruiting a representative user base. While LLM-based synthetic personas are increasingly used to ease evaluation bottlenecks, they face well-founded skepticism. Rather than weighing synthetic personas as substitutes for real users, we ask a fundamental open question: when synthetic personas evaluate a real visualization system, what do they actually produce, and how does that output change when grounded in documented human contexts? We present Sycamore, an exploratory three-condition probe design using Geranium, a search engine for multimodal genomics visualization, as a case study. Sycamore evaluates Geranium using: (1) ungrounded synthetic personas from generic LLM priors; (2) grounded synthetic personas constrained by voice-of-customer artifacts from a prior interview study; and (3) a published baseline study of real domain experts. We observe that grounding shifts synthetic feedback toward the language and concerns of documented users, while ungrounded evaluators drift toward operational specifics that real participants did not raise; both synthetic conditions, however, converge on a find-and-adapt frame and miss the image-modality preference observed in the expert study. We discuss what these observations imply for where synthetic personas might fit alongside expert studies in domain-specific visualization evaluation. All supplemental materials are available at https://osf.io/kdfr3/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Sycamore, an exploratory three-condition probe design that evaluates the Geranium genomics visualization search engine using (1) ungrounded synthetic personas from generic LLM priors, (2) grounded synthetic personas constrained by voice-of-customer artifacts from a prior interview study, and (3) a published baseline study of real domain experts. It reports that grounding shifts synthetic feedback toward the language and concerns of documented users, ungrounded evaluators drift toward operational specifics not raised by real participants, and both synthetic conditions converge on a find-and-adapt frame while missing the image-modality preference observed in the expert study. Supplemental materials are provided at OSF.
Significance. If the observations hold under more rigorous quantification, the work offers a useful framing for the appropriate role of synthetic personas alongside expert studies in niche-domain visualization evaluation, where expert recruitment is difficult. The open release of supplemental materials supports reproducibility and is a clear strength.
major comments (2)
- [Abstract] Abstract and probe description: the central observational claims (shifts in language/concerns, convergence on find-and-adapt, and missing image-modality preference) are presented as directional findings without quantitative metrics, inter-rater reliability details, prompt templates, or statistical comparisons, leaving the support for these claims unverifiable from the reported evidence.
- [Results and Discussion] Baseline comparison (throughout results and discussion): the claims of 'drift,' 'convergence,' and 'miss' treat the published expert study as a complete, unbiased reference standard, but the manuscript provides no sensitivity check, coverage validation, or discussion of potential selection/reporting biases in the baseline's participant pool or protocol, which is load-bearing for the comparative conclusions.
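As a rough illustration of the inter-rater reliability detail requested in the first comment, the sketch below computes Cohen's kappa for two coders who each assign one theme label per feedback excerpt. The manuscript does not describe such a procedure; the coder labels and the `cohen_kappa` helper are hypothetical and serve only to show what a reported figure could rest on.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which both coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both coders used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four feedback excerpts; not taken from the paper.
coder_a = ["find-and-adapt", "find-and-adapt", "operational", "operational"]
coder_b = ["find-and-adapt", "find-and-adapt", "operational", "find-and-adapt"]
kappa = cohen_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

Kappa corrects raw agreement for agreement expected by chance, which is why it is often preferred over percent agreement when a few themes dominate the coding.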
minor comments (1)
- [Methods] The three conditions could be summarized in a table for clearer side-by-side comparison of inputs, outputs, and observed differences.
Simulated Author's Rebuttal
We thank the referee for their constructive review and for recognizing the potential value of this work in framing the role of synthetic personas in niche-domain visualization evaluation. We address each major comment below, indicating where revisions will be made.
- Referee: [Abstract] Abstract and probe description: the central observational claims (shifts in language/concerns, convergence on find-and-adapt, and missing image-modality preference) are presented as directional findings without quantitative metrics, inter-rater reliability details, prompt templates, or statistical comparisons, leaving the support for these claims unverifiable from the reported evidence.
  Authors: The study is explicitly framed as an exploratory three-condition probe rather than a confirmatory experiment, so the claims are intentionally directional and observational. Quantitative metrics, inter-rater reliability, and statistical comparisons are not applicable to this design and were not performed. To improve verifiability, we will expand the methods section to include the exact prompt templates used for both synthetic conditions and will ensure all analysis materials (including any coding schemes) are fully documented in the OSF supplement. We will also revise the abstract and discussion to more explicitly characterize the findings as qualitative observations.
  Revision: partial
- Referee: [Results and Discussion] Baseline comparison (throughout results and discussion): the claims of 'drift,' 'convergence,' and 'miss' treat the published expert study as a complete, unbiased reference standard, but the manuscript provides no sensitivity check, coverage validation, or discussion of potential selection/reporting biases in the baseline's participant pool or protocol, which is load-bearing for the comparative conclusions.
  Authors: We agree that the comparative claims rest on the published expert study serving as a reference point and that potential biases in its participant pool or protocol should be addressed. Because the baseline is a previously published study, we lack access to its raw data and therefore cannot conduct new sensitivity or coverage analyses. We will add a dedicated limitations paragraph in the discussion that explicitly discusses possible selection and reporting biases in the baseline and how they could influence the observed differences. This will qualify the language around 'drift,' 'convergence,' and 'miss' to reflect the exploratory nature of the comparison.
  Revision: partial
- Access to the raw participant data and protocol details from the published baseline expert study, which would be required to perform sensitivity checks or coverage validation.
Circularity Check
No circularity: exploratory comparison to external published baseline
The paper conducts an empirical, qualitative comparison across three conditions (ungrounded synthetic personas, grounded synthetic personas using prior interview artifacts, and a published real-expert baseline study). There are no equations, fitted parameters, predictions, or first-principles derivations. The central observations (shifts in language/concerns, convergence on find-and-adapt frame, missing image-modality preference) are direct thematic comparisons of outputs against the external baseline, not reductions of any result to the paper's own inputs by construction. The reference to prior work functions as an independent benchmark rather than a self-definitional or load-bearing premise that forces the outcome. This matches the default case of a self-contained empirical probe with external anchors.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLM-generated personas can produce evaluable feedback on visualization systems when prompted appropriately
Forward citations
Cited by 1 Pith paper
- Through the WordStream Glass: Revisiting Quantitative Encoding for Qualitative Learning Analytics
  A study of 10 experts reveals disagreement on whether frequency visualizations aid or hinder qualitative analysis of student responses in learning analytics tools.