
arxiv: 2605.08630 · v2 · pith:NLWJL6XVnew · submitted 2026-05-09 · 💻 cs.HC

Sycamore: Characterizing Synthetic Personas for Evaluating Genomics Visualization Retrieval

Pith reviewed 2026-06-30 23:22 UTC · model grok-4.3

classification 💻 cs.HC
keywords: synthetic personas, LLM-based evaluation, genomics visualization, user evaluation, visualization systems, domain experts, feedback alignment, multimodal retrieval

The pith

Grounding synthetic personas in user study artifacts aligns their feedback with real expert concerns in genomics visualization retrieval, though grounded and ungrounded personas alike miss the experts' image-modality preference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates how synthetic personas perform when evaluating a genomics visualization search engine compared to real domain experts. It uses a three-condition design with ungrounded LLMs, grounded ones using prior interview data, and a real expert baseline. Grounding makes synthetic feedback match user language and concerns better, while ungrounded versions highlight operational issues not raised by experts. Both synthetic approaches settle on a find-and-adapt model and overlook experts' preference for image modalities. This setup helps clarify the role of synthetic evaluators alongside scarce expert studies in specialized domains.

Core claim

Using Sycamore's three-condition probe on Geranium, the study finds that grounding synthetic personas with voice-of-customer artifacts from prior interviews shifts their feedback toward the language and concerns of documented users, ungrounded personas drift toward operational specifics not mentioned by real participants, and both synthetic conditions converge on a find-and-adapt frame while missing the image-modality preference observed in the expert study.

What carries the argument

The three-condition probe design that compares outputs from ungrounded synthetic personas, grounded synthetic personas constrained by voice-of-customer artifacts, and real expert baselines to characterize differences in evaluation feedback.
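The comparison logic behind the three-condition probe can be sketched with plain set operations. This is a minimal illustration, not the authors' analysis pipeline: the theme labels are paraphrased from the summary above, and the assignment of themes to conditions is toy data chosen only to mirror the reported pattern. The names `THEMES` and `compare` are hypothetical.

```python
# Toy theme sets per condition. The labels are paraphrases of the summary,
# and their assignment to conditions is an assumption made for illustration.
THEMES = {
    "ungrounded": frozenset({"find-and-adapt", "operational specifics"}),
    "grounded": frozenset({"find-and-adapt", "user-voiced concerns"}),
    "expert": frozenset({"user-voiced concerns", "image-modality preference"}),
}


def compare(synthetic: str, baseline: str = "expert") -> dict:
    """Split a synthetic condition's themes relative to the baseline."""
    syn, base = THEMES[synthetic], THEMES[baseline]
    return {
        "shared": syn & base,   # themes raised in both conditions
        "drift": syn - base,    # raised only by the synthetic evaluators
        "missed": base - syn,   # raised only in the baseline
    }


ungrounded_vs_expert = compare("ungrounded")
grounded_vs_expert = compare("grounded")

# Themes on which the two synthetic conditions converge.
converged = THEMES["ungrounded"] & THEMES["grounded"]

# Baseline themes that neither synthetic condition surfaces.
missed_by_both = ungrounded_vs_expert["missed"] & grounded_vs_expert["missed"]

print(sorted(converged))
print(sorted(missed_by_both))
```

With real data, the sets would come from coded session transcripts rather than hand-written labels, and the comparison inherits whatever gaps the baseline study has.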

If this is right

  • Grounding synthetic personas improves their alignment with real user concerns and language.
  • Synthetic personas tend to converge on a find-and-adapt frame regardless of grounding.
  • Real expert studies reveal preferences like image-modality that synthetic evaluators miss.
  • Voice-of-customer artifacts from interviews can constrain synthetic personas effectively.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Synthetic personas could reduce the need for initial expert recruitment in visualization system evaluation by providing preliminary insights.
  • Future designs might combine synthetic and real feedback to capture both common frames and modality-specific preferences.
  • The convergence on find-and-adapt suggests this is a robust user need in genomics visualization retrieval worth prioritizing in system design.

Load-bearing premise

The published baseline study of real domain experts provides an unbiased and complete reference standard against which synthetic outputs can be compared.

What would settle it

Replicating the expert study with a new cohort of genomics domain experts and finding that their concerns differ substantially from the published baseline or that synthetic personas also identify image-modality preferences.

Figures

Figures reproduced from arXiv: 2605.08630 by Astrid van den Brandt, Huyen N. Nguyen, Nils Gehlenborg.

Figure 1: Diagram of the Sycamore system. Left: three-condition evaluation on a common visualization retrieval system to address …
Figure 2: The object of evaluation, the Geranium [16] multimodal retrieval system. Users can search with text, image, or Gosling specification and will rank their modality preferences at the end of the evaluation.

3.2 Evaluation Protocol. Sycamore adopts the published Geranium user study [16] in two roles. First, its protocol provides the shared procedure that all three conditions follow. Second, its reported findings pr…
Figure 3: Four-step pipeline for instantiating synthetic evaluators: …
Figure 4: The Sycamore session viewer streaming a grounded evaluator (CB1, Computational Biologist) through the Geranium protocol. (1) …
Original abstract

Evaluating visualization systems in niche domains such as genomics is challenging due to scarcity of domain experts and difficulty recruiting a representative user base. While LLM-based synthetic personas are increasingly used to ease evaluation bottlenecks, they face well-founded skepticism. Rather than weighing synthetic personas as substitutes for real users, we ask a fundamental open question: when synthetic personas evaluate a real visualization system, what do they actually produce, and how does that output change when grounded in documented human contexts? We present Sycamore, an exploratory three-condition probe design using Geranium, a search engine for multimodal genomics visualization, as a case study. Sycamore evaluates Geranium using: (1) ungrounded synthetic personas from generic LLM priors; (2) grounded synthetic personas constrained by voice-of-customer artifacts from a prior interview study; and (3) a published baseline study of real domain experts. We observe that grounding shifts synthetic feedback toward the language and concerns of documented users, while ungrounded evaluators drift toward operational specifics that real participants did not raise; both synthetic conditions, however, converge on a find-and-adapt frame and miss the image-modality preference observed in the expert study. We discuss what these observations imply for where synthetic personas might fit alongside expert studies in domain-specific visualization evaluation. All supplemental materials are available at https://osf.io/kdfr3/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents Sycamore, an exploratory three-condition probe design that evaluates the Geranium genomics visualization search engine using (1) ungrounded synthetic personas from generic LLM priors, (2) grounded synthetic personas constrained by voice-of-customer artifacts from a prior interview study, and (3) a published baseline study of real domain experts. It reports that grounding shifts synthetic feedback toward the language and concerns of documented users, ungrounded evaluators drift toward operational specifics not raised by real participants, and both synthetic conditions converge on a find-and-adapt frame while missing the image-modality preference observed in the expert study. Supplemental materials are provided at OSF.

Significance. If the observations hold under more rigorous quantification, the work offers a useful framing for the appropriate role of synthetic personas alongside expert studies in niche-domain visualization evaluation, where expert recruitment is difficult. The open release of supplemental materials supports reproducibility and is a clear strength.

major comments (2)
  1. [Abstract] Abstract and probe description: the central observational claims (shifts in language/concerns, convergence on find-and-adapt, and missing image-modality preference) are presented as directional findings without quantitative metrics, inter-rater reliability details, prompt templates, or statistical comparisons, leaving the support for these claims unverifiable from the reported evidence.
  2. [Results and Discussion] Baseline comparison (throughout results and discussion): the claims of 'drift,' 'convergence,' and 'miss' treat the published expert study as a complete, unbiased reference standard, but the manuscript provides no sensitivity check, coverage validation, or discussion of potential selection/reporting biases in the baseline's participant pool or protocol, which is load-bearing for the comparative conclusions.
minor comments (1)
  1. [Methods] The three conditions could be summarized in a table for clearer side-by-side comparison of inputs, outputs, and observed differences.
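Major comment 1 asks for inter-rater reliability details. As a hedged illustration of what such a figure could look like if two coders independently labeled the same feedback excerpts, the sketch below computes Cohen's kappa in plain Python. The labels and data are hypothetical; the paper reports no such coding, and the function name `cohens_kappa` is ours.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four feedback excerpts (not data from the paper).
coder_1 = ["drift", "drift", "aligned", "aligned"]
coder_2 = ["drift", "aligned", "aligned", "aligned"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))  # 0.5
```

Here the coders agree on 3 of 4 items (0.75) against a chance agreement of 0.5, giving kappa = 0.5. Whether such a statistic suits an exploratory probe is exactly the point the authors contest in their rebuttal.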

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive review and for recognizing the potential value of this work in framing the role of synthetic personas in niche-domain visualization evaluation. We address each major comment below, indicating where revisions will be made.

Point-by-point responses
  1. Referee: [Abstract] Abstract and probe description: the central observational claims (shifts in language/concerns, convergence on find-and-adapt, and missing image-modality preference) are presented as directional findings without quantitative metrics, inter-rater reliability details, prompt templates, or statistical comparisons, leaving the support for these claims unverifiable from the reported evidence.

    Authors: The study is explicitly framed as an exploratory three-condition probe rather than a confirmatory experiment, so the claims are intentionally directional and observational. Quantitative metrics, inter-rater reliability, and statistical comparisons are not applicable to this design and were not performed. To improve verifiability, we will expand the methods section to include the exact prompt templates used for both synthetic conditions and will ensure all analysis materials (including any coding schemes) are fully documented in the OSF supplement. We will also revise the abstract and discussion to more explicitly characterize the findings as qualitative observations. revision: partial

  2. Referee: [Results and Discussion] Baseline comparison (throughout results and discussion): the claims of 'drift,' 'convergence,' and 'miss' treat the published expert study as a complete, unbiased reference standard, but the manuscript provides no sensitivity check, coverage validation, or discussion of potential selection/reporting biases in the baseline's participant pool or protocol, which is load-bearing for the comparative conclusions.

    Authors: We agree that the comparative claims rest on the published expert study serving as a reference point and that potential biases in its participant pool or protocol should be addressed. Because the baseline is a previously published study, we lack access to its raw data and therefore cannot conduct new sensitivity or coverage analyses. We will add a dedicated limitations paragraph in the discussion that explicitly discusses possible selection and reporting biases in the baseline and how they could influence the observed differences. This will qualify the language around 'drift,' 'convergence,' and 'miss' to reflect the exploratory nature of the comparison. revision: partial

standing simulated objections not resolved
  • Access to the raw participant data and protocol details from the published baseline expert study, which would be required to perform sensitivity checks or coverage validation.

Circularity Check

0 steps flagged

No circularity: exploratory comparison to external published baseline

The paper conducts an empirical, qualitative comparison across three conditions (ungrounded synthetic personas, grounded synthetic personas using prior interview artifacts, and a published real-expert baseline study). There are no equations, fitted parameters, predictions, or first-principles derivations. The central observations (shifts in language/concerns, convergence on find-and-adapt frame, missing image-modality preference) are direct thematic comparisons of outputs against the external baseline, not reductions of any result to the paper's own inputs by construction. The reference to prior work functions as an independent benchmark rather than a self-definitional or load-bearing premise that forces the outcome. This matches the default case of a self-contained empirical probe with external anchors.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

This is an empirical HCI probe study whose central observations rest on the assumption that LLM outputs can be meaningfully compared to human expert feedback and that the prior interview artifacts are representative.

axioms (1)
  • domain assumption LLM-generated personas can produce evaluable feedback on visualization systems when prompted appropriately
    Foundational premise enabling the entire three-condition comparison.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Through the WordStream Glass: Revisiting Quantitative Encoding for Qualitative Learning Analytics

    cs.CY 2026-06 unverdicted novelty 4.0

    A study of 10 experts reveals disagreement on whether frequency visualizations aid or hinder qualitative analysis of student responses in learning analytics tools.

Reference graph

Works this paper leans on

26 extracted references · 12 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    ATLAS.ti Scientific Software Development GmbH. ATLAS.ti Mac. https://atlasti.com, 2024. Version 24.0.1.

  2. [2]

    T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. A...

  3. [3]

    A. Cooper et al. The inmates are running the asylum: Why high-tech products drive us crazy and how to restore the sanity, vol. 2. Sams Indianapolis, 2004.

  4. [4]

    A. Crisan, B. Fiore-Gartland, and M. Tory. Passing the data baton: A retrospective analysis on data science work and workers. IEEE Transactions on Visualization and Computer Graphics, 27(2):1860–1870, 2021. doi: 10.1109/TVCG.2020.3030340

  5. [5]

    B. Gao, Z. Zeng, Y. Yu, I. P. Werry, C. L. Chan, M. Chen, H. Zhang, B. Huang, J. Ji, C. Leung, and C. Miao. “it seems to understand my heart”: An empirical study of persona-driven persuasive ai agent for aging-in-place in singapore. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, CHI ’26. Association for Computing Machin...

  6. [6]

    S. Jain, C. Park, M. Viana, A. Wilson, and D. Calacci. Interaction context often increases sycophancy in llms. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, CHI ’26. Association for Computing Machinery, New York, NY, USA, 2026. doi: 10.1145/3772318.3791915

  7. [7]

    James Newhook, Interaction Design Foundation. Are AI-Generated Synthetic Users Replacing Personas? What UX Designers Need to Know, 2024.

  8. [8]

    I. Kaate, J. Salminen, S.-G. Jung, T. T. T. Xuan, E. Häyhänen, J. Y. Azem, and B. J. Jansen. “you always get an answer”: Analyzing users’ interaction with ai-generated personas given unanswerable questions and risk of hallucination. In Proceedings of the 30th International Conference on Intelligent User Interfaces, pp. 1624–1638, 2025.

  9. [9]

    A. B. Kocaballi, M. Prpa, J. Salminen, D. Amin, and B. J. Jansen. From generation to simulation: Responsible use of ai personas in human-centered design and research. In Proceedings of the Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems, CHI EA ’26. Association for Computing Machinery, New York, NY, USA, 2026. doi: 10...

  10. [10]

    M. Krzywinski, J. Schein, I. Birol, J. Connors, R. Gascoyne, D. Horsman, S. J. Jones, and M. A. Marra. Circos: an information aesthetic for comparative genomics. Genome research, 19(9):1639–1645, 2009.

  11. [11]

    S. L’Yi, Q. Wang, F. Lekschas, and N. Gehlenborg. Gosling: A grammar-based toolkit for scalable and interactive genomics data visualization. IEEE Transactions on Visualization and Computer Graphics, 28(1):140–150, 2021.

  12. [12]

    S. L’Yi, A. van den Brandt, E. Adams, H. N. Nguyen, and N. Gehlenborg. Learnable and expressive visualization authoring through blended interfaces. IEEE Transactions on Visualization and Computer Graphics, 31(1):459–469, 2025. doi: 10.1109/TVCG.2024.3456598

  13. [13]

    S. L’Yi, Q. Wang, and N. Gehlenborg. The role of visualization in genomics data analysis workflows: The interviews. In 2023 IEEE Visualization and Visual Analytics (VIS), pp. 101–105. IEEE, 2023.

  14. [14]

    H. N. Nguyen and N. Gehlenborg. Safire: Similarity framework for visualization retrieval. In 2025 IEEE Visualization and Visual Analytics (VIS), pp. 246–250, 2025. doi: 10.1109/VIS60296.2025.00055

  15. [15]

    H. N. Nguyen and N. Gehlenborg. Visualization retrieval for data literacy: Position paper. CHI 2026 Workshop on Data Literacy, Mar. … doi: 10.48550/arXiv.2604.09598

  16. [16]

    H. N. Nguyen, S. L’Yi, T. C. Smits, S. Gao, M. Zitnik, and N. Gehlenborg. Geranium: Multimodal retrieval of genomics data visualizations. IEEE Transactions on Visualization and Computer Graphics, pp. 1–17, 2026. doi: 10.1109/TVCG.2026.3683429

  17. [17]

    A. Pandey, S. L’Yi, Q. Wang, M. A. Borkin, and N. Gehlenborg. Genorec: A recommendation system for interactive genomics data visualization. IEEE Transactions on Visualization and Computer Graphics, 29(1):570–580, 2023. doi: 10.1109/TVCG.2022.3209407

  18. [18]

    J. S. Park, J. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein. Generative agents: Interactive simulacra of human behavior. UIST ’23. Association for Computing Machinery, New York, NY, USA, 2023. doi: 10.1145/3586183.3606763

  19. [19]

    J. S. Park, C. Q. Zou, J. Kamphorst, N. Egan, A. Shaw, B. M. Hill, C. Cai, M. R. Morris, P. Liang, R. Willer, and M. S. Bernstein. Llm agents grounded in self-reports enable general-purpose simulation of individuals, 2026.

  20. [20]

    J. Salminen, C. Liu, W. Pian, J. Chi, E. Häyhänen, and B. J. Jansen. Deus ex machina and personas from large language models: Investigating the composition of ai-generated persona descriptions. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, CHI ’24. Association for Computing Machinery, New York, NY, USA, 2024. do...

  21. [21]

    H. Thorvaldsdóttir, J. T. Robinson, and J. P. Mesirov. Integrative genomics viewer (igv): high-performance genomics data visualization and exploration. Briefings in bioinformatics, 14(2):178–192, 2013.

  22. [22]

    M. Truss. Personacite: Voc-grounded interviewable agentic synthetic ai personas for verifiable user and design research. In Proceedings of the Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems, pp. 1–7, 2026.

  23. [23]

    A. van den Brandt, S. L’Yi, H. N. Nguyen, A. Vilanova, and N. Gehlenborg. Understanding visualization authoring techniques for genomics data in the context of personas and tasks. IEEE Transactions on Visualization and Computer Graphics, 31(1):1180–1190, … doi: 10.1109/TVCG.2024.3456298

  24. [24]

    L. Welch, F. Lewitter, R. Schwartz, C. Brooksbank, P. Radivojac, B. Gaeta, and M. V. Schneider. Bioinformatics curriculum guidelines: toward a definition of core competencies. PLOS computational biology, 10(3):e1003496, 2014.