arxiv: 2604.09598 · v1 · submitted 2026-03-06 · 💻 cs.HC

Visualization Retrieval for Data Literacy: Position Paper

Pith reviewed 2026-05-15 14:57 UTC · model grok-4.3

classification 💻 cs.HC
keywords: visualization retrieval · data literacy · education · design space exploration · inquiry-based learning · visualization comparison · resource curation

The pith

Visualization retrieval systems turn static galleries into dynamic tools that help learners query, compare, and explore design choices in data literacy education.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that current visualization galleries and datasets fall short for teaching data literacy because they offer only static examples without ways to query or navigate the full range of design options. It positions visualization retrieval as core infrastructure that can support the entire data lifecycle, from exploring visual designs and building vocabulary to comparing examples for critique and curating resources. A reader would care if this holds because it could let learners state what they want visually, reduce the technical hurdles of creating visualizations, and encourage active reasoning with data rather than passive viewing. The argument rests on analyzing how retrieval fits into design exploration, data consumption, and management tasks. It concludes by calling for research on integrated authoring tools, relevance models for teaching, and shared educational collections.

Core claim

Visualization retrieval is essential infrastructure for data literacy, transforming static collections into dynamic, inquiry-based learning environments. It facilitates design space exploration and vocabulary expansion, supports data consumption through visualization comparison and critique, and aids data management via resource curation. The systems empower learners to articulate intent, bridge technical barriers, and proactively reason with data.

What carries the argument

Visualization retrieval, the mechanism that lets users query and navigate collections of visualizations by intent or features to support learning across the data lifecycle.
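
As a rough illustration of querying a collection by intent, the sketch below ranks a tiny, made-up corpus of visualization descriptions against a free-text query using bag-of-words cosine similarity. The corpus entries, the `retrieve` function, and the scoring method are all assumptions chosen for illustration; none of them is taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: ids and descriptions are illustrative only.
CORPUS = {
    "bar-basic": "bar chart comparing categories by value",
    "scatter-trend": "scatter plot showing correlation between two quantitative variables",
    "line-time": "line chart showing change over time",
    "heatmap-matrix": "heatmap showing values across two categorical dimensions",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus=CORPUS, k=3):
    """Return up to k visualization ids, best match first.

    Entries with zero similarity to the query are dropped.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(desc))), vid) for vid, desc in corpus.items()]
    scored = [(score, vid) for score, vid in scored if score > 0]
    # Sort by descending score; break ties alphabetically by id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [vid for _, vid in scored[:k]]


if __name__ == "__main__":
    print(retrieve("how does a value change over time"))
```

A real system would likely replace the word-overlap score with learned embeddings or structural features of the charts; the point here is only the shape of the interaction, in which a learner states intent in words and gets ranked examples back.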

If this is right

  • Retrieval supports design space exploration and vocabulary expansion during learning.
  • It enables visualization comparison and critique for better data consumption.
  • It aids resource curation for improved data management.
  • Future systems could integrate retrieval directly with authoring tools.
  • Pedagogical relevance modeling and collaborative corpora become feasible research directions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such retrieval tools might integrate into everyday data analysis platforms to support on-the-fly learning.
  • Modeling what counts as pedagogically relevant could influence how educational AI recommends visualizations.
  • A practical test would involve deploying a prototype corpus and tracking changes in learner self-reported confidence with data tasks.
  • The same retrieval approach could apply to training in scientific or business visualization domains beyond general data literacy.

Load-bearing premise

Current resources for data literacy education, such as visualization galleries and datasets, provide useful examples but lack mechanisms for learners to query, compare, and navigate the visualization design space efficiently.

What would settle it

A controlled study measuring whether students using a visualization retrieval system show measurable gains in design exploration speed, vocabulary use, and accuracy of data reasoning tasks compared with students limited to static galleries.

Figures

Figures reproduced from arXiv: 2604.09598 by Huyen N. Nguyen, Nils Gehlenborg.

Figure 1: Illustration of visualization retrieval bridging the semantic gap. A student searching for an ...

Original abstract

Current resources for data literacy education, such as visualization galleries and datasets, provide useful examples but lack mechanisms for learners to query, compare, and navigate the visualization design space efficiently. This position paper advocates for visualization retrieval as essential infrastructure for data literacy, transforming static collections into dynamic, inquiry-based learning environments. We analyze the role of retrieval across the data lifecycle, demonstrating how it facilitates design space exploration and vocabulary expansion, supports data consumption through visualization comparison and critique, and aids data management via resource curation. We outline key opportunities for future research and system design, including integrated retrieval-authoring environments, pedagogical relevance modeling, and collaborative educational corpora. Ultimately, we argue that visualization retrieval systems empower learners to articulate intent, bridge technical barriers, and proactively reason with data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This position paper claims that visualization retrieval systems are essential infrastructure for data literacy education. It argues that static resources like visualization galleries and datasets lack mechanisms for efficient querying, comparison, and navigation of the design space. The paper analyzes retrieval's role across the data lifecycle to support design space exploration, vocabulary expansion, visualization comparison and critique, and resource curation. It outlines three opportunities for future work—integrated retrieval-authoring environments, pedagogical relevance modeling, and collaborative educational corpora—and concludes that such systems empower learners to articulate intent, bridge technical barriers, and proactively reason with data.

Significance. If the proposed framework is adopted, the work could meaningfully shape HCI research on educational visualization tools by reframing static collections as dynamic, inquiry-driven environments. The coherent mapping of retrieval functions to stages of the data lifecycle and the identification of concrete research opportunities provide a useful roadmap. The manuscript's internal consistency and avoidance of circular reasoning or invented parameters are strengths for a position paper.

major comments (2)
  1. [Role of retrieval across the data lifecycle] The analysis of retrieval across the data lifecycle (detailed after the introduction) remains at a conceptual level without citing or briefly describing any existing visualization retrieval prototypes or systems that have been applied in educational settings; this weakens the bridge from current limitations to the claimed transformations in learning environments.
  2. [Opportunities for future research and system design] In the section outlining opportunities, the description of 'pedagogical relevance modeling' does not specify even high-level criteria or data sources (e.g., learner goals, curriculum alignment) that would be used to rank results; without this, it is difficult to evaluate how the proposed infrastructure would differ from generic retrieval.
minor comments (2)
  1. [Abstract] The abstract states that three opportunities are outlined but does not name them, reducing immediate clarity for readers scanning the paper.
  2. [Data lifecycle analysis] Some sentences in the discussion of the data lifecycle use broad terms such as 'proactively reason with data' without a short parenthetical gloss or example, which could be tightened for precision.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation for minor revision. The comments help strengthen the manuscript's connections to practice and clarify the proposed opportunities. We address each major comment below and will incorporate the suggested enhancements in the revised version.

point-by-point responses
  1. Referee: The analysis of retrieval across the data lifecycle (detailed after the introduction) remains at a conceptual level without citing or briefly describing any existing visualization retrieval prototypes or systems that have been applied in educational settings; this weakens the bridge from current limitations to the claimed transformations in learning environments.

    Authors: We agree that grounding the conceptual analysis with concrete references would strengthen the argument. Although the position paper focuses on outlining a framework rather than surveying implementations, we will revise the relevant section to briefly describe and cite existing visualization retrieval systems (e.g., those supporting query-by-sketch or example-based search) that have been explored or applied in educational or design contexts. This addition will better illustrate the transition from current static resources to dynamic learning environments without altering the paper's core positioning.

    Revision: yes

  2. Referee: In the section outlining opportunities, the description of 'pedagogical relevance modeling' does not specify even high-level criteria or data sources (e.g., learner goals, curriculum alignment) that would be used to rank results; without this, it is difficult to evaluate how the proposed infrastructure would differ from generic retrieval.

    Authors: We appreciate this point and acknowledge that the current description remains high-level. In the revised manuscript we will expand the 'pedagogical relevance modeling' subsection to outline high-level criteria and data sources, including learner goals, curriculum alignment, prior knowledge indicators, and educational outcome metrics. These signals would be used to re-rank results beyond generic similarity measures, thereby distinguishing the approach from standard retrieval systems.

    Revision: yes
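
One way to picture re-ranking with such signals is the sketch below, which blends a generic similarity score with topic overlap against the learner's goals and a difficulty match standing in for prior knowledge. The field names, weights, and scoring formula are hypothetical and are not drawn from the paper or the rebuttal; outcome metrics are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    vis_id: str
    similarity: float   # generic retrieval score, assumed to lie in [0, 1]
    topics: frozenset   # curriculum topics the example illustrates (hypothetical field)
    difficulty: int     # 1 (introductory) to 5 (advanced), hypothetical scale


@dataclass
class Learner:
    goal_topics: frozenset  # topics the learner is currently working toward
    level: int              # same 1 to 5 scale as Candidate.difficulty


def pedagogical_score(c, learner, w_sim=0.5, w_topic=0.3, w_level=0.2):
    """Weighted blend of similarity, goal-topic overlap, and level fit.

    The weights are arbitrary placeholders, not tuned or validated values.
    """
    if learner.goal_topics:
        topic_overlap = len(c.topics & learner.goal_topics) / len(learner.goal_topics)
    else:
        topic_overlap = 0.0
    # 1.0 when difficulty equals the learner's level, 0.0 when four or more steps apart.
    level_fit = 1.0 - min(abs(c.difficulty - learner.level), 4) / 4
    return w_sim * c.similarity + w_topic * topic_overlap + w_level * level_fit


def rerank(candidates, learner):
    """Order candidates by pedagogical score, highest first."""
    return sorted(candidates, key=lambda c: pedagogical_score(c, learner), reverse=True)


if __name__ == "__main__":
    results = [
        Candidate("dense-dashboard", 0.9, frozenset({"dashboards"}), 5),
        Candidate("simple-bar", 0.7, frozenset({"comparison"}), 1),
    ]
    novice = Learner(goal_topics=frozenset({"comparison"}), level=1)
    for c in rerank(results, novice):
        print(c.vis_id, round(pedagogical_score(c, novice), 2))
```

In this toy case the example with the higher generic similarity drops below a simpler one that matches the novice's goal and level, which is the kind of difference from standard retrieval that the response describes.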

Circularity Check

0 steps flagged

No significant circularity; position paper argument is self-contained

full rationale

This is a position paper advocating visualization retrieval as infrastructure for data literacy. It analyzes the data lifecycle conceptually, outlines three research opportunities (integrated environments, pedagogical modeling, collaborative corpora), and argues for empowerment of learners. No equations, no fitted parameters, no predictions derived from data, and no self-citations invoked as load-bearing uniqueness theorems or ansatzes. The central claims are argumentative and draw on general concepts of retrieval and education without reducing to self-definition or renaming of known results. The derivation chain is absent; the text is internally coherent as advocacy without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

As a position paper the central claims rest on domain assumptions about educational needs and the benefits of retrieval rather than new data or derivations; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Current visualization galleries and datasets lack mechanisms for learners to query, compare, and navigate the visualization design space efficiently.
    Stated directly in the abstract as the core motivation for the position.
  • domain assumption: Visualization retrieval facilitates design space exploration, vocabulary expansion, visualization comparison and critique, and resource curation across the data lifecycle.
    Core mapping presented in the abstract that underpins the advocacy for retrieval systems.

pith-pipeline@v0.9.0 · 5417 in / 1355 out tokens · 75186 ms · 2026-05-15T14:57:03.479078+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Sycamore: Characterizing Synthetic Personas for Evaluating Genomics Visualization Retrieval

    cs.HC 2026-05 conditional novelty 6.0

    Grounding synthetic personas in real-user artifacts aligns their feedback language and concerns with documented experts, but both synthetic conditions converge on a find-and-adapt frame and miss the image-modality pre...
