Visualization Retrieval for Data Literacy: Position Paper
Pith reviewed 2026-05-15 14:57 UTC · model grok-4.3
The pith
Visualization retrieval systems turn static galleries into dynamic tools that help learners query, compare, and explore design choices in data literacy education.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Visualization retrieval is essential infrastructure for data literacy, transforming static collections into dynamic, inquiry-based learning environments. It facilitates design space exploration and vocabulary expansion, supports data consumption through visualization comparison and critique, and aids data management via resource curation. Such systems empower learners to articulate intent, bridge technical barriers, and proactively reason with data.
What carries the argument
Visualization retrieval, the mechanism that lets users query and navigate collections of visualizations by intent or features to support learning across the data lifecycle.
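To make the mechanism concrete, here is a minimal sketch that assumes a toy corpus in which each visualization is indexed only by a short text description. The record type `VisRecord`, the `DEMO_CORPUS` entries, and the functions `tokenize`, `cosine`, and `retrieve` are hypothetical illustrations, not the paper's method; published retrieval systems typically rely on richer representations such as chart-structure or image embeddings. The sketch ranks entries by bag-of-words cosine similarity to a learner's free-text statement of intent.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class VisRecord:
    """Hypothetical corpus entry: an ID plus a short text description of the chart."""
    vis_id: str
    description: str


# Hypothetical toy corpus, for illustration only.
DEMO_CORPUS = [
    VisRecord("bar-01", "bar chart comparing categories by count"),
    VisRecord("line-01", "line chart showing trend over time"),
    VisRecord("scatter-01", "scatter plot showing correlation between two variables"),
]


def tokenize(text: str) -> Counter:
    """Lowercase, whitespace-split bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing terms, so absent terms contribute nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[VisRecord], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (vis_id, score) pairs most similar to the learner's query."""
    q = tokenize(query)
    scored = [(rec.vis_id, cosine(q, tokenize(rec.description))) for rec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, `retrieve("trend over time", DEMO_CORPUS, k=1)` returns the `line-01` entry, because it is the only description that shares terms with the query. Swapping the text features for visual or structural ones would change the representation but not the query-rank-return loop.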
If this is right
- Retrieval supports design space exploration and vocabulary expansion during learning.
- It enables visualization comparison and critique for better data consumption.
- It aids resource curation for improved data management.
- Future systems could integrate retrieval directly with authoring tools.
- Pedagogical relevance modeling and collaborative corpora become feasible research directions.
Where Pith is reading between the lines
- Such retrieval tools might integrate into everyday data analysis platforms to support on-the-fly learning.
- Modeling what counts as pedagogically relevant could influence how educational AI recommends visualizations.
- A practical test would involve deploying a prototype corpus and tracking changes in learner self-reported confidence with data tasks.
- The same retrieval approach could apply to training in scientific or business visualization domains beyond general data literacy.
Load-bearing premise
Current resources for data literacy education, such as visualization galleries and datasets, provide useful examples but lack mechanisms for learners to query, compare, and navigate the visualization design space efficiently.
What would settle it
A controlled study measuring whether students using a visualization retrieval system show measurable gains in design exploration speed, vocabulary use, and accuracy on data reasoning tasks, compared with students limited to static galleries.
Original abstract
Current resources for data literacy education, such as visualization galleries and datasets, provide useful examples but lack mechanisms for learners to query, compare, and navigate the visualization design space efficiently. This position paper advocates for visualization retrieval as essential infrastructure for data literacy, transforming static collections into dynamic, inquiry-based learning environments. We analyze the role of retrieval across the data lifecycle, demonstrating how it facilitates design space exploration and vocabulary expansion, supports data consumption through visualization comparison and critique, and aids data management via resource curation. We outline key opportunities for future research and system design, including integrated retrieval-authoring environments, pedagogical relevance modeling, and collaborative educational corpora. Ultimately, we argue that visualization retrieval systems empower learners to articulate intent, bridge technical barriers, and proactively reason with data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This position paper claims that visualization retrieval systems are essential infrastructure for data literacy education. It argues that static resources like visualization galleries and datasets lack mechanisms for efficient querying, comparison, and navigation of the design space. The paper analyzes retrieval's role across the data lifecycle to support design space exploration, vocabulary expansion, visualization comparison and critique, and resource curation. It outlines three opportunities for future work—integrated retrieval-authoring environments, pedagogical relevance modeling, and collaborative educational corpora—and concludes that such systems empower learners to articulate intent, bridge technical barriers, and proactively reason with data.
Significance. If the proposed framework is adopted, the work could meaningfully shape HCI research on educational visualization tools by reframing static collections as dynamic, inquiry-driven environments. The coherent mapping of retrieval functions to stages of the data lifecycle and the identification of concrete research opportunities provide a useful roadmap. The manuscript's internal consistency and avoidance of circular reasoning or invented parameters are strengths for a position paper.
major comments (2)
- [Role of retrieval across the data lifecycle] The analysis of retrieval across the data lifecycle (detailed after the introduction) remains at a conceptual level without citing or briefly describing any existing visualization retrieval prototypes or systems that have been applied in educational settings; this weakens the bridge from current limitations to the claimed transformations in learning environments.
- [Opportunities for future research and system design] In the section outlining opportunities, the description of 'pedagogical relevance modeling' does not specify even high-level criteria or data sources (e.g., learner goals, curriculum alignment) that would be used to rank results; without this, it is difficult to evaluate how the proposed infrastructure would differ from generic retrieval.
minor comments (2)
- [Abstract] The abstract states that three opportunities are outlined but does not name them, reducing immediate clarity for readers scanning the paper.
- [Data lifecycle analysis] Some sentences in the discussion of the data lifecycle use broad terms such as 'proactively reason with data' without a short parenthetical gloss or example, which could be tightened for precision.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and recommendation for minor revision. The comments help strengthen the manuscript's connections to practice and clarify the proposed opportunities. We address each major comment below and will incorporate the suggested enhancements in the revised version.
Point-by-point responses
- Referee: The analysis of retrieval across the data lifecycle (detailed after the introduction) remains at a conceptual level without citing or briefly describing any existing visualization retrieval prototypes or systems that have been applied in educational settings; this weakens the bridge from current limitations to the claimed transformations in learning environments.
  Authors: We agree that grounding the conceptual analysis with concrete references would strengthen the argument. Although the position paper focuses on outlining a framework rather than surveying implementations, we will revise the relevant section to briefly describe and cite existing visualization retrieval systems (e.g., those supporting query-by-sketch or example-based search) that have been explored or applied in educational or design contexts. This addition will better illustrate the transition from current static resources to dynamic learning environments without altering the paper's core positioning. Revision: yes.
- Referee: In the section outlining opportunities, the description of 'pedagogical relevance modeling' does not specify even high-level criteria or data sources (e.g., learner goals, curriculum alignment) that would be used to rank results; without this, it is difficult to evaluate how the proposed infrastructure would differ from generic retrieval.
  Authors: We appreciate this point and acknowledge that the current description remains high-level. In the revised manuscript we will expand the 'pedagogical relevance modeling' subsection to outline high-level criteria and data sources, including learner goals, curriculum alignment, prior knowledge indicators, and educational outcome metrics. These signals would be used to re-rank results beyond generic similarity measures, thereby distinguishing the approach from standard retrieval systems. Revision: yes.
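One way such signals might re-rank the output of a generic retriever is sketched below. Every field name, the linear blend in `pedagogical_score`, and the `WEIGHTS` values are hypothetical assumptions for illustration, not the authors' proposal; educational outcome metrics appear only as a possible source for calibrating the weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A result from a base retriever, with hypothetical pedagogical metadata."""
    vis_id: str
    similarity: float                 # base retrieval score, assumed in [0, 1]
    chart_type: str
    curriculum_units: frozenset[str]  # units this example is tagged as illustrating


@dataclass(frozen=True)
class LearnerProfile:
    """Hypothetical learner signals: goals, curriculum position, prior knowledge."""
    goal_chart_types: frozenset[str]
    current_unit: str
    known_chart_types: frozenset[str]


# Hypothetical weights; a deployed system would need to calibrate these,
# for example against educational outcome metrics.
WEIGHTS = {"similarity": 0.5, "goal": 0.2, "curriculum": 0.2, "novelty": 0.1}


def pedagogical_score(c: Candidate, learner: LearnerProfile) -> float:
    """Blend base similarity with goal, curriculum, and prior-knowledge signals."""
    goal = 1.0 if c.chart_type in learner.goal_chart_types else 0.0
    curriculum = 1.0 if learner.current_unit in c.curriculum_units else 0.0
    # Favor chart types the learner has not yet mastered.
    novelty = 0.0 if c.chart_type in learner.known_chart_types else 1.0
    return (WEIGHTS["similarity"] * c.similarity
            + WEIGHTS["goal"] * goal
            + WEIGHTS["curriculum"] * curriculum
            + WEIGHTS["novelty"] * novelty)


def rerank(candidates: list[Candidate], learner: LearnerProfile) -> list[Candidate]:
    """Re-order base retrieval results by pedagogical score, highest first."""
    return sorted(candidates, key=lambda c: pedagogical_score(c, learner), reverse=True)
```

As a usage example, for a learner whose goal is box plots, who is in unit "unit3", and who already knows bar charts, a box-plot candidate tagged "unit3" with base similarity 0.7 scores 0.85 and outranks a bar-chart candidate with base similarity 0.9, which scores 0.45. This is the kind of divergence from generic similarity ranking that the response describes.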
Circularity Check
No significant circularity; position paper argument is self-contained
This is a position paper advocating visualization retrieval as infrastructure for data literacy. It analyzes the data lifecycle conceptually, outlines three research opportunities (integrated environments, pedagogical modeling, collaborative corpora), and argues for empowerment of learners. No equations, no fitted parameters, no predictions derived from data, and no self-citations invoked as load-bearing uniqueness theorems or ansatzes. The central claims are argumentative and draw on general concepts of retrieval and education without reducing to self-definition or renaming of known results. The derivation chain is absent; the text is internally coherent as advocacy without circular reduction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Current visualization galleries and datasets lack mechanisms for learners to query, compare, and navigate the visualization design space efficiently.
- domain assumption: Visualization retrieval facilitates design space exploration, vocabulary expansion, visualization comparison and critique, and resource curation across the data lifecycle.
Forward citations
Cited by 1 Pith paper
- Sycamore: Characterizing Synthetic Personas for Evaluating Genomics Visualization Retrieval. Grounding synthetic personas in real-user artifacts aligns their feedback language and concerns with documented experts, but both synthetic conditions converge on a find-and-adapt frame and miss the image-modality pre...