
arxiv: 2510.16662 · v1 · submitted 2025-10-18 · 💻 cs.HC · cs.AI · cs.IR · cs.LG

Safire: Similarity Framework for Visualization Retrieval

Pith reviewed 2026-05-18 05:32 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.IR · cs.LG
keywords visualization retrieval, similarity framework, comparison criteria, representation modalities, visual encoding, retrieval systems, multimodal learning

The pith

Safire frames visualization similarity along comparison criteria and representation modalities to guide retrieval design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Similarity Framework for Visualization Retrieval (Safire) to address the missing systematic approach to defining what makes two visualizations similar. It organizes similarity into comparison criteria that cover primary facets such as data, visual encoding, interaction, style, and metadata, plus derived data-centric and human-centric measures. Safire then links those criteria to representation modalities grouped by information content and determinism into raster images, vector images, specifications, and natural language descriptions. The framework is used to analyze existing retrieval systems, revealing how choices of criteria and modalities shape what is computable and what limitations arise in practice.

Core claim

Safire is a conceptual model that frames visualization similarity along two dimensions: comparison criteria, which identify the aspects that make visualizations similar through primary facets (data, visual encoding, interaction, style, metadata) and derived properties, and representation modalities, which are categorized into four groups based on levels of information content and visualization determinism (raster image, vector image, specification, natural language description). This structure connects what to compare with how comparisons are executed, showing what is computable and comparable while guiding the design and analysis of retrieval systems.
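
The two dimensions named above can be written down as a small data model. The following is a minimal sketch, not code from the paper: the class names `Facet`, `Modality`, and `SystemProfile`, the `uncovered_facets` helper, and the example system are all hypothetical and exist only to show how a retrieval system might be described in Safire's terms.

```python
from dataclasses import dataclass
from enum import Enum


class Facet(Enum):
    """Primary facets listed in the core claim."""
    DATA = "data"
    VISUAL_ENCODING = "visual encoding"
    INTERACTION = "interaction"
    STYLE = "style"
    METADATA = "metadata"


class Modality(Enum):
    """The four representation modalities listed in the core claim."""
    RASTER_IMAGE = "raster image"
    VECTOR_IMAGE = "vector image"
    SPECIFICATION = "specification"
    NL_DESCRIPTION = "natural language description"


@dataclass(frozen=True)
class SystemProfile:
    """Hypothetical record of what a retrieval system compares and how."""
    name: str
    facets: frozenset
    modalities: frozenset

    def uncovered_facets(self):
        """Return the facets this system does not compare, in declaration order."""
        return [f for f in Facet if f not in self.facets]


# Made-up example system (an assumption, not one analyzed in the paper):
# it compares style and visual encoding using raster images only.
example = SystemProfile(
    name="example-image-search",
    facets=frozenset({Facet.STYLE, Facet.VISUAL_ENCODING}),
    modalities=frozenset({Modality.RASTER_IMAGE}),
)

print([f.value for f in example.uncovered_facets()])
# prints ['data', 'interaction', 'metadata']
```

Keeping criteria and modalities as separate fields mirrors the framework's split between what to compare and how the comparison is executed.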

What carries the argument

The Safire framework, which connects comparison criteria (primary facets and derived properties) to representation modalities (raster, vector, specification, natural language) to determine computable similarity in visualization retrieval.

If this is right

  • The choice of representation modality is an important decision that shapes retrieval capabilities and limitations beyond mere implementation details.
  • Particular criteria and modalities align across different use cases in existing visualization retrieval systems.
  • The framework supports clearer design decisions for multimodal learning and AI applications in visualization.
  • Recommendations from the analysis can improve visualization reproducibility by making similarity considerations explicit.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Safire could serve as a foundation for creating standardized benchmarks that evaluate retrieval systems across multiple similarity dimensions.
  • Future work might test whether adding interactive or temporal modalities extends the framework without breaking its structure.
  • Database designers could use the modality categories to index visualizations for faster and more targeted searches.
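
The indexing idea in the last bullet could look roughly like the sketch below. It is an illustrative assumption rather than anything the paper or Pith specifies: the `ModalityIndex` class, its methods, and the visualization identifiers are hypothetical, and a real system would store the representations themselves rather than bare ids.

```python
from collections import defaultdict

# The four modality categories named in the paper's abstract.
MODALITIES = (
    "raster image",
    "vector image",
    "specification",
    "natural language description",
)


class ModalityIndex:
    """Hypothetical inverted index: modality name -> ids stored in that modality."""

    def __init__(self):
        self._by_modality = defaultdict(set)

    def add(self, viz_id, modalities):
        """Register a visualization under each modality it is available in."""
        modalities = list(modalities)
        unknown = [m for m in modalities if m not in MODALITIES]
        if unknown:
            raise ValueError(f"unknown modality: {unknown}")
        for m in modalities:
            self._by_modality[m].add(viz_id)

    def available_in(self, *modalities):
        """Return sorted ids of visualizations available in every requested modality."""
        if not modalities:
            return []
        sets = [self._by_modality.get(m, set()) for m in modalities]
        return sorted(set.intersection(*sets))


index = ModalityIndex()
index.add("chart-001", ["raster image", "specification"])
index.add("chart-002", ["raster image"])
index.add("chart-003", ["specification", "natural language description"])

print(index.available_in("raster image", "specification"))
# prints ['chart-001']
```

A query that needs a specification-level comparison can then skip items that exist only as raster images before any similarity is computed.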

Load-bearing premise

The proposed primary facets, derived properties, and four modality categories form a sufficiently complete and non-overlapping decomposition of visualization similarity.

What would settle it

A retrieval system that achieves strong performance by relying on a similarity aspect outside the five primary facets or a representation type beyond the four modality categories would challenge the framework's completeness.

Figures

Figures reproduced from arXiv: 2510.16662 by Huyen N. Nguyen, Nils Gehlenborg.

Figure 1: Our proposed similarity framework for visualization retrieval establishes clear comparison criteria and representation…

Figure 2: Visualization representation across four modalities: a Vega-Lite JSON specification (right) rendered as SVG vector–with accompanying…

Figure 3: Searching D3 Visualizations [7].
Section 4.2, Multimodal Retrieval of Genomics Visualizations: Nguyen et al. [18] present a multimodal retrieval system for genomics data visualizations, covering all five comparison criteria: data, visual encoding, interaction, style, and metadata. Their system uses three modalities: raster images, Gosling [11] grammar specifications, and NL descriptions (both alt-text and LLM-enric…

Figure 4: Multimodal Retrieval of Genomics Data Visualizations […]

Figure 5: WYTIWYR: User Intent-Aware Framework [30]. The system processes raster images as visualization inputs, with optional text prompts expressing user intent, and combines them via a CLIP-based multimodal encoder.
Section 4.4, VAID: Indexing View Designs in VA system: Ying et al. present VAID [32], an index structure for complex and composite visualizations. VAID compares both primary facets (data-related, visual encoding…

Figure 6: VAID: Indexing View Designs in VA system […]
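
The Figure 5 excerpt describes combining a raster image with an optional text prompt through a CLIP-based encoder. The sketch below illustrates only the general idea of fusing two embeddings and ranking by cosine similarity. It is not WYTIWYR's actual method: the weighted-sum fusion, the `text_weight` parameter, the function names, and the three-dimensional toy vectors are all assumptions, and in practice the vectors would come from a trained encoder.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def fuse_query(image_vec, text_vec=None, text_weight=0.5):
    """Blend an image embedding with an optional text embedding.

    Assumed fusion rule: weighted sum of the two unit vectors, renormalized.
    """
    img = normalize(image_vec)
    if text_vec is None:
        return img
    txt = normalize(text_vec)
    mixed = [(1 - text_weight) * i + text_weight * t for i, t in zip(img, txt)]
    return normalize(mixed)


def rank(query_vec, collection):
    """Return visualization ids ordered from most to least similar."""
    scored = [(cosine(query_vec, vec), viz_id) for viz_id, vec in collection.items()]
    return [viz_id for _, viz_id in sorted(scored, reverse=True)]


# Toy stand-ins for precomputed embeddings of indexed visualizations.
collection = {
    "bar-chart": [1.0, 0.0, 0.0],
    "line-chart": [0.0, 1.0, 0.0],
    "scatter": [0.0, 0.0, 1.0],
}

image_only = fuse_query([0.9, 0.1, 0.0])
with_intent = fuse_query([0.9, 0.1, 0.0], [0.0, 1.0, 0.0], text_weight=0.8)

print(rank(image_only, collection))   # image alone favors "bar-chart"
print(rank(with_intent, collection))  # a heavily weighted prompt favors "line-chart"
```

The example shows how a text prompt can change the ranking for the same input image, which is the sense in which the modality choice shapes what a query can express.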
Original abstract

Effective visualization retrieval necessitates a clear definition of similarity. Despite the growing body of work in specialized visualization retrieval systems, a systematic approach to understanding visualization similarity remains absent. We introduce the Similarity Framework for Visualization Retrieval (Safire), a conceptual model that frames visualization similarity along two dimensions: comparison criteria and representation modalities. Comparison criteria identify the aspects that make visualizations similar, which we divide into primary facets (data, visual encoding, interaction, style, metadata) and derived properties (data-centric and human-centric measures). Safire connects what to compare with how comparisons are executed through representation modalities. We categorize existing representation approaches into four groups based on their levels of information content and visualization determinism: raster image, vector image, specification, and natural language description, together guiding what is computable and comparable. We analyze several visualization retrieval systems using Safire to demonstrate its practical value in clarifying similarity considerations. Our findings reveal how particular criteria and modalities align across different use cases. Notably, the choice of representation modality is not only an implementation detail but also an important decision that shapes retrieval capabilities and limitations. Based on our analysis, we provide recommendations and discuss broader implications for multimodal learning, AI applications, and visualization reproducibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper introduces the Similarity Framework for Visualization Retrieval (Safire), a conceptual model framing visualization similarity along two dimensions: comparison criteria (primary facets of data, visual encoding, interaction, style, and metadata, plus derived data-centric and human-centric properties) and representation modalities (raster image, vector image, specification, and natural language description, grouped by information content and determinism). The framework is applied to analyze several existing visualization retrieval systems to illustrate alignments across use cases, with discussion of implications for multimodal learning, AI applications, and visualization reproducibility.

Significance. If the framework holds, it supplies a timely organizing lens for visualization retrieval research, an area seeing rapid growth alongside AI-driven tools. The explicit connection between what aspects to compare and how to represent visualizations for comparison clarifies design trade-offs, as shown in the system analyses. The conceptual nature avoids overclaiming empirical performance while highlighting that modality selection shapes computability and capabilities; this is a useful contribution for guiding future work without relying on fitted parameters or self-referential definitions.

minor comments (2)
  1. [Section 5] A summary table mapping the analyzed retrieval systems to specific Safire facets and modalities would improve readability and allow readers to quickly compare alignments across examples.
  2. [Section 3] The description of derived properties (data-centric and human-centric measures) would benefit from one or two concrete visualization examples to illustrate how they differ from primary facets in practice.
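
The summary table suggested in the first comment could be generated from per-system coverage sets. The sketch below is one possible layout, not the authors' table: `render_table` and the `profiles` dictionary are hypothetical. Only the first row reflects information given on this page (the Section 4.2 excerpt under Figure 3). The second row is a made-up placeholder.

```python
FACETS = ["data", "visual encoding", "interaction", "style", "metadata"]
MODALITIES = [
    "raster image",
    "vector image",
    "specification",
    "natural language description",
]


def render_table(profiles):
    """Render {system name: set of covered column names} as aligned plain text."""
    columns = FACETS + MODALITIES
    rows = [["system"] + columns]
    for name, covered in profiles.items():
        unknown = set(covered) - set(columns)
        if unknown:
            raise ValueError(f"unknown column(s) for {name}: {sorted(unknown)}")
        rows.append([name] + ["x" if c in covered else "-" for c in columns])
    # Pad every cell to the widest entry in its column.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


profiles = {
    # From the Section 4.2 excerpt: all five criteria, three modalities.
    "Nguyen et al. [18]": set(FACETS)
    | {"raster image", "specification", "natural language description"},
    # Made-up placeholder row, included only to show a sparse profile.
    "hypothetical system": {"style", "raster image"},
}

table = render_table(profiles)
print(table)
```

Rows for the other analyzed systems are left out because the excerpts on this page do not state their full facet and modality coverage.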

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, recognition of its timeliness for visualization retrieval research, and recommendation to accept. The feedback correctly identifies the framework's role in clarifying design trade-offs between comparison criteria and representation modalities.

Circularity Check

0 steps flagged

No significant circularity; Safire is an independent conceptual framework

full rationale

The paper introduces Safire as a new conceptual model that defines visualization similarity via two dimensions (comparison criteria with primary facets like data/visual encoding and derived properties, plus four representation modalities based on information content and determinism). These categorizations are presented as an organizing lens for analysis of existing systems rather than quantities derived from equations, fitted parameters, or self-referential inputs. No load-bearing steps reduce by construction to prior results or self-citations; the framework is self-contained as a definitional contribution that guides retrieval design without claiming predictive derivations or uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central contribution rests on domain assumptions about the decomposability of visualization similarity and the utility of categorizing representations by information content and determinism; no free parameters or invented physical entities are introduced.

axioms (2)
  • domain assumption: Visualization similarity can be decomposed into primary facets (data, visual encoding, interaction, style, metadata) and derived data-centric and human-centric properties.
    This decomposition is invoked to structure the comparison criteria dimension of the framework.
  • domain assumption: Representation approaches can be grouped into raster image, vector image, specification, and natural language description based on information content and visualization determinism.
    This grouping is used to connect comparison criteria with what is computable in retrieval systems.
invented entities (1)
  • Safire framework (no independent evidence)
    purpose: To provide a structured model for visualization similarity in retrieval contexts.
    A new conceptual construct introduced by the paper to organize existing approaches.

pith-pipeline@v0.9.0 · 5742 in / 1407 out tokens · 39533 ms · 2026-05-18T05:32:16.970692+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

  1. H. Bannour. Building and Using Knowledge Models for Semantic Image Annotation. PhD thesis, Ecole Centrale Paris, 2013.

  2. M. Bostock, V. Ogievetsky, and J. Heer. D³ Data-Driven Documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301–2309, 2011. doi: 10.1109/TVCG.2011.185

  3. Q. Chen, Y. Chen, R. Zou, W. Shuai, Y. Guo, J. Wang, and N. Cao. Chart2Vec: A Universal Embedding of Context-Aware Visualizations. IEEE Transactions on Visualization and Computer Graphics, 31(4):2167–2181, 2025. doi: 10.1109/TVCG.2024.3383089

  4. T. Dang, H. N. Nguyen, and V. Pham. WordStream: Interactive Visualization for Topic Evolution. In EuroVis 2019 - Short Papers. The Eurographics Association, 2019. doi: 10.2312/evs.20191178

  5. E. Dimara and C. Perin. What is Interaction for Data Visualization? IEEE Transactions on Visualization and Computer Graphics, 26(1):119–129, 2020. doi: 10.1109/TVCG.2019.2934283

  6. S. L. Franconeri, L. M. Padilla, P. Shah, J. M. Zacks, and J. Hullman. The Science of Visual Data Communication: What Works. Psychological Science in the Public Interest, 22(3):110–161, 2021. doi: 10.1177/15291006211051956

  7. E. Hoque and M. Agrawala. Searching the Visual Style and Structure of D3 Visualizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1236–1245, 2020. doi: 10.1109/TVCG.2019.2934431

  8. D. Keim, G. Andrienko, J.-D. Fekete, C. Görg, J. Kohlhammer, and G. Melançon. Visual Analytics: Definition, Process, and Challenges. In Information Visualization: Human-Centered Issues and Perspectives, pp. 154–175. Springer, Berlin, Heidelberg, 2008. doi: 10.1007/978-3-540-70956-5_7

  9. H. Li, Y. Wang, A. Wu, H. Wei, and H. Qu. Structure-aware Visualization Retrieval. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. ACM, New York, 2022. doi: 10.1145/3491102.3502048

  10. A. Lundgard and A. Satyanarayan. Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content. IEEE Transactions on Visualization and Computer Graphics, 28(1):1073–1083, 2022. doi: 10.1109/TVCG.2021.3114770

  11. S. L’Yi, Q. Wang, F. Lekschas, and N. Gehlenborg. Gosling: A Grammar-based Toolkit for Scalable and Interactive Genomics Data Visualization. IEEE Transactions on Visualization and Computer Graphics, 28(1):140–150, 2022. doi: 10.1109/TVCG.2021.3114876

  12. C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, USA, 2008.

  13. M. Mauri, T. Elli, G. Caviglia, G. Uboldi, and M. Azzi. RAWGraphs: A Visualisation Platform to Create Open Outputs. In Proceedings of the 12th Biannual Conference on Italian SIGCHI Chapter, CHItaly ’17. ACM, New York, 2017. doi: 10.1145/3125571.3125585

  14. M. Meyer, M. Sedlmair, and T. Munzner. The Four-Level Nested Model Revisited: Blocks and Guidelines. In Proceedings of the 2012 BELIV Workshop, BELIV ’12. ACM, New York, 2012. doi: 10.1145/2442576.2442587

  15. T. Munzner. A Nested Model for Visualization Design and Validation. IEEE Transactions on Visualization and Computer Graphics, 15(6):921–928, 2009. doi: 10.1109/TVCG.2009.111

  16. H. N. Nguyen, F. Abri, V. Pham, M. Chatterjee, A. S. Namin, and T. Dang. MalView: Interactive Visual Analytics for Comprehending Malware Behavior. IEEE Access, 10:99909–99930, 2022. doi: 10.1109/ACCESS.2022.3207782

  17. H. N. Nguyen, T. Dang, and K. A. Bowe. WordStream Maker: A Lightweight End-to-end Visualization Platform for Qualitative Time-series Data. In NLVIZ: Exploring Research Opportunities for Natural Language, Text, and Data Visualization Workshop, 2022.

  18. H. N. Nguyen, S. L’Yi, T. C. Smits, S. Gao, M. Zitnik, and N. Gehlenborg. Multimodal Retrieval of Genomics Data Visualizations, 2025. doi: 10.31219/osf.io/zatw9_v1

  19. H. N. Nguyen, C. M. Trujillo, K. Wee, and K. A. Bowe. Interactive Qualitative Data Visualization for Educational Assessment. In Proceedings of the 12th International Conference on Advances in Information Technology, IAIT ’21. ACM, New York, 2021. doi: 10.1145/3468784.3469851

  20. M. Oppermann, R. Kincaid, and T. Munzner. VizCommender: Computing Text-Based Similarity in Visualization Repositories for Content-Based Recommendations. IEEE Transactions on Visualization and Computer Graphics, 27(2):495–505, 2021. doi: 10.1109/TVCG.2020.3030387

  21. A. V. Pandey, J. Krause, C. Felix, J. Boy, and E. Bertini. Towards Understanding Human Similarity Perception in the Analysis of Large Sets of Scatter Plots. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, pp. 3659–3669. ACM, New York, 2016. doi: 10.1145/2858036.2858155

  22. B. Saleh, M. Dontcheva, A. Hertzmann, and Z. Liu. Learning Style Similarity for Searching Infographics. In Proceedings of the 41st Graphics Interface Conference, GI ’15, pp. 59–64. Canadian Information Processing Society, CAN, 2015.

  23. A. Satyanarayan, D. Moritz, K. Wongsuphasawat, and J. Heer. Vega-Lite: A Grammar of Interactive Graphics. IEEE Transactions on Visualization and Computer Graphics, 23(1):341–350, 2017. doi: 10.1109/TVCG.2016.2599030

  24. V. Setlur, A. Kanyuka, and A. Srinivasan. Olio: A Semantic Search Interface for Data Repositories. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, UIST ’23. ACM, New York, 2023. doi: 10.1145/3586183.3606806

  25. T. C. Smits, S. L’Yi, A. P. Mar, and N. Gehlenborg. AltGosling: Automatic Generation of Text Descriptions for Accessible Genomics Data Visualization. Bioinformatics, 40(12):btae670, 11 2024. doi: 10.1093/bioinformatics/btae670

  26. T. C. Smits, S. L’Yi, H. N. Nguyen, A. P. Mar, and N. Gehlenborg. Explaining Unfamiliar Genomics Data Visualizations to a Blind Individual through Transitions. In 2024 1st Workshop on Accessible Data Visualization (AccessViz), pp. 24–28, 2024. doi: 10.1109/AccessViz64636.2024.00010

  27. M. Sun, Y. Ma, Y. Wang, T. Li, J. Zhao, Y. Liu, and P.-S. Zhong. Toward Systematic Considerations of Missingness in Visual Analytics. In 2022 IEEE Visualization and Visual Analytics (VIS), pp. 110–114, 2022. doi: 10.1109/VIS54862.2022.00031

  28. S. Vaidya and A. Dasgupta. Knowing what to look for: A Fact-Evidence Reasoning Framework for Decoding Communicative Visualization. In 2020 IEEE Visualization Conference (VIS), pp. 231–235, 2020. doi: 10.1109/VIS47514.2020.00053

  29. A. van den Brandt, S. L’Yi, H. N. Nguyen, A. Vilanova, and N. Gehlenborg. Understanding Visualization Authoring Techniques for Genomics Data in the Context of Personas and Tasks. IEEE Transactions on Visualization and Computer Graphics, 31(1):1180–1190, 2025. doi: 10.1109/TVCG.2024.3456298

  30. S. Xiao, Y. Hou, C. Jin, and W. Zeng. WYTIWYR: A User Intent-Aware Framework with Multi-modal Inputs for Visualization Retrieval. In Computer Graphics Forum, vol. 42, pp. 311–322. Wiley Online Library, 2023. doi: 10.1111/cgf.14832

  31. Y. Ye, R. Huang, and W. Zeng. VISAtlas: An Image-Based Exploration and Query System for Large Visualization Collections via Neural Image Embedding. IEEE Transactions on Visualization and Computer Graphics, 30(7):3224–3240, 2024. doi: 10.1109/TVCG.2022.3229023

  32. L. Ying, A. Wu, H. Li, Z. Deng, J. Lan, J. Wu, Y. Wang, H. Qu, D. Deng, and Y. Wu. VAID: Indexing View Designs in Visual Analytics System. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, CHI ’24. ACM, New York, 2024. doi: 10.1145/3613904.3642237

  33. J. Zhao, M. Fan, and M. Feng. ChartSeer: Interactive Steering Exploratory Visual Analysis With Machine Intelligence. IEEE Transactions on Visualization and Computer Graphics, 28(3):1500–1513, 2022. doi: 10.1109/TVCG.2020.3018724