VERA: Generating Visual Explanations of Two-Dimensional Embeddings via Region Annotation
Pith reviewed 2026-05-23 23:57 UTC · model grok-4.3
The pith
VERA automatically generates static visual explanations for 2D embeddings by annotating informative regions with human-interpretable features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VERA identifies informative regions in the embedding space and associates them with user-provided human-interpretable features, producing concise visual annotations that summarize the structure of the embedding landscape at a glance. Rather than merely showing where feature values occur, VERA automatically filters, merges, and ranks candidate explanations, enabling users to focus on the most informative embedding structures without manual exploration.
What carries the argument
Automatic identification, filtering, merging, and ranking of region annotations based on user-provided features to explain embedding patterns.
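The summary does not say how VERA finds regions or ties them to features, and the referee report flags the same gap. The sketch below is purely illustrative and is not the paper's method: the function name `feature_regions`, the grid binning, and the thresholds `min_count` and `min_ratio` are all assumptions. It shows one simple way to "identify informative regions" for a single binary feature. It bins the 2D embedding into a grid, keeps the cells where the feature is over-represented relative to its global rate, and joins adjacent kept cells into regions.

```python
from collections import defaultdict


def feature_regions(points, has_feature, grid=10, min_count=3, min_ratio=1.5):
    """Return connected groups of grid cells where a binary feature is enriched.

    Hypothetical sketch, not VERA's published procedure: `grid`, `min_count`
    and `min_ratio` are illustrative thresholds.
    points: list of (x, y) embedding coordinates.
    has_feature: list of bools, one per point.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs), min(ys)
    # Cell width and height; fall back to 1.0 if all points share a coordinate.
    wx = (max(xs) - x0) / grid or 1.0
    wy = (max(ys) - y0) / grid or 1.0

    # Bin point indices into cells (points on the max edge go in the last cell).
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cx = min(int((x - x0) / wx), grid - 1)
        cy = min(int((y - y0) / wy), grid - 1)
        cells[(cx, cy)].append(i)

    base_rate = sum(has_feature) / len(has_feature)
    if base_rate == 0:
        return []

    # A cell is "enriched" if it holds enough feature carriers and their local
    # share exceeds the global share by at least `min_ratio`.
    enriched = set()
    for cell, idx in cells.items():
        hits = sum(has_feature[i] for i in idx)
        if hits >= min_count and (hits / len(idx)) / base_rate >= min_ratio:
            enriched.add(cell)

    # Flood-fill 4-connected enriched cells into regions of point indices.
    regions, seen = [], set()
    for start in enriched:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            cx, cy = stack.pop()
            members.extend(cells[(cx, cy)])
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in enriched and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        regions.append(sorted(members))
    return sorted(regions)
```

With the default `min_ratio=1.5`, a feature shared by every point yields no region, because no cell is over-represented relative to the global rate. This is the sense in which such a region is "informative" rather than merely showing where a feature value occurs.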
If this is right
- Users gain understanding of clusters and patterns in embeddings with reduced manual effort.
- Static explanations can replace or supplement interactive data mining for exploratory tasks.
- Analyzing visualizations of high-dimensional data takes less time.
- The approach supports typical EDA tasks by highlighting the structures that matter most.
Where Pith is reading between the lines
- VERA might be combined with machine learning to suggest relevant features when user input is limited.
- Similar region-based annotation could apply to other visualization types like graphs or 3D embeddings.
- Testing on larger datasets could reveal scalability limits of the automatic ranking process.
Load-bearing premise
The method assumes that user-provided human-interpretable features are sufficient to label the most informative regions and that the automatic filtering, merging, and ranking steps will surface the structures that matter most without missing key patterns.
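The premise depends on how filtering, merging, and ranking behave, and the summary does not give their criteria. The sketch below shows one hypothetical way to chain the three steps over candidate explanations; it is not VERA's actual criteria. The tuple layout `(labels, points, score)`, the minimum-size filter, Jaccard-based merging, and the `top_k` cut-off are all assumptions. It also makes the failure mode named in the premise concrete.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of point indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_merge_rank(candidates, min_size=5, merge_at=0.5, top_k=4):
    """Hypothetical filter -> merge -> rank pass over candidate explanations.

    candidates: list of (labels, points, score), where labels is a tuple of
    feature names, points a frozenset of point indices, and score a float.
    """
    # Filter: drop candidates covering too few points.
    kept = [c for c in candidates if len(c[1]) >= min_size]

    # Merge: visit candidates from best to worst score and fold each one into
    # the first already-kept explanation whose points overlap it enough.
    merged = []
    for labels, pts, score in sorted(kept, key=lambda c: -c[2]):
        for j, (m_labels, m_pts, m_score) in enumerate(merged):
            if jaccard(pts, m_pts) >= merge_at:
                merged[j] = (m_labels + labels, m_pts | pts, max(m_score, score))
                break
        else:
            merged.append((labels, pts, score))

    # Rank: highest score first, keep only the top_k explanations.
    merged.sort(key=lambda c: -c[2])
    return merged[:top_k]
```

In this sketch, a small but high-scoring candidate (for example, a two-point outlier group) is discarded by the size filter before ranking ever sees it. That is exactly the kind of silent omission the premise assumes does not happen.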
What would settle it
An experiment showing that VERA misses important embedding structures identified by domain experts, or that users complete EDA tasks more slowly with VERA than with interactive tools.
Original abstract
Two-dimensional embeddings obtained from dimensionality reduction techniques, such as MDS, t-SNE, or UMAP, are widely used to visualize high-dimensional data and support researchers in visually identifying clusters, outliers, and other interesting patterns in the data. However, the main challenge is not only to detect such patterns, but also to explain what they represent in terms of the original, human-interpretable features of the data. Existing approaches often rely on interactive exploration or direct feature encodings, requiring substantial manual inspection that can be time-consuming and repetitive. As an alternative, we propose VERA (Visual Explanations via Region Annotation), a general-purpose method for explaining two-dimensional embeddings through automatically generated, static, region-based visual explanations. VERA identifies informative regions in the embedding space and associates them with user-provided human-interpretable features, producing concise visual annotations that summarize the structure of the embedding landscape at a glance. Rather than merely showing where feature values occur, VERA automatically filters, merges, and ranks candidate explanations, enabling users to focus on the most informative embedding structures without manual exploration. We demonstrate VERA's utility on several real-world datasets and evaluate its effectiveness in a user study comparing it with a comprehensive interactive data mining toolkit. Our results show that VERA's generated static explanations can convey the essential insights of complex embeddings and support users in typical exploratory data analysis tasks, while requiring significantly less time and user effort.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces VERA, a general-purpose method for producing static visual explanations of 2D embeddings (from MDS, t-SNE, UMAP, etc.) by automatically detecting informative regions, associating them with user-supplied human-interpretable features, and applying filtering/merging/ranking steps to generate concise annotations. It demonstrates the approach on real-world datasets and reports a user study claiming that the resulting static explanations convey essential insights into clusters/outliers while requiring significantly less time and effort than a comprehensive interactive data-mining toolkit.
Significance. If the algorithmic details and user-study evidence hold, VERA would offer a practical alternative to interactive exploration for embedding analysis, potentially lowering the barrier for non-expert users. The paper receives credit for including both real-dataset demonstrations and a comparative user study, which directly addresses a common pain point in exploratory data analysis.
major comments (3)
- [Abstract and §3, Method] No description is given of the region-detection procedure, the concrete criteria or thresholds used for automatic filtering/merging/ranking of candidate explanations, or the precise mechanism that associates regions with user-provided features. Without these, the central claim that the generated annotations reliably surface the most informative structures cannot be evaluated or reproduced.
- [§5, User Study] The headline result that VERA reduces time and user effort is stated without reporting participant count, task design, statistical tests, p-values, effect sizes, or any analysis of variance. This absence makes it impossible to assess whether the evidence supports the claim that static explanations are sufficient for typical EDA tasks.
- [§4–5, Evaluation] The method’s correctness rests on the untested assumption that the supplied human-interpretable features plus the automatic steps will capture all structures an analyst would discover interactively. No ablation removing features, no failure-case examples, and no comparison against ground-truth structures are provided, leaving open the possibility that key patterns are systematically omitted.
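The second major comment asks for statistical tests, p-values, and effect sizes. The sketch below computes two of those quantities for completion times in two independent groups (for example, VERA versus the interactive toolkit), using only the Python standard library. It assumes a between-subjects design; a within-subjects study would need a paired test instead. A permutation test stands in for the t-test mentioned in the rebuttal because the standard library has no t distribution. With SciPy available, `scipy.stats.ttest_ind` (with `equal_var=False` for Welch's variant) would be the usual choice. The function names are hypothetical.

```python
import random
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_var ** 0.5


def permutation_p(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign observations to the two groups and recompute the gap.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself and keep p above zero.
    return (hits + 1) / (n_perm + 1)
```

Reporting the group sizes, the p-value, and `cohens_d` together would address the comment's request for participant count, significance, and effect size in one place.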
minor comments (1)
- [Abstract] The phrase “several real-world datasets” is used without naming the datasets or citing their sources.
Simulated Author's Rebuttal
We thank the referee for the thorough review and constructive feedback. We address each of the major comments below, indicating the revisions we will make to improve the manuscript.
Point-by-point responses
Referee: [Abstract and §3, Method] No description is given of the region-detection procedure, the concrete criteria or thresholds used for automatic filtering/merging/ranking of candidate explanations, or the precise mechanism that associates regions with user-provided features. Without these, the central claim that the generated annotations reliably surface the most informative structures cannot be evaluated or reproduced.
Authors: We agree that additional details are necessary for reproducibility. Although the method section describes the high-level procedure, specific implementation details such as the region detection algorithm, thresholds for filtering and merging, and the exact association mechanism were not sufficiently elaborated. In the revised manuscript, we will expand §3 with a precise description of these components, including the criteria used, any thresholds, and pseudocode for the key steps. This will allow readers to evaluate and reproduce the central claims. (Revision: yes.)
Referee: [§5, User Study] The headline result that VERA reduces time and user effort is stated without reporting participant count, task design, statistical tests, p-values, effect sizes, or any analysis of variance. This absence makes it impossible to assess whether the evidence supports the claim that static explanations are sufficient for typical EDA tasks.
Authors: We acknowledge the lack of detailed statistical reporting in the user study section. The study was designed to compare VERA with an interactive toolkit, but the manuscript does not include the requested details. We will revise §5 to report the participant count, provide a full description of the task design, and include appropriate statistical analyses such as t-tests or ANOVA with p-values and effect sizes to substantiate the claims about reduced time and effort. (Revision: yes.)
Referee: [§4–5, Evaluation] The method’s correctness rests on the untested assumption that the supplied human-interpretable features plus the automatic steps will capture all structures an analyst would discover interactively. No ablation removing features, no failure-case examples, and no comparison against ground-truth structures are provided, leaving open the possibility that key patterns are systematically omitted.
Authors: The user study provides some validation that the generated explanations convey essential insights, as evidenced by user performance. However, we agree that more rigorous evaluation is warranted. We will add ablation studies on the role of user-provided features, include examples of potential failure cases, and, where possible, compare against known ground-truth structures in the datasets used. This will address the concern about systematic omissions. (Revision: yes.)
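The promised comparison against ground-truth structures could take several forms. The sketch below shows one hypothetical version; the function names, the dictionary inputs, and the `min_jaccard=0.5` threshold are assumptions, not taken from the paper. For each known group (for example, class labels in a benchmark dataset), it finds the best-overlapping annotated region and flags groups that no region matches well.

```python
def coverage_report(regions, ground_truth, min_jaccard=0.5):
    """Match each ground-truth group to its best-overlapping annotated region.

    regions: dict mapping region name -> set of point indices.
    ground_truth: dict mapping group name -> set of point indices.
    Returns: dict group name -> (best region or None, Jaccard score, matched?).
    """
    report = {}
    for g_name, g_pts in ground_truth.items():
        best_name, best_j = None, 0.0
        for r_name, r_pts in regions.items():
            union = g_pts | r_pts
            j = len(g_pts & r_pts) / len(union) if union else 0.0
            if j > best_j:
                best_name, best_j = r_name, j
        report[g_name] = (best_name, best_j, best_j >= min_jaccard)
    return report


def missed(report):
    """Names of ground-truth groups with no sufficiently overlapping region."""
    return sorted(g for g, (_, _, found) in report.items() if not found)
```

For a feature ablation, the same report could be recomputed after regenerating the annotations without one user-provided feature at a time. Groups that move into `missed` then indicate which structures depend on that feature.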
Circularity Check
No significant circularity; VERA is a self-contained procedural method
The paper describes VERA as a general-purpose algorithmic procedure: identify informative regions in 2D embeddings, associate them with user-provided human-interpretable features, then apply automatic filtering/merging/ranking to produce static annotations. No equations, fitted parameters, or derivations are presented that reduce to their own inputs by construction. No self-citations appear as load-bearing premises for uniqueness, ansatzes, or theorems. The central claims rest on demonstrations across datasets and a user study comparing static output to an interactive toolkit; these evaluations are independent of any prior author results. This matches the default case of a non-circular methods paper.