pith. machine review for the scientific record.

arxiv: 2604.07989 · v1 · submitted 2026-04-09 · 💻 cs.IR · cs.AI

Recognition: 2 theorem links · Lean Theorem

Show Me the Infographic I Imagine: Intent-Aware Infographic Retrieval for Authoring Support

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:05 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords: infographic retrieval · intent taxonomy · query enrichment · design authoring · visual design facets · exemplar adaptation · user intent modeling

The pith

An intent taxonomy from user descriptions enriches natural language queries to retrieve infographics that better match design goals for authoring reuse.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that infographic retrieval can be made more effective by first studying how people describe these visuals and building a taxonomy of their intents around content and visual design. This taxonomy then adds targeted cues to otherwise vague user queries, steering the search toward exemplars whose text-heavy, multi-component layouts actually align with what the user has in mind. Standard keyword searches and general image models fall short here because they ignore the layered nature of infographics. If the approach works, novice authors gain quicker access to reusable designs they can adapt to their own data, lowering the effort needed to produce data-driven stories.

Core claim

The authors present an intent-aware retrieval framework in which a taxonomy of content and visual-design intents, obtained through a formative study of user descriptions, is used to enrich and refine free-form queries. This enriched query then guides retrieval from a large infographic corpus so that the returned exemplars more closely satisfy the user's design intent. The framework further supports authoring by letting users express high-level edit intents that an interactive agent translates into low-level adaptations of the retrieved designs.
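To make the enrichment step concrete, the following is a minimal sketch of taxonomy-driven query rewriting. The facet names and cue vocabularies are invented placeholders (the paper states only that the facets span content and visual design), and the keyword matcher stands in for the LLM- and study-driven parsing the framework would actually use.

    # Minimal sketch of taxonomy-driven query enrichment. Facet names, cue lists,
    # and the keyword matcher are illustrative assumptions, not the paper's method.
    from dataclasses import dataclass, field

    # Hypothetical intent taxonomy: facet -> cue vocabulary observed in a formative study.
    TAXONOMY = {
        "topic":      ["health", "finance", "climate", "sports"],
        "chart_type": ["bar", "pie", "timeline", "pictogram"],
        "layout":     ["grid", "vertical flow", "radial", "multi-panel"],
        "color":      ["pastel", "monochrome", "high-contrast", "brand colors"],
    }

    @dataclass
    class EnrichedQuery:
        raw: str
        facet_rewrites: dict = field(default_factory=dict)  # facet -> intent-specific rewrite

    def enrich(query: str) -> EnrichedQuery:
        """Rewrite a vague free-form query into one cue-bearing sub-query per facet."""
        q = query.lower()
        enriched = EnrichedQuery(raw=query)
        for facet, cues in TAXONOMY.items():
            hits = [c for c in cues if c in q]
            # Each facet rewrite foregrounds the matched cues so a downstream
            # retriever can score that facet separately.
            enriched.facet_rewrites[facet] = (
                f"{facet}: {', '.join(hits)}" if hits else f"{facet}: unspecified"
            )
        return enriched

    print(enrich("a pastel timeline infographic about climate milestones").facet_rewrites)

In the full pipeline each facet rewrite would feed a facet-specific retrieval head rather than being printed.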

What carries the argument

The intent taxonomy, which spans content and visual-design facets and is applied to enrich free-form user queries with intent-specific cues that guide retrieval.

If this is right

  • Retrieval quality exceeds that of baseline keyword and general vision-language methods on quantitative measures.
  • Retrieved exemplars more closely satisfy the user's stated design intent across content and visual facets.
  • Users complete infographic authoring tasks more efficiently by adapting retrieved designs rather than creating from scratch.
  • An interactive agent translates high-level edit intents into concrete low-level changes on the selected infographic (a rough sketch of this step follows the list).
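The last bullet is the most system-dependent claim. As a rough illustration only, the sketch below maps two hypothetical edit intents onto direct SVG attribute edits; the intent names, the SVG structure, and the dispatch logic are assumptions for this example, not the paper's agent, which would plan such low-level operations itself.

    # Toy illustration of "high-level intent -> low-level adaptation" on an SVG exemplar.
    # The intent names and SVG structure are assumed for the sketch.
    import xml.etree.ElementTree as ET

    SVG = '<svg xmlns="http://www.w3.org/2000/svg"><text id="title" font-size="18">Old title</text></svg>'
    NS = {"svg": "http://www.w3.org/2000/svg"}

    def apply_intent(svg_text: str, intent: str, value: str) -> str:
        root = ET.fromstring(svg_text)
        title = root.find("svg:text[@id='title']", NS)
        if intent == "replace_title":          # high-level: "use my own headline"
            title.text = value                 # low-level: rewrite the text node
        elif intent == "emphasize_title":      # high-level: "make the title stand out"
            title.set("font-size", value)      # low-level: bump a presentation attribute
        return ET.tostring(root, encoding="unicode")

    print(apply_intent(SVG, "replace_title", "Quarterly emissions fell 12%"))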

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same taxonomy-driven enrichment could be tested on retrieval for other multi-element visuals such as dashboards or posters.
  • The enriched intent representation might serve as a bridge to generative models that create new infographics directly from the refined description.
  • Long-term use would require periodic checks to see whether new infographic styles require updates to the taxonomy categories.

Load-bearing premise

That the taxonomy derived from the formative study will apply to diverse future user queries, and that adding intent-specific cues will reliably improve retrieval performance on the text-heavy, composite structure of infographics.

What would settle it

A controlled benchmark in which intent-enriched queries produce no improvement, or a decline, in retrieval precision or user-rated intent satisfaction compared with plain keyword search or standard vision-language models on a held-out set of infographic queries.
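Such a benchmark amounts to running the same held-out queries under a plain-query condition and an intent-enriched condition and comparing standard ranking metrics. A minimal harness with toy relevance labels and hypothetical document IDs might look like the following (Recall@K and nDCG are also the metrics the referee report asks to see):

    # Minimal evaluation harness for the head-to-head test described above.
    # Relevance labels, document IDs, and rankings are toy placeholders.
    import math

    def recall_at_k(ranked_ids, relevant_ids, k):
        hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
        return hits / max(len(relevant_ids), 1)

    def ndcg_at_k(ranked_ids, relevance, k):
        """relevance: dict doc_id -> graded relevance (0 = irrelevant)."""
        dcg = sum(relevance.get(doc, 0) / math.log2(i + 2) for i, doc in enumerate(ranked_ids[:k]))
        ideal = sorted(relevance.values(), reverse=True)[:k]
        idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
        return dcg / idcg if idcg > 0 else 0.0

    # One query, ground-truth relevance over a small corpus, two ranking conditions.
    relevance = {"info_07": 2, "info_31": 1, "info_12": 1}
    plain_rank    = ["info_55", "info_31", "info_90", "info_07", "info_12"]
    enriched_rank = ["info_07", "info_12", "info_31", "info_55", "info_90"]

    for name, rank in [("plain", plain_rank), ("enriched", enriched_rank)]:
        print(name, round(recall_at_k(rank, relevance, 3), 2), round(ndcg_at_k(rank, relevance, 3), 2))

Averaging these scores over a held-out query set, per condition, is the comparison the falsification test calls for.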

Figures

Figures reproduced from arXiv: 2604.07989 by Jiarui Hu, Jing Xu, Weikai Yang, Yiyun Chen, Zhihao Shuai.

Figure 1. Overview of our intent-aware retrieve-and-adapt workflow for infographic authoring. (a) A natural-language query is parsed into …
Figure 2. Four infographic exemplars used in the formative study to elicit …
Figure 3. Facet patterns in participants' queries. (a) A facet co-occurrence matrix computed from the natural language queries and keyword queries. (b) An annotated example query illustrates how a single query can contain multiple facet signals, motivating facet-specific rewriting in retrieval.
Figure 4. Overview of our intent-aware infographic retrieval framework. Top (training): ChartGalaxy images are described by a multimodal LLM and distilled into short facet-specific captions (two per facet). We fine-tune the full model (text encoder, image encoder, and facet MLP heads) with a multi-facet in-batch contrastive loss (Eq. 5). Bottom (inference): a user query q is parsed into facet rewrites {q_f}, facet we…
Figure 5. System UI for conversational exemplar retrieval and SVG-based adaptation. (A) Chat panel for intent articulation, assistant feedback, and iterative refinement; (B) committed infographic exemplars that persist as references during adaptation; (C) rendered SVG outputs with version history; and (D) retrieval panel for exemplar discovery and commitment.
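The Figure 4 caption implies a two-stage design: per-facet text and image embeddings trained with a multi-facet in-batch contrastive loss, then inference that parses the query into facet rewrites and combines per-facet similarities. The sketch below shows only the inference-time scoring shape; the facet list, embedding dimension, and fixed facet weights are placeholders, and neither Eq. 5 nor the learned weighting is reproduced.

    # Sketch of facet-weighted retrieval scoring at inference time, under assumed
    # shapes: one embedding per facet for each corpus infographic, one embedding
    # per facet rewrite q_f for the query, and placeholder facet weights.
    import numpy as np

    rng = np.random.default_rng(0)
    facets = ["content", "chart_type", "layout", "color"]   # illustrative facet set
    n_docs, dim = 1000, 64

    doc_emb = {f: rng.normal(size=(n_docs, dim)) for f in facets}   # per-facet image embeddings
    query_emb = {f: rng.normal(size=dim) for f in facets}           # facet rewrite embeddings q_f
    facet_w = {"content": 0.4, "chart_type": 0.3, "layout": 0.2, "color": 0.1}  # assumed weights

    def cosine(matrix, vec):
        matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
        vec = vec / np.linalg.norm(vec)
        return matrix @ vec

    # Final score is a weighted sum of per-facet cosine similarities; top hits are returned.
    scores = sum(facet_w[f] * cosine(doc_emb[f], query_emb[f]) for f in facets)
    top10 = np.argsort(-scores)[:10]
    print("top exemplar indices:", top10)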
read the original abstract

While infographics have become a powerful medium for communicating data-driven stories, authoring them from scratch remains challenging, especially for novice users. Retrieving relevant exemplars from a large corpus can provide design inspiration and promote reuse, substantially lowering the barrier to infographic authoring. However, effective retrieval is difficult because users often express design intent in ambiguous natural language, while infographics embody rich and multi-faceted visual designs. As a result, keyword-based search often fails to capture design intent, and general-purpose vision-language retrieval models trained on natural images are ill-suited to the text-heavy, multi-component nature of infographics. To address these challenges, we develop an intent-aware infographic retrieval framework that better aligns user queries with infographic designs. We first conduct a formative study of how people describe infographics and derive an intent taxonomy spanning content and visual design facets. This taxonomy is then leveraged to enrich and refine free-form user queries, guiding the retrieval process with intent-specific cues. Building on the retrieved exemplars, users can adapt the designs to their own data with high-level edit intents, supported by an interactive agent that performs low-level adaptation. Both quantitative evaluations and user studies are conducted to demonstrate that our method improves retrieval quality over baseline methods while better supporting intent satisfaction and efficient infographic authoring.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces an intent-aware infographic retrieval framework for authoring support. It conducts a formative study to derive a taxonomy of user intents spanning content and visual design facets, then uses this taxonomy to enrich and refine free-form natural-language queries with intent-specific cues. The enriched queries drive retrieval of exemplars from a corpus, after which an interactive agent assists users in adapting the designs to their own data via high-level edit intents. Quantitative evaluations and user studies are reported to demonstrate gains over baseline methods in retrieval quality, intent satisfaction, and authoring efficiency.

Significance. If the taxonomy generalizes and the enrichment step produces measurable lifts in alignment for text-heavy, multi-component infographics, the work could meaningfully advance retrieval techniques for complex visual documents and lower barriers for novice infographic authors. The combination of taxonomy-driven query refinement with agent-supported adaptation offers a concrete, implementable pipeline that integrates IR methods with HCI-style authoring support. However, the significance is currently limited by the absence of reported study scales, coverage statistics, ablation results, and concrete metrics in the provided description.

major comments (3)
  1. [Formative Study / Intent Taxonomy] Formative study and taxonomy section: no participant count, query volume, derivation procedure, or coverage statistics are supplied for the intent taxonomy. Because the central claim rests on the taxonomy both spanning real user intents and enabling reliable cue enrichment for infographic-specific structure, the lack of these details leaves the generalization assumption unverified.
  2. [Quantitative Evaluations] Quantitative evaluations section: the manuscript states that the method 'improves retrieval quality over baseline methods' but supplies no description of the baselines, the metrics employed (e.g., recall@K, nDCG), dataset characteristics, sample sizes, or any ablation isolating the contribution of intent-specific cue enrichment. Without these, the reported gains cannot be assessed for statistical robustness or attribution to the proposed components.
  3. [User Studies] User studies section: the abstract claims better support for 'intent satisfaction and efficient infographic authoring,' yet no details on study design, task measures, statistical tests, or comparison conditions are visible. This is load-bearing for the authoring-support claim.
minor comments (1)
  1. [Abstract] Abstract: key numerical outcomes (effect sizes, significance levels, or even qualitative summary statistics) from the quantitative evaluations and user studies are omitted, reducing the abstract's utility as a standalone summary.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We sincerely thank the referee for the constructive and detailed feedback. The comments highlight important areas where additional transparency is needed to support our claims. We have revised the manuscript to address each point by expanding the relevant sections with the requested methodological details, which strengthens the paper without altering its core contributions.

read point-by-point responses
  1. Referee: [Formative Study / Intent Taxonomy] Formative study and taxonomy section: no participant count, query volume, derivation procedure, or coverage statistics are supplied for the intent taxonomy. Because the central claim rests on the taxonomy both spanning real user intents and enabling reliable cue enrichment for infographic-specific structure, the lack of these details leaves the generalization assumption unverified.

    Authors: We agree that these details are critical for establishing the taxonomy's validity and generalizability. In the revised manuscript, we have expanded the formative study section to include the participant count and recruitment criteria, the total volume of queries collected, a clear description of the derivation procedure (including the analysis method and inter-rater reliability measures), and coverage statistics showing the proportion of observed intents captured by the taxonomy. We have also added illustrative examples of query enrichment. This revision directly verifies the assumptions underlying our approach. revision: yes

  2. Referee: [Quantitative Evaluations] Quantitative evaluations section: the manuscript states that the method 'improves retrieval quality over baseline methods' but supplies no description of the baselines, the metrics employed (e.g., recall@K, nDCG), dataset characteristics, sample sizes, or any ablation isolating the contribution of intent-specific cue enrichment. Without these, the reported gains cannot be assessed for statistical robustness or attribution to the proposed components.

    Authors: We apologize for the lack of specificity in this section. We have substantially revised the quantitative evaluations section to describe the baseline methods in detail, specify the evaluation metrics (including Recall@K and nDCG), characterize the dataset (size, source, and composition), report sample sizes and query counts, and present ablation results that isolate the contribution of the intent-specific cue enrichment step. Statistical significance of the observed improvements is now included. These changes allow readers to assess the robustness of the gains and attribute them appropriately to our proposed components. revision: yes

  3. Referee: [User Studies] User studies section: the abstract claims better support for 'intent satisfaction and efficient infographic authoring,' yet no details on study design, task measures, statistical tests, or comparison conditions are visible. This is load-bearing for the authoring-support claim.

    Authors: We acknowledge that the user studies section required more elaboration to support the authoring-support claims. In the revised version, we have added a full description of the study design (including participant details and procedure), the specific task measures for intent satisfaction and authoring efficiency, the statistical tests employed, and the comparison conditions (our system versus relevant baselines). Quantitative results with appropriate analysis are now presented to substantiate the improvements in intent satisfaction and authoring efficiency. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external formative study and evaluations

full rationale

The paper's chain begins with a formative study to derive an intent taxonomy, then applies that taxonomy to enrich queries, performs retrieval, and validates via quantitative metrics plus user studies. None of these steps reduce by construction to fitted inputs, self-definitions, or self-citation chains; the taxonomy and performance claims are grounded in new data collection and external benchmarks rather than renaming or re-deriving prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on the assumption that a formative-study-derived taxonomy can be used to meaningfully enrich queries and that user studies will demonstrate practical authoring benefits; no free parameters, invented entities, or formal axioms are specified.

pith-pipeline@v0.9.0 · 5542 in / 1082 out tokens · 42091 ms · 2026-05-10T18:05:37.386226+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 41 canonical work pages · 1 internal anchor

  [1] H. K. Bako, X. Liu, L. Battle, and Z. Liu. Understanding how designers find and use data visualization examples. IEEE Transactions on Visualization and Computer Graphics, 29(1):1048–1058, 2023. doi: 10.1109/TVCG.2022.3209490

  [2] A. J. Chaney, D. M. Blei, and T. Eliassi-Rad. A probabilistic model for using social networks in personalized item recommendation. In Proceedings of the 9th ACM Conference on Recommender Systems, RecSys '15, pp. 43–50. Association for Computing Machinery, New York, NY, USA, 2015. doi: 10.1145/2792838.2800193

  [3–4] Q. Chen, Y. Chen, R. Zou, W. Shuai, Y. Guo, J. Wang, and N. Cao. Chart2Vec: A universal embedding of context-aware visualizations. IEEE Transactions on Visualization and Computer Graphics, 31(4):2167–2181. doi: 10.1109/TVCG.2024.3383089

  [5] W. Cui, J. Wang, H. Huang, Y. Wang, C.-Y. Lin, H. Zhang, and D. Zhang. A mixed-initiative approach to reusing infographic charts. IEEE Transactions on Visualization and Computer Graphics, 28(1):173–183, 2022. doi: 10.1109/TVCG.2021.3114856

  [6] R. Davis, X. Pu, Y. Ding, B. D. Hall, K. Bonilla, M. Feng, M. Kay, and L. Harrison. The risks of ranking: Revisiting graphical perception to model individual differences in visualization performance. IEEE Transactions on Visualization and Computer Graphics, 30(3):1756–1771, 2024. doi: 10.1109/TVCG.2022.3226463

  [7] V. Dibia and C. Demiralp. Data2Vis: Automatic generation of data visualizations using sequence-to-sequence recurrent neural networks. IEEE Computer Graphics and Applications, 39(5):33–46, 2019. doi: 10.1109/MCG.2019.2924636

  [8] S. Elaldi and T. Çifçi. The effectiveness of using infographics on academic achievement: A meta-analysis and a meta-thematic analysis. Journal of Pedagogical Research, 5(4):92–118, 2021.

  [9] G. Guo, S. Das, J. Zhao, and A. Endert. More like vis, less like vis: Comparing interactions for integrating user preferences into partial specification recommenders. IEEE Transactions on Visualization and Computer Graphics, 31(12):10328–10339, 2025. doi: 10.1109/TVCG.2025.3596541

  [10] S. G. Hart and L. E. Staveland. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P. A. Hancock and N. Meshkati, eds., Human Mental Workload, vol. 52 of Advances in Psychology, pp. 139–183. North-Holland, 1988. doi: 10.1016/S0166-4115(08)62386-9

  [11] K. Z. Hu, M. A. Bakker, S. Li, T. Kraska, and C. A. Hidalgo. VizML: A machine learning approach to visualization recommendation. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, article no. 128, pp. 128:1–128:12. Association for Computing Machinery, New York, NY, USA, 2019. doi: 10.1145/3290605.3300358

  [12] C. Jia, Y. Yang, Y. Xia, Y.-T. Chen, Z. Parekh, H. Pham, Q. Le, Y.-H. Sung, Z. Li, and T. Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. In M. Meila and T. Zhang, eds., Proceedings of the 38th International Conference on Machine Learning, vol. 139 of Proceedings of Machine Learning Research, pp. 4904–…

  [13] B. Kovacs, P. O'Donovan, K. Bala, and A. Hertzmann. Context-aware asset search for graphic design. IEEE Transactions on Visualization and Computer Graphics, 25(7):2419–2429, 2019. doi: 10.1109/TVCG.2018.2842734

  [14] B. Lee, S. Srivastava, R. Kumar, R. Brafman, and S. R. Klemmer. Designing with interactive example galleries. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '10, pp. 2257–2266. Association for Computing Machinery, New York, NY, USA, 2010. doi: 10.1145/1753326.1753667

  [15] H. Li, Y. Wang, A. Wu, H. Wei, and H. Qu. Structure-aware visualization retrieval. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, article no. 409, pp. 409:1–409:14. Association for Computing Machinery, New York, NY, USA, 2022. doi: 10.1145/3491102.3502048

  [16] H. Li, Y. Wang, S. Zhang, Y. Song, and H. Qu. KG4Vis: A knowledge graph-based approach for visualization recommendation. IEEE Transactions on Visualization and Computer Graphics, 28(1):195–205, 2022. doi: 10.1109/TVCG.2021.3114863

  [17] Z. Li, D. Li, Y. Guo, X. Guo, B. Li, L. Xiao, S. Qiao, J. Chen, Z. Wu, H. Zhang, X. Shu, and S. Liu. ChartGalaxy: A dataset for infographic chart understanding and generation. In The Fourteenth International Conference on Learning Representations, 2026.

  [18] S. Liu, W. Yang, J. Wang, and J. Yuan. Visualization for Artificial Intelligence. Springer Nature Switzerland, 2025. doi: 10.1007/978-3-031-75340-4

  [19] Z. Liu, J. Thompson, A. Wilson, M. Dontcheva, J. Delorey, S. Grigg, B. Kerr, and J. Stasko. Data Illustrator: Augmenting vector design tools with lazy data binding for expressive visualization authoring. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI '18, pp. 1–13. Association for Computing Machinery, New Yo…

  [20] J. Mackinlay, P. Hanrahan, and C. Stolte. Show Me: Automatic presentation for visual analysis. IEEE Transactions on Visualization and Computer Graphics, 13(6):1137–1144, 2007. doi: 10.1109/TVCG.2007.70594

  [21] D. Moritz, C. Wang, G. L. Nelson, H. Lin, A. M. Smith, B. Howe, and J. Heer. Formalizing visualization design knowledge as constraints: Actionable and extensible models in Draco. IEEE Transactions on Visualization and Computer Graphics, 25(1):438–448, 2019. doi: 10.1109/TVCG.2018.2865240

  [22] A. Narechania, A. Srinivasan, and J. Stasko. NL4DV: A toolkit for generating analytic specifications for data visualization from natural language queries. IEEE Transactions on Visualization and Computer Graphics, 27(2):369–379, 2021. doi: 10.1109/TVCG.2020.3030378

  [23] H. N. Nguyen and N. Gehlenborg. Safire: Similarity framework for visualization retrieval. In 2025 IEEE Visualization and Visual Analytics (VIS), pp. 246–250, 2025. doi: 10.1109/VIS60296.2025.00055

  [24–25] A. Nowak, F. Piccinno, and Y. Altun. Multimodal chart retrieval: A comparison of text, table and image based approaches. In K. Duh, H. Gomez, and S. Bethard, eds., Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 5488–5505. Associ… doi: 10.18653/v1/2024.naacl-long.307

  [26] M. Oppermann, R. Kincaid, and T. Munzner. VizCommender: Computing text-based similarity in visualization repositories for content-based recommendations. IEEE Transactions on Visualization and Computer Graphics, 27(2):495–505, 2021. doi: 10.1109/TVCG.2020.3030387

  [27] C. Qian, S. Sun, W. Cui, J.-G. Lou, H. Zhang, and D. Zhang. Retrieve-Then-Adapt: Example-based automatic generation for proportion-related infographics. IEEE Transactions on Visualization and Computer Graphics, 27(2):443–452, 2021. doi: 10.1109/TVCG.2020.3030448

  [28] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision. In M. Meila and T. Zhang, eds., Proceedings of the 38th International Conference on Machine Learning, vol. 139 of Proceedings of Machine Learning Re…

  [29] D. Ren, B. Lee, and M. Brehmer. Charticulator: Interactive construction of bespoke chart layouts. IEEE Transactions on Visualization and Computer Graphics, 25(1):789–799, 2019. doi: 10.1109/TVCG.2018.2865158

  [30] B. Saleh, M. Dontcheva, A. Hertzmann, and Z. Liu. Learning style similarity for searching infographics. In Proceedings of the 41st Graphics Interface Conference, GI '15, pp. 59–64. Canadian Information Processing Society, CAN, 2015.

  [31] A. Satyanarayan, D. Moritz, K. Wongsuphasawat, and J. Heer. Vega-Lite: A grammar of interactive graphics. IEEE Transactions on Visualization and Computer Graphics, 23(1):341–350, 2017. doi: 10.1109/TVCG.2016.2599030

  [32] Z. Shuai, B. Li, S. Yan, Y. Luo, and W. Yang. DeepVis: Bridging natural language and data visualization through step-wise reasoning. IEEE Transactions on Visualization and Computer Graphics, 32(1):868–878, 2026. doi: 10.1109/TVCG.2025.3634645

  [33] K. Son, D. Choi, T. S. Kim, Y.-H. Kim, and J. Kim. GenQuery: Supporting expressive visual search with generative models. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, article no. 180, pp. 180:1–180:19. Association for Computing Machinery, New York, NY, USA, 2024. doi: 10.1145/3613904.3642847

  [34] Y. Song, X. Zhao, R. C.-W. Wong, and D. Jiang. RGVisNet: A hybrid retrieval-generation neural framework towards automatic data visualization generation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD '22, pp. 1646–1655. Association for Computing Machinery, New York, NY, USA, 2022. doi: 10.1145/353467…

  [35] M. Tschannen, A. Gritsenko, X. Wang, M. F. Naeem, I. Alabdulmohsin, N. Parthasarathy, T. Evans, L. Beyer, Y. Xia, B. Mustafa, O. Hénaff, J. Harmsen, A. Steiner, and X. Zhai. SigLIP 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features. arXiv preprint arXiv:2502.14786, 2025.

  [36] A. Tyagi, J. Zhao, P. Patel, S. Khurana, and K. Mueller. Infographics Wizard: Flexible infographics authoring and design exploration. Computer Graphics Forum, 2022. doi: 10.1111/cgf.14527

  [37] C. Wang, J. Thompson, and B. Lee. Data Formulator: AI-powered concept-driven visualization authoring. IEEE Transactions on Visualization and Computer Graphics, 30(1):1128–1138, 2024. doi: 10.1109/TVCG.2023.3326585

  [38–39] Y. Wang, Z. Hou, L. Shen, T. Wu, J. Wang, H. Huang, H. Zhang, and D. Zhang. Towards natural language-based visualization authoring. IEEE Transactions on Visualization and Computer Graphics, 29(1):1222–1232. doi: 10.1109/TVCG.2022.3209357

  [40] Y. Wang, H. Zhang, H. Huang, X. Chen, Q. Yin, Z. Hou, D. Zhang, Q. Luo, and H. Qu. InfoNice: Easy creation of information graphics. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI '18, pp. 1–12. Association for Computing Machinery, New York, NY, USA, 2018. doi: 10.1145/3173574.3173909

  [41] K. Wongsuphasawat, D. Moritz, A. Anand, J. Mackinlay, B. Howe, and J. Heer. Towards a general-purpose query language for visualization recommendation. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA '16, article no. 4. Association for Computing Machinery, New York, NY, USA, 2016. doi: 10.1145/2939502.2939506

  [42] K. Wongsuphasawat, D. Moritz, A. Anand, J. D. Mackinlay, B. Howe, and J. Heer. Voyager: Exploratory analysis via faceted browsing of visualization recommendations. IEEE Transactions on Visualization and Computer Graphics, 22(1):649–658, 2016. doi: 10.1109/TVCG.2015.2467191

  [43] K. Wongsuphasawat, Z. Qu, D. Moritz, R. Chang, F. Ouk, A. Anand, J. Mackinlay, B. Howe, and J. Heer. Voyager 2: Augmenting visual analysis with partial view specifications. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 2648–2659. Association for Computing Machinery, New York, NY, USA, 2017. doi: 10.1145/3025453.3025768

  [44] W. Yang, M. Liu, Z. Wang, and S. Liu. Foundation models meet visualizations: Challenges and opportunities. Computational Visual Media, 10(3):399–424, May 2024. doi: 10.1007/s41095-023-0393-x

  [45–46] L.-P. Yuan, Z. Zhou, J. Zhao, Y. Guo, F. Du, and H. Qu. InfoColorizer: Interactive recommendation of color palettes for infographics. IEEE Transactions on Visualization and Computer Graphics, 28(12):4252–4266. doi: 10.1109/TVCG.2021.3085327

  [47] S. Zhang, H. Li, H. Qu, and Y. Wang. AdaVis: Adaptive and explainable visualization recommendation for tabular data. IEEE Transactions on Visualization and Computer Graphics, 30(9):5923–5938, 2024. doi: 10.1109/TVCG.2023.3316469

  [48] J. Zhou, Y. Xiong, Z. Liu, Z. Liu, S. Xiao, Y. Wang, B. Zhao, C. J. Zhang, and D. Lian. MegaPairs: Massive data synthesis for universal multimodal retrieval. In W. Che, J. Nabende, E. Shutova, and M. T. Pilehvar, eds., Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 19076–19095. Associa…

  [49] T. Zhou, J. Huang, and G. Y.-Y. Chan. Epigraphics: Message-driven infographics authoring. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, CHI '24, article no. 200. Association for Computing Machinery, New York, NY, USA, 2024. doi: 10.1145/3613904.3642172

  [50] C. Zhu-Tian, Y. Wang, Q. Wang, Y. Wang, and H. Qu. Towards automated infographic design: Deep learning-based auto-extraction of extensible timeline. IEEE Transactions on Visualization and Computer Graphics, 26(1):917–926, 2020. doi: 10.1109/TVCG.2019.2934810