SciDraw-6K: A Multilingual Scientific Illustration Dataset Generated by Google Gemini
Pith reviewed 2026-05-10 07:24 UTC · model grok-4.3
The pith
SciDraw-6K provides 6,291 AI-generated scientific illustrations paired with prompts in eleven languages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SciDraw-6K is a curated dataset of 6,291 scientific illustrations synthesized by Google Gemini image-generation models, with each image paired with prompts in eleven languages: English, Simplified Chinese, Traditional Chinese, Japanese, Korean, German, French, Spanish, Brazilian Portuguese, Italian, and Russian. The images cover eight broad categories (biomedical, chemistry, materials, electronics, environment, AI systems, physics, and other) and are produced mainly by the gemini-2.5-flash-image and gemini-3-pro-image-preview model families. The dataset is purpose-built for scientific illustration, including schematic diagrams, mechanism figures, table-of-contents graphics, and conceptual posters, and is released to support multilingual text-to-image research.
What carries the argument
The SciDraw-6K dataset of synthesized scientific illustrations with multilingual prompt pairings, built through a dedicated generation and curation pipeline for schematic and conceptual graphics.
Load-bearing premise
The generated illustrations accurately and representatively capture the intended scientific concepts without significant factual distortions.
What would settle it
Expert scientists reviewing a sample of the images and finding frequent inaccuracies in depicted mechanisms, structures, or concepts would indicate the dataset may not be suitable as training data.
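To make "frequent inaccuracies" concrete, a reviewer could report the observed error rate in the expert-checked sample together with a confidence interval. The following is a minimal sketch, assuming a simple random sample and a binary accurate/inaccurate verdict per image; the counts used (18 of 200) are hypothetical and do not come from the paper, and the Wilson score interval is one reasonable choice rather than a method the authors describe.

```python
import math


def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= errors <= n:
        raise ValueError("errors must lie between 0 and n")
    p = errors / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical review outcome: 18 of 200 sampled images judged inaccurate.
sample_size = 200
inaccurate = 18
low, high = wilson_interval(inaccurate, sample_size)
print(f"observed error rate: {inaccurate / sample_size:.1%}")
print(f"approx. 95% interval: [{low:.1%}, {high:.1%}]")
```

An interval whose lower bound already exceeds whatever error rate a downstream user can tolerate would be the kind of evidence the paragraph above describes.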
Original abstract
We present SciDraw-6K, a curated dataset of 6,291 scientific illustrations synthesized by Google Gemini image-generation models, each paired with prompts in eleven languages (English, Simplified Chinese, Traditional Chinese, Japanese, Korean, German, French, Spanish, Brazilian Portuguese, Italian, and Russian). Images span eight broad scientific categories -- biomedical, chemistry, materials, electronics, environment, AI systems, physics, and a long "other" tail -- and are produced primarily by the gemini-2.5-flash-image and gemini-3-pro-image-preview model families. In contrast to general-purpose text-to-image corpora that dominate the literature, SciDraw-6K is purpose-built for the scientific illustration genre: schematic diagrams, mechanism figures, table-of-contents graphics, and conceptual posters. We describe the construction pipeline, report dataset statistics, and document its use as the substrate of sci-draw.com, a public scientific drawing service. The dataset is released to support multilingual text-to-image research, domain-adapted diffusion fine-tuning, and prompt-engineering studies for scientific visualization. Dataset: https://huggingface.co/datasets/SciDrawAI/SciDraw-6K Code: https://github.com/SciDrawAI/scidraw-6k
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents SciDraw-6K, a dataset of 6,291 scientific illustrations generated primarily by Google's gemini-2.5-flash-image and gemini-3-pro-image-preview model families. Each image is paired with prompts in eleven languages (English, Simplified Chinese, Traditional Chinese, Japanese, Korean, German, French, Spanish, Brazilian Portuguese, Italian, and Russian) and falls into one of eight categories (biomedical, chemistry, materials, electronics, environment, AI systems, physics, and a long 'other' tail). The work describes the construction pipeline, reports dataset statistics, documents its use as the substrate for the sci-draw.com service, and releases the data on Hugging Face with code on GitHub to support multilingual text-to-image research, domain-adapted diffusion fine-tuning, and prompt-engineering studies for scientific visualization.
Significance. If the generated images prove scientifically accurate, the dataset would fill a useful niche by providing a large, purpose-built, multilingual collection of schematic diagrams, mechanism figures, and conceptual posters that general-purpose text-to-image corpora do not emphasize. The public release with code and the explicit positioning for fine-tuning and prompt studies are strengths that could accelerate domain-specific work in computer vision.
major comments (2)
- [Abstract / construction pipeline] Abstract and construction pipeline description: the central claim that SciDraw-6K supplies a 'curated' resource 'suitable for ... scientific visualization research' is unsupported because the manuscript reports no validation of scientific accuracy—no expert review, no error-rate statistics, no comparison against ground-truth diagrams, and no explicit filtering criteria beyond broad category labels. This is load-bearing: without evidence that the images are faithful to the scientific concepts in the prompts (e.g., correct bond angles, circuit topologies, or process mechanisms), the dataset's utility for the stated downstream uses cannot be assessed.
- [Dataset statistics / release] Dataset statistics and release sections: the paper provides counts and category breakdowns but supplies no quantitative or qualitative evidence of curation for correctness, such as inter-annotator agreement on factual validity or rejection rates for implausible outputs. This omission directly affects the claim that the resource is ready for training or benchmarking.
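The two statistics the referee asks for, inter-annotator agreement and rejection rate, are straightforward to compute once two reviewers have labelled the same images. The sketch below is an illustration only: the labels are invented, the `accurate`/`inaccurate` label names are assumptions, and the "reject if either reviewer flags the image" policy is a hypothetical choice, not something the paper specifies.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        freq_a[label] * freq_b[label] for label in set(freq_a) | set(freq_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def rejection_rate(labels_a, labels_b, reject_label="inaccurate"):
    """Share of items flagged by at least one annotator (hypothetical policy)."""
    flagged = sum(reject_label in pair for pair in zip(labels_a, labels_b))
    return flagged / len(labels_a)


# Invented labels for ten images, for illustration only.
annotator_a = ["accurate", "accurate", "inaccurate", "accurate", "inaccurate",
               "accurate", "accurate", "inaccurate", "accurate", "accurate"]
annotator_b = ["accurate", "accurate", "inaccurate", "inaccurate", "inaccurate",
               "accurate", "accurate", "accurate", "accurate", "accurate"]

kappa = cohens_kappa(annotator_a, annotator_b)
rejected = rejection_rate(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.3f}")
print(f"rejection rate: {rejected:.1%}")
```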
minor comments (1)
- [Abstract] The abstract lists eleven languages but does not break down image counts or prompt quality per language; adding such a breakdown as a table would improve transparency without altering the core contribution.
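A per-language breakdown of the kind requested could be produced with a few lines of code. This is a sketch under stated assumptions: the record layout (a `prompts` mapping from language code to prompt text), the language codes themselves, and the toy records are all hypothetical, and the released dataset may use a different schema. Mean prompt length stands in here as a crude proxy and is not a measure of prompt quality.

```python
from collections import defaultdict

# Assumed language codes for the eleven languages named in the abstract.
LANGUAGES = ["en", "zh-CN", "zh-TW", "ja", "ko", "de", "fr", "es", "pt-BR", "it", "ru"]


def per_language_stats(records):
    """Count non-empty prompts and mean prompt length (characters) per language."""
    counts = defaultdict(int)
    total_chars = defaultdict(int)
    for record in records:
        for lang, text in record.get("prompts", {}).items():
            if text and text.strip():
                counts[lang] += 1
                total_chars[lang] += len(text.strip())
    return {
        lang: {
            "prompts": counts[lang],
            "mean_chars": total_chars[lang] / counts[lang] if counts[lang] else 0.0,
        }
        for lang in LANGUAGES
    }


# Toy records in the assumed layout, for illustration only.
records = [
    {"category": "chemistry",
     "prompts": {"en": "Schematic of a catalytic cycle",
                 "de": "Schema eines Katalysezyklus"}},
    {"category": "physics",
     "prompts": {"en": "Diagram of a double-slit experiment", "de": ""}},
]

stats = per_language_stats(records)
print(f"{'language':<10}{'prompts':>8}{'mean chars':>12}")
for lang, row in stats.items():
    print(f"{lang:<10}{row['prompts']:>8}{row['mean_chars']:>12.1f}")
```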
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The comments correctly identify that the manuscript does not report expert validation or quantitative accuracy metrics for the generated illustrations. We address each point below and will revise the manuscript to clarify the scope of our claims and add an explicit limitations discussion.
Referee: [Abstract / construction pipeline] Abstract and construction pipeline description: the central claim that SciDraw-6K supplies a 'curated' resource 'suitable for ... scientific visualization research' is unsupported because the manuscript reports no validation of scientific accuracy—no expert review, no error-rate statistics, no comparison against ground-truth diagrams, and no explicit filtering criteria beyond broad category labels. This is load-bearing: without evidence that the images are faithful to the scientific concepts in the prompts (e.g., correct bond angles, circuit topologies, or process mechanisms), the dataset's utility for the stated downstream uses cannot be assessed.
Authors: We agree that no expert review, error-rate statistics, or ground-truth comparisons are reported. The word 'curated' in the abstract and pipeline description refers only to the systematic choice of eight scientific categories, prompt templates, and eleven-language translations; it does not imply post-generation verification of scientific fidelity. Because the images are synthesized by Gemini models, we did not perform such validation. We will revise the abstract to replace 'curated' with 'constructed' and insert a dedicated Limitations section that states the absence of accuracy validation, notes potential inaccuracies (e.g., incorrect diagrams), and clarifies that the dataset is released to enable community studies of AI-generated scientific visuals and domain-specific fine-tuning rather than as a ready-to-use benchmark of verified content.
Revision: yes
Referee: [Dataset statistics / release] Dataset statistics and release sections: the paper provides counts and category breakdowns but supplies no quantitative or qualitative evidence of curation for correctness, such as inter-annotator agreement on factual validity or rejection rates for implausible outputs. This omission directly affects the claim that the resource is ready for training or benchmarking.
Authors: We acknowledge that no inter-annotator agreement, rejection rates, or correctness statistics are provided. The released dataset contains all generated images without filtering for factual accuracy, as the goal is to supply a large, unfiltered multilingual corpus of Gemini outputs for research on prompt engineering and fine-tuning. We will update the dataset statistics and release sections to explicitly describe the lack of post-generation filtering and will add a short paragraph on how users may apply their own validation. The accompanying GitHub repository will be extended with example scripts for basic quality checks.
Revision: yes
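As an indication of what "basic quality checks" might look like, here is a minimal sketch of structural checks a user could run on each record before training. Everything about the record layout is an assumption made for illustration (the field names `category`, `prompts`, `width`, `height`, the category label strings, the language codes, and the 256-pixel threshold); it is not the authors' script, and it checks metadata completeness only, not scientific fidelity.

```python
# Assumed label strings for the eight categories named in the abstract.
CATEGORIES = {"biomedical", "chemistry", "materials", "electronics",
              "environment", "ai_systems", "physics", "other"}
# Assumed language codes for the eleven prompt languages.
LANGUAGES = ["en", "zh-CN", "zh-TW", "ja", "ko", "de", "fr", "es", "pt-BR", "it", "ru"]
MIN_SIDE_PX = 256  # hypothetical minimum for the shorter image side


def check_record(record):
    """Return a list of structural problems found in one record (empty if none)."""
    problems = []
    if record.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {record.get('category')!r}")
    prompts = record.get("prompts") or {}
    missing = [lang for lang in LANGUAGES if not (prompts.get(lang) or "").strip()]
    if missing:
        problems.append("missing prompts: " + ", ".join(missing))
    width = record.get("width") or 0
    height = record.get("height") or 0
    if min(width, height) < MIN_SIDE_PX:
        problems.append(f"image too small: {width}x{height}")
    return problems


# Toy records in the assumed layout, for illustration only.
good = {"category": "physics", "width": 1024, "height": 768,
        "prompts": {lang: "placeholder prompt" for lang in LANGUAGES}}
bad = {"category": "astrology", "width": 128, "height": 1024,
       "prompts": {"en": "placeholder prompt", "de": "   "}}

for name, record in [("good", good), ("bad", bad)]:
    issues = check_record(record)
    print(name, "->", issues if issues else "no structural problems")
```

Checks like these only catch incomplete or malformed records; judging whether a diagram is scientifically correct would still require the expert review discussed in the major comments.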
Circularity Check
No circularity: purely descriptive dataset release
The paper is a standard dataset release describing the generation of 6,291 images via Gemini models, multilingual prompt pairing, category statistics, and public hosting. It contains no derivations, equations, predictions, fitted parameters, uniqueness theorems, or self-citations that bear load on any claim. All content is observational and external (model outputs + release links), with no reduction of any result to its own inputs by construction.