SciDraw-6K: A Multilingual Scientific Illustration Dataset Generated by Google Gemini

Davie Chen

arXiv: 2604.17206 · v1 · submitted 2026-04-19 · cs.CV

Pith reviewed 2026-05-10 07:24 UTC · model grok-4.3

keywords scientific illustration dataset · multilingual text-to-image · AI-generated diagrams · scientific visualization · dataset release · prompt engineering · diffusion model fine-tuning · schematic figures

The pith

SciDraw-6K provides 6,291 AI-generated scientific illustrations paired with prompts in eleven languages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SciDraw-6K, a dataset of 6,291 scientific illustrations created by Google Gemini image-generation models. Each illustration is paired with a text prompt translated into eleven languages, and the collection covers scientific fields including physics, chemistry, and biomedicine. Unlike general-purpose image datasets, it focuses on the schematic diagrams, mechanism figures, and conceptual graphics that scientists use. The author describes how the dataset was built and releases it publicly, along with a website that uses it for generating scientific drawings. This matters because it gives researchers a targeted resource for improving how AI systems handle the specific demands of scientific visualization.

Core claim

SciDraw-6K is a curated dataset of 6,291 scientific illustrations synthesized by image-generation models, with each image paired with prompts in eleven languages: English, Simplified Chinese, Traditional Chinese, Japanese, Korean, German, French, Spanish, Brazilian Portuguese, Italian, and Russian. The images cover eight broad categories (biomedical, chemistry, materials, electronics, environment, AI systems, physics, and other) and are produced mainly by the gemini-2.5-flash-image and gemini-3-pro-image-preview model families. The dataset is purpose-built for scientific illustration, including schematic diagrams, mechanism figures, table-of-contents graphics, and conceptual posters, and is released to support multilingual text-to-image research.

What carries the argument

The SciDraw-6K dataset of synthesized scientific illustrations with multilingual prompt pairings, built through a dedicated generation and curation pipeline for schematic and conceptual graphics.

Load-bearing premise

The generated illustrations accurately and representatively capture the intended scientific concepts without significant factual distortions.

What would settle it

Expert scientists reviewing a sample of the images and finding frequent inaccuracies in depicted mechanisms, structures, or concepts would indicate the dataset may not be suitable as training data.
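A review of this kind could be organized roughly as follows. This is a minimal sketch, not something the paper describes: it draws a fixed number of images per category for expert inspection and then turns the reviewers' error count into a Wilson score interval, so that a small sample is not over-interpreted. The function names and the `category` field are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def stratified_sample(records, per_category, seed=0):
    """Draw up to `per_category` records from each category (hypothetical `category` field)."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for record in records:
        by_category[record["category"]].append(record)
    sample = []
    for category in sorted(by_category):
        items = by_category[category]
        # Never ask for more items than the category contains.
        sample.extend(rng.sample(items, min(per_category, len(items))))
    return sample


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for an error proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)
```

If reviewers inspected the sample and marked some images as scientifically wrong, `wilson_interval(errors, len(sample))` would give a range for the dataset-wide error rate under the assumption that the sample is representative.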

Figures

Figures reproduced from arXiv: 2604.17206 by Davie Chen.

**Figure 2.** Per-language non-null rate of prompt fields. All eleven languages are populated for 100%

**Figure 3.** English prompt length distribution (characters).

**Figure 4.** Number of images generated per month.

**Figure 5.** Gemini source-model distribution across approved images.

Original abstract

We present SciDraw-6K, a curated dataset of 6,291 scientific illustrations synthesized by Google Gemini image-generation models, each paired with prompts in eleven languages (English, Simplified Chinese, Traditional Chinese, Japanese, Korean, German, French, Spanish, Brazilian Portuguese, Italian, and Russian). Images span eight broad scientific categories -- biomedical, chemistry, materials, electronics, environment, AI systems, physics, and a long "other" tail -- and are produced primarily by the gemini-2.5-flash-image and gemini-3-pro-image-preview model families. In contrast to general-purpose text-to-image corpora that dominate the literature, SciDraw-6K is purpose-built for the scientific illustration genre: schematic diagrams, mechanism figures, table-of-contents graphics, and conceptual posters. We describe the construction pipeline, report dataset statistics, and document its use as the substrate of sci-draw.com, a public scientific drawing service. The dataset is released to support multilingual text-to-image research, domain-adapted diffusion fine-tuning, and prompt-engineering studies for scientific visualization.

Dataset: https://huggingface.co/datasets/SciDrawAI/SciDraw-6K
Code: https://github.com/SciDrawAI/scidraw-6k

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SciDraw-6K is a straightforward release of 6k Gemini-generated scientific diagrams with 11-language prompts, useful as a niche resource but missing any check on factual accuracy.

The paper's main point is the creation and release of SciDraw-6K: 6,291 scientific illustrations produced by Gemini models, each paired with prompts in English plus ten other languages. The images cover eight categories including biomedical, chemistry, physics, and electronics, with a focus on schematics, mechanisms, and conceptual figures rather than general photos. The author also describes the generation pipeline, releases the data on Hugging Face, shares code on GitHub, and notes the dataset's use in the sci-draw.com service. That combination of scientific focus and multilingual pairing is what is actually new here compared with broader text-to-image collections.

The release itself is clean and immediately accessible, which is a practical plus for anyone wanting to experiment with domain-adapted fine-tuning or prompt studies in this narrow genre. The statistics on category distribution and model versions are helpful for understanding the scope.

The soft spot is the absence of any validation that the images are scientifically correct. The abstract talks about curation but gives no details on expert review, error rates, or filtering for factual mistakes such as wrong diagrams or mislabeled processes. For a dataset meant to support research on scientific visualization, that gap matters because plausible-looking but inaccurate content would limit its value for training or benchmarking.

This work is aimed at computer vision groups doing multilingual or specialized generative modeling. Readers who need a ready-made starting set for fine-tuning in the scientific illustration space could get some use out of it, though they would likely have to add their own quality checks.

It is worth sending to peer review. Dataset papers benefit from external eyes on the construction details and release artifacts, and the multilingual angle gives it enough substance to justify referee time even if revisions are needed on the validation side.

Referee Report

2 major / 1 minor

Summary. The paper presents SciDraw-6K, a dataset of 6,291 scientific illustrations generated primarily by gemini-2.5-flash-image and gemini-3-pro-image-preview models from Google Gemini. Each image is paired with prompts in eleven languages (English, Simplified/Traditional Chinese, Japanese, Korean, German, French, Spanish, Brazilian Portuguese, Italian, Russian) and spans eight categories (biomedical, chemistry, materials, electronics, environment, AI systems, physics, and a long 'other' tail). The work describes the construction pipeline, reports dataset statistics, documents its use as the substrate for the sci-draw.com service, and releases the data on Hugging Face with code on GitHub to support multilingual text-to-image research, domain-adapted diffusion fine-tuning, and prompt-engineering studies for scientific visualization.

Significance. If the generated images prove scientifically accurate, the dataset would fill a useful niche by providing a large, purpose-built, multilingual collection of schematic diagrams, mechanism figures, and conceptual posters that general-purpose text-to-image corpora do not emphasize. The public release with code and the explicit positioning for fine-tuning and prompt studies are strengths that could accelerate domain-specific work in computer vision.

major comments (2)

[Abstract / construction pipeline] The central claim that SciDraw-6K supplies a 'curated' resource 'suitable for ... scientific visualization research' is unsupported, because the manuscript reports no validation of scientific accuracy: no expert review, no error-rate statistics, no comparison against ground-truth diagrams, and no explicit filtering criteria beyond broad category labels. This is load-bearing. Without evidence that the images are faithful to the scientific concepts in the prompts (e.g., correct bond angles, circuit topologies, or process mechanisms), the dataset's utility for the stated downstream uses cannot be assessed.

[Dataset statistics / release] The paper provides counts and category breakdowns but supplies no quantitative or qualitative evidence of curation for correctness, such as inter-annotator agreement on factual validity or rejection rates for implausible outputs. This omission directly affects the claim that the resource is ready for training or benchmarking.
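For concreteness, the two statistics named in the second comment can be computed as sketched below. This is an illustrative example under stated assumptions, not the paper's procedure: two hypothetical annotators each assign one validity label per image, Cohen's kappa measures their chance-corrected agreement, and the rejection rate is the share of images marked `"reject"`. The label values and function names are invented for the example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies.
    expected = sum(
        counts_a[label] * counts_b[label] for label in set(counts_a) | set(counts_b)
    ) / n ** 2
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


def rejection_rate(decisions):
    """Share of curation decisions equal to the (hypothetical) label 'reject'."""
    if not decisions:
        raise ValueError("decisions must be non-empty")
    return sum(1 for d in decisions if d == "reject") / len(decisions)
```

Reporting kappa alongside raw agreement guards against the case where almost every image is labelled valid and high agreement arises by chance alone.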

minor comments (1)

[Abstract] The abstract lists eleven languages but does not break down image counts or prompt quality per language; adding such a table would improve transparency without altering the core contribution.
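A per-language breakdown of the kind requested could be produced along these lines. The sketch assumes each record is a dictionary with one prompt field per language named `prompt_<code>`; these field names and language codes are hypothetical and would need to be matched to the released schema.

```python
from statistics import mean

# Hypothetical codes for the eleven languages listed in the abstract.
LANGUAGES = ["en", "zh_cn", "zh_tw", "ja", "ko", "de", "fr", "es", "pt_br", "it", "ru"]


def per_language_summary(records, languages=LANGUAGES):
    """Count populated prompts and mean prompt length (characters) per language."""
    summary = {}
    for lang in languages:
        prompts = [record.get(f"prompt_{lang}") for record in records]
        filled = [p for p in prompts if p]  # drops None and empty strings
        summary[lang] = {
            "non_null": len(filled),
            "non_null_rate": len(filled) / len(records) if records else 0.0,
            "mean_chars": mean(len(p) for p in filled) if filled else 0.0,
        }
    return summary


def format_table(summary):
    """Render the summary as a fixed-width plain-text table."""
    lines = ["language  non_null    rate  mean_chars"]
    for lang, row in summary.items():
        lines.append(
            f"{lang:<8}  {row['non_null']:>8}  {row['non_null_rate']:>6.1%}  {row['mean_chars']:>10.1f}"
        )
    return "\n".join(lines)
```

Character counts are only a rough proxy for prompt quality and are not comparable across scripts, so a table like this addresses coverage more than translation quality.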

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments correctly identify that the manuscript does not report expert validation or quantitative accuracy metrics for the generated illustrations. We address each point below and will revise the manuscript to clarify the scope of our claims and add an explicit limitations discussion.

Referee: [Abstract / construction pipeline] The central claim that SciDraw-6K supplies a 'curated' resource 'suitable for ... scientific visualization research' is unsupported, because the manuscript reports no validation of scientific accuracy: no expert review, no error-rate statistics, no comparison against ground-truth diagrams, and no explicit filtering criteria beyond broad category labels. This is load-bearing. Without evidence that the images are faithful to the scientific concepts in the prompts (e.g., correct bond angles, circuit topologies, or process mechanisms), the dataset's utility for the stated downstream uses cannot be assessed.

Authors: We agree that no expert review, error-rate statistics, or ground-truth comparisons are reported. The word 'curated' in the abstract and pipeline description refers only to the systematic choice of eight scientific categories, prompt templates, and eleven-language translations; it does not imply post-generation verification of scientific fidelity. Because the images are synthesized by Gemini models, we did not perform such validation. We will revise the abstract to replace 'curated' with 'constructed' and insert a dedicated Limitations section that states the absence of accuracy validation, notes potential inaccuracies (e.g., incorrect diagrams), and clarifies that the dataset is released to enable community studies of AI-generated scientific visuals and domain-specific fine-tuning rather than as a ready-to-use benchmark of verified content.

Revision: yes

Referee: [Dataset statistics / release] The paper provides counts and category breakdowns but supplies no quantitative or qualitative evidence of curation for correctness, such as inter-annotator agreement on factual validity or rejection rates for implausible outputs. This omission directly affects the claim that the resource is ready for training or benchmarking.

Authors: We acknowledge that no inter-annotator agreement, rejection rates, or correctness statistics are provided. The released dataset contains all generated images without filtering for factual accuracy, as the goal is to supply a large, unfiltered multilingual corpus of Gemini outputs for research on prompt engineering and fine-tuning. We will update the dataset statistics and release sections to explicitly describe the lack of post-generation filtering and will add a short paragraph on how users may apply their own validation. The accompanying GitHub repository will be extended with example scripts for basic quality checks.

Revision: yes
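As an indication of what "basic quality checks" might look like, here is a hypothetical sketch; it is not taken from the repository. It flags records with missing required fields, very short English prompts, and English prompts that repeat an earlier one after case and whitespace normalization. The field names `prompt_en` and `image` and the 20-character threshold are assumptions. Checks of this kind say nothing about whether an image is scientifically correct.

```python
def basic_quality_flags(records, required_fields, min_chars=20):
    """Return (index, reasons) pairs for records that fail simple metadata checks."""
    seen_prompts = set()
    flagged = []
    for index, record in enumerate(records):
        reasons = []
        # Required fields that are absent, None, or empty.
        missing = [field for field in required_fields if not record.get(field)]
        if missing:
            reasons.append("missing:" + ",".join(missing))
        prompt = (record.get("prompt_en") or "").strip()
        if prompt and len(prompt) < min_chars:
            reasons.append("short_prompt")
        # Normalize case and whitespace before checking for repeats.
        key = " ".join(prompt.lower().split())
        if key:
            if key in seen_prompts:
                reasons.append("duplicate_prompt")
            seen_prompts.add(key)
        if reasons:
            flagged.append((index, reasons))
    return flagged
```

A user could run this over the metadata before fine-tuning and then decide whether to drop, repair, or manually inspect the flagged records.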

Circularity Check

0 steps flagged

No circularity: purely descriptive dataset release

The paper is a standard dataset release describing the generation of 6,291 images via Gemini models, multilingual prompt pairing, category statistics, and public hosting. It contains no derivations, equations, predictions, fitted parameters, uniqueness theorems, or self-citations that bear load on any claim. All content is observational and external (model outputs + release links), with no reduction of any result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset construction and release paper with no mathematical derivations, fitted parameters, or postulated entities; it relies on standard practices of using commercial image generators and basic curation.
