The MatMMExtract pipeline creates the MatSciFig dataset of 391k annotated materials science figure panels and the MaterialScope detection dataset with high accuracy.
2 Pith papers cite this work.
Years: 2026 (2)
Representative citing papers
PMC-InterCPT builds a context-grounded biomedical interleaved corpus from PMC literature and shows it improves multimodal performance on Qwen3.5-4B-Base after CPT and SFT while using fewer tokens.
Citing papers explorer
- Unlocking the Visual Record of Materials Science: A Large-Scale Multimodal Dataset from Scientific Literature
MatMMExtract pipeline creates MatSciFig dataset of 391k annotated materials science figure panels and MaterialScope detection dataset with high accuracy.
- PMC-InterCPT: Rethinking Biomedical Interleaved Data for Multimodal Continued Pretraining
PMC-InterCPT builds a context-grounded biomedical interleaved corpus from PMC literature and shows it improves multimodal performance on Qwen3.5-4B-Base after CPT and SFT while using fewer tokens.