From Papers to Property Tables: A Priority-Based LLM Workflow for Materials Data Extraction

Ali Shargh; Jaafar A. El-Awady; Jing Luo; Khalid A. El-Awady; Koushik Rameshbabu

arxiv: 2604.07584 · v1 · submitted 2026-04-08 · 💻 cs.AI

Koushik Rameshbabu, Jing Luo, Ali Shargh, Khalid A. El-Awady, Jaafar A. El-Awady

Pith reviewed 2026-05-10 17:30 UTC · model grok-4.3

classification 💻 cs.AI

keywords materials data extraction · LLM workflow · shock physics · spall strength · priority-based extraction · structured datasets from papers · physics-based validation · full-text mining


The pith

A priority-based LLM workflow extracts structured shock-physics data from full-text papers by combining direct text extraction, physics derivations, and figure digitization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a prompt-driven workflow that uses large language models to pull and organize experimental records from research articles on alloy spall strength. Information scattered across text, tables, figures, and equations is handled through three priority levels, with direct values taken first, followed by calculations from verified physics relations, and figure digitization last. Values are normalized to standard units, tagged for source traceability, and checked for physical consistency. On a test set of 30 papers containing 11,967 data points, the method reaches 94.93 percent accuracy on direct extractions, 92.04 percent on derivations, 83.49 percent on figures, and 94.69 percent overall weighted accuracy. This matters because manual collection of such data is slow and prone to error, while the workflow produces analysis-ready tables without task-specific model training.
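The "overall weighted accuracy" quoted above is most naturally read as a count-weighted mean of the three tier accuracies. The per-tier counts are not given in this summary, so the symbols $n_k$ below are placeholders and this reading is an assumption rather than the paper's stated definition:

```latex
A_{\text{overall}}
  = \frac{\sum_{k \in \{T1,\,T2,\,T3\}} n_k \, A_k}{\sum_{k} n_k},
\qquad
\sum_{k} n_k = 11{,}967,
\qquad
A_{T1} = 94.93\%,\; A_{T2} = 92.04\%,\; A_{T3} = 83.49\% .
```

Under that reading, an overall value of 94.69 percent sitting so close to the T1 figure would imply that direct extractions make up at least roughly nine in ten of the evaluated points, so the headline number says comparatively little about figure digitization.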

Core claim

The workflow targets 37 experimentally relevant fields per shot and reconstructs complete records by routing each field through a three-level priority strategy: T1 for direct extraction from text and tables, T2 for physics-based derivation using governing relations when direct values are missing, and T3 for digitization from figures when needed. Extracted values are normalized to canonical units, tagged by priority for traceability, and validated through physics-based consistency and plausibility checks, yielding priority-wise accuracies of 94.93 percent (T1), 92.04 percent (T2), and 83.49 percent (T3) with an overall weighted accuracy of 94.69 percent across 11,967 evaluated data points in

What carries the argument

The three-level priority strategy (T1 direct text/table extraction, T2 physics-based derivation from verified governing relations, T3 figure digitization) integrated with unit normalization and physics consistency checks.
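The precedence rule can be illustrated with a small sketch. In the paper the routing is carried out by prompting an LLM, not by code like this; the names `FieldValue`, `resolve_field`, and the three toy extractors, along with all values in them, are hypothetical and exist only to show "first tier that yields a value wins, and the tier is recorded".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Hit = Tuple[float, str, str]                 # (value, unit, evidence)
Extractor = Callable[[str], Optional[Hit]]   # returns None when nothing is found


@dataclass
class FieldValue:
    name: str
    value: float
    unit: str
    tier: str       # "T1", "T2", or "T3", kept for traceability
    evidence: str   # where the value came from


def resolve_field(name: str, tiers: List[Tuple[str, Extractor]]) -> Optional[FieldValue]:
    """Try each tier in priority order and stop at the first one that yields a value."""
    for tier, extract in tiers:
        hit = extract(name)
        if hit is not None:
            value, unit, evidence = hit
            return FieldValue(name, value, unit, tier, evidence)
    return None  # leave the field empty rather than guess


# Toy stand-ins for the three tiers (all numbers are made up for illustration).
def from_text(name: str) -> Optional[Hit]:
    return {"impact_velocity": (312.0, "m/s", "Table 2")}.get(name)


def from_physics(name: str) -> Optional[Hit]:
    return {"spall_strength": (1.08, "GPa", "derived from a governing relation")}.get(name)


def from_figure(name: str) -> Optional[Hit]:
    return {
        "impact_velocity": (309.0, "m/s", "Fig. 3, digitized"),
        "pullback_velocity": (150.0, "m/s", "Fig. 4, digitized"),
    }.get(name)


TIERS = [("T1", from_text), ("T2", from_physics), ("T3", from_figure)]
FIELDS = ("impact_velocity", "spall_strength", "pullback_velocity", "grain_size")
record = {f: resolve_field(f, TIERS) for f in FIELDS}
```

Here `impact_velocity` is available both in a table and in a figure, and the table value is kept because T1 outranks T3; `grain_size` is found nowhere and stays `None`.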

Load-bearing premise

The LLM can reliably interpret and apply physics equations for derivations and accurately read values from figures, while the consistency checks catch all remaining errors.

What would settle it

Running the workflow on a fresh set of 30 papers from the same domain and finding that overall weighted accuracy drops below 90 percent or that a substantial fraction of derived or digitized values fail the physics plausibility checks would falsify the reliability claim.

Figures

Figures reproduced from arXiv: 2604.07584 by Ali Shargh, Jaafar A. El-Awady, Jing Luo, Khalid A. El-Awady, Koushik Rameshbabu.

**Figure 2.** Example evidence log for Tier 1 (direct), Tier 2 (calculated), and Tier 3 (figure)

**Figure 3.** Weighted average extraction accuracy by article and aggregated at all extraction tiers

**Figure 4.** Accuracy for each extraction tier using Gemini 3 Pro for all articles combined. The …

**Figure 5.** Examples of figure-based extraction errors. (a) Small axis-title font can cause the LLM …

**Figure 6.** Weighted closeness score for Claude Opus 4.5 vs Gemini 3 Pro for all the papers.

**Figure 7.** The closeness score for each extraction tier between Claude Opus 4.5 and Gemini 3 …

Original abstract

Scientific data are widely dispersed across research articles and are often reported inconsistently across text, tables, and figures, making manual data extraction and aggregation slow and error-prone. We present a prompt-driven, hierarchical workflow that uses a large language model (LLM) to automatically extract and reconstruct structured, shot-level shock-physics experimental records by integrating information distributed across text, tables, figures, and physics-based derivations from full-text published research articles, using alloy spall strength as a representative case study. The pipeline targeted 37 experimentally relevant fields per shot and applied a three-level priority strategy: (T1) direct extraction from text/tables, (T2) physics-based derivation using verified governing relations, and (T3) digitization from figures when necessary. Extracted values were normalized to canonical units, tagged by priority for traceability, and validated with physics-based consistency and plausibility checks. Evaluated on a benchmark of 30 published research articles comprising 11,967 evaluated data points, the workflow achieved high overall accuracy, with priority-wise accuracies of 94.93% (T1), 92.04% (T2), and 83.49% (T3), and an overall weighted accuracy of 94.69%. Cross-model testing further indicated strong agreement for text/table and equation-derived fields, with lower agreement for figure-based extraction. Implementation through an API interface demonstrated the scalability of the approach, achieving consistent extraction performance and, in a subset of test cases, matching or exceeding chat-based accuracy. This workflow demonstrates a practical approach for converting unstructured technical literature into traceable, analysis-ready datasets without task-specific fine-tuning, enabling scalable database construction in materials science.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical priority-tiered LLM workflow for pulling shot-level experimental records out of materials papers, with reported 94.7% weighted accuracy on 30 articles, but the ground-truth labeling for the 11k points lacks any reported agreement stats.


The core thing to know is that this work builds a three-tier extraction pipeline for shock-physics data: direct text/table pulls at T1, physics-based derivations at T2, and figure digitization at T3, all wrapped with unit normalization, traceability tags, and consistency checks. They ran it on alloy spall strength records across 30 papers and 11,967 points, hitting 94.93% on T1, 92.04% on T2, 83.49% on T3, and 94.69% weighted overall, plus decent cross-model agreement via API without fine-tuning. That combination of priorities plus physics fill-ins is the actual new piece beyond plain LLM scraping.

Referee Report

1 major / 2 minor

Summary. The manuscript describes a prompt-driven, hierarchical LLM workflow for extracting structured shock-physics experimental data (alloy spall strength case study) from full-text papers. It integrates direct text/table extraction (T1), physics-based derivations from governing relations (T2), and figure digitization (T3) across 37 fields per shot, with unit normalization, priority tagging, and physics consistency checks. Evaluated on 30 published papers (11,967 data points), the workflow reports accuracies of 94.93% (T1), 92.04% (T2), 83.49% (T3), and 94.69% weighted overall, plus cross-model agreement and API scalability results.

Significance. If the evaluation holds after addressing ground-truth validation, the work would offer a practical, scalable method for converting dispersed materials literature into traceable property tables without fine-tuning. The priority hierarchy combined with physics-based checks and traceability tagging is a constructive approach to mitigating LLM hallucinations in technical domains. The API demonstration and cross-model tests provide evidence of robustness and deployability, which could accelerate database construction in materials science if the benchmark proves representative.

major comments (1)

[Section 4] Section 4 (Evaluation and Results): The ground-truth reference values for the 11,967 data points lack any reported inter-annotator agreement statistics (e.g., Cohen's kappa or percentage agreement) or an explicit annotation protocol describing annotator count, conflict resolution for ambiguous T2/T3 cases (figure digitization, unit normalization, physics derivations), or how single-annotator bias was mitigated. Because the headline accuracies (94.93% T1, 92.04% T2, 83.49% T3) are computed directly against this unreported reference, annotation noise could be conflated with workflow error, rendering the performance claims difficult to interpret.
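For readers unfamiliar with the statistic the referee asks for, here is a minimal sketch of Cohen's kappa for two annotators. It assumes each data point has been reduced to a categorical judgment (for example "ok" versus "bad"); numeric fields would first need a tolerance or binning rule, which the manuscript would have to specify. The label lists are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both labelled independently at their own base rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical judgments from two annotators on eight extracted values.
annotator_1 = ["ok", "ok", "ok", "bad", "ok", "bad", "ok", "ok"]
annotator_2 = ["ok", "ok", "bad", "bad", "ok", "bad", "ok", "ok"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 7 of 8 items (87.5 percent raw agreement), yet kappa is 5/7, about 0.71, because part of that agreement is expected by chance.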

minor comments (2)

[Abstract and Section 3] Abstract and Section 3: The claim of extracting '37 experimentally relevant fields per shot' is central but the fields are not enumerated or exemplified; adding a concise table or appendix listing the fields with their priority assignments and derivation rules would aid reproducibility and reader understanding.
[Section 4.3] Section 4.3 (Cross-model testing): The statement of 'strong agreement' for text/table and equation-derived fields is qualitative; providing quantitative metrics (e.g., pairwise agreement percentages or Cohen's kappa between models) would strengthen the evidence for robustness.
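One simple form the requested quantitative metric could take is a tolerance-based pairwise agreement rate. This is a sketch only: the function name, the 5 percent relative tolerance, and the sample values are assumptions, and it is not the "closeness score" shown in the paper's Figures 6 and 7, whose definition is not given in this summary.

```python
import math


def pairwise_agreement(values_a, values_b, rel_tol=0.05):
    """Fraction of fields on which two models agree within a relative tolerance.

    None means "not extracted". Fields that both models left empty are skipped;
    a field that only one model filled counts as a disagreement.
    """
    if len(values_a) != len(values_b):
        raise ValueError("both models must report the same list of fields")
    compared = agreed = 0
    for a, b in zip(values_a, values_b):
        if a is None and b is None:
            continue
        compared += 1
        if a is not None and b is not None and math.isclose(a, b, rel_tol=rel_tol):
            agreed += 1
    return agreed / compared if compared else float("nan")


# Hypothetical outputs from two models for the same four fields.
model_a = [2.50, 310.0, None, 1.20]
model_b = [2.50, 318.0, None, 1.45]
agreement = pairwise_agreement(model_a, model_b)
```

With these inputs three fields are compared and two fall within tolerance, giving an agreement of 2/3. Reporting such a number per tier would make the "strong agreement" claim checkable.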

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address the single major comment point by point below, with a commitment to strengthen the manuscript where appropriate.


Referee: [Section 4] Section 4 (Evaluation and Results): The ground-truth reference values for the 11,967 data points lack any reported inter-annotator agreement statistics (e.g., Cohen's kappa or percentage agreement) or an explicit annotation protocol describing annotator count, conflict resolution for ambiguous T2/T3 cases (figure digitization, unit normalization, physics derivations), or how single-annotator bias was mitigated. Because the headline accuracies (94.93% T1, 92.04% T2, 83.49% T3) are computed directly against this unreported reference, annotation noise could be conflated with workflow error, rendering the performance claims difficult to interpret.

Authors: We agree that the absence of a detailed annotation protocol and inter-annotator statistics in the current manuscript limits the interpretability of the reported accuracies. The ground truth was produced by a single domain-expert author following a tier-specific protocol (direct lookup for T1; equation verification against standard shock-physics relations for T2; calibrated manual digitization for T3), with unit normalization and physics-consistency checks applied uniformly. No multi-annotator process was performed, so metrics such as Cohen's kappa are unavailable. We will add a new subsection to Section 4 that explicitly describes (i) annotator qualifications and count, (ii) the step-by-step protocol for each tier including handling of ambiguous cases (e.g., figure scaling, derivation assumptions, unit conversion rules), (iii) the single-annotator bias mitigation steps that were used (source cross-referencing and post-extraction physics plausibility filters), and (iv) an explicit statement of this as a methodological limitation. These additions will be incorporated in the revised manuscript.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in workflow or evaluation


The paper describes a hierarchical LLM extraction pipeline (T1 direct text/table extraction, T2 physics-based derivations from verified governing relations, T3 figure digitization) applied to 30 independent published articles. Reported accuracies (94.93% T1, 92.04% T2, 83.49% T3, 94.69% weighted) are computed against data points drawn from external literature using standard physics relations and manual verification protocols. No self-definitional loops, fitted parameters renamed as predictions, load-bearing self-citations, or ansatz smuggling appear in the methodology; the benchmark is constructed from separate papers and the derivation chain remains externally anchored rather than reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method depends on general LLM capabilities and established shock-physics relations rather than new postulates or fitted constants.

axioms (1)

Domain assumption: Governing physical relations for spall strength are known, accurate, and applicable for deriving missing values.
Invoked for T2 priority derivations from verified governing relations.
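To make the axiom concrete, the sketch below shows what a T2-style derivation with unit normalization and a plausibility check might look like. It uses the acoustic approximation for spall strength, sigma_sp = 0.5 * rho0 * c_b * du_fs, a relation commonly used in spall experiments; this summary does not list which relations the paper actually applies, so treating it as one of them is an assumption. The conversion table, the plausibility bounds, and the aluminum-like input values are likewise illustrative, not taken from the paper.

```python
# Conversion factors to SI for the units used in this sketch (assumed, not exhaustive).
TO_SI = {
    "kg/m^3": 1.0,
    "g/cm^3": 1000.0,
    "m/s": 1.0,
    "km/s": 1000.0,
    "mm/us": 1000.0,  # 1 mm per microsecond equals 1 km/s
}


def to_si(value, unit):
    """Normalize a reported value to SI; raises KeyError for unknown units."""
    return value * TO_SI[unit]


def spall_strength_gpa(density, sound_speed, pullback_velocity):
    """Acoustic approximation: sigma_sp = 0.5 * rho0 * c_b * du_fs.

    Each argument is a (value, unit) pair as reported in a paper.
    Returns the spall strength in GPa.
    """
    rho0 = to_si(*density)           # kg/m^3
    c_b = to_si(*sound_speed)        # m/s
    du_fs = to_si(*pullback_velocity)  # m/s
    return 0.5 * rho0 * c_b * du_fs / 1e9  # Pa -> GPa


def is_plausible(sigma_gpa, low=0.0, high=50.0):
    """Crude plausibility gate; the bounds are placeholders, not the paper's."""
    return low < sigma_gpa < high


# Illustrative aluminum-like inputs reported in mixed units.
sigma = spall_strength_gpa((2.70, "g/cm^3"), (5.35, "km/s"), (150.0, "m/s"))
ok = is_plausible(sigma)
```

With these inputs the derived value is about 1.08 GPa and passes the gate. A check of this kind catches gross unit or digitization errors but cannot confirm that the chosen relation is the right one for a given experiment, which is why the axiom is load-bearing.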




[1] [1]

ResearchGate, Dec

Automating data extraction from scientific literature and general pdf files using llms and knime: An application in toxicology. ResearchGate, Dec. 2025. Preprint

work page 2025

[2] [2]

F. H. Allen. (IUCr) The Cambridge Structural Database: a quarter of a million crystal structures and rising

work page

[3] [3]

Ameri, H

A. Ameri, H. Wang, Z. Li, Z. Quadir, M. Gonzalez, P. J. Hazell, and J. P. Escobedo-Diaz. Spall strength dependence on peak stress and deformation history in lean duplex stainless steel 2101.Materials Science and Engineering: A, 831:142158, Jan. 2022

work page 2022

[4] [4]

Ateia, U

S. Ateia, U. Kruschwitz, M. Scholz, A. Koschmider, and M. Almohaishi. Llm-based in- formation extraction to support scientific literature research and publication workflows. arXiv, Oct. 2025

work page 2025

[5] [5]

Bauer, P

S. Bauer, P. Benner, T. Bereau, V. Blum, M. Boley, C. Carbogno, C. R. A. Catlow, G. Dehm, S. Eibl, R. Ernstorfer, ´A. Fekete, L. Foppa, P. Fratzl, C. Freysoldt, B. Gault, L. M. Ghiringhelli, S. K. Giri, A. Gladyshev, P. Goyal, J. Hattrick-Simpers, L. Kabalan, P. Karpov, M. S. Khorrami, C. T. Koch, S. Kokott, T. Kosch, I. Kowalec, K. Kremer, A. Leitherer, ...

work page 2024

[6] [6]

Belsky, M

A. Belsky, M. Hellenbrandt, V. L. Karen, and P. Luksch. New developments in the In- organic Crystal Structure Database (ICSD): accessibility in support of materials research and design

work page

[7] [7]

A. K. Boddorff, S. Jang, G. Kennedy, K. Taminger, and N. N. Thadhani. Spall failure of additively manufactured two-layered cu–ni bimetallic alloys.Journal of Applied Physics, 131(17):175901, May 2022

work page 2022

[8] [8]

M. J. Buehler. MechGPT, a Language-Based Strategy for Mechanics and Materials Mod- eling That Connects Knowledge Across Scales, Disciplines, and Modalities.Applied Me- chanics Reviews, 76(021001), Jan. 2024

work page 2024

[9] [9]

Chandrasekhar, J

A. Chandrasekhar, J. Chan, F. Ogoke, O. Ajenifujah, and A. Barati Farimani. AMGPT: A large language model for contextual querying in additive manufacturing.Additive Man- ufacturing Letters, 11:100232, Dec. 2024

work page 2024

[10] [10]

Chandrasekhar, O

A. Chandrasekhar, O. B. Farimani, O. Ajenifujah, J. Ock, and A. B. Farimani. NANOGPT: a query-driven large language model retrieval-augmented generation system for nanotech- nology research.arXiv.org, 2025

work page 2025

[11] [11]

S. Chen, X. Fan, B. Steingrimsson, Q. Xiong, W. Li, and P. K. Liaw. Fatigue dataset of high-entropy alloys.Scientific Data, 9(1):381, July 2022

work page 2022

[12] [12]

X. Chen, J. R. Asay, S. K. Dwivedi, and D. P. Field. Spall behavior of aluminum with varying microstructures.Journal of Applied Physics, 99(2):023528, Jan. 2006. 23

work page 2006

[13] [13]

Cheng, J

J. Cheng, J. Xu, X. Zhao, K. Shi, J. Li, Q. Zhang, J. Qiao, J. Huang, and S. Luo. Shock compression and spallation of a medium-entropy alloy fe40mn20cr20ni20.Materials Science and Engineering: A, 847:143311, July 2022

work page 2022

[14] [14]

Choi and B

J. Choi and B. Lee. Quantitative Topic Analysis of Materials Science Literature Using Natural Language Processing.ACS Applied Materials & Interfaces, 16(2):1957–1968, Jan. 2024

work page 1957

[15] [15]

Choudhary

K. Choudhary. Atomgpt: Atomistic generative pretrained transformer for forward and inverse materials design.The Journal of Physical Chemistry Letters, 15:6909–6917, 2024

work page 2024

[16] [16]

Choudhary

K. Choudhary. MicroscopyGPT: Generating Atomic-Structure Captions from Microscopy Images of 2D Materials with Vision-Language Transformers.The Journal of Physical Chemistry Letters, 16(27):7028–7035, July 2025

work page 2025

[17] [17]

T. G. Ciardi, A. Nihar, R. Chawla, O. Akanbi, P. K. Tripathi, Y. Wu, V. Chaudhary, and R. H. French. Materials data science using CRADLE: A distributed, data-centric approach. MRS Communications, 14(4):601–611, July 2024

work page 2024

[18] [18]

Cotton, J

M. Cotton, J. Millett, G. Whiteman, and N. Park. Spall strength of niobium and molyb- denum. InSHOCK COMPRESSION OF CONDENSED MATTER - 2011: Proceedings of the Conference of the American Physical Society Topical Group on Shock Compression of Condensed Matter, pages 1031–1034, Chicago, Illinois, 2012

work page 2011

[19] [19]

D. P. Dandekar and W. J. Weisgerber. Shock response of a heavy tungsten alloy

work page

[20] L. Farbaniec, C. Williams, L. Kecskes, R. Becker, and K. Ramesh. Spall response and failure mechanisms associated with a hot-extruded AMX602 Mg alloy. Materials Science and Engineering: A, 707:725–731, Nov. 2017.

[21] L. Farbaniec, C. Williams, L. Kecskes, K. Ramesh, and R. Becker. Microstructural effects on the spall properties of ECAE-processed AZ31B magnesium alloy. International Journal of Impact Engineering, 98:34–41, Dec. 2016.

[22] M. Fazeli. Evaluating the performance of Claude 3.7 Sonnet in data extraction automation for systematic literature reviews. Value in Health Regional Issues, 41:101539, 2025.

[23] S. J. Fensin, E. K. Walker, E. K. Cerreta, C. P. Trujillo, D. T. Martinez, and G. T. Gray. Dynamic failure in two-phase materials. Journal of Applied Physics, 118(23):235305, Dec. 2015.

[24] L. Foppiano, G. Lambard, T. Amagasa, and M. Ishii. Mining experimental data from materials science literature with large language models: an evaluation study. Science and Technology of Advanced Materials: Methods, 4(1):2356506, Dec. 2024.

[25] L. Gilligan, M. Cobelli, V. Taufour, and S. Sanvito. A rule-free workflow for the automated generation of databases from scientific literature. npj Computational Materials, 9:222, 2023.

[26] S. Gorsse, M. Gouné, W.-C. Lin, and L. Girard. Dataset of mechanical properties and electrical conductivity of copper-based alloys. Scientific Data, 10(1):504, July 2023.

[27] G. Gray, V. Livescu, P. Rigg, C. Trujillo, C. Cady, S. Chen, J. Carpenter, T. Lienert, and S. Fensin. Structure/property (constitutive and spallation response) of additively manufactured 316L stainless steel. Acta Materialia, 138:140–149, Oct. 2017.

[28] T. Gupta, M. Zaki, N. Anoop Krishnan, and Mausam. MatSciBERT: A materials domain language model for text mining and information extraction. npj Computational Materials, 8:102, 2022.

[29] T. Gupta, M. Zaki, D. Khatsuriya, K. Hira, N. Anoop Krishnan, and Mausam. DiSCoMaT: Distantly supervised composition extraction from tables in materials science articles. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, volume 1, pages 13465–13483, 2023.

[30] J. Han, H. Ji, and Y. Sun. Successful Data Mining Methods for NLP. Jan. 2015.

[31] M. Hawkins, S. Thomas, R. Hixson, J. Gigax, N. Li, C. Liu, J. Valdez, and S. Fensin. Dynamic properties of FeCrMnNi, a high entropy alloy. Materials Science and Engineering: A, 840:142906, Apr. 2022.

[32] G. Hillel, L. Meshi, S. Shimon, S. Kalabukhov, N. Frage, and E. Zaretsky. Shock wave study of precipitation hardening of beryllium copper. Materials Science and Engineering: A, 834:142599, Feb. 2022.

[33] H. Hu, H. J. Stirrat, A. Alayli, A. Saeki, and Y. Huang. AI-powered workflow for constructing organic materials databases from the literature: Integrating large language models. ACS Omega, 10(42):49545–49556, Oct. 2025.

[34] J. Immanuel and A. Mahata. Enhancing materials data workflows through object-oriented design and large language models. Integrating Materials and Manufacturing Innovation, 14(4), Dec. 2025.

[35] Z. Jiao, Z. Li, F. Wu, Q. Wang, X. Li, L. Xu, L. Hu, Y. Liu, Y. Yu, C. Hu, and J. Hu. Phase transition, twinning, and spall damage of NiTi shape memory alloys under shock loading. Materials Science and Engineering: A, 869:144775, Mar. 2023.

[36] S. R. Kalidindi, D. B. Brough, S. Li, A. Cecen, A. L. Blekh, F. Y. P. Congo, and C. Campbell. Role of materials data science and informatics in accelerated materials innovation. MRS Bulletin, 41(08):596–602, Aug. 2016.

[37] G. I. Kanel, S. V. Razorenov, A. Bogatch, A. V. Utkin, V. E. Fortov, and D. E. Grady. Spall fracture properties of aluminum and magnesium at high temperatures. Journal of Applied Physics, 79(11):8310–8317, June 1996.

[38] G. Khalighinejad, S. Scott, O. Liu, K. Anderson, R. Stureborg, A. Tyagi, and B. Dhingra. MatViX: Multimodal information extraction from visually rich articles. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 1, pages 3636–3655, 2025.

[39] C. Li, B. Li, J. Huang, H. Ma, M. Zhu, J. Zhu, and S. Luo. Spall damage of a mild carbon steel: Effects of peak stress, strain rate and pulse duration. Materials Science and Engineering: A, 660:139–147, Apr. 2016.

[40] C. Li, K. Yang, X. Tang, L. Lu, and S. Luo. Spall strength of a mild carbon steel: Effects of tensile stress history and shock-induced microstructure. Materials Science and Engineering: A, 754:461–469, Apr. 2019.

[41] Z. Li, Y. Yu, W. Gu, T. Zhu, H. Song, W. Guo, X. Yang, and Z. Zhu. Dual-LLM adversarial framework for information extraction from research literature. bioRxiv, Sept. 2025.

[42] S. Liu, T. R. Booth, Y. Ji, W. Reinhart, and P. V. Balachandran. Expert-grounded automatic prompt engineering for extracting lattice constants of high-entropy alloys from scientific publications using large language models. arXiv, Dec. 2025.

[43] Y. Lu and J. Li. Shock and spallation behavior of a compositionally complex high-strength low-alloy steel under different impact stresses. Applied Sciences, 13(6):3375, Mar. 2023.

[44] L. Ma, J. Liu, C. Li, Z. Zhong, L. Lu, and S. Luo. Effects of alloying element segregation bands on impact response of a 304 stainless steel. Materials Characterization, 153:294–303, July 2019.

[45] J. Millett, N. Bourne, and G. Gray. The behavior of Ni, Ni-60Co, and Ni3Al during one-dimensional shock loading. Metall Mater Trans A, 39(2):322–334, Feb. 2008.

[46] C. Neel, S. Gibbons, R. Abrahams, and J. House. Shock and spall in the low-alloy steel AF9628. Journal of Dynamic Behavior of Materials, 6(1):64–77, Mar. 2020.

[47] M. P. Polak and D. Morgan. Extracting accurate materials data from research papers with conversational language models and prompt engineering. Nature Communications, 15(1):1569, Feb. 2024.

[48] K. Rameshbabu, J. Luo, A. Shargh, K. A. El-Awady, and J. A. El-Awady. Supplementary material for From Papers to Property Tables: A Priority-Based LLM Workflow for Materials Data Extraction, 2026. Supplementary material.

[49] R. Seshadri and T. D. Sparks. Perspective: Interactive material property databases through aggregation of literature data. APL Materials, 4(5):053206, Mar. 2016.

[50] Y. Shi, N. Rampal, and O. M. Yaghi. Comparison of LLMs in extracting synthesis conditions and generating Q&A datasets for metal–organic frameworks. Digital Discovery, May 2025.

[51] Y. Si, W. Zhou, and J. Gai. Research and Implementation of Data Extraction Method Based on NLP. In 2020 IEEE 14th International Conference on Anti-counterfeiting, Security, and Identification (ASID), pages 11–15, Oct. 2020.

[52] P. Song, J. Liu, W. Li, and Y. Li. Plastic deformation behavior of a Cu–10Ta alloy under strong impact loading. Defence Technology, 32:368–382, Feb. 2024.

[53] S. A. Thomas, M. C. Hawkins, M. K. Matthes, G. T. Gray, and R. S. Hixson. Dynamic strength properties and alpha-phase shock Hugoniot of iron and steel. Journal of Applied Physics, 123(17):175902, May 2018.

[54] R. Whelchel, T. Sanders, and N. Thadhani. Spall and dynamic yield behavior of an annealed aluminum–magnesium alloy. Scripta Materialia, 92:59–62, Dec. 2014.

[55] C. L. Williams, K. T. Ramesh, and D. P. Dandekar. Spall response of 1100-O aluminum. Journal of Applied Physics, 111(12):123528, June 2012.

[56] Z.-C. Xie, C. Li, H.-Y. Wang, C. Lu, and L.-H. Dai. Hydrogen induced slowdown of spallation in high entropy alloy under shock loading. International Journal of Plasticity, 139:102944, Apr. 2021.

[57] Y. Yang, S. Yang, and H. Wang. Effects of the phase content on dynamic damage evolution in Fe50Mn30Co10Cr10 high entropy alloy. Journal of Alloys and Compounds, 851:156883, Jan. 2021.

[58] E. B. Zaretsky. Impact response of cobalt over the 300–1400 K temperature range. Journal of Applied Physics, 108(8):083525, Oct. 2010.

[59] E. B. Zaretsky, N. Frage, S. Kalabukhov, A. S. Savinykh, G. V. Garkushin, and S. V. Razorenov. Impact response of pre-strained pure vanadium. Journal of Applied Physics, 131(21):215905, June 2022.

[60] E. B. Zaretsky and G. I. Kanel. Plastic flow in shock-loaded silver at strain rates from 10⁴ s⁻¹ to 10⁷ s⁻¹ and temperatures from 296 K to 1233 K. Journal of Applied Physics, 110(7):073502, Oct. 2011.

[61] N. Zhang, J. Xu, Z. Feng, Y. Sun, J. Huang, X. Zhao, X. Yao, S. Chen, L. Lu, and S. Luo. Shock compression and spallation damage of high-entropy alloy Al0.1CoCrFeNi. Journal of Materials Science & Technology, 128:1–9, Nov. 2022.