From Papers to Property Tables: A Priority-Based LLM Workflow for Materials Data Extraction
Pith reviewed 2026-05-10 17:30 UTC · model grok-4.3
The pith
A priority-based LLM workflow extracts structured shock-physics data from full-text papers by combining direct text extraction, physics derivations, and figure digitization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The workflow targets 37 experimentally relevant fields per shot and reconstructs complete records by routing each field through a three-level priority strategy: T1 for direct extraction from text and tables, T2 for physics-based derivation using governing relations when direct values are missing, and T3 for digitization from figures when needed. Extracted values are normalized to canonical units, tagged by priority for traceability, and validated through physics-based consistency and plausibility checks, yielding priority-wise accuracies of 94.93 percent (T1), 92.04 percent (T2), and 83.49 percent (T3) with an overall weighted accuracy of 94.69 percent across 11,967 evaluated data points from a benchmark of 30 published research articles.
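A short worked relation helps read the headline numbers. It assumes (the page does not state this) that the overall figure weights each tier by its share of evaluated data points, with $w_1 + w_2 + w_3 = 1$:

```latex
\begin{aligned}
A_{\text{overall}} &= w_1 A_{T1} + w_2 A_{T2} + w_3 A_{T3} \\
94.69 &= 94.93\,w_1 + 92.04\,w_2 + 83.49\,w_3 \\
\Rightarrow\quad 0.24 &= 2.89\,w_2 + 11.44\,w_3
\qquad (\text{substituting } w_1 = 1 - w_2 - w_3)
\end{aligned}
```

Under that assumption, and ignoring rounding in the reported percentages, $w_2 \le 0.24/2.89 \approx 8.3\%$ and $w_3 \le 0.24/11.44 \approx 2.1\%$. The overall accuracy is therefore dominated by T1, and the lower T3 figure rests on a comparatively small share of the 11,967 points.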
What carries the argument
The three-level priority strategy (T1 direct text/table extraction, T2 physics-based derivation from verified governing relations, T3 figure digitization) integrated with unit normalization and physics consistency checks.
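The routing logic can be pictured with a minimal Python sketch. Everything named here (`FieldValue`, `resolve_field`, the extractor functions, the field names, and the dictionary layout of `paper`) is hypothetical and stands in for the prompt-driven LLM steps; the symmetric-impact relation is only an illustrative T2 derivation, not necessarily one the authors use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FieldValue:
    name: str
    value: float
    tier: str  # "T1", "T2", or "T3", kept for traceability


# An extractor returns a value, or None when its tier cannot supply the field.
Extractor = Callable[[str, dict], Optional[float]]


def from_text(name: str, paper: dict) -> Optional[float]:
    """T1 stand-in: look up a value already parsed from text or tables."""
    return paper.get("text_values", {}).get(name)


def from_physics(name: str, paper: dict) -> Optional[float]:
    """T2 stand-in: derive a missing value from a governing relation."""
    if name == "particle_velocity_m_s":
        v = from_text("impact_velocity_m_s", paper)
        # Illustrative relation: symmetric impact (same flyer and target
        # material) gives particle velocity u_p = v_impact / 2.
        return None if v is None else v / 2.0
    return None


def from_figure(name: str, paper: dict) -> Optional[float]:
    """T3 stand-in: look up a value digitized from a figure."""
    return paper.get("figure_values", {}).get(name)


TIERS: list[tuple[str, Extractor]] = [
    ("T1", from_text),
    ("T2", from_physics),
    ("T3", from_figure),
]


def resolve_field(name: str, paper: dict) -> Optional[FieldValue]:
    """Try each tier in priority order and tag the first value found."""
    for tier, extractor in TIERS:
        value = extractor(name, paper)
        if value is not None:
            return FieldValue(name, value, tier)
    return None  # field stays empty rather than being guessed


paper = {
    "text_values": {"impact_velocity_m_s": 400.0},
    "figure_values": {"pullback_velocity_m_s": 120.0},
}
for field in ("impact_velocity_m_s", "particle_velocity_m_s", "pullback_velocity_m_s"):
    print(resolve_field(field, paper))
```

Returning `None` when no tier succeeds keeps missing fields visibly missing, which is what makes the tier tag useful for auditing later.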
Load-bearing premise
The LLM can reliably interpret and apply physics equations for derivations and accurately read values from figures, while the consistency checks catch all remaining errors.
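What the consistency checks can and cannot catch is easier to see in a small sketch of unit normalization followed by a plausibility filter. The canonical units (GPa, m/s), the field names, and the numeric bounds below are assumptions chosen for illustration; the page does not list the paper's actual units or thresholds.

```python
# Conversion factors into assumed canonical units: GPa for stress, m/s for velocity.
TO_CANONICAL = {
    ("GPa", "GPa"): 1.0,
    ("MPa", "GPa"): 1e-3,
    ("kbar", "GPa"): 0.1,    # 1 kbar = 1e8 Pa
    ("m/s", "m/s"): 1.0,
    ("km/s", "m/s"): 1e3,
    ("mm/us", "m/s"): 1e3,   # 1 mm/us = 1 km/s
}

# Illustrative plausibility bounds in canonical units (hypothetical values).
PLAUSIBLE_RANGE = {
    "spall_strength_GPa": (0.0, 20.0),
    "impact_velocity_m_s": (1.0, 10_000.0),
}


def normalize(value: float, unit: str, canonical: str) -> float:
    """Convert a reported value to the canonical unit, failing on unknown units."""
    try:
        return value * TO_CANONICAL[(unit, canonical)]
    except KeyError:
        raise ValueError(f"no conversion from {unit!r} to {canonical!r}") from None


def is_plausible(field: str, value: float) -> bool:
    """Return True when the value lies inside the field's allowed range."""
    low, high = PLAUSIBLE_RANGE[field]
    return low <= value <= high


spall = normalize(25.0, "kbar", "GPa")
print(spall, is_plausible("spall_strength_GPa", spall))

# A unit slip (MPa read as GPa) lands far outside the range and is flagged.
print(is_plausible("spall_strength_GPa", 2500.0))
```

A range check of this kind catches gross unit or magnitude errors but passes any wrong value that happens to fall inside the bounds, which is why the premise above carries so much weight.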
What would settle it
Running the workflow on a fresh set of 30 papers from the same domain and finding that overall weighted accuracy drops below 90 percent or that a substantial fraction of derived or digitized values fail the physics plausibility checks would falsify the reliability claim.
Original abstract
Scientific data are widely dispersed across research articles and are often reported inconsistently across text, tables, and figures, making manual data extraction and aggregation slow and error-prone. We present a prompt-driven, hierarchical workflow that uses a large language model (LLM) to automatically extract and reconstruct structured, shot-level shock-physics experimental records by integrating information distributed across text, tables, figures, and physics-based derivations from full-text published research articles, using alloy spall strength as a representative case study. The pipeline targeted 37 experimentally relevant fields per shot and applied a three-level priority strategy: (T1) direct extraction from text/tables, (T2) physics-based derivation using verified governing relations, and (T3) digitization from figures when necessary. Extracted values were normalized to canonical units, tagged by priority for traceability, and validated with physics-based consistency and plausibility checks. Evaluated on a benchmark of 30 published research articles comprising 11,967 evaluated data points, the workflow achieved high overall accuracy, with priority-wise accuracies of 94.93% (T1), 92.04% (T2), and 83.49% (T3), and an overall weighted accuracy of 94.69%. Cross-model testing further indicated strong agreement for text/table and equation-derived fields, with lower agreement for figure-based extraction. Implementation through an API interface demonstrated the scalability of the approach, achieving consistent extraction performance and, in a subset of test cases, matching or exceeding chat-based accuracy. This workflow demonstrates a practical approach for converting unstructured technical literature into traceable, analysis-ready datasets without task-specific fine-tuning, enabling scalable database construction in materials science.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes a prompt-driven, hierarchical LLM workflow for extracting structured shock-physics experimental data (alloy spall strength case study) from full-text papers. It integrates direct text/table extraction (T1), physics-based derivations from governing relations (T2), and figure digitization (T3) across 37 fields per shot, with unit normalization, priority tagging, and physics consistency checks. Evaluated on 30 published papers (11,967 data points), the workflow reports accuracies of 94.93% (T1), 92.04% (T2), 83.49% (T3), and 94.69% weighted overall, plus cross-model agreement and API scalability results.
Significance. If the evaluation holds after addressing ground-truth validation, the work would offer a practical, scalable method for converting dispersed materials literature into traceable property tables without fine-tuning. The priority hierarchy combined with physics-based checks and traceability tagging is a constructive approach to mitigating LLM hallucinations in technical domains. The API demonstration and cross-model tests provide evidence of robustness and deployability, which could accelerate database construction in materials science if the benchmark proves representative.
major comments (1)
- [Section 4] Section 4 (Evaluation and Results): The ground-truth reference values for the 11,967 data points lack any reported inter-annotator agreement statistics (e.g., Cohen's kappa or percentage agreement) or an explicit annotation protocol describing annotator count, conflict resolution for ambiguous T2/T3 cases (figure digitization, unit normalization, physics derivations), or how single-annotator bias was mitigated. Because the headline accuracies (94.93% T1, 92.04% T2, 83.49% T3) are computed directly against this unreported reference, annotation noise could be conflated with workflow error, rendering the performance claims difficult to interpret.
minor comments (2)
- [Abstract and Section 3] Abstract and Section 3: The claim of extracting '37 experimentally relevant fields per shot' is central but the fields are not enumerated or exemplified; adding a concise table or appendix listing the fields with their priority assignments and derivation rules would aid reproducibility and reader understanding.
- [Section 4.3] Section 4.3 (Cross-model testing): The statement of 'strong agreement' for text/table and equation-derived fields is qualitative; providing quantitative metrics (e.g., pairwise agreement percentages or Cohen's kappa between models) would strengthen the evidence for robustness.
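The agreement statistics named in these comments are straightforward to compute once two annotators (or two models) have labelled the same items. The sketch below assumes categorical labels, for example a correct/incorrect judgement per data point; the function names and sample labels are illustrative and not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of items on which the two raters give the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of each rater's marginal label frequencies.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for four data points from two raters.
rater_1 = ["ok", "ok", "err", "ok"]
rater_2 = ["ok", "err", "err", "ok"]
print(percent_agreement(rater_1, rater_2))  # 0.75
print(cohens_kappa(rater_1, rater_2))       # 0.5
```

For continuous extracted values, labels would first have to be produced by some matching rule (such as agreement within a tolerance), and that rule would itself need to be reported.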
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address the single major comment point by point below, with a commitment to strengthen the manuscript where appropriate.
-
Referee: [Section 4] Section 4 (Evaluation and Results): The ground-truth reference values for the 11,967 data points lack any reported inter-annotator agreement statistics (e.g., Cohen's kappa or percentage agreement) or an explicit annotation protocol describing annotator count, conflict resolution for ambiguous T2/T3 cases (figure digitization, unit normalization, physics derivations), or how single-annotator bias was mitigated. Because the headline accuracies (94.93% T1, 92.04% T2, 83.49% T3) are computed directly against this unreported reference, annotation noise could be conflated with workflow error, rendering the performance claims difficult to interpret.
Authors: We agree that the absence of a detailed annotation protocol and inter-annotator statistics in the current manuscript limits the interpretability of the reported accuracies. The ground truth was produced by a single domain-expert author following a tier-specific protocol (direct lookup for T1; equation verification against standard shock-physics relations for T2; calibrated manual digitization for T3), with unit normalization and physics-consistency checks applied uniformly. No multi-annotator process was performed, so metrics such as Cohen's kappa are unavailable. We will add a new subsection to Section 4 that explicitly describes (i) annotator qualifications and count, (ii) the step-by-step protocol for each tier including handling of ambiguous cases (e.g., figure scaling, derivation assumptions, unit conversion rules), (iii) the single-annotator bias mitigation steps that were used (source cross-referencing and post-extraction physics plausibility filters), and (iv) an explicit statement of this as a methodological limitation. These additions will be incorporated in the revised manuscript.
Revision: yes
Circularity Check
No significant circularity in workflow or evaluation
The paper describes a hierarchical LLM extraction pipeline (T1 direct text/table extraction, T2 physics-based derivations from verified governing relations, T3 figure digitization) applied to 30 independent published articles. Reported accuracies (94.93% T1, 92.04% T2, 83.49% T3, 94.69% weighted) are computed against data points drawn from external literature using standard physics relations and manual verification protocols. No self-definitional loops, fitted parameters renamed as predictions, load-bearing self-citations, or ansatz smuggling appear in the methodology; the benchmark is constructed from separate papers and the derivation chain remains externally anchored rather than reducing to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Governing physical relations for spall strength are known, accurate, and applicable for deriving missing values.
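As a concrete instance of such a relation, the widely used acoustic approximation estimates spall strength from the free-surface velocity history. This page does not state which relations the paper actually applies, so the formula below is offered only as a representative example, with $\rho_0$ the initial density, $c_b$ the bulk sound speed, and $\Delta u_{fs}$ the pullback velocity (peak free-surface velocity minus the first minimum):

```latex
\sigma_{sp} \approx \tfrac{1}{2}\,\rho_0\, c_b\, \Delta u_{fs}
```

With illustrative round numbers roughly typical of aluminum ($\rho_0 = 2700\ \mathrm{kg/m^3}$, $c_b = 5350\ \mathrm{m/s}$, $\Delta u_{fs} = 150\ \mathrm{m/s}$), this gives $\sigma_{sp} \approx 0.5 \times 2700 \times 5350 \times 150 \approx 1.08 \times 10^{9}\ \mathrm{Pa} \approx 1.08\ \mathrm{GPa}$. A T2 derivation of this kind is only as reliable as the inputs and the applicability of the approximation to the experiment in question.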