arxiv: 2604.22938 · v1 · submitted 2026-04-24 · ❄️ cond-mat.mtrl-sci · cs.CL· cs.LG

Recognition: unknown

Large language model-enabled automated data extraction for concrete materials informatics

Zhanzhao Li , Kengran Yang , Qiyao He , Kai Gong

Authors on Pith no claims yet

Pith reviewed 2026-05-08 11:13 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci cs.CLcs.LG

keywords large language modelsdata extractionmaterials informaticsconcretescientific literatureautomated pipelinemachine learning datasetsblended cement

0 comments

The pith

An LLM pipeline extracts nearly 9,000 high-quality concrete material records from over 27,000 papers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a pipeline that uses large language models to automatically read scientific papers on concrete and pull out structured information on compositions, processing steps, and measured properties. This addresses the long-standing shortage of large, usable experimental datasets that has slowed data-driven materials work. By showing the pipeline works across different language models and delivers an F1 score of 0.97, the authors demonstrate that thousands of records can be assembled in hours rather than months of manual labor. The resulting database is then used to train machine-learning models that perform better both on familiar materials and on ones never seen during training. A reader would care because the same approach could be applied to any materials domain where literature is abundant but structured data is scarce.

Core claim

The authors present a generalizable LLM-powered pipeline that extracts and structures materials data from unstructured literature, using concrete as a test case. The pipeline performs robustly across many LLMs, reaching an F1 score up to 0.97 on composition-process-property attributes. In one hour it produces nearly 9,000 high-quality records with more than 100 attributes from over 27,000 publications, forming the largest open laboratory database for blended cement concrete. Machine-learning tests confirm that larger, more diverse extracted datasets improve both in-distribution accuracy and out-of-distribution generalization.

What carries the argument

The LLM-powered pipeline that reads papers and outputs structured records on composition, process, and property attributes.

If this is right

Materials researchers can now build large, open experimental datasets in hours instead of years of manual curation.
Machine-learning models trained on these datasets show improved accuracy on both known and previously unseen concrete formulations.
The same pipeline can be reused in other materials domains without major redesign.
Scalable literature-to-data conversion becomes a practical route to the data infrastructures needed for materials informatics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be applied to other text-heavy scientific fields where experimental results sit in journal articles rather than databases.
Downstream users may still need targeted human checks on the most critical attributes before using the data for safety-critical predictions.
Combining this extraction step with active learning loops could let models request additional literature on the materials they predict least accurately.

Load-bearing premise

That the records extracted by the language model are free of systematic errors, omissions, or biases that would mislead later machine-learning analyses.

What would settle it

A side-by-side comparison of the pipeline's output against a human-curated gold-standard set of several hundred papers, reporting exact agreement rates per attribute and any consistent patterns of omission.

read the original abstract

The promise of data-driven materials discovery remains constrained by the scarcity of large, high-quality, and accessible experimental datasets. Here, we introduce a generalizable large language model (LLM)-powered pipeline for automated extraction and structuring of materials data from unstructured scientific literature, using concrete materials as a representative and particularly challenging example. The pipeline exhibits robust performance across a broad range of LLMs and achieves an $F_1$ score of up to 0.97 for diverse composition--process--property attributes. Within one hour, it extracts nearly 9,000 high-quality records with over 100 attributes screened from more than 27,000 publications, enabling the construction of the largest open laboratory database for blended cement concrete. Machine learning analyses underscore the importance of large, diverse, and information-rich datasets for enhancing both in-distribution accuracy and out-of-distribution generalization to unseen materials. The proposed pipeline is readily adaptable to other materials domains and accelerates the development of scalable data infrastructures for materials informatics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper scales LLM extraction to 9k concrete records from 27k papers and shows larger datasets help ML generalization, but lacks validation specifics that would make the F1 claims reliable.

read the letter

The core advance here is taking existing LLM extraction methods and running them at real scale on concrete literature, which is notoriously inconsistent in how it describes mixes, processes, and properties. They pull nearly 9,000 structured records with over 100 attributes and then demonstrate that the bigger, more diverse set improves out-of-distribution prediction. That downstream result is the part that actually matters for materials informatics work, because data scarcity is the real bottleneck in this area.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces an LLM-powered pipeline for automated extraction and structuring of composition-process-property data from unstructured concrete materials literature. It reports robust performance across multiple LLMs with F1 scores up to 0.97, extraction of nearly 9,000 high-quality records (over 100 attributes) from >27,000 publications in under an hour, construction of the largest open lab database for blended cement concrete, and downstream ML experiments demonstrating improved in-distribution accuracy and out-of-distribution generalization from larger, more diverse datasets. The pipeline is presented as generalizable to other materials domains.

Significance. If the reported extraction quality holds under rigorous validation, the work would provide a scalable, domain-adaptable tool that directly addresses data scarcity in materials informatics. The scale of the extracted dataset and the explicit demonstration that larger/diverse data improves ML generalization are concrete strengths; the open release of the resulting database would further amplify impact. The approach could accelerate similar efforts in other subfields where literature is abundant but structured data is sparse.

major comments (3)

[§3 and §4] §3 (Methods) and §4 (Results): The F1 score of up to 0.97 is presented as the central performance metric, yet the manuscript provides no numerical size for the human validation set, no inter-annotator agreement statistic, and no breakdown of error types (e.g., omission of ambiguous w/c ratios or fly-ash replacement clauses). Without these quantities, it is impossible to assess whether the headline performance claim is robust against the known ambiguities in concrete literature.
[§4.2] §4.2 (Post-processing and filtering): The criteria used to select the final ~9,000 “high-quality” records from the raw LLM outputs are not specified (e.g., confidence thresholds, attribute completeness rules, or manual review fraction). This choice directly affects the claim that the extracted database is suitable for downstream ML analyses and must be documented with quantitative justification.
[§5] §5 (ML analyses): The out-of-distribution generalization experiments rely on the extracted records being unbiased; however, no sensitivity analysis is shown that quantifies how plausible systematic extraction errors (e.g., under-reporting of low-strength mixes) would propagate into the reported accuracy gains.

minor comments (2)

[Figure 2, Table 1] Figure 2 and Table 1: axis labels and legend entries use inconsistent abbreviations for attributes (e.g., “w/c” vs. “water-cement ratio”); standardize notation for readability.
[Abstract and §4] The abstract states “within one hour” but the main text does not report wall-clock time or hardware details for the 27k-paper run; add this information to support the scalability claim.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for their detailed and constructive feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below, indicating where revisions have been made to address the concerns.

read point-by-point responses

Referee: [§3 and §4] §3 (Methods) and §4 (Results): The F1 score of up to 0.97 is presented as the central performance metric, yet the manuscript provides no numerical size for the human validation set, no inter-annotator agreement statistic, and no breakdown of error types (e.g., omission of ambiguous w/c ratios or fly-ash replacement clauses). Without these quantities, it is impossible to assess whether the headline performance claim is robust against the known ambiguities in concrete literature.

Authors: We agree that these details are necessary for a complete assessment of our validation results. Although the validation process was described at a high level, the specific quantities were not reported. In the revised manuscript, we have added the size of the human validation set, the inter-annotator agreement statistic, and a breakdown of error types to §3 and §4. This includes discussion of how errors related to ambiguous clauses in the literature were handled. revision: yes
Referee: [§4.2] §4.2 (Post-processing and filtering): The criteria used to select the final ~9,000 “high-quality” records from the raw LLM outputs are not specified (e.g., confidence thresholds, attribute completeness rules, or manual review fraction). This choice directly affects the claim that the extracted database is suitable for downstream ML analyses and must be documented with quantitative justification.

Authors: We thank the referee for pointing this out. The selection criteria were applied but not fully detailed in the original submission. We have revised §4.2 to explicitly state the post-processing and filtering criteria, including any confidence thresholds, completeness requirements, and the extent of manual review, supported by quantitative metrics on how these choices impacted the final dataset. revision: yes
Referee: [§5] §5 (ML analyses): The out-of-distribution generalization experiments rely on the extracted records being unbiased; however, no sensitivity analysis is shown that quantifies how plausible systematic extraction errors (e.g., under-reporting of low-strength mixes) would propagate into the reported accuracy gains.

Authors: This is a valid concern regarding the robustness of our ML findings. While we believe the large scale of the dataset mitigates some biases, we have added a sensitivity analysis to §5 in the revised manuscript. This analysis simulates the effects of potential systematic errors in the extracted data and confirms that the improvements in out-of-distribution generalization remain significant. revision: yes

Circularity Check

0 steps flagged

No circularity detected; performance claims rest on external validation

full rationale

The paper describes an empirical LLM pipeline for literature data extraction and reports measured F1 scores and record counts. No equations, derivations, or self-referential definitions appear in the abstract or summary. Performance is stated as evaluated against ground truth rather than being forced by internal fits or self-citations. The central claims therefore remain independent of the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work relies on the domain assumption that LLMs can reliably parse materials-science text into structured attributes without introducing systematic bias; no free parameters or new physical entities are introduced.

axioms (1)

domain assumption Large language models can accurately extract structured composition-process-property data from unstructured concrete literature across diverse paper styles
Central to the pipeline's claimed robustness and F1 scores; invoked implicitly throughout the abstract.

pith-pipeline@v0.9.0 · 5478 in / 1271 out tokens · 21750 ms · 2026-05-08T11:13:39.565486+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

120 extracted references · 7 canonical work pages · 3 internal anchors

[1]

& Choudhary, A

Agrawal, A. & Choudhary, A. Perspective: Materials informatics and big data: Realization of the ”fourth paradigm” of science in materials science. APL Mater. 4 (2016)

2016
[2]

& Jacobs, R

Morgan, D. & Jacobs, R. Opportunities and Challenges for Machine Learning in Materials Science. Annu. Rev. Mater. Res . 50, 71–103 (2020)

2020
[3]

& Ramprasad, R

Batra, R., Song, L. & Ramprasad, R. Emerging materials intelligence ecosystems propelled by machine learning. Nat. Rev. Mater . 6, 655–678 (2021)

2021
[4]

& Kim, C

Ramprasad, R., Batra, R., Pilania, G., Mannodi-Kanakkithodi, A. & Kim, C. Machine learning in materials informatics: Recent applications and prospects. npj Comput. Mater . 3 (2017)

2017
[5]

Olivetti, E. A. et al. Data-driven materials research enabled by natural language processing and information extraction. Appl. Phys. Rev . 7 (2020)

2020
[6]

Schilling-Wilhelmi, M. et al. From Text to Insight: Large Language Models for Materials Science Data Extraction. Chem. Soc. Rev. 54, 1125–1150 (2025)

2025
[7]

Curtarolo, S. et al. AFLOW: An automatic framework for high-throughput materials discovery.Comput. Mater. Sci. 58, 218–226 (2012)

2012
[8]

E., Kirklin, S., Aykol, M., Meredig, B

Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: The open quantum materials database (OQMD). Jom 65, 1501–1509 (2013)

2013
[9]

Jain, A. et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1 (2013)

2013
[10]

& Scheffler, M

Draxl, C. & Scheffler, M. The NOMAD laboratory: From data sharing to artificial intelligence. JPhys Mater. 2 (2019)

2019
[11]

Choudhary, K. et al. The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design. npj Comput. Mater . 6 (2020)

2020
[12]

Talirz, L. et al. Materials Cloud, a platform for open computational science. Sci. Data 7, 1–12 (2020)

2020
[13]

Dagdelen, J. et al. Structured information extraction from scientific text with large language models. Nat. Commun. 15 (2024)

2024
[14]

Kononova, O. et al. Opportunities and challenges of text mining in materials research. iScience 24, 102155 (2021)

2021
[15]

Swain, M. C. & Cole, J. M. ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature. J. Chem. Inf. Model . 56, 1894–1904 (2016)

1904
[16]

Shetty, P. et al. A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing. npj Comput. Mater . 9, 1–12 (2023)

2023
[17]

Jensen, Z. et al. A Machine Learning Approach to Zeolite Synthesis Enabled by Automatic Literature Data Extraction. ACS Cent. Sci . (2019). 16

2019
[18]

Kononova, O. et al. Text-mined dataset of inorganic materials synthesis recipes. Sci. Data 6, 1–11 (2019)

2019
[19]

Kim, E. et al. Machine-learned and codified synthesis parameters of oxide materials. Sci. data 4, 170127 (2017)

2017
[20]

Wang, W. et al. Automated pipeline for superalloy data by text mining. npj Comput. Mater . 8, 1–12 (2022)

2022
[21]

& Cole, J

Huang, S. & Cole, J. M. BatteryDataExtractor: battery-aware text-mining software embedded with BERT models. Chem. Sci. 13 (2022)

2022
[22]

Wilary, D. M. & Cole, J. M. ReactionDataExtractor: A Tool for Automated Extraction of Information from Chemical Reaction Schemes. J. Chem. Inf. Model . 61, 4962–4974 (2021)

2021
[23]

J., Isazawa, T., Elliott, S

Mavraˇ ci´ c, J., Court, C. J., Isazawa, T., Elliott, S. R. & Cole, J. M. ChemDataExtractor 2.0: Autopopulated Ontologies for Materials Science. J. Chem. Inf. Model . 61, 4280–4289 (2021)

2021
[24]

Gupta, T., Zaki, M., Krishnan, N. M. & Mausam. MatSciBERT: A materials domain language model for text mining and information extraction. npj Comput. Mater . 8, 1–11 (2022)

2022
[25]

Polak, M. P. & Morgan, D. Extracting accurate materials data from research papers with conversational language models and prompt engineering. Nat. Commun. 15, 1–11 (2024)

2024
[26]

Jiang, X. et al. Applications of natural language processing and large language models in materials discovery. npj Comput. Mater . 11, 1–15 (2025)

2025
[27]

& Krishnan, N

Miret, S. & Krishnan, N. M. Enabling large language models for real-world materials discovery. Nat. Mach. Intell . 7, 991–998 (2025)

2025
[28]

Polak, M. P. et al. Flexible, model-agnostic method for materials data extraction from text using general purpose language models. Digit. Discov. 3, 1221–1235 (2024)

2024
[29]

Lee, S. et al. Data-driven analysis of text-mined seed-mediated syntheses of gold nanoparticles. Digit. Discov. (2024)

2024
[30]

& Anatole von Lilienfeld, O

Lee, S., Heinen, S., Khan, D. & Anatole von Lilienfeld, O. Autonomous data extraction from peer reviewed literature for training machine learning models of oxidation potentials. Mach. Learn. Sci. Technol. 5, 1–11 (2024)

2024
[31]

Zhang, Y. et al. GPTArticleExtractor: An automated workflow for magnetic material database construction. J. Magn. Magn. Mater . 597, 172001 (2024)

2024
[32]

& Moosavi, S

Ansari, M. & Moosavi, S. M. Agent-based Learning of Materials Datasets from Scientific Literature. Digit. Discov. 2607–2617 (2024)

2024
[33]

& Ramprasad, R

Gupta, S., Mahmood, A., Shetty, P., Adeboye, A. & Ramprasad, R. Data extraction from polymer literature using large language models. Commun. Mater. 5, 1–11 (2024)

2024
[34]

K., Knowles, T

Yang, Z., Yorke, S. K., Knowles, T. P. & Buehler, M. J. Learning the rules of peptide self-assembly through data mining with large language models. Sci. Adv. 11, 1–11 (2025)

2025
[35]

Rihm, S. D. et al. Extraction of chemical synthesis information using the World Avatar. Digit. Discov. (2025)

2025
[36]

Shi, Y. et al. Comparison of LLMs in extracting synthesis conditions and generating Q&A datasets for metal-organic frameworks. Digit. Discov. (2025)

2025
[37]

Wei, C. et al. Large Language Models Assisted Materials Development: Case of Predictive Analytics for Oxygen Evolution Reaction Catalysts of (Oxy)hydroxides. ACS Sustain. Chem. Eng . (2025)

2025
[38]

Zheng, Z., Zhang, O., Borgs, C., Chayes, J. T. & Yaghi, O. M. ChatGPT Chemistry Assistant for Text Mining and the Prediction of MOF Synthesis. J. Am. Chem. Soc . 145, 18048–18062 (2023)

2023
[39]

& Todorovi´ c, M

Sipil¨ a, M., Mehryary, F., Pyysalo, S., Ginter, F. & Todorovi´ c, M. Question Answering models for information extraction from perovskite materials science literature. Commun. Mater. 6 (2025)

2025
[40]

Odobesku, R. et al. Agent-based multimodal information extraction for nanomaterials. npj Comput. Mater. 11, 1–11 (2025)

2025
[41]

Kang, Y. et al. Harnessing Large Language Models to Collect and Analyze Metal-Organic Framework Property Data Set. J. Am. Chem. Soc . 147, 3943–3958 (2025)

2025
[42]

& Zang, J

Itani, S., Zhang, Y. & Zang, J. Large Language Model-Driven Database for Thermoelectric Materials. Comput. Mater. Sci . 253, 113855 (2024)

2024
[43]

& Brinson, L

Circi, D., Khalighinejad, G., Chen, A., Dhingra, B. & Brinson, L. C. How Well Do Large Language Models Understand Tables in Materials Science? Integrating Mater. Manuf. Innov. 13, 669–687 (2024)

2024
[44]

Mahjoubi, S. et al. Data-driven material screening of secondary and natural cementitious precursors. 17 Commun. Mater. 6, 99 (2025)

2025
[45]

J., Miller, S

Monteiro, P. J., Miller, S. A. & Horvath, A. Towards sustainable concrete. Nat. Mater . 16, 698–699 (2017)

2017
[46]

Technology Roadmap - Low-Carbon Transition in the Cement Industry

International Energy Agency. Technology Roadmap - Low-Carbon Transition in the Cement Industry. Tech. Rep. (2018)

2018
[47]

A., Kasprzyk, J

DeRousseau, M. A., Kasprzyk, J. R. & Srubar, W. V. Computational design optimization of concrete mixtures: A review. Cem. Concr. Res . 109, 42–53 (2018)

2018
[48]

K., Casilio, J

Buffenbarger, J. K., Casilio, J. M., AzariJafari, H. & Szoke, S. S. Role of Mixture Overdesign in the Sustainability of Concrete: Current State and Future Perspective. ACI Mater. J . 120, 89–100 (2023)

2023
[49]

Pfeiffer, O. P. et al. Bayesian design of concrete with amortized Gaussian processes and multi-objective optimization. Cem. Concr. Res . 177, 107406 (2024)

2024
[50]

Li, Z. et al. Machine learning in concrete science: applications, challenges, and best practices. npj Comput. Mater. 8, 1–17 (2022)

2022
[51]

& Radlinska, A

Li, Z. & Radlinska, A. Artificial intelligence in concrete materials: A scientometric view. In Naser, M. Z. (ed.) Leveraging Artificial Intelligence in Engineering, Management, and Safety of Infrastructure , 161–183 (CRC Press, 2022)

2022
[52]

& Nehdi, M

Ben Chaabene, W., Flah, M. & Nehdi, M. L. Machine learning prediction of mechanical properties of concrete: Critical review. Constr. Build. Mater . 260, 119889 (2020)

2020
[53]

& Nehdi, M

Nunez, I., Marani, A., Flah, M. & Nehdi, M. L. Estimating compressive strength of modern concrete mixtures using computational intelligence : A systematic review. Constr. Build. Mater . 310, 125279 (2021)

2021
[54]

Yeh, I. C. Modeling of strength of high-performance concrete using artificial neural networks. Cem. Concr. Res. 28, 1797–1808 (1998)

1998
[55]

Concrete Compressive Strength

Yeh, I.-C. Concrete Compressive Strength [Dataset] (2007). URL https://doi.org/10.24432/C5PK67

work page doi:10.24432/c5pk67 2007
[56]

A., Hall, A., Pilon, L., Gupta, P

Young, B. A., Hall, A., Pilon, L., Gupta, P. & Sant, G. Can the compressive strength of concrete be estimated from knowledge of the mixture proportions?: New insights from statistical analysis and machine learning methods. Cem. Concr. Res . 115, 379–388 (2019)

2019
[57]

A., Laftchiev, E., Kasprzyk, J

DeRousseau, M. A., Laftchiev, E., Kasprzyk, J. R., Rajagopalan, B. & Srubar, W. V. A comparison of machine learning methods for predicting the compressive strength of field-placed concrete. Constr. Build. Mater. 228, 116661 (2019)

2019
[58]

Zhang, X., Akber, M. Z. & Zheng, W. Prediction of seven-day compressive strength of field concrete. Constr. Build. Mater . 305, 124604 (2021)

2021
[59]

& Elsen, J

Snellings, R., Mertens, G. & Elsen, J. Supplementary cementitious materials. Rev. Mineral. Geochem. 74, 211–278 (2012)

2012
[60]

C., Snellings, R

Juenger, M. C., Snellings, R. & Bernal, S. A. Supplementary cementitious materials: New sources, characterization, and performance insights. Cem. Concr. Res . 122, 257–273 (2019)

2019
[61]

& Skibsted, J

Snellings, R., Suraneni, P. & Skibsted, J. Future and emerging supplementary cementitious materials. Cem. Concr. Res . 171 (2023)

2023
[62]

211.1-91 Standard Practice for Selecting Proportions for Normal, Heavyweight, and Mass Concrete (Reapproved 2009) (2002)

ACI Committee 211. 211.1-91 Standard Practice for Selecting Proportions for Normal, Heavyweight, and Mass Concrete (Reapproved 2009) (2002)

2009
[63]

& Foster, I

Hong, Z., Ward, L., Chard, K., Blaiszik, B. & Foster, I. Challenges and Advances in Information Extraction from Scientific Literature: a Review. Jom 73, 3383–3400 (2021)

2021
[64]

& Cole, J

Zhu, M. & Cole, J. M. PDFDataExtractor: A Tool for Reading Scientific Text and Interpreting Metadata from the Typeset Literature in the Portable Document Format. J. Chem. Inf. Model . 62, 1633–1643 (2022)

2022
[65]

System Card: Claude Sonnet 4.5

Anthropic. System Card: Claude Sonnet 4.5. Tech. Rep. September (2025)

2025
[66]

GPT-4o System Card

OpenAI. GPT-4o System Card. Tech. Rep. (2024)

2024
[67]

Jiang, Y. et al. Prediction of time-dependent concrete mechanical properties based on advanced deep learning models considering complex variables. Case Stud. Constr. Mater . 21, e03629 (2024)

2024
[68]

Imran, M., Khushnood, R. A. & Fawad, M. A hybrid data-driven and metaheuristic optimization approach for the compressive strength prediction of high-performance concrete. Case Stud. Constr. Mater. 18, e01890 (2023)

2023
[69]

Liu, X., Mei, S., Wang, X. & Li, X. Estimation of compressive strength of concrete with manufactured sand and natural sand using interpretable artificial intelligence. Case Stud. Constr. Mater . 21, e03840 18 (2024)

2024
[70]

& Behnood, A

Mohammadi Golafshani, E., Arashpour, M. & Behnood, A. Predicting the compressive strength of green concretes using Harris hawks optimization-based data-driven methods. Constr. Build. Mater . 318, 125944 (2022)

2022
[71]

M., Behnood, A

Golafshani, E. M., Behnood, A. & Arashpour, M. Predicting the compressive strength of normal and High-Performance Concretes using ANN and ANFIS hybridized with Grey Wolf Optimizer. Constr. Build. Mater. 232, 117266 (2020)

2020
[72]

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. Preprint at http://arxiv.org/abs/1802.03426 (2020)

work page internal anchor Pith review arXiv 2020
[73]

Li, Z. et al. Can domain knowledge benefit machine learning for concrete property prediction? J. Am. Ceram. Soc. 107, 1582–1602 (2024)

2024
[74]

& Guestrin, C

Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , 785–794 (2016)

2016
[75]

Random forests

Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001)

2001
[76]

Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, 3147–3155 (2017)

2017
[77]

E., Hinton, G

Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986)

1986
[78]

& Vapnik, V

Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995)

1995
[79]

Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems , 4768–4777 (2017)

2017
[80]

& Visintin, P

Xie, T. & Visintin, P. A unified approach for mix design of concrete containing supplementary cementitious materials based on reactivity moduli. J. Clean. Prod. 203, 68–82 (2018)

2018

Showing first 80 references.