LitXBench: A Benchmark for Extracting Experiments from Scientific Literature

Curtis Chong; Jorge Colindres

arxiv: 2604.07649 · v4 · pith:Y4AG2MRSnew · submitted 2026-04-08 · 💻 cs.IR


Pith reviewed 2026-05-21 09:05 UTC · model grok-4.3


keywords: benchmark · information extraction · scientific literature · materials science · large language models · experimental data · data aggregation


The pith

Frontier language models extract full experiments from papers more accurately than existing pipelines by correctly linking measurements to processing steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LitXBench as a new way to test how well automated methods can pull complete experimental records out of scientific articles rather than isolated property values. It supplies LitXAlloy, a collection of 1426 measurements drawn from 19 alloy papers, with each entry kept as a Python object so that errors can be checked and the data can be validated by code. When frontier models are compared against current multi-turn extraction systems on this benchmark, the models reach up to 0.37 higher F1 score. The authors trace the difference to the fact that pipelines tend to attach measurements only to a material's composition while the models also capture the processing steps that actually define the material.

Core claim

Frontier language models such as Gemini 3.1 Pro Preview outperform existing multi-turn extraction pipelines by up to 0.37 F1 on the LitXAlloy benchmark of 1426 measurements from 19 alloy papers. The performance gap occurs because extraction pipelines associate measurements with compositions rather than the processing steps that define a material.

What carries the argument

LitXBench, a benchmarking framework that stores experimental measurements as Python objects instead of text formats to support auditability and programmatic validation, applied to the LitXAlloy dataset.
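As a rough illustration of why Python-object storage can support programmatic validation, the sketch below uses a dataclass whose constructor rejects malformed entries. The field names (`kind`, `value`, `unit`, `uncertainty`) echo the extraction prompt quoted in the reference graph, but the `ALLOWED_UNITS` table and the validation rules are assumptions made for this example, not the benchmark's actual code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table of allowed units per measurement kind (assumption for illustration).
ALLOWED_UNITS = {
    "hardness": {"HV", "GPa"},
    "yield_strength": {"MPa", "GPa"},
}


@dataclass(frozen=True)
class Measurement:
    kind: str
    value: float
    unit: str
    uncertainty: Optional[float] = None

    def __post_init__(self):
        # Validation runs at construction time, so a bad entry fails when the file is loaded.
        if self.kind not in ALLOWED_UNITS:
            raise ValueError(f"unknown measurement kind: {self.kind!r}")
        if self.unit not in ALLOWED_UNITS[self.kind]:
            raise ValueError(f"unit {self.unit!r} not valid for {self.kind!r}")
        if self.uncertainty is not None and self.uncertainty < 0:
            raise ValueError("uncertainty must be non-negative")


# A well-formed entry is accepted.
ok = Measurement(kind="yield_strength", value=450.0, unit="MPa", uncertainty=20.0)

# A hardness value reported in MPa is rejected by the unit check.
try:
    Measurement(kind="hardness", value=210.0, unit="MPa")
    rejected = False
except ValueError:
    rejected = True
```

A CSV or JSON row with the same mistake would load silently unless a separate schema check were run over it, which is the contrast the paper draws.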

If this is right

Aggregating complete experimental records enables materials scientists to train stronger property-prediction models.
Python-object storage makes it possible to run automatic checks and audits that CSV or JSON formats cannot support as easily.
Extraction systems must explicitly track processing steps if they are to match the accuracy of frontier language models.
The benchmark supplies a concrete testbed for developing and comparing future literature-extraction tools.
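The point about tracking processing steps can be made concrete with a small sketch. The records below are invented for illustration (they are not drawn from LitXAlloy), and `group_by` is a hypothetical helper; the example only shows how keying measurements by composition alone merges samples that a lineage-aware key keeps apart.

```python
from collections import defaultdict

# Invented records: (composition, process lineage, property, value).
records = [
    ("CoCrFeNi", "elements->creation", "hardness_HV", 150.0),
    ("CoCrFeNi", "elements->creation->annealing[Temp=700]", "hardness_HV", 135.0),
    ("CoCrFeNi", "elements->creation->annealing[Temp=900]", "hardness_HV", 128.0),
]


def group_by(recs, key_fn):
    """Collect measurement values under whatever key identifies a 'material'."""
    groups = defaultdict(list)
    for rec in recs:
        groups[key_fn(rec)].append(rec[3])
    return dict(groups)


# Composition-only key: three differently processed samples collapse into one material.
by_composition = group_by(records, lambda r: r[0])

# Composition plus lineage: each processing route remains a distinct material.
by_lineage = group_by(records, lambda r: (r[0], r[1]))
```

Under the composition-only key, one material ends up carrying three conflicting hardness values, which is the kind of error the paper attributes to existing pipelines.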

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same benchmark style could be applied to other domains such as biology or chemistry to reveal whether the same performance pattern holds.
Existing pipelines could be revised to track processing steps explicitly, which might narrow the gap with language models without requiring full model replacement.
Large-scale use of this extraction approach would support the rapid assembly of structured experimental datasets for training scientific AI systems.

Load-bearing premise

The premise is that the 19 alloy papers and 1426 measurements form a representative sample of experimental literature, and that F1 score plus Python-object storage adequately measure extraction quality and usefulness.

What would settle it

Evaluating the same models and pipelines on a new collection of papers from a different scientific domain and finding either no performance advantage for language models or a different explanation for any observed gap.

Figures

Figures reproduced from arXiv: 2604.07649 by Curtis Chong, Jorge Colindres.

**Figure 1.** Pareto front of experiment extraction methods. Therefore, a more practical approach is to mine experiments from literature, as researchers can control the amount and fidelity of data acquired. Although manually aggregated datasets exist, they are impractical to scale, as for example, the OBELiX (Therrien et al., 2026) and MPEA (Borg et al., 2020) datasets contain only ∼600 and 1545 entries, respectively. …

**Figure 2.** LitXBench Principles for Accurate Extraction and Benchmarking. (1) To accurately capture a material’s properties, measurements must be linked to its processing lineage, rather than just its composition. (2) Categorical values should be mapped to canonical identifiers to disambiguate similar values, as multiple papers may reference different properties with the same term. (3) Extracted materials are more ed…

**Figure 3.** Schema of extracted materials in LitXAlloy. Each material is identified by its process steps, which are outlined by the arrow notation. Measurements performed on the material follow. CompMeasurements are various composition measurements performed on the sample. Configuration measurements correlate to microstructure and other features typically visible through an electron microscope. Further schema specific…


**Figure 5.** Definition of each Synthesis Group. Each material defines which group of synthesis events it undergoes through the arrow notation group1→group2. Groups that accept parameters (such as Hours) enable annotators to reuse synthesis groups across materials that differ by slight experimental parameters. lower ontological fidelity for measurement properties. For example, it maps compressive and tensile fracture s…
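The arrow notation described in the Figure 5 caption, in the ASCII form quoted in the extraction prompt (for example `"base->annealing[Temp=700]->quenching"`), can be read by a short parser. This is a minimal sketch: `parse_process` and its return shape are hypothetical, parameter values are left as strings, and the dataset's own parsing rules may differ.

```python
import re

# A step is a group name optionally followed by bracketed parameters,
# e.g. "annealing[Temp=700]" or plain "quenching".
STEP_RE = re.compile(r"^(?P<name>[^\[\]]+?)(?:\[(?P<params>[^\]]*)\])?$")


def parse_process(notation):
    """Split a process string into (input names, [(group name, params), ...])."""
    segments = [seg.strip() for seg in notation.split("->")]
    # The first segment is a comma-separated list of input materials.
    inputs = [name.strip() for name in segments[0].split(",")]
    steps = []
    for seg in segments[1:]:
        match = STEP_RE.match(seg)
        if match is None:
            raise ValueError(f"malformed step: {seg!r}")
        params = {}
        if match.group("params"):
            for pair in match.group("params").split(","):
                key, _, val = pair.partition("=")
                params[key.strip()] = val.strip()
        steps.append((match.group("name").strip(), params))
    return inputs, steps


inputs, steps = parse_process("base->annealing[Temp=700]->quenching")
multi_inputs, _ = parse_process("elements, powders->creation")
```

Because two materials that differ only in `Temp` parse to different step lists, they remain distinct entries even when their compositions are identical.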


Original abstract

Aggregating experimental data from papers enables materials scientists to build better property prediction models and to facilitate scientific discovery. Recently, interest has grown in extracting not only single material properties but also entire experimental measurements. To support this shift, we introduce LitXBench, a framework for benchmarking methods that extract experiments from literature. We also present LitXAlloy, a dense benchmark comprising 1426 total measurements from 19 alloy papers. By storing the benchmark's entries as Python objects, rather than text-based formats such as CSV or JSON, we improve auditability and enable programmatic data validation. We find that frontier language models, such as Gemini 3.1 Pro Preview, outperform existing multi-turn extraction pipelines by up to 0.37 F1. Our results suggest that this performance gap arises because extraction pipelines associate measurements with compositions rather than the processing steps that define a material.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

LitXBench adds a new benchmark and a Python-stored dataset for full experiment extraction, but the 19-paper alloy slice limits how far the 0.37 F1 gap and its explanation can be generalized.


The key takeaway is that this paper introduces LitXBench, a benchmarking framework, and LitXAlloy, a dataset of 1426 measurements drawn from 19 alloy papers, to evaluate methods for pulling complete experimental records out of scientific literature. They store the entries as Python objects to make auditing and validation easier than with text formats like CSV or JSON. The results show frontier language models such as Gemini 3.1 Pro Preview beating multi-turn extraction pipelines by as much as 0.37 F1, with the authors suggesting the gap comes from pipelines tying measurements to compositions instead of the defining processing steps.

What the work does well is create a focused, dense resource for this task and provide a direct head-to-head comparison using a reproducible storage format. The emphasis on full experiments rather than isolated properties aligns with needs in materials property prediction.

The main soft spot is the limited scope. Relying on only 19 papers from alloys means the observed performance difference and its proposed explanation might not hold in other experimental literatures that use different descriptive styles. For instance, papers on thin films or catalysis could present processing information in ways that change how pipelines or models perform. Without more on how the papers were chosen or measures like inter-annotator agreement, it's difficult to assess how solid the benchmark ground truth really is. The stress test note about generalization is worth checking against the full text.

This paper is for people in information extraction and materials informatics who need test sets for experiment parsing. A reader looking for a new dataset to benchmark their extraction system would get practical value from it. The work shows clear thinking in setting up the benchmark and comparing approaches, so it deserves a serious referee even if revisions are needed to address the domain narrowness. I would recommend engaging with it in peer review.

Referee Report

3 major / 1 minor

Summary. The paper introduces LitXBench, a benchmarking framework for methods that extract experiments from scientific literature, along with LitXAlloy, a dataset of 1426 measurements from 19 alloy papers stored as Python objects for improved auditability and validation. It evaluates frontier language models such as Gemini 3.1 Pro Preview against existing multi-turn extraction pipelines and reports that LMs outperform by up to 0.37 F1, attributing the gap to pipelines linking measurements to compositions instead of processing steps.

Significance. If the central comparison holds under a transparent protocol, the work would provide a useful resource for advancing automated extraction of structured experimental data in materials science, supporting better property prediction models. The Python-object storage format is a clear strength for auditability and programmatic checks. The suggested causal diagnosis of pipeline failures could inform future system design, though its generality remains to be tested.

major comments (3)

The selection of the 19 alloy papers, the annotation procedure that produced the 1426 measurements, and any inter-annotator agreement statistics are not described in sufficient detail. These omissions are load-bearing for the headline 0.37 F1 claim, because the performance delta and the attributed cause (composition vs. processing-step linkage) cannot be independently verified without knowing how the ground truth was constructed and how representative the sample is.
The evaluation protocol is underspecified: the manuscript does not define how F1 is computed over the Python-object representation, what constitutes a correct extraction of a measurement, or whether human validation was used to confirm the automatic scores. This directly affects the reliability of the comparison between frontier LMs and multi-turn pipelines.
The claim that the observed gap arises because pipelines associate measurements with compositions rather than processing steps is supported only by the alloy subset. The paper should test or discuss whether the same failure mode appears in other experimental literatures (e.g., thin films or catalysis) whose narrative conventions differ; otherwise the causal explanation and the generalization that LMs are broadly superior remain provisional.

minor comments (1)

The abstract states an 'up to 0.37 F1' improvement but does not identify which specific LM–pipeline pair achieves the maximum; adding this information would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for their thorough review and valuable suggestions. We have addressed each of the major comments by making revisions to improve the transparency and scope of our work, as outlined in the point-by-point responses below.


Referee: The selection of the 19 alloy papers, the annotation procedure that produced the 1426 measurements, and any inter-annotator agreement statistics are not described in sufficient detail. These omissions are load-bearing for the headline 0.37 F1 claim, because the performance delta and the attributed cause (composition vs. processing-step linkage) cannot be independently verified without knowing how the ground truth was constructed and how representative the sample is.

Authors: We agree that the manuscript would benefit from more detailed descriptions of the benchmark construction to support independent verification. In the revised version, we have added an expanded methods section detailing the selection of the 19 alloy papers based on their relevance to processing experiments, the annotation workflow where experts converted paper content into Python objects, and the inter-annotator agreement achieved during the process. These revisions address the concerns regarding the reliability of the 0.37 F1 claim and the attributed causes.

Revision: yes

Referee: The evaluation protocol is underspecified: the manuscript does not define how F1 is computed over the Python-object representation, what constitutes a correct extraction of a measurement, or whether human validation was used to confirm the automatic scores. This directly affects the reliability of the comparison between frontier LMs and multi-turn pipelines.

Authors: We acknowledge the need for a more precise definition of the evaluation protocol. We have revised the paper to include explicit definitions of how F1 is calculated for the Python object representations, specifying that a match requires equivalence in all structured fields after appropriate normalization. Additionally, we clarify that human validation was conducted to verify a subset of the automatic evaluations, ensuring the robustness of the comparison between models and pipelines.

Revision: yes

Referee: The claim that the observed gap arises because pipelines associate measurements with compositions rather than processing steps is supported only by the alloy subset. The paper should test or discuss whether the same failure mode appears in other experimental literatures (e.g., thin films or catalysis) whose narrative conventions differ; otherwise the causal explanation and the generalization that LMs are broadly superior remain provisional.

Authors: We appreciate this point regarding the scope of our causal analysis. While the detailed error analysis was performed on the alloy papers, we have added a discussion in the revised manuscript exploring how the identified failure mode in multi-turn pipelines (prioritizing composition over processing steps) may apply to other domains such as thin films and catalysis. We note that narrative conventions can vary, but the fundamental challenge of capturing sequential experimental information remains relevant. We have moderated our claims about broad superiority to reflect this and suggest future benchmarks in additional domains as valuable extensions.

Revision: partial
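The simulated reply above describes a match as equivalence in all structured fields after normalization. One way such a score could be computed is sketched below. The record layout, the `normalize` rules (lowercasing, GPa to MPa conversion, rounding), and the sample values are all assumptions for illustration; the paper's actual protocol is not specified in the text reviewed here.

```python
def normalize(record):
    """Canonicalize a (material, kind, value, unit) record before comparison."""
    material, kind, value, unit = record
    unit = unit.strip()
    # Assumed normalization: express stresses in MPa so 0.45 GPa equals 450 MPa.
    if unit == "GPa":
        value, unit = value * 1000.0, "MPa"
    return (material.strip(), kind.strip().lower(), round(float(value), 3), unit)


def f1_score(predicted, gold):
    """Exact-match F1 over sets of normalized records."""
    pred = {normalize(r) for r in predicted}
    ref = {normalize(r) for r in gold}
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Invented example: one prediction matches after normalization, one is spurious,
# and one gold record (the annealed sample) is missed entirely.
gold = [
    ("elements->creation", "yield_strength", 450.0, "MPa"),
    ("elements->creation->annealing[Temp=700]", "yield_strength", 410.0, "MPa"),
]
predicted = [
    ("elements->creation", "Yield_Strength", 0.45, "GPa"),
    ("elements->creation", "hardness", 210.0, "HV"),
]
score = f1_score(predicted, gold)
```

Here precision and recall are both 1/2, so the score is 0.5. Because the material key is the full process lineage, a system that attaches the annealed sample's value to the as-cast material would lose both precision and recall.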

Circularity Check

0 steps flagged

Empirical benchmark study with direct measurements; no derivations or self-referential predictions


The paper introduces LitXBench and the LitXAlloy dataset (19 alloy papers, 1426 measurements) then reports measured F1 scores for frontier LMs versus extraction pipelines. These F1 numbers are computed directly against the authors' own annotated benchmark rather than derived from equations, fitted parameters renamed as predictions, or self-citation chains. No load-bearing step reduces to a prior result by the same authors; the evaluation is externally falsifiable by re-running the models on the released Python objects. Generalization concerns about the narrow alloy domain affect external validity but do not create circularity within the reported results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical benchmark paper rather than a theoretical derivation; no free parameters, axioms, or invented entities are required for the central claim.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "the extraction must identify samples with unique processing conditions as distinct materials"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: "LitXAlloy contains 1426 total measurements from 19 alloy papers... stored as Python objects"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

[1]

APL Materials , author =

Version 0.25.2. Haas, S., Manzoni, A. M., Krieg, F., and Glatzel, U. Microstructure and mechanical properties of precipitate strengthened high entropy alloy Al10Co25Cr8Fe15Ni36Ti6 with additions of hafnium and molybdenum. Entropy, 21(2):169, 2019. He, T., Sun, W., Huo, H., Kononova, O., Rong, Z., Tshitoyan, V., Botari, T., and Ceder, G. Similarity of...

doi:10.1063/1.4812323 · 2019
[2]

elements

`raw_materials` (required): map each initial input name (for example `"elements"` or `"powders"`) to `RawMaterial`.
- Populate `kind` with `RawMaterialKind` (usually `Ingot`, `Powder`, or `Unspecified`).
- Populate `description` and `source` whenever the paper states purity, supplier, or precursor details

[3]

annealing[Temp]

`synthesis_groups` (required): a dict of named synthesis stages to lists of `ProcessEvent`.
- Use reusable stages and process variables when appropriate (for example `"annealing[Temp]"`).
- Each `ProcessEvent` should include `kind` (a `ProcessKind` enum member), and include `temperature` (as `Quantity`, e.g. `Quantity(value=1200, unit=Celsius)`), `durat...

[4]

elements->creation

`output_materials` (required): list of `Material`.
- Populate `Material.process` using dataset process notation such as `"elements->creation"` or `"base->annealing[Temp=700]->quenching"`.
- The first segment (before the first `->`) is a comma-separated list of input raw materials or named materials. Use commas to combine multiple inputs: `"elements, reinf...

[5]

450 +- 20

Measurements:
- Use `Measurement(kind=AlloyMeasurementKind.<kind>, value=<number>, unit=<unit>)`.
- If uncertainty is reported (e.g. "450 +- 20"), set `value=450.0` and `uncertainty=20.0`.
- If temperature or pressure is tied to a measurement, set `temperature=Quantity(...)` or `pressure=Measurement(...)`.
- Assume room temperature is ~23 °C when the pap...

[6]

BCC phase

GlobalLatticeParam (for XRD lattice parameters and crystal structure):
- Use `GlobalLatticeParam` when the paper reports lattice parameters from XRD for the overall material.
- `lattice`: wrap a pymatgen `Lattice` in `LatticeMeasurement(...)`. Required parameters depend on type:
  - `Lattice.cubic(a)` - requires `a`
  - `Lattice.hexagonal(a, c)` - requires `a...

[7]

hardness at the center region was 210 HV

Configuration (for microstructural features):
- Use `Configuration` to describe microstructural features like dendrites, precipitates, phases, lamellae, or regions of interest with distinct microstructure (e.g. a Cr-rich region, an interdendritic zone).
- Do NOT use Configuration merely to record where on the bulk material a measurement was taken. If the ...

[8]

Microhardness measured with Vickers hardness tester at 500 gf load for 15 s

`descriptions` (optional): list of `AlloyDescriptionGroup` for recording contextual information about measurement methods and equipment, or process-related descriptions that apply to all materials.
- Use this field for information about HOW measurements were performed (instruments, testing conditions, specimen dimensions, strain rates) and general descrip...

[9]

balance notation

`balance_composition(main_element, additions)` - for "balance notation" compositions. Use when the paper writes compositions like Ti-6Al-4V, meaning the main element (Ti) makes up the balance (remainder to 100 wt%) after accounting for the other additions (6 wt% Al, 4 wt% V).
- `main_element`: string name of the balance element (e.g. `"Ti"`).
- `additions...

[10]

add X wt% of Y to base alloy

`composition_with_weight_additions(base, additions, addition_wt_frac)` - for when the paper says "add X wt% of Y to base alloy".
- `base`: the original alloy composition before additions (usually atomic-fraction style).
- `additions`: the additive recipe expressed by weight ratio; use `Composition.from_weight_dict(...)` for this.
- `addition_wt_frac`: de...

[11]

raw_materials

"raw_materials" (required): map each initial input name (e.g. "elements" or "powders") to a raw material object.
- "kind": one of the RawMaterialKind values (usually "Ingot", "Powder", or "Unspecified").
- Populate "description" and "source" whenever the paper states purity, supplier, or precursor details

[12]

synthesis_groups

"synthesis_groups" (required): an object mapping named synthesis stages to arrays of process event objects.
- Use reusable stages and process variables when appropriate (e.g. "annealing[Temp]").
- Each process event MUST include "kind" (a ProcessKind member name). Optionally include "temperature", "duration", "description", "source" when available. If yo...

[13]

output_materials

"output_materials" (required): array of material objects.
- "process": use process notation such as "elements->creation" or "base->annealing[Temp=700]->quenching".
- The first segment (before the first "->") is a comma-separated list of input raw materials or named materials....

[14]

measurements

Measurements - each item in the "measurements" array must have a "_type" field:
- "_type": "composition" - for composition. Include "composition" (formula string or element dict) and optionally "method".
- "_type": "measurement" - for a single measurement. REQUIRED: "kind", "value", "unit" (all three must be present). Optional: "uncertainty", "measuremen...

[15]

_type":

Lattice parameters (for XRD-determined crystal structure):
- Use "_type": "lattice_param" with a "lattice" object. Required parameters depend on type:
  - "cubic": {"type": "cubic", "a": ...} (requires "a")
  - "hexagonal": {"type": "hexagonal", "a": ..., "c": ...} (requires "a" and "c")
  - "tetragonal": {"type": "tetragonal", "a": ..., "c": ...} (requires "a"...

[16]

_type":

Configuration (for microstructural features):
- Use "_type": "configuration" to describe dendrites, precipitates, phases, lamellae, or regions with distinct microstructure.
- Do NOT use configuration merely to record where on the bulk material a measurement was taken.
- "name": identifies the feature (e.g. "dendrite", "FCC matrix", "B2 precipitates").
- ...

[17]

descriptions

"descriptions" (optional): array of description group objects for recording contextual information about measurement methods and equipment, or process-related descriptions.
- Use this for information about HOW measurements were performed (instruments, testing conditions).
- "kinds": array of AlloyMeasurementKind, PhaseMeasurementKind, ProcessKind, or Meas...

[18]

balance notation

Balance composition - for "balance notation" (e.g. Ti-6Al-4V):

```json
{"_helper": "balance_composition", "main_element": "Ti", "additions": {"Al": 6, "V": 4}}
```

Ti is the balance element (90 wt%), Al is 6 wt%, V is 4 wt%

[19]

_helper":

From weight dict - create composition from weight percentages:

```json
{"_helper": "from_weight_dict", "weights": {"Ni": 60, "Co": 20, "Cr": 20}}
```

[20]

_helper":

Weight additions - add X wt% of a mix to a base alloy:

```json
{"_helper": "weight_additions", "base": "NbTaTiZr", "additions_weights": {"Mo": 50, "W": 50}, "fraction": 0.05}
```

Adds 5 wt% of a 50/50 Mo/W mix to equiatomic NbTaTiZr. "fraction" is a decimal: 5 wt% = 0.05, 2.5 wt% = 0.025. Use these helpers inside the "composition" field of a composition m...
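The arithmetic behind the balance and weight-addition helpers quoted above can be sketched in plain Python. The function names `balance_wt_percent` and `mix_by_weight` are hypothetical and are not the dataset's helpers. The sketch also makes two assumptions that the truncated prompt text does not settle: the base is supplied directly in weight percent (so no atomic-mass conversion is needed), and `fraction` is read as the additive's share of the final mass.

```python
def balance_wt_percent(main_element, additions):
    """Balance notation: the main element takes whatever remains of 100 wt%."""
    total_additions = sum(additions.values())
    if not 0 <= total_additions < 100:
        raise ValueError("additions must sum to less than 100 wt%")
    result = dict(additions)
    result[main_element] = 100 - total_additions
    return result


def mix_by_weight(base_wt, additions_wt, fraction):
    """Blend an additive mix into a base, both given as weight ratios.

    Assumption: `fraction` is the additive's share of the final mass.
    """
    def normalized(weights):
        total = sum(weights.values())
        return {el: w / total for el, w in weights.items()}

    base, add = normalized(base_wt), normalized(additions_wt)
    mixed = {el: 100 * (1 - fraction) * w for el, w in base.items()}
    for el, w in add.items():
        mixed[el] = mixed.get(el, 0.0) + 100 * fraction * w
    return mixed


# Ti-6Al-4V: Ti is the balance element.
ti64 = balance_wt_percent("Ti", {"Al": 6, "V": 4})

# 5 wt% of a 50/50 Mo/W mix added to a 60/20/20 Ni/Co/Cr base (weight percent).
mixed = mix_by_weight({"Ni": 60, "Co": 20, "Cr": 20}, {"Mo": 50, "W": 50}, 0.05)
```

For Ti-6Al-4V the balance works out to 90 wt% Ti, matching the note in the quoted prompt. Handling an atomic-fraction base such as equiatomic NbTaTiZr would additionally require converting to weight fractions using atomic masses, which this sketch leaves out.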
