GraphMERT: Efficient and Scalable Distillation of Reliable Knowledge Graphs from Unstructured Data

Jiaxin Xiao; Margarita Belova; Niraj K. Jha; Shikhar Tuli

arxiv: 2510.09580 · v2 · submitted 2025-10-10 · 💻 cs.AI · cs.CL

Pith reviewed 2026-05-18 07:33 UTC · model grok-4.3

classification 💻 cs.AI cs.CL

keywords: knowledge graphs · neurosymbolic AI · encoder-only model · distillation · unstructured data · factual accuracy · validity

The pith

A compact graphical model extracts more reliable knowledge graphs from text than much larger language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GraphMERT as an 80M-parameter encoder-only model that turns unstructured text into factual and ontology-consistent knowledge graphs. It claims this yields the first efficient and scalable neurosymbolic system, pairing neural abstraction learning with explicit symbolic representations for verifiable reasoning. A sympathetic reader would care because it addresses the long-standing problem of building trustworthy, interpretable KGs without the hallucinations and prompt sensitivity common in large language models (LLMs). The work demonstrates this on PubMed diabetes papers, where the small model produces KGs with substantially higher factual accuracy and validity than a 32B-parameter LLM baseline (69.8% vs. 40.2% FActScore; 68.8% vs. 43.0% ValidityScore).

Core claim

GraphMERT is a tiny graphical encoder-only model that distills high-quality KGs from unstructured text corpora and from its own internal representations. Together, GraphMERT and its KG form a modular neurosymbolic stack in which the neural component learns abstractions and the symbolic KG supports verifiable reasoning. The authors claim it is the first efficient and scalable neurosymbolic model to reach state-of-the-art benchmark accuracy while producing better symbolic representations than baselines.

What carries the argument

GraphMERT, the 80M-parameter graphical encoder-only model that produces domain-specific KGs that are both factual (with provenance) and valid (with ontology-consistent relations).

If this is right

Neurosymbolic systems can scale efficiently by using compact neural models to generate explicit symbolic KGs.
Domain-specific KGs for fields like medicine become practical to build automatically with higher reliability than current LLM methods.
Prompt sensitivity and hallucinated relations in KG extraction are reduced by distilling from both text and internal model representations.
Verifiable reasoning becomes available in applications that combine the neural encoder with the resulting symbolic graph.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The modular design could extend to other scientific domains by retraining GraphMERT on domain corpora while keeping the same symbolic validation layer.
Integration with existing ontologies might further boost validity without increasing model size.
The efficiency gains suggest testing whether similar small models can replace larger ones in additional neurosymbolic pipelines.

Load-bearing premise

The chosen metrics and baseline prompting fully capture KG reliability without hidden advantages for the proposed model.

What would settle it

An experiment that optimizes prompting and any post-processing for the 32B LLM baseline on the identical PubMed diabetes corpus and shows it matching or exceeding GraphMERT's FActScore and ValidityScore.

Figures

Figures reproduced from arXiv: 2510.09580 by Jiaxin Xiao, Margarita Belova, Niraj K. Jha, Shikhar Tuli.

**Figure 2.** Overview of the GraphMERT framework. It is trained on the fusion of syntactic and semantic examples (II) and augments syntactic data with semantic tails (I); an LLM helps determine the linguistic structure of tails proposed by GraphMERT (III). (I): Chain graph (Ic) combines syntactic knowledge from text corpora (Ib) with semantic examples and relations from a seed KG (Ia): Roots hold syntactic knowledge (i…

**Figure 3.** Chain graph. Roots are in orange, leaves are in blue. Conceptual representation (A, B): term…

**Figure 4.** Main GraphMERT architectural components. GraphMERT is a RoBERTa transformer with two modifications. (I) In the embedding layer, H-GAT encodes semantic triples. (IA) There are no leaves connected to a root node; hence, the node feature is equal to the token embedding. (IB) There are leaves connected to a root node; H-GAT fuses leaves, relations, and head embeddings, resulting in a fused node feature. (II) In the …
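The caption describes H-GAT only as fusing leaves, relations, and head embeddings. A minimal sketch of what such a fusion could look like, written as single-head graph attention in NumPy, follows. Everything here is an assumption for illustration: the name `hgat_fuse`, adding the relation embedding to each leaf, the LeakyReLU slope of 0.2, and the residual sum are not taken from the paper.

```python
import numpy as np


def hgat_fuse(head, leaves, relation, W, a, slope=0.2):
    """Hypothetical H-GAT-style fusion of one root (head) with its leaves.

    head: (d,) token embedding of the root
    leaves: (n, d) leaf embeddings attached to the root
    relation: (d,) embedding of the relation linking head and leaves
    W: (d, d) shared projection; a: (2d,) attention vector
    """
    if leaves.shape[0] == 0:
        # Case IA: no leaves, so the node feature is just the token embedding.
        return head
    h = W @ head                              # projected head, shape (d,)
    msgs = (leaves + relation) @ W.T          # relation-conditioned leaf messages, (n, d)
    pairs = np.concatenate([np.tile(h, (len(msgs), 1)), msgs], axis=1)  # (n, 2d)
    scores = pairs @ a                        # unnormalized attention scores, (n,)
    scores = np.where(scores > 0, scores, slope * scores)  # LeakyReLU
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over the leaves
    return head + weights @ msgs              # case IB: fused node feature, (d,)


rng = np.random.default_rng(0)
d = 4
head, relation = rng.normal(size=d), rng.normal(size=d)
leaves = rng.normal(size=(3, d))
W, a = rng.normal(size=(d, d)), rng.normal(size=2 * d)
fused = hgat_fuse(head, leaves, relation, W, a)
print(fused.shape)
```

With no leaves the function returns the token embedding unchanged, mirroring case (IA); otherwise it adds an attention-weighted, relation-conditioned mix of leaf messages to the head, as a stand-in for case (IB).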

**Figure 5.** Semantic embedding derivation on leaves (only three leaves are shown).

**Figure 6.** Relation embedding training. The sequence with updated leaf embeddings is passed to the trans…

**Figure 7.** Data preparation for GraphMERT. To find the most relevant triples, we perform semantic similarity matching of triples to dataset sequences. The triple head should almost literally match one of the entities discovered in Step (I); from them, we pick the top triples whose tails are semantically close to the sequence. All matched triples are subject to the injection algorithm (III), which selects the top-scor…
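The matching step in Figure 7 has two filters: the triple head must almost literally match a discovered entity, and the tail must be semantically close to the sequence. The sketch below illustrates that logic with standard-library stand-ins. The trigram `embed` function replaces whatever sentence encoder the paper uses, and `head_ratio`, `top_n`, and the example relation names are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def embed(text):
    """Hypothetical stand-in for a sentence encoder: character-trigram counts."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(u, v):
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_triples(sequence, entities, seed_triples, head_ratio=0.9, top_n=2):
    """Return the top_n seed triples whose head (almost) literally matches a
    discovered entity, ranked by tail-to-sequence similarity."""
    seq_vec = embed(sequence)
    scored = []
    for head, relation, tail in seed_triples:
        head_ok = any(
            SequenceMatcher(None, head.lower(), ent.lower()).ratio() >= head_ratio
            for ent in entities
        )
        if head_ok:
            scored.append((cosine(embed(tail), seq_vec), (head, relation, tail)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_n]


seq = "Metformin lowers hepatic glucose production in type 2 diabetes."
seed = [
    ("metformin", "may_treat", "type 2 diabetes"),
    ("metformin", "has_adverse_effect", "lactic acidosis"),
    ("insulin", "regulates", "blood glucose"),
]
for score, triple in match_triples(seq, ["Metformin"], seed):
    print(round(score, 2), triple)
```

The `insulin` triple is dropped by the head filter, and the tail that actually appears in the sequence ranks first.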

**Figure 8.** GraphMERT pipeline flowchart with temporal execution ordering of the main components.

**Figure 9.** Prediction of triple tails. The trained GraphMERT predicts the top k tokens for a masked leaf and the chosen relation, resulting in a set of raw triples with the same head.
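Figure 9's tail prediction can be read as: take the model's distribution over the vocabulary at the masked leaf, keep the k most likely tokens, and pair each with the fixed head and relation. A sketch under that reading, assuming the logits are already available (the model call itself is not shown, and `raw_triples` and the toy vocabulary are hypothetical):

```python
import numpy as np


def raw_triples(head, relation, leaf_logits, vocab, k=3):
    """Turn the prediction for one masked leaf into top-k raw triples.

    leaf_logits: (V,) scores over the vocabulary for the masked leaf position,
    assumed to come from a model conditioned on `head` and `relation`.
    """
    probs = np.exp(leaf_logits - leaf_logits.max())
    probs /= probs.sum()                      # softmax over the vocabulary
    top = np.argsort(probs)[::-1][:k]         # indices of the k most likely tokens
    return [(head, relation, vocab[i], float(probs[i])) for i in top]


vocab = ["insulin", "glucose", "retinopathy", "mice"]
logits = np.array([2.0, 1.0, 3.0, -1.0])
for triple in raw_triples("diabetes mellitus", "associated_with", logits, vocab, k=2):
    print(triple)
```

These single-token tails correspond to the "raw triples" in the caption; per Figure 2, an LLM then helps determine the linguistic structure of the final tails.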

**Figure 10.** Leafy chain graph encoded sequentially: 7-leaf case. The sequence has a fixed length of 1024.
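Figure 10 gives the sequence length (1024) and the leaf count (7) but not the exact layout. One layout consistent with those numbers is 128 roots, each owning seven leaf slots, since 128 × (1 + 7) = 1024. The sketch below assumes that split, with all roots first and the leaf blocks after them, fills empty leaf slots with a padding token, and can mask one leaf for prediction. The root count, ordering, and token strings are assumptions.

```python
PAD, MASK = "<pad>", "<mask>"
NUM_ROOTS = 128          # assumption: 128 root slots
LEAVES_PER_ROOT = 7      # the 7-leaf case from Figure 10
SEQ_LEN = NUM_ROOTS * (1 + LEAVES_PER_ROOT)  # 128 * 8 = 1024


def leaf_position(root, slot):
    """Flat index of leaf `slot` of root `root` (roots first, then leaf blocks)."""
    return NUM_ROOTS + root * LEAVES_PER_ROOT + slot


def encode_leafy_chain(root_tokens, leaves=None, mask_at=None):
    """Lay out roots and their leaves as one fixed-length token sequence.

    leaves: optional {root_index: [leaf tokens]}; mask_at: optional (root, slot).
    """
    if len(root_tokens) > NUM_ROOTS:
        raise ValueError("too many root tokens for one sequence")
    seq = list(root_tokens) + [PAD] * (SEQ_LEN - len(root_tokens))
    for root, tokens in (leaves or {}).items():
        for slot, tok in enumerate(tokens[:LEAVES_PER_ROOT]):
            seq[leaf_position(root, slot)] = tok
    if mask_at is not None:
        seq[leaf_position(*mask_at)] = MASK   # one masked leaf to predict
    return seq


seq = encode_leafy_chain(["metformin", "lowers", "glucose"], mask_at=(0, 0))
print(len(seq), seq[:4], seq[NUM_ROOTS])
```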

**Figure 11.** I. Forming triple tails for a given sequence with…

**Figure 12.** Validity check with GPT-5 Thinking, 100 random triples per keyword. The keywords are lined on…
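Figure 12's protocol (100 random triples per keyword, each judged for validity) reduces to a per-keyword sampling loop. In this sketch the judge is a hypothetical rule-based stand-in for GPT-5 Thinking, and the allowed-relation set and example triples are invented for illustration; the paper's ValidityScore may aggregate differently.

```python
import random


def validity_by_keyword(triples_by_keyword, judge, n=100, seed=0):
    """Sample up to n triples per keyword and return the fraction judged valid."""
    rng = random.Random(seed)
    rates = {}
    for keyword, triples in triples_by_keyword.items():
        if not triples:
            continue  # nothing to judge for this keyword
        sample = rng.sample(triples, min(n, len(triples)))
        rates[keyword] = sum(bool(judge(t)) for t in sample) / len(sample)
    return rates


# Hypothetical judge: accept only relations from an allowed list.
ALLOWED = {"associated_with", "may_treat", "isa"}


def judge(triple):
    return triple[1] in ALLOWED


kg = {
    "insulin": [("insulin", "isa", "hormone")] * 150,
    "retinopathy": [("diabetic retinopathy", "isa", "complication")] * 5
    + [("diabetic retinopathy", "inhibits", "glucose")] * 5,
}
print(validity_by_keyword(kg, judge))
```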

**Figure 13.** GraphRAG accuracies for different α and β. Bubble size corresponds to absolute accuracy and color indicates the accuracy gain relative to the LLM KG baseline (red denotes positive gain, blue denotes negative gain).
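Figure 7 sends matched triples through an injection algorithm, and Figure 13 varies its threshold α. Items [4] to [11] of the reference graph below preserve fragments of that algorithm: drop triples scoring below α, keep each triple only in its best-scoring sequence, then choose one triple per head, favoring high score buckets first and rare relations second. The source says the algorithm is implemented in Pandas; what follows is a minimal pandas reading of those fragments, not the authors' code. Column names, the bucket widths `k` and `score_step`, and counting relation frequencies after deduplication are assumptions.

```python
import pandas as pd


def select_triples(df, alpha, k, score_step):
    """Keep at most one injected triple per head.

    Assumed columns: seq_id, head, relation, tail, score.
    """
    # 1. Drop all triples scoring below the threshold alpha.
    df = df[df["score"] >= alpha].reset_index(drop=True)
    # 2. Make triples unique: keep each one only in the sequence where it scores highest.
    best = df.groupby(["head", "relation", "tail"])["score"].idxmax()
    df = df.loc[best.values].reset_index(drop=True)
    # 3a. Relation buckets of width k by unique-triple count (rarest relations -> lowest id).
    df["rel_bucket"] = df.groupby("relation")["tail"].transform("count") // k
    # 3b. Score buckets of width score_step (highest scores -> lowest id).
    df["score_bucket"] = ((df["score"].max() - df["score"]) // score_step).astype(int)
    # 3c. Per head: best score bucket, then rarest relation bucket, then highest score.
    ranked = df.sort_values(
        ["score_bucket", "rel_bucket", "score"], ascending=[True, True, False]
    )
    return ranked.drop_duplicates("head", keep="first").drop(
        columns=["rel_bucket", "score_bucket"]
    )


triples = pd.DataFrame(
    [
        ("s1", "metformin", "may_treat", "type 2 diabetes", 0.91),
        ("s2", "metformin", "may_treat", "type 2 diabetes", 0.70),
        ("s1", "metformin", "associated_with", "obesity", 0.95),
        ("s3", "metformin", "associated_with", "weight loss", 0.93),
        ("s1", "insulin", "associated_with", "glucose", 0.90),
        ("s2", "insulin", "isa", "hormone", 0.30),
    ],
    columns=["seq_id", "head", "relation", "tail", "score"],
)
print(select_triples(triples, alpha=0.5, k=2, score_step=0.1))
```

With `k=2`, the rarer `may_treat` relation wins for metformin even though an `associated_with` triple scores higher in the same score bucket; with `k=10`, both relations share a bucket and the highest score wins.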

Original abstract

Researchers have pursued neurosymbolic artificial intelligence (AI) applications for nearly three decades. A marriage of the neural and symbolic components can lead to rapid advancements in AI. Yet, the field has not realized this promise since most neurosymbolic AI frameworks fail to scale. In addition, the implicit representations and approximate reasoning of purely neural approaches limit interpretability and trust. Knowledge graphs (KGs), a gold-standard representation of explicit semantic knowledge, can address the symbolic side of the problem. However, automatically deriving reliable KGs from text corpora remains an open problem. We address these challenges by introducing GraphMERT, a tiny graphical encoder-only model that distills high-quality KGs from unstructured text corpora and its own internal representations. GraphMERT and its equivalent KG form a modular neurosymbolic stack: neural learning of abstractions; symbolic KGs for verifiable reasoning. GraphMERT + KG is the first efficient and scalable neurosymbolic model to achieve state-of-the-art benchmark accuracy along with superior symbolic representations relative to baselines. Concretely, we target reliable domain-specific KGs that are both (1) factual (with provenance) and (2) valid (ontology-consistent relations with domain-appropriate semantics). When a large language model (LLM), e.g., Qwen3-32B, generates domain-specific KGs, it falls short on reliability due to prompt sensitivity, shallow domain expertise, and hallucinated relations. On text obtained from PubMed papers on diabetes, our 80M-parameter GraphMERT yields a KG with a 69.8% FActScore; a 32B-parameter baseline LLM yields a KG that achieves only 40.2% FActScore. The GraphMERT KG also attains a higher ValidityScore of 68.8%, versus 43.0% for the LLM baseline.
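The headline comparison is easier to weigh as gaps. This snippet only recomputes differences from the four scores and two parameter counts stated in the abstract:

```python
# Reported in the abstract: (GraphMERT, 32B LLM baseline), in percent.
REPORTED = {"FActScore": (69.8, 40.2), "ValidityScore": (68.8, 43.0)}
PARAMS = {"GraphMERT": 80e6, "32B LLM baseline": 32e9}


def gap(ours, baseline):
    """Absolute gap in percentage points and relative improvement in percent."""
    absolute = ours - baseline
    return round(absolute, 1), round(100 * absolute / baseline, 1)


for metric, (ours, base) in REPORTED.items():
    points, relative = gap(ours, base)
    print(f"{metric}: +{points} points, {relative}% relative")
print(f"Parameter ratio: {PARAMS['32B LLM baseline'] / PARAMS['GraphMERT']:.0f}x")
```

That is roughly a 30-point FActScore gain and a 26-point ValidityScore gain from a model about 400 times smaller.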

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Desk Editor's Note private letter to a colleague

GraphMERT puts forward a compact graphical encoder for distilling domain KGs from text with reported gains over a large LLM baseline, but the comparison lacks prompting details and the methods section is too thin to verify the reliability claims.

The main claim of this paper is that an 80M-parameter graphical encoder called GraphMERT is said to produce more factual and valid knowledge graphs from PubMed diabetes text than a 32B LLM, reaching 69.8% FActScore and 68.8% ValidityScore against 40.2% and 43.0%. If the gap is real, it offers a smaller-scale route to usable symbolic output in neurosymbolic setups, where scaling has been a sticking point.

Referee Report

2 major / 1 minor

Summary. The paper introduces GraphMERT, an 80M-parameter graphical encoder-only model that distills reliable knowledge graphs from unstructured text. It positions GraphMERT + KG as the first efficient and scalable neurosymbolic system achieving SOTA benchmark accuracy with superior symbolic output, concretely reporting 69.8% FActScore and 68.8% ValidityScore on PubMed diabetes text versus 40.2% and 43.0% for a 32B LLM baseline.

Significance. If the performance claims are substantiated with full experimental details, the work would be significant for neurosymbolic AI by showing that a compact graphical encoder can produce more factual and ontology-consistent KGs than much larger LLMs, offering a practical path to scalable, interpretable knowledge extraction from domain text.

major comments (2)

[Abstract] The abstract reports specific numerical results (69.8% FActScore, 68.8% ValidityScore for GraphMERT vs. 40.2%/43.0% for Qwen3-32B) but supplies no description of training procedure, architecture details, data splits, or statistical significance. This absence prevents verification of the central reliability advantage.
[Abstract] The superiority claim rests on the LLM baseline being prompted representatively, yet the text provides no prompt text, system instructions, few-shot examples, temperature settings, or ablation results on prompting variants, despite the paper's own emphasis on LLM prompt sensitivity and hallucinated relations. This leaves open whether the gap isolates GraphMERT's contribution or reflects unequal elicitation effort.

minor comments (1)

[Abstract] The abstract could clarify the exact definition and computation of FActScore and ValidityScore, including any post-processing steps, to allow readers to assess how fully they capture reliability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review of our manuscript. We value the feedback on improving the clarity and verifiability of our claims regarding GraphMERT. We address each major comment below and commit to revisions that strengthen the presentation without altering the core contributions.

Referee: [Abstract] The abstract reports specific numerical results (69.8% FActScore, 68.8% ValidityScore for GraphMERT vs. 40.2%/43.0% for Qwen3-32B) but supplies no description of training procedure, architecture details, data splits, or statistical significance. This absence prevents verification of the central reliability advantage.

Authors: We acknowledge that the abstract's brevity precludes inclusion of full methodological details, which limits immediate verification from the abstract alone. The complete manuscript details the training procedure in Section 3, architecture in Section 2, data splits in Section 4.1, and statistical significance testing in Section 5.3. To directly address this concern, we will revise the abstract to include concise references to these elements and explicit pointers to the relevant sections for full verification. revision: yes
Referee: [Abstract] The superiority claim rests on the LLM baseline being prompted representatively, yet the text provides no prompt text, system instructions, few-shot examples, temperature settings, or ablation results on prompting variants, despite the paper's own emphasis on LLM prompt sensitivity and hallucinated relations. This leaves open whether the gap isolates GraphMERT's contribution or reflects unequal elicitation effort.

Authors: We agree this is an important point for ensuring a fair and reproducible comparison, particularly given the manuscript's discussion of LLM prompt sensitivity. The current version does not include the exact prompt configuration used for the Qwen3-32B baseline. In the revised manuscript, we will add the full prompt template, system instructions, any few-shot examples, temperature settings, and results from prompting ablations to the appendix or a dedicated experimental subsection. This will substantiate that the reported performance gap reflects GraphMERT's advantages rather than differences in elicitation effort. revision: yes

Circularity Check

0 steps flagged

No circularity: the claims rest on a new model and empirical benchmarks, and none reduces to its inputs by construction.

full rationale

The paper introduces GraphMERT as a new 80M-parameter graphical encoder-only model for distilling KGs from unstructured text, then reports concrete empirical results on PubMed diabetes text using FActScore (69.8% vs. 40.2%) and ValidityScore (68.8% vs. 43.0%) against a 32B LLM baseline. No derivation chain, equations, or self-referential steps appear in the provided text that would make any prediction equivalent to fitted inputs or prior self-citations by construction. The neurosymbolic stack description and SOTA claim are presented as outcomes of the new architecture and evaluation, not as tautological renamings or load-bearing self-citations. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities beyond the model name itself; all supporting details on training objectives, ontology definitions, and scoring functions are absent, leaving the ledger empty pending full text.

pith-pipeline@v0.9.0 · 5881 in / 1414 out tokens · 32076 ms · 2026-05-18T07:33:46.142514+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

GraphMERT is a tiny graphical encoder-only model that distills high-quality KGs from unstructured text corpora and its own internal representations... MLM + MNM objectives
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

ValidityScore... ontology-consistent relations... FActScore

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 2 internal anchors

[1]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

ISSN 0004-3702. doi: 10.1016/0004-3702(93)90068-M. Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Shirui Pan, and Wenbin Hu. Gradformer: Graph transformer with exponential decay, 2024a. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. Y. Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewi...

doi:10.1016/0004-3702(93)90068-m · 2019
[2]

Large Language Models: A Survey

ISSN 0004-3702. doi: 10.1016/0004-3702(80)90011-9. Special Issue on Non-Monotonic Logic. Dhruv Mehrotra and Tim Marchman. Perplexity is a bullshit machine, 2024. URL https://www.wired.com/story/perplexity-is-a-bullshit-machine/. WIRED, investigation documenting data scraping and multiple hallucinations/misattributions. Sewon Min, Kalpesh Krishna, Xinxi Lyu...

doi:10.1016/0004-3702(80)90011-9 · 2024
[3]

doi: 10.1016/j.fmre.2021.09.003

ISSN 2667-3258. doi: 10.1016/j.fmre.2021.09.003. Xiaoye Wang, Nicole Xi Zhang, Hongyu He, Trang Nguyen, Kun-Hsing Yu, Hao Deng, Cynthia Brandt, Danielle S. Bitterman, Ling Pan, Ching-Yu Cheng, James Zou, and Dianbo Liu. Safety challenges of AI in medicine in the era of large language models, 2025b. arXiv:2409.18968 [cs.CY]. Yuhao Wang, Ruiyang Ren, Junyi ...

doi:10.1016/j.fmre.2021.09.003 · 2021
[4]

Drop all triples with a score less than threshold α

[5]

Triple selection for each head: To balance contextual relevance with relation diversity: 1st priority: maximize injection score; 2nd priority: maintain relation diversity

Make all triples unique: If a triple matches multiple sequences, retain the triple $\tilde{T}$ with the highest score, i.e., in the sequence to which the triple is most relevant: $\tilde{T} = \arg\max_{seq} \mathrm{score}(T(seq))$. The second preprocessing step prevents overfitting in the semantic space on common triples. Triple selection for each head: To balance contextual relevance with...

[6]

Split relations into relation buckets based on the number of unique triples, with step k, and assume that within each bucket all relations are equally diverse (e.g., k = 20 implies that relations with 100-120 triples are treated as equally diverse, as are relations with 120-140 triples, and so on)

[7]

Within each relation bucket, sort all triples by score regardless of relation

[8]

Start with the lowest-numbered bucket (rarest relations). Within it, start with the triple with the highest score and retain only it for its head, removing all other matched triples, which may have a higher score but may be in a higher relation bucket. As a result, one of the rarest possible relations in the dataset would survive for this head, increasing...

[9]

Order triples by score

[10]

Split into score buckets: Assume that within each score bucket, triples are equally good

[11]

“isa” prevails in the seed KG, the GraphMERT KG is heavily skewed towards “associated_with”

Then, within each score bucket, apply “Maximize diversity.” Altogether, we group triples by how “low” the score is (higher scores are assigned to lower bucket IDs). Then, within each score bucket, we favor relation types that are less frequent. Finally, we choose the highest-scoring triple for each head. The algorithm is implemented using the Pandas framewo...

[12]

myocardial infarction,

Select a precise and medically-specific span (e.g., “myocardial infarction,” not “infarction”). Avoid generic terms like “disease,” “condition,” “patients,” and “comorbidity” without a specific context. When encountering vague descriptors like “complication,” “symptom,” or “effect,” always prefer explicitly named conditions or symptoms directly linked to ...

[13]

Keep original spelling, casing, and abbreviations from the sequence

[14]

Do not include COVID-related terms

Choose only entities that add meaningful medical knowledge to the diabetes KG. Do not include COVID-related terms. Do not include head entities that describe findings in animal models (mice, rats, etc.)

[15]

“60+” is too context-dependent). • “anxiety,”

A few examples of low-value entities you should not include: • ‘≥10 % weight reduction’ (too context-dependent). • ‘nhanes 2015 - 2018’ (dataset/survey, not a medical entity). • ‘semaglutide 2.4 mg’ (includes a dosage, which can vary). • ‘60+ women’ (“60+” is too context-dependent). • “anxiety,” “home births,” “pregnant women,” “neonatal deaths,” “general practition...

[16]

Output (Incorrect)

If it is not clear whether a term adds diabetes-specific knowledge, look at the context. If the text explicitly links the term to a diabetes-specific concept, include it. Otherwise, exclude it when mentioned only in a generic context. Include such terms when the sequence clearly links them to a diabetes-relevant gene, pathway, cell type, or therapeutic ef...

[17]

Identify candidate spans

[18]

Filter by medical precision and relevance rules

[19]

Diabetic retinopathy

Confirm the entity’s relevance and contribution to the diabetes KG; discard low-value entities. Input format: sequence. Output format: [‘head1’, ‘head2’, ...]. If none, output []. Few-shot Example for Entity Discovery Prompt. Input: sequence: ..., its upstream regulator has the opposite effect (Han et al., 2013). Previous studies suggest that CHOP deteriora...

[20]

• For each head, find explicit mentions in the text

Understand Input: • Clearly understand the biomedical context from the sequence. • For each head, find explicit mentions in the text. • Check if each head is explicitly linked to other concepts or relations

[21]

Evaluate each head individually

Use the list of allowed relations. Evaluate each head individually. Do not overuse the relation associated_with; apply it only when appropriate

[22]

key regulator of inflammation,

For each head, list only plausible and supported relations. Return [] if none apply. Think concisely within ⟨think⟩...⟨/think⟩. Immediately after, output JSON. Few-shot Example for the Relation Matching Prompt. Input: ...interleukin-1 R6, and receptor activator of nuclear factor kappa-B (RANK). Together, proteomic data suggest the targeting of several key ...

[23]

Organism: Plant; Fungus; Virus; Bacterium; Archaeon; Eukaryote; Vertebrate; Amphibian; Bird; Fish; Reptile; Mammal; Human

[24]

Anatomical Structure: Embryonic Structure; Anatomical Abnormality; Congenital Abnormality; Acquired Abnormality; Fully Formed Anatomical Structure; Body Part, Organ, or Organ Component; Tissue; Cell; Cell Component; Gene or Genome

[25]

Manufactured Object: Medical Device; Drug Delivery Device; Research Device; Clinical Drug

[26]

Substance: Chemical; Pharmacologic Substance; Antibiotic; Biomedical or Dental Material; Biologically Active Substance; Hormone; Enzyme; Vitamin; Immunologic Factor; Receptor; Indicator, Reagent, or Diagnostic Aid; Organic Chemical; Nucleic Acid, Nucleoside, or Nucleotide; Amino Acid, Peptide, or Protein; Inorganic Chemical; Element, Ion, or Isotope; Body...

[27]

Conceptual Entity: Idea or Concept; Body System; Body Space or Junction; Body Location or Region; Molecular Sequence; Nucleotide Sequence; Amino Acid Sequence; Carbohydrate Sequence; Geographic Area; Finding; Laboratory or Test Result; Sign or Symptom; Organism Attribute; Clinical Attribute; Intellectual Product; Occupation or Discipline; Organization; Gr...

[28]

Identify all entities corresponding to one of the 5 main entity types and relevant to diabetes, using the subcategory examples as guidance for classification. For each identified entity, extract the following information: - entity_name: Name of the entity, lowercase - entity_type: One of the following types: Organism, Anatomical Structure, Manufactured Ob...

[29]

Only use the 35 relationships that are in the predefined list

From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are clearly related to each other according to the given text, and are medically meaningful. Only use the 35 relationships that are in the predefined list. Avoid relationships that are attached to entities that are too general, for example: patients, bodily f...

[30]

Use ## as the list delimiter

Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use ## as the list delimiter

[31]

This is an example sentence supported by multiple data references [Data: <dataset name> (record ids); <dataset name> (record ids)]

When finished, output <|COMPLETE|>. Constraints and Guidelines: - Strict Textual Grounding: Base all extractions only on the provided medical abstract. Do not use external knowledge or make assumptions beyond what is written. - Entity Filtering: Only extract the entities whose type is present in the provided 5 Entity Type, and only extract entities that...

