GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval

Peter Fernandes; Ria Kanjilal

arXiv: 2605.20815 · v1 · pith:VFNZ4VDPnew · submitted 2026-05-20 · 💻 cs.CL · cs.AI · cs.IR · cs.LG

Pith reviewed 2026-05-21 04:59 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.IR · cs.LG

keywords GraphRAG · local LLMs · EHR schema retrieval · consumer hardware · healthcare data · knowledge graph · retrieval augmented generation · hallucination

The pith

GraphRAG runs on consumer hardware with open-source LLMs of 7B parameters or larger for EHR schema retrieval.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether GraphRAG, which builds a knowledge graph to support structured reasoning over documents, can operate locally on everyday computers for pulling details from Electronic Health Record schemas. It deploys four open-source models on a single consumer GPU with 8 GB VRAM and compares indexing speed, graph quality, query response time, answer accuracy, and hallucination rates between global summarization and local retrieval modes. Results indicate that models at or above roughly 7 billion parameters complete the full pipeline with valid outputs while local retrieval yields faster and more grounded answers than global mode. The findings address privacy and cost concerns in healthcare by showing a path to avoid cloud services for regulated data.

Core claim

GraphRAG is feasible on consumer hardware with open-source LLMs of approximately 7B parameters or larger for EHR schema retrieval. Models below this threshold fail to produce valid structured outputs. Indexing quality and answer quality decouple across models, and local retrieval consistently outperforms global summarization in latency and factual grounding, with reduced hallucination.

What carries the argument

The Microsoft GraphRAG pipeline run locally via Ollama on consumer GPU hardware, with knowledge graph construction from real-world EHR schema documentation and comparison of global versus local retrieval modes for query answering.
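To make "run locally via Ollama" concrete, here is a minimal sketch of a single local inference call. It assumes an Ollama server on its default address and uses Ollama's `/api/generate` endpoint; the model tag in the comment is an assumed example. How the authors actually wired GraphRAG to Ollama is not described on this page, so this only illustrates that the request never leaves the machine.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; the paper's host and port are not stated here.
OLLAMA_HOST = "http://localhost:11434"


def build_generate_request(model: str, prompt: str, host: str = OLLAMA_HOST) -> urllib.request.Request:
    """Build (but do not send) a non-streaming request to Ollama's /api/generate."""
    payload = {
        "model": model,    # an Ollama model tag, e.g. "llama3.1:8b" (assumed tag name)
        "prompt": prompt,
        "stream": False,   # ask for a single JSON response instead of a token stream
    }
    return urllib.request.Request(
        url=f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout_s: float = 120.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling `generate(...)` requires a running Ollama server with the model already pulled; `build_generate_request(...)` can be inspected without one.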

If this is right

Local retrieval mode reduces latency and hallucinations relative to global summarization.
Answer quality and indexing richness vary independently by model choice.
Models smaller than approximately 7B parameters cannot reliably finish the pipeline with valid structured outputs.
The approach supports privacy-compliant GraphRAG use in regulated healthcare settings without cloud dependence.
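The capacity-threshold implication turns on whether a model's raw text can be parsed into graph records at all. The sketch below shows one way such a check could look. The JSON contract (an `entities` list and a `relationships` list with the listed keys) is a hypothetical stand-in, since the paper's actual output schema and parsing logic are not given on this page.

```python
import json

# Hypothetical contract, for illustration only (not the paper's schema).
REQUIRED_ENTITY_KEYS = {"name", "type", "description"}
REQUIRED_REL_KEYS = {"source", "target", "description"}


def is_valid_extraction(raw: str) -> bool:
    """Return True if raw parses as JSON and matches the assumed contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    entities = data.get("entities")
    relationships = data.get("relationships")
    if not isinstance(entities, list) or not isinstance(relationships, list):
        return False
    for entity in entities:
        if not isinstance(entity, dict) or not REQUIRED_ENTITY_KEYS.issubset(entity.keys()):
            return False
    for rel in relationships:
        if not isinstance(rel, dict) or not REQUIRED_REL_KEYS.issubset(rel.keys()):
            return False
    return True


def validity_rate(outputs) -> float:
    """Fraction of model outputs that pass the structural check (0.0 if empty)."""
    if not outputs:
        return 0.0
    return sum(is_valid_extraction(o) for o in outputs) / len(outputs)
```

Reporting a per-model validity rate of this kind, rather than a pass/fail verdict, would show how far below the threshold a small model actually falls.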

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same local GraphRAG setup could be applied to schema retrieval in other regulated domains such as legal or financial documents.
Selecting retrieval mode by query complexity might yield further gains in speed and accuracy.
The decoupling of graph construction quality from final answer quality points to separate model optimizations for indexing versus querying stages.

Load-bearing premise

The selected real-world EHR schema documentation represents typical complex, regulated healthcare data, and manual scoring of answer quality gives a reliable, unbiased performance measure.

What would settle it

A 3.8B-parameter model completing the full GraphRAG pipeline on the same EHR documentation and generating valid structured outputs for every test query would disprove the claimed capacity threshold around 7B parameters.

Figures

Figures reproduced from arXiv: 2605.20815 by Peter Fernandes, Ria Kanjilal.

Original abstract

Graph-based Retrieval Augmented Generation (GraphRAG) extends retrieval-augmented generation to support structured reasoning over complex corpora, but its reliability under resource-constrained, privacy-sensitive deployments remains unclear. In healthcare, where Electronic Health Record (EHR) data is complex and strictly regulated, reliance on cloud-based large language models (LLMs) introduces challenges in cost, latency, and compliance. In this work, we present a systematic evaluation of GraphRAG for EHR schema retrieval using locally deployed open-source LLMs. We implement the Microsoft GraphRAG pipeline on real-world EHR schema documentation and benchmark four models, including Llama 3.1 (8B), Mistral (7B), Qwen 2.5 (7B), and Phi-4-mini (3.8B), each deployed via Ollama on a single consumer GPU (8 GB VRAM). We evaluate indexing efficiency, knowledge graph construction, query latency, answer quality, and hallucination under both global and local retrieval modes. Our results reveal substantial differences: Llama 3.1 produces the richest knowledge graph (1,172 entities), Qwen 2.5 achieves the best answer quality (3.3/5), Phi-4-mini fails to complete the pipeline due to structured-output errors, and Mistral exhibits degenerate repetition behavior. We further show that GraphRAG exhibits a practical capacity threshold, where models below approximately 7B parameters fail to reliably produce valid structured outputs and cannot complete the pipeline. In addition, indexing and answer quality are decoupled across models, and local retrieval consistently outperforms global summarization in both latency and factual grounding, with reduced hallucination. These findings demonstrate that GraphRAG is feasible on consumer hardware while highlighting the importance of model selection and retrieval design for robust deployment in regulated settings.
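The abstract reports "degenerate repetition behavior" for Mistral without saying how it was detected. One simple heuristic, offered here as an assumption rather than the authors' method, is the share of word n-grams in an answer that occur more than once. The whitespace tokenization and the default `n = 3` are illustrative choices.

```python
from collections import Counter


def repeated_ngram_fraction(text: str, n: int = 3) -> float:
    """Fraction of word n-grams in text that belong to an n-gram seen more than once."""
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    # Count every occurrence of any n-gram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)
```

A value near 1.0 means the answer is mostly loops of the same phrase; the cut-off above which an answer counts as degenerate would need to be calibrated on real outputs.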

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GraphRAG runs on 7B-class local models for EHR schema work on consumer hardware, with local retrieval showing some edge over global, but the quality measurements rest on thin documentation.

The main thing to know is that this paper tests the Microsoft GraphRAG pipeline on real EHR schema documents using four open-source models run locally via Ollama on an 8 GB GPU. Models at or above roughly 7B parameters completed the pipeline and produced knowledge graphs, while the 3.8B model failed on structured outputs. Local retrieval mode came out ahead on latency and on the manual answer quality scores, with less apparent hallucination than the global summarization path. Indexing success and final answer quality also did not track together across the models tested.
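The latency comparison between modes can be reproduced with a small timing harness. The sketch below assumes a hypothetical `run_query` callable that wraps whichever retrieval mode is being measured (global or local); it is not a GraphRAG API, and the paper's own measurement procedure is not described on this page.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_queries(run_query: Callable[[str], str], queries: List[str]) -> Dict[str, float]:
    """Time each query with a monotonic clock and summarize wall-clock latency."""
    if not queries:
        raise ValueError("at least one query is required")
    latencies = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)  # the answer itself is ignored; only elapsed time is kept
        latencies.append(time.perf_counter() - start)
    return {
        "n": float(len(latencies)),
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }
```

Running the same query list once with a global-mode wrapper and once with a local-mode wrapper gives two summaries that can be compared directly. Reporting the median alongside the mean guards against a single slow first call while the model loads.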

Referee Report

2 major / 2 minor

Summary. The paper benchmarks GraphRAG implemented with four local open-source LLMs (Llama 3.1 8B, Mistral 7B, Qwen 2.5 7B, Phi-4-mini 3.8B) via Ollama on a single consumer GPU (8 GB VRAM) for EHR schema retrieval on real-world healthcare documentation. It measures indexing efficiency, knowledge-graph construction (e.g., 1,172 entities for Llama 3.1), query latency, answer quality (peak 3.3/5 for Qwen 2.5), and hallucination under global summarization versus local retrieval modes, concluding that models at or above ~7B parameters succeed, local retrieval is superior in latency and factual grounding, and smaller models fail on structured outputs.

Significance. If the empirical results prove robust, the work offers timely practical guidance for privacy-preserving, resource-constrained deployments of structured RAG in regulated domains such as healthcare. It supplies direct measurements of pipeline behavior on consumer hardware and highlights a model-capacity threshold together with the advantage of local retrieval, which could inform model selection and retrieval design choices without requiring cloud services.

major comments (2)

[Results and Evaluation] Evaluation / Results: The reported answer quality of 3.3/5 and claims of reduced hallucination with local retrieval rest on manual scoring, yet the manuscript supplies neither the scoring rubric, the number of test queries, inter-annotator agreement, nor quantitative hallucination rates or statistical comparisons between modes. These omissions make it impossible to verify the asserted superiority in factual grounding.
[Abstract and Model Evaluation] Abstract and § on model comparison: The capacity threshold claim (models below ~7B fail to produce valid structured outputs) is illustrated by Phi-4-mini’s failure, but without details on the exact output schema, prompt templates, or parsing logic used, the threshold cannot be assessed for generality or reproduced.
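The inter-annotator agreement the first comment asks for is commonly reported as Cohen's kappa. A minimal sketch, assuming two raters score the same answers on the same discrete scale (the paper does not state how many annotators were used):

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b) -> float:
    """Cohen's kappa for two raters labelling the same items with discrete scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the identical score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal score distribution.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical score throughout
    return (observed - expected) / (1.0 - expected)
```

For an ordinal 1 to 5 quality scale, a weighted kappa that penalizes near-misses less than large disagreements may be the better choice; the unweighted form above is the simplest starting point.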

minor comments (2)

A summary table aggregating entity counts, latency, quality scores, and failure modes across all four models and both retrieval modes would improve readability and allow direct comparison.
The manuscript would benefit from explicit citation to the original Microsoft GraphRAG paper and from a brief description of the EHR schema’s scale (number of tables, relationships, regulatory constraints).
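The summary table suggested in the first minor comment could start from the figures quoted in the abstract. The sketch below fills in only those values and marks every other cell as not given on this page; the column names are illustrative choices, not the paper's.

```python
def format_table(headers, rows) -> str:
    """Render rows as a fixed-width plain-text table."""
    widths = [max(len(str(cell)) for cell in col) for col in zip(headers, *rows)]

    def fmt(row):
        return " | ".join(str(cell).ljust(w) for cell, w in zip(row, widths))

    rule = "-+-".join("-" * w for w in widths)
    return "\n".join([fmt(headers), rule] + [fmt(r) for r in rows])


NOT_GIVEN = "not given"  # value does not appear on this page

HEADERS = ("Model", "Params", "Entities", "Answer quality", "Outcome")
ROWS = [
    ("Llama 3.1", "8B", "1,172", NOT_GIVEN, "richest knowledge graph"),
    ("Mistral", "7B", NOT_GIVEN, NOT_GIVEN, "degenerate repetition"),
    ("Qwen 2.5", "7B", NOT_GIVEN, "3.3/5", "best answer quality"),
    ("Phi-4-mini", "3.8B", NOT_GIVEN, NOT_GIVEN, "pipeline failed (structured-output errors)"),
]

print(format_table(HEADERS, ROWS))
```

A full version would add latency and hallucination columns for each retrieval mode once those numbers are reported.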

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important aspects of reproducibility and verifiability. We address each major comment below and indicate planned revisions to strengthen the manuscript.

Referee: [Results and Evaluation] Evaluation / Results: The reported answer quality of 3.3/5 and claims of reduced hallucination with local retrieval rest on manual scoring, yet the manuscript supplies neither the scoring rubric, the number of test queries, inter-annotator agreement, nor quantitative hallucination rates or statistical comparisons between modes. These omissions make it impossible to verify the asserted superiority in factual grounding.

Authors: We agree that the current description of the manual evaluation lacks sufficient detail for independent verification. We will revise the Results and Evaluation sections to include the complete scoring rubric, the exact number of test queries used, clarification on annotator procedures (including any inter-annotator agreement metrics or justification for single-annotator design), quantitative hallucination rates, and statistical comparisons (e.g., significance tests) between global summarization and local retrieval modes. These additions will directly support the claims regarding factual grounding and reduced hallucination.

revision: yes

Referee: [Abstract and Model Evaluation] Abstract and § on model comparison: The capacity threshold claim (models below ~7B fail to produce valid structured outputs) is illustrated by Phi-4-mini’s failure, but without details on the exact output schema, prompt templates, or parsing logic used, the threshold cannot be assessed for generality or reproduced.

Authors: We acknowledge that greater transparency on implementation details is needed to allow readers to evaluate the generality of the observed capacity threshold. In the revised manuscript we will add the precise output schema required from the models, the full prompt templates for entity/relationship extraction and other pipeline stages, and the parsing/validation logic used to detect and handle invalid structured outputs. This will improve reproducibility and permit assessment of whether the ~7B threshold holds under alternative schemas or prompts.

revision: yes
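The statistical comparison between retrieval modes promised in the first response could take the form of a paired test, since each query is answered in both modes. A minimal sketch of an exact sign-flip permutation test follows; the choice of test is an assumption, as the rebuttal only mentions "significance tests" in general.

```python
import itertools


def paired_permutation_pvalue(local_scores, global_scores) -> float:
    """Two-sided exact sign-flip permutation test on paired per-query scores."""
    if len(local_scores) != len(global_scores) or not local_scores:
        raise ValueError("scores must be paired and non-empty")
    if len(local_scores) > 20:
        raise ValueError("exact enumeration is only practical for about 20 pairs")
    diffs = [l - g for l, g in zip(local_scores, global_scores)]
    observed = abs(sum(diffs))
    total = 0
    extreme = 0
    # Under the null hypothesis the sign of each paired difference is arbitrary,
    # so enumerate every sign assignment and count those at least as extreme.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

With only a handful of test queries the smallest attainable p-value is 2 / 2^n, which is one reason the number of queries matters for the claims about local versus global retrieval.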

Circularity Check

0 steps flagged

No significant circularity: pure empirical benchmarking with direct measurements

The paper conducts a direct experimental evaluation of the Microsoft GraphRAG pipeline on four open-source LLMs deployed locally via Ollama, reporting observed quantities such as entity counts (e.g., 1,172 for Llama 3.1), answer quality scores (3.3/5 for Qwen 2.5), latency differences between local and global retrieval modes, and failure thresholds for models below ~7B parameters. No mathematical derivations, equations, fitted parameters, or predictions are defined in terms of the study's own outputs. Claims rest on measured pipeline runs rather than self-referential reductions, self-citation chains, or imported uniqueness results. The work is self-contained against external benchmarks of model behavior and retrieval performance.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The study rests on standard assumptions about data representativeness and evaluation validity rather than introducing new free parameters or postulated entities.

axioms (1)

domain assumption: The chosen EHR schema documentation is representative of real-world complex healthcare data. Invoked when applying the pipeline to real-world EHR schema documentation for the benchmark.

pith-pipeline@v0.9.0 · 5872 in / 1318 out tokens · 44136 ms · 2026-05-21T04:59:38.710071+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost.lean · reality_from_one_distinction · unclear

We implement the Microsoft GraphRAG pipeline on real-world EHR schema documentation and benchmark four models... local retrieval consistently outperforms global summarization in both latency and factual grounding, with reduced hallucination.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 9 internal anchors

[1] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.

[2] Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. From local to global: A graph RAG approach to query-focused summarization. arXiv preprint arXiv:2404.16130, 2024.

[3] Vincent A Traag, Ludo Waltman, and Nees Jan Van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1):5233, 2019.

[4] Alon Y Halevy, Naveen Ashish, Dina Bitton, Michael Carey, Denise Draper, Jeff Pollock, Arnon Rosenthal, and Vishal Sikka. Enterprise information integration: successes, challenges and controversies. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pages 778–787, 2005.

[5] Michael Stonebraker, Samuel Madden, Daniel J Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. The end of an architectural era: it’s time for a complete rewrite. In Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker, pages 463–489. 2018.

[6] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yixin Dai, Jiawei Sun, Haofen Wang, Haofen Wang, et al. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2(1):32, 2023.

[7] George Hripcsak, Jon D Duke, Nigam H Shah, Christian G Reich, Vojtech Huser, Martijn J Schuemie, Marc A Suchard, Rae Woong Park, Ian Chi Kei Wong, Peter R Rijnbeek, et al. Observational health data sciences and informatics (OHDSI): opportunities for observational researchers. Studies in Health Technology and Informatics, 216:574, 2015.

[8] Karan Singhal, Shekoofeh Azizi, Tao Tu, S Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, et al. Large language models encode clinical knowledge. Nature, 620(7972):172–180, 2023.

[9] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.

[10] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 1(2):1–124, 2023.

[11] W Nicholson Price and I Glenn Cohen. Privacy in the age of medical big data. Nature Medicine, 25(1):37–43, 2019.

[12] David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, and Jeff Dean. Carbon emissions and large neural network training. arXiv preprint arXiv:2104.10350, 2021.

[13] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.

[14] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.

[15] Yihang Jiang, Xiaoyang Li, Guangxu Zhu, Hang Li, Jing Deng, Kaifeng Han, Chao Shen, Qingjiang Shi, and Rui Zhang. 6G non-terrestrial networks enabled low-altitude economy: Opportunities and challenges. arXiv preprint arXiv:2311.09047, 2023.

[16] Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J Hewett, Mojan Javaheripi, Piero Kauffmann, et al. Phi-4 technical report. arXiv preprint arXiv:2412.08905, 2024.

[17] Feiyang Cai, Jiahui Bai, Tao Tang, Guijuan He, Joshua Luo, Tianyu Zhu, Srikanth Pilla, Gang Li, Ling Liu, and Feng Luo. MolLangBench: A comprehensive benchmark for language-prompted molecular structure recognition, editing, and generation. arXiv preprint arXiv:2505.15054, 2025.

[18] Haoyu Han, Li Ma, Yu Wang, Harry Shomer, Yongjia Lei, Zhisheng Qi, Kai Guo, Zhigang Hua, Bo Long, Hui Liu, et al. RAG vs. GraphRAG: A systematic evaluation and key insights. arXiv preprint arXiv:2502.11371, 2025.

[19] Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, and Xindong Wu. Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering, 36(7):3580–3599, 2024.

[20] Zach Nussbaum, John X Morris, Brandon Duderstadt, and Andriy Mulyar. Nomic Embed: Training a reproducible long context text embedder. arXiv preprint arXiv:2402.01613, 2024.

[21] Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12):1–38, 2023.

A Pipeline Configuration Details

The GraphRAG configuration used across all experiments sets chunk size to 512 tokens with 256-token ...
