Hallucination rates and reference accuracy of ChatGPT and bard for systematic reviews: Comparative analysis

Chelli, M · 2024 · DOI 10.2196/53164

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open at publisher browse 2 citing papers

representative citing papers

BibTeX Citation Hallucinations in Scientific Publishing Agents: Evaluation and Mitigation

cs.DL · 2026-04-03 · conditional · novelty 7.0

Frontier LLMs generate BibTeX entries at 83.6% field accuracy but only 50.9% fully correct; two-stage clibib revision raises accuracy to 91.5% and fully correct entries to 78.3% with 0.8% regression.

Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance

cs.AI · 2026-05-11 · unverdicted · novelty 5.0

The authors propose creating data probes—synthetic sequences from defined random processes—to reveal how data properties drive LLM behavior across workflow stages.

citing papers explorer

Showing 2 of 2 citing papers.

BibTeX Citation Hallucinations in Scientific Publishing Agents: Evaluation and Mitigation cs.DL · 2026-04-03 · conditional · none · ref 8
Frontier LLMs generate BibTeX entries at 83.6% field accuracy but only 50.9% fully correct; two-stage clibib revision raises accuracy to 91.5% and fully correct entries to 78.3% with 0.8% regression.
Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance cs.AI · 2026-05-11 · unverdicted · none · ref 3
The authors propose creating data probes—synthetic sequences from defined random processes—to reveal how data properties drive LLM behavior across workflow stages.

Hallucination rates and reference accuracy of ChatGPT and bard for systematic reviews: Comparative analysis

fields

years

verdicts

representative citing papers

citing papers explorer