arXiv:2405.15739
2 Pith papers cite this work.
Citing papers by year: 2026 (2)
Citing papers
- BibTeX Citation Hallucinations in Scientific Publishing Agents: Evaluation and Mitigation
  Frontier LLMs generate BibTeX entries at 83.6% field accuracy, but only 50.9% of entries are fully correct; a two-stage clibib revision raises field accuracy to 91.5% and fully correct entries to 78.3%, with 0.8% regression.
- Analyzing the Presentation, Content, and Utilization of References in LLM-powered Conversational AI Systems
  LLM chat systems differ widely in the quantity and quality of the references they provide, but users rarely click on or engage with them.