The CITETRACE dataset and evaluation framework show that, in search-augmented LLMs, 30.6% of citations distort their sources and 27.1% use domain-inappropriate sources, with provider differences explaining 88-96% of quality variance.
S., Turc, I., and Reitter, D
8 Pith papers cite this work.
representative citing papers
Commercial AI chatbots reach over 90% multiple-choice accuracy on recent news facts but lose 11-17% in free response and drop to 19-70% on subtle false-premise questions, with retrieval failures causing most errors and clear Anglophone bias.
Introduces claim-conditioned re-scoring (SIFT) and the warranted supports proportion (WSP) metric, reporting accuracy recovery of up to 27.6 points and WSP calibration at AUC 0.92 on FEVER, SciFact, and other benchmarks.
Active Indexing with synthetic data augmentation for bidirectional fact-source binding during pretraining yields up to 30.2% higher citation precision than passive identifier appending on CitePretrainBench for Qwen models.
Strict generation directly from Task-Method-Knowledge models yields 96.5% grounded and 92.6% usable QA pairs across 23 topics, outperforming transcript-first and TMK-aware alternatives on representational grounding.
FullCite introduces three strategies for structured inline citation generation in QA and finds LLMs identify relevant documents well but struggle with precise evidence spans on ASQA, BioASQ, and ExpertQA.
An LLM pipeline with verbatim grounding processes 4,322 Digital Fairness Act submissions to produce 15,368 topic annotations and an interactive dashboard for traceable analysis.
The authors propose an evaluation framework for LLM-generated structured search summaries and describe plans for implementing and testing it.
citing papers explorer
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models