CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation

Qingyun Wang; Xuehang Guo; Yee Man Choi; Yi R. Fung

arxiv: 2510.17853 · v4 · submitted 2025-10-15 · 💻 cs.DL

Pith reviewed 2026-05-18 07:03 UTC · model grok-4.3

classification 💻 cs.DL

keywords citation validation · LLM faithfulness · retrieval-augmented agent · scientific writing · citation attribution · CiteME benchmark · citation alignment

The pith

CiteGuard is a retrieval-aware agent that validates LLM citations by checking alignment with the citations a human author would choose for the same text.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models used for scientific writing can produce citations that are inaccurate or unfaithful to the generated claims. The paper reframes citation checking as a problem of attribution alignment: does the model's selected reference match what a human writer would have cited for that exact passage? CiteGuard addresses this by retrieving candidate papers and using an agent to judge the match. On the CiteME benchmark the approach improves accuracy by 10 percentage points over the previous baseline and reaches up to 68.1 percent, about one point below human performance at 69.2 percent. The same system also surfaces alternative valid citations and generalizes to cross-domain citation attribution.

Core claim

By treating citation evaluation as citation attribution alignment, CiteGuard supplies a retrieval-augmented agent framework that grounds validation in retrieved documents and thereby produces more faithful citation judgments than prior LLM-as-a-judge methods.

What carries the argument

CiteGuard, the retrieval-aware agent framework that fetches relevant papers and judges whether an LLM-generated citation matches human attribution choices for the same text segment.
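
The retrieve-then-judge pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy `CORPUS`, the token-overlap `retrieve` function, and the `overlap_judge` placeholder are all hypothetical stand-ins for a real search backend and an LLM agent.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy corpus: paper id -> abstract text (illustration only).
CORPUS: Dict[str, str] = {
    "paper_a": "retrieval augmented generation for knowledge intensive tasks",
    "paper_b": "convolutional networks for image classification",
    "paper_c": "dense passage retrieval for open domain question answering",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization; a real system would use a search index."""
    return set(text.lower().split())


def retrieve(excerpt: str, corpus: Dict[str, str], k: int = 2) -> List[str]:
    """Rank papers by token overlap with the excerpt (ties broken by id)."""
    query = tokenize(excerpt)
    ranked = sorted(
        corpus, key=lambda pid: (-len(query & tokenize(corpus[pid])), pid)
    )
    return ranked[:k]


def overlap_judge(excerpt: str, abstract: str) -> bool:
    """Placeholder for an LLM judgment: require at least two shared tokens."""
    return len(tokenize(excerpt) & tokenize(abstract)) >= 2


def validate_citation(
    excerpt: str,
    cited_id: str,
    corpus: Dict[str, str],
    judge: Callable[[str, str], bool],
    k: int = 2,
) -> bool:
    """Accept a citation only if it is retrieved AND the judge agrees.

    This acceptance rule is an assumption made for the sketch.
    """
    if cited_id not in retrieve(excerpt, corpus, k):
        return False
    return judge(excerpt, corpus[cited_id])
```

The point of the design is that the judge only ever sees documents that retrieval has grounded, rather than deciding from the model's own recollection of the literature.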

If this is right

The framework surfaces multiple alternative citations that remain valid for a given text segment.
The approach generalizes when the task shifts to citation attribution in other scientific domains.
Performance approaches the level achieved by human evaluators on the same benchmark.
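
One way to make the first consequence measurable is to score predictions both strictly (only the human-chosen paper counts) and with alternatives allowed. The sketch below assumes a hypothetical `alternatives` mapping of additional acceptable papers per excerpt; the source does not describe how such a set would be built or scored.

```python
from typing import Dict, Set


def strict_accuracy(pred: Dict[str, str], gold: Dict[str, str]) -> float:
    """Share of excerpts where the prediction equals the human-chosen paper."""
    return sum(pred.get(q) == g for q, g in gold.items()) / len(gold)


def relaxed_accuracy(
    pred: Dict[str, str],
    gold: Dict[str, str],
    alternatives: Dict[str, Set[str]],
) -> float:
    """Share of excerpts where the prediction is the gold paper or a listed alternative."""
    correct = 0
    for q, g in gold.items():
        acceptable = {g} | alternatives.get(q, set())
        correct += pred.get(q) in acceptable
    return correct / len(gold)
```

A gap between the two scores indicates how much of the apparent error comes from citations that differ from the human choice but are still defensible.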

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Embedding the agent inside an LLM writing assistant could flag or replace questionable citations while the text is being generated.
The retrieval-validation pattern could be reused to check factual grounding for non-citation claims in generated scientific prose.
Extending the agent to track citations across a full paper draft might reveal consistency problems that single-segment checks miss.

Load-bearing premise

The CiteME benchmark and the retrieval corpus used by the agent are representative proxies for the citation decisions humans actually make when writing scientific papers.

What would settle it

Run CiteGuard on a fresh set of scientific papers and citations drawn from sources outside the original retrieval corpus and benchmark. If accuracy falls well below 68 percent, the gains likely depend on the specific test data.
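
"Well below 68 percent" needs a margin for sampling noise. A minimal sketch, assuming per-item correctness flags from a hypothetical held-out run, compares a Wilson score interval against the reported 68.1 percent. The choice of interval and the 95 percent level are assumptions, not something the paper specifies.

```python
import math
from typing import List, Tuple


def accuracy(correct: List[bool]) -> float:
    """Fraction of items judged correct."""
    return sum(correct) / len(correct)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def falls_below(correct: List[bool], reference: float) -> bool:
    """True if the whole interval sits below the reference accuracy."""
    _, upper = wilson_interval(sum(correct), len(correct))
    return upper < reference
```

Calling `falls_below(flags, 0.681)` on the held-out flags returns `True` only when even the upper end of the interval is under the reported figure, which guards against reading a small-sample dip as a failure to generalize.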

Original abstract

Large Language Models (LLMs) have emerged as powerful assistants for scientific writing. However, concerns remain about the quality and reliability of the generated text, including citation accuracy and faithfulness. While most recent work relies on methods such as LLM-as-a-Judge, the reliability of LLM-as-a-Judge alone is also in doubt. In this work, we reframe citation evaluation as a problem of citation attribution alignment, which assesses whether LLM-generated citations match those a human author would include for the same text. We propose CiteGuard, a retrieval-aware agent framework designed to provide more faithful grounding for citation validation. CiteGuard improves over the prior baseline by 10 percentage points and achieves up to 68.1% accuracy on the CiteME benchmark, approaching human performance (69.2%). It also identifies alternative valid citations and demonstrates generalization ability for cross-domain citation attribution. Our code is available at https://github.com/KathCYM/CiteGuard.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CiteGuard adds a retrieval-aware agent to align LLM citations with human choices and reports a 10-point gain to 68.1% on CiteME, but the results stand or fall on how well that benchmark captures real citation decisions.

The main thing to know: CiteGuard uses a retrieval-augmented agent to align LLM-generated citations with what humans would choose, and it reports a 10 percentage point improvement to 68.1% accuracy on the CiteME benchmark, close to the human baseline of 69.2%.

This approach is new in how it combines retrieval with an agent framework specifically for attribution alignment instead of relying solely on LLM judgment. The paper shows the system can identify alternative valid citations and demonstrates some ability to generalize across domains. Making the code available helps others test and extend the work.

Where it is softer is in the evaluation details. The gains rest on CiteME serving as a solid proxy for real-world human citation decisions in scientific papers. If the benchmark was built from a limited set of domains or annotation protocols, the results might not transfer as well. The abstract highlights the numbers but does not include much on experimental controls or statistical tests, so the strength of the performance claim is not fully clear without the full methods section.

This paper will interest people working on LLM reliability for academic tasks, particularly those dealing with citations in generated scientific text. A reader focused on evaluation methods or RAG applications could extract useful ideas from it. Given the concrete proposal, reported results, and open code, it has enough substance to go through peer review rather than a desk reject.

I would recommend putting it in front of referees. They can sort out the benchmark questions and see if the framework holds up.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces CiteGuard, a retrieval-aware agent framework that reframes LLM citation evaluation as a citation attribution alignment task. It reports that CiteGuard achieves up to 68.1% accuracy on the CiteME benchmark (a 10 percentage point gain over the prior baseline) while approaching human performance at 69.2%, and additionally identifies alternative valid citations and demonstrates cross-domain generalization. Code is released at https://github.com/KathCYM/CiteGuard.

Significance. If the performance gains are shown to be robust, CiteGuard could improve the reliability of LLM-assisted scientific writing by providing retrieval-grounded validation that mitigates issues with standalone LLM-as-a-Judge methods. Public code release supports reproducibility and is a clear strength.

major comments (2)

[Abstract] The central claim of a 10pp improvement to 68.1% accuracy (approaching human 69.2%) is load-bearing, yet no details are provided on baseline implementation, experimental controls, statistical significance tests, or potential confounds; this prevents full assessment of whether the gains reflect genuine attribution alignment.
[Evaluation: CiteME results] The representativeness of the CiteME benchmark for real human citation decisions is a key unverified assumption underlying the near-human performance claim; without discussion of benchmark construction protocol, domain coverage, annotator pool, or potential artifacts, it is unclear whether CiteGuard exploits benchmark-specific features rather than generalizing.

minor comments (1)

[Abstract] The abstract states generalization ability for cross-domain citation attribution but does not specify the domains, evaluation metrics, or quantitative results supporting this claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment point by point below, indicating where revisions have been made to strengthen the presentation of our results and the discussion of the CiteME benchmark.

Referee: [Abstract] The central claim of a 10pp improvement to 68.1% accuracy (approaching human 69.2%) is load-bearing, yet no details are provided on baseline implementation, experimental controls, statistical significance tests, or potential confounds; this prevents full assessment of whether the gains reflect genuine attribution alignment.

Authors: We agree that the abstract would benefit from greater specificity to support the central performance claim. The baseline corresponds to a standard LLM-as-a-Judge setup using the identical underlying model without retrieval augmentation. In the revised manuscript we expand the abstract to note the baseline implementation, the retrieval-augmented components of CiteGuard, and the use of statistical significance testing (paired tests confirming the 10pp gain). We have also added a dedicated paragraph in the experimental section that enumerates controls for prompt variation and retrieval quality, together with an analysis of potential confounds. These changes allow readers to better evaluate whether the observed gains arise from the attribution-alignment framing.
revision: yes

Referee: [Evaluation: CiteME results] The representativeness of the CiteME benchmark for real human citation decisions is a key unverified assumption underlying the near-human performance claim; without discussion of benchmark construction protocol, domain coverage, annotator pool, or potential artifacts, it is unclear whether CiteGuard exploits benchmark-specific features rather than generalizing.

Authors: We concur that explicit discussion of the benchmark's properties is necessary to substantiate the near-human performance claim. CiteME was introduced in prior work and our original manuscript references its source; however, we have now added a new subsection that summarizes the benchmark construction protocol, its primary domain coverage within computer science, the annotator pool (graduate students and researchers with domain expertise), and potential artifacts such as citation-style or recency biases. We further strengthen the generalization argument by highlighting the cross-domain experiments already present in the paper, which show consistent performance outside the benchmark's core domains. These additions clarify that CiteGuard's results are not limited to benchmark-specific features.
revision: yes
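
The simulated rebuttal mentions "paired tests" without naming one. As an illustration only, the sketch below uses an exact McNemar test, a common choice when two systems are scored correct or incorrect on the same items. The choice of test is an assumption, and the input lists are hypothetical per-item correctness flags.

```python
from math import comb
from typing import List


def mcnemar_exact(baseline: List[bool], system: List[bool]) -> float:
    """Two-sided exact McNemar p-value from paired per-item correctness."""
    if len(baseline) != len(system):
        raise ValueError("paired lists must have equal length")
    # Discordant pairs: items where exactly one of the two methods is correct.
    b = sum(1 for x, y in zip(baseline, system) if x and not y)
    c = sum(1 for x, y in zip(baseline, system) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # the methods never disagree, so there is no evidence either way
    # Under H0 each discordant pair favours either method with probability 1/2.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Only the items on which the two systems disagree carry information here, which is why a paired test can be more sensitive than comparing two headline accuracies.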

Circularity Check

0 steps flagged

No significant circularity; evaluation uses external CiteME benchmark against human labels

The paper introduces CiteGuard as a retrieval-augmented framework for citation attribution alignment and reports empirical accuracy on the CiteME benchmark (68.1%, +10pp over baseline, near human 69.2%). No equations, fitted parameters, or self-referential definitions appear in the abstract or described claims that would reduce any result to its own inputs by construction. The central performance claim rests on comparison to an external benchmark with human annotations rather than any internal derivation or self-citation chain, satisfying the criteria for a self-contained empirical result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard assumptions of retrieval systems and LLM prompting without introducing new free parameters, axioms beyond domain conventions, or invented entities.

axioms (1)

domain assumption: Retrieval from a fixed corpus can surface papers that a human author would consider relevant for a given text span.
Implicit in the retrieval-augmented validation design described in the abstract.
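
This assumption can be probed directly with recall@k: how often the human-cited paper appears among the top k retrieved candidates. The sketch below is a generic metric, not something the paper reports; `ranked` and `gold` are hypothetical inputs mapping excerpt ids to retrieval results and to the human-chosen paper.

```python
from typing import Dict, List


def recall_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of excerpts whose human-cited paper is among the top-k results."""
    hits = sum(
        1 for qid, target in gold.items() if target in ranked.get(qid, [])[:k]
    )
    return hits / len(gold)
```

If recall@k is low, no judge downstream can recover the human choice, so this number bounds the accuracy a retrieve-then-judge system could reach on the same data.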

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents
cs.CL · 2026-05 · unverdicted · novelty 7.0

A new framework parses and evaluates citations in LLM deep research reports across link validity, relevance, and factuality, finding 94%+ link success but only 39-77% factual accuracy.

BibTeX Citation Hallucinations in Scientific Publishing Agents: Evaluation and Mitigation
cs.DL · 2026-04 · conditional · novelty 7.0

Frontier LLMs generate BibTeX entries at 83.6% field accuracy but only 50.9% fully correct; two-stage clibib revision raises accuracy to 91.5% and fully correct entries to 78.3% with 0.8% regression.