Attribution, Citation, and Quotation: A Survey of Evidence-based Text Generation with Large Language Models

Michael Färber; Tim Schopf; Tobias Schreieder

arXiv: 2508.15396 · v2 · submitted 2025-08-21 · cs.CL

Pith reviewed 2026-05-18 22:11 UTC · model grok-4.3

keywords: evidence-based text generation, large language models, attribution, citation, quotation, taxonomy, evaluation metrics, survey

The pith

A survey of 134 papers proposes a unified taxonomy of evidence-based text generation with large language models, covering methods that rely on citations, attribution, or quotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper works to reduce inconsistency in how researchers ensure large language model outputs can be traced back to supporting evidence. It reviews 134 papers focused on citations, attribution, or quotations and proposes one shared taxonomy to group the different methods. The authors also examine 300 evaluation metrics across seven dimensions to show how these approaches are currently tested and what representative techniques look like. A reader would care because clearer organization could help build more reliable systems where model claims are easier to verify. This would address growing worries about untraceable or invented content from language models.

Core claim

By systematically reviewing 134 papers, the authors establish a unified taxonomy of evidence-based text generation with large language models that focuses on methods using citations, attribution, or quotations, while surveying 300 evaluation metrics in seven key dimensions to describe distinctive characteristics, representative methods, open challenges, and promising future directions.

What carries the argument

The unified taxonomy that classifies approaches to evidence-based text generation relying on citations, attribution, or quotations.

If this is right

Researchers gain a common structure for organizing and comparing work on verifiable language model outputs.
Evaluation can draw on the surveyed set of 300 metrics across seven dimensions for more consistent testing.
Open challenges highlighted in the survey can direct future efforts toward improving traceability.
Representative methods become easier to identify and extend in new systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The taxonomy could support creation of shared benchmarks that measure how effectively different methods link outputs to evidence.
It might clarify links to related problems such as reducing unverified claims in language model responses.
Later surveys in nearby areas like automated fact checking could adapt the same organizational structure.
Testing the taxonomy against papers published after this review would check whether it remains comprehensive.

Load-bearing premise

The 134 selected papers sufficiently represent the field and the taxonomy organizes all relevant approaches without major gaps or selection biases.

What would settle it

A follow-up review that identifies many papers or methods on the topic that fall outside the proposed taxonomy categories would show the survey is incomplete.

Figures

Figures reproduced from arXiv: 2508.15396 by Michael Färber, Tim Schopf, Tobias Schreieder.

**Figure 2.** Number of studies per year. 106 in 2024, a 6.6-fold rise. Over 86% of studies were published after 2023, highlighting limitations of earlier surveys such as Li et al. (2023), which omit a substantial portion of the recent literature. Due to a data cutoff in February, our dataset includes only 10 publications from 2025. Given the trend, we anticipate continued growth throughout 2025, underscoring the sustain…

**Figure 3.** Distribution of publications by contribution type. We observe that most studies propose novel approaches for evidence-based text generation. A substantial number also introduce new resources and focus on evaluation, underscoring the growing attention these aspects receive within the community. Further, this highlights the necessity of not only reviewing methodological contributions but also syst…

**Figure 4.** Taxonomy of evidence-based text generation with LLMs.

**Figure 5.** Number of studies per attribution approach.

**Figure 6.** Number of studies per citation style. … or used during generation, especially in attribution settings (Muller et al., 2023). This style reflects the specific text segments that the LLM indirectly quoted or relied upon. Few methods use narrative citations, which incorporate references into the natural flow of texts (e.g., "Author et al. argue that..."), enhancing contextual clarity (Shaier et al., 2024). Th…

**Figure 7.** Evaluation metrics and frameworks for evidence-based text generation. The numbers in parentheses …
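The counts quoted in the Figure 2 caption can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated on this page (134 papers in total, 106 from 2024, 10 from 2025). It assumes that "published after 2023" means 2024 or 2025 and that the 6.6-fold rise is measured against 2023; the implied 2023 count is derived here and is not a figure stated in the caption.

```python
# Counts quoted in the Figure 2 caption and the abstract.
total_papers = 134
papers_2024 = 106
papers_2025 = 10  # partial year: data cutoff in February

# Share of the corpus published after 2023 (assumed to mean 2024 or 2025).
share_after_2023 = (papers_2024 + papers_2025) / total_papers
print(f"published after 2023: {share_after_2023:.1%}")

# If the 6.6-fold rise is relative to 2023 (an assumption), the implied
# 2023 count follows by division. This is a derived estimate only.
fold_rise = 6.6
implied_2023 = papers_2024 / fold_rise
print(f"implied 2023 count: {implied_2023:.1f}")
```

Under these assumptions the share comes out slightly above 86%, which agrees with the caption's "over 86%".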

Original abstract

The increasing adoption of large language models (LLMs) has raised serious concerns about their reliability and trustworthiness. As a result, a growing body of research focuses on evidence-based text generation with LLMs, aiming to link model outputs to supporting evidence to ensure traceability and verifiability. However, the field is fragmented due to inconsistent terminology, isolated evaluation practices, and a lack of unified benchmarks. To bridge this gap, we systematically analyze 134 papers, introduce a unified taxonomy of evidence-based text generation with LLMs, and investigate 300 evaluation metrics across seven key dimensions. Thereby, we focus on approaches that use citations, attribution, or quotations for evidence-based text generation. Building on this, we examine the distinctive characteristics and representative methods in the field. Finally, we highlight open challenges and outline promising directions for future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents a systematic survey of evidence-based text generation with large language models, focusing on approaches that employ citations, attribution, or quotations. It analyzes 134 papers, proposes a unified taxonomy, catalogs and investigates 300 evaluation metrics across seven dimensions, reviews representative methods and their characteristics, and identifies open challenges along with future research directions.

Significance. If the paper selection and synthesis hold, this work provides a useful consolidation of a fragmented area by offering a shared taxonomy and an extensive catalog of metrics. The scale of the review (134 papers and 300 metrics) and the explicit scoping to citation/attribution/quotation methods constitute a concrete organizational contribution that can help standardize evaluation practices and highlight gaps in trustworthy LLM generation.

major comments (1)

[§2] §2 (Survey Methodology): the description of the literature search strategy, databases, keywords, time range, and inclusion/exclusion criteria is insufficiently detailed to support the claim of a systematic analysis of 134 papers. Reproducibility and assessment of coverage bias require explicit reporting of these steps.

minor comments (2)

[Abstract] The abstract states that metrics are examined 'across seven key dimensions' but does not enumerate them; adding the list would improve immediate readability.
[Figures 4 and 7] Figure captions and axis labels in the taxonomy overview and metric summary figures should be expanded to be self-contained, as some rely on abbreviations defined only in the main text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and recommendation for minor revision. We are encouraged by the recognition of the survey's scope and organizational contributions. We address the single major comment below and will incorporate the requested details in the revised manuscript.

Referee: [§2] §2 (Survey Methodology): the description of the literature search strategy, databases, keywords, time range, and inclusion/exclusion criteria is insufficiently detailed to support the claim of a systematic analysis of 134 papers. Reproducibility and assessment of coverage bias require explicit reporting of these steps.

Authors: We thank the referee for highlighting this point. Section 2 currently provides a high-level overview of our paper selection process leading to the 134 papers, but we agree that greater specificity is needed for full reproducibility. In the revised manuscript, we will expand §2 with explicit details on: the databases and repositories searched (arXiv, Google Scholar, ACL Anthology, and NeurIPS/ICLR/EMNLP proceedings); the keyword combinations and Boolean queries employed (e.g., “large language model” AND (“citation” OR “attribution” OR “quotation” OR “evidence-based generation”)); the time range (primarily 2018–2024, with emphasis on post-2022 works); and the inclusion/exclusion criteria (must involve LLMs for text generation, incorporate explicit evidence mechanisms such as citations/attributions/quotations, be in English, and report empirical results; exclude non-LLM methods, purely theoretical papers without generation components, and works focused solely on retrieval without generation). These additions will enable readers to evaluate coverage and potential biases while preserving the integrity of the taxonomy and the analysis of 300 metrics.
Revision: yes
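The example Boolean query in the authors' response can be read as a simple text filter. The sketch below is a minimal illustration, not the authors' actual search pipeline: the function name `matches_query`, the substring-matching strategy, and the candidate titles are all hypothetical, and only the query terms themselves come from the response above.

```python
import re

# Terms taken from the example query in the rebuttal:
# "large language model" AND ("citation" OR "attribution" OR
# "quotation" OR "evidence-based generation")
LLM_TERM = "large language model"
EVIDENCE_TERMS = ("citation", "attribution", "quotation", "evidence-based generation")


def matches_query(text: str) -> bool:
    """Return True if text contains LLM_TERM and at least one evidence term.

    Matching is a case-insensitive substring test (a simplifying assumption),
    so plural forms such as "citations" also match.
    """
    normalized = re.sub(r"\s+", " ", text.lower())
    has_llm = LLM_TERM in normalized
    has_evidence = any(term in normalized for term in EVIDENCE_TERMS)
    return has_llm and has_evidence


# Hypothetical candidate titles, invented purely for illustration.
candidates = [
    "Teaching large language models to answer with citations",
    "Citation graphs for scholarly search",
    "Scaling laws for large language models",
]
selected = [title for title in candidates if matches_query(title)]
print(selected)
```

Only the first title satisfies both halves of the query. A real screening step would additionally apply the inclusion and exclusion criteria listed in the response, which need human judgment and are not modeled here.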

Circularity Check

0 steps flagged

No significant circularity

This is a literature survey paper that systematically reviews 134 external papers, proposes a taxonomy for evidence-based LLM text generation via citations/attribution/quotations, and catalogs 300 metrics across seven dimensions. The contribution consists of descriptive synthesis and organization of existing work rather than any derivations, equations, fitted parameters, or predictions. No load-bearing step reduces to the paper's own inputs by construction, self-citation, or ansatz smuggling. The scoping and methodology are explicitly stated as a review process drawing on independent sources, rendering the analysis self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard assumptions about literature review methodology and the representativeness of the selected papers rather than introducing new free parameters or entities.

axioms (1)

Domain assumption: The field of evidence-based text generation with LLMs is fragmented due to inconsistent terminology, isolated evaluation practices, and a lack of unified benchmarks.
This premise is stated directly in the abstract as the core motivation for conducting the survey.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "we systematically analyze 134 papers, introduce a unified taxonomy of evidence-based text generation with LLMs, and investigate 300 evaluation metrics across seven key dimensions"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "focus on approaches that use citations, attribution, or quotations for evidence-based text generation"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Are Finer Citations Always Better? Rethinking Granularity for Attributed Generation
cs.CL · 2026-04 · unverdicted · novelty 5.0

Enforcing sentence-level citations degrades LLM attribution quality by 16-276% versus paragraph-level, with larger models penalized more due to disrupted semantic synthesis.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · cited by 1 Pith paper

[1]

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 8113–8140, Miami, Florida, USA

Attribute or abstain: Large language models as long document assistants. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 8113–8140, Miami, Florida, USA. Association for Computational Linguistics. Courtni Byun, Piper Vasicek, and Kevin Seppi

work page 2024
[2]

In Proceedings of the Third Workshop on Bridging Human–Computer Interaction and Natural Language Processing, pages 28–39, Mexico City, Mexico

This reference does not exist: An exploration of LLM citation accuracy and relevance. In Proceedings of the Third Workshop on Bridging Human–Computer Interaction and Natural Language Processing, pages 28–39, Mexico City, Mexico. Association for Computational Linguistics. Anthony Chen, Panupong Pasupat, Sameer Singh, Hongrae Lee, and Kelvin Guu. 2023...

work page 2023
[3]

In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1308–1330, Albuquerque, New Mexico

CORAL: Benchmarking multi-turn conversational retrieval-augmented generation. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1308–1330, Albuquerque, New Mexico. Association for Computational Linguistics. Zheng Chu, Jingchang Chen, Zhongjie Wang, Guo Tang, Qianglong Chen, Ming Liu, and Bing Qin. 2025. Towards faithful mu...

work page 2025
[4]

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14169–14187, Bangkok, Thailand

EWEK-QA: Enhanced web and efficient knowledge graph retrieval for citation-based question answering systems. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14169–14187, Bangkok, Thailand. Association for Computational Linguistics. Haolin Deng, Chang Wang, Li Xin, Dezhang Yuan, Jun...

work page 2024
[5]

In International Conference on Learning Representations

Wizard of Wikipedia: Knowledge-powered conversational agents. In International Conference on Learning Representations. Hyo Jin Do, Rachel Ostrand, Justin D. Weisz, Casey Dugan, Prasanna Sattigeri, Dennis Wei, Keerthiram Murugesan, and Werner Geyer

work page
[6]

Nouha Dziri, Hannah Rashkin, Tal Linzen, and David Reitter

Facilitating human-LLM collaboration through factuality scores and source attributions. Nouha Dziri, Hannah Rashkin, Tal Linzen, and David Reitter. 2022. Evaluating attribution in dialogue systems: The BEGIN benchmark. Transactions of the Association for Computational Linguistics, 10:1066–1083. Shahul Es, Jithin James, Luis Espinosa Anke, and Steven S...

work page 2022
[7]

In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3558–3567, Florence, Italy

ELI5: Long form question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3558–3567, Florence, Italy. Association for Computational Linguistics. Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, and Qing Li. 2024. A survey on RAG meeting LLMs: Towards retri...

work page 2024
[8]

In Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024), pages 118–144, Mexico City, Mexico

HGOT: Hierarchical graph of thoughts for retrieval-augmented in-context learning in factuality evaluation. In Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024), pages 118–144, Mexico City, Mexico. Association for Computational Linguistics. Michael Färber and Adam Jatowt. 2020. Citation recommendation: approach...

work page arXiv 2024
[9]

In Findings of the Association for Computational Linguistics: ACL 2025, pages 17061–17090, Vienna, Austria

RAG-RewardBench: Benchmarking reward models in retrieval augmented generation for preference alignment. In Findings of the Association for Computational Linguistics: ACL 2025, pages 17061–17090, Vienna, Austria. Association for Computational Linguistics. Kristiina Jokinen. 2024. The need for grounding in LLM-based dialogue systems. In Proceedings ...

work page 2025
[10]

ACM Trans

From matching to generation: A survey on generative information retrieval. ACM Trans. Inf. Syst., 43(3). Xinze Li, Yixin Cao, Liangming Pan, Yubo Ma, and Aixin Sun. 2024d. Towards verifiable generation: A benchmark for knowledge-aware language model attribution. In Findings of the Association for Computational Linguistics: ACL 2024, pages 493–516, Ban...

work page 2024
[11]

In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4262–4273, Online

On learning text style transfer with direct rewards. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4262–4273, Online. Association for Computational Linguistics. Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel Weld. 2020. S2ORC:...

work page 2021
[12]

ExpertQA: Expert-curated questions and attributed answers. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3025–3045, Mexico City, Mexico. Association for Computational Linguistics. Alex Mallen, Akari Asai, Victor Zhong, Ra- ...

work page 2024
[13]

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12076–12100, Singapore

FActScore: Fine-grained atomic evaluation of factual precision in long form text generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12076–12100, Singapore. Association for Computational Linguistics. Mazda Moayeri, Elham Tabassi, and Soheil Feizi

work page 2023
[14]

In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’24, pages 1211–1228, New York, NY, USA

Worldbench: Quantifying geographic disparities in LLM factual recall. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’24, pages 1211–1228, New York, NY, USA. Association for Computing Machinery. Benjamin Muller, John Wieting, Jonathan Clark, Tom Kwiatkowski, Sebastian Ruder, Livio Soares, Roee Aharoni, J...

work page 2024
[15]

GPT-4 technical report. Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to foll...

work page 2022
[16]

Towards improved multi-source attribution for long-form answer generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3906–3919, Mexico City, Mexico. Association for Computational Linguistics. Vinzent Penzkofer and...

work page 2024
[17]

Association for Computational Linguistics

Are large language models temporally grounded? In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7064–7083, Mexico City, Mexico. Association for Computational Linguistics. Pritika Ramu, Koustava Goswami, Apoorv Saxena, and ...

work page 2024
[18]

In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 11709–11724, Miami, Florida, USA

A comprehensive survey of hallucination in large language, image, video and audio foundation models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 11709–11724, Miami, Florida, USA. Association for Computational Linguistics. Phillip Schneider, Tim Schopf, Juraj Vladika, Mikhail Galkin, Elena Simperl, and Florian Matth...

work page 2024
[19]

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17226–17239, Miami, Florida, USA

Adaptive question answering: Enhancing language model proficiency for addressing knowledge conflicts with source citations. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17226–17239, Miami, Florida, USA. Association for Computational Linguistics. Jiajun Shen, Tong Zhou, Yubo Chen, and Kang Liu. 202...

work page 2024
[20]

Effective large language model adaptation for improved grounding and citation generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6237–6251, Mexico City, Mexico. Association for Computational Linguistics. Wenhao ...

work page 2024
[21]

ACM Comput

A survey of knowledge-enhanced text generation. ACM Comput. Surv., 54(11s). Jiajie Zhang, Yushi Bai, Xin Lv, Wanjun Gu, Danqing Liu, Minhao Zou, Shulin Cao, Lei Hou, Yuxiao Dong, Ling Feng, and Juanzi Li. 2025a. LongCite: Enabling LLMs to generate fine-grained citations in long-context QA. In Findings of the Association for Computational Linguistics: ACL ...

work page 2025
[22]

ChatGPT hallucinates when attributing answers. In Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, SIGIR-AP ’23, pages 46–51, New York, NY, USA. Association for Computing Machinery. A Supplementary Material The appendix provides supplementary material supportin...

work page 2022
