Enabling Large Language Models to Generate Text with Citations

Gao, T · 2023 · DOI 10.18653/v1/2023.emnlp-main.398

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Evaluating Commercial AI Chatbots as News Intermediaries

cs.CL · 2026-05-21 · conditional · novelty 7.0

Commercial AI chatbots reach over 90% multiple-choice accuracy on recent news facts but lose 11-17% in free response and drop to 19-70% on subtle false-premise questions, with retrieval failures causing most errors and clear Anglophone bias.

Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models

cs.AI · 2025-06-21 · conditional · novelty 6.0

Active Indexing with synthetic data augmentation for bidirectional fact-source binding during pretraining yields up to 30.2% higher citation precision than passive identifier appending on CitePretrainBench for Qwen models.

RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems

cs.CL · 2026-05-11 · unverdicted · novelty 5.0

RUBEN discovers minimal rule sets explaining RAG LLM outputs via novel pruning and applies them to evaluate LLM safety against adversarial injections.

VerifAI: A Verifiable Open-Source Search Engine for Biomedical Question Answering

cs.IR · 2026-01-16 · unverdicted · novelty 5.0

VerifAI is an open-source biomedical QA system that decomposes generated answers into claims and verifies them with a fine-tuned NLI engine to reduce hallucinations and provide traceable citations.

From Binary Groundedness to Support Relations: Towards a Reader-Centred Taxonomy for Comprehension of AI Output

cs.HC · 2026-04-09 · unverdicted · novelty 4.0

Binary groundedness judgments in AI evaluations should be replaced by a reader-centered taxonomy of support relations that distinguishes syntactic and interpretive moves between generated statements and source documents.

citing papers explorer

Showing 6 of 6 citing papers.

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · none · ref 117

Evaluating Commercial AI Chatbots as News Intermediaries

cs.CL · 2026-05-21 · conditional · none · ref 14

Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models

cs.AI · 2025-06-21 · conditional · none · ref 16

RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems

cs.CL · 2026-05-11 · unverdicted · none · ref 41

VerifAI: A Verifiable Open-Source Search Engine for Biomedical Question Answering

cs.IR · 2026-01-16 · unverdicted · none · ref 15

From Binary Groundedness to Support Relations: Towards a Reader-Centred Taxonomy for Comprehension of AI Output

cs.HC · 2026-04-09 · unverdicted · none · ref 11
