Survey of hallucination in natural language generation

2023 · arXiv 2202.03629

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Geographic Blind Spots in AI Control Monitors: A Cross-National Audit of Claude Opus 4.6

cs.CY · 2026-03-20 · unverdicted · novelty 7.0

Claude Opus 4.6 fabricates more answers on Global North AI contexts than Global South ones, creating an exploitable vulnerability in AI control monitors.

Evaluating Object Hallucination in Large Vision-Language Models

cs.CV · 2023-05-17 · accept · novelty 7.0

Large vision-language models exhibit severe object hallucination that varies with training instructions, and the proposed POPE polling method evaluates it more stably and flexibly than prior approaches.

Galactica: A Large Language Model for Science

cs.CL · 2022-11-16 · unverdicted · novelty 5.0

Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.

citing papers explorer

Geographic Blind Spots in AI Control Monitors: A Cross-National Audit of Claude Opus 4.6

cs.CY · 2026-03-20 · unverdicted · none · ref 20

Evaluating Object Hallucination in Large Vision-Language Models

cs.CV · 2023-05-17 · accept · none · ref 19

Galactica: A Large Language Model for Science

cs.CL · 2022-11-16 · unverdicted · none · ref 104
