pith. machine review for the scientific record.

arxiv: 2309.05922 · v1 · submitted 2023-09-12 · 💻 cs.AI · cs.CL · cs.IR

Recognition: 2 theorem links · Lean Theorem

A Survey of Hallucination in Large Foundation Models

Authors on Pith · no claims yet

Pith reviewed 2026-05-16 15:16 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.IR
keywords hallucination · large foundation models · survey · evaluation criteria · mitigation strategies · AI reliability · fabricated content

The pith

Hallucination in large foundation models falls into specific types that support targeted evaluation criteria and mitigation strategies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys hallucination in large foundation models, defined as generating content that deviates from factual reality or fabricates information. It classifies hallucination phenomena unique to these models and sets up evaluation criteria to measure their extent. The survey reviews existing mitigation approaches and outlines future research directions. A sympathetic reader would care because reliable AI outputs depend on understanding and reducing such deviations. The work positions the classification as a foundation for better handling of the issue in practice.

Core claim

The paper establishes that hallucination in large foundation models can be systematically classified into distinct phenomena, with corresponding evaluation criteria and mitigation strategies reviewed across the literature, providing a structured overview that clarifies challenges and points toward future solutions.

What carries the argument

The classification of hallucination types specific to large foundation models, which organizes phenomena to enable evaluation and mitigation.
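
The survey states this taxonomy in prose. As a hedged illustration of the machinery it would enable, here is a minimal sketch of a machine-readable classification; the category names and the per-type rate metric are hypothetical stand-ins, not labels taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum, auto


class HallucinationType(Enum):
    """Hypothetical stand-ins for the survey's taxonomy labels."""
    FACTUAL_CONTRADICTION = auto()  # conflicts with verifiable world knowledge
    FABRICATED_ENTITY = auto()      # invented people, citations, URLs, numbers
    INPUT_INCONSISTENCY = auto()    # contradicts the supplied source or context
    SELF_CONTRADICTION = auto()     # contradicts itself within one response


@dataclass
class AnnotatedOutput:
    """One model output tagged with the hallucination types it exhibits."""
    model: str
    output: str
    labels: set[HallucinationType]


def per_type_rate(samples: list[AnnotatedOutput]) -> dict[HallucinationType, float]:
    """Fraction of outputs exhibiting each type: the shape a type-based
    benchmark score would take."""
    if not samples:
        return {}
    return {t: sum(t in s.labels for s in samples) / len(samples)
            for t in HallucinationType}
```

Everything listed below (benchmarks, targeted mitigation, type-based risk assessment) reduces to operations over structures like these.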

If this is right

  • Evaluation benchmarks can be built around the identified hallucination types for more precise measurement.
  • Mitigation techniques can be developed or refined to target specific categories of hallucination.
  • Future model training and alignment efforts gain structured guidance from the reviewed strategies.
  • Deployment decisions for large foundation models can incorporate type-based risk assessments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The classification framework may extend to new model architectures by testing whether emerging behaviors fit existing categories.
  • Connections to broader AI safety could arise if hallucination types are linked to specific failure modes in real-world use.
  • Empirical studies could validate the survey by applying the criteria to outputs from multiple foundation models and measuring coverage.
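
The third extension, and the falsification test under "What would settle it" below, can be made concrete in a few lines: tag hallucinations observed across several deployed models against the taxonomy, reserve an explicit unclassified bucket, and report how much of the observed behavior the categories absorb. A sketch under invented model names and labels:

```python
from collections import Counter

# Hypothetical audit data: (model, assigned category) for each observed
# hallucination; "unclassified" marks behavior that fits no survey category.
observations = [
    ("model-a", "fabricated_entity"),
    ("model-a", "factual_contradiction"),
    ("model-b", "input_inconsistency"),
    ("model-b", "unclassified"),
    ("model-c", "self_contradiction"),
]


def taxonomy_coverage(obs):
    """Fraction of observed hallucinations the taxonomy can place."""
    counts = Counter(label for _, label in obs)
    total = sum(counts.values())
    return (1.0 - counts["unclassified"] / total) if total else 0.0


print(f"coverage: {taxonomy_coverage(observations):.2f}")  # coverage: 0.80
```

Coverage well below 1.0, or a persistent unclassified cluster tied to one model, is exactly the counterexample the load-bearing premise below cannot survive.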

Load-bearing premise

The reviewed literature is representative and the proposed classification captures the full range of hallucination phenomena in large foundation models.

What would settle it

Discovery of a consistent hallucination behavior in a deployed large foundation model that cannot be placed into any of the survey's defined categories would undermine the classification's completeness.

read the original abstract

Hallucination in a foundation model (FM) refers to the generation of content that strays from factual reality or includes fabricated information. This survey paper provides an extensive overview of recent efforts that aim to identify, elucidate, and tackle the problem of hallucination, with a particular focus on "Large" Foundation Models (LFMs). The paper classifies various types of hallucination phenomena that are specific to LFMs and establishes evaluation criteria for assessing the extent of hallucination. It also examines existing strategies for mitigating hallucination in LFMs and discusses potential directions for future research in this area. Essentially, the paper offers a comprehensive examination of the challenges and solutions related to hallucination in LFMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper is a survey on hallucination in Large Foundation Models (LFMs). It defines hallucination as generation of content straying from factual reality, classifies hallucination phenomena specific to LFMs, establishes evaluation criteria, examines mitigation strategies, and discusses future research directions, claiming to provide a comprehensive examination of challenges and solutions.

Significance. If the taxonomy and literature coverage hold, the survey would organize a rapidly growing area, helping researchers navigate types of hallucinations, benchmarks, and mitigation techniques in LFMs. It could serve as a reference point for standardizing evaluation and highlighting open problems, provided the selection of works is representative.

major comments (1)
  1. [Abstract / Literature Review Approach] The manuscript provides no explicit search protocol, databases searched, date range, keywords, or inclusion/exclusion criteria (see Abstract and the opening of the literature review section). This is load-bearing for the central claim of a 'comprehensive examination,' as it leaves open the possibility that recent work on multimodal, vision-language, or agentic models is underrepresented and that the proposed taxonomy misses boundary phenomena.
minor comments (2)
  1. [Taxonomy section] Clarify whether the taxonomy is intended to be exhaustive or illustrative; add a short discussion of how new hallucination types (e.g., sycophancy induced by RLHF) would be accommodated.
  2. [Mitigation Strategies] Ensure figure captions and tables listing mitigation methods include the publication year of each cited work to aid readers in tracking recency.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below and will revise the manuscript to strengthen the literature review methodology.

read point-by-point responses
  1. Referee: The manuscript provides no explicit search protocol, databases searched, date range, keywords, or inclusion/exclusion criteria (see Abstract and the opening of the literature review section). This is load-bearing for the central claim of a 'comprehensive examination,' as it leaves open the possibility that recent work on multimodal, vision-language, or agentic models is underrepresented and that the proposed taxonomy misses boundary phenomena.

    Authors: We agree that an explicit search protocol is necessary to support the claim of comprehensiveness. In the revised manuscript we will insert a new subsection 'Literature Search and Selection Methodology' immediately after the abstract. It will document: databases (arXiv, Google Scholar, ACL Anthology, NeurIPS/ICLR/CVPR proceedings), date range (January 2018–September 2023), keyword combinations (e.g., 'hallucination' AND ('large language model' OR 'foundation model' OR 'vision-language model')), and inclusion criteria (empirical studies on models with ≥1B parameters that directly measure or mitigate hallucination). We will also perform an additional targeted search for recent multimodal and agentic work and add relevant citations to ensure boundary cases are covered; the taxonomy in Section 3 is intentionally extensible and already references vision-language phenomena, but we will make this coverage explicit. revision: yes
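
To make the promised protocol concrete, here is a minimal sketch of its arXiv leg, built on arXiv's public query API with the rebuttal's keyword combination and date window. This is an assumed construction for illustration, not code from the paper; the other databases (Google Scholar, ACL Anthology, conference proceedings) would each need their own search path.

```python
import urllib.parse
import urllib.request

# Keyword combination quoted in the response:
# 'hallucination' AND ('large language model' OR 'foundation model'
#                      OR 'vision-language model'),
# restricted to the stated window, January 2018 through September 2023.
query = (
    'all:"hallucination" AND '
    '(all:"large language model" OR all:"foundation model" '
    'OR all:"vision-language model") AND '
    'submittedDate:[201801010000 TO 202309302359]'
)

url = "http://export.arxiv.org/api/query?" + urllib.parse.urlencode(
    {"search_query": query, "start": 0, "max_results": 100,
     "sortBy": "submittedDate"}
)

with urllib.request.urlopen(url) as resp:  # response is an Atom XML feed
    feed = resp.read().decode("utf-8")

print(feed.count("<entry>"), "candidates in this page of results")
```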

Circularity Check

0 steps flagged

No circularity: survey synthesis with no derivations or self-referential reductions

full rationale

The paper is a literature survey that classifies hallucination types in LFMs, reviews evaluation criteria and mitigation strategies, and outlines future directions. No equations, fitted parameters, predictions, or uniqueness theorems appear. Central claims rest on synthesis of external literature rather than any step that reduces by construction to the paper's own inputs or self-citations. Self-citations, if present, are not load-bearing for any derived result.
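
Mechanically, the audit described here amounts to checking whether any load-bearing claim is supported only by the paper's own prior outputs. A toy sketch of that check; the claim and reference identifiers are invented for illustration and do not reflect Pith's actual pipeline.

```python
def flagged_steps(claims: dict[str, set[str]], self_refs: set[str]) -> list[str]:
    """Flag claims whose entire support set consists of self-citations."""
    return [c for c, refs in claims.items() if refs and refs <= self_refs]


# Invented example: one claim rests only on a self-citation and gets flagged.
claims = {
    "taxonomy covers LFM hallucination": {"ref12", "ref34"},
    "metric tracks hallucination": {"self01"},
}
print(flagged_steps(claims, self_refs={"self01", "self02"}))
# ['metric tracks hallucination']
```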

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey the paper introduces no free parameters, axioms, or invented entities; it rests entirely on the body of previously published work it cites.

pith-pipeline@v0.9.0 · 5410 in / 989 out tokens · 37869 ms · 2026-05-16T15:16:29.922405+00:00 · methodology

discussion (0)


Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. 3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding

    cs.CV · 2026-04 · unverdicted · novelty 8.0

    3D-VCD reduces hallucinations in 3D-LLM embodied agents by contrasting predictions from original and distorted 3D scene representations at inference time.

  2. Evaluating Patient Safety Risks in Generative AI: Development and Validation of a FMECA Framework for Generated Clinical Content

    cs.CY · 2026-04 · unverdicted · novelty 7.0

    A novel FMECA-based framework was developed and validated for systematic assessment of patient safety risks in LLM-generated clinical discharge summaries, demonstrating moderate-to-substantial inter-rater agreement an...

  3. GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning

    cs.AI · 2026-03 · unverdicted · novelty 7.0

    GraphScout trains LLMs to autonomously synthesize structured training data from knowledge graphs via flexible exploration tools, enabling a 4B model to outperform larger LLMs by 16.7% on average with fewer inference t...

  4. Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation

    cs.CL · 2026-05 · unverdicted · novelty 6.0

    Dimension-level evaluation reveals that 25-58% of LLM outputs with perfect holistic scores still show measurable intent deficits across languages and domains.

  5. Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation

    cs.CL · 2026-05 · unverdicted · novelty 6.0

    DisAAD trains a 1%-sized proxy model via adversarial distillation to quantify uncertainty in black-box LLMs by aligning with their output distributions.

  6. Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation

    cs.CV · 2026-05 · unverdicted · novelty 6.0

    CoE applies vision-language models directly to document screenshots to deliver pixel-level bounding-box attribution for evidence in iterative retrieval-augmented generation, outperforming text baselines on visual-layo...

  7. Online Self-Calibration Against Hallucination in Vision-Language Models

    cs.CV · 2026-05 · unverdicted · novelty 6.0

    OSCAR exploits the generative-discriminative gap in LVLMs to build online preference data with MCTS and dual-granularity rewards for DPO-based calibration, claiming SOTA hallucination reduction and improved multimodal...

  8. Mind the Unseen Mass: Unmasking LLM Hallucinations via Soft-Hybrid Alphabet Estimation

    cs.CL · 2026-04 · unverdicted · novelty 6.0

    SHADE adaptively combines coverage and spectral signals to estimate semantic alphabet size from few LLM samples, yielding better performance than baselines in low-sample regimes for alphabet estimation and QA error detection.

  9. SinkTrack: Attention Sink based Context Anchoring for Large Language Models

    cs.CV · 2026-04 · unverdicted · novelty 6.0

    SinkTrack uses attention sink at the BOS token to anchor LLMs to initial context, reducing hallucination and forgetting with reported gains on benchmarks like SQuAD2.0 and M3CoT.

  10. Do Hallucination Neurons Generalize? Evidence from Cross-Domain Transfer in LLMs

    cs.CL · 2026-03 · unverdicted · novelty 6.0

    Hallucination neurons in LLMs are domain-specific, with cross-domain classifiers dropping from AUROC 0.783 within-domain to 0.563 across domains.

  11. The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code

    cs.SE · 2026-05 · unverdicted · novelty 5.0

    LLM-generated code matches human-written code in overall readability but exhibits different issue patterns, and prompt engineering has limited impact on improving it.

  12. Beyond Accuracy: LLM Variability in Evidence Screening for Software Engineering SLRs

    cs.SE · 2026-04 · unverdicted · novelty 5.0

    LLMs exhibit substantial heterogeneity and non-determinism in SLR evidence screening, abstracts are decisive for performance, and they show no reliable superiority over classical classifiers on two real SLRs.

  13. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

    cs.CL · 2023-11 · unverdicted · novelty 5.0

    The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.

  14. Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models

    cs.AI · 2026-04 · unverdicted · novelty 4.0

    DAVinCI combines claim attribution to model internals and external sources with entailment-based verification to improve LLM factual reliability by 5-20% on fact-checking datasets.

  15. A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering

    cs.CL · 2026-04 · unverdicted · novelty 4.0

    Dense retrieval plus query reformulation and reranking reaches 60.49% accuracy on MedQA USMLE, outperforming other setups while domain-specialized models make better use of the retrieved evidence.

  16. A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

    cs.CL · 2024-01 · accept · novelty 4.0

    A survey that compiles and taxonomizes more than 32 existing hallucination mitigation techniques for LLMs while analyzing their challenges and limitations.

  17. A Survey on the Memory Mechanism of Large Language Model based Agents

    cs.AI · 2024-04 · accept · novelty 3.0

    A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.

Reference graph

Works this paper leans on

127 extracted references · 127 canonical work pages · cited by 17 Pith papers · 20 internal anchors
