Knowledge-Driven Hallucination in Large Language Models: An Empirical Study on Process Modeling

Alessandro Berti; Anton Antonov; Humam Kourani; Wil M.P. van der Aalst

arxiv: 2509.15336 · v2 · submitted 2025-09-18 · 💻 cs.AI

Pith reviewed 2026-05-18 15:29 UTC · model grok-4.3

keywords knowledge-driven hallucination · large language models · process modeling · business process management · hallucination · empirical study · AI reliability

The pith

LLMs override explicit process descriptions with their pre-trained knowledge of standard business workflows

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how large language models tasked with turning textual descriptions into formal business process models sometimes produce outputs that directly contradict the given descriptions. This occurs when the model's internal knowledge of typical processes overrides the specific evidence supplied in the input. The authors test this by feeding the models both ordinary process descriptions and deliberately unusual ones, then checking how closely the generated models match the provided details rather than the pre-trained patterns. A reader would care because the same override risk appears in any setting where an AI must stay faithful to unique source material instead of generalizing from training data.

Core claim

LLMs exhibit knowledge-driven hallucination in which their generated process models contradict explicit source evidence because the output is overridden by the model's generalized internal knowledge about standard business processes.

What carries the argument

Controlled experiment that compares LLM outputs on standard process descriptions versus deliberately atypical process structures to measure fidelity to the supplied evidence.

Load-bearing premise

Deliberately atypical process structures in the input will create observable conflicts with the LLM's pre-trained knowledge without being masked by prompt phrasing or model-specific biases.

What would settle it

If the models produce process diagrams with equal fidelity to the source text for both standard and atypical descriptions, the claim of knowledge-driven overriding would be falsified.
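One way to make this falsification criterion concrete is a permutation test on per-description fidelity scores. The sketch below is only an illustration under stated assumptions: the function name `permutation_test`, the idea of a single fidelity score in [0, 1] per generated model, and the placeholder score lists are all hypothetical and do not come from the paper.

```python
import random
from statistics import mean


def permutation_test(standard, atypical, n_resamples=10_000, seed=0):
    """One-sided permutation test: is mean fidelity lower on atypical inputs?

    Returns the observed gap (standard minus atypical) and an estimated
    p-value under the null hypothesis that the condition labels do not matter.
    """
    rng = random.Random(seed)
    observed = mean(standard) - mean(atypical)
    pooled = list(standard) + list(atypical)
    n_std = len(standard)
    at_least_as_large = 0
    for _ in range(n_resamples):
        # Reassign condition labels at random and recompute the gap.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_std]) - mean(pooled[n_std:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_large + 1) / (n_resamples + 1)
    return observed, p_value


# Placeholder scores for illustration only; these are NOT results from the paper.
standard_scores = [1.0, 1.0, 0.9, 1.0, 0.95, 1.0]
atypical_scores = [0.6, 0.7, 0.5, 0.8, 0.65, 0.55]
gap, p = permutation_test(standard_scores, atypical_scores)
print(f"fidelity gap = {gap:.3f}, estimated p = {p:.4f}")
```

A gap near zero with a large p-value would be the "equal fidelity" outcome described above; a clearly positive gap with a small p-value would be consistent with the override claim, although it would not by itself rule out prompt phrasing as a cause.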

Figures

Figures reproduced from arXiv: 2509.15336 by Alessandro Berti, Anton Antonov, Humam Kourani, Wil M.P. van der Aalst.

Original abstract

The utility of Large Language Models (LLMs) in analytical tasks is rooted in their vast pre-trained knowledge, which allows them to interpret ambiguous inputs and infer missing information. However, this same capability introduces a critical risk of what we term knowledge-driven hallucination: a phenomenon where the model's output contradicts explicit source evidence because it is overridden by the model's generalized internal knowledge. This paper investigates this phenomenon by evaluating LLMs on the task of automated process modeling, where the goal is to generate a formal business process model from a given source artifact. The domain of Business Process Management (BPM) provides an ideal context for this study, as many core business processes follow standardized patterns, making it likely that LLMs possess strong pre-trained schemas for them. We conduct a controlled experiment designed to create scenarios with deliberate conflict between provided evidence and the LLM's background knowledge. We use inputs describing both standard and deliberately atypical process structures to measure the LLM's fidelity to the provided evidence. Our work provides a methodology for assessing this critical reliability issue and raises awareness of the need for rigorous validation of AI-generated artifacts in any evidence-based domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper flags a plausible risk that LLMs override explicit input with pre-trained process knowledge during modeling, but it provides no measurable criterion or results to confirm the effect.

The authors describe how LLMs can produce process models that contradict the supplied source text because they fall back on generalized internal knowledge about standard business processes. This is a reasonable concern for anyone using these models on document-driven analytical tasks. They propose testing with both standard and deliberately atypical process descriptions to force a conflict, which is a logical way to probe the issue in the BPM domain, where pre-trained patterns are likely strong. The framing of knowledge-driven hallucination as output overridden by background knowledge is clear enough on its own terms. They also note the broader need for validation of AI-generated artifacts in evidence-based work, which follows directly from the setup.

The soft spots are more central. The paper supplies no formal, automatable way to decide when an output contradicts the input evidence, such as a specific BPMN constraint violation or a graph-edit threshold. Without that, differences could come from prompt phrasing, decoding noise, or parsing gaps rather than the claimed knowledge override, and the stress-test note on this point stands. The abstract also gives no model details, dataset descriptions, quantitative results, or statistical checks, so the central claim stays unverified. This leaves the work at an early stage.

The paper is aimed at researchers studying LLM reliability in structured tasks such as process modeling, and at practitioners who need fidelity to source documents. A reader working on hallucination mitigation or BPM automation might pick up the experimental contrast idea, but it would need tighter execution to deliver real value. I would send it to peer review if the authors add the missing measurement details and actual data, since the topic is timely and the basic framing is workable.

Referee Report

2 major / 2 minor

Summary. The paper claims that LLMs exhibit 'knowledge-driven hallucination' when generating formal business process models from source descriptions: the models contradict explicit but atypical input evidence because the LLM's pre-trained knowledge of standard processes overrides the provided source. The authors propose a controlled experiment contrasting standard versus deliberately atypical process structures to measure fidelity to the input evidence and provide a methodology for assessing this reliability issue in evidence-based domains such as BPM.

Significance. If the central empirical claim can be substantiated with reproducible measurements, the work would usefully document a concrete failure mode for LLM use in analytical tasks that require strict adherence to supplied evidence rather than generalization from training data. It raises awareness of validation needs for AI-generated artifacts in structured domains and could inform prompt-engineering or post-processing safeguards.

major comments (2)

The central claim requires an observable, reproducible way to detect when a generated process model contradicts explicit source evidence. The experiment description contrasts standard and atypical inputs but supplies no formal, automatable criterion (e.g., violation of specific BPMN constraints, missing mandatory elements, or a graph-edit-distance threshold) for classifying outputs as contradicting the input; human judgment or post-hoc inspection is implied but not operationalized. This is load-bearing for the empirical measurement.
No quantitative results, model details (e.g., which LLMs, temperature settings, prompt templates), dataset descriptions, or statistical analysis appear in the abstract or experiment outline, preventing verification that observed differences arise from knowledge override rather than prompt sensitivity or decoding stochasticity.

minor comments (2)

Clarify the exact definition of 'atypical' structures and how they were constructed to ensure they remain valid process descriptions while conflicting with common schemas.
Add a dedicated section or appendix detailing the full experimental protocol, including input artifacts, output parsing method, and any inter-annotator agreement if human evaluation is used.
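The first minor comment asks how an atypical structure can be built so that it stays a valid description while conflicting with a common schema. A minimal sketch of one possible construction follows, assuming a purely sequential process represented as a list of activity names. The helpers `swap_activities` and `classify_output`, the activity names, and the three labels are hypothetical illustrations, not the authors' procedure.

```python
def swap_activities(sequence, first, second):
    """Return a copy of the sequence with two activities exchanged."""
    i, j = sequence.index(first), sequence.index(second)
    variant = list(sequence)
    variant[i], variant[j] = variant[j], variant[i]
    return variant


def classify_output(output, evidence, pair):
    """Label an output by whether it keeps the evidence's ordering of `pair`.

    'faithful'   : the pair appears in the same relative order as in the evidence
    'reverted'   : the pair appears in the opposite (here: standard) order
    'incomplete' : at least one activity of the pair is missing from the output
    """
    first, second = pair
    if first not in output or second not in output:
        return "incomplete"
    follows_evidence = (
        (output.index(first) < output.index(second))
        == (evidence.index(first) < evidence.index(second))
    )
    return "faithful" if follows_evidence else "reverted"


# Hypothetical order-handling process used only for illustration.
standard = ["Receive order", "Check credit", "Ship goods", "Send invoice"]
pair = ("Check credit", "Ship goods")
atypical = swap_activities(standard, *pair)

print(atypical)
print(classify_output(standard, atypical, pair))
```

Because the atypical variant differs from the standard one only in the swapped pair, an output labeled `reverted` follows the standard ordering rather than the supplied evidence. Real process models contain gateways and concurrency, so this list-based check is a simplification.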

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments, which help strengthen the clarity and rigor of our empirical study on knowledge-driven hallucination in LLMs for process modeling. We address each major comment point by point below.

Referee: The central claim requires an observable, reproducible way to detect when a generated process model contradicts explicit source evidence. The experiment description contrasts standard and atypical inputs but supplies no formal, automatable criterion (e.g., violation of specific BPMN constraints, missing mandatory elements, or a graph-edit-distance threshold) for classifying outputs as contradicting the input; human judgment or post-hoc inspection is implied but not operationalized. This is load-bearing for the empirical measurement.

Authors: We agree that an explicit, automatable criterion is necessary to substantiate the central claim and enable reproducibility. The manuscript describes the controlled contrast between standard and atypical process inputs to measure fidelity to source evidence, but does not yet formalize the detection rule. In the revised version, we will add a precise operational definition: a model exhibits knowledge-driven hallucination if its parsed structure deviates from the input-specified atypical elements (e.g., replacing a described parallel gateway with a standard sequential flow or inserting unmentioned standard activities). This will be implemented via automated graph parsing of the output BPMN and computation of a normalized graph-edit distance to an input-derived reference model, with a fixed threshold for classification. Pseudocode and validation examples will be included in the methodology section.
revision: yes
Referee: No quantitative results, model details (e.g., which LLMs, temperature settings, prompt templates), dataset descriptions, or statistical analysis appear in the abstract or experiment outline, preventing verification that observed differences arise from knowledge override rather than prompt sensitivity or decoding stochasticity.

Authors: The full manuscript reports these elements in the Experiments and Results sections, including the LLMs evaluated, temperature settings chosen for reduced stochasticity, prompt templates, dataset construction (standard vs. atypical descriptions), and statistical comparisons of fidelity metrics. However, we acknowledge that the abstract and high-level experiment outline do not preview them adequately. We will revise by expanding the experiment outline to list all models, parameters, dataset sizes, and analysis methods upfront, and will add a concise summary of key quantitative findings to the abstract within length constraints.
revision: partial
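The detection rule promised in the first response (a normalized graph-edit distance to a reference model, with a fixed threshold) could look roughly like the sketch below. It rests on loud assumptions: each model is reduced to uniquely labeled activities plus directed flows, only insertions and deletions are counted (no relabeling), and the names `ProcessGraph`, `make_graph`, `normalized_edit_distance`, `contradicts_evidence`, and the threshold of 0.2 are hypothetical. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessGraph:
    activities: frozenset  # uniquely labeled activity names
    flows: frozenset       # directed (source, target) pairs


def make_graph(flows):
    """Build a graph from flow pairs; activities are inferred from the flows."""
    flows = frozenset(flows)
    activities = frozenset(name for edge in flows for name in edge)
    return ProcessGraph(activities, flows)


def normalized_edit_distance(generated, reference):
    """Insert/delete edit count divided by the combined size of both graphs.

    With unique labels, the edits needed are exactly the elements present in
    one graph but not the other, so the result lies in [0, 1].
    """
    node_edits = len(generated.activities ^ reference.activities)
    edge_edits = len(generated.flows ^ reference.flows)
    total = (len(generated.activities) + len(generated.flows)
             + len(reference.activities) + len(reference.flows))
    return (node_edits + edge_edits) / total if total else 0.0


def contradicts_evidence(generated, reference, threshold=0.2):
    """Flag an output whose distance to the reference exceeds the threshold."""
    return normalized_edit_distance(generated, reference) > threshold


# Hypothetical atypical reference (goods shipped before the credit check)
# and a generated model that falls back to the conventional ordering.
reference = make_graph([("Receive order", "Ship goods"),
                        ("Ship goods", "Check credit"),
                        ("Check credit", "Close order")])
generated = make_graph([("Receive order", "Check credit"),
                        ("Check credit", "Ship goods"),
                        ("Ship goods", "Close order")])

distance = normalized_edit_distance(generated, reference)
print(round(distance, 3), contradicts_evidence(generated, reference))
```

A distance alone does not show that a deviation moves toward the standard schema, so a fuller check would also compare the output against a standard reference model. The threshold would need calibration on outputs with known labels.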

Circularity Check

0 steps flagged

Empirical measurement study with no derivation chain

The paper frames its contribution as a controlled empirical experiment that contrasts LLM outputs on standard versus deliberately atypical process descriptions to observe fidelity to input evidence. No equations, first-principles derivations, or quantitative predictions appear in the abstract or described methodology. The central claim rests on experimental observations rather than any reduction of results to fitted parameters, self-definitions, or self-citation chains. The work is therefore self-contained as an empirical measurement study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that LLMs have internalized strong schemas for standard business processes from pre-training data, which is invoked to justify the choice of BPM as the test domain.

axioms (1)

domain assumption LLMs possess strong pre-trained schemas for standard business processes.
Explicitly stated in the abstract as the reason the BPM domain creates likely conflicts between evidence and internal knowledge.

invented entities (1)

knowledge-driven hallucination (no independent evidence)
purpose: To name the specific failure mode in which model output contradicts source evidence due to override by generalized internal knowledge.
New term defined in the abstract to distinguish this behavior from other hallucination types.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

"We conduct a controlled experiment designed to create scenarios with deliberate conflict between provided evidence and the LLM's background knowledge. We use inputs describing both standard and deliberately atypical process structures to measure the LLM's fidelity to the provided evidence."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

"Our findings strongly support our central hypothesis: LLMs exhibit a significant tendency for knowledge-driven hallucination when faced with atypical process structures."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 1 internal anchor

[1] de A. R. Gonçalves, J.C., Santoro, F.M., Baião, F.A.: Let me tell you a story - on how to build process models. J. Univers. Comput. Sci. 17(2), 276–295 (2011)
[2] van der Aa, H., Carmona, J., Leopold, H., Mendling, J., Padró, L.: Challenges and opportunities of applying natural language processing in business process management. In: COLING 2018. pp. 2791–2801 (2018)
[3] Alberto Blanco-Justicia et al.: Digital forgetting in large language models: a survey of unlearning methods. Artif. Intell. Rev. 58(3), 90 (2025)
[4] Berti, A., van Zelst, S.J., Schuster, D.: PM4Py: A process mining library for Python. Softw. Impacts 17, 100556 (2023)
[5] Brown, T.B., Mann, B., Ryder, N., et al.: Language models are few-shot learners. In: NeurIPS 2020 (2020)
[6] Busch, K., Leopold, H.: Towards a benchmark for large language models for business process management tasks. CoRR abs/2410.03255 (2024)
[7] Chen Ling et al.: Beyond one-model-fits-all: A survey of domain specialization for large language models. CoRR abs/2305.18703 (2023)
[8] Chen Qian et al.: An approach for process model extraction by multi-grained text classification. In: CAiSE 2020. LNCS, vol. 12127, pp. 268–282. Springer (2020)
[9] Chunting Zhou et al.: LIMA: less is more for alignment. In: NeurIPS 2023 (2023)
[10] Dumas, M., Rosa, M.L., Mendling, J., Reijers, H.A.: Fundamentals of Business Process Management, Second Edition. Springer (2018)
[11] Forster, S., Pinggera, J., Weber, B.: Toward an understanding of the collaborative process of process modeling. In: CAiSE’13 Forum. pp. 98–105 (2013)
[12] Friedrich, F., Mendling, J., Puhlmann, F.: Process model generation from natural language text. In: CAiSE 2011. pp. 482–496 (2011)
[13] Grohs, M., Abb, L., Elsayed, N., Rehse, J.: Large language models can accomplish business process management tasks. In: BPM 2023 Workshops. LNBIP, vol. 492, pp. 453–465. Springer (2023)
[14] Hosking, T., Blunsom, P., Bartolo, M.: Human feedback is not gold standard. In: ICLR 2024. OpenReview.net (2024)
[15] Klievtsova, N., Benzin, J., Kampik, T., Mangler, J., Rinderle-Ma, S.: Conversational process modelling: State of the art, applications, and implications in practice. In: BPM 2023 Forum. pp. 319–336 (2023)
[16] Kourani, H., Berti, A., Schuster, D., van der Aalst, W.M.P.: Evaluating large language models on business process modeling: Framework, benchmark, and self-improvement analysis. CoRR abs/2412.00023 (2024)
[17] Kourani, H., Berti, A., Schuster, D., van der Aalst, W.M.P.: Process modeling with large language models. In: Enterprise, Business-Process and Information Systems Modeling - BPMDS 2024 and EMMSAD 2024, Limassol, Cyprus, June 3-4, 2024, Proceedings. pp. 229–244 (2024)
[18] Kourani, H., Berti, A., Schuster, D., van der Aalst, W.M.P.: ProMoAI: Process modeling with generative AI. In: IJCAI 2024, Jeju, South Korea, August 3-9, 2024. pp. 8708–8712 (2024)
[19] Kourani, H., van Zelst, S.J.: POWL: partially ordered workflow language. In: BPM
[20] Lei Huang et al.: A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. 43(2), 42:1–42:55 (2025)
[21] Leo Gao et al.: The Pile: An 800GB dataset of diverse text for language modeling. CoRR abs/2101.00027 (2021)
[22] Longze Chen et al.: Long context is not long at all: A prospector of long-dependency data for large language models. In: ACL 2024. pp. 8222–8234. Association for Computational Linguistics (2024)
[23] Lukas Berglund et al.: The reversal curse: LLMs trained on "a is b" fail to learn "b is a". In: ICLR 2024. OpenReview.net (2024)
[24] Mrinank Sharma et al.: Towards understanding sycophancy in language models. In: ICLR 2024. OpenReview.net (2024)
[25] Roberts, M., Anderson, J., Delgado, W., Johnson, R., Spencer, L.: Extending contextual length and world knowledge generalization in large language models (2024)
[26] Sholiq, S., Sarno, R., Astuti, E.S.: Generating BPMN diagram from textual requirements. J. King Saud Univ. Comput. Inf. Sci. 34(10 Part B), 10079–10093 (2022)
[27] Sintoris, K., Vergidis, K.: Extracting business process models using natural language processing (NLP) techniques. In: CBI 2017. pp. 135–139 (2017)
[28] Woensel, W.V., Motie, S.: NLP4PBM: a systematic review on process extraction using natural language processing with rule-based, machine and deep learning methods. Enterp. Inf. Syst. 18(11) (2024)
[29] Yufei Wang et al.: Aligning large language models with human: A survey. CoRR abs/2307.12966 (2023)
[30] Ziwei Ji et al.: Survey of hallucination in natural language generation. ACM Comput. Surv. 55(12), 248:1–248:38 (2023)
