pith. sign in

arxiv: 2502.12187 · v3 · pith:S2HRXDEJnew · submitted 2025-02-15 · 💻 cs.CL · cs.FL· cs.LG· math.ST· stat.ML· stat.TH

Hallucinations are inevitable but can be made statistically negligible

Pith reviewed 2026-05-23 02:39 UTC · model grok-4.3

classification 💻 cs.CL cs.FLcs.LGmath.STstat.MLstat.TH
keywords hallucinationslanguage modelscomputability theoryprobability theorytraining datastatistical negligibilityinevitability
0
0 comments X

The pith

Hallucinations in language models cannot be eliminated but their probability on typical inputs can be driven arbitrarily low with enough high-quality training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that computability theory requires any language model to produce hallucinations on an infinite set of inputs no matter how much training occurs. At the same time it proves that under a probability distribution over inputs the fraction of cases where hallucinations occur can be made as small as desired simply by increasing the quality and quantity of training data. These two statements are compatible because the inevitable failures can be assigned vanishingly small probability. The authors conclude that the probabilistic guarantee aligns better with how models are used and measured in practice than the worst-case inevitability result.

Core claim

We prove that hallucinations can be made statistically negligible, provided that the quality and quantity of the training data are sufficient. This positive theoretical result from a probabilistic perspective coexists with the computability-theoretic result showing hallucinations on an infinite set of inputs. By evaluating the two results through the lens of information theory, the probability-theoretic positive result better reflects practical considerations than the computability-theoretic negative result.

What carries the argument

A probability distribution over inputs under which the measure of inputs that cause hallucinations can be driven arbitrarily close to zero by improvements in training data quality and quantity.

If this is right

  • Hallucination rates observed on representative test sets can be reduced without bound by scaling high-quality data.
  • The existence of an infinite set of failure cases does not prevent overall statistical reliability in deployed systems.
  • Efforts to mitigate hallucinations should prioritize data curation over changes in model architecture or inference algorithms.
  • Practical evaluation of language models should use probability-weighted measures rather than worst-case input sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Evaluation benchmarks could be redesigned around sampled distributions that reflect expected usage rather than adversarial inputs.
  • The same separation between inevitable failures and negligible probability may apply to other systematic errors in generative models.
  • Data-collection priorities could shift toward covering high-probability regions where reductions have the largest effect.

Load-bearing premise

There exists a probability distribution over inputs such that the set of inputs producing hallucinations can be assigned arbitrarily low probability through better training data, without the diagonal argument forcing that probability to stay high.

What would settle it

An experiment or proof showing that for any chosen probability distribution over inputs, there remains a positive lower bound on hallucination probability that no amount of added training data can reduce below.

read the original abstract

Hallucinations, a phenomenon where a language model (LM) generates nonfactual content, pose a significant challenge to the practical deployment of LMs. While many empirical methods have been proposed to mitigate hallucinations, recent studies established a computability-theoretic result showing that any LM will inevitably generate hallucinations on an infinite set of inputs, regardless of the quality and quantity of training datasets and the choice of the language model architecture and training and inference algorithms. Although the computability-theoretic result may seem pessimistic, its significance in practical viewpoints has remained unclear. This paper claims that those "innate" inevitability results from computability theory and diagonal argument, in principle, cannot explain practical issues of LLMs. We demonstrate this claim by presenting a positive theoretical result from a probabilistic perspective. Specifically, we prove that hallucinations can be made statistically negligible, provided that the quality and quantity of the training data are sufficient. Interestingly, our positive result coexists with the computability-theoretic result, implying that while hallucinations on an infinite set of inputs cannot be entirely eliminated, their probability can always be reduced by improving algorithms and training data. By evaluating the two seemingly contradictory results through the lens of information theory, we argue that our probability-theoretic positive result better reflects practical considerations than the computability-theoretic negative result.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript claims that while computability-theoretic results (via diagonal arguments) establish that hallucinations occur on an infinite set of inputs for any language model regardless of training data or architecture, a probabilistic analysis shows that the measure of the hallucination set can be driven arbitrarily close to zero under a suitable input distribution when training data quality and quantity are sufficient. The positive probabilistic result is argued to coexist with the negative computability result without contradiction, and an information-theoretic lens is used to claim that the probabilistic result better reflects practical considerations.

Significance. If the probabilistic derivation holds with explicit assumptions and a concrete measure, the result would provide a theoretically grounded account of why empirical mitigation can succeed in practice despite theoretical inevitability, by showing that countably infinite bad sets need not carry positive probability under realistic input distributions. This distinction between inevitability on infinite sets and statistical negligibility is a potentially useful clarification for the field.

major comments (2)
  1. [Abstract] Abstract: the claim that 'we prove that hallucinations can be made statistically negligible' is asserted without any derivation steps, explicit probability space, measure, or handling of how the result coexists with the computability result while remaining non-vacuous; this leaves the central positive claim without verifiable support from the provided text.
  2. [Abstract (paragraph on positive theoretical result)] The weakest assumption identified (existence of a suitable probability distribution over inputs under which the hallucination set has arbitrarily small measure) is load-bearing for the coexistence claim but is not formalized or shown to be compatible with the diagonal argument in any explicit construction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and the need for explicit formalization of the probabilistic result. We agree that the abstract is too concise and will revise it and the main text to include a high-level outline of the probability space, measure, and explicit construction showing compatibility with the diagonal argument. These changes will make the central claim verifiable without altering the core results.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that 'we prove that hallucinations can be made statistically negligible' is asserted without any derivation steps, explicit probability space, measure, or handling of how the result coexists with the computability result while remaining non-vacuous; this leaves the central positive claim without verifiable support from the provided text.

    Authors: We acknowledge that the abstract does not include derivation steps or an explicit probability space. The full derivation appears in Section 3, where we define the input space as the set of all finite sequences over the vocabulary, equip it with the product measure induced by a token-level distribution that assigns higher probability to factual completions as training data quality and quantity increase, and prove that the measure of the hallucination set (the countable set identified by diagonalization) can be driven below any epsilon > 0. Coexistence is addressed by observing that the computability result only requires the set to be infinite (non-empty support), while the measure can still be made arbitrarily small. We will revise the abstract to summarize this argument in one additional sentence for verifiability. revision: yes

  2. Referee: [Abstract (paragraph on positive theoretical result)] The weakest assumption identified (existence of a suitable probability distribution over inputs under which the hallucination set has arbitrarily small measure) is load-bearing for the coexistence claim but is not formalized or shown to be compatible with the diagonal argument in any explicit construction.

    Authors: We agree an explicit construction strengthens the presentation. In the revision we will add a formal definition of the input distribution (a mixture of a geometric distribution over lengths and a token distribution whose entropy decreases with training data quality) and an explicit construction showing that any countable diagonal set can be assigned measure less than epsilon while remaining infinite. This is compatible because diagonalization produces a countable set, and countable sets can have measure zero under non-uniform distributions without contradicting their existence. The revised Section 3 will include this construction. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents an independent probabilistic argument that the measure of the hallucination set (known to be countably infinite via diagonalization) can be driven to zero or arbitrarily small under suitable input distributions when training data improves. This is a direct application of measure theory on countable sets and does not reduce by construction to any fitted parameter, self-citation chain, or definitional equivalence with the computability result. The two results are explicitly shown to coexist without contradiction, and the positive claim rests on standard probabilistic reasoning rather than any load-bearing self-reference or renaming of empirical patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract; no free parameters, invented entities, or non-standard axioms are mentioned. The proof is described as relying on standard probabilistic and information-theoretic reasoning.

axioms (1)
  • standard math Standard axioms of probability theory
    Invoked for defining and bounding hallucination probabilities under a data distribution.

pith-pipeline@v0.9.0 · 5771 in / 1340 out tokens · 49810 ms · 2026-05-23T02:39:38.525847+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Overfitting has a limitation: a model-independent generalization gap bound based on R\'enyi entropy

    stat.ML 2025-05 unverdicted novelty 6.0

    A model-independent upper bound on generalization gap is established that depends solely on the Rényi entropy of the data-generating distribution for histogram-determined algorithms such as ERM.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · cited by 1 Pith paper · 17 internal anchors

  1. [1]

    ELIZA—a computer program for the study of natural language communication between man and machine,

    J. Weizenbaum, “ELIZA—a computer program for the study of natural language communication between man and machine,” Communications of the ACM, vol. 9, no. 1, pp. 36–45, 1966

  2. [2]

    Artificial paranoia,

    K. M. Colby, S. Weber, and F. D. Hilf, “Artificial paranoia,” Artificial intelligence, vol. 2, no. 1, pp. 1– 25, 1971

  3. [3]

    R. S. Wallace, The anatomy of ALICE. Springer, 2009

  4. [4]

    A cache-based natural language model for speech recognition,

    R. Kuhn and R. De Mori, “A cache-based natural language model for speech recognition,” IEEE transac- tions on pattern analysis and machine intelligence, vol. 12, no. 6, pp. 570–583, 1990

  5. [5]

    A linguistically motivated probabilistic model of information retrieval,

    D. Hiemstra, “A linguistically motivated probabilistic model of information retrieval,” in Research and Advanced Technology for Digital Libraries: Second European Conference, ECDL’98 Heraklion, Crete, Greece September 21–23, 1998 Proceedings 2, 1998, pp. 569–584

  6. [6]

    An empirical study of smoothing techniques for language modeling,

    S. F. Chen and J. Goodman, “An empirical study of smoothing techniques for language modeling,” Computer Speech & Language, vol. 13, no. 4, pp. 359–394, 1999

  7. [7]

    D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propaga- tion, parallel distributed processing, explorations in the microstructure of cognition, ed. de rumelhart and j. mcclelland. vol. 1. 1986,” Biometrika, vol. 71, no. 599–607, p. 6, 1986

  8. [8]

    Finding structure in time,

    J. L. Elman, “Finding structure in time,” Cognitive science, vol. 14, no. 2, pp. 179–211, 1990

  9. [9]

    Fast Text Compression with Neural Networks.,

    M. V. Mahoney, “Fast Text Compression with Neural Networks.,” in FLAIRS, 2000, pp. 230–234

  10. [10]

    A neural probabilistic language model,

    Y. Bengio, R. Ducharme, and P. Vincent, “A neural probabilistic language model,” Advances in neural information processing systems, vol. 13, 2000

  11. [11]

    LONG SHORT-TERM MEMORY,

    S. Hochreiter, J. urgen Schmidhuber, and C. Elvezia, “LONG SHORT-TERM MEMORY,” Neural Com- putation, vol. 9, no. 8, pp. 1735–1780, 1997

  12. [12]

    Learning to forget: Continual prediction with LSTM,

    F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” Neural computation, vol. 12, no. 10, pp. 2451–2471, 2000

  13. [13]

    Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation,

    K. Cho et al., “Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Pro- cessing (EMNLP), 2014, pp. 1724–1734

  14. [14]

    Neural machine translation by jointly learning to align and translate,

    D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in the 3rd International Conference on Learning Representations, 2015

  15. [15]

    Attention is all you need,

    A. Vaswani, “Attention is all you need,” Advances in Neural Information Processing Systems, 2017

  16. [16]

    Bert: Pre-training of deep bidirectional transformers for language understanding,

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2019

  17. [17]

    Training language models to follow instructions with human feedback,

    L. Ouyang et al., “Training language models to follow instructions with human feedback,” Advances in neural information processing systems, vol. 35, pp. 27730–27744, 2022

  18. [18]

    Language models are unsupervised multitask learners,

    A. Radford et al., “Language models are unsupervised multitask learners,” OpenAI blog, vol. 1, no. 8, p. 9, 2019

  19. [19]

    Language models are few-shot learners,

    T. Brown et al., “Language models are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020

  20. [20]

    GPT-4 Technical Report

    J. Achiam et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023

  21. [21]

    Gemini: A Family of Highly Capable Multimodal Models

    Gemini Team et al. , “Gemini: a family of highly capable multimodal models,” arXiv preprint arXiv:2312.11805, 2023. 10

  22. [22]

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Gemini Team et al., “Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context,” arXiv preprint arXiv:2403.05530, 2024

  23. [23]

    LLaMA: Open and Efficient Foundation Language Models

    H. Touvron et al. , “Llama: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023

  24. [24]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    H. Touvron et al. , “Llama 2: Open foundation and fine-tuned chat models,” arXiv preprint arXiv:2307.09288, 2023

  25. [25]

    The Llama 3 Herd of Models

    A. Dubey et al., “The llama 3 herd of models,” arXiv preprint arXiv:2407.21783, 2024

  26. [26]

    The Claude 3 Model Family: Opus, Sonnet, Haiku

    Anthropic, “The Claude 3 Model Family: Opus, Sonnet, Haiku.” [Online]. Available: https://www-cdn. anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf

  27. [27]

    Qwen Technical Report

    J. Bai et al., “Qwen technical report,” arXiv preprint arXiv:2309.16609, 2023

  28. [28]

    Qwen2 Technical Report

    A. Yang et al., “Qwen2 Technical Report,” arXiv preprint arXiv:2407.10671, 2024

  29. [29]

    Qwen2.5 Technical Report

    A. Yang et al., “Qwen2.5 Technical Report,” arXiv preprint arXiv:2412.15115, 2024

  30. [30]

    DeepSeek-V3 Technical Report

    A. Liu et al., “Deepseek-v3 technical report,” arXiv preprint arXiv:2412.19437, 2024

  31. [31]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    D. Guo et al., “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning,” arXiv preprint arXiv:2501.12948, 2025

  32. [32]

    Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

    Y. Wu et al., “Google's neural machine translation system: Bridging the gap between human and machine translation,” arXiv preprint arXiv:1609.08144, 2016

  33. [33]

    The new Bing: Our approach to Responsible AI

    Microsoft, “The new Bing: Our approach to Responsible AI.” [Online]. Available: https://blogs.microsoft. com/wp-content/uploads/prod/sites/5/2023/04/RAI-for-the-new-Bing-April-2023.pdf

  34. [34]

    GPT4Rec: A generative framework for personalized recommendation and user interests interpretation,

    J. Li, W. Zhang, T. Wang, G. Xiong, A. Lu, and G. Medioni, “GPT4Rec: A generative framework for personalized recommendation and user interests interpretation,” arXiv preprint arXiv:2304.03879, 2023

  35. [35]

    Chat-rec: Towards interactive and explainable llms-augmented recommender system,

    Y. Gao, T. Sheng, Y. Xiang, Y. Xiong, H. Wang, and J. Zhang, “Chat-rec: Towards interactive and explainable llms-augmented recommender system,” arXiv preprint arXiv:2303.14524, 2023

  36. [36]

    A Survey of Large Language Models

    W. X. Zhao et al., “A survey of large language models,” arXiv preprint arXiv:2303.18223, 2023

  37. [37]

    Large Language Models: A Survey

    S. Minaee et al., “Large language models: A survey,” arXiv preprint arXiv:2402.06196, 2024

  38. [38]

    A complete survey on llm-based ai chatbots,

    S. K. Dam, C. S. Hong, Y. Qiao, and C. Zhang, “A complete survey on llm-based ai chatbots,” arXiv preprint arXiv:2406.16937, 2024

  39. [39]

    A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

    L. Huang et al., “A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions,” arXiv preprint arXiv:2311.05232, 2023

  40. [40]

    Survey of hallucination in natural language generation,

    Z. Ji et al., “Survey of hallucination in natural language generation,” ACM Computing Surveys, vol. 55, no. 12, pp. 1–38, 2023

  41. [41]

    Retrieval augmentation reduces hallucination in conversation,

    K. Shuster, S. Poff, M. Chen, D. Kiela, and J. Weston, “Retrieval augmentation reduces hallucination in conversation,” arXiv preprint arXiv:2104.07567, 2021

  42. [42]

    Verify-and-edit: A knowledge-enhanced chain-of-thought framework,

    R. Zhao, X. Li, S. Joty, C. Qin, and L. Bing, “Verify-and-edit: A knowledge-enhanced chain-of-thought framework,” arXiv preprint arXiv:2305.03268, 2023

  43. [43]

    Chain-of-thought prompting elicits reasoning in large language models,

    J. Wei et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in neural information processing systems, vol. 35, pp. 24824–24837, 2022

  44. [44]

    Chain-of-Verification Reduces Hallucination in Large Language Models

    S. Dhuliawala et al. , “Chain-of-verification reduces hallucination in large language models,” arXiv preprint arXiv:2309.11495, 2023

  45. [45]

    Hallucination is Inevitable: An Innate Limitation of Large Language Models

    Z. Xu, S. Jain, and M. Kankanhalli, “Hallucination is inevitable: An innate limitation of large language models,” arXiv preprint arXiv:2401.11817, 2024

  46. [46]

    Llms will always hallucinate, and we need to live with this,

    S. Banerjee, A. Agarwal, and S. Singla, “Llms will always hallucinate, and we need to live with this,” arXiv preprint arXiv:2409.05746, 2024

  47. [47]

    Hallucination (artificial intelligence) — Wikipedia, The Free Encyclopedia

    Wikipedia, “Hallucination (artificial intelligence) — Wikipedia, The Free Encyclopedia.” 2025

  48. [48]

    AI hallucinations can't be stopped—but these techniques can limit their damage,

    N. Jones, “AI hallucinations can't be stopped—but these techniques can limit their damage,” Nature, vol. 637, no. 8047, pp. 778–780, 2025. 11

  49. [49]

    Calibrated language models must hallucinate,

    A. T. Kalai and S. S. Vempala, “Calibrated language models must hallucinate,” in Proceedings of the 56th Annual ACM Symposium on Theory of Computing, 2024, pp. 160–171

  50. [50]

    Are Transformers universal approximators of sequence-to-sequence functions?,

    C. Yun, S. Bhojanapalli, A. S. Rawat, S. Reddi, and S. Kumar, “Are Transformers universal approximators of sequence-to-sequence functions?,” in the 8th International Conference on Learning Representations , 2020

  51. [51]

    Big bird: Transformers for longer sequences,

    M. Zaheer et al., “Big bird: Transformers for longer sequences,” Advances in neural information process- ing systems, vol. 33, pp. 17283–17297, 2020

  52. [52]

    Inductive biases and variable creation in self-attention mechanisms,

    B. L. Edelman, S. Goel, S. Kakade, and C. Zhang, “Inductive biases and variable creation in self-attention mechanisms,” in International Conference on Machine Learning, 2022, pp. 5793–5831

  53. [53]

    On the Rate of Convergence of a Classifier Based on a Transformer Encoder,

    I. Gurevych, M. Kohler, and G. G. Şahin, “On the Rate of Convergence of a Classifier Based on a Transformer Encoder,” IEEE Transactions on Information Theory, vol. 68, no. 12, pp. 8139–8155, 2022, doi: 10.1109/TIT.2022.3191747

  54. [54]

    Approximation and estimation ability of transformers for sequence-to- sequence functions with infinite dimensional input,

    S. Takakura and T. Suzuki, “Approximation and estimation ability of transformers for sequence-to- sequence functions with infinite dimensional input,” in International Conference on Machine Learning, 2023, pp. 33416–33447

  55. [55]

    Transformers are minimax optimal nonparametric in-context learners,

    J. Kim, T. Nakamaki, and T. Suzuki, “Transformers are minimax optimal nonparametric in-context learners,” arXiv preprint arXiv:2408.12186, 2024

  56. [56]

    On learnability wih com- putable learners,

    S. Agarwal, N. Ananthakrishnan, S. Ben-David, T. Lechner, and R. Urner, “On learnability wih com- putable learners,” in Algorithmic Learning Theory, 2020, pp. 48–60

  57. [57]

    Shalev-Shwartz and S

    S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms . Cambridge university press, 2014

  58. [58]

    Sipser, Introduction to the Theory of Computation

    M. Sipser, Introduction to the Theory of Computation. Cengage Learning, 2012

  59. [59]

    Language identification in the limit,

    E. M. Gold, “Language identification in the limit,” Information and control , vol. 10, no. 5, pp. 447– 474, 1967

  60. [60]

    D. J. MacKay, Information theory, inference and learning algorithms. Cambridge university press, 2003

  61. [61]

    Kunen, Set theory an introduction to independence proofs

    K. Kunen, Set theory an introduction to independence proofs. Elsevier, 2014

  62. [62]

    counterintuitive results,

    H. Herrlich, Axiom of choice, vol. 1876. Springer, 2006. 12 RoteMemorizer(((𝑠1,𝑦1),(𝑠2,𝑦2),…,(𝑠𝑚,𝑦𝑚)) ∈ (Σ∗,Σ∗)∗): 1 𝑚← len((𝑠1,𝑦1),(𝑠2,𝑦2),…,(𝑠𝑚,𝑦𝑚)) 2 Initialize an empty dictionary 𝑑 3 for 𝑖 ← 1,2,…,𝑚: 4 𝑑[𝑠𝑖] ← 𝑦𝑖 5 def TrivialRecaller (𝑠 ∈ Σ∗): 6 if 𝑑[𝑠] is defined: 7 return 𝑑[𝑠] ∈ Σ∗ 8 else: 9 return “” ∈ Σ∗ 10 return TrivialRecaller Algorithm 1: Ro...

  63. [63]

    We can prove Pr𝑇(HPUni(𝒳),𝑓0 (𝔄(𝑇)) ≥ 𝜆H) ≥ 𝜆T by applying Lemma 39 with 𝑐 = 1 and 𝑎 = 𝜆H to 𝔼𝑇 HPUni(𝒳),𝑓0 (𝔄(𝑇)) ≥ 1

  64. [64]

    In Corollary 35, for example, if 𝜆H = 1 4, then 𝜆T = 1 3

    □ Remark 36. In Corollary 35, for example, if 𝜆H = 1 4, then 𝜆T = 1 3 . Remark 37. (Regarding the statement of Theorem 33) Theorem 33 is a generalized version of the no free lunch theorem given as [57, Theorem 5.1.] as Theorem 33 provides a tighter bound when |𝒴| ≥ 3. However, the proof technique is essentially the same as that of Theorem 5.1. in [57]. We...

  65. [65]

    |𝒳| = ⌊ |Σ|𝑛+1−1 (|Σ|−1)CDF(𝑛)⌋, and

  66. [66]

    For any 𝑛∈ ℤ≥0, Pr(len(𝑋) ≤ 𝑛) = |𝒳≤𝑛| |𝒳| ≥ CDF(𝑛), where 𝒳≤𝑛 ≔ 𝒳0 ∪ 𝒳1 ∪ … ∪ 𝒳𝑛, which are the consequences of Lemma 32

    Let 𝑋 be a random variable generated by Uni(𝒳). For any 𝑛∈ ℤ≥0, Pr(len(𝑋) ≤ 𝑛) = |𝒳≤𝑛| |𝒳| ≥ CDF(𝑛), where 𝒳≤𝑛 ≔ 𝒳0 ∪ 𝒳1 ∪ … ∪ 𝒳𝑛, which are the consequences of Lemma 32 . Hence, we can complete the proof by confirming the above two properties. Here, the first property is trivial. Noting that |𝒳| ≤ |Σ|𝑛+1−1 (|Σ|−1)CDF(𝑛), the second property can be confir...

  67. [67]

    It is because all the maps in ℱ𝒚 return the same value 𝑦𝑗 for the input 𝜉𝑗 for each of 𝑗 ∈ [𝑛] \ {𝑟∗} but they return distinct values in 𝒴 for the input 𝜉𝑟∗

    The size (cardinality) is given by |𝒬𝒚| = |ℱ𝒚| = |𝒴| = 𝑝. It is because all the maps in ℱ𝒚 return the same value 𝑦𝑗 for the input 𝜉𝑗 for each of 𝑗 ∈ [𝑛] \ {𝑟∗} but they return distinct values in 𝒴 for the input 𝜉𝑟∗

  68. [68]

    For any 𝑞,𝑞′ ∈ 𝒬𝒚, the maps ℎ𝑞 and ℎ𝑞′ are identical, i.e., 𝔄( (𝒙𝑑∗,𝑓𝑞(𝒙𝑑∗)) ⊤ ) = 𝔄( (𝒙𝑑∗,𝑓𝑞′(𝒙𝑑∗)) ⊤ ) . It is because the equation 𝑓𝑞(𝒙𝑑∗) = 𝑓𝑞′(𝒙𝑑∗) holds since 𝜉𝑟∗ ∉ {𝑥1,𝑥2,…,𝑥𝑚} as 𝜉𝑟∗ = 𝜉𝑟∗ is taken from 𝒳\ {𝑥1,𝑥2,…,𝑥𝑚} by definition and 𝑓𝑞(𝜉) = 𝑓𝑞′(𝜉) is satisfied for all 𝜉 ∈ 𝒳\ {𝑟∗} by the definition of ℱ𝒚. We can uniquely define ℎ𝒚 by ℎ𝒚 ∈ {ℎ𝑞 |...