pith. machine review for the scientific record.

arxiv: 2605.11672 · v1 · submitted 2026-05-12 · 💻 cs.AI · cs.DB

Recognition: 1 theorem link

  • Lean Theorem

A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 01:09 UTC · model grok-4.3

classification: 💻 cs.AI · cs.DB
keywords: large language models · trilemma · semantic underdetermination · bias · correctness · utility · CAP theorem

The pith

Under semantic underdetermination, large language models cannot always guarantee strong correctness, strict non-bias, and high utility at the same time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper draws an analogy to the CAP theorem to argue that LLMs encounter a fundamental trade-off in handling prompts where the given information does not point to a single answer. To provide a useful and direct response, the model often needs to apply its own selection criteria or preferences, which the paper views as introducing bias if not explicitly supported by the input. Sticking strictly to correctness and non-bias may lead the model to hedge, clarify, or refuse, which reduces the practical value of the output. The trilemma suggests that some apparent shortcomings in LLM responses are inevitable consequences of underdetermined decision-making rather than fixable errors.

Core claim

The central claim is that, under semantic underdetermination, an LLM cannot always simultaneously guarantee strong correctness, strict non-bias, and high utility. A prompt is semantically underdetermined when the premises do not determine a unique answer. In such cases, a useful response requires the model to introduce a selection criterion, preference, prior, or value ordering. If this criterion is not supplied by the user or justified by the premises, the response is biased. Avoiding this preserves correctness and non-bias but lowers utility through refusal or hedging.
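The abstract gives no formal model, but the claimed incompatibility can be rendered as a toy decision problem. The sketch below is editorial: the encodings and the definitions of correctness, non-bias, and utility are our illustrative readings of the abstract, not the paper's own formalization.

```python
# Toy model of the correctness-non-bias-utility trilemma.
# A prompt is represented by the set of answers its premises permit;
# it is semantically underdetermined when that set has more than one element.

def responses(valid_answers):
    """Every response the model could give: assert one answer, or hedge."""
    return [("answer", a) for a in valid_answers] + [("hedge", None)]

def correct(resp, valid):
    """Strong correctness: never assert an answer the premises rule out."""
    kind, a = resp
    return kind == "hedge" or a in valid

def unbiased(resp, valid):
    """Strict non-bias: an asserted answer must be forced by the premises,
    so the model introduces no selection criterion of its own."""
    kind, _ = resp
    return kind == "hedge" or len(valid) == 1

def high_utility(resp):
    """High utility: a decisive answer rather than a hedge or refusal."""
    return resp[0] == "answer"

def trilemma_holds(valid):
    """Under underdetermination, no response satisfies all three properties."""
    return all(
        not (correct(r, valid) and unbiased(r, valid) and high_utility(r))
        for r in responses(valid)
    )
```

With a unique valid answer the conflict disappears, matching the paper's restriction of the trilemma to underdetermined prompts. Note that under these definitions the conjunction fails purely by construction, which is exactly the definitional-circularity concern raised in the editorial analysis below.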

What carries the argument

The correctness-non-bias-utility trilemma under semantic underdetermination, which arises because decisive responses require the model to introduce unsupported selection criteria that count as bias.

If this is right

  • Some LLM failures arise from the structure of underdetermined decision requests rather than model limitations alone.
  • Models that prioritize utility will necessarily introduce preferences not present in the premises.
  • Strict adherence to non-bias and correctness leads to reduced decisiveness and utility in ambiguous scenarios.
  • The trilemma applies specifically when premises allow multiple valid answers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Designers might need to explicitly surface the selection criteria used by models in responses to ambiguous queries.
  • This structural limit could explain why alignment techniques struggle to eliminate all forms of bias without impacting usefulness.
  • Future systems could declare the degree of underdetermination in a query to inform users about potential trade-offs.

Load-bearing premise

Any selection criterion or preference introduced by the model but not supplied by the user or premises necessarily constitutes bias in a selection-theoretic sense.

What would settle it

Demonstration of an LLM that provides decisive, fully correct responses to semantically underdetermined prompts while maintaining strict non-bias without any hedging or refusal.

Original abstract

The CAP theorem states that a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance under network partition. Inspired by this result, this paper formulates a CAP-like conjecture for Large Language Models (LLMs). The proposed trilemma states that, under semantic underdetermination, an LLM cannot always simultaneously guarantee strong correctness, strict non-bias, and high utility. A prompt is semantically underdetermined when the given premises do not determine a unique answer. In such cases, a useful and decisive response requires the model to introduce a selection criterion, preference, prior, or value ordering. If this criterion is not supplied by the user or justified by the available premises, the response becomes biased in a broad selection-theoretic sense. Conversely, if the model avoids unsupported preferences, it may preserve correctness and non-bias but may reduce utility through refusal, hedging, or clarification. The paper formalizes this correctness-non-bias-utility trilemma, develops examples, and argues that certain LLM failures arise not merely from model limitations but from the structure of underdetermined decision requests.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a CAP-like trilemma for LLMs: under semantic underdetermination (premises not determining a unique answer), an LLM cannot always simultaneously guarantee strong correctness, strict non-bias, and high utility. It defines strict non-bias as the model introducing no selection criterion, preference, or prior not supplied by the user or premises, argues that decisive useful responses require exactly such introductions (hence bias), and claims that avoiding them reduces utility via hedging or refusal. The manuscript develops the idea via examples and attributes certain LLM failures to this structural issue.

Significance. If the trilemma can be established beyond definitional framing, the analogy to CAP could usefully frame inherent trade-offs in LLM response generation for ambiguous queries and guide design choices around neutrality versus decisiveness. The paper correctly identifies that some failures may be structural rather than due to model scale or data alone. However, the current lack of formal derivation, proof, or empirical test substantially limits its significance to the field.

major comments (2)
  1. [Abstract] The central trilemma is advanced as a conjecture supported only by illustrative examples and definitional arguments, with no formal derivation, proof, or empirical test described. The incompatibility follows directly from defining strict non-bias as absence of any model-introduced selection (which utility in underdetermined cases requires), rather than from an independent system model or reduction as in the CAP theorem.
  2. [The formalization of the trilemma] The assumption that any model-introduced selection criterion not supplied by the user or premises necessarily constitutes bias in a selection-theoretic sense is load-bearing but not independently justified or shown to be the only viable definition of non-bias. Without separating pragmatic selection from bias via additional axioms or a concrete model, the claim reduces to a restatement of terminology.
minor comments (1)
  1. [Abstract] The manuscript would benefit from explicit discussion of how the proposed trilemma relates to or differs from existing work in AI alignment, decision theory under uncertainty, or fairness literature.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We appreciate the acknowledgment that the proposed trilemma identifies potentially structural issues in LLM behavior. We address the major comments point by point below, clarifying the conjectural nature of the work while committing to revisions that strengthen the formal presentation without overstating its status as a theorem.

Point-by-point responses
  1. Referee: [Abstract] The central trilemma is advanced as a conjecture supported only by illustrative examples and definitional arguments, with no formal derivation, proof, or empirical test described. The incompatibility follows directly from defining strict non-bias as absence of any model-introduced selection (which utility in underdetermined cases requires), rather than from an independent system model or reduction as in the CAP theorem.

    Authors: We agree that the trilemma is presented as a conjecture rather than a theorem with a mathematical proof, and that the core incompatibility is derived from the interplay of the stated definitions. Unlike the CAP theorem, which operates in a fully axiomatized model of distributed systems, our work is conceptual and draws an analogy to highlight trade-offs that arise in practice for LLMs facing semantically underdetermined prompts. The definitions are not arbitrary; they are motivated by standard requirements in decision-making contexts where utility demands resolution and strict non-bias prohibits unprompted preferences. We will revise the abstract, introduction, and a new formalization subsection to explicitly label the result as a conjecture, outline a minimal decision-theoretic model showing the incompatibility, and discuss why a full reduction analogous to CAP is not claimed here. revision: partial

  2. Referee: [The formalization of the trilemma] The assumption that any model-introduced selection criterion not supplied by the user or premises necessarily constitutes bias in a selection-theoretic sense is load-bearing but not independently justified or shown to be the only viable definition of non-bias. Without separating pragmatic selection from bias via additional axioms or a concrete model, the claim reduces to a restatement of terminology.

    Authors: The definition of strict non-bias is chosen to isolate cases where the model supplies a selection criterion absent from the premises or user input, which we argue qualifies as bias under a selection-theoretic view because it alters the outcome distribution without justification. This is not presented as the sole possible definition; the paper contrasts it with weaker notions of neutrality that permit defaults. To address the concern, we will add explicit axioms in the formalization section that distinguish bias (unwarranted introduction of preferences) from pragmatic selection (e.g., tie-breaking rules explicitly declared as such), supported by references to decision theory and AI ethics literature on value alignment. This will make the load-bearing assumption more transparent and open to alternative formulations. revision: yes

Circularity Check

1 step flagged

Trilemma reduces to definitional incompatibility by construction

specific steps
  1. self-definitional [Abstract]
    "If this criterion is not supplied by the user or justified by the available premises, the response becomes biased in a broad selection-theoretic sense. Conversely, if the model avoids unsupported preferences, it may preserve correctness and non-bias but may reduce utility through refusal, hedging, or clarification."

    The trilemma asserts that strong correctness, strict non-bias, and high utility cannot coexist under semantic underdetermination. Strict non-bias is defined as the model introducing no selection criterion, preference, or prior not supplied by the user or premises, while high utility for decisive responses explicitly requires introducing such a criterion. The claimed impossibility therefore follows directly from these definitions rather than from any independent argument or model showing the properties are incompatible.

full rationale

The paper formulates a conjecture rather than deriving it from independent axioms or a system model like the original CAP theorem. The load-bearing step equates any model-introduced selection with bias (by definition) while requiring such selection for utility in underdetermined cases, making the trilemma hold tautologically from the chosen terminology. No equations, external theorems, or self-citations are invoked to demonstrate the trade-off; the incompatibility is asserted via the framing itself. This qualifies as partial circularity but is not a full reduction of a quantitative result.
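The flagged step compresses to a two-line propositional argument. The Lean sketch below is editorial, not a theorem drawn from the paper or the Pith Canon: once strict non-bias is defined as the absence of any model-introduced criterion, and high utility under underdetermination is taken to require one, the incompatibility is immediate.

```lean
-- Illustrative only. If NonBias is *defined* as not introducing a selection
-- criterion, and Utility *requires* introducing one, the conflict follows
-- from the definitions alone, with no further model of LLM behavior.
theorem trilemma_by_definition
    (IntroducesCriterion NonBias Utility : Prop)
    (h_nonbias : NonBias → ¬ IntroducesCriterion)
    (h_utility : Utility → IntroducesCriterion) :
    ¬ (NonBias ∧ Utility) :=
  fun h => h_nonbias h.1 (h_utility h.2)
```

Notably, correctness never enters the derivation: under these definitions the conflict is already pairwise between non-bias and utility, which underscores the audit's point that the trilemma holds by construction rather than by an independent argument.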

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The argument depends on the analogy to the CAP theorem and on the definitional claim that unsupported selection criteria equal bias; these are not independently justified within the abstract.

axioms (2)
  • domain assumption: A prompt is semantically underdetermined when its premises do not determine a unique answer.
    This definition is invoked to trigger the trilemma and is presented without further derivation.
  • ad hoc to paper: Introducing a selection criterion not supplied by the user or premises constitutes bias in a broad selection-theoretic sense.
    This equivalence is central to labeling the response as biased and is not derived from external benchmarks.
invented entities (1)
  • Correctness-non-bias-utility trilemma: no independent evidence
    purpose: To describe an inherent trade-off in LLM responses to underdetermined prompts
    The trilemma is postulated as a conjecture without independent falsifiable evidence supplied in the abstract.

pith-pipeline@v0.9.0 · 5495 in / 1417 out tokens · 44863 ms · 2026-05-13T01:09:59.709845+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages · 2 internal anchors

  1. E. A. Brewer, "Towards robust distributed systems," keynote at the ACM Symposium on Principles of Distributed Computing, Portland, Oregon, USA, 2000.
  2. S. Gilbert and N. Lynch, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services," ACM SIGACT News, vol. 33, no. 2, pp. 51–59, 2002. doi: 10.1145/564585.564601
  3. A. Caliskan, J. J. Bryson, and A. Narayanan, "Semantics derived automatically from language corpora contain human-like biases," Science, vol. 356, no. 6334, pp. 183–186, 2017. doi: 10.1126/science.aal4230
  4. E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell, "On the dangers of stochastic parrots: Can language models be too big?," in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610–623, 2021. doi: 10.1145/3442188.3445922
  5. L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, and R. Lowe, "Training language models to follow instructions with human feedback," Advances in Neural Information Processing Systems, vol. 35, 2022.
  6. S. Lin, J. Hilton, and O. Evans, "TruthfulQA: Measuring how models mimic human falsehoods," in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pp. 3214–3252, Dublin, Ireland, 2022. doi: 10.18653/v1/2022.acl-long.229
  7. R. Bommasani et al., "On the opportunities and risks of foundation models," arXiv preprint arXiv:2108.07258, 2021. doi: 10.48550/arXiv.2108.07258