
arxiv: 2509.22297 · v2 · submitted 2025-09-26 · 💻 cs.AI

Large Language Models as Nondeterministic Causal Models

Pith reviewed 2026-05-18 12:48 UTC · model grok-4.3

classification 💻 cs.AI
keywords counterfactuals · large language models · causal models · nondeterminism · black-box models · interpretability

The pith

Large language models should be represented as nondeterministic causal models to generate counterfactuals on any black-box implementation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the recent method for producing counterfactuals in probabilistic LLMs rests on an ambiguous reading of what the model actually is. The method does not read the LLM literally, because it assumes one can change the implementation of the sampling process without changing the model itself; nor does it read the LLM as intended, because it represents an inherently nondeterministic system as a deterministic causal model. The author instead treats the LLM according to its intended interpretation, as a nondeterministic causal model. This yields a simpler procedure for asking what the output would, or might, have been under a different prompt, and the procedure works without inspecting or modifying any internal details. The result supplies a theoretical basis for counterfactual reasoning that can support explanation, evaluation, and targeted improvement of LLM behavior.

Core claim

Representing an LLM as a nondeterministic causal model according to its intended semantics makes it possible to generate counterfactuals directly from the black-box model itself, without assuming one can change its sampling implementation or recasting it as a deterministic structure; this method is therefore applicable to any LLM and stands in contrast to earlier techniques that trade generality for the ability to produce one specific class of counterfactuals.
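
The contrast between the two readings can be written out in a few lines. The notation below ($f$, $U$, $P_M$, $Y_{x^*}$) is an illustrative assumption in the style of standard structural causal models and is not taken from the paper's own formalism.

```latex
% Deterministic reading (assumed notation): the output is a function of the
% prompt X and an exogenous noise variable U, so a counterfactual reuses the
% same noise value u with the altered prompt x^*.
Y = f(X, U), \qquad Y_{x^*}(u) = f(x^*, u)

% Nondeterministic reading (assumed notation): the model M fixes only a
% conditional distribution over outputs given the prompt, with no noise
% variable singled out.
Y \sim P_M(\,\cdot \mid X = x)
```

On the second reading, the only thing a black-box interface needs to expose is the ability to sample from $P_M(\,\cdot \mid X = x^*)$ for a chosen $x^*$.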

What carries the argument

A nondeterministic causal model of the LLM, read according to its intended interpretation.

If this is right

  • Counterfactuals become available for any black-box LLM without code changes or internal access.
  • Different types of counterfactuals can be distinguished and chosen according to the intended application.
  • A shared theoretical foundation now exists for developing further, purpose-specific counterfactual procedures.
  • Explanation and evaluation tasks that rely on what-if reasoning gain a consistent semantic basis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same nondeterministic framing could be tested on other sampling-based generative systems to see whether counterfactual reasoning extends beyond text models.
  • One could examine whether the generated counterfactuals help surface biases that remain hidden under standard prompt variation.
  • Future work might combine this approach with existing deterministic causal methods to produce hybrid counterfactuals tuned to particular evaluation goals.

Load-bearing premise

Large language models possess a determinate intended interpretation that can be read directly as a nondeterministic causal model regardless of their concrete implementation.

What would settle it

Apply the method to a concrete LLM and a factual prompt, then check whether the generated counterfactual output matches the actual output the same model produces when the prompt is manually replaced by the counterfactual input.
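
Because outputs are sampled, "matches" in this check would presumably have to be read distributionally rather than string by string; that reading is an assumption made here, not a statement from the paper. A minimal sketch of such a comparison, using total variation distance between two sets of sampled outputs (the function names are hypothetical helpers):

```python
from collections import Counter
from typing import Dict, Sequence


def empirical_dist(samples: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each distinct output in a list of samples."""
    counts = Counter(samples)
    n = len(samples)
    return {output: c / n for output, c in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def compare_outputs(counterfactual_samples: Sequence[str],
                    requery_samples: Sequence[str]) -> float:
    """Distance between counterfactual outputs and direct re-queries at x*.

    Both arguments are lists of output strings collected by the caller;
    how they are obtained from a concrete LLM is left open here.
    """
    return total_variation(empirical_dist(counterfactual_samples),
                           empirical_dist(requery_samples))
```

A value near 0 would indicate that the two sets of outputs are distributed alike; with finitely many samples some sampling error is expected, so any threshold is a judgment call.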

Original abstract

Recent work by Chatzi et al. and Ravfogel et al. has developed, for the first time, a method for generating counterfactuals of probabilistic Large Language Models. Such counterfactuals tell us what would - or might - have been the output of an LLM if some factual prompt ${\bf x}$ had been ${\bf x}^*$ instead. The ability to generate such counterfactuals is an important necessary step towards explaining, evaluating, and eventually improving, the behavior of LLMs. I argue, however, that the existing method rests on an ambiguous interpretation of LLMs: it does not interpret LLMs literally, for the method involves the assumption that one can change the implementation of an LLM's sampling process without changing the LLM itself, nor does it interpret LLMs as intended, for the method involves explicitly representing a nondeterministic LLM as a deterministic causal model. I here present a much simpler method for generating counterfactuals that is based on an LLM's intended interpretation by representing it as a nondeterministic causal model instead. The advantage of my simpler method is that it is directly applicable to any black-box LLM without modification, as it is agnostic to any implementation details. The advantage of the existing method, on the other hand, is that it directly implements the generation of a specific type of counterfactuals that is useful for certain purposes, but not for others. I clarify how both methods relate by offering a theoretical foundation for reasoning about counterfactuals in LLMs based on their intended semantics, thereby laying the groundwork for novel application-specific methods for generating counterfactuals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that methods for generating counterfactuals in probabilistic LLMs developed by Chatzi et al. and Ravfogel et al. rest on ambiguous interpretations of LLMs (neither literal nor intended). It proposes representing LLMs as nondeterministic causal models according to their intended interpretation, yielding a simpler method for counterfactual generation that applies directly to any black-box LLM without modification and is agnostic to implementation details. The paper further offers a theoretical foundation for counterfactual reasoning in LLMs based on intended semantics to relate the two approaches and enable novel application-specific methods.

Significance. If the central claim holds, the work supplies a conceptual clarification of LLM semantics for counterfactual reasoning and explicitly contrasts the advantages of an implementation-agnostic nondeterministic approach with prior deterministic representations. This could support broader use in explaining and improving LLM behavior while providing a foundation for future methods tailored to specific counterfactual types.

major comments (1)
  1. [Abstract] The claim that representing an LLM as a nondeterministic causal model 'is directly applicable to any black-box LLM without modification, as it is agnostic to any implementation details' is load-bearing for the central contribution, yet the manuscript supplies no concrete procedure for extracting or simulating the required structural mechanisms and exogenous noise realizations. For a black-box model the only observable is the conditional P(output | prompt); without an explicit mapping from black-box queries to intervened nondeterministic trajectories, it remains unclear how to generate a true counterfactual (same randomness, altered prompt) rather than simply re-querying the factual distribution at x*.
minor comments (1)
  1. [Abstract] The abstract contrasts the proposed method with prior work but does not include even a high-level sketch of the steps involved in the nondeterministic representation; adding such an outline would improve accessibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their detailed and constructive report. The central contribution of the manuscript is the conceptual clarification that LLMs are best interpreted as nondeterministic causal models under their intended semantics, which yields a simpler and more general approach to counterfactual generation. We address the referee's major comment below and will incorporate clarifications to strengthen the practical discussion.

Point-by-point responses
  1. Referee: [Abstract] The claim that representing an LLM as a nondeterministic causal model 'is directly applicable to any black-box LLM without modification, as it is agnostic to any implementation details' is load-bearing for the central contribution, yet the manuscript supplies no concrete procedure for extracting or simulating the required structural mechanisms and exogenous noise realizations. For a black-box model the only observable is the conditional P(output | prompt); without an explicit mapping from black-box queries to intervened nondeterministic trajectories, it remains unclear how to generate a true counterfactual (same randomness, altered prompt) rather than simply re-querying the factual distribution at x*.

    Authors: We agree that the manuscript would benefit from greater explicitness on the translation from the nondeterministic causal model to concrete queries. Under the intended interpretation, the LLM itself constitutes the structural equation, with exogenous noise variables representing the inherent stochasticity of token sampling. A counterfactual is generated by intervening on the prompt (i.e., supplying x*) while holding the noise realization fixed; this corresponds to querying the black-box LLM with the altered prompt under any available reproducibility mechanism (e.g., a fixed random seed when the API supports it). Because the representation treats the entire sampling process as part of the nondeterministic model rather than an internal implementation detail to be altered or inspected, no modification to the LLM is required and the approach remains agnostic to architecture, training procedure, or exact decoding algorithm. This contrasts with deterministic representations that presuppose access to or alteration of the underlying deterministic components. We will add a new subsection (e.g., in Section 3 or 4) that spells out this mapping, discusses API-level reproducibility options, and notes the distinction between exact noise matching (when supported) and distributional approximation (when it is not).

    revision: yes
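
The distinction between exact noise matching and distributional approximation can be illustrated on a toy model. The sketch below is an assumption-laden illustration, not the paper's procedure: `VOCAB` and `toy_logits` are hypothetical stand-ins for a real LLM, and the noise is held fixed with the Gumbel-max trick, which is one standard way to separate sampling randomness from logits. Whether a seed parameter in a real API reuses the same noise across different prompts is not something this sketch establishes.

```python
import math
import random
from typing import List, Tuple

VOCAB = ["yes", "no", "maybe"]  # hypothetical three-token vocabulary


def toy_logits(prompt: str) -> List[float]:
    """Hypothetical stand-in for next-token logits (not a real LLM)."""
    n = len(prompt)
    return [math.sin(n + i) for i in range(len(VOCAB))]


def gumbel_noise(seed: int, k: int) -> List[float]:
    """Draw k standard Gumbel variates from a seeded generator."""
    rng = random.Random(seed)
    # Clamp away from 0 so that log(u) is always defined.
    return [-math.log(-math.log(max(rng.random(), 1e-300))) for _ in range(k)]


def sample(prompt: str, noise: List[float]) -> str:
    """Gumbel-max sampling: pick the token maximizing logit plus noise."""
    scores = [l + g for l, g in zip(toy_logits(prompt), noise)]
    return VOCAB[scores.index(max(scores))]


def shared_noise_pair(x: str, x_star: str, seed: int) -> Tuple[str, str]:
    """Factual and counterfactual outputs under one noise realization."""
    noise = gumbel_noise(seed, len(VOCAB))
    return sample(x, noise), sample(x_star, noise)


def independent_requery(x_star: str, seed: int) -> str:
    """Distributional approximation: a draw at x* with its own noise."""
    return sample(x_star, gumbel_noise(seed, len(VOCAB)))
```

With shared noise, setting `x_star` equal to `x` always reproduces the factual output, which is the consistency property the referee's "same randomness, altered prompt" reading relies on. An independent re-query with a different seed carries no such guarantee, since it only samples from the model's distribution at the given prompt.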

Circularity Check

0 steps flagged

No circularity: proposed nondeterministic causal model representation for black-box LLMs is conceptually independent of inputs

Full rationale

The paper's central argument contrasts existing methods (Chatzi et al., Ravfogel et al.) that rely on deterministic causal models or implementation changes with a simpler alternative that treats the LLM directly as a nondeterministic causal model based on its intended semantics. This representation is claimed to enable counterfactual generation agnostic to internal details. No equations, fitted parameters, or self-referential definitions appear that would reduce the method to its own inputs by construction. The derivation draws on standard causal modeling without load-bearing self-citations, uniqueness theorems imported from the author's prior work, or ansatzes smuggled via citation. The chain remains self-contained against external causal theory benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The proposal depends on domain assumptions about LLM semantics rather than new empirical data or formal proofs.

axioms (1)
  • domain assumption: LLMs possess an intended interpretation as nondeterministic causal models suitable for counterfactual reasoning.
    This premise is invoked to contrast the new method with existing approaches that treat LLMs differently.

pith-pipeline@v0.9.0 · 5799 in / 1197 out tokens · 40082 ms · 2026-05-18T12:48:44.982179+00:00 · methodology

