Optimal information deletion and Bayes' theorem

H{\aa}vard Rue; Hans Montcho

arxiv: 2602.09061 · v2 · pith:LREFTSIInew · submitted 2026-02-08 · 📊 stat.ME · cs.IT· math.IT· math.ST· stat.TH

Optimal information deletion and Bayes' theorem

Hans Montcho , H{\aa}vard Rue This is my paper

Pith reviewed 2026-05-21 14:05 UTC · model grok-4.3

classification 📊 stat.ME cs.ITmath.ITmath.STstat.TH

keywords Bayes theoreminformation deletionleave-data-out posterioroptimal update ruleantedata distributionvariational inferenceposterior updating

0 comments

The pith

Bayes' theorem gives the unique rule for deleting data that neither destroys nor creates nonexistent information.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines an optimal information deletion rule as any update from posterior to antedata distribution that removes a portion of the data without destroying existing information or inventing new information that was not there. It then proves that this rule must be exactly the posterior obtained by applying Bayes' theorem to the remaining data alone. A reader should care because the result supplies an independent justification for Bayes' theorem grounded in deletion rather than in the usual addition of evidence. The argument revisits Zellner's earlier variational view of Bayes' theorem but shifts the focus to what happens when information is taken away.

Core claim

A rule that updates a posterior distribution into an antedata distribution when a portion of data is removed is optimal if and only if it neither destroys nor creates nonexistent information; the paper proves that this optimal rule is identical to the leave-data-out posterior obtained directly from Bayes' theorem.

What carries the argument

The optimal information deletion rule, defined solely by the requirement that it neither destroys existing information nor creates nonexistent information when data are removed.

If this is right

The leave-data-out posterior is the only update that preserves information exactly when data are removed.
Bayes' theorem can be characterized as the optimal deletion operator in the same way it was earlier characterized as the optimal addition operator.
Any alternative deletion procedure must either lose some information that was present or introduce information that was not supported by the remaining data.
The result extends the variational formulation of Bayes' theorem from information addition to information deletion.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same optimality criterion could be used to define deletion rules in non-Bayesian settings where a posterior is replaced by some other summary.
The framework might suggest new diagnostics for model misspecification by checking whether observed deletions behave like the optimal rule.
Extensions to sequential or streaming data removal could follow the same uniqueness argument.

Load-bearing premise

That the property of neither destroying nor creating nonexistent information is enough by itself to pick out a single, unique update rule without already assuming the form of the Bayes posterior.

What would settle it

Any concrete data set and prior where a deletion update that satisfies the no-destruction/no-creation condition produces a distribution different from the leave-data-out posterior computed with Bayes' theorem.

read the original abstract

Arnold Zellner published a seminal paper on Bayes' theorem as an optimal information processing rule, a result that led to the variational formulation of Bayes' theorem, and a central idea in generalized variational inference. Almost 40 years later, we revisit these ideas, but from the perspective of information deletion. We investigate rules that update a posterior distribution into an antedata distribution when a portion of data is removed. In such context, a rule that does not destroy or create nonexistent information is called the optimal information deletion rule and we prove that it coincides with the leave-data-out posterior from Bayes' theorem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript revisits Zellner's 1980s work on Bayes' theorem as an optimal information-processing rule, shifting focus to information deletion. It considers rules that update a posterior to an 'antedata' distribution after removing a portion of data. A rule that neither destroys nor creates nonexistent information is termed the optimal information deletion rule; the central claim is a proof that this rule coincides exactly with the leave-data-out posterior obtained from standard Bayes' theorem.

Significance. If the result holds, the paper supplies an independent characterization of Bayesian updating via an information-deletion optimality criterion. This could strengthen the variational and generalized-variational foundations of Bayes' theorem, offer a new lens on data-removal operations in statistical models, and connect to existing work on information measures in inference.

major comments (2)

[Abstract] Abstract (paragraph defining the optimal rule): the optimality criterion is stated solely as 'neither destroy nor create nonexistent information,' yet no explicit information measure (e.g., a divergence or entropy functional), set of axioms, or uniqueness argument is supplied. Without these, it is unclear whether the criterion is independent of the target leave-data-out posterior or sufficient to single out a unique update map; this directly affects whether the subsequent proof establishes coincidence or presupposes Bayesian structure.
[Abstract] Abstract (statement of the proof): the claim that the optimal deletion rule 'coincides with the leave-data-out posterior from Bayes' theorem' is load-bearing for the entire contribution. Because the abstract provides no equations formalizing 'nonexistent information' or the deletion operator, the independence of the optimality definition from Bayes' theorem cannot be verified; a concrete counter-example or alternative update map satisfying the same verbal criterion would falsify uniqueness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. The feedback highlights important points about the presentation of the optimality criterion and the proof in the abstract. We have revised the abstract to include brief references to the explicit information measure and formal definitions used in the main text, while preserving its conciseness. Below we address each major comment point by point.

read point-by-point responses

Referee: [Abstract] Abstract (paragraph defining the optimal rule): the optimality criterion is stated solely as 'neither destroy nor create nonexistent information,' yet no explicit information measure (e.g., a divergence or entropy functional), set of axioms, or uniqueness argument is supplied. Without these, it is unclear whether the criterion is independent of the target leave-data-out posterior or sufficient to single out a unique update map; this directly affects whether the subsequent proof establishes coincidence or presupposes Bayesian structure.

Authors: We agree that the abstract, as a high-level summary, does not spell out the information measure or axioms. In the full manuscript these are defined explicitly using a relative-entropy functional to quantify nonexistent information, together with a set of axioms for the deletion operator that make no reference to Bayes' theorem. The uniqueness argument is developed in the body of the paper by showing that any map satisfying the axioms must reproduce the leave-data-out posterior. To address the referee's concern we have added a short clause to the abstract directing readers to the relevant definitions and axioms in Section 2. revision: yes
Referee: [Abstract] Abstract (statement of the proof): the claim that the optimal deletion rule 'coincides with the leave-data-out posterior from Bayes' theorem' is load-bearing for the entire contribution. Because the abstract provides no equations formalizing 'nonexistent information' or the deletion operator, the independence of the optimality definition from Bayes' theorem cannot be verified; a concrete counter-example or alternative update map satisfying the same verbal criterion would falsify uniqueness.

Authors: The abstract omits equations for brevity, but the manuscript supplies the precise operator and the formalization of nonexistent information via the chosen divergence. The proof proceeds from the independent optimality axioms to derive the leave-data-out posterior without circularity. We have inserted a parenthetical reference in the revised abstract to the formal statement of the deletion operator. Because the paper contains a uniqueness proof under the stated axioms, we do not supply counter-examples; any alternative map that appears to satisfy the verbal criterion is shown in the derivations to violate the no-creation/no-destruction condition. revision: yes

Circularity Check

0 steps flagged

No significant circularity: optimality criterion defined independently before equivalence proof

full rationale

The paper defines the optimal information deletion rule via the standalone property of neither destroying nor creating nonexistent information, then separately proves equivalence to the leave-data-out Bayes posterior. This structure keeps the input criterion independent of the target result, with no reduction by construction, no fitted inputs renamed as predictions, and no load-bearing self-citations or imported uniqueness theorems. The derivation chain remains self-contained against external benchmarks such as Zellner's prior work on information processing, without the central claim collapsing into its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work is primarily a mathematical proof extending prior concepts from Zellner without introducing new free parameters or postulated entities.

axioms (2)

standard math Axioms of probability theory
Used to define distributions and updates.
standard math Properties of information measures such as entropy or KL divergence
Likely used to quantify 'destroy or create information'.

pith-pipeline@v0.9.0 · 5624 in / 1216 out tokens · 88169 ms · 2026-05-21T14:05:57.514333+00:00 · methodology

Optimal information deletion and Bayes' theorem

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)