arxiv: 1907.09472 · v1 · pith:75A7JRZFnew · submitted 2019-07-22 · 💻 cs.AI · cs.LO

Learning Probabilities: Towards a Logic of Statistical Learning

Pith reviewed 2026-05-24 18:30 UTC · model grok-4.3

classification 💻 cs.AI cs.LO
keywords imprecise probabilities · plausibility maps · statistical learning · belief revision · almost sure convergence · doxastic logic · sampling · Bayes rule

The pith

Agents learn unknown probabilities by updating plausibility rankings on a fixed set of measures, with beliefs converging almost surely to the true value.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models situations of radical uncertainty about probabilities by combining a set of possible probability measures with a plausibility map that ranks them. Sampling updates only the plausibility map through a plausibilistic version of Bayes' rule, without shrinking or expanding the set of measures. Higher-order information instead reduces the set while leaving plausibilities fixed. Beliefs are identified with what holds in the most plausible measures, and this setup produces non-AGM belief change. The key result is that repeated sampling makes the agent's beliefs converge almost surely to the actual probability.

Core claim

The central claim is that the beliefs obtained by repeated sampling converge almost surely to the correct belief in the true probability, because the sampling update revises only the plausibility map while leaving the given set of measures unchanged.
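
A rough way to see why such a result is plausible, sketched here for a finite set of measures with strictly positive initial plausibility (an assumption of this sketch, and not necessarily the route the paper's proof takes). Here p is the true measure, O the finite outcome set, and n_o the number of times outcome o occurs in the first n i.i.d. draws; the symbols pla_0 and pla_n are illustrative names for the plausibility map before and after n samples.

```latex
% Sketch for a finite set M with pla_0(\mu) > 0 for all \mu \in M (assumption).
% n_o = number of times outcome o appears among the first n i.i.d. draws from p.
\begin{align*}
\mathrm{pla}_n(\mu) &= \mathrm{pla}_0(\mu) \prod_{o \in O} \mu(o)^{n_o}, \\
\frac{1}{n}\log \mathrm{pla}_n(\mu)
  &= \frac{1}{n}\log \mathrm{pla}_0(\mu) + \sum_{o \in O} \frac{n_o}{n}\,\log \mu(o) \\
  &\xrightarrow[n \to \infty]{\text{a.s.}} \sum_{o \in O} p(o)\,\log \mu(o)
   = -H(p) - D_{\mathrm{KL}}(p \,\|\, \mu).
\end{align*}
```

The limit follows from the strong law of large numbers, with the limit read as minus infinity when μ assigns zero to an outcome that p makes possible. Since the Kullback-Leibler divergence is non-negative and vanishes only at μ = p, the limiting score is uniquely maximised by the true measure, so if p belongs to the set it eventually stands alone at the top of the ranking.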

What carries the argument

Plausibility map over a set of probability measures, updated by a plausibilistic Bayes rule that re-ranks measures without altering the set.
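
The re-ranking step can be sketched in a few lines of Python. The paper's appendix writes the updated plausibility of a measure μ after observing o as pla(μ) multiplied by μ(o); everything else below (the three candidate bags, the prior weights, the function names) is a hypothetical illustration rather than the authors' code.

```python
from typing import Dict, List, Tuple

# A measure is a tuple of (outcome, probability) pairs so it can be a dict key.
# The names below are illustrative and do not come from the paper.
Measure = Tuple[Tuple[str, float], ...]


def prob(mu: Measure, outcome: str) -> float:
    """Probability that measure `mu` assigns to `outcome` (0.0 if absent)."""
    return dict(mu).get(outcome, 0.0)


def sample_update(pla: Dict[Measure, float], outcome: str) -> Dict[Measure, float]:
    """Re-rank every measure by pla(mu) * mu(outcome); the key set is unchanged."""
    return {mu: weight * prob(mu, outcome) for mu, weight in pla.items()}


def most_plausible(pla: Dict[Measure, float]) -> List[Measure]:
    """Measures with maximal plausibility; beliefs are what holds in all of them."""
    top = max(pla.values())
    return [mu for mu, weight in pla.items() if weight == top]


mostly_red: Measure = (("red", 0.8), ("green", 0.2))
balanced: Measure = (("red", 0.5), ("green", 0.5))
mostly_green: Measure = (("red", 0.2), ("green", 0.8))

# Hypothetical prior ranking that favours the balanced bag.
pla0 = {mostly_red: 1.0, balanced: 2.0, mostly_green: 1.0}
pla1 = sample_update(pla0, "red")   # first red marble
pla2 = sample_update(pla1, "red")   # second red marble
```

In this toy run the balanced bag is still on top after one red marble, and the mostly-red bag overtakes it after the second, while all three measures remain in the set throughout.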

If this is right

  • Belief change from sampling violates the standard AGM axioms.
  • Higher-order linear inequalities shrink the set of measures without affecting their plausibility ordering.
  • Beliefs are defined as propositions true in all most-plausible measures.
  • The model supports a dynamic doxastic logic that combines sampling and higher-order updates.
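
The second kind of update, learning a linear inequality, can be sketched as a filter over the set of measures. The candidate bags, weights, and function names are hypothetical; the only thing taken from the paper is that the set shrinks while the surviving plausibility values stay as they were.

```python
from typing import Callable, Dict

# Measures are named for readability; each maps colours to probabilities.
measures: Dict[str, Dict[str, float]] = {
    "mostly_red": {"red": 0.8, "green": 0.2},
    "balanced": {"red": 0.5, "green": 0.5},
    "mostly_green": {"red": 0.2, "green": 0.8},
}
pla: Dict[str, float] = {"mostly_red": 1.0, "balanced": 2.0, "mostly_green": 1.0}


def higher_order_update(
    pla: Dict[str, float],
    measures: Dict[str, Dict[str, float]],
    constraint: Callable[[Dict[str, float]], bool],
) -> Dict[str, float]:
    """Keep only measures satisfying `constraint`; plausibility values are untouched."""
    return {name: w for name, w in pla.items() if constraint(measures[name])}


def more_red_than_green(mu: Dict[str, float]) -> bool:
    """Linear inequality mu(red) - mu(green) > 0."""
    return mu["red"] - mu["green"] > 0


restricted = higher_order_update(pla, measures, more_red_than_green)
```

Here the previously most plausible measure (the balanced bag) is eliminated outright, so the belief changes even though no surviving plausibility value was altered.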

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same separation of set contraction and plausibility re-ranking could be tested in sequential decision tasks where agents must act before full convergence occurs.
  • If plausibility is defined via entropy or center-of-mass, the convergence speed may vary with the choice of ranking function, providing a testable prediction for simulations.
  • The framework suggests a way to reconcile imprecise probabilities with pointwise learning without forcing the agent to adopt a single prior at the outset.

Load-bearing premise

The update induced by sampling is a plausibilistic version of Bayes' Rule that changes only the plausibility map while leaving the given set of measures unchanged.

What would settle it

A sequence of independent draws from a fixed distribution in which the most plausible measure after many samples fails to approach the empirical frequencies obtained from those draws.
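
A small simulation shows what such a check might look like. It is a sketch under stated assumptions: a two-colour bag, a finite grid of candidate measures that contains the truth, a flat initial ranking, and the multiplicative update pla(μ)·μ(o) applied in log space to avoid numerical underflow. None of these choices come from the paper.

```python
import math
import random


def log_pla_after(draws, thetas, log_pla0):
    """Log-plausibility of each Bernoulli(theta) after multiplying in mu(o) per draw."""
    reds = sum(draws)
    greens = len(draws) - reds
    return [
        lp + reds * math.log(t) + greens * math.log(1.0 - t)
        for t, lp in zip(thetas, log_pla0)
    ]


random.seed(0)  # fixed seed so the sketch is repeatable
true_theta = 0.7  # hypothetical true chance of drawing red
thetas = [i / 10 for i in range(1, 10)]  # candidate set; includes the truth
log_pla0 = [0.0] * len(thetas)  # flat initial ranking (an assumption)

# 1 = red, 0 = green; independent draws from the true distribution.
draws = [1 if random.random() < true_theta else 0 for _ in range(5000)]
log_pla = log_pla_after(draws, thetas, log_pla0)
best = thetas[log_pla.index(max(log_pla))]  # most plausible candidate
freq = sum(draws) / len(draws)  # empirical frequency of red
```

A run in which `best` stayed far from `freq` as the number of draws grew would be the kind of counterexample described above.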

Original abstract

We propose a new model for forming beliefs and learning about unknown probabilities (such as the probability of picking a red marble from a bag with an unknown distribution of coloured marbles). The most widespread model for such situations of 'radical uncertainty' is in terms of imprecise probabilities, i.e. representing the agent's knowledge as a set of probability measures. We add to this model a plausibility map, associating to each measure a plausibility number, as a way to go beyond what is known with certainty and represent the agent's beliefs about probability. There are a number of standard examples: Shannon Entropy, Centre of Mass etc. We then consider learning of two types of information: (1) learning by repeated sampling from the unknown distribution (e.g. picking marbles from the bag); and (2) learning higher-order information about the distribution (in the shape of linear inequalities, e.g. we are told there are more red marbles than green marbles). The first changes only the plausibility map (via a 'plausibilistic' version of Bayes' Rule), but leaves the given set of measures unchanged; the second shrinks the set of measures, without changing their plausibility. Beliefs are defined as in Belief Revision Theory, in terms of truth in the most plausible worlds. But our belief change does not comply with standard AGM axioms, since the revision induced by (1) is of a non-AGM type. This is essential, as it allows our agents to learn the true probability: we prove that the beliefs obtained by repeated sampling converge almost surely to the correct belief (in the true probability). We end by sketching the contours of a dynamic doxastic logic for statistical learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a model for beliefs about unknown probabilities under radical uncertainty by equipping a set of probability measures with a plausibility map. It distinguishes two learning operations: type (1) sampling updates only the plausibility map via a plausibilistic Bayes rule while leaving the set of measures fixed; type (2) higher-order linear inequalities shrink the set without altering plausibilities. Beliefs are defined as properties true in the most plausible measures (in the style of belief revision). The central claim is a proof that repeated type-(1) updates yield beliefs that converge almost surely to the true probability; the paper notes that the induced revision violates standard AGM axioms, which is presented as essential for the convergence result, and sketches a dynamic doxastic logic for statistical learning.

Significance. If the convergence result holds, the framework provides a technically novel bridge between imprecise probabilities and non-AGM belief revision that permits agents to learn the true measure from sampling. The explicit use of a fixed set plus plausibility map, together with the non-AGM revision operator, is a substantive contribution to the logic of statistical learning and could inform work in AI epistemology and dynamic epistemic logic.

major comments (2)
  1. [Abstract / learning type (1) paragraph] The convergence claim ('beliefs obtained by repeated sampling converge almost surely to the correct belief (in the true probability)') is stated without mentioning that the true measure must belong to the initial fixed set of measures. Because type-(1) learning leaves the set unchanged, beliefs cannot converge to the true measure unless it is already in the set; this assumption is load-bearing for the central theorem yet is neither stated nor justified.
  2. [Model definition and learning type (1)] The definition of beliefs (most plausible measures within the fixed set) together with the invariance of the set under sampling implies that the model can only ever represent beliefs inside the initial set. The manuscript should therefore either add the explicit hypothesis that the true measure lies in the initial set or qualify the convergence statement to 'convergence to the most plausible measure within the initial set that is closest to the true one.'
minor comments (1)
  1. [Introduction / model section] The examples of plausibility maps (Shannon Entropy, Centre of Mass, etc.) are listed without formal definitions or citations; adding precise mathematical statements would improve readability.
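
For reference, the standard definition of Shannon entropy on a finite outcome set is given below. Using it directly as the plausibility number is one natural reading of the abstract and is labelled as an assumption; the abstract does not say how Centre of Mass is turned into a plausibility map, so that case is left out.

```latex
% Shannon entropy of a measure \mu on a finite outcome set O,
% with the usual convention 0 \log 0 = 0.
\[
H(\mu) = -\sum_{o \in O} \mu(o)\,\log \mu(o)
\]
% Hypothetical plausibility map built from it (an assumption, not the paper's text):
\[
\mathrm{pla}(\mu) = H(\mu)
\]
```

With two colours, H is largest at the uniform measure (1/2, 1/2), where it equals log 2, so under this reading an agent with no samples would initially believe the bag is balanced.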

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive report. The comments correctly identify an implicit modeling assumption that was not made explicit in the abstract or model description. We will revise the manuscript to address both points.

Point-by-point responses
  1. Referee: [Abstract / learning type (1) paragraph] The convergence claim ('beliefs obtained by repeated sampling converge almost surely to the correct belief (in the true probability)') is stated without mentioning that the true measure must belong to the initial fixed set of measures. Because type-(1) learning leaves the set unchanged, beliefs cannot converge to the true measure unless it is already in the set; this assumption is load-bearing for the central theorem yet is neither stated nor justified.

    Authors: We agree that the assumption is load-bearing and was not stated. The framework is designed for agents whose initial set of measures includes the true probability (as is standard when modeling radical uncertainty over an unknown distribution). We will revise the abstract and the type-(1) description to state this hypothesis explicitly and note that it is required for the almost-sure convergence result. revision: yes

  2. Referee: [Model definition and learning type (1)] The definition of beliefs (most plausible measures within the fixed set) together with the invariance of the set under sampling implies that the model can only ever represent beliefs inside the initial set. The manuscript should therefore either add the explicit hypothesis that the true measure lies in the initial set or qualify the convergence statement to 'convergence to the most plausible measure within the initial set that is closest to the true one.'

    Authors: We agree that beliefs remain inside the initial set. We will add the explicit hypothesis that the true measure belongs to the initial set (rather than qualifying the convergence claim), as this matches the intended modeling choice: the agent considers a set of possible measures that includes the unknown true probability. The revision will appear in the model-definition section and will be cross-referenced in the abstract. revision: yes
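
To make the referee's alternative concrete, the sketch below looks at what happens when the true measure is missing from the set. It assumes a finite set of two-colour measures and the multiplicative update pla(μ)·μ(o); under those assumptions the long-run ranking is decided by expected log-weight per draw, so the winner is the candidate with the smallest Kullback-Leibler divergence from the truth. The numbers and function names are hypothetical.

```python
import math
from typing import List


def kl_bernoulli(p: float, q: float) -> float:
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)) for 0 < p, q < 1."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def long_run_winner(true_theta: float, candidates: List[float]) -> float:
    """Candidate whose expected log-weight per draw is largest, i.e. minimal KL."""
    return min(candidates, key=lambda q: kl_bernoulli(true_theta, q))


true_theta = 0.7  # hypothetical truth, deliberately absent from the set
candidates = [0.1, 0.4, 0.6, 0.9]
winner = long_run_winner(true_theta, candidates)
```

In this example the agent's beliefs would settle on the 0.6 measure, which is "closest" in the KL sense but is not the true probability, which is why the membership hypothesis matters for the theorem as stated.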

Circularity Check

0 steps flagged

No circularity; convergence is a stated theorem under standard model assumptions

full rationale

The paper defines a fixed set of probability measures that sampling does not alter, updating only a plausibility ordering via a non-AGM rule. The central result is an almost-sure convergence theorem for beliefs (defined via most-plausible measures) to the true probability. This is presented as a proof relying on external measure-theoretic probability, not as a re-derivation or fit of the input data. No equation reduces the claimed limit to a parameter fitted from the same data, no self-citation supplies a uniqueness theorem that forces the result, and the fixed-set modeling choice is an explicit modeling decision rather than a definitional loop. The derivation therefore remains independent of its target conclusion.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the model rests on the existence of a plausibility map and a specific non-standard update rule whose justification is not supplied.

axioms (1)
  • domain assumption: the plausibilistic version of Bayes' Rule updates only the plausibility map while the set of measures stays fixed
    Stated in the abstract as the mechanism for type-(1) learning.

pith-pipeline@v0.9.0 · 5838 in / 1179 out tokens · 23930 ms · 2026-05-24T18:30:11.533435+00:00 · methodology

