What is... a Markov basis?

Sonja Petrović

arXiv: 1907.07320 · v1 · pith:DA5F2FXD · submitted 2019-07-16 · 🧮 math.ST · math.AC · stat.ME · stat.TH

Pith reviewed 2026-05-24 20:55 UTC · model grok-4.3

classification 🧮 math.ST · math.AC · stat.ME · stat.TH

keywords Markov basis · algebraic statistics · contingency tables · toric ideal · fiber connectedness · lattice kernel · MCMC moves

The pith

A Markov basis is a finite set of integer vectors in the kernel of A that connects every pair of non-negative integer solutions to Ax = b, for every b, through moves that never leave the non-negative orthant.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper supplies a self-contained definition of a Markov basis written for readers whose main background is in pure mathematics. It presents the object as a finite subset of the integer kernel of a matrix whose columns index the cells of a contingency table. Such a set always generates that kernel lattice, but not every generating set of the lattice is a Markov basis. A sympathetic reader would care because the same object turns a statistical sampling problem into an algebraic one: the moves allow one to travel between all tables that share the same margins without any cell becoming negative. The definition is given directly in terms of fibers and connectedness of graphs on those fibers. This framing makes the statistical use of toric ideals immediately legible to algebraists.

Core claim

A Markov basis for an integer matrix A is any finite subset B of the integer kernel of A such that, for every right-hand side vector b, the graph whose vertices are the non-negative integer solutions to Ax = b and whose edges correspond to adding or subtracting an element of B is connected.
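
The core claim can be written symbolically as below. This is a sketch under our own notation, not necessarily the paper's: A is assumed to be an integer d-by-n matrix (n cells), and the symbols F_b (fiber) and G_B(b) (fiber graph) are labels introduced here for illustration.

```latex
\mathcal{F}_b = \{\, x \in \mathbb{Z}_{\ge 0}^{n} : Ax = b \,\}, \qquad
\ker_{\mathbb{Z}} A = \{\, m \in \mathbb{Z}^{n} : Am = 0 \,\}, \qquad
\mathcal{B} \subset \ker_{\mathbb{Z}} A \ \text{finite}

G_{\mathcal{B}}(b):\ \text{vertex set } \mathcal{F}_b, \quad
\{x, y\} \text{ is an edge} \iff x - y \in \mathcal{B} \cup (-\mathcal{B})

\mathcal{B} \text{ is a Markov basis for } A
  \iff G_{\mathcal{B}}(b) \text{ is connected for every } b \in \mathbb{Z}^{d}
```

Because both endpoints of an edge lie in the same fiber, every walk in the graph automatically preserves Ax = b and non-negativity.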

What carries the argument

Markov basis: the finite set of moves that makes the fiber graph connected for every margin vector b.

If this is right

Any two tables with the same margins can be reached from each other by a sequence of additions and subtractions of basis elements.
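The moves generate the integer kernel lattice of A and, by the fundamental theorem of Markov bases (Diaconis–Sturmfels), correspond to a set of binomials generating the associated toric ideal.
Markov-chain Monte Carlo algorithms that use only these moves, combined with a Metropolis–Hastings acceptance step, have the conditional distribution given the margins as their stationary distribution.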
The same construction applies to any toric model whose design matrix is A.
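
The MCMC consequence can be illustrated with a minimal sketch, assuming the r-by-c independence model. Its "basic moves" (+1 at two opposite corners of a 2-by-2 subtable, -1 at the other two) are known to form a Markov basis, and the conditional distribution of a table given both margins is hypergeometric, proportional to 1 / prod(x_ij!). The function names `basic_moves`, `log_weight`, and `mh_sample` are hypothetical, chosen for this illustration.

```python
import math
import random
from itertools import combinations


def basic_moves(r, c):
    """Basic moves of the r x c independence model: +1 at cells (i, j) and
    (k, l), -1 at cells (i, l) and (k, j), for every i < k and j < l."""
    moves = []
    for i, k in combinations(range(r), 2):
        for j, l in combinations(range(c), 2):
            m = [[0] * c for _ in range(r)]
            m[i][j] = m[k][l] = 1
            m[i][l] = m[k][j] = -1
            moves.append(m)
    return moves


def log_weight(table):
    """Unnormalized log-probability: given both margins, the hypergeometric
    distribution assigns P(table) proportional to 1 / prod(x_ij!)."""
    return -sum(math.lgamma(x + 1) for row in table for x in row)


def mh_sample(table, n_steps, seed=0):
    """Metropolis-Hastings walk on the fiber of `table` using signed basic moves."""
    rng = random.Random(seed)
    moves = basic_moves(len(table), len(table[0]))
    current = [list(row) for row in table]
    if not moves:  # a single row or column: the fiber has one table
        return current
    for _ in range(n_steps):
        # Symmetric proposal: a uniformly random move with a uniformly random sign.
        move = rng.choice(moves)
        sign = rng.choice((1, -1))
        proposal = [[x + sign * d for x, d in zip(row, mrow)]
                    for row, mrow in zip(current, move)]
        if any(x < 0 for row in proposal for x in row):
            continue  # proposal left the fiber (zero target probability): stay put
        log_ratio = log_weight(proposal) - log_weight(current)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
    return current
```

For example, `mh_sample([[3, 1, 0], [0, 2, 4]], n_steps=1000)` returns a table with the same row and column sums. Connectedness of the fiber under the moves is what makes the chain irreducible; the acceptance step is what makes the hypergeometric distribution stationary.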

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same connectedness property could be checked algorithmically for small tables by enumerating fibers.
Textbooks on commutative algebra could add this definition as a concrete application of toric ideals to discrete statistics.
The minimal size of a Markov basis for a given model remains an open computational question that algebraists are now equipped to attack.
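
The fiber enumeration mentioned in the first extension above can be sketched for two-way tables as follows. This is a brute-force illustration, assuming the independence model and small margins; `fiber`, `basic_moves`, and `is_connected` are hypothetical helper names.

```python
from collections import deque
from itertools import combinations, product


def compositions(n, k):
    """All k-tuples of non-negative integers that sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def fiber(row_sums, col_sums):
    """Brute-force enumeration of all non-negative tables with the given margins."""
    rows = [list(compositions(s, len(col_sums))) for s in row_sums]
    return [t for t in product(*rows)
            if [sum(col) for col in zip(*t)] == list(col_sums)]


def basic_moves(r, c):
    """Basic moves of the r x c independence model, as tuples of tuples."""
    moves = []
    for i, k in combinations(range(r), 2):
        for j, l in combinations(range(c), 2):
            m = [[0] * c for _ in range(r)]
            m[i][j] = m[k][l] = 1
            m[i][l] = m[k][j] = -1
            moves.append(tuple(map(tuple, m)))
    return moves


def is_connected(tables, moves):
    """BFS on the fiber graph: two tables are adjacent if they differ by +/- a move."""
    nodes = set(tables)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        t = queue.popleft()
        for m in moves:
            for sign in (1, -1):
                nb = tuple(tuple(x + sign * d for x, d in zip(row, mrow))
                           for row, mrow in zip(t, m))
                if nb in nodes and nb not in seen:  # stays in the fiber
                    seen.add(nb)
                    queue.append(nb)
    return len(seen) == len(nodes)
```

The check also shows why generating the lattice is not enough. On 2-by-3 tables, the two moves supported on adjacent column pairs, `((1, -1, 0), (-1, 1, 0))` and `((0, 1, -1), (0, -1, 1))`, generate the kernel lattice, yet `is_connected(fiber((1, 1), (1, 0, 1)), ...)` with only those two moves returns False, while `basic_moves(2, 3)` connects that fiber.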

Load-bearing premise

That the algebraic definition of a Markov basis can be stated and motivated without assuming the reader already knows contingency-table models or conditional inference.

What would settle it

A pure mathematician reads the definition, then cannot exhibit even one element of a Markov basis for the independence model on a 2-by-2 table, or cannot verify that it connects the two tables whose row sums are (1,1) and whose column sums are (1,1).
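
For reference, the 2-by-2 test can be worked out as below. Ordering the cells row by row as (x11, x12, x21, x22) is a convention chosen here; the first two rows of A compute row sums and the last two compute column sums.

```latex
A = \begin{pmatrix}
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1
\end{pmatrix},
\qquad
\ker_{\mathbb{Z}} A = \mathbb{Z}\, m, \quad m = (1, -1, -1, 1)

b = (1, 1, 1, 1): \quad
\mathcal{F}_b = \{\, u = (1, 0, 0, 1),\ v = (0, 1, 1, 0) \,\},
\qquad u = v + m
```

A has rank 3 (the row sums and the column sums both add up to the table total), so the kernel is one-dimensional, and the single move m joins the two tables in this fiber. In fact {m} is a Markov basis for every fiber of the 2-by-2 independence model.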

This short piece defines a Markov basis. The aim is to introduce the statistical concept to mathematicians.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a short, accurate definition note with no new results or claims.

The paper states the standard definition of a Markov basis from algebraic statistics and tries to make it readable for pure mathematicians. That is the entire contribution. It does not derive anything, prove anything, or apply the concept to new problems. The text is short and sticks to the definition plus basic context, which keeps it focused. The presentation appears clean and avoids unnecessary statistical machinery, which matches the stated goal of accessibility. No circular reasoning or hidden assumptions show up in the definitional material. The main limitation is that nothing is new. The definition is already in the literature the paper cites, and the note adds no fresh insight, example, or connection that would change how someone uses the concept. Because the work is purely expository, questions about empirical soundness or formal verification do not apply. A mathematician who has never encountered the term might find the self-contained wording useful as a first look. Anyone already reading papers in algebraic statistics will not need it. The piece is not substantial enough for a research journal. It could sit in an expository column or newsletter if the venue has one, but it does not require referee time. I would not bring it to a reading group or cite it.

Referee Report

0 major / 1 minor

Summary. The manuscript is a short expository note whose central claim is that the standard definition of a Markov basis from algebraic statistics can be stated in a self-contained manner that is accessible and meaningful to readers whose primary background is in pure mathematics rather than statistics.

Significance. If the presentation succeeds, the note provides a concise bridge between algebraic statistics and pure mathematics by making the definition of Markov bases available without requiring statistical prerequisites. The paper's strength is its explicit focus on a definitional exposition with no derivations, predictions, or fitted quantities, which aligns with the expository goal and avoids any internal inconsistency or circularity.

minor comments (1)

The abstract could more explicitly indicate the target audience (pure mathematicians) and note that the exposition is limited to the definition itself.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending acceptance. The report contains no major comments requiring a point-by-point response.

Circularity Check

0 steps flagged

No significant circularity; purely expository definition

The paper is an expository note whose sole purpose is to state the standard definition of a Markov basis from algebraic statistics in language accessible to pure mathematicians. No derivations, predictions, fitted quantities, or deductive chains exist in the manuscript. The central content is definitional rather than deductive, with no self-citations serving as load-bearing premises that reduce any claim to its own inputs by construction. The presentation is self-contained against external benchmarks and does not invoke any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no free parameters, axioms, or invented entities because it is purely expository and definitional.

pith-pipeline@v0.9.0 · 5517 in / 909 out tokens · 23069 ms · 2026-05-24T20:55:45.640331+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

A Markov basis for the model is a set of vectors $\{b_1, \dots, b_n\} \subset \ker_{\mathbb{Z}} A$ such that for every pair of non-negative integer vectors $u, v$ with $Au = Av$ there exists a choice of basis vectors satisfying $u \pm b_{i_1} \pm \dots \pm b_{i_N} = v$, with each partial sum non-negative.
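
The partial-sum condition in this passage can be checked mechanically. The sketch below assumes tables are flattened row by row into tuples and that each step is a (sign, move) pair, since moves may be added or subtracted; `is_valid_walk` is a hypothetical name.

```python
def is_valid_walk(u, v, steps):
    """Check that signed moves lead from u to v through non-negative vectors.

    u, v: flat tuples of non-negative integers with Au = Av.
    steps: sequence of (sign, b) pairs with sign in {+1, -1} and b a kernel vector.
    """
    current = list(u)
    for sign, b in steps:
        current = [x + sign * d for x, d in zip(current, b)]
        if min(current) < 0:  # a partial sum left the non-negative orthant
            return False
    return current == list(v)
```

For the 2-by-2 tables u = (1, 0, 0, 1) and v = (0, 1, 1, 0) with b = (1, -1, -1, 1), the one-step walk `[(-1, b)]` is valid, while `[(1, b)]` is not because its partial sum has negative entries.
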
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

Theorem [5]. A set of vectors is a Markov basis if and only if the corresponding set of binomials $\{x^{b_i^+} - x^{b_i^-}\}$ generates the toric ideal $I_A$.
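
As an illustration of the theorem (our example, not quoted from the paper), take the 2-by-2 independence model with cells ordered (x11, x12, x21, x22), whose integer kernel is spanned by m = (1, -1, -1, 1). Splitting m into its positive and negative parts gives a single binomial:

```latex
m = m^{+} - m^{-}, \qquad
m^{+} = (1, 0, 0, 1), \quad m^{-} = (0, 1, 1, 0)

x^{m^{+}} - x^{m^{-}} = x_{11} x_{22} - x_{12} x_{21},
\qquad
I_A = \langle\, x_{11} x_{22} - x_{12} x_{21} \,\rangle
```

The toric ideal is generated by this one 2-by-2 minor, which matches the fact that {m} alone is a Markov basis for this model.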

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.