pith. sign in

arxiv: 2503.23340 · v3 · submitted 2025-03-30 · 🧮 math.PR · cs.IT· math.CO· math.IT

Information-theoretic coordinate subset and partition selection of multivariate Markov chains via submodular optimization

Pith reviewed 2026-05-22 22:21 UTC · model grok-4.3

classification 🧮 math.PR cs.ITmath.COmath.IT
keywords multivariate Markov chainssubmodular optimizationentropy ratecoordinate selectionpartition selectiongreedy algorithmsinformation-theoretic criteria
0
0 comments X

The pith

Information-theoretic criteria for coordinate subsets and partitions in multivariate Markov chains exhibit submodular structure, enabling greedy algorithms with guarantees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes methods for selecting a subset of coordinates or an optimal partition from a finite ergodic multivariate Markov chain to optimize criteria such as entropy rate or distance to factorizability while respecting cardinality constraints. These selection tasks are cast as best-subset or best-partition problems whose objectives inherit (k-)submodular or (k-)supermodular properties, which in turn justify the use of greedy algorithms that come with approximation guarantees. A generalized distorted greedy algorithm is derived along the way. The approach is applied to standard interaction models to produce lower-dimensional chains that incur minimal information loss relative to the original process.

Core claim

We formulate these tasks as best subset or partition selection problems over multivariate Markov chains and leverage the (k-)submodular (or (k-)supermodular) structures of the objective functions to develop efficient greedy-based algorithms with theoretical guarantees. Along the way, we introduce a generalized version of the distorted greedy algorithm, which may be of independent interest.

What carries the argument

The (k-)submodular or (k-)supermodular structures of information-theoretic objectives (entropy rate, distance to factorizability, independence, stationarity) that justify greedy selection under cardinality constraints.

If this is right

  • Greedy algorithms produce subsets and partitions with provable performance bounds for the listed criteria.
  • The generalized distorted greedy algorithm supplies a new subroutine for cardinality-constrained submodular problems.
  • Numerical experiments on Bernoulli-Laplace and Curie-Weiss chains confirm that the selected projections incur limited information loss.
  • The same structural arguments apply to any finite ergodic multivariate chain once the submodularity of the chosen criterion is verified.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could be tested on empirical transition matrices estimated from time-series data in domains such as gene networks or climate variables.
  • If the submodularity persists under mild perturbations of the transition probabilities, the algorithms remain useful for noisy or estimated chains.
  • Partition selection may connect to model-reduction techniques that preserve stationarity or mixing properties beyond the four criteria studied here.

Load-bearing premise

The listed information-theoretic criteria actually possess the claimed (k-)submodular or (k-)supermodular structure for finite ergodic multivariate chains.

What would settle it

For a small multivariate chain whose state space permits exhaustive enumeration of all subsets of a given size, compare the subset returned by the greedy algorithm against the true optimum for one of the criteria such as entropy-rate minimization.

Figures

Figures reproduced from arXiv: 2503.23340 by Michael C.H. Choi, Zheyuan Lai.

Figure 1
Figure 1. Figure 1: The leave-one-out mixing analysis of the Curie-Weiss model ( [PITH_FULL_IMAGE:figures/full_fig_p027_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Empirical CDF comparison of MCMC simulation. Here, “Tensor samples” refer to the 1000 [PITH_FULL_IMAGE:figures/full_fig_p028_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Entropy rate against subset size for the three algorithms (B–L model). [PITH_FULL_IMAGE:figures/full_fig_p029_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Entropy rate against subset size for the three algorithms (C–W model). [PITH_FULL_IMAGE:figures/full_fig_p030_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Distance to factorizability against subset size for the three algorithms [PITH_FULL_IMAGE:figures/full_fig_p031_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Distance to independence against subset size for the three algorithms (B–L model). [PITH_FULL_IMAGE:figures/full_fig_p032_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Distance to independence against subset size for the three algorithms (C–W model). [PITH_FULL_IMAGE:figures/full_fig_p033_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Distance to independence of the complement set against subset size. [PITH_FULL_IMAGE:figures/full_fig_p034_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Distance to stationarity of the output against subset size. [PITH_FULL_IMAGE:figures/full_fig_p036_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Performance evaluation of the generalized distorted greedy algorithm. [PITH_FULL_IMAGE:figures/full_fig_p036_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Distance to stationarity of the complement set against subset size. [PITH_FULL_IMAGE:figures/full_fig_p037_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Performance evaluation of the batch greedy algorithm. [PITH_FULL_IMAGE:figures/full_fig_p038_12.png] view at source ↗
read the original abstract

We study the problem of optimally projecting the transition matrix of a finite ergodic multivariate Markov chain onto a lower-dimensional state space, as well as the problem of finding an optimal partition of coordinates such that the factorized Markov chain gives minimal information loss compared to the original multivariate chain. Specifically, we seek to construct a Markov chain that optimizes various information-theoretic criteria under cardinality constraints. These criteria include entropy rate, information-theoretic distance to factorizability, independence, and stationarity. We formulate these tasks as best subset or partition selection problems over multivariate Markov chains and leverage the (k-)submodular (or (k-)supermodular) structures of the objective functions to develop efficient greedy-based algorithms with theoretical guarantees. Along the way, we introduce a generalized version of the distorted greedy algorithm, which may be of independent interest. Finally, we illustrate the theory and algorithms through extensive numerical experiments with publicly available code on multivariate Markov chains associated with the Bernoulli--Laplace and Curie--Weiss models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript formulates coordinate-subset selection and coordinate-partition selection for finite ergodic multivariate Markov chains as cardinality-constrained optimization problems whose objectives are various information-theoretic functionals (entropy rate, distance to factorizability, independence, stationarity). It asserts that these functionals are (k-)submodular or (k-)supermodular, invokes the resulting greedy algorithms (including a generalized distorted-greedy procedure) to obtain approximation guarantees, and illustrates the approach on Bernoulli-Laplace and Curie-Weiss models.

Significance. If the asserted (k-)submodularity properties hold for the entropy-rate and stationarity criteria on finite ergodic chains, the work supplies a principled algorithmic framework for dimension reduction and factorization of multivariate Markov chains together with explicit approximation guarantees; the introduction of a generalized distorted-greedy algorithm is a secondary contribution that may be of independent interest.

major comments (3)
  1. [Abstract and § on objective functions] The central claim that the entropy-rate functional (and the stationarity criterion) is (k-)submodular over coordinate subsets of a finite ergodic multivariate chain is load-bearing for all theoretical guarantees. The manuscript must supply an explicit derivation or proof of the diminishing-returns inequality for the projected transition matrix; without it the invocation of the greedy algorithm and its approximation bounds is unsupported.
  2. [Abstract and § on partition selection] The same structural claim is made for the distance-to-factorizability and independence criteria under partition selection. The manuscript must verify that these functionals satisfy the (k-)supermodular or (k-)submodular inequality on the lattice of partitions; failure of the inequality for any of the four listed criteria collapses the justification for the algorithms.
  3. [§ on algorithmic development] The generalized distorted-greedy algorithm is presented as a contribution that may be of independent interest. Its statement, convergence analysis, and the precise distortion parameter must be given explicitly (including any dependence on the submodularity ratio or curvature constants of the specific information measures).
minor comments (2)
  1. [Abstract] Notation for the projected transition matrix and the restricted state space should be introduced once and used consistently; the current abstract uses several informal phrases for the same object.
  2. [Numerical experiments] The numerical experiments section should report the precise cardinality constraints used and the wall-clock times of the greedy versus exhaustive baselines so that the practical scaling can be assessed.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. We will address each major comment by providing the requested proofs and details in a revised version of the manuscript.

read point-by-point responses
  1. Referee: [Abstract and § on objective functions] The central claim that the entropy-rate functional (and the stationarity criterion) is (k-)submodular over coordinate subsets of a finite ergodic multivariate chain is load-bearing for all theoretical guarantees. The manuscript must supply an explicit derivation or proof of the diminishing-returns inequality for the projected transition matrix; without it the invocation of the greedy algorithm and its approximation bounds is unsupported.

    Authors: We agree that the submodularity of the entropy-rate and stationarity functionals is central to our claims. Upon review, we recognize that the explicit proof of the diminishing-returns inequality was not sufficiently detailed in the manuscript. In the revised version, we will add a dedicated subsection deriving the (k-)submodularity property for the projected transition matrix under these criteria, including the step-by-step verification of the inequality. revision: yes

  2. Referee: [Abstract and § on partition selection] The same structural claim is made for the distance-to-factorizability and independence criteria under partition selection. The manuscript must verify that these functionals satisfy the (k-)supermodular or (k-)submodular inequality on the lattice of partitions; failure of the inequality for any of the four listed criteria collapses the justification for the algorithms.

    Authors: We acknowledge this point. The manuscript asserts the properties but the verification on the partition lattice may require more explicit treatment. We will include proofs or verifications for the (k-)submodularity/supermodularity of the distance-to-factorizability and independence criteria in the revision to fully justify the algorithms. revision: yes

  3. Referee: [§ on algorithmic development] The generalized distorted-greedy algorithm is presented as a contribution that may be of independent interest. Its statement, convergence analysis, and the precise distortion parameter must be given explicitly (including any dependence on the submodularity ratio or curvature constants of the specific information measures).

    Authors: We will revise the algorithmic section to provide the complete statement of the generalized distorted-greedy algorithm, its convergence analysis, and the explicit expression for the distortion parameter, including how it depends on the submodularity ratio and curvature constants specific to our information measures. revision: yes

Circularity Check

0 steps flagged

No significant circularity; submodularity claims and greedy guarantees rest on external combinatorial results plus internal proofs.

full rationale

The derivation formulates coordinate subset and partition selection as cardinality-constrained optimization of entropy rate, distance to factorizability, independence and stationarity. It invokes (k-)submodular or (k-)supermodular structure to obtain guarantees via a generalized distorted greedy algorithm introduced in the paper. No quoted step reduces a claimed prediction to a fitted parameter by construction, nor does any load-bearing premise collapse to a self-citation whose content is unverified. The structural claims are presented as properties to be verified for finite ergodic chains and are not shown to be tautological with the algorithm outputs. This is the normal case of a paper whose central technical content is independent of its own fitted values or prior self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The central claim rests on the unverified assertion that the objective functions are submodular.

pith-pipeline@v0.9.0 · 5710 in / 1028 out tokens · 36485 ms · 2026-05-22T22:21:39.666025+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

  1. [1]

    Springer, 2016

    Anton Bovier and Frank Den Hollander.Metastability: a potential-theoretic approach, volume 351. Springer, 2016. 38

  2. [2]

    Choi, Youjia Wang, and Geoffrey Wolfer

    Michael C.H. Choi, Youjia Wang, and Geoffrey Wolfer. Geometry and factorization of multivariate Markov chains with applications to MCMC acceleration.arXiv preprint arXiv:2404.12589, 2024

  3. [3]

    Mehta, and Sean P

    Kun Deng, Prashant G. Mehta, and Sean P. Meyn. Optimal Kullback-Leibler aggregation via spectral theory of Markov chains.IEEE Trans. Automat. Control, 56(12):2793–2808, 2011

  4. [4]

    Streaming algorithm for monotonek-submodular maximization with cardinality constraints

    Alina Ene and Huy Nguyen. Streaming algorithm for monotonek-submodular maximization with cardinality constraints. InInternational Conference on Machine Learning, pages 5944–5967. PMLR, 2022

  5. [5]

    Sampling algorithms in statistical physics: a guide for statistics and machine learning.Statistical Science, 39(1):137–164, 2024

    Michael F Faulkner and Samuel Livingstone. Sampling algorithms in statistical physics: a guide for statistics and machine learning.Statistical Science, 39(1):137–164, 2024

  6. [6]

    Maximizing non-monotone submodular functions

    Uriel Feige, Vahab S Mirrokni, and Jan Vondr´ ak. Maximizing non-monotone submodular functions. SIAM Journal on Computing, 40(4):1133–1153, 2011

  7. [7]

    Polymatroidal dependence structure of a set of random variables.Inform

    Satoru Fujishige. Polymatroidal dependence structure of a set of random variables.Inform. and Control, 39(1):55–72, 1978

  8. [8]

    Geiger and Christoph Temmel

    Bernhard C. Geiger and Christoph Temmel. Lumpings of Markov chains, entropy rate preservation, and higher-order lumpability.Journal of Applied Probability, 51(4):1114–1132, 2014

  9. [9]

    Submodular maximization be- yond non-negativity: Guarantees, fast algorithms, and applications

    Chris Harshaw, Moran Feldman, Justin Ward, and Amin Karbasi. Submodular maximization be- yond non-negativity: Guarantees, fast algorithms, and applications. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Re- search, pages 2634–2643. PMLR, 09–15 Jun 2019

  10. [10]

    Batch greedy maximization of non-submodular func- tions: Guarantees and applications to experimental design.Journal of Machine Learning Research, 22(252):1–62, 2021

    Jayanth Jagalur-Mohan and Youssef Marzouk. Batch greedy maximization of non-submodular func- tions: Guarantees and applications to experimental design.Journal of Machine Learning Research, 22(252):1–62, 2021

  11. [11]

    Rates of convergence of some multivariate Markov chains with poly- nomial eigenfunctions.Ann

    Kshitij Khare and Hua Zhou. Rates of convergence of some multivariate Markov chains with poly- nomial eigenfunctions.Ann. Appl. Probab., 19(2):737–777, 2009

  12. [12]

    Springer, 2011

    Bernhard H Korte, Jens Vygen, B Korte, and J Vygen.Combinatorial optimization, volume 1. Springer, 2011

  13. [13]

    Submodular maximization over multiple matroids via generalized exchange properties.Mathematics of Operations Research, 35(4):795–806, 2010

    Jon Lee, Maxim Sviridenko, and Jan Vondr´ ak. Submodular maximization over multiple matroids via generalized exchange properties.Mathematics of Operations Research, 35(4):795–806, 2010

  14. [14]

    American Mathe- matical Soc., 2017

    David A Levin and Yuval Peres.Markov chains and mixing times, volume 107. American Mathe- matical Soc., 2017

  15. [15]

    G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. I.Math. Programming, 14(3):265–294, 1978

  16. [16]

    Monotone k-submodular function maximization with size con- straints

    Naoto Ohsaka and Yuichi Yoshida. Monotone k-submodular function maximization with size con- straints. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015

  17. [17]

    Cambridge University Press, 2025

    Yury Polyanskiy and Yihong Wu.Information Theory: From Coding to Learning. Cambridge University Press, 2025

  18. [18]

    Maximizingk-submodular functions and beyond.ACM Trans

    Justin Ward and Stanislav ˇZivn´ y. Maximizingk-submodular functions and beyond.ACM Trans. Algorithms, 12(4):Art. 47, 26, 2016. 39