Information-theoretic coordinate subset and partition selection of multivariate Markov chains via submodular optimization

Michael C.H. Choi; Zheyuan Lai

arxiv: 2503.23340 · v3 · submitted 2025-03-30 · 🧮 math.PR · cs.IT· math.CO· math.IT

Information-theoretic coordinate subset and partition selection of multivariate Markov chains via submodular optimization

Zheyuan Lai , Michael C.H. Choi This is my paper

Pith reviewed 2026-05-22 22:21 UTC · model grok-4.3

classification 🧮 math.PR cs.ITmath.COmath.IT

keywords multivariate Markov chainssubmodular optimizationentropy ratecoordinate selectionpartition selectiongreedy algorithmsinformation-theoretic criteria

0 comments

The pith

Information-theoretic criteria for coordinate subsets and partitions in multivariate Markov chains exhibit submodular structure, enabling greedy algorithms with guarantees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes methods for selecting a subset of coordinates or an optimal partition from a finite ergodic multivariate Markov chain to optimize criteria such as entropy rate or distance to factorizability while respecting cardinality constraints. These selection tasks are cast as best-subset or best-partition problems whose objectives inherit (k-)submodular or (k-)supermodular properties, which in turn justify the use of greedy algorithms that come with approximation guarantees. A generalized distorted greedy algorithm is derived along the way. The approach is applied to standard interaction models to produce lower-dimensional chains that incur minimal information loss relative to the original process.

Core claim

We formulate these tasks as best subset or partition selection problems over multivariate Markov chains and leverage the (k-)submodular (or (k-)supermodular) structures of the objective functions to develop efficient greedy-based algorithms with theoretical guarantees. Along the way, we introduce a generalized version of the distorted greedy algorithm, which may be of independent interest.

What carries the argument

The (k-)submodular or (k-)supermodular structures of information-theoretic objectives (entropy rate, distance to factorizability, independence, stationarity) that justify greedy selection under cardinality constraints.

If this is right

Greedy algorithms produce subsets and partitions with provable performance bounds for the listed criteria.
The generalized distorted greedy algorithm supplies a new subroutine for cardinality-constrained submodular problems.
Numerical experiments on Bernoulli-Laplace and Curie-Weiss chains confirm that the selected projections incur limited information loss.
The same structural arguments apply to any finite ergodic multivariate chain once the submodularity of the chosen criterion is verified.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be tested on empirical transition matrices estimated from time-series data in domains such as gene networks or climate variables.
If the submodularity persists under mild perturbations of the transition probabilities, the algorithms remain useful for noisy or estimated chains.
Partition selection may connect to model-reduction techniques that preserve stationarity or mixing properties beyond the four criteria studied here.

Load-bearing premise

The listed information-theoretic criteria actually possess the claimed (k-)submodular or (k-)supermodular structure for finite ergodic multivariate chains.

What would settle it

For a small multivariate chain whose state space permits exhaustive enumeration of all subsets of a given size, compare the subset returned by the greedy algorithm against the true optimum for one of the criteria such as entropy-rate minimization.

Figures

Figures reproduced from arXiv: 2503.23340 by Michael C.H. Choi, Zheyuan Lai.

**Figure 2.** Figure 2: Empirical CDF comparison of MCMC simulation. Here, “Tensor samples” refer to the 1000 [PITH_FULL_IMAGE:figures/full_fig_p028_2.png] view at source ↗

**Figure 3.** Figure 3: Entropy rate against subset size for the three algorithms (B–L model). [PITH_FULL_IMAGE:figures/full_fig_p029_3.png] view at source ↗

**Figure 4.** Figure 4: Entropy rate against subset size for the three algorithms (C–W model). [PITH_FULL_IMAGE:figures/full_fig_p030_4.png] view at source ↗

**Figure 5.** Figure 5: Distance to factorizability against subset size for the three algorithms [PITH_FULL_IMAGE:figures/full_fig_p031_5.png] view at source ↗

**Figure 6.** Figure 6: Distance to independence against subset size for the three algorithms (B–L model). [PITH_FULL_IMAGE:figures/full_fig_p032_6.png] view at source ↗

**Figure 7.** Figure 7: Distance to independence against subset size for the three algorithms (C–W model). [PITH_FULL_IMAGE:figures/full_fig_p033_7.png] view at source ↗

**Figure 8.** Figure 8: Distance to independence of the complement set against subset size. [PITH_FULL_IMAGE:figures/full_fig_p034_8.png] view at source ↗

**Figure 9.** Figure 9: Distance to stationarity of the output against subset size. [PITH_FULL_IMAGE:figures/full_fig_p036_9.png] view at source ↗

**Figure 10.** Figure 10: Performance evaluation of the generalized distorted greedy algorithm. [PITH_FULL_IMAGE:figures/full_fig_p036_10.png] view at source ↗

**Figure 11.** Figure 11: Distance to stationarity of the complement set against subset size. [PITH_FULL_IMAGE:figures/full_fig_p037_11.png] view at source ↗

**Figure 12.** Figure 12: Performance evaluation of the batch greedy algorithm. [PITH_FULL_IMAGE:figures/full_fig_p038_12.png] view at source ↗

read the original abstract

We study the problem of optimally projecting the transition matrix of a finite ergodic multivariate Markov chain onto a lower-dimensional state space, as well as the problem of finding an optimal partition of coordinates such that the factorized Markov chain gives minimal information loss compared to the original multivariate chain. Specifically, we seek to construct a Markov chain that optimizes various information-theoretic criteria under cardinality constraints. These criteria include entropy rate, information-theoretic distance to factorizability, independence, and stationarity. We formulate these tasks as best subset or partition selection problems over multivariate Markov chains and leverage the (k-)submodular (or (k-)supermodular) structures of the objective functions to develop efficient greedy-based algorithms with theoretical guarantees. Along the way, we introduce a generalized version of the distorted greedy algorithm, which may be of independent interest. Finally, we illustrate the theory and algorithms through extensive numerical experiments with publicly available code on multivariate Markov chains associated with the Bernoulli--Laplace and Curie--Weiss models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper sets up coordinate and partition selection for multivariate Markov chains as submodular problems and adds a generalized greedy algorithm, but the load-bearing submodularity claim for the information criteria is not shown in the abstract.

read the letter

The paper's core move is to recast projection of a finite ergodic multivariate Markov chain onto fewer coordinates, and the search for a good partition of coordinates, as cardinality-constrained selection problems. The objectives are entropy rate, distance to factorizability, independence, and stationarity. The authors claim these functions are (k-)submodular or (k-)supermodular, which lets them invoke greedy algorithms with approximation guarantees, and they introduce a generalized distorted greedy procedure that may stand on its own. They close with experiments on the Bernoulli-Laplace and Curie-Weiss models plus released code. That last part is straightforward to check and gives a practical sense of performance on two standard examples. The formulation itself is a direct transfer of existing submodular machinery to this setting rather than a conceptual break. The main uncertainty is whether the claimed submodularity actually holds once the transition matrix is restricted to a coordinate subset or the chain is factorized by a partition. The abstract states the property without a derivation or even a short argument, so the theoretical guarantees rest on an unverified structural fact. If the diminishing-returns inequality fails for the entropy-rate functional under projection, the justification for the algorithms is gone even if the numerical runs look reasonable. This is the point that needs direct inspection in the full text. The work is aimed at the small group already comfortable with both multivariate Markov chains and submodular combinatorial optimization. A reader in that overlap may find the generalized algorithm or the specific criteria worth looking at, but the paper will not move the broader literature on stochastic processes. I would send it to peer review so referees can confirm whether the submodularity holds and whether the algorithm extension is substantive.

Referee Report

3 major / 2 minor

Summary. The manuscript formulates coordinate-subset selection and coordinate-partition selection for finite ergodic multivariate Markov chains as cardinality-constrained optimization problems whose objectives are various information-theoretic functionals (entropy rate, distance to factorizability, independence, stationarity). It asserts that these functionals are (k-)submodular or (k-)supermodular, invokes the resulting greedy algorithms (including a generalized distorted-greedy procedure) to obtain approximation guarantees, and illustrates the approach on Bernoulli-Laplace and Curie-Weiss models.

Significance. If the asserted (k-)submodularity properties hold for the entropy-rate and stationarity criteria on finite ergodic chains, the work supplies a principled algorithmic framework for dimension reduction and factorization of multivariate Markov chains together with explicit approximation guarantees; the introduction of a generalized distorted-greedy algorithm is a secondary contribution that may be of independent interest.

major comments (3)

[Abstract and § on objective functions] The central claim that the entropy-rate functional (and the stationarity criterion) is (k-)submodular over coordinate subsets of a finite ergodic multivariate chain is load-bearing for all theoretical guarantees. The manuscript must supply an explicit derivation or proof of the diminishing-returns inequality for the projected transition matrix; without it the invocation of the greedy algorithm and its approximation bounds is unsupported.
[Abstract and § on partition selection] The same structural claim is made for the distance-to-factorizability and independence criteria under partition selection. The manuscript must verify that these functionals satisfy the (k-)supermodular or (k-)submodular inequality on the lattice of partitions; failure of the inequality for any of the four listed criteria collapses the justification for the algorithms.
[§ on algorithmic development] The generalized distorted-greedy algorithm is presented as a contribution that may be of independent interest. Its statement, convergence analysis, and the precise distortion parameter must be given explicitly (including any dependence on the submodularity ratio or curvature constants of the specific information measures).

minor comments (2)

[Abstract] Notation for the projected transition matrix and the restricted state space should be introduced once and used consistently; the current abstract uses several informal phrases for the same object.
[Numerical experiments] The numerical experiments section should report the precise cardinality constraints used and the wall-clock times of the greedy versus exhaustive baselines so that the practical scaling can be assessed.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. We will address each major comment by providing the requested proofs and details in a revised version of the manuscript.

read point-by-point responses

Referee: [Abstract and § on objective functions] The central claim that the entropy-rate functional (and the stationarity criterion) is (k-)submodular over coordinate subsets of a finite ergodic multivariate chain is load-bearing for all theoretical guarantees. The manuscript must supply an explicit derivation or proof of the diminishing-returns inequality for the projected transition matrix; without it the invocation of the greedy algorithm and its approximation bounds is unsupported.

Authors: We agree that the submodularity of the entropy-rate and stationarity functionals is central to our claims. Upon review, we recognize that the explicit proof of the diminishing-returns inequality was not sufficiently detailed in the manuscript. In the revised version, we will add a dedicated subsection deriving the (k-)submodularity property for the projected transition matrix under these criteria, including the step-by-step verification of the inequality. revision: yes
Referee: [Abstract and § on partition selection] The same structural claim is made for the distance-to-factorizability and independence criteria under partition selection. The manuscript must verify that these functionals satisfy the (k-)supermodular or (k-)submodular inequality on the lattice of partitions; failure of the inequality for any of the four listed criteria collapses the justification for the algorithms.

Authors: We acknowledge this point. The manuscript asserts the properties but the verification on the partition lattice may require more explicit treatment. We will include proofs or verifications for the (k-)submodularity/supermodularity of the distance-to-factorizability and independence criteria in the revision to fully justify the algorithms. revision: yes
Referee: [§ on algorithmic development] The generalized distorted-greedy algorithm is presented as a contribution that may be of independent interest. Its statement, convergence analysis, and the precise distortion parameter must be given explicitly (including any dependence on the submodularity ratio or curvature constants of the specific information measures).

Authors: We will revise the algorithmic section to provide the complete statement of the generalized distorted-greedy algorithm, its convergence analysis, and the explicit expression for the distortion parameter, including how it depends on the submodularity ratio and curvature constants specific to our information measures. revision: yes

Circularity Check

0 steps flagged

No significant circularity; submodularity claims and greedy guarantees rest on external combinatorial results plus internal proofs.

full rationale

The derivation formulates coordinate subset and partition selection as cardinality-constrained optimization of entropy rate, distance to factorizability, independence and stationarity. It invokes (k-)submodular or (k-)supermodular structure to obtain guarantees via a generalized distorted greedy algorithm introduced in the paper. No quoted step reduces a claimed prediction to a fitted parameter by construction, nor does any load-bearing premise collapse to a self-citation whose content is unverified. The structural claims are presented as properties to be verified for finite ergodic chains and are not shown to be tautological with the algorithm outputs. This is the normal case of a paper whose central technical content is independent of its own fitted values or prior self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The central claim rests on the unverified assertion that the objective functions are submodular.

pith-pipeline@v0.9.0 · 5710 in / 1028 out tokens · 36485 ms · 2026-05-22T22:21:39.666025+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We show that several natural set functionals arising from the above information-theoretic distances exhibit a diminishing returns (i.e. submodular) or increasing returns (i.e. supermodular) structure
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the mapping S ↦ H(P^(S)) is submodular

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

[1]

Springer, 2016

Anton Bovier and Frank Den Hollander.Metastability: a potential-theoretic approach, volume 351. Springer, 2016. 38

work page 2016
[2]

Choi, Youjia Wang, and Geoffrey Wolfer

Michael C.H. Choi, Youjia Wang, and Geoffrey Wolfer. Geometry and factorization of multivariate Markov chains with applications to MCMC acceleration.arXiv preprint arXiv:2404.12589, 2024

work page internal anchor Pith review arXiv 2024
[3]

Mehta, and Sean P

Kun Deng, Prashant G. Mehta, and Sean P. Meyn. Optimal Kullback-Leibler aggregation via spectral theory of Markov chains.IEEE Trans. Automat. Control, 56(12):2793–2808, 2011

work page 2011
[4]

Streaming algorithm for monotonek-submodular maximization with cardinality constraints

Alina Ene and Huy Nguyen. Streaming algorithm for monotonek-submodular maximization with cardinality constraints. InInternational Conference on Machine Learning, pages 5944–5967. PMLR, 2022

work page 2022
[5]

Sampling algorithms in statistical physics: a guide for statistics and machine learning.Statistical Science, 39(1):137–164, 2024

Michael F Faulkner and Samuel Livingstone. Sampling algorithms in statistical physics: a guide for statistics and machine learning.Statistical Science, 39(1):137–164, 2024

work page 2024
[6]

Maximizing non-monotone submodular functions

Uriel Feige, Vahab S Mirrokni, and Jan Vondr´ ak. Maximizing non-monotone submodular functions. SIAM Journal on Computing, 40(4):1133–1153, 2011

work page 2011
[7]

Polymatroidal dependence structure of a set of random variables.Inform

Satoru Fujishige. Polymatroidal dependence structure of a set of random variables.Inform. and Control, 39(1):55–72, 1978

work page 1978
[8]

Geiger and Christoph Temmel

Bernhard C. Geiger and Christoph Temmel. Lumpings of Markov chains, entropy rate preservation, and higher-order lumpability.Journal of Applied Probability, 51(4):1114–1132, 2014

work page 2014
[9]

Submodular maximization be- yond non-negativity: Guarantees, fast algorithms, and applications

Chris Harshaw, Moran Feldman, Justin Ward, and Amin Karbasi. Submodular maximization be- yond non-negativity: Guarantees, fast algorithms, and applications. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Re- search, pages 2634–2643. PMLR, 09–15 Jun 2019

work page 2019
[10]

Batch greedy maximization of non-submodular func- tions: Guarantees and applications to experimental design.Journal of Machine Learning Research, 22(252):1–62, 2021

Jayanth Jagalur-Mohan and Youssef Marzouk. Batch greedy maximization of non-submodular func- tions: Guarantees and applications to experimental design.Journal of Machine Learning Research, 22(252):1–62, 2021

work page 2021
[11]

Rates of convergence of some multivariate Markov chains with poly- nomial eigenfunctions.Ann

Kshitij Khare and Hua Zhou. Rates of convergence of some multivariate Markov chains with poly- nomial eigenfunctions.Ann. Appl. Probab., 19(2):737–777, 2009

work page 2009
[12]

Springer, 2011

Bernhard H Korte, Jens Vygen, B Korte, and J Vygen.Combinatorial optimization, volume 1. Springer, 2011

work page 2011
[13]

Submodular maximization over multiple matroids via generalized exchange properties.Mathematics of Operations Research, 35(4):795–806, 2010

Jon Lee, Maxim Sviridenko, and Jan Vondr´ ak. Submodular maximization over multiple matroids via generalized exchange properties.Mathematics of Operations Research, 35(4):795–806, 2010

work page 2010
[14]

American Mathe- matical Soc., 2017

David A Levin and Yuval Peres.Markov chains and mixing times, volume 107. American Mathe- matical Soc., 2017

work page 2017
[15]

G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. I.Math. Programming, 14(3):265–294, 1978

work page 1978
[16]

Monotone k-submodular function maximization with size con- straints

Naoto Ohsaka and Yuichi Yoshida. Monotone k-submodular function maximization with size con- straints. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015

work page 2015
[17]

Cambridge University Press, 2025

Yury Polyanskiy and Yihong Wu.Information Theory: From Coding to Learning. Cambridge University Press, 2025

work page 2025
[18]

Maximizingk-submodular functions and beyond.ACM Trans

Justin Ward and Stanislav ˇZivn´ y. Maximizingk-submodular functions and beyond.ACM Trans. Algorithms, 12(4):Art. 47, 26, 2016. 39

work page 2016

[1] [1]

Springer, 2016

Anton Bovier and Frank Den Hollander.Metastability: a potential-theoretic approach, volume 351. Springer, 2016. 38

work page 2016

[2] [2]

Choi, Youjia Wang, and Geoffrey Wolfer

Michael C.H. Choi, Youjia Wang, and Geoffrey Wolfer. Geometry and factorization of multivariate Markov chains with applications to MCMC acceleration.arXiv preprint arXiv:2404.12589, 2024

work page internal anchor Pith review arXiv 2024

[3] [3]

Mehta, and Sean P

Kun Deng, Prashant G. Mehta, and Sean P. Meyn. Optimal Kullback-Leibler aggregation via spectral theory of Markov chains.IEEE Trans. Automat. Control, 56(12):2793–2808, 2011

work page 2011

[4] [4]

Streaming algorithm for monotonek-submodular maximization with cardinality constraints

Alina Ene and Huy Nguyen. Streaming algorithm for monotonek-submodular maximization with cardinality constraints. InInternational Conference on Machine Learning, pages 5944–5967. PMLR, 2022

work page 2022

[5] [5]

Sampling algorithms in statistical physics: a guide for statistics and machine learning.Statistical Science, 39(1):137–164, 2024

Michael F Faulkner and Samuel Livingstone. Sampling algorithms in statistical physics: a guide for statistics and machine learning.Statistical Science, 39(1):137–164, 2024

work page 2024

[6] [6]

Maximizing non-monotone submodular functions

Uriel Feige, Vahab S Mirrokni, and Jan Vondr´ ak. Maximizing non-monotone submodular functions. SIAM Journal on Computing, 40(4):1133–1153, 2011

work page 2011

[7] [7]

Polymatroidal dependence structure of a set of random variables.Inform

Satoru Fujishige. Polymatroidal dependence structure of a set of random variables.Inform. and Control, 39(1):55–72, 1978

work page 1978

[8] [8]

Geiger and Christoph Temmel

Bernhard C. Geiger and Christoph Temmel. Lumpings of Markov chains, entropy rate preservation, and higher-order lumpability.Journal of Applied Probability, 51(4):1114–1132, 2014

work page 2014

[9] [9]

Submodular maximization be- yond non-negativity: Guarantees, fast algorithms, and applications

Chris Harshaw, Moran Feldman, Justin Ward, and Amin Karbasi. Submodular maximization be- yond non-negativity: Guarantees, fast algorithms, and applications. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Re- search, pages 2634–2643. PMLR, 09–15 Jun 2019

work page 2019

[10] [10]

Batch greedy maximization of non-submodular func- tions: Guarantees and applications to experimental design.Journal of Machine Learning Research, 22(252):1–62, 2021

Jayanth Jagalur-Mohan and Youssef Marzouk. Batch greedy maximization of non-submodular func- tions: Guarantees and applications to experimental design.Journal of Machine Learning Research, 22(252):1–62, 2021

work page 2021

[11] [11]

Rates of convergence of some multivariate Markov chains with poly- nomial eigenfunctions.Ann

Kshitij Khare and Hua Zhou. Rates of convergence of some multivariate Markov chains with poly- nomial eigenfunctions.Ann. Appl. Probab., 19(2):737–777, 2009

work page 2009

[12] [12]

Springer, 2011

Bernhard H Korte, Jens Vygen, B Korte, and J Vygen.Combinatorial optimization, volume 1. Springer, 2011

work page 2011

[13] [13]

Submodular maximization over multiple matroids via generalized exchange properties.Mathematics of Operations Research, 35(4):795–806, 2010

Jon Lee, Maxim Sviridenko, and Jan Vondr´ ak. Submodular maximization over multiple matroids via generalized exchange properties.Mathematics of Operations Research, 35(4):795–806, 2010

work page 2010

[14] [14]

American Mathe- matical Soc., 2017

David A Levin and Yuval Peres.Markov chains and mixing times, volume 107. American Mathe- matical Soc., 2017

work page 2017

[15] [15]

G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. I.Math. Programming, 14(3):265–294, 1978

work page 1978

[16] [16]

Monotone k-submodular function maximization with size con- straints

Naoto Ohsaka and Yuichi Yoshida. Monotone k-submodular function maximization with size con- straints. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015

work page 2015

[17] [17]

Cambridge University Press, 2025

Yury Polyanskiy and Yihong Wu.Information Theory: From Coding to Learning. Cambridge University Press, 2025

work page 2025

[18] [18]

Maximizingk-submodular functions and beyond.ACM Trans

Justin Ward and Stanislav ˇZivn´ y. Maximizingk-submodular functions and beyond.ACM Trans. Algorithms, 12(4):Art. 47, 26, 2016. 39

work page 2016