Information-theoretic coordinate subset and partition selection of multivariate Markov chains via submodular optimization
Pith reviewed 2026-05-22 22:21 UTC · model grok-4.3
The pith
Information-theoretic criteria for coordinate subsets and partitions in multivariate Markov chains exhibit submodular structure, enabling greedy algorithms with guarantees.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We formulate these tasks as best subset or partition selection problems over multivariate Markov chains and leverage the (k-)submodular (or (k-)supermodular) structures of the objective functions to develop efficient greedy-based algorithms with theoretical guarantees. Along the way, we introduce a generalized version of the distorted greedy algorithm, which may be of independent interest.
What carries the argument
The (k-)submodular or (k-)supermodular structures of information-theoretic objectives (entropy rate, distance to factorizability, independence, stationarity) that justify greedy selection under cardinality constraints.
If this is right
- Greedy algorithms produce subsets and partitions with provable performance bounds for the listed criteria.
- The generalized distorted greedy algorithm supplies a new subroutine for cardinality-constrained submodular problems.
- Numerical experiments on Bernoulli-Laplace and Curie-Weiss chains confirm that the selected projections incur limited information loss.
- The same structural arguments apply to any finite ergodic multivariate chain once the submodularity of the chosen criterion is verified.
Where Pith is reading between the lines
- The framework could be tested on empirical transition matrices estimated from time-series data in domains such as gene networks or climate variables.
- If the submodularity persists under mild perturbations of the transition probabilities, the algorithms remain useful for noisy or estimated chains.
- Partition selection may connect to model-reduction techniques that preserve stationarity or mixing properties beyond the four criteria studied here.
Load-bearing premise
The listed information-theoretic criteria actually possess the claimed (k-)submodular or (k-)supermodular structure for finite ergodic multivariate chains.
What would settle it
For a small multivariate chain whose state space permits exhaustive enumeration of all subsets of a given size, compare the subset returned by the greedy algorithm against the true optimum for one of the criteria such as entropy-rate minimization.
Figures
read the original abstract
We study the problem of optimally projecting the transition matrix of a finite ergodic multivariate Markov chain onto a lower-dimensional state space, as well as the problem of finding an optimal partition of coordinates such that the factorized Markov chain gives minimal information loss compared to the original multivariate chain. Specifically, we seek to construct a Markov chain that optimizes various information-theoretic criteria under cardinality constraints. These criteria include entropy rate, information-theoretic distance to factorizability, independence, and stationarity. We formulate these tasks as best subset or partition selection problems over multivariate Markov chains and leverage the (k-)submodular (or (k-)supermodular) structures of the objective functions to develop efficient greedy-based algorithms with theoretical guarantees. Along the way, we introduce a generalized version of the distorted greedy algorithm, which may be of independent interest. Finally, we illustrate the theory and algorithms through extensive numerical experiments with publicly available code on multivariate Markov chains associated with the Bernoulli--Laplace and Curie--Weiss models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript formulates coordinate-subset selection and coordinate-partition selection for finite ergodic multivariate Markov chains as cardinality-constrained optimization problems whose objectives are various information-theoretic functionals (entropy rate, distance to factorizability, independence, stationarity). It asserts that these functionals are (k-)submodular or (k-)supermodular, invokes the resulting greedy algorithms (including a generalized distorted-greedy procedure) to obtain approximation guarantees, and illustrates the approach on Bernoulli-Laplace and Curie-Weiss models.
Significance. If the asserted (k-)submodularity properties hold for the entropy-rate and stationarity criteria on finite ergodic chains, the work supplies a principled algorithmic framework for dimension reduction and factorization of multivariate Markov chains together with explicit approximation guarantees; the introduction of a generalized distorted-greedy algorithm is a secondary contribution that may be of independent interest.
major comments (3)
- [Abstract and § on objective functions] The central claim that the entropy-rate functional (and the stationarity criterion) is (k-)submodular over coordinate subsets of a finite ergodic multivariate chain is load-bearing for all theoretical guarantees. The manuscript must supply an explicit derivation or proof of the diminishing-returns inequality for the projected transition matrix; without it the invocation of the greedy algorithm and its approximation bounds is unsupported.
- [Abstract and § on partition selection] The same structural claim is made for the distance-to-factorizability and independence criteria under partition selection. The manuscript must verify that these functionals satisfy the (k-)supermodular or (k-)submodular inequality on the lattice of partitions; failure of the inequality for any of the four listed criteria collapses the justification for the algorithms.
- [§ on algorithmic development] The generalized distorted-greedy algorithm is presented as a contribution that may be of independent interest. Its statement, convergence analysis, and the precise distortion parameter must be given explicitly (including any dependence on the submodularity ratio or curvature constants of the specific information measures).
minor comments (2)
- [Abstract] Notation for the projected transition matrix and the restricted state space should be introduced once and used consistently; the current abstract uses several informal phrases for the same object.
- [Numerical experiments] The numerical experiments section should report the precise cardinality constraints used and the wall-clock times of the greedy versus exhaustive baselines so that the practical scaling can be assessed.
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable suggestions. We will address each major comment by providing the requested proofs and details in a revised version of the manuscript.
read point-by-point responses
-
Referee: [Abstract and § on objective functions] The central claim that the entropy-rate functional (and the stationarity criterion) is (k-)submodular over coordinate subsets of a finite ergodic multivariate chain is load-bearing for all theoretical guarantees. The manuscript must supply an explicit derivation or proof of the diminishing-returns inequality for the projected transition matrix; without it the invocation of the greedy algorithm and its approximation bounds is unsupported.
Authors: We agree that the submodularity of the entropy-rate and stationarity functionals is central to our claims. Upon review, we recognize that the explicit proof of the diminishing-returns inequality was not sufficiently detailed in the manuscript. In the revised version, we will add a dedicated subsection deriving the (k-)submodularity property for the projected transition matrix under these criteria, including the step-by-step verification of the inequality. revision: yes
-
Referee: [Abstract and § on partition selection] The same structural claim is made for the distance-to-factorizability and independence criteria under partition selection. The manuscript must verify that these functionals satisfy the (k-)supermodular or (k-)submodular inequality on the lattice of partitions; failure of the inequality for any of the four listed criteria collapses the justification for the algorithms.
Authors: We acknowledge this point. The manuscript asserts the properties but the verification on the partition lattice may require more explicit treatment. We will include proofs or verifications for the (k-)submodularity/supermodularity of the distance-to-factorizability and independence criteria in the revision to fully justify the algorithms. revision: yes
-
Referee: [§ on algorithmic development] The generalized distorted-greedy algorithm is presented as a contribution that may be of independent interest. Its statement, convergence analysis, and the precise distortion parameter must be given explicitly (including any dependence on the submodularity ratio or curvature constants of the specific information measures).
Authors: We will revise the algorithmic section to provide the complete statement of the generalized distorted-greedy algorithm, its convergence analysis, and the explicit expression for the distortion parameter, including how it depends on the submodularity ratio and curvature constants specific to our information measures. revision: yes
Circularity Check
No significant circularity; submodularity claims and greedy guarantees rest on external combinatorial results plus internal proofs.
full rationale
The derivation formulates coordinate subset and partition selection as cardinality-constrained optimization of entropy rate, distance to factorizability, independence and stationarity. It invokes (k-)submodular or (k-)supermodular structure to obtain guarantees via a generalized distorted greedy algorithm introduced in the paper. No quoted step reduces a claimed prediction to a fitted parameter by construction, nor does any load-bearing premise collapse to a self-citation whose content is unverified. The structural claims are presented as properties to be verified for finite ergodic chains and are not shown to be tautological with the algorithm outputs. This is the normal case of a paper whose central technical content is independent of its own fitted values or prior self-citations.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We show that several natural set functionals arising from the above information-theoretic distances exhibit a diminishing returns (i.e. submodular) or increasing returns (i.e. supermodular) structure
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanabsolute_floor_iff_bare_distinguishability unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
the mapping S ↦ H(P^(S)) is submodular
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Anton Bovier and Frank Den Hollander.Metastability: a potential-theoretic approach, volume 351. Springer, 2016. 38
work page 2016
-
[2]
Choi, Youjia Wang, and Geoffrey Wolfer
Michael C.H. Choi, Youjia Wang, and Geoffrey Wolfer. Geometry and factorization of multivariate Markov chains with applications to MCMC acceleration.arXiv preprint arXiv:2404.12589, 2024
work page internal anchor Pith review arXiv 2024
-
[3]
Kun Deng, Prashant G. Mehta, and Sean P. Meyn. Optimal Kullback-Leibler aggregation via spectral theory of Markov chains.IEEE Trans. Automat. Control, 56(12):2793–2808, 2011
work page 2011
-
[4]
Streaming algorithm for monotonek-submodular maximization with cardinality constraints
Alina Ene and Huy Nguyen. Streaming algorithm for monotonek-submodular maximization with cardinality constraints. InInternational Conference on Machine Learning, pages 5944–5967. PMLR, 2022
work page 2022
-
[5]
Michael F Faulkner and Samuel Livingstone. Sampling algorithms in statistical physics: a guide for statistics and machine learning.Statistical Science, 39(1):137–164, 2024
work page 2024
-
[6]
Maximizing non-monotone submodular functions
Uriel Feige, Vahab S Mirrokni, and Jan Vondr´ ak. Maximizing non-monotone submodular functions. SIAM Journal on Computing, 40(4):1133–1153, 2011
work page 2011
-
[7]
Polymatroidal dependence structure of a set of random variables.Inform
Satoru Fujishige. Polymatroidal dependence structure of a set of random variables.Inform. and Control, 39(1):55–72, 1978
work page 1978
-
[8]
Bernhard C. Geiger and Christoph Temmel. Lumpings of Markov chains, entropy rate preservation, and higher-order lumpability.Journal of Applied Probability, 51(4):1114–1132, 2014
work page 2014
-
[9]
Submodular maximization be- yond non-negativity: Guarantees, fast algorithms, and applications
Chris Harshaw, Moran Feldman, Justin Ward, and Amin Karbasi. Submodular maximization be- yond non-negativity: Guarantees, fast algorithms, and applications. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Re- search, pages 2634–2643. PMLR, 09–15 Jun 2019
work page 2019
-
[10]
Jayanth Jagalur-Mohan and Youssef Marzouk. Batch greedy maximization of non-submodular func- tions: Guarantees and applications to experimental design.Journal of Machine Learning Research, 22(252):1–62, 2021
work page 2021
-
[11]
Rates of convergence of some multivariate Markov chains with poly- nomial eigenfunctions.Ann
Kshitij Khare and Hua Zhou. Rates of convergence of some multivariate Markov chains with poly- nomial eigenfunctions.Ann. Appl. Probab., 19(2):737–777, 2009
work page 2009
-
[12]
Bernhard H Korte, Jens Vygen, B Korte, and J Vygen.Combinatorial optimization, volume 1. Springer, 2011
work page 2011
-
[13]
Jon Lee, Maxim Sviridenko, and Jan Vondr´ ak. Submodular maximization over multiple matroids via generalized exchange properties.Mathematics of Operations Research, 35(4):795–806, 2010
work page 2010
-
[14]
American Mathe- matical Soc., 2017
David A Levin and Yuval Peres.Markov chains and mixing times, volume 107. American Mathe- matical Soc., 2017
work page 2017
-
[15]
G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. I.Math. Programming, 14(3):265–294, 1978
work page 1978
-
[16]
Monotone k-submodular function maximization with size con- straints
Naoto Ohsaka and Yuichi Yoshida. Monotone k-submodular function maximization with size con- straints. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015
work page 2015
-
[17]
Cambridge University Press, 2025
Yury Polyanskiy and Yihong Wu.Information Theory: From Coding to Learning. Cambridge University Press, 2025
work page 2025
-
[18]
Maximizingk-submodular functions and beyond.ACM Trans
Justin Ward and Stanislav ˇZivn´ y. Maximizingk-submodular functions and beyond.ACM Trans. Algorithms, 12(4):Art. 47, 26, 2016. 39
work page 2016
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.