Many (most?) column subset selection criteria are NP hard for a few columns
Pith reviewed 2026-05-18 00:46 UTC · model grok-4.3
The pith
Selecting few representative columns from a matrix under many common criteria is NP-hard.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show that the optimization problems of absolute volume and S-optimality maximization, norm and pseudo-inverse norm minimization, condition number minimization, stable rank maximization, and relative volume maximization are NP-hard for selecting k representative columns from a real matrix A when 1 ≤ k ≤ min{rank(A), m/3}. Many of these problems do not admit polynomial-time approximation schemes. The proofs rely on reducing the problems to decision versions using explicit expressions for the optimal values and for partitioned pseudo-inverses.
What carries the argument
Expressions for partitioned pseudo-inverses that convert the column-subset optimization problems into decision problems in the small-k regime.
If this is right
- The Frobenius-norm pseudo-inverse minimization result directly applies to trace minimization in A-optimal experimental design.
- Exact solutions for volume, condition, or norm criteria become computationally intractable once k reaches even a modest fraction of the row dimension.
- Heuristic or randomized column selection methods are required in practice for the covered criteria when exact optimality is desired.
- The relative-volume criterion, being inversely related to a power of the condition number, inherits the same hardness.
Where Pith is reading between the lines
- Similar hardness may hold for larger k, but the partitioned-inverse technique used here is tailored to the small-k case.
- The results suggest that greedy or sampling-based algorithms should be analyzed for their worst-case guarantees on these specific objectives.
- Connections to feature selection in machine learning may allow transfer of these NP-hardness statements to high-dimensional data matrices.
- Checking whether known approximation algorithms for related problems achieve the claimed absence of PTAS would test the tightness of the negative results.
Load-bearing premise
The hardness proofs apply only when the number of columns to select is at most one-third the number of rows.
What would settle it
A polynomial-time algorithm, or a PTAS for one of the listed criteria, that works for all matrices under the k ≤ min{rank(A), m/3} restriction would refute the claimed hardness.
read the original abstract
We consider a variety of criteria for selecting k representative columns from a real mxn matrix A, when sufficiently few columns are required, i.e., 1<= k<= min{rank(A), m/3}. The criteria include the following optimization problems: absolute volume and S-optimality maximization; norm, pseudo-inverse norm, and condition minimization number in the two-norm, Frobenius norm and Schatten p-norms for p>2; stable rank maximization; and the new criterion of relative volume maximization, which is inversely proportional to a power of the condition number. We show that these criteria are NP hard and many do not admit polynomial time approximation schemes (PTAS). To formulate the optimization problems as decision problems, we derive optimal values for the subset selection criteria, as well as expressions for partitioned pseudo-inverses. The results for minimization of the pseudo-inverse in the Frobenius norm are applicable to trace optimization in A-optimal design.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proves NP-hardness (and lack of PTAS for many cases) for a collection of column-subset selection criteria on an m×n matrix A in the regime 1 ≤ k ≤ min{rank(A), m/3}. The criteria include absolute-volume and S-optimality maximization; minimization of the matrix or its pseudoinverse in the 2-norm, Frobenius norm, and Schatten p-norms (p>2); condition-number minimization; stable-rank maximization; and the new relative-volume criterion (inversely proportional to a power of the condition number). Hardness is obtained by constructing decision versions whose yes/no thresholds are set to explicit optimal values derived from closed-form expressions for the partitioned pseudoinverse of the selected k-column submatrix; the results are also noted to apply to trace optimization in A-optimal design.
Significance. If the derivations hold, the work shows that exact or approximate optimization under these standard and novel criteria remains intractable even when only a few columns are selected, which is relevant to column-based low-rank approximation, experimental design, and numerical linear algebra. The explicit optimal-value formulas and the partitioned-pseudoinverse identities constitute a concrete technical contribution that could support future heuristic or exact algorithms for restricted instances.
major comments (1)
- [Derivations of optimal values and partitioned pseudoinverses] The section deriving optimal values via partitioned pseudoinverses (the core of all reductions): the decision-problem thresholds for every criterion (volume, ||A_S^+||_F, condition number, etc.) are set equal to the closed-form optimum obtained from the partitioned-pseudoinverse expressions that hold only for k ≤ m/3. An algebraic slip in any of those expressions (e.g., an omitted Schur-complement term when rank(A_S)=k or an incorrect factor in the Frobenius-norm identity) would make the claimed optimal value false on the constructed instances and collapse the equivalence to the source NP-hard problem.
minor comments (2)
- [Abstract and main results] The abstract states that 'many do not admit PTAS' but does not list which criteria do and do not; the main text should contain an explicit theorem or table enumerating the PTAS status for each criterion together with the corresponding hardness proofs.
- [Introduction of relative volume] Notation for the new relative-volume criterion should be introduced with a displayed equation before it is used in the hardness statements.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for identifying the central role of the partitioned-pseudoinverse derivations in all of our hardness reductions. We address the major comment below.
read point-by-point responses
-
Referee: The section deriving optimal values via partitioned pseudoinverses (the core of all reductions): the decision-problem thresholds for every criterion (volume, ||A_S^+||_F, condition number, etc.) are set equal to the closed-form optimum obtained from the partitioned-pseudoinverse expressions that hold only for k ≤ m/3. An algebraic slip in any of those expressions (e.g., an omitted Schur-complement term when rank(A_S)=k or an incorrect factor in the Frobenius-norm identity) would make the claimed optimal value false on the constructed instances and collapse the equivalence to the source NP-hard problem.
Authors: We appreciate the referee highlighting the sensitivity of the reductions to these algebraic identities. The partitioned-pseudoinverse formulas were obtained by applying the block-matrix inversion lemma to A = [A_S A_rest] under the standing assumption that rank(A_S) = k (which is enforced by the construction and the bound k ≤ m/3). The Schur complement A_rest^T (I - P_{A_S}) A_rest appears explicitly in the derivation of A_S^+ and is retained in every subsequent norm and condition-number expression; no terms were omitted. The Frobenius-norm identity ||A_S^+||_F^2 = trace((A_S^T A_S)^{-1}) follows directly from the definition of the Moore-Penrose inverse for full-column-rank matrices and was cross-checked against the explicit optimal value used for the decision threshold. The same verification was performed for the 2-norm, Schatten-p (p > 2), condition-number, volume, and relative-volume criteria. Because the referee's concern is well-founded in principle, we will add a self-contained appendix that reproduces every intermediate matrix-algebra step for the partitioned pseudoinverse and the resulting closed-form optima. This will make the derivations independently verifiable without altering the main text. revision: partial
Circularity Check
NP-hardness via reductions from known problems and standard matrix identities is self-contained
full rationale
The paper proves NP-hardness of multiple column subset selection criteria by constructing explicit reductions from established NP-hard problems and deriving the required optimal values of each criterion using algebraic expressions for partitioned pseudo-inverses that hold under the stated regime k ≤ min{rank(A), m/3}. These steps rely on standard linear-algebra identities rather than any self-referential definition, fitted parameter renamed as a prediction, or load-bearing self-citation. The derivation chain is therefore independent of the target result and remains falsifiable against external complexity and matrix-theory benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math P ≠ NP
- domain assumption Standard algebraic identities for partitioned pseudo-inverses and matrix norms hold for real matrices.
Reference graph
Works this paper leans on
-
[1]
R. Armstrong andA. Damle,Collect, commit, expand: Efficient CPQR-based column selection for extremely wide matrices, 2025. arXiv:2501.18035
-
[2]
H. Avron andC. Boutsidis,Faster subset selection for matrices and applications, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 1464–1499
work page 2013
-
[3]
O. M. Baksalary andG. Trenkler,On formulae for the Moore-Penrose inverse of a columnwise partitioned matrix, Appl. Math. Comput., 403 (2021), pp. Paper No. 125913, 10. 26
work page 2021
-
[4]
R. Bhatia,Matrix analysis, vol. 169 of Graduate Texts in Mathematics, Springer- Verlag, New York, 1997
work page 1997
-
[5]
C. Boutsidis, P. Drineas,andM. Magdon-Ismail,Near-optimal column-based matrix reconstruction, SIAM J. Comput., 43 (2014), pp. 687–717
work page 2014
-
[6]
Çivril,Column subset selection problem is UG-hard, J
A. Çivril,Column subset selection problem is UG-hard, J. Comput. and System Sci., 80 (2014), pp. 849–859
work page 2014
-
[7]
A. Cegielski,Obtuse cones and Gram matrices with non-negative inverse, Linear Algebra Appl., 335 (2001), pp. 167–181
work page 2001
-
[8]
S. Chandrasekaran andI. C. F. Ipsen,On rank-revealing factorisations, SIAM J. Matrix Anal. Appl., 15 (1994), pp. 592–622
work page 1994
-
[9]
A. Çivril andM. Magdon-Ismail,On selecting a maximum volume sub-matrix of a matrix and related problems, Theoret. Comput. Sci., 410 (2009), pp. 4801– 4811
work page 2009
-
[10]
,Exponential inapproximability of selecting a maximum volume sub-matrix, Algorithmica, 65 (2013), pp. 159–176
work page 2013
-
[11]
R. E. Cline,Representations for the generalized inverse of a partitioned matrix, J. Soc. Indust. Appl. Math., 12 (1964), pp. 588–600
work page 1964
-
[12]
R. W. Cottle,Manifestations of the Schur complement, Linear Algebra Appl., 8 (1974), pp. 189–211
work page 1974
- [13]
-
[14]
F. R.deHoog andR. M. M. Mattheij,A note on subset selection for matrices, Linear Algebra Appl., 434 (2011), pp. 1845–1850
work page 2011
-
[15]
J. W. Demmel,The geometry of ill-conditioning, J. Complexity, 3 (1987), pp. 201– 229
work page 1987
-
[16]
S. Esw ar, V . Rao,andA. K. Saibaba,Bayesian D-optimal experimental designs via column subset selection, arXiv preprint arXiv:2402.16000, (2024)
-
[17]
M. R. Garey andD. S. Johnson,Computers and intractability, A Series of Books in the Mathematical Sciences, W. H. Freeman and Co., San Francisco, CA, 1979. A guide to the theory of NP-completeness
work page 1979
-
[18]
G. H. Golub andC. F. VanLoan,Matrix computations, Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, fourth ed., 2013
work page 2013
-
[19]
L. Grigori andZ. Xue,Randomized strong rank-revealing QR for column subset selection and low-rank matrix approximation, 2025. arXiv:2503.18496. 27
-
[20]
M. Gu andS. C. Eisenstat,Efficient algorithms for computing a strong rank- revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869
work page 1996
-
[21]
Y . P. Hong andC.-T. Pan,A lower bound for the smallest singular value, Linear Algebra Appl., 172 (1992), pp. 27–32
work page 1992
-
[22]
R. A. Horn andC. R. Johnson,Matrix analysis, Cambridge University Press, Cambridge, second ed., 2013
work page 2013
-
[23]
I. C. F. Ipsen andA. K. Saibaba,Stable rank and intrinsic dimension of real and complex matrices, SIAM J. Matrix Anal. Appl., 46 (2025), pp. 1988–2007
work page 2025
-
[24]
R. M. Karp,Reducibility among combinatorial problems, in Complexity of com- puter computations (Proc. Sympos., IBM Thomas J. Watson Res. Center, York- town Heights, N.Y., 1972), The IBM Research Symposia Series, Plenum, New York-London, 1972, pp. 85–103
work page 1972
-
[25]
I. Kozyrev andA. Osinsky,Subset selection for matrices in spectral norm, 2025. arXiv:2507.20435
-
[26]
D. Kressner, T. Ni,andA. Uschmajew,On the approximation of vector-valued functions by volume sampling, 2024
work page 2024
-
[27]
J. T. Lauzon, S. W. Cheung, Y . Shin, Y . Choi, D. M. Copeland,andK. Huynh,S- OPT: a points selection algorithm for hyper-reduction in reduced order models, SIAM J. Sci. Comput., 46 (2024), pp. B474–B501
work page 2024
-
[28]
P. J. Maher,Some singular values, and unitarily invariant norm inequalities con- cerning generalized inverses, Filomat, 21 (2007), pp. 99–111
work page 2007
-
[29]
Osinsky,Polynomial timeρ-locally maximum volume search, Calcolo, 60 (2023), pp
A. Osinsky,Polynomial timeρ-locally maximum volume search, Calcolo, 60 (2023), pp. Paper No. 42, 26
work page 2023
-
[30]
Osinsky,V olume-based subset selection, Numer
A. Osinsky,V olume-based subset selection, Numer. Linear Algebra Appl., 31 (2024), pp. Paper No. e2525, 14
work page 2024
-
[31]
C. H. Papadimitriou andM. Yannakakis,Optimization, approximation, and com- plexity classes, J. Comput. System Sci., 43 (1991), pp. 425–440
work page 1991
- [32]
-
[33]
Shitov,Column subset selection is NP-complete, Linear Algebra Appl., 610 (2021), pp
Y . Shitov,Column subset selection is NP-complete, Linear Algebra Appl., 610 (2021), pp. 52–58
work page 2021
-
[34]
D. P. Williamson andD. B. Shmoys,The design of approximation algorithms, Cambridge University Press, Cambridge, 2011. 28
work page 2011
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.