Many (most?) column subset selection criteria are NP hard for a few columns

Arvind K. Saibaba; Ilse C.F. Ipsen

arxiv: 2511.02740 · v2 · submitted 2025-11-04 · 🧮 math.NA · cs.NA

Many (most?) column subset selection criteria are NP hard for a few columns

Ilse C.F. Ipsen , Arvind K. Saibaba This is my paper

Pith reviewed 2026-05-18 00:46 UTC · model grok-4.3

classification 🧮 math.NA cs.NA

keywords column subset selectionNP-hardnesspseudo-inversevolume maximizationcondition numberA-optimal designapproximation schemesmatrix selection criteria

0 comments

The pith

Selecting few representative columns from a matrix under many common criteria is NP-hard.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies optimization problems for picking k columns from an m-by-n matrix when k is small, specifically at most the minimum of the matrix rank and one-third of the number of rows. The criteria range from maximizing absolute or relative volume to minimizing various norms of the pseudo-inverse, condition numbers in different norms, and maximizing stable rank. It establishes that all these problems are NP-hard in the stated regime and that many of them have no polynomial-time approximation schemes. A sympathetic reader would care because column selection appears in data compression, experimental design, and numerical algorithms, so hardness results explain why exact solutions are intractable and guide the choice of practical methods.

Core claim

We show that the optimization problems of absolute volume and S-optimality maximization, norm and pseudo-inverse norm minimization, condition number minimization, stable rank maximization, and relative volume maximization are NP-hard for selecting k representative columns from a real matrix A when 1 ≤ k ≤ min{rank(A), m/3}. Many of these problems do not admit polynomial-time approximation schemes. The proofs rely on reducing the problems to decision versions using explicit expressions for the optimal values and for partitioned pseudo-inverses.

What carries the argument

Expressions for partitioned pseudo-inverses that convert the column-subset optimization problems into decision problems in the small-k regime.

If this is right

The Frobenius-norm pseudo-inverse minimization result directly applies to trace minimization in A-optimal experimental design.
Exact solutions for volume, condition, or norm criteria become computationally intractable once k reaches even a modest fraction of the row dimension.
Heuristic or randomized column selection methods are required in practice for the covered criteria when exact optimality is desired.
The relative-volume criterion, being inversely related to a power of the condition number, inherits the same hardness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar hardness may hold for larger k, but the partitioned-inverse technique used here is tailored to the small-k case.
The results suggest that greedy or sampling-based algorithms should be analyzed for their worst-case guarantees on these specific objectives.
Connections to feature selection in machine learning may allow transfer of these NP-hardness statements to high-dimensional data matrices.
Checking whether known approximation algorithms for related problems achieve the claimed absence of PTAS would test the tightness of the negative results.

Load-bearing premise

The hardness proofs apply only when the number of columns to select is at most one-third the number of rows.

What would settle it

A polynomial-time algorithm, or a PTAS for one of the listed criteria, that works for all matrices under the k ≤ min{rank(A), m/3} restriction would refute the claimed hardness.

read the original abstract

We consider a variety of criteria for selecting k representative columns from a real mxn matrix A, when sufficiently few columns are required, i.e., 1<= k<= min{rank(A), m/3}. The criteria include the following optimization problems: absolute volume and S-optimality maximization; norm, pseudo-inverse norm, and condition minimization number in the two-norm, Frobenius norm and Schatten p-norms for p>2; stable rank maximization; and the new criterion of relative volume maximization, which is inversely proportional to a power of the condition number. We show that these criteria are NP hard and many do not admit polynomial time approximation schemes (PTAS). To formulate the optimization problems as decision problems, we derive optimal values for the subset selection criteria, as well as expressions for partitioned pseudo-inverses. The results for minimization of the pseudo-inverse in the Frobenius norm are applicable to trace optimization in A-optimal design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows NP-hardness for many column subset selection criteria when k is small, via reductions that rely on explicit optimal values from partitioned pseudo-inverses.

read the letter

The main thing to know is that selecting a small number of columns to optimize volume, various matrix norms, condition number, or stable rank is NP-hard for k up to min(rank(A), m/3), and many of these problems have no PTAS. They also introduce a relative-volume criterion and tie Frobenius-norm pseudo-inverse minimization to trace minimization in A-optimal design. That link is useful for people working in experimental design. The approach of deriving closed-form optimal values for the decision versions on specially constructed matrices is a direct way to set up the reductions from known hard problems. They cover a broad list of criteria in one go, which saves others from repeating the work for each one separately. The small-k restriction is stated up front, so the scope is clear. The results point algorithm designers toward approximation methods with guarantees rather than exact solvers. The soft spot is in the partitioned pseudo-inverse expressions that supply those optimal values. All the hardness claims rest on the decision threshold matching the exact optimum from those formulas, which the paper says hold only for k ≤ m/3. If an algebraic detail is off—such as a missing term in the Schur complement or an incorrect projector factor—the equivalence to the source NP-hard problem would fail. The abstract mentions deriving these expressions, but the reductions need close inspection to confirm no slips occurred. This is for researchers in numerical linear algebra, CUR methods, feature selection, and optimal design who need to know when exact column selection is intractable. A reader looking for complexity results or guidance on when to use heuristics will get concrete value. It deserves a serious referee because the claims are specific enough to affect practical choices, even if the proofs require verification. I would send it to peer review with instructions to check the pseudo-inverse derivations and the reduction constructions in detail.

Referee Report

1 major / 2 minor

Summary. The manuscript proves NP-hardness (and lack of PTAS for many cases) for a collection of column-subset selection criteria on an m×n matrix A in the regime 1 ≤ k ≤ min{rank(A), m/3}. The criteria include absolute-volume and S-optimality maximization; minimization of the matrix or its pseudoinverse in the 2-norm, Frobenius norm, and Schatten p-norms (p>2); condition-number minimization; stable-rank maximization; and the new relative-volume criterion (inversely proportional to a power of the condition number). Hardness is obtained by constructing decision versions whose yes/no thresholds are set to explicit optimal values derived from closed-form expressions for the partitioned pseudoinverse of the selected k-column submatrix; the results are also noted to apply to trace optimization in A-optimal design.

Significance. If the derivations hold, the work shows that exact or approximate optimization under these standard and novel criteria remains intractable even when only a few columns are selected, which is relevant to column-based low-rank approximation, experimental design, and numerical linear algebra. The explicit optimal-value formulas and the partitioned-pseudoinverse identities constitute a concrete technical contribution that could support future heuristic or exact algorithms for restricted instances.

major comments (1)

[Derivations of optimal values and partitioned pseudoinverses] The section deriving optimal values via partitioned pseudoinverses (the core of all reductions): the decision-problem thresholds for every criterion (volume, ||A_S^+||_F, condition number, etc.) are set equal to the closed-form optimum obtained from the partitioned-pseudoinverse expressions that hold only for k ≤ m/3. An algebraic slip in any of those expressions (e.g., an omitted Schur-complement term when rank(A_S)=k or an incorrect factor in the Frobenius-norm identity) would make the claimed optimal value false on the constructed instances and collapse the equivalence to the source NP-hard problem.

minor comments (2)

[Abstract and main results] The abstract states that 'many do not admit PTAS' but does not list which criteria do and do not; the main text should contain an explicit theorem or table enumerating the PTAS status for each criterion together with the corresponding hardness proofs.
[Introduction of relative volume] Notation for the new relative-volume criterion should be introduced with a displayed equation before it is used in the hardness statements.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for identifying the central role of the partitioned-pseudoinverse derivations in all of our hardness reductions. We address the major comment below.

read point-by-point responses

Referee: The section deriving optimal values via partitioned pseudoinverses (the core of all reductions): the decision-problem thresholds for every criterion (volume, ||A_S^+||_F, condition number, etc.) are set equal to the closed-form optimum obtained from the partitioned-pseudoinverse expressions that hold only for k ≤ m/3. An algebraic slip in any of those expressions (e.g., an omitted Schur-complement term when rank(A_S)=k or an incorrect factor in the Frobenius-norm identity) would make the claimed optimal value false on the constructed instances and collapse the equivalence to the source NP-hard problem.

Authors: We appreciate the referee highlighting the sensitivity of the reductions to these algebraic identities. The partitioned-pseudoinverse formulas were obtained by applying the block-matrix inversion lemma to A = [A_S A_rest] under the standing assumption that rank(A_S) = k (which is enforced by the construction and the bound k ≤ m/3). The Schur complement A_rest^T (I - P_{A_S}) A_rest appears explicitly in the derivation of A_S^+ and is retained in every subsequent norm and condition-number expression; no terms were omitted. The Frobenius-norm identity ||A_S^+||_F^2 = trace((A_S^T A_S)^{-1}) follows directly from the definition of the Moore-Penrose inverse for full-column-rank matrices and was cross-checked against the explicit optimal value used for the decision threshold. The same verification was performed for the 2-norm, Schatten-p (p > 2), condition-number, volume, and relative-volume criteria. Because the referee's concern is well-founded in principle, we will add a self-contained appendix that reproduces every intermediate matrix-algebra step for the partitioned pseudoinverse and the resulting closed-form optima. This will make the derivations independently verifiable without altering the main text. revision: partial

Circularity Check

0 steps flagged

NP-hardness via reductions from known problems and standard matrix identities is self-contained

full rationale

The paper proves NP-hardness of multiple column subset selection criteria by constructing explicit reductions from established NP-hard problems and deriving the required optimal values of each criterion using algebraic expressions for partitioned pseudo-inverses that hold under the stated regime k ≤ min{rank(A), m/3}. These steps rely on standard linear-algebra identities rather than any self-referential definition, fitted parameter renamed as a prediction, or load-bearing self-citation. The derivation chain is therefore independent of the target result and remains falsifiable against external complexity and matrix-theory benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard results from computational complexity theory and linear algebra (properties of matrix norms, pseudo-inverses, and volume) together with the small-k restriction stated in the abstract.

axioms (2)

standard math P ≠ NP
Implicit when concluding that NP-hard problems have no polynomial-time algorithms unless P=NP.
domain assumption Standard algebraic identities for partitioned pseudo-inverses and matrix norms hold for real matrices.
Used to derive optimal values of the selection criteria.

pith-pipeline@v0.9.0 · 5699 in / 1337 out tokens · 52157 ms · 2026-05-18T00:46:19.303718+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

[1]

Armstrong andA

R. Armstrong andA. Damle,Collect, commit, expand: Efficient CPQR-based column selection for extremely wide matrices, 2025. arXiv:2501.18035

work page arXiv 2025
[2]

Avron andC

H. Avron andC. Boutsidis,Faster subset selection for matrices and applications, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 1464–1499

work page 2013
[3]

O. M. Baksalary andG. Trenkler,On formulae for the Moore-Penrose inverse of a columnwise partitioned matrix, Appl. Math. Comput., 403 (2021), pp. Paper No. 125913, 10. 26

work page 2021
[4]

Bhatia,Matrix analysis, vol

R. Bhatia,Matrix analysis, vol. 169 of Graduate Texts in Mathematics, Springer- Verlag, New York, 1997

work page 1997
[5]

Boutsidis, P

C. Boutsidis, P. Drineas,andM. Magdon-Ismail,Near-optimal column-based matrix reconstruction, SIAM J. Comput., 43 (2014), pp. 687–717

work page 2014
[6]

Çivril,Column subset selection problem is UG-hard, J

A. Çivril,Column subset selection problem is UG-hard, J. Comput. and System Sci., 80 (2014), pp. 849–859

work page 2014
[7]

Cegielski,Obtuse cones and Gram matrices with non-negative inverse, Linear Algebra Appl., 335 (2001), pp

A. Cegielski,Obtuse cones and Gram matrices with non-negative inverse, Linear Algebra Appl., 335 (2001), pp. 167–181

work page 2001
[8]

Chandrasekaran andI

S. Chandrasekaran andI. C. F. Ipsen,On rank-revealing factorisations, SIAM J. Matrix Anal. Appl., 15 (1994), pp. 592–622

work page 1994
[9]

Çivril andM

A. Çivril andM. Magdon-Ismail,On selecting a maximum volume sub-matrix of a matrix and related problems, Theoret. Comput. Sci., 410 (2009), pp. 4801– 4811

work page 2009
[10]

,Exponential inapproximability of selecting a maximum volume sub-matrix, Algorithmica, 65 (2013), pp. 159–176

work page 2013
[11]

R. E. Cline,Representations for the generalized inverse of a partitioned matrix, J. Soc. Indust. Appl. Math., 12 (1964), pp. 588–600

work page 1964
[12]

R. W. Cottle,Manifestations of the Schur complement, Linear Algebra Appl., 8 (1974), pp. 189–211

work page 1974
[13]

Damle, S

A. Damle, S. Glas, A. Townsend,andA. Yu,How to reveal the rank of a matrix?,

work page
[14]

R.deHoog andR

F. R.deHoog andR. M. M. Mattheij,A note on subset selection for matrices, Linear Algebra Appl., 434 (2011), pp. 1845–1850

work page 2011
[15]

J. W. Demmel,The geometry of ill-conditioning, J. Complexity, 3 (1987), pp. 201– 229

work page 1987
[16]

Eskenazis and Y

S. Esw ar, V . Rao,andA. K. Saibaba,Bayesian D-optimal experimental designs via column subset selection, arXiv preprint arXiv:2402.16000, (2024)

work page arXiv 2024
[17]

M. R. Garey andD. S. Johnson,Computers and intractability, A Series of Books in the Mathematical Sciences, W. H. Freeman and Co., San Francisco, CA, 1979. A guide to the theory of NP-completeness

work page 1979
[18]

G. H. Golub andC. F. VanLoan,Matrix computations, Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, fourth ed., 2013

work page 2013
[19]

Grigori andZ

L. Grigori andZ. Xue,Randomized strong rank-revealing QR for column subset selection and low-rank matrix approximation, 2025. arXiv:2503.18496. 27

work page arXiv 2025
[20]

M. Gu andS. C. Eisenstat,Efficient algorithms for computing a strong rank- revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869

work page 1996
[21]

Y . P. Hong andC.-T. Pan,A lower bound for the smallest singular value, Linear Algebra Appl., 172 (1992), pp. 27–32

work page 1992
[22]

R. A. Horn andC. R. Johnson,Matrix analysis, Cambridge University Press, Cambridge, second ed., 2013

work page 2013
[23]

I. C. F. Ipsen andA. K. Saibaba,Stable rank and intrinsic dimension of real and complex matrices, SIAM J. Matrix Anal. Appl., 46 (2025), pp. 1988–2007

work page 2025
[24]

R. M. Karp,Reducibility among combinatorial problems, in Complexity of com- puter computations (Proc. Sympos., IBM Thomas J. Watson Res. Center, York- town Heights, N.Y., 1972), The IBM Research Symposia Series, Plenum, New York-London, 1972, pp. 85–103

work page 1972
[25]

Kozyrev andA

I. Kozyrev andA. Osinsky,Subset selection for matrices in spectral norm, 2025. arXiv:2507.20435

work page arXiv 2025
[26]

Kressner, T

D. Kressner, T. Ni,andA. Uschmajew,On the approximation of vector-valued functions by volume sampling, 2024

work page 2024
[27]

J. T. Lauzon, S. W. Cheung, Y . Shin, Y . Choi, D. M. Copeland,andK. Huynh,S- OPT: a points selection algorithm for hyper-reduction in reduced order models, SIAM J. Sci. Comput., 46 (2024), pp. B474–B501

work page 2024
[28]

P. J. Maher,Some singular values, and unitarily invariant norm inequalities con- cerning generalized inverses, Filomat, 21 (2007), pp. 99–111

work page 2007
[29]

Osinsky,Polynomial timeρ-locally maximum volume search, Calcolo, 60 (2023), pp

A. Osinsky,Polynomial timeρ-locally maximum volume search, Calcolo, 60 (2023), pp. Paper No. 42, 26

work page 2023
[30]

Osinsky,V olume-based subset selection, Numer

A. Osinsky,V olume-based subset selection, Numer. Linear Algebra Appl., 31 (2024), pp. Paper No. e2525, 14

work page 2024
[31]

C. H. Papadimitriou andM. Yannakakis,Optimization, approximation, and com- plexity classes, J. Comput. System Sci., 43 (1991), pp. 425–440

work page 1991
[32]

Shin andD

Y . Shin andD. Xiu,Nonadaptive quasi-optimal points selection for least squares linear regression, SIAM J. Sci. Comput., 38 (2016), pp. A385–A411

work page 2016
[33]

Shitov,Column subset selection is NP-complete, Linear Algebra Appl., 610 (2021), pp

Y . Shitov,Column subset selection is NP-complete, Linear Algebra Appl., 610 (2021), pp. 52–58

work page 2021
[34]

D. P. Williamson andD. B. Shmoys,The design of approximation algorithms, Cambridge University Press, Cambridge, 2011. 28

work page 2011

[1] [1]

Armstrong andA

R. Armstrong andA. Damle,Collect, commit, expand: Efficient CPQR-based column selection for extremely wide matrices, 2025. arXiv:2501.18035

work page arXiv 2025

[2] [2]

Avron andC

H. Avron andC. Boutsidis,Faster subset selection for matrices and applications, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 1464–1499

work page 2013

[3] [3]

O. M. Baksalary andG. Trenkler,On formulae for the Moore-Penrose inverse of a columnwise partitioned matrix, Appl. Math. Comput., 403 (2021), pp. Paper No. 125913, 10. 26

work page 2021

[4] [4]

Bhatia,Matrix analysis, vol

R. Bhatia,Matrix analysis, vol. 169 of Graduate Texts in Mathematics, Springer- Verlag, New York, 1997

work page 1997

[5] [5]

Boutsidis, P

C. Boutsidis, P. Drineas,andM. Magdon-Ismail,Near-optimal column-based matrix reconstruction, SIAM J. Comput., 43 (2014), pp. 687–717

work page 2014

[6] [6]

Çivril,Column subset selection problem is UG-hard, J

A. Çivril,Column subset selection problem is UG-hard, J. Comput. and System Sci., 80 (2014), pp. 849–859

work page 2014

[7] [7]

Cegielski,Obtuse cones and Gram matrices with non-negative inverse, Linear Algebra Appl., 335 (2001), pp

A. Cegielski,Obtuse cones and Gram matrices with non-negative inverse, Linear Algebra Appl., 335 (2001), pp. 167–181

work page 2001

[8] [8]

Chandrasekaran andI

S. Chandrasekaran andI. C. F. Ipsen,On rank-revealing factorisations, SIAM J. Matrix Anal. Appl., 15 (1994), pp. 592–622

work page 1994

[9] [9]

Çivril andM

A. Çivril andM. Magdon-Ismail,On selecting a maximum volume sub-matrix of a matrix and related problems, Theoret. Comput. Sci., 410 (2009), pp. 4801– 4811

work page 2009

[10] [10]

,Exponential inapproximability of selecting a maximum volume sub-matrix, Algorithmica, 65 (2013), pp. 159–176

work page 2013

[11] [11]

R. E. Cline,Representations for the generalized inverse of a partitioned matrix, J. Soc. Indust. Appl. Math., 12 (1964), pp. 588–600

work page 1964

[12] [12]

R. W. Cottle,Manifestations of the Schur complement, Linear Algebra Appl., 8 (1974), pp. 189–211

work page 1974

[13] [13]

Damle, S

A. Damle, S. Glas, A. Townsend,andA. Yu,How to reveal the rank of a matrix?,

work page

[14] [14]

R.deHoog andR

F. R.deHoog andR. M. M. Mattheij,A note on subset selection for matrices, Linear Algebra Appl., 434 (2011), pp. 1845–1850

work page 2011

[15] [15]

J. W. Demmel,The geometry of ill-conditioning, J. Complexity, 3 (1987), pp. 201– 229

work page 1987

[16] [16]

Eskenazis and Y

S. Esw ar, V . Rao,andA. K. Saibaba,Bayesian D-optimal experimental designs via column subset selection, arXiv preprint arXiv:2402.16000, (2024)

work page arXiv 2024

[17] [17]

M. R. Garey andD. S. Johnson,Computers and intractability, A Series of Books in the Mathematical Sciences, W. H. Freeman and Co., San Francisco, CA, 1979. A guide to the theory of NP-completeness

work page 1979

[18] [18]

G. H. Golub andC. F. VanLoan,Matrix computations, Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, fourth ed., 2013

work page 2013

[19] [19]

Grigori andZ

L. Grigori andZ. Xue,Randomized strong rank-revealing QR for column subset selection and low-rank matrix approximation, 2025. arXiv:2503.18496. 27

work page arXiv 2025

[20] [20]

M. Gu andS. C. Eisenstat,Efficient algorithms for computing a strong rank- revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869

work page 1996

[21] [21]

Y . P. Hong andC.-T. Pan,A lower bound for the smallest singular value, Linear Algebra Appl., 172 (1992), pp. 27–32

work page 1992

[22] [22]

R. A. Horn andC. R. Johnson,Matrix analysis, Cambridge University Press, Cambridge, second ed., 2013

work page 2013

[23] [23]

I. C. F. Ipsen andA. K. Saibaba,Stable rank and intrinsic dimension of real and complex matrices, SIAM J. Matrix Anal. Appl., 46 (2025), pp. 1988–2007

work page 2025

[24] [24]

R. M. Karp,Reducibility among combinatorial problems, in Complexity of com- puter computations (Proc. Sympos., IBM Thomas J. Watson Res. Center, York- town Heights, N.Y., 1972), The IBM Research Symposia Series, Plenum, New York-London, 1972, pp. 85–103

work page 1972

[25] [25]

Kozyrev andA

I. Kozyrev andA. Osinsky,Subset selection for matrices in spectral norm, 2025. arXiv:2507.20435

work page arXiv 2025

[26] [26]

Kressner, T

D. Kressner, T. Ni,andA. Uschmajew,On the approximation of vector-valued functions by volume sampling, 2024

work page 2024

[27] [27]

J. T. Lauzon, S. W. Cheung, Y . Shin, Y . Choi, D. M. Copeland,andK. Huynh,S- OPT: a points selection algorithm for hyper-reduction in reduced order models, SIAM J. Sci. Comput., 46 (2024), pp. B474–B501

work page 2024

[28] [28]

P. J. Maher,Some singular values, and unitarily invariant norm inequalities con- cerning generalized inverses, Filomat, 21 (2007), pp. 99–111

work page 2007

[29] [29]

Osinsky,Polynomial timeρ-locally maximum volume search, Calcolo, 60 (2023), pp

A. Osinsky,Polynomial timeρ-locally maximum volume search, Calcolo, 60 (2023), pp. Paper No. 42, 26

work page 2023

[30] [30]

Osinsky,V olume-based subset selection, Numer

A. Osinsky,V olume-based subset selection, Numer. Linear Algebra Appl., 31 (2024), pp. Paper No. e2525, 14

work page 2024

[31] [31]

C. H. Papadimitriou andM. Yannakakis,Optimization, approximation, and com- plexity classes, J. Comput. System Sci., 43 (1991), pp. 425–440

work page 1991

[32] [32]

Shin andD

Y . Shin andD. Xiu,Nonadaptive quasi-optimal points selection for least squares linear regression, SIAM J. Sci. Comput., 38 (2016), pp. A385–A411

work page 2016

[33] [33]

Shitov,Column subset selection is NP-complete, Linear Algebra Appl., 610 (2021), pp

Y . Shitov,Column subset selection is NP-complete, Linear Algebra Appl., 610 (2021), pp. 52–58

work page 2021

[34] [34]

D. P. Williamson andD. B. Shmoys,The design of approximation algorithms, Cambridge University Press, Cambridge, 2011. 28

work page 2011