pith. sign in

arxiv: 2511.02740 · v2 · submitted 2025-11-04 · 🧮 math.NA · cs.NA

Many (most?) column subset selection criteria are NP hard for a few columns

Pith reviewed 2026-05-18 00:46 UTC · model grok-4.3

classification 🧮 math.NA cs.NA
keywords column subset selectionNP-hardnesspseudo-inversevolume maximizationcondition numberA-optimal designapproximation schemesmatrix selection criteria
0
0 comments X

The pith

Selecting few representative columns from a matrix under many common criteria is NP-hard.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies optimization problems for picking k columns from an m-by-n matrix when k is small, specifically at most the minimum of the matrix rank and one-third of the number of rows. The criteria range from maximizing absolute or relative volume to minimizing various norms of the pseudo-inverse, condition numbers in different norms, and maximizing stable rank. It establishes that all these problems are NP-hard in the stated regime and that many of them have no polynomial-time approximation schemes. A sympathetic reader would care because column selection appears in data compression, experimental design, and numerical algorithms, so hardness results explain why exact solutions are intractable and guide the choice of practical methods.

Core claim

We show that the optimization problems of absolute volume and S-optimality maximization, norm and pseudo-inverse norm minimization, condition number minimization, stable rank maximization, and relative volume maximization are NP-hard for selecting k representative columns from a real matrix A when 1 ≤ k ≤ min{rank(A), m/3}. Many of these problems do not admit polynomial-time approximation schemes. The proofs rely on reducing the problems to decision versions using explicit expressions for the optimal values and for partitioned pseudo-inverses.

What carries the argument

Expressions for partitioned pseudo-inverses that convert the column-subset optimization problems into decision problems in the small-k regime.

If this is right

  • The Frobenius-norm pseudo-inverse minimization result directly applies to trace minimization in A-optimal experimental design.
  • Exact solutions for volume, condition, or norm criteria become computationally intractable once k reaches even a modest fraction of the row dimension.
  • Heuristic or randomized column selection methods are required in practice for the covered criteria when exact optimality is desired.
  • The relative-volume criterion, being inversely related to a power of the condition number, inherits the same hardness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar hardness may hold for larger k, but the partitioned-inverse technique used here is tailored to the small-k case.
  • The results suggest that greedy or sampling-based algorithms should be analyzed for their worst-case guarantees on these specific objectives.
  • Connections to feature selection in machine learning may allow transfer of these NP-hardness statements to high-dimensional data matrices.
  • Checking whether known approximation algorithms for related problems achieve the claimed absence of PTAS would test the tightness of the negative results.

Load-bearing premise

The hardness proofs apply only when the number of columns to select is at most one-third the number of rows.

What would settle it

A polynomial-time algorithm, or a PTAS for one of the listed criteria, that works for all matrices under the k ≤ min{rank(A), m/3} restriction would refute the claimed hardness.

read the original abstract

We consider a variety of criteria for selecting k representative columns from a real mxn matrix A, when sufficiently few columns are required, i.e., 1<= k<= min{rank(A), m/3}. The criteria include the following optimization problems: absolute volume and S-optimality maximization; norm, pseudo-inverse norm, and condition minimization number in the two-norm, Frobenius norm and Schatten p-norms for p>2; stable rank maximization; and the new criterion of relative volume maximization, which is inversely proportional to a power of the condition number. We show that these criteria are NP hard and many do not admit polynomial time approximation schemes (PTAS). To formulate the optimization problems as decision problems, we derive optimal values for the subset selection criteria, as well as expressions for partitioned pseudo-inverses. The results for minimization of the pseudo-inverse in the Frobenius norm are applicable to trace optimization in A-optimal design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proves NP-hardness (and lack of PTAS for many cases) for a collection of column-subset selection criteria on an m×n matrix A in the regime 1 ≤ k ≤ min{rank(A), m/3}. The criteria include absolute-volume and S-optimality maximization; minimization of the matrix or its pseudoinverse in the 2-norm, Frobenius norm, and Schatten p-norms (p>2); condition-number minimization; stable-rank maximization; and the new relative-volume criterion (inversely proportional to a power of the condition number). Hardness is obtained by constructing decision versions whose yes/no thresholds are set to explicit optimal values derived from closed-form expressions for the partitioned pseudoinverse of the selected k-column submatrix; the results are also noted to apply to trace optimization in A-optimal design.

Significance. If the derivations hold, the work shows that exact or approximate optimization under these standard and novel criteria remains intractable even when only a few columns are selected, which is relevant to column-based low-rank approximation, experimental design, and numerical linear algebra. The explicit optimal-value formulas and the partitioned-pseudoinverse identities constitute a concrete technical contribution that could support future heuristic or exact algorithms for restricted instances.

major comments (1)
  1. [Derivations of optimal values and partitioned pseudoinverses] The section deriving optimal values via partitioned pseudoinverses (the core of all reductions): the decision-problem thresholds for every criterion (volume, ||A_S^+||_F, condition number, etc.) are set equal to the closed-form optimum obtained from the partitioned-pseudoinverse expressions that hold only for k ≤ m/3. An algebraic slip in any of those expressions (e.g., an omitted Schur-complement term when rank(A_S)=k or an incorrect factor in the Frobenius-norm identity) would make the claimed optimal value false on the constructed instances and collapse the equivalence to the source NP-hard problem.
minor comments (2)
  1. [Abstract and main results] The abstract states that 'many do not admit PTAS' but does not list which criteria do and do not; the main text should contain an explicit theorem or table enumerating the PTAS status for each criterion together with the corresponding hardness proofs.
  2. [Introduction of relative volume] Notation for the new relative-volume criterion should be introduced with a displayed equation before it is used in the hardness statements.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for identifying the central role of the partitioned-pseudoinverse derivations in all of our hardness reductions. We address the major comment below.

read point-by-point responses
  1. Referee: The section deriving optimal values via partitioned pseudoinverses (the core of all reductions): the decision-problem thresholds for every criterion (volume, ||A_S^+||_F, condition number, etc.) are set equal to the closed-form optimum obtained from the partitioned-pseudoinverse expressions that hold only for k ≤ m/3. An algebraic slip in any of those expressions (e.g., an omitted Schur-complement term when rank(A_S)=k or an incorrect factor in the Frobenius-norm identity) would make the claimed optimal value false on the constructed instances and collapse the equivalence to the source NP-hard problem.

    Authors: We appreciate the referee highlighting the sensitivity of the reductions to these algebraic identities. The partitioned-pseudoinverse formulas were obtained by applying the block-matrix inversion lemma to A = [A_S A_rest] under the standing assumption that rank(A_S) = k (which is enforced by the construction and the bound k ≤ m/3). The Schur complement A_rest^T (I - P_{A_S}) A_rest appears explicitly in the derivation of A_S^+ and is retained in every subsequent norm and condition-number expression; no terms were omitted. The Frobenius-norm identity ||A_S^+||_F^2 = trace((A_S^T A_S)^{-1}) follows directly from the definition of the Moore-Penrose inverse for full-column-rank matrices and was cross-checked against the explicit optimal value used for the decision threshold. The same verification was performed for the 2-norm, Schatten-p (p > 2), condition-number, volume, and relative-volume criteria. Because the referee's concern is well-founded in principle, we will add a self-contained appendix that reproduces every intermediate matrix-algebra step for the partitioned pseudoinverse and the resulting closed-form optima. This will make the derivations independently verifiable without altering the main text. revision: partial

Circularity Check

0 steps flagged

NP-hardness via reductions from known problems and standard matrix identities is self-contained

full rationale

The paper proves NP-hardness of multiple column subset selection criteria by constructing explicit reductions from established NP-hard problems and deriving the required optimal values of each criterion using algebraic expressions for partitioned pseudo-inverses that hold under the stated regime k ≤ min{rank(A), m/3}. These steps rely on standard linear-algebra identities rather than any self-referential definition, fitted parameter renamed as a prediction, or load-bearing self-citation. The derivation chain is therefore independent of the target result and remains falsifiable against external complexity and matrix-theory benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard results from computational complexity theory and linear algebra (properties of matrix norms, pseudo-inverses, and volume) together with the small-k restriction stated in the abstract.

axioms (2)
  • standard math P ≠ NP
    Implicit when concluding that NP-hard problems have no polynomial-time algorithms unless P=NP.
  • domain assumption Standard algebraic identities for partitioned pseudo-inverses and matrix norms hold for real matrices.
    Used to derive optimal values of the selection criteria.

pith-pipeline@v0.9.0 · 5699 in / 1337 out tokens · 52157 ms · 2026-05-18T00:46:19.303718+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

  1. [1]

    Armstrong andA

    R. Armstrong andA. Damle,Collect, commit, expand: Efficient CPQR-based column selection for extremely wide matrices, 2025. arXiv:2501.18035

  2. [2]

    Avron andC

    H. Avron andC. Boutsidis,Faster subset selection for matrices and applications, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 1464–1499

  3. [3]

    O. M. Baksalary andG. Trenkler,On formulae for the Moore-Penrose inverse of a columnwise partitioned matrix, Appl. Math. Comput., 403 (2021), pp. Paper No. 125913, 10. 26

  4. [4]

    Bhatia,Matrix analysis, vol

    R. Bhatia,Matrix analysis, vol. 169 of Graduate Texts in Mathematics, Springer- Verlag, New York, 1997

  5. [5]

    Boutsidis, P

    C. Boutsidis, P. Drineas,andM. Magdon-Ismail,Near-optimal column-based matrix reconstruction, SIAM J. Comput., 43 (2014), pp. 687–717

  6. [6]

    Çivril,Column subset selection problem is UG-hard, J

    A. Çivril,Column subset selection problem is UG-hard, J. Comput. and System Sci., 80 (2014), pp. 849–859

  7. [7]

    Cegielski,Obtuse cones and Gram matrices with non-negative inverse, Linear Algebra Appl., 335 (2001), pp

    A. Cegielski,Obtuse cones and Gram matrices with non-negative inverse, Linear Algebra Appl., 335 (2001), pp. 167–181

  8. [8]

    Chandrasekaran andI

    S. Chandrasekaran andI. C. F. Ipsen,On rank-revealing factorisations, SIAM J. Matrix Anal. Appl., 15 (1994), pp. 592–622

  9. [9]

    Çivril andM

    A. Çivril andM. Magdon-Ismail,On selecting a maximum volume sub-matrix of a matrix and related problems, Theoret. Comput. Sci., 410 (2009), pp. 4801– 4811

  10. [10]

    ,Exponential inapproximability of selecting a maximum volume sub-matrix, Algorithmica, 65 (2013), pp. 159–176

  11. [11]

    R. E. Cline,Representations for the generalized inverse of a partitioned matrix, J. Soc. Indust. Appl. Math., 12 (1964), pp. 588–600

  12. [12]

    R. W. Cottle,Manifestations of the Schur complement, Linear Algebra Appl., 8 (1974), pp. 189–211

  13. [13]

    Damle, S

    A. Damle, S. Glas, A. Townsend,andA. Yu,How to reveal the rank of a matrix?,

  14. [14]

    R.deHoog andR

    F. R.deHoog andR. M. M. Mattheij,A note on subset selection for matrices, Linear Algebra Appl., 434 (2011), pp. 1845–1850

  15. [15]

    J. W. Demmel,The geometry of ill-conditioning, J. Complexity, 3 (1987), pp. 201– 229

  16. [16]

    Eskenazis and Y

    S. Esw ar, V . Rao,andA. K. Saibaba,Bayesian D-optimal experimental designs via column subset selection, arXiv preprint arXiv:2402.16000, (2024)

  17. [17]

    M. R. Garey andD. S. Johnson,Computers and intractability, A Series of Books in the Mathematical Sciences, W. H. Freeman and Co., San Francisco, CA, 1979. A guide to the theory of NP-completeness

  18. [18]

    G. H. Golub andC. F. VanLoan,Matrix computations, Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, fourth ed., 2013

  19. [19]

    Grigori andZ

    L. Grigori andZ. Xue,Randomized strong rank-revealing QR for column subset selection and low-rank matrix approximation, 2025. arXiv:2503.18496. 27

  20. [20]

    M. Gu andS. C. Eisenstat,Efficient algorithms for computing a strong rank- revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869

  21. [21]

    Y . P. Hong andC.-T. Pan,A lower bound for the smallest singular value, Linear Algebra Appl., 172 (1992), pp. 27–32

  22. [22]

    R. A. Horn andC. R. Johnson,Matrix analysis, Cambridge University Press, Cambridge, second ed., 2013

  23. [23]

    I. C. F. Ipsen andA. K. Saibaba,Stable rank and intrinsic dimension of real and complex matrices, SIAM J. Matrix Anal. Appl., 46 (2025), pp. 1988–2007

  24. [24]

    R. M. Karp,Reducibility among combinatorial problems, in Complexity of com- puter computations (Proc. Sympos., IBM Thomas J. Watson Res. Center, York- town Heights, N.Y., 1972), The IBM Research Symposia Series, Plenum, New York-London, 1972, pp. 85–103

  25. [25]

    Kozyrev andA

    I. Kozyrev andA. Osinsky,Subset selection for matrices in spectral norm, 2025. arXiv:2507.20435

  26. [26]

    Kressner, T

    D. Kressner, T. Ni,andA. Uschmajew,On the approximation of vector-valued functions by volume sampling, 2024

  27. [27]

    J. T. Lauzon, S. W. Cheung, Y . Shin, Y . Choi, D. M. Copeland,andK. Huynh,S- OPT: a points selection algorithm for hyper-reduction in reduced order models, SIAM J. Sci. Comput., 46 (2024), pp. B474–B501

  28. [28]

    P. J. Maher,Some singular values, and unitarily invariant norm inequalities con- cerning generalized inverses, Filomat, 21 (2007), pp. 99–111

  29. [29]

    Osinsky,Polynomial timeρ-locally maximum volume search, Calcolo, 60 (2023), pp

    A. Osinsky,Polynomial timeρ-locally maximum volume search, Calcolo, 60 (2023), pp. Paper No. 42, 26

  30. [30]

    Osinsky,V olume-based subset selection, Numer

    A. Osinsky,V olume-based subset selection, Numer. Linear Algebra Appl., 31 (2024), pp. Paper No. e2525, 14

  31. [31]

    C. H. Papadimitriou andM. Yannakakis,Optimization, approximation, and com- plexity classes, J. Comput. System Sci., 43 (1991), pp. 425–440

  32. [32]

    Shin andD

    Y . Shin andD. Xiu,Nonadaptive quasi-optimal points selection for least squares linear regression, SIAM J. Sci. Comput., 38 (2016), pp. A385–A411

  33. [33]

    Shitov,Column subset selection is NP-complete, Linear Algebra Appl., 610 (2021), pp

    Y . Shitov,Column subset selection is NP-complete, Linear Algebra Appl., 610 (2021), pp. 52–58

  34. [34]

    D. P. Williamson andD. B. Shmoys,The design of approximation algorithms, Cambridge University Press, Cambridge, 2011. 28