pith. sign in

arxiv: 2606.12930 · v1 · pith:SRYVUYZGnew · submitted 2026-06-11 · 💻 cs.LG

Is Spurious Correlation Removal Always Learnable?

Pith reviewed 2026-06-27 07:10 UTC · model grok-4.3

classification 💻 cs.LG
keywords invariant learningspurious correlationscomputational barriersmulti-environment learningsparse recoveryidentifiabilityminimax risk
0
0 comments X

The pith

Even when the invariant subspace is statistically identifiable, removing spurious correlations can remain computationally intractable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that invariant learning can fail computationally even when the invariant structure is statistically identifiable. Under a black-box samplable supervised sparse recovery primitive, there exist samplable multi-environment instances with a one-dimensional predictive invariant subspace that allow polynomial-sample learning by exhaustive search. Any polynomial-time constant-accuracy recovery algorithm on these instances would contradict the primitive. The work further quantifies environment diversity via a separation parameter gamma that governs identifiability, objective curvature, minimax risk, and a phase transition in sample complexity.

Core claim

Under a black-box samplable supervised sparse recovery primitive motivated by average-case sparse-recovery reductions, there exist samplable multi-environment instances with a one-dimensional predictive invariant subspace (k=1) that are learnable with polynomial samples by exhaustive search, while any polynomial-time constant-accuracy recovery algorithm would contradict the primitive. Under sufficient diversity and local Gaussian regularity the minimax risk is E[dist(Vhat, Vinv)^2] = Theta(k(d-k)/(n|E|)), and under label-induced shifts a phase transition occurs at n* proportional to k(d-k)/(|E| gamma^2) with refined estimation error scaling proportional to 1/gamma^2.

What carries the argument

The black-box samplable supervised sparse recovery primitive that encodes computational hardness for recovering the invariant subspace from multi-environment data.

If this is right

  • Samplable multi-environment instances exist where exhaustive search succeeds polynomially but efficient recovery fails.
  • The separation parameter gamma controls both identifiability and the curvature of invariance objectives.
  • The minimax risk for estimating the invariant subspace is Theta(k(d-k)/(n|E|)) under Gaussian regularity.
  • Under label-induced shifts a phase transition in sample complexity occurs at n* proportional to k(d-k)/(|E| gamma^2) with error scaling as 1/gamma^2.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practical invariant learning may need heuristics or extra structure to avoid this computational gap.
  • Dataset construction should prioritize high environment diversity to cross the identifiability threshold.
  • The reduction technique could apply to other settings where statistical identifiability does not guarantee efficient algorithms.

Load-bearing premise

The black-box supervised sparse recovery primitive is computationally hard for any polynomial-time algorithm achieving constant accuracy.

What would settle it

A polynomial-time algorithm achieving constant accuracy on recovering the invariant subspace from the constructed multi-environment instances without solving the sparse recovery primitive.

Figures

Figures reproduced from arXiv: 2606.12930 by Bo Li, Hai-Miao Hu, Hanzi Wang, Ruifan Zhang, Xiaokang Zhang, Yibo Zhou.

Figure 1
Figure 1. Figure 1: Compute–sample gap on synthetic instances: exhaustive search needs far fewer samples than the best polynomial-time method to reach the same feature-selection accuracy. relations among classes rather than class-specific evidence (Zhou et al., 2023; Zhang & Hu, 2025b; Zhou et al., 2025a; 2024; 2025b; Zhang et al., 2023b). Invariant learning uses multi-environment data to target features whose predictive rela… view at source ↗
Figure 2
Figure 2. Figure 2: Empirical signatures: (a) compute–accuracy tradeoff; (b) phase transition vs. sample size; (c) effect of diversity [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Training dynamics: worst-group accuracy lags behind OOD accuracy and saturates later. scaling n ∗ ∝ keff(deff − keff)/(|E|γˆ 2 repr). In practice, these estimates should be combined with validation performance and worst-group metrics when available. Step-by-step guide. 1. Train a preliminary ERM model and compute γˆrepr on penultimate features. 2. If γˆrepr is small, prioritize collecting more diverse envi… view at source ↗
Figure 4
Figure 4. Figure 4: Decision flowchart for applying the diversity and computational-statistical framework. 22 [PITH_FULL_IMAGE:figures/full_fig_p022_4.png] view at source ↗
read the original abstract

Invariant learning can fail even when the invariant structure is statistically identifiable. We show a conditional computational barrier: under a black-box samplable supervised sparse recovery primitive motivated by average-case sparse-recovery reductions, there exist \emph{samplable} multi-environment instances with a one-dimensional predictive invariant subspace ($k=1$) that are learnable with polynomial samples by exhaustive search, while any polynomial-time constant-accuracy recovery algorithm would contradict the primitive. We further quantify environment diversity by a separation parameter $\gamma$, which controls identifiability and the curvature of invariance objectives. Under sufficient diversity and local Gaussian regularity, the minimax risk is $\mathbb{E}[\dist(\hat{V},V_{\mathrm{inv}})^2]=\Theta(k(d-k)/(n|\mathcal{E}|))$, and under label-induced shifts a phase transition occurs at $n^*\propto k(d-k)/(|\mathcal{E}|\gamma^2)$ with refined estimation error scaling proportional to $1/\gamma^2$. Synthetic and real datasets illustrate the predicted gaps and transitions and motivate simple diversity diagnostics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims a conditional computational barrier for invariant learning: under a black-box samplable supervised sparse recovery primitive (motivated by average-case sparse recovery), there exist samplable multi-environment distributions possessing a one-dimensional predictive invariant subspace (k=1) that exhaustive search can learn with polynomial samples, yet any polynomial-time constant-accuracy recovery algorithm would contradict the primitive. A separation parameter γ quantifies environment diversity and controls identifiability; under sufficient diversity and local Gaussian regularity the statistical minimax risk is Θ(k(d-k)/(n|E|)), while label-induced shifts induce a phase transition at n* ∝ k(d-k)/(|E|γ²) with refined error scaling as 1/γ². Synthetic and real-data experiments illustrate the predicted statistical-computational gaps and transitions.

Significance. If the reduction holds, the result establishes that statistical identifiability of an invariant subspace does not imply efficient learnability, separating statistical and computational aspects of spurious-correlation removal in a manner analogous to other average-case hardness reductions. The explicit γ-controlled rates, phase transition, and samplable construction provide concrete, falsifiable predictions. Credit is due for the black-box primitive formulation, the minimax derivation, and the inclusion of both synthetic and real-dataset validation that directly tests the predicted gaps.

major comments (2)
  1. [Reduction construction (§4)] Reduction (likely §4–5): the central claim requires that the explicit construction simultaneously (i) produces a joint distribution samplable in poly time, (ii) keeps V_inv statistically identifiable so that exhaustive search succeeds with poly samples, and (iii) ensures no poly-time constant-accuracy algorithm can recover an approximation without solving the primitive. The presentation does not contain an explicit argument ruling out the possibility that environment sampling or label-induced shifts leak information that an efficient algorithm could exploit to bypass the hardness.
  2. [Minimax analysis (§3)] Minimax risk derivation (Eq. for Θ(k(d-k)/(n|E|))): the statistical lower bound appears to treat environments as fixed and independent; it is unclear whether the same rate continues to hold once the construction is required to embed the sparse-recovery primitive while preserving samplability, or whether the embedding introduces additional statistical cost that would alter the Θ(·) expression.
minor comments (2)
  1. [Notation] Notation: the symbol |E| is used both for the number of environments and (implicitly) for the cardinality of the environment set; a single consistent definition would avoid ambiguity when the phase-transition threshold is stated.
  2. [Experiments] Figure captions for the synthetic experiments should explicitly state the value of γ and the dimension d used in each panel so that the observed gaps can be directly compared to the predicted n* scaling.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and insightful comments on our manuscript. We address each major comment below, providing clarifications on the reduction and minimax analysis while committing to revisions that strengthen the presentation.

read point-by-point responses
  1. Referee: [Reduction construction (§4)] Reduction (likely §4–5): the central claim requires that the explicit construction simultaneously (i) produces a joint distribution samplable in poly time, (ii) keeps V_inv statistically identifiable so that exhaustive search succeeds with poly samples, and (iii) ensures no poly-time constant-accuracy algorithm can recover an approximation without solving the primitive. The presentation does not contain an explicit argument ruling out the possibility that environment sampling or label-induced shifts leak information that an efficient algorithm could exploit to bypass the hardness.

    Authors: The construction in Section 4 embeds the sparse-recovery primitive into the invariant subspace such that the joint distribution remains samplable in polynomial time via the black-box primitive and independent Gaussian noise. Identifiability for exhaustive search is ensured by the γ-separation condition, which guarantees a unique population minimizer at V_inv. Environment sampling and label-induced shifts are generated independently of the sparse vector and orthogonal to it in the sense that they do not reveal its support; any algorithm attempting to exploit them for constant-accuracy recovery must still solve an instance of the primitive. We will add an explicit paragraph in the revised manuscript (new Remark 4.3) that formally rules out leakage by showing that any bypass would imply an efficient solver for the primitive, contradicting the assumption. revision: yes

  2. Referee: [Minimax analysis (§3)] Minimax risk derivation (Eq. for Θ(k(d-k)/(n|E|))): the statistical lower bound appears to treat environments as fixed and independent; it is unclear whether the same rate continues to hold once the construction is required to embed the sparse-recovery primitive while preserving samplability, or whether the embedding introduces additional statistical cost that would alter the Θ(·) expression.

    Authors: The minimax lower bound of Section 3 is established for the broad class of distributions satisfying the local Gaussian regularity and diversity assumptions (Assumptions 1–3), without reference to computational primitives. The Section 4 construction is explicitly verified to satisfy these assumptions: the embedding of the sparse instance into the means preserves the local curvature, environment independence, and γ-controlled separation, introducing no extra statistical cost. Consequently the Θ(k(d-k)/(n|E|)) rate and the n* phase transition remain unchanged. We will insert a short verification paragraph after the construction to confirm that the embedded distributions lie within the minimax class. revision: yes

Circularity Check

0 steps flagged

No circularity: conditional hardness via external black-box primitive

full rationale

The paper's central result is a conditional computational barrier obtained by explicit reduction to a stated external black-box samplable supervised sparse recovery primitive (motivated by average-case sparse-recovery reductions but not derived internally). The construction is presented as preserving samplability while ensuring statistical identifiability for exhaustive search, with no equations or steps that reduce a claimed prediction or uniqueness result to a self-definition, fitted input, or self-citation chain. The separation parameter γ and minimax rate are derived from standard statistical arguments under the stated assumptions, without circular renaming or ansatz smuggling. This is the normal case of an externally grounded hardness claim.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the hardness of the external sparse recovery primitive and the samplability of the constructed instances; gamma is introduced as a modeling parameter but not fitted to data in the abstract.

free parameters (1)
  • gamma
    Separation parameter controlling identifiability and curvature of invariance objectives; introduced to quantify environment diversity.
axioms (1)
  • domain assumption Existence of a black-box samplable supervised sparse recovery primitive that is hard for polynomial-time constant-accuracy algorithms, motivated by average-case sparse-recovery reductions.
    The computational barrier is conditional on this primitive; invoked to separate statistical identifiability from computational tractability.

pith-pipeline@v0.9.1-grok · 5722 in / 1365 out tokens · 36179 ms · 2026-06-27T07:10:14.425308+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 7 canonical work pages

  1. [1]

    International Conference on Learning Representations , year =

    Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization , author =. International Conference on Learning Representations , year =

  2. [3]

    , journal =

    Gao, Chao and Ma, Zongming and Zhou, Harrison H. , journal =. Sparse. 2017 , month =

  3. [4]

    Proceedings of the 38th International Conference on Machine Learning , series =

    Out-of-Distribution Generalization via Risk Extrapolation (REx) , author =. Proceedings of the 38th International Conference on Machine Learning , series =. 2021 , publisher =

  4. [7]

    Proceedings of the 38th International Conference on Machine Learning , series =

    WILDS: A Benchmark of in-the-Wild Distribution Shifts , author =. Proceedings of the 38th International Conference on Machine Learning , series =. 2021 , publisher =

  5. [8]

    Proceedings of The 24th International Conference on Artificial Intelligence and Statistics , series =

    Does Invariant Risk Minimization Capture Invariance? , author =. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics , series =. 2021 , publisher =

  6. [9]

    Proceedings of the 30th Conference on Learning Theory , series =

    A General Characterization of the Statistical Query Complexity , author =. Proceedings of the 30th Conference on Learning Theory , series =. 2017 , publisher =

  7. [10]

    Banach Center Publications , volume =

    Metric Entropy of Homogeneous Spaces , author =. Banach Center Publications , volume =. 1998 , publisher =. doi:10.4064/-43-1-395-410 , url =

  8. [11]

    Random Structures & Algorithms , volume =

    Jerrum, Mark , title =. Random Structures & Algorithms , volume =. 1992 , doi =

  9. [12]

    Random Structures & Algorithms , volume =

    Alon, Noga and Krivelevich, Michael and Sudakov, Benny , title =. Random Structures & Algorithms , volume =. 1998 , doi =

  10. [13]

    Random Structures & Algorithms , volume =

    Feige, Uriel and Krauthgamer, Robert , title =. Random Structures & Algorithms , volume =. 2000 , doi =

  11. [14]

    Proceedings of the 26th Annual Conference on Learning Theory , pages =

    Berthet, Quentin and Rigollet, Philippe , title =. Proceedings of the 26th Annual Conference on Learning Theory , pages =. 2013 , editor =

  12. [15]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

    Zhou, Yibo , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

  13. [16]

    Proceedings of the 41st International Conference on Machine Learning , pages =

    Pedestrian Attribute Recognition as Label-balanced Multi-label Learning , author =. Proceedings of the 41st International Conference on Machine Learning , pages =. 2024 , editor =

  14. [18]

    International Journal of Computer Vision , volume =

    Zhou, Yibo and Hu, Hai-Miao and Yu, Jinzuo and Wu, Haotian and Pu, Shiliang and Wang, Hanzi , title =. International Journal of Computer Vision , volume =. 2025 , doi =

  15. [19]

    IEEE Transactions on Pattern Analysis and Machine Intelligence , volume =

    Zhou, Yibo and Li, Bo and Hu, Hai-Miao and Zhang, Xiaokang and Zhang, Dongping and Wang, Hanzi , title =. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume =. 2025 , doi =

  16. [20]

    Neural Networks , pages=

    ASRL: Correlation-Robust Pedestrian Attribute Recognition via Fixed Orthogonal Classifier , author=. Neural Networks , pages=. 2025 , publisher=

  17. [21]

    IEEE Transactions on Industrial Informatics , volume=

    Dual-constraint autoencoder and adaptive weighted similarity spatial attention for unsupervised anomaly detection , author=. IEEE Transactions on Industrial Informatics , volume=. 2024 , publisher=

  18. [22]

    IEEE Transactions on Image Processing , year=

    A Multi-Category Anomaly Editing Network With Correlation Exploration and Voxel-Level Attention for Unsupervised Surface Anomaly Detection , author=. IEEE Transactions on Image Processing , year=

  19. [23]

    IEEE Transactions on Instrumentation and Measurement , volume=

    JRCC-Net: A segmentation network with joint representation and contrast clustering for surface anomaly detection , author=. IEEE Transactions on Instrumentation and Measurement , volume=. 2023 , publisher=

  20. [24]

    Journal of Intelligent & Fuzzy Systems , volume=

    MFFA: Multi-level feature fusion and anomaly map compensation for anomaly detection , author=. Journal of Intelligent & Fuzzy Systems , volume=. 2023 , publisher=

  21. [25]

    Finding a large hidden clique in a random graph

    Alon, N., Krivelevich, M., and Sudakov, B. Finding a large hidden clique in a random graph. Random Structures & Algorithms, 13 0 (3--4): 0 457--466, 1998. doi:10.1002/(SICI)1098-2418(199810/12)13:3/4<457::AID-RSA14>3.0.CO;2-W

  22. [26]

    Invariant risk minimization

    Arjovsky, M., Bottou, L., Gulrajani, I., and Lopez-Paz, D. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019. URL https://arxiv.org/abs/1907.02893

  23. [27]

    and Rigollet, P

    Berthet, Q. and Rigollet, P. Complexity theoretic lower bounds for sparse principal component detection. In Shalev-Shwartz, S. and Steinwart, I. (eds.), Proceedings of the 26th Annual Conference on Learning Theory, volume 30 of Proceedings of Machine Learning Research, pp.\ 1046--1066, Princeton, NJ, USA, 12--14 Jun 2013. PMLR. URL https://proceedings.mlr...

  24. [28]

    and Krauthgamer, R

    Feige, U. and Krauthgamer, R. Finding and certifying a large hidden clique in a semirandom graph. Random Structures & Algorithms, 16 0 (2): 0 195--208, 2000. doi:10.1002/(SICI)1098-2418(200003)16:2<195::AID-RSA5>3.0.CO;2-A

  25. [29]

    and Lopez-Paz, D

    Gulrajani, I. and Lopez-Paz, D. In search of lost domain generalization. arXiv preprint arXiv:2007.01434, 2020. URL https://arxiv.org/abs/2007.01434

  26. [30]

    Large cliques elude the Metropolis process

    Jerrum, M. Large cliques elude the Metropolis process. Random Structures & Algorithms, 3 0 (4): 0 347--359, 1992. doi:10.1002/rsa.3240030402

  27. [31]

    W., Sagawa, S., Marklund, H., Xie, S

    Koh, P. W., Sagawa, S., Marklund, H., Xie, S. M., Zhang, M., Balsubramani, A., Hu, W., Yasunaga, M., Phillips, R. L., Gao, I., Lee, T., David, E., Stavness, I., Guo, W., Earnshaw, B., Haque, I., Beery, S. M., Leskovec, J., Kundaje, A., Pierson, E., Levine, S., Finn, C., and Liang, P. Wilds: A benchmark of in-the-wild distribution shifts. In Proceedings of...

  28. [32]

    Out-of-distribution generalization via risk extrapolation (rex)

    Krueger, D., Caballero, E., Jacobsen, J.-H., Zhang, A., Binas, J., Zhang, D., Le Priol, R., and Courville, A. Out-of-distribution generalization via risk extrapolation (rex). In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp.\ 5815--5826. PMLR, 2021. URL https://proceedings....

  29. [33]

    The risks of invariant risk minimization

    Rosenfeld, E., Ravikumar, P., and Risteski, A. The risks of invariant risk minimization. arXiv preprint arXiv:2010.05761, 2020. URL https://arxiv.org/abs/2010.05761

  30. [34]

    W., Hashimoto, T

    Sagawa, S., Koh, P. W., Hashimoto, T. B., and Liang, P. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=ryxGuJrFvS

  31. [35]

    and Hu, H.-M

    Zhang, R. and Hu, H.-M. A multi-category anomaly editing network with correlation exploration and voxel-level attention for unsupervised surface anomaly detection. IEEE Transactions on Image Processing, 2025 a

  32. [36]

    Jrcc-net: A segmentation network with joint representation and contrast clustering for surface anomaly detection

    Zhang, R., Wang, H., Feng, M., Liu, Y., and Yang, G. Jrcc-net: A segmentation network with joint representation and contrast clustering for surface anomaly detection. IEEE Transactions on Instrumentation and Measurement, 72: 0 1--14, 2023 a

  33. [37]

    Mffa: Multi-level feature fusion and anomaly map compensation for anomaly detection

    Zhang, R., Wang, H., and Yang, G. Mffa: Multi-level feature fusion and anomaly map compensation for anomaly detection. Journal of Intelligent & Fuzzy Systems, 44 0 (5): 0 7195--7210, 2023 b

  34. [38]

    Dual-constraint autoencoder and adaptive weighted similarity spatial attention for unsupervised anomaly detection

    Zhang, R., Wang, H., Feng, M., Liu, Y., and Yang, G. Dual-constraint autoencoder and adaptive weighted similarity spatial attention for unsupervised anomaly detection. IEEE Transactions on Industrial Informatics, 20 0 (7): 0 9393--9403, 2024

  35. [39]

    and Hu, H.-M

    Zhang, X. and Hu, H.-M. Asrl: Correlation-robust pedestrian attribute recognition via fixed orthogonal classifier. Neural Networks, pp.\ 108408, 2025 b

  36. [40]

    Rethinking reconstruction autoencoder-based out-of-distribution detection

    Zhou, Y. Rethinking reconstruction autoencoder-based out-of-distribution detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 7379--7387, June 2022

  37. [41]

    A solution to co-occurence bias: Attributes disentanglement via mutual information minimization for pedestrian attribute recognition

    Zhou, Y., Hu, H.-M., Yu, J., Xu, Z., Lu, W., and Cao, Y. A solution to co-occurence bias: Attributes disentanglement via mutual information minimization for pedestrian attribute recognition. In Elkind, E. (ed.), Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23 , pp.\ 1831--1839. International Joint Confe...

  38. [42]

    Pedestrian attribute recognition as label-balanced multi-label learning

    Zhou, Y., Hu, H.-M., Xiang, Y., Zhang, X., and Wu, H. Pedestrian attribute recognition as label-balanced multi-label learning. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Res...

  39. [43]

    A solution to co-occurrence bias in pedestrian attribute recognition: Theory, algorithms, and improvements

    Zhou, Y., Hu, H.-M., Yu, J., Wu, H., Pu, S., and Wang, H. A solution to co-occurrence bias in pedestrian attribute recognition: Theory, algorithms, and improvements. International Journal of Computer Vision, 133 0 (7): 0 4712--4726, 2025 a . doi:10.1007/s11263-025-02405-7. URL https://doi.org/10.1007/s11263-025-02405-7

  40. [44]

    Heterogeneous feature re-sampling for balanced pedestrian attribute recognition

    Zhou, Y., Li, B., Hu, H.-M., Zhang, X., Zhang, D., and Wang, H. Heterogeneous feature re-sampling for balanced pedestrian attribute recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47 0 (4): 0 2706--2722, 2025 b . doi:10.1109/TPAMI.2025.3526930. URL https://doi.org/10.1109/TPAMI.2025.3526930