Supervised Learning Has a Necessary Geometric Blind Spot: Theory, Consequences, and Minimal Repair

2026 · cs.LG · arXiv 2604.21395

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

PGD adversarial training, the standard robustness method, can reduce the Jacobian Frobenius norm yet worsen clean-input geometry (e.g., TDI 1.336 vs. ERM 1.093). We show this is not an implementation artifact but a theorem-level consequence of supervised learning. We prove that any encoder minimizing supervised loss must retain non-zero sensitivity along directions correlated with training labels, including directions that are nuisance at test time. This holds across proper scoring rules, architectures, and dataset sizes. We call this the geometric blind spot of supervised learning. This theorem unifies four empirical phenomena often treated separately: non-robust features, texture bias, corruption fragility, and the robustness-accuracy tradeoff. It also explains why suppressing sensitivity in one adversarial direction can redistribute sensitivity elsewhere. We introduce the Trajectory Deviation Index (TDI), a diagnostic of geometric isotropy. Unlike CKA, intrinsic dimension, or the Jacobian Frobenius norm alone, TDI captures the failure mode above. In our experiments, PGD attains a low Frobenius norm but a high TDI, while PMH attains the lowest TDI with one additional training term and no architectural changes. Across seven tasks, BERT/SST-2, and ImageNet ViT-B/16 (backbone family underlying CLIP/DINO/SAM), the blind spot is measurable and repairable. It appears at foundation-model scale, worsens with model scale and task-specific fine-tuning, and is substantially reduced by PMH. PMH also leads on non-Gaussian corruption types (blur/brightness/contrast) without corruption-specific training.
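The abstract's point that a low Jacobian Frobenius norm does not imply isotropic sensitivity can be illustrated with a minimal sketch. It does not implement TDI, whose definition is not given on this page. The two toy linear encoders and the helper names (`jacobian`, `frobenius_norm`, `directional_sensitivity`) are hypothetical and are not taken from the paper.

```python
import math

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x (rows = outputs, cols = inputs)."""
    fx = f(x)
    J = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(len(fx)):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def frobenius_norm(J):
    """Square root of the sum of squared Jacobian entries."""
    return math.sqrt(sum(v * v for row in J for v in row))

def directional_sensitivity(J, v):
    """Norm of J applied to the unit vector along direction v."""
    n = math.sqrt(sum(c * c for c in v))
    u = [c / n for c in v]
    Jv = [sum(row[j] * u[j] for j in range(len(u))) for row in J]
    return math.sqrt(sum(c * c for c in Jv))

# Hypothetical toy encoders (assumptions for illustration only).
def isotropic_encoder(x):
    return [1.0 * x[0], 1.0 * x[1]]

def anisotropic_encoder(x):
    # Sensitivity is suppressed along x[1] but raised along x[0].
    return [1.2 * x[0], 0.1 * x[1]]

x0 = [0.3, -0.7]
J_iso = jacobian(isotropic_encoder, x0)
J_aniso = jacobian(anisotropic_encoder, x0)

frob_iso, frob_aniso = frobenius_norm(J_iso), frobenius_norm(J_aniso)
sens_iso = directional_sensitivity(J_iso, [1.0, 0.0])
sens_aniso = directional_sensitivity(J_aniso, [1.0, 0.0])
```

In this toy case the anisotropic encoder has the smaller Frobenius norm but the larger sensitivity along the first input direction, which is the kind of redistribution a single scalar norm cannot reveal.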

representative citing papers

The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning

cs.LG · 2026-05-21 · unverdicted · novelty 8.0 · 2 refs

Robustness methods estimate the task covariance Sigma_task; the matching principle requires penalty matrices whose range covers the range of Sigma_task in order to drive deployment drift to zero.
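The range condition in this summary can be checked numerically. The sketch below assumes the condition is the plain linear-algebra statement that range(Sigma_task) is contained in range(P) for a penalty matrix P, which holds exactly when appending the columns of Sigma_task to P does not increase the rank. The helper names `matrix_rank` and `covers_range` and the example matrices are hypothetical; the citing paper's precise formulation may differ.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    n_cols = len(m[0]) if m else 0
    r = 0
    for c in range(n_cols):
        if r == len(m):
            break
        # Pick the row with the largest entry in column c as the pivot.
        pivot = max(range(r, len(m)), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            for k in range(c, n_cols):
                m[i][k] -= factor * m[r][k]
        r += 1
    return r

def covers_range(penalty, sigma_task):
    """True if range(sigma_task) is contained in range(penalty)."""
    augmented = [p_row + s_row for p_row, s_row in zip(penalty, sigma_task)]
    return matrix_rank(augmented) == matrix_rank(penalty)

# Hypothetical example: the task covariance varies along the first two axes.
sigma_task = [[1.0, 0.0, 0.0],
              [0.0, 0.5, 0.0],
              [0.0, 0.0, 0.0]]
penalty_full = [[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0]]
penalty_partial = [[1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]]

full_ok = covers_range(penalty_full, sigma_task)
partial_ok = covers_range(penalty_partial, sigma_task)
```

Here `penalty_full` covers both task directions, while `penalty_partial` penalizes only the first and so misses part of the range of `sigma_task`.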

citing papers explorer


The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning

cs.LG · 2026-05-21 · unverdicted · none · ref 15 · 2 links · internal anchor
