Feature-Space Smoothing: Certified Robustness of Deep Representations

arxiv: 2601.16200 · v3 · pith:IBNGABSF · submitted 2026-01-22 · cs.LG · cs.CV


Song Xia, Meiwen Ding, Chenqi Kong, Wenhan Yang, Xudong Jiang


keywords: robustness, certified, feature, feature-space, gaussian, under, cosine, encoder


Abstract

Modern deep learning models exhibit strong capabilities across diverse applications, yet remain vulnerable to malicious inputs that induce erroneous predictions via feature-space distortion. To address this vulnerability, we propose Feature-space Smoothing (FS), a general defense framework that provides certified robustness at the feature representation level. We show that FS converts a given feature encoder into a smoothed variant that is guaranteed to maintain a certified lower bound on the cosine similarity between clean and adversarial features under l_2-bounded perturbations. We then establish that this Feature Cosine Similarity Bound (FCSB) can be extended to prediction-wise certification under the cosine similarity measure, and that the value of the FCSB is determined by the encoder's intrinsic Gaussian robustness score. Building on these insights, we introduce the Gaussian Smoothness Booster (GSB), a plug-and-play module that improves the encoder's Gaussian robustness score. Specifically, the GSB module is plugged in to enhance feature-space consistency and to maintain feature utility for downstream tasks under Gaussian perturbations. This design enables seamless integration of FS into the protected model, e.g., Multimodal Large Language Models (MLLMs), without additional model retraining or alignment, improving its robustness while preserving performance for downstream task-oriented decoding. Extensive experiments demonstrate that integrating FS consistently provides non-trivial certified robustness and significantly improves task-oriented performance under strong white-box adversarial attacks across diverse models and applications.
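
As a rough illustration of the idea in the abstract, the sketch below averages an encoder's output over Gaussian input noise and compares the resulting features for a clean input and an l_2-bounded perturbed input using cosine similarity. It is not the paper's implementation: `toy_encoder`, `smoothed_features`, the choice to unit-normalize each noisy feature before averaging, and the values of `sigma`, `radius`, and `n_samples` are all assumptions made for illustration. The perturbation is a random direction rather than an adversarial attack, and the printed value is an empirical similarity, not the certified FCSB.

```python
import math
import random


def toy_encoder(x, weights):
    # Hypothetical stand-in encoder: one linear layer followed by tanh.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def unit(v):
    # Scale a vector to unit l2 norm (returned unchanged if it is all zeros).
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def cosine(u, v):
    # Cosine similarity between two vectors.
    return sum(a * b for a, b in zip(unit(u), unit(v)))


def smoothed_features(x, weights, sigma, n_samples, rng):
    # Monte Carlo average of unit-normalized features over Gaussian input
    # noise eps ~ N(0, sigma^2 I). The normalization step is an assumption.
    acc = [0.0] * len(weights)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        feat = unit(toy_encoder(noisy, weights))
        acc = [a + f for a, f in zip(acc, feat)]
    return [a / n_samples for a in acc]


rng = random.Random(0)
in_dim, out_dim = 16, 8
weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
x = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]

# Perturb x along a random direction with l2 norm equal to `radius`.
radius = 0.25
direction = unit([rng.gauss(0.0, 1.0) for _ in range(in_dim)])
x_pert = [xi + radius * d for xi, d in zip(x, direction)]

clean = smoothed_features(x, weights, sigma=0.5, n_samples=2000, rng=rng)
pert = smoothed_features(x_pert, weights, sigma=0.5, n_samples=2000, rng=rng)
similarity = cosine(clean, pert)
print(f"cosine similarity of smoothed features: {similarity:.4f}")
```

Increasing `sigma` in this toy setup generally makes the two smoothed feature vectors more similar at the cost of blurring the features themselves, which is the trade-off the abstract attributes to the encoder's behavior under Gaussian perturbations.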


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Adversarial Fragility and Language Vulnerability in Clinical AI: A Systematic Audit of Diagnostic Collapse Under Imperceptible Perturbations and Cross-Lingual Drift in Low-Resource Healthcare Settings
cs.CY 2026-05 unverdicted novelty 4.0

The study shows clinical AI accuracy collapsing from 89% to 62% on X-rays under imperceptible adversarial perturbations and from 85% to 55% on clinical cases in Nigerian Pidgin and Yoruba-inflected English.