HyPE detects harmful prompts as outliers in hyperbolic space and HyPS sanitizes them using explainable attribution, outperforming prior defenses in accuracy and robustness across datasets and adversarial scenarios.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization