Sci-Rho is a dynamic multilingual visually-grounded symbolic benchmark for STEM problems that reveals robustness gaps in current VLMs between average and worst-case performance.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4verdicts
UNVERDICTED 4representative citing papers
TLO is a logit-based diagnostic that visualizes temporal patterns of LLM jailbreak failures on a calibrated 2D plane, distinguishing attacks with identical ASR and enabling early stopping that reduces successful jailbreaks by more than half.
Behavioral assurance is structurally unable to verify the latent safety properties demanded by AI governance frameworks enacted 2019-2026.
Adapts RFM-AGOP to identify multi-dimensional refusal subspaces in LLMs faster than prior methods with improved ablation performance.
citing papers explorer
-
Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures
TLO is a logit-based diagnostic that visualizes temporal patterns of LLM jailbreak failures on a calibrated 2D plane, distinguishing attacks with identical ASR and enabling early stopping that reduces successful jailbreaks by more than half.
-
Fast Multi-dimensional Refusal Subspaces via RFM-AGOP
Adapts RFM-AGOP to identify multi-dimensional refusal subspaces in LLMs faster than prior methods with improved ablation performance.