Title resolution pending

https://arxiv · 2025 · arXiv 2510.06036

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Sci-Rho: A Multilingual Visually-Grounded Symbolic Benchmark for STEM Problems

cs.CV · 2026-06-06 · unverdicted · novelty 7.0

Sci-Rho is a dynamic multilingual visually-grounded symbolic benchmark for STEM problems that reveals robustness gaps in current VLMs between average and worst-case performance.

Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures

cs.AI · 2026-05-28 · unverdicted · novelty 6.0

TLO is a logit-based diagnostic that visualizes temporal patterns of LLM jailbreak failures on a calibrated 2D plane, distinguishing attacks with identical ASR and enabling early stopping that reduces successful jailbreaks by more than half.

Position: Behavioural Assurance Cannot Verify the Safety Claims Governance Now Demands

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

Behavioral assurance is structurally unable to verify the latent safety properties demanded by AI governance frameworks enacted 2019-2026.

Fast Multi-dimensional Refusal Subspaces via RFM-AGOP

cs.AI · 2026-07-02 · unverdicted · novelty 4.0

Adapts RFM-AGOP to identify multi-dimensional refusal subspaces in LLMs faster than prior methods with improved ablation performance.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures

cs.AI · 2026-05-28 · unverdicted · none · ref 28

TLO is a logit-based diagnostic that visualizes temporal patterns of LLM jailbreak failures on a calibrated 2D plane, distinguishing attacks with identical ASR and enabling early stopping that reduces successful jailbreaks by more than half.

Fast Multi-dimensional Refusal Subspaces via RFM-AGOP

cs.AI · 2026-07-02 · unverdicted · none · ref 3

Adapts RFM-AGOP to identify multi-dimensional refusal subspaces in LLMs faster than prior methods with improved ablation performance.
