Title resolution pending

Haishuo Fang, Xiaodan Zhu, Iryna Gurevych · December · arXiv 2407.11843

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs

cs.AI · 2026-05-20 · conditional · novelty 6.0

Introduces MOOD benchmark for OOD LLM alignment failures and shows guard models plus Mahalanobis and perplexity OOD detectors improve recall from 39% to 45% with positive scaling.

Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode

cs.SE · 2026-04-04 · unverdicted · novelty 6.0

Independent evaluation of Claude Code auto mode finds 81% false negative rate on ambiguous authorization tasks due to unmonitored file edits.

citing papers explorer

Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs

cs.AI · 2026-05-20 · conditional · none · ref 82

Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode

cs.SE · 2026-04-04 · unverdicted · none · ref 3
