Towards interactive evaluations for interaction harms in human-AI systems. arXiv preprint arXiv:2405.10632, 2024

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

years

2026: 1 · 2025: 2

representative citing papers

Benchmarking Misuse Mitigation Against Covert Adversaries

cs.CR · 2025-06-06 · unverdicted · novelty 6.0

Develops the BSD data generation pipeline and two new datasets to evaluate decomposition attacks as effective misuse enablers and stateful defenses as a countermeasure in language model safety.
