pith. sign in

To- wards interactive evaluations for interaction harms in human-ai systems

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

years

2026 2 2025 2

roles

background 1

polarities

support 1

clear filters

representative citing papers

Benchmarking Misuse Mitigation Against Covert Adversaries

cs.CR · 2025-06-06 · unverdicted · novelty 6.0

Develops the BSD data generation pipeline and two new datasets to evaluate decomposition attacks as effective misuse enablers and stateful defenses as a countermeasure in language model safety.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • Benchmarking Misuse Mitigation Against Covert Adversaries cs.CR · 2025-06-06 · unverdicted · none · ref 70

    Develops the BSD data generation pipeline and two new datasets to evaluate decomposition attacks as effective misuse enablers and stateful defenses as a countermeasure in language model safety.

  • Measuring and mitigating overreliance to build human-compatible AI cs.CY · 2025-09-08 · conditional · none · ref 64

    The paper consolidates risks of overreliance on LLMs, identifies gaps in current measurement approaches, and proposes mitigation strategies to keep AI as a human-compatible thought partner.