Scan the cluster metadata first, including any topic labels, suspicion scores, sizes, and representative traces, and use that to decide which groups deserve deeper investigation.

If the repository includes clusters

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Detecting Safety Violations Across Many Agent Traces

cs.AI · 2026-04-13 · unverdicted · novelty 6.0

Meerkat uses clustering plus agentic search to detect sparse safety violations across many agent traces, outperforming baselines and finding nearly 4x more reward-hacking cases on CyBench.

citing papers explorer

Detecting Safety Violations Across Many Agent Traces cs.AI · 2026-04-13 · unverdicted · none · ref 11
