pith.

Baseline reference

OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

Baseline reference. 67% of citing Pith papers use this work as a benchmark or comparison.

14 Pith papers citing it
Baseline: 67% of classified citations

citation-role summary

dataset: 3 · background: 2 · baseline: 1

years

2026: 12 · 2025: 2

representative citing papers

Do Coding Agents Understand Least-Privilege Authorization?

cs.CR · 2026-05-14 · unverdicted · novelty 7.0

Coding agents struggle to infer least-privilege file permissions: they omit accesses the task needs while granting unused or sensitive ones. The proposed Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

cs.AI · 2026-04-30 · unverdicted · novelty 5.0

The paper presents the first comprehensive overview of RL for GUI agents. It organizes methods into offline, online, and hybrid strategies and analyzes trends in reward design, efficiency, and deliberation to outline a roadmap for future work.
