

OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

Baseline reference. 57% of citing Pith papers use this work as a benchmark or comparison.

20 Pith papers citing it
Baseline: 57% of classified citations


citation-role summary

background: 3, dataset: 3, baseline: 1


years

2026: 18, 2025: 2


representative citing papers

Do Coding Agents Understand Least-Privilege Authorization?

cs.CR · 2026-05-14 · unverdicted · novelty 7.0

Coding agents struggle to infer least-privilege file permissions: they omit needed accesses while granting unused or sensitive ones. Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

cs.AI · 2026-04-30 · unverdicted · novelty 5.0

The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a future roadmap.

citing papers explorer

Showing 20 of 20 citing papers.