pith. sign in

arXiv preprint arXiv:2311.10538 , year=

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

fields

cs.AI 2 cs.CR 1

years

2026 2 2025 1

verdicts

UNVERDICTED 3

representative citing papers

ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

cs.AI · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.

FORTIS: Benchmarking Over-Privilege in Agent Skills

cs.AI · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

FORTIS benchmark shows over-privilege is the norm in LLM agent skill selection and execution, with models reaching for higher-privilege skills and tools than required across ten frontier models and three domains.

citing papers explorer

Showing 3 of 3 citing papers.

  • ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents cs.AI · 2026-05-13 · unverdicted · none · ref 62 · 2 links

    ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.

  • FORTIS: Benchmarking Over-Privilege in Agent Skills cs.AI · 2026-05-09 · unverdicted · none · ref 10 · 2 links

    FORTIS benchmark shows over-privilege is the norm in LLM agent skill selection and execution, with models reaching for higher-privilege skills and tools than required across ten frontier models and three domains.

  • AgentCrypt: Advancing Privacy and (Secure) Computation in AI Agent Collaboration cs.CR · 2025-12-08 · unverdicted · none · ref 26

    AgentCrypt introduces a deterministic three-tier privacy framework for AI agent collaboration that uses masking and homomorphic encryption to protect data independently of model accuracy.