Title resolution pending

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 2

citation-polarity summary: pending (polarity classification is still indexing)

fields: cs.SE 4, cs.CR 1

years: 2026 (5)

verdicts: unverdicted 5

polarities: background 2

representative citing papers

PBT-Bench: Benchmarking AI Agents on Property-Based Testing

cs.SE · 2026-05-13 · unverdicted · novelty 7.0 · 3 refs

PBT-Bench is a benchmark of 100 property-based testing problems across 40 Python libraries. It measures LLM bug-recall rates of 42.1-83.4% under guided prompting, versus 31.4-76.7% at baseline.

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

AgentFlow optimizes multi-agent harnesses using a typed graph DSL (covering roles, prompts, tools, topology, and protocol) together with a runtime-signal feedback loop. It reaches 84.3% on TerminalBench-2 and discovers ten new zero-days in Chrome, including two critical sandbox escapes.

citing papers explorer

  • Synthesizing Multi-Agent Harnesses for Vulnerability Discovery cs.CR · 2026-04-22 · unverdicted · none · ref 14
