The Twelfth International Conference on Learning Representations , year=

RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems , author=

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

ABRA: Agent Benchmark for Radiology Applications

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

ABRA shows radiology agents excel at tool execution (89%+) but struggle with outcomes (0-25%), with oracle perception raising outcomes to 69-100%, identifying perception as the primary bottleneck.

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

cs.SE · 2025-02-25 · unverdicted · novelty 7.0

SWE-RL uses RL on software evolution data to train LLMs achieving 41% on SWE-bench Verified with generalization to other reasoning tasks.

citing papers explorer

Showing 2 of 2 citing papers.

ABRA: Agent Benchmark for Radiology Applications cs.CV · 2026-05-11 · unverdicted · none · ref 17
ABRA shows radiology agents excel at tool execution (89%+) but struggle with outcomes (0-25%), with oracle perception raising outcomes to 69-100%, identifying perception as the primary bottleneck.
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution cs.SE · 2025-02-25 · unverdicted · none · ref 165
SWE-RL uses RL on software evolution data to train LLMs achieving 41% on SWE-bench Verified with generalization to other reasoning tasks.

The Twelfth International Conference on Learning Representations , year=

fields

years

verdicts

representative citing papers

citing papers explorer