CodeClash is a tournament benchmark that tests language models on iterative, goal-oriented codebase development in competitive arenas. It finds that models share weaknesses in strategic reasoning and codebase maintenance despite their diverse styles.
3 Pith papers cite this work.
citing papers explorer
- CodeClash: Benchmarking Goal-Oriented Software Engineering
  CodeClash is a tournament benchmark that tests language models on iterative, goal-oriented codebase development through competitive arenas, showing shared weaknesses in strategic reasoning and maintenance despite diverse styles.
- SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
  SWE-Pruner trains a lightweight neural skimmer to perform task-aware pruning of code contexts for LLM agents, delivering 23-54% token reduction on SWE-Bench Verified with improved success rates and up to 14.84x compression on LongCodeQA.
- AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices
  AgentStop uses execution signals to early-terminate failing local LLM agent trajectories, cutting energy use 15-20% with minimal utility loss.