CodeClash is a tournament benchmark that tests language models on iterative, goal-oriented codebase development through competitive arenas, showing shared weaknesses in strategic reasoning and maintenance despite diverse styles.
Execution is crucial to enable models to create and use their own constructs (e.g., analysis scripts, memory systems).
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.SE (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
CodeClash: Benchmarking Goal-Oriented Software Engineering