SWE-bench mobile: An evaluation benchmark for mobile app engineering

Muxin Tian, Zhe Wang, Blair Yang, Zhenwei Tang, Kunlun Zhu, Honghua Dong, Hanchen Li, Xinni Xie, Guangjing Wang, Jiaxuan You · 2026 · arXiv 2602.09540

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Towards Direct Evaluation of Harness Optimizers via Priority Ranking

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

Priority ranking offers a low-cost direct evaluation for harness optimizers that correlates with their real multi-step optimization performance, supported by the Shor dataset of 182 scenarios.

SWE-Bench 5G: Benchmarking AI Coding Agents on Telecom Network Engineering Tasks

cs.NI · 2026-04-29 · unverdicted · novelty 6.0

SWE-Bench 5G is the first benchmark for AI agents fixing bugs in 5G core network software, showing high diagnosis rates but low resolution that improves conditionally with specification context.

citing papers explorer

Showing 2 of 2 citing papers.

Towards Direct Evaluation of Harness Optimizers via Priority Ranking

cs.AI · 2026-05-21 · unverdicted · none · ref 8

Priority ranking offers a low-cost direct evaluation for harness optimizers that correlates with their real multi-step optimization performance, supported by the Shor dataset of 182 scenarios.

SWE-Bench 5G: Benchmarking AI Coding Agents on Telecom Network Engineering Tasks

cs.NI · 2026-04-29 · unverdicted · none · ref 16

SWE-Bench 5G is the first benchmark for AI agents fixing bugs in 5G core network software, showing high diagnosis rates but low resolution that improves conditionally with specification context.
