PPTArena: A Benchmark for Agentic PowerPoint Editing
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents
  LearnWeak specializes small computer-use agents (CUAs) through weakness detection by a reference agent, targeted task synthesis, and error-aware training, delivering gains of 11+ points on OSWorld.
- LLMs Corrupt Your Documents When You Delegate
  LLMs, even frontier models, corrupt an average of 25% of document content during long delegated editing workflows across 52 domains, and agentic tools do not mitigate the issue.
- DeepSlide: From Artifacts to Presentation Delivery
  DeepSlide introduces a multi-agent system for full presentation preparation that matches baselines on slide quality while improving narrative flow, pacing, and script synergy, as measured by a new dual-scoreboard benchmark.