Token-Hungry, Yet Precise: DeepSeek R1 highlights the need for multi-step reasoning over speed in MATH
2 Pith papers cite this work. Polarity classification is still indexing.
Unverdicted: 2
Representative citing papers
- SE-GA: Memory-Augmented Self-Evolution for GUI Agents
SE-GA combines Test-Time Memory Extension for dynamic context retrieval with Memory-Augmented Self-Evolution training to reach 89.0% on ScreenSpot and 75.8% on AndroidControl-High.
- A Showdown of ChatGPT vs DeepSeek in Solving Programming Tasks
ChatGPT o3-mini achieves 54.5% success on medium Codeforces tasks versus 18.1% for DeepSeek-R1, with both models performing similarly on easy tasks and poorly on hard ones.