pith. sign in

hub

Sweet-rl: Training multi-turn llm agents on collaborative reasoning tasks

13 Pith papers cite this work. Polarity classification is still indexing.

13 Pith papers citing it

hub tools

citation-role summary

background 3

citation-polarity summary

years

2026 10 2025 3

roles

background 3

polarities

background 3

representative citing papers

Step Rejection Fine-Tuning: A Practical Distillation Recipe

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Step Rejection Fine-Tuning masks loss on erroneous steps identified by a critic LLM in unresolved trajectories, raising SWE-bench Verified resolution rate by 3.7% to 32.2% versus 2.4% for trajectory-level rejection.

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

cs.AI · 2026-04-30 · unverdicted · novelty 5.0

The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a future roadmap.

Agentic Reasoning for Large Language Models

cs.AI · 2026-01-18 · unverdicted · novelty 4.0

The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.

citing papers explorer

Showing 13 of 13 citing papers.