Brevity is the soul of sustainability: Characterizing LLM response lengths
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Citing papers by verdict: UNVERDICTED (2)
Citing papers explorer
- Explain Like I'm 5 or Whatever I Choose: Evaluating the Interactive Potential of Language Model Responses
  A new evaluation framework shows that even the best tested LLM only reliably adjusts response complexity in the intended direction 46% of the time across 98 scientific queries.
- SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating
  SlimSearcher reduces tool-call rounds by 17-58% on GAIA, BrowseComp and XBenchDeepSearch while maintaining accuracy via Pareto filtration in SFT and Adaptive Reward Gating in RL.