T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
- AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents
AURA improves implicit-need coverage by 0.07 over ReAct baselines on a 100-query benchmark by inserting an intent inference step controlled by a gap score, while cutting probes by 82% on factual tasks.
- PEC-Home: Interpretation of Progressively Elliptical Commands in Smart Homes
Presents the PEC-Home dataset of elliptical smart-home commands and shows that LLMs achieve lower execution accuracy on elliptical inputs than on complete commands, even with access to dialogue history.