{"paper":{"title":"Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback","license":"http://creativecommons.org/licenses/by/4.0/","headline":"LLM agents face cognitive poisoning when tools build trust through benign feedback before executing harmful final actions.","cross_cats":["cs.CL"],"primary_cat":"cs.CR","authors_text":"Binwu Wang, Chenyang Lyu, Guanhua Chen, Lecheng Yan, Longyue Wang, Ruizhe Li, Wenxi Li, Xicheng Han","submitted_at":"2026-05-17T13:51:34Z","abstract_excerpt":"Tool-using LLM agents increasingly rely on external tools to make consequential decisions, yet most existing agent-security benchmarks and defenses implicitly assume that tool feedback is trustworthy once a tool has been selected. We study a different failure mode, cognitive poisoning, in which a malicious tool behaves plausibly during exploration, accumulates trust through benign-looking feedback, and becomes harmful only when hidden state conditions align with the final executable action. 
To study this setting, we construct TRUST-Bench, a task-conditioned benchmark of 1,970 hidden-trigger to"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Trajectory-aware final-action scoring yields strong in-domain discrimination and remains effective under balanced out-of-distribution transfer; under GuardedJoint, VISTA-Guard reaches 84.2 in-domain and 56.9 on balanced out-of-distribution while methods optimizing only one side of the safety-utility tradeoff collapse to zero.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The constructed TRUST-Bench episodes with hidden triggers and matched safe controls sufficiently represent real-world malicious tool behaviors in black-box ecosystems, and abstracting multi-step interactions into environment variables that encode trust-formation dynamics provides a faithful enough representation for reliable final-action risk scoring.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Presents TRUST-Bench benchmark for hidden-trigger tool compromises in LLM agents and VISTA-Guard framework for trajectory-aware risk scoring of final actions under untrusted feedback.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"LLM agents face cognitive poisoning when tools build trust through benign feedback before executing harmful final 
actions.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"f65eb8fa3c2fc879c399ada7ea3c5af050ee9cb93b6cacc3cfee7044295ebf89"},"source":{"id":"2605.17453","kind":"arxiv","version":1},"verdict":{"id":"5be1b546-f98a-4bf7-b302-590a0cd17dd6","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T23:23:36.932928Z","strongest_claim":"Trajectory-aware final-action scoring yields strong in-domain discrimination and remains effective under balanced out-of-distribution transfer; under GuardedJoint, VISTA-Guard reaches 84.2 in-domain and 56.9 on balanced out-of-distribution while methods optimizing only one side of the safety-utility tradeoff collapse to zero.","one_line_summary":"Presents TRUST-Bench benchmark for hidden-trigger tool compromises in LLM agents and VISTA-Guard framework for trajectory-aware risk scoring of final actions under untrusted feedback.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The constructed TRUST-Bench episodes with hidden triggers and matched safe controls sufficiently represent real-world malicious tool behaviors in black-box ecosystems, and abstracting multi-step interactions into environment variables that encode trust-formation dynamics provides a faithful enough representation for reliable final-action risk scoring.","pith_extraction_headline":"LLM agents face cognitive poisoning when tools build trust through benign feedback before executing harmful final 
actions."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.17453/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-19T23:31:19.912691Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T23:30:57.840691Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T21:41:57.711603Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T21:33:23.665031Z","status":"skipped","version":"1.0.0","findings_count":0}],"snapshot_sha256":"13ff10359d7169e4cf570bca5ee144f0a06ce55230945d5c95c2bad6c281a401"},"references":{"count":50,"sample":[{"doi":"","year":2023,"title":"Identifying the Risks of LM Agents with an LM-Emulated Sandbox","work_id":"3d4c3b66-d749-4939-b1bc-62b10b2ebbb6","ref_index":1,"cited_arxiv_id":"2309.15817","is_internal_anchor":true},{"doi":"","year":2024,"title":"Stabletoolbench: Towards stable large-scale benchmarking on tool learning of large language models","work_id":"ff6a6612-7102-422d-8755-6737e77bab2c","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Toolsandbox: A stateful, conversational, interactive evaluation benchmark for llm tool use capabilities","work_id":"7672e9c0-db5a-41db-8282-2f27e81b0489","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"The tool decathlon: Benchmarking language agents for diverse, realistic, and long-horizon task execution","work_id":"9263d28f-4683-4cd6-8a1e-5f6b1e15f3a2","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt 
injection","work_id":"d0cff0ab-f525-4eb0-918f-be91aeff3786","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":50,"snapshot_sha256":"5c229d6df0a63d6d9968185eae28e1dd16271414519f2c1b75ae9f343fbd5952","internal_anchors":10},"formal_canon":{"evidence_count":2,"snapshot_sha256":"78ca3b64e669eb8fa116f87e1f04e3e70599ee94b7a17406fa5bac0f1ee2695f"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}