Pith Number

pith:SJKJBZBO

pith:2026:SJKJBZBO3DTDVL3EXNLGJUFX65
not attested · not anchored · not stored · refs resolved

Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback

Binwu Wang, Chenyang Lyu, Guanhua Chen, Lecheng Yan, Longyue Wang, Ruizhe Li, Wenxi Li, Xicheng Han

LLM agents face cognitive poisoning when tools build trust through benign feedback before executing harmful final actions.

arxiv:2605.17453 v1 · 2026-05-17 · cs.CR · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{SJKJBZBO3DTDVL3EXNLGJUFX65}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Trajectory-aware final-action scoring yields strong in-domain discrimination and remains effective under balanced out-of-distribution transfer. Under GuardedJoint, VISTA-Guard reaches 84.2 in-domain and 56.9 on the balanced out-of-distribution setting, while methods that optimize only one side of the safety-utility tradeoff collapse to zero.

C2 · weakest assumption

The constructed TRUST-Bench episodes with hidden triggers and matched safe controls sufficiently represent real-world malicious tool behaviors in black-box ecosystems, and abstracting multi-step interactions into environment variables that encode trust-formation dynamics provides a faithful enough representation for reliable final-action risk scoring.

C3 · one line summary

Presents TRUST-Bench, a benchmark for hidden-trigger tool compromises in LLM agents, and VISTA-Guard, a framework for trajectory-aware risk scoring of final actions under untrusted feedback.

References

50 extracted · 50 resolved · 10 Pith anchors

[1] Identifying the Risks of LM Agents with an LM-Emulated Sandbox 2023 · arXiv:2309.15817
[2] StableToolBench: Towards stable large-scale benchmarking on tool learning of large language models 2024
[3] ToolSandbox: A stateful, conversational, interactive evaluation benchmark for LLM tool use capabilities 2025
[4] The tool decathlon: Benchmarking language agents for diverse, realistic, and long-horizon task execution 2025
[5] Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection 2023

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-20T00:04:39.804688Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

925490e42ed8e63aaf64bb5664d0b7f7581c7a88ed674cbf5f9ff0936f550d01

Aliases

arxiv: 2605.17453 · arxiv_version: 2605.17453v1 · doi: 10.48550/arxiv.2605.17453 · pith_short_12: SJKJBZBO3DTD · pith_short_16: SJKJBZBO3DTDVL3E · pith_short_8: SJKJBZBO
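The identifiers on this page appear to be related: Base32-encoding the canonical hash (RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number, and the short aliases are prefixes of it. This relationship is inferred from the values shown here, not from any stated specification, so treat the sketch below as an observation rather than the official derivation. The function name `pith_number_from_hash` is hypothetical.

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "925490e42ed8e63aaf64bb5664d0b7f7581c7a88ed674cbf5f9ff0936f550d01"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: the 26-character truncation is inferred from this page's values,
    not taken from a published schema.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)

# The short aliases listed above are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)  # SJKJBZBO3DTDVL3EXNLGJUFX65
for name, value in aliases.items():
    print(name, value)
```

If the inference holds for other records, a short alias can be checked against a full Pith Number with a simple prefix comparison, without contacting the server.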
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/SJKJBZBO3DTDVL3EXNLGJUFX65 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 925490e42ed8e63aaf64bb5664d0b7f7581c7a88ed674cbf5f9ff0936f550d01
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "63f98a9a551a058eef17ceb64d5bd035502a7b3b3b24ce832ead9bb0eba95d35",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-05-17T13:51:34Z",
    "title_canon_sha256": "3939805b5c1d5d6460cf26ebbec780b3f3a786e2444d3e5ee123b417e8c3ae5e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17453",
    "kind": "arxiv",
    "version": 1
  }
}
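The verification command above can also be run offline against the canonical record as printed. The following is a minimal sketch that applies the same canonicalization the one-liner uses (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes, SHA-256). The function name `canonical_sha256` is hypothetical, and the sketch assumes the JSON shown on this page is exactly the payload that was hashed; the comparison result is printed rather than asserted.

```python
import hashlib
import json

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "63f98a9a551a058eef17ceb64d5bd035502a7b3b3b24ce832ead9bb0eba95d35",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CR",
        "submitted_at": "2026-05-17T13:51:34Z",
        "title_canon_sha256": "3939805b5c1d5d6460cf26ebbec780b3f3a786e2444d3e5ee123b417e8c3ae5e",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17453", "kind": "arxiv", "version": 1},
}

# Hash the page says to expect.
EXPECTED = "925490e42ed8e63aaf64bb5664d0b7f7581c7a88ed674cbf5f9ff0936f550d01"


def canonical_sha256(obj: dict) -> str:
    """Serialize with sorted keys and no whitespace, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches expected hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores or emits the fields, which is what makes independent recomputation possible.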