Trustagent: Towards safe and trustworthy llm-based agents

Wenyue Hua, Xianjun Yang, Mingyu Jin, Zelong Li, Wei Cheng, Ruixiang Tang, Yongfeng Zhang · 2024

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs

cs.CL · 2025-05-20 · unverdicted · novelty 6.0

Phonetic perturbations fragment safety-critical tokens in LLMs, suppressing attribution scores while preserving input understanding and causing safety mechanisms to fail despite good comprehension.

citing papers explorer

Showing 1 of 1 citing paper.

Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs cs.CL · 2025-05-20 · unverdicted · none · ref 12
Phonetic perturbations fragment safety-critical tokens in LLMs, suppressing attribution scores while preserving input understanding and causing safety mechanisms to fail despite good comprehension.

Trustagent: Towards safe and trustworthy llm-based agents

fields

years

verdicts

representative citing papers

citing papers explorer