pith. machine review for the scientific record.

arxiv: 2604.03976 · v2 · submitted 2026-04-05 · 💻 cs.AI · cs.CE

Recognition: no theorem link

Quantifying Trust: Financial Risk Management for Trustworthy AI Agents

Authors on Pith no claims yet

Pith reviewed 2026-05-13 17:13 UTC · model grok-4.3

classification 💻 cs.AI cs.CE
keywords trustworthy AI · AI agents · risk management · financial underwriting · Agentic Risk Standard · compensation · autonomous systems · AI safety

The pith

The Agentic Risk Standard turns implicit expectations of AI agent behavior into explicit, contractually enforceable compensation for failures and misalignments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Prior work on trustworthy AI centers on internal model properties such as bias mitigation and robustness. As agents become autonomous and handle payments or assets, trust instead requires reliable end-to-end outcomes that stochastic behavior makes impossible to guarantee through technical safeguards alone. The paper proposes the Agentic Risk Standard as a financial underwriting framework that embeds risk assessment, underwriting, and predefined compensation directly into each transaction. Users thereby receive contractually binding payouts for execution failures, intent misalignment, or unintended harms. A simulation study examines the resulting social benefits of this shift from model-level reliability to product-level guarantees.

Core claim

The paper establishes that the Agentic Risk Standard integrates risk assessment, underwriting, and compensation into a single transaction framework for AI-mediated transactions. Under ARS, users receive predefined and contractually enforceable compensation in cases of execution failure, misalignment, or unintended outcomes. This shifts trust from an implicit expectation about model behavior to an explicit, measurable, and enforceable product guarantee.

What carries the argument

The Agentic Risk Standard (ARS), a payment settlement standard that combines risk assessment, underwriting, and compensation into AI agent transactions to create enforceable user guarantees.
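The escrow-and-compensation flow this standard describes (lock the service fee, evaluate the outcome, then release the fee or pay a predefined compensation) can be sketched in a few lines. This is an illustrative reading of the mechanism, not the reference implementation from the paper's repository; the class and outcome names are invented.

```python
# Minimal sketch of an ARS-style settlement rule: the fee sits in escrow and
# is released to the agent only on success; any failure mode triggers a refund
# plus the contractually predefined compensation. Names are hypothetical.
from dataclasses import dataclass
from enum import Enum, auto

class Outcome(Enum):
    SUCCESS = auto()
    EXECUTION_FAILURE = auto()
    INTENT_MISALIGNMENT = auto()

@dataclass
class EscrowVault:
    fee: float           # service fee locked by the requestor
    compensation: float  # predefined payout owed to the user on failure

    def settle(self, outcome: Outcome) -> dict:
        """Release the fee to the agent on success; otherwise refund the fee
        and pay the contractual compensation to the user."""
        if outcome is Outcome.SUCCESS:
            return {"agent": self.fee, "user": 0.0}
        return {"agent": 0.0, "user": self.fee + self.compensation}

vault = EscrowVault(fee=10.0, compensation=25.0)
print(vault.settle(Outcome.SUCCESS))            # {'agent': 10.0, 'user': 0.0}
print(vault.settle(Outcome.EXECUTION_FAILURE))  # {'agent': 0.0, 'user': 35.0}
```

The point of the sketch is the shift the paper claims: the guarantee lives in the settlement rule, not in the agent's behavior.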

Load-bearing premise

Agent risks are fundamentally product-level and cannot be eliminated by technical safeguards alone, so a financial compensation layer is required to create enforceable trust.

What would settle it

A real-world deployment in which technical safeguards alone eliminate all material user harms from agent stochasticity without any compensation mechanism.

Figures

Figures reproduced from arXiv: 2604.03976 by Bryan Lim, Chandler Fang, Chi Wang, Ian Kaufman, Jiaxin Pei, Tianyi Peng, Wenyue Hua.

Figure 1. ARS is a transaction-layer assurance standard for agentic services that converts stochastic, outcome-level risk into explicit settlement rules. Without ARS, users must prepay agents (and in fund-moving tasks, also hand over execution capital), exposing them to non-delivery, misexecution, and downstream harms. With ARS, service fees are locked in an escrow vault and released only upon successful evaluation; … view at source ↗

Figure 2. The requestor first sends a task specification to the business agent. Both parties may then enter a … view at source ↗

Figure 3. Fee track in the transaction phase: the requestor locks the service fee in an escrow vault before … view at source ↗

Figure 4. Principal track in the transaction phase: when execution involves user funds (principal), … view at source ↗

Figure 5. AP2+ARS: AP2 provides authorization evidence and bounded delegation, and ARS adds settlement semantics over the authorized transaction. view at source ↗

Figure 6. VI+ARS: VI governs privacy-preserving authorization through layered credentials and selective disclosure; ARS governs settlement and compensation over the authorized transaction. view at source ↗

Figure 7. Loading factor sweep: adoption rate, loss reduction rate, failure reduction rate, and underwriter … view at source ↗

Figure 8. FP/FN sweep: adoption rate, loss reduction rate, failure reduction rate, and underwriter wallet … view at source ↗

Figure 9. Sigmoid collateral sweep: adoption rate, loss reduction rate, failure reduction rate, and wallet … view at source ↗
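Figures 7–9 describe parameter sweeps over the underwriting economics. The shape of such a sweep can be illustrated with a minimal Monte Carlo sketch: a loading factor multiplies an actuarially fair premium, and user loss reduction is compared against uninsured transactions. Every number here (failure rate, loss size, the loss model itself) is an assumption, not taken from the paper's simulation.

```python
# Hypothetical loading-factor sweep: as the underwriter's loading rises, the
# premium grows and the user's net loss reduction shrinks, which is the kind of
# adoption/benefit trade-off the figures plot.
import random

def simulate(loading_factor: float, n: int = 10_000,
             fail_rate: float = 0.05, loss: float = 100.0,
             seed: int = 0) -> dict:
    """Compare expected user losses with and without an ARS-style premium."""
    rng = random.Random(seed)
    # Premium = actuarially fair price (fail_rate * loss) times a loading factor.
    premium = loading_factor * fail_rate * loss
    uninsured_loss = insured_loss = 0.0
    for _ in range(n):
        failed = rng.random() < fail_rate
        uninsured_loss += loss if failed else 0.0
        insured_loss += premium  # losses are compensated; the user only pays premium
    return {"loading_factor": loading_factor,
            "loss_reduction": 1 - insured_loss / uninsured_loss}

for lf in (1.0, 1.25, 1.5, 2.0):
    print(simulate(lf))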
read the original abstract

Prior work on trustworthy AI emphasizes model-internal properties such as bias mitigation, adversarial robustness, and interpretability. As AI systems evolve into autonomous agents deployed in open environments and increasingly connected to payments or assets, the operational meaning of trust shifts to end-to-end outcomes: whether an agent completes tasks, follows user intent, and avoids failures that cause material or psychological harm. These risks are fundamentally product-level and cannot be eliminated by technical safeguards alone because agent behavior is inherently stochastic. To address this gap between model-level reliability and user-facing assurance, we propose a complementary framework based on risk management. Drawing inspiration from financial underwriting, we introduce the Agentic Risk Standard (ARS), a payment settlement standard for AI-mediated transactions. ARS integrates risk assessment, underwriting, and compensation into a single transaction framework that protects users when interacting with agents. Under ARS, users receive predefined and contractually enforceable compensation in cases of execution failure, misalignment, or unintended outcomes. This shifts trust from an implicit expectation about model behavior to an explicit, measurable, and enforceable product guarantee. We also present a simulation study analyzing the social benefits of applying ARS to agentic transactions. ARS's implementation can be found at https://github.com/t54-labs/AgenticRiskStandard.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes the Agentic Risk Standard (ARS), a financial risk-management framework for AI agents that integrates assessment, underwriting, and compensation into transactions. Under ARS, users receive predefined, contractually enforceable payouts for execution failures, misalignment, or unintended outcomes, shifting trust from implicit model behavior to an explicit product guarantee. A simulation study is included to illustrate social benefits of applying ARS to agentic transactions.

Significance. If the ARS framework is sound and implementable, it supplies a complementary, product-level mechanism for managing stochastic risks that technical safeguards cannot fully eliminate, potentially increasing user adoption of payment-connected AI agents. The simulation study is offered as illustrative evidence of positive social impacts rather than a rigorous proof of necessity or sufficiency.

major comments (1)
  1. Simulation study section: the manuscript states that a simulation was performed to analyze social benefits, yet supplies no quantitative metrics, baseline comparisons, statistical tests, or sensitivity analysis. Because the effectiveness and benefit claims rest on this study, the absence of these details is load-bearing for evaluating the framework.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. We agree that the simulation study requires substantial expansion to include quantitative metrics, baselines, statistical tests, and sensitivity analysis. The revised manuscript will address this directly while preserving the illustrative intent of the study.

read point-by-point responses
  1. Referee: Simulation study section: the manuscript states that a simulation was performed to analyze social benefits, yet supplies no quantitative metrics, baseline comparisons, statistical tests, or sensitivity analysis. Because the effectiveness and benefit claims rest on this study, the absence of these details is load-bearing for evaluating the framework.

    Authors: We concur that the current presentation of the simulation study is insufficiently detailed for rigorous evaluation. In the revised version, we will expand the section to report concrete quantitative metrics such as mean user payout amounts, aggregate social welfare gains, and risk reduction percentages; include explicit baseline comparisons against non-ARS agent transactions; apply appropriate statistical tests (e.g., paired t-tests or Wilcoxon rank-sum tests) with reported p-values and effect sizes; and conduct sensitivity analyses over key parameters including agent failure rates, compensation levels, and transaction volumes. These additions will be presented with tables and figures to allow independent assessment of the claimed social benefits. We maintain that the study remains illustrative of potential impacts rather than a definitive proof, but we will strengthen its methodological transparency as requested.

    revision: yes
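The baseline comparison and statistical testing the rebuttal promises could take roughly this shape. The synthetic failure rate, loss size, and premium are invented for illustration, and a stdlib bootstrap stands in for the rank tests named in the rebuttal:

```python
# Hypothetical baseline comparison: per-transaction user losses without ARS
# versus a flat premium with compensated losses under ARS, with a bootstrap
# confidence interval on the mean difference. All numbers are made up.
import random
import statistics

rng = random.Random(42)
fail_rate, loss, premium = 0.10, 100.0, 5.0
baseline = [loss if rng.random() < fail_rate else 0.0 for _ in range(2000)]
with_ars = [premium] * 2000  # user pays a flat premium; losses are compensated

observed = statistics.mean(baseline) - statistics.mean(with_ars)

# Bootstrap the baseline mean (the ARS arm is constant, so only the baseline
# needs resampling) for a crude 95% interval on the loss reduction.
diffs = []
for _ in range(1000):
    resample = [rng.choice(baseline) for _ in range(len(baseline))]
    diffs.append(statistics.mean(resample) - premium)
diffs.sort()
lo, hi = diffs[25], diffs[974]
print(f"mean loss reduction {observed:.2f} per transaction, 95% CI [{lo:.2f}, {hi:.2f}]")
```

A real revision would replace the synthetic arms with simulation traces and report effect sizes alongside the interval.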

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The manuscript advances a conceptual proposal for the Agentic Risk Standard (ARS) by drawing explicit inspiration from established financial underwriting and risk-management practices. No equations, fitted parameters, or derived quantities appear in the provided text. The central claim—that a contractual compensation layer can shift trust to an explicit product guarantee—rests on the premise that agent risks are stochastic and product-level, which is stated as an assumption rather than derived from any self-referential definition or prior author work. The simulation study is presented as illustrative evidence of social benefits, not as a proof that reduces to fitted inputs or self-citations. The derivation chain is therefore self-contained and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The proposal rests on the domain assumption that technical model safeguards are inherently insufficient for stochastic agent behavior and introduces ARS as a new contractual construct without external validation data in the abstract.

axioms (1)
  • domain assumption Agent behavior is inherently stochastic and risks cannot be eliminated by technical safeguards alone.
    Explicitly stated in the abstract as the reason a complementary risk-management layer is needed.
invented entities (1)
  • Agentic Risk Standard (ARS) no independent evidence
    purpose: A payment settlement standard that integrates risk assessment, underwriting, and compensation for AI-mediated transactions.
    Newly defined framework presented without independent empirical support or external benchmarks in the abstract.

pith-pipeline@v0.9.0 · 5538 in / 1298 out tokens · 33717 ms · 2026-05-13T17:13:31.198967+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 11 internal anchors

  1. [1]

    Impact of model interpretability and outcome feedback on trust in ai

    Daehwan Ahn, Abdullah Almaatouq, Monisha Gulabani, and Kartik Hosanagar. Impact of model interpretability and outcome feedback on trust in ai. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–25,

  2. [2]

    Rest meets react: Self-improvement for multi-step reasoning llm agent,

    Renat Aksitov, Sobhan Miryoosefi, Zonglin Li, Daliang Li, Sheila Babayan, Kavya Kopparapu, Zachary Fisher, Ruiqi Guo, Sushant Prakash, Pranesh Srinivasan, et al. Rest meets react: Self-improvement for multi-step reasoning llm agent.arXiv preprint arXiv:2312.10003,

  3. [3]

    Frontier AI regulation: Managing emerging risks to public safety

    Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O’Keefe, Jess Whittlestone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, et al. Frontier ai regulation: Managing emerging risks to public safety. arXiv preprint arXiv:2307.03718,

  4. [4]

    Mechanistic interpretability for AI safety – a review

    Leonard Bereska and Efstratios Gavves. Mechanistic interpretability for ai safety – a review. arXiv preprint arXiv:2404.14082,

  5. [5]

    Towards llm-guided causal explainability for black-box text classifiers.arXiv preprint arXiv:2309.13340,

    Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, and Huan Liu. Towards llm-guided causal explainability for black-box text classifiers.arXiv preprint arXiv:2309.13340,

  6. [6]

    Llms for explainable ai: A comprehensive survey, 2025

    Ahsan Bilal, David Ebert, and Beiyu Lin. Llms for explainable ai: A comprehensive survey.arXiv preprint arXiv:2504.00125,

  7. [7]

    Towards implicit bias detection and mitigation in multi-agent llm interactions

    Angana Borah and Rada Mihalcea. Towards implicit bias detection and mitigation in multi-agent llm interactions. InFindings of the Association for Computational Linguistics: EMNLP 2024, pp. 9306–9326,

  8. [8]

    Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901,

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901,

  9. [9]

    A trajectory-based safety audit of clawdbot (openclaw)

    Tianyu Chen, Dongrui Liu, Xia Hu, Jingyi Yu, and Wenjie Wang. A trajectory-based safety audit of clawdbot (openclaw).arXiv preprint arXiv:2602.14364,

  10. [10]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities.arXiv preprint arXiv:2507.06261,

  11. [11]

    Composerx: Multi-agent symbolic music composition with llms,

    Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, et al. Composerx: Multi-agent symbolic music composition with llms.arXiv preprint arXiv:2404.18081,

  12. [12]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pp. 4171–4186,

  13. [13]

    Large language model agent in financial trading: A survey

    Han Ding, Yinheng Li, Junhao Wang, and Hang Chen. Large language model agent in financial trading: A survey.arXiv preprint arXiv:2408.06361,

  14. [14]

    The paradox of stochasticity: Limited creativity and computational decoupling in temperature-varied llm outputs of structured fictional data.arXiv preprint arXiv:2502.08515,

    Evgenii Evstafev. The paradox of stochasticity: Limited creativity and computational decoupling in temperature-varied llm outputs of structured fictional data.arXiv preprint arXiv:2502.08515,

  15. [15]

    Safethinker: Reasoning about risk to deepen safety beyond shallow alignment

    Xianya Fang, Xianying Luo, Yadong Wang, Xiang Chen, Yu Tian, Zequn Sun, Rui Liu, Jun Fang, Naiqiang Tan, Yuanning Cui, et al. Safethinker: Reasoning about risk to deepen safety beyond shallow alignment. arXiv preprint arXiv:2601.16506,

  16. [16]

    Faithful explanations of black-box NLP models using LLM-generated counterfactuals

    Yair Gat, Nitay Calderon, Amir Feder, Alexander Chapanin, Amit Sharma, and Roi Reichart. Faithful explanations of black-box nlp models using llm-generated counterfactuals. arXiv preprint arXiv:2310.00603,

  17. [17]

    Mart: Improving llm safety with multi-round automatic red-teaming

    Suyu Ge, Chunting Zhou, Rui Hou, Madian Khabsa, Yi-Chia Wang, Qifan Wang, Jiawei Han, and Yuning Mao. Mart: Improving llm safety with multi-round automatic red-teaming. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 1927–1937,

  18. [18]

    MCP-AgentBench: Evaluating real-world language agent performance with MCP-mediated tools

    Zikang Guo, Benfeng Xu, Chiwei Zhu, Wentao Hong, Xiaorui Wang, and Zhendong Mao. Mcp-agentbench: Evaluating real-world language agent performance with mcp-mediated tools. arXiv preprint arXiv:2509.09734,

  19. [19]

    Ai regulation in europe: from the ai act to future regulatory challenges.arXiv preprint arXiv:2310.04072,

    Philipp Hacker. Ai regulation in europe: from the ai act to future regulatory challenges.arXiv preprint arXiv:2310.04072,

  20. [20]

    Memory in the Age of AI Agents

    Yuyang Hu, Shichun Liu, Yanwei Yue, Guibin Zhang, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, et al. Memory in the age of ai agents.arXiv preprint arXiv:2512.13564,

  21. [21]

    Memory os of ai agent

    Jiazheng Kang, Mingming Ji, Zhe Zhao, and Ting Bai. Memory os of ai agent. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 25972–25981,

  22. [22]

    Husky: A unified, open-source language agent for multi-step reasoning

    Joongwon Kim, Bhargavi Paranjape, Tushar Khot, and Hannaneh Hajishirzi. Husky: A unified, open-source language agent for multi-step reasoning. arXiv preprint arXiv:2406.06469,

  23. [23]

    Certifying llm safety against adversarial prompting

    Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, and Himabindu Lakkaraju. Certifying llm safety against adversarial prompting.arXiv preprint arXiv:2309.02705,

  24. [24]

    Decoding biases: Automated methods and llm judges for gender bias detection in language models.arXiv preprint arXiv:2408.03907,

    Shachi H Kumar, Saurav Sahay, Sahisnu Mazumder, Eda Okur, Ramesh Manuvinakurike, Nicole Beckage, Hsuan Su, Hung-yi Lee, and Lama Nachman. Decoding biases: Automated methods and llm judges for gender bias detection in language models.arXiv preprint arXiv:2408.03907,

  25. [25]

    Cryptotrade: A reflective llm-based agent to guide zero-shot cryptocurrency trading

    Yuan Li, Bingqiao Luo, Qian Wang, Nuo Chen, Xu Liu, and Bingsheng He. Cryptotrade: A reflective llm-based agent to guide zero-shot cryptocurrency trading. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 1094–1106,

  26. [26]

    Slidegen: Collaborative multimodal agents for scientific slide generation.arXiv preprint arXiv:2512.04529, 2025

    Xin Liang, Xiang Zhang, Yiwei Xu, Siqi Sun, and Chenyu You. Slidegen: Collaborative multimodal agents for scientific slide generation. arXiv preprint arXiv:2512.04529,

  27. [27]

    DeepSeek-V3 Technical Report

    Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. Deepseek-v3 technical report.arXiv preprint arXiv:2412.19437,

  28. [28]

    Prompt Injection attack against LLM-integrated Applications

    Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, et al. Prompt injection attack against llm-integrated applications.arXiv preprint arXiv:2306.05499,

  29. [29]

    RoBERTa: A Robustly Optimized BERT Pretraining Approach

    Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692,

  30. [30]

    The landscape of emerging AI agent architectures for reasoning, planning, and tool calling: A survey

    Tula Masterman, Sandi Besen, Mason Sawtell, and Alex Chao. The landscape of emerging ai agent architectures for reasoning, planning, and tool calling: A survey. arXiv preprint arXiv:2404.11584,

  31. [31]

    Explainable artificial intelligence (XAI): from inherent explainability to large language models

    Fuseini Mumuni and Alhassan Mumuni. Explainable artificial intelligence (xai): from inherent explainability to large language models. arXiv preprint arXiv:2501.09967,

  32. [32]

    Mobileflow: A multimodal llm for mobile gui agent.arXiv preprint arXiv:2407.04346, 2024

    Songqin Nong, Jiali Zhu, Rui Wu, Jiongchao Jin, Shuo Shan, Xiutian Huang, and Wenhao Xu. Mobileflow: A multimodal llm for mobile gui agent.arXiv preprint arXiv:2407.04346,

  33. [33]

    When agents trade: Live multi-market trading benchmark for LLM agents

    Lingfei Qian, Xueqing Peng, Yan Wang, Vincent Jim Zhang, Huan He, Hanley Smith, Yi Han, Yueru He, Haohang Li, Yupeng Cao, et al. When agents trade: Live multi-market trading benchmark for llm agents.arXiv preprint arXiv:2510.11695,

  34. [34]

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

    Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. Toolllm: Facilitating large language models to master 16000+ real-world apis.arXiv preprint arXiv:2307.16789,

  35. [35]

    A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions

    Pascal J Sager, Benjamin Meyer, Peng Yan, Rebekka von Wartburg-Kottler, Layan Etaiwi, Aref Enayati, Gabriel Nobel, Ahmed Abdulkadir, Benjamin F Grewe, and Thilo Stadelmann. A comprehensive survey of agents for computer use: Foundations, challenges, and future directions.arXiv preprint arXiv:2501.16150,

  36. [36]

    Enhancing trust in LLM-based AI automation agents: New considerations and future challenges

    Sivan Schwartz, Avi Yaeli, and Segev Shlomov. Enhancing trust in llm-based ai automation agents: New considerations and future challenges.arXiv preprint arXiv:2308.05391,

  37. [37]

    Rethinking interpretability in the era of large language models.arXiv preprint arXiv:2402.01761, 2024

    Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, and Jianfeng Gao. Rethinking interpretability in the era of large language models.arXiv preprint arXiv:2402.01761,

  38. [38]

    Autoagent: A fully-automated and zero-code framework for llm agents.arXiv preprint arXiv:2502.05957,

    Jiabin Tang, Tianyu Fan, and Chao Huang. Autoagent: A fully-automated and zero-code framework for llm agents.arXiv preprint arXiv:2502.05957,

  39. [39]

    Adversarial preference learning for robust llm alignment

    Yuanfu Wang, Pengyu Wang, Chenyang Xi, Bo Tang, Junyi Zhu, Wenqiang Wei, Chen Chen, Chao Yang, Jingfeng Zhang, Chaochao Lu, et al. Adversarial preference learning for robust llm alignment. In Findings of the Association for Computational Linguistics: ACL 2025, pp. 21865–21881,

  40. [40]

    TradingAgents: Multi-agents LLM financial trading framework

    Yijia Xiao, Edward Sun, Di Luo, and Wei Wang. Tradingagents: Multi-agents llm financial trading framework. arXiv preprint arXiv:2412.20138,

  41. [41]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025a.
    Xianjun Yang, Xiao Wang, Qi Zhang, Linda Petzold, William Yang Wang, Xun Zhao, and Dahua Lin. Shadow alignment: The ease of subverting safely-aligned language models. ar...

  42. [42]

    A survey of ai agent protocols.arXiv preprint arXiv:2504.16736,

    Yingxuan Yang, Huacan Chai, Yuanyi Song, Siyuan Qi, Muning Wen, Ning Li, Junwei Liao, Haoyi Hu, Jianghao Lin, Gaowei Chang, et al. A survey of ai agent protocols. arXiv preprint arXiv:2504.16736, 2025b.
    Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, and Jiang Bian. Musicagent: An ai agent for music understanding and generatio...

  43. [43]

    Large language model-brained gui agents: A survey,

    Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Guyue Liu, Qingwei Lin, et al. Large language model-brained gui agents: A survey.arXiv preprint arXiv:2411.18279,

  44. [44]

    PosterGen: Aesthetic-Aware Multi-Modal Paper-to-Poster Generation via Multi-Agent LLMs

    Chi Zhang, Zhao Yang, Jiaxuan Liu, Yanda Li, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, and Gang Yu. Appagent: Multimodal agents as smartphone users. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1–20, 2025a.
    Zhilin Zhang, Xiang Zhang, Jiaqi Wei, Yiwei Xu, and Chenyu You. Postergen: Aesthetic-aware paper-to-poster ...

  45. [45]

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043,