pith. machine review for the scientific record.

arxiv: 2604.09056 · v1 · submitted 2026-04-10 · 💻 cs.CR · cs.CE

Recognition: 2 theorem links · Lean Theorem

Conversational Risk Detection for LLMs in Financial Agents via Multi-Stage Generative Rollout

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 17:51 UTC · model grok-4.3

classification 💻 cs.CR cs.CE
keywords financial dialogue security · LLM risk detection · multi-stage framework · agent safety · regulatory compliance · conversational AI · cybersecurity

The pith

FinSec is a four-tier framework that detects financial risks in LLM agent dialogues through staged pattern analysis, risk inference, semantic checks, and integrated decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models in financial services create conversations that can evolve across turns and involve complex rules, making single-rule or single-dimension checks insufficient. The paper presents FinSec as a structured alternative that breaks detection into four linked stages to catch actual risks in an interpretable way. This design aims to lower unsafe outputs while preserving the agent's normal function. Experiments on financial dialogues show clear gains over prior methods in detection metrics.

Core claim

FinSec is a four-tier security detection framework for financial agents that incorporates suspicious behavior pattern analysis, delayed risk and adversarial inference, semantic security analysis, and integrated risk-based decision-making to enable structured, interpretable, and end-to-end identification of actual financial risks in multi-turn dialogues.

What carries the argument

The four-tier security detection framework (suspicious behavior pattern analysis, delayed risk and adversarial inference, semantic security analysis, integrated risk-based decision-making) that structures the detection process.
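The staged structure can be pictured as a small pipeline. Below is a hypothetical sketch; the stage bodies, keyword rule, fusion weights, and threshold are all illustrative stand-ins, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class StageResult:
    score: float      # risk score in [0, 1]
    rationale: str    # explicit reasoning step, for interpretability

def pattern_analysis(dialogue):
    # Tier 1: suspicious behavior pattern analysis (toy keyword rule here)
    flagged = any("wire all funds" in turn.lower() for turn in dialogue)
    return StageResult(0.8 if flagged else 0.1, "pattern scan")

def delayed_risk_inference(dialogue):
    # Tier 2: delayed risk and adversarial inference; a real system would
    # generatively roll out plausible future turns (stubbed here)
    return StageResult(0.2, "rollout simulation")

def semantic_analysis(dialogue):
    # Tier 3: semantic security analysis with an audit model (stubbed here)
    return StageResult(0.3, "semantic audit")

def fuse(results, weights=(0.4, 0.3, 0.3), threshold=0.4):
    # Tier 4: integrated risk-based decision via weighted fusion
    score = sum(w * r.score for w, r in zip(weights, results))
    return score, score >= threshold, [r.rationale for r in results]

dialogue = ["Hi, I need help.", "Please wire all funds to this new account."]
results = [pattern_analysis(dialogue), delayed_risk_inference(dialogue),
           semantic_analysis(dialogue)]
score, risky, trace = fuse(results)
```

The point of the sketch is the shape, not the numbers: each tier emits a score plus a rationale, so the final decision carries an explicit reasoning trace.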

If this is right

  • Overall risk detection reaches an F1 score of 90.13 percent, six to fourteen points above baseline models.
  • Attack success rate on unsafe outputs falls to 9.09 percent.
  • Area under the precision-recall curve rises to 0.9189, roughly a 9.7 percent lift.
  • The composite utility-safety score reaches 0.9098 while robustness against high-risk exchanges improves.
  • The staged structure supplies explicit reasoning steps for each risk judgment.
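The reported metrics are standard; for reference, a pure-Python illustration of how F1 and ASR are computed, on toy labels rather than the paper's data:

```python
def f1(y_true, y_pred):
    # F1 = harmonic mean of precision and recall over binary labels
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def attack_success_rate(attempted, succeeded):
    # ASR: fraction of adversarial attempts that yield an unsafe output
    return succeeded / attempted

# toy labels: 1 = risky exchange, predictions from a hypothetical detector
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
score_f1 = f1(y_true, y_pred)      # 0.8 on these toy labels
asr = attack_success_rate(22, 2)   # 2/22 ≈ 0.0909, the form of the paper's 9.09%
```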

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same staged rollout pattern could be tested on dialogue safety in other regulated sectors such as medical or legal agents.
  • Real-time deployment would require checking whether the four stages add acceptable latency to live financial conversations.
  • The framework's emphasis on delayed-risk inference suggests it may handle adversarial prompt chains better than single-pass classifiers.
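The latency question in the second point is directly measurable. A minimal per-stage timing probe might look like this (the stage stubs and the 500 ms budget are assumptions for illustration):

```python
import time

def timed(stage, dialogue):
    # run one stage and return (result, elapsed milliseconds)
    t0 = time.perf_counter()
    result = stage(dialogue)
    return result, (time.perf_counter() - t0) * 1000.0

def profile(stages, dialogue, budget_ms=500.0):
    # time each stage on one dialogue and check the total against a budget
    latencies = {}
    for name, stage in stages.items():
        _, ms = timed(stage, dialogue)
        latencies[name] = ms
    total = sum(latencies.values())
    return latencies, total, total <= budget_ms

# toy stages standing in for the four tiers
stages = {
    "pattern": lambda d: len(d),
    "rollout": lambda d: sum(len(t) for t in d),
    "semantic": lambda d: d[-1],
    "fusion": lambda d: True,
}
lat, total, within_budget = profile(stages, ["hello", "wire funds now"])
```

A real evaluation would replace the stubs with the model-backed tiers and profile over a distribution of dialogue lengths, since the generative rollout stage is the obvious latency risk.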

Load-bearing premise

Existing single-dimensional checks are inadequate for evolving multi-turn dialogues, and the four new stages can be built and run without creating fresh failure modes.

What would settle it

A controlled test set of financial dialogues in which FinSec either misses risks caught by baselines or flags safe exchanges as risky at rates that erase its reported F1, ASR, and AUPRC gains.

Figures

Figures reproduced from arXiv: 2604.09056 by Jun Wu, Xiaotong Jiang.

Figure 1. Overview of the Financial Agent Risk Detection Framework. view at source ↗
Figure 2. Detailed architecture of the FinSec framework. The system operates through a hierarchical data flow: (1) SAR Pattern Detection for structured compliance checking; (2) Deferred Risk Assessment via generative rollout; (3) Semantic Safety Assessment using deep audit models; and (4) Risk Fusion for the final calibrated decision R_FinSec(I). Small-sample simulations reveal potential high-risk trajectories. … view at source ↗
Figure 3. Performance Comparison of 10 LLM Models on Financial Security Risk Assessment. view at source ↗
Figure 6. Performance Sensitivity and Optimal Weight Determination for … view at source ↗
Figure 5. Trade-off analysis and capability balance under different weights. view at source ↗
Figure 7. Performance Evaluation: Trade-off Analysis and Final Verdict: (a) … view at source ↗
Figure 8. Performance Trade-off: Injection and Unintended Risks. view at source ↗
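Figures 5 through 7 concern choosing fusion weights against a utility-safety trade-off. The shape of that analysis can be sketched as a sweep over candidate weights; the operating points below are invented, and the equal-weight composite is a stand-in since the paper's exact formula is not reproduced here:

```python
def composite(utility, safety, alpha=0.5):
    # equal-weight utility-safety composite (the paper reports 0.9098;
    # its precise scoring formula is not given here, so this is a stand-in)
    return alpha * utility + (1 - alpha) * safety

# invented (utility, safety) operating points per candidate fusion weight w:
# emphasizing safety typically costs some utility
operating_points = {0.3: (0.97, 0.82), 0.5: (0.94, 0.90), 0.7: (0.90, 0.93)}

best_w = max(operating_points, key=lambda w: composite(*operating_points[w]))
```

The interior weight wins on this toy data, which is the qualitative story a sensitivity plot like Figure 6 tells: neither extreme of the utility-safety trade maximizes the composite.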
read the original abstract

With the rapid adoption of large language models (LLMs) in financial service scenarios, dialogue security detection under high regulatory risk presents significant challenges. Existing methods mainly rely on single-dimensional semantic judgments or fixed rules, making them inadequate for handling multi-turn semantic evolution and complex regulatory clauses; moreover, they lack models specifically designed for financial security detection. To address these issues, this paper proposes FinSec, a four-tier security detection framework for financial agent. FinSec enables structured, interpretable, and end-to-end identification of actual financial risks, incorporating suspicious behavior pattern analysis, delayed risk and adversarial inference, semantic security analysis, and integrated risk-based decision-making. Notably, FinSec significantly enhances the robustness of high-risk dialogue detection while maintaining model utility. Experimental results demonstrate FinSec's leading performance. In terms of overall detection capability, FinSec achieves an F1 score of 90.13%, improving upon baseline models by 6--14 percentage points; its ASR is reduced to 9.09%, markedly lowering the probability of unsafe outputs; and the AUPRC increases to 0.9189 -- an approximate 9.7% gain over general frameworks. Additionally, in balancing utility and safety, FinSec obtains a composite score of 0.9098, delivering robust and efficient protection for financial agent dialogues.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes FinSec, a four-tier security detection framework for LLM-based financial agents. The tiers are suspicious behavior pattern analysis, delayed risk and adversarial inference, semantic security analysis, and integrated risk-based decision-making. It claims this structure enables structured, interpretable, end-to-end risk identification in multi-turn dialogues, yielding an F1 score of 90.13% (6-14 point improvement over baselines), ASR reduced to 9.09%, AUPRC of 0.9189 (approximately 9.7% gain), and a composite utility-safety score of 0.9098.

Significance. If the performance claims are supported by reproducible experiments on representative data, the work could provide a useful multi-stage approach for handling semantic evolution and regulatory complexity in high-risk financial dialogues, where single-dimensional methods fall short. The emphasis on interpretability and balanced utility-safety is relevant to deployed financial agents.

major comments (2)
  1. [Experimental Results] The central performance claims (F1 = 90.13%, ASR = 9.09%, AUPRC = 0.9189, 6–14 point gains) are stated without any description of dataset source and size, labeling protocol, train/test split, baseline re-implementations (including prompts and temperatures), or statistical significance tests. This directly undermines attribution of the reported lifts to the four-tier structure rather than to evaluation choices.
  2. [Methodology] The four-tier generative rollout is presented at a high level with no analysis of whether the added stages introduce new failure modes (e.g., error propagation across inference steps) or require undisclosed data curation, both of which are load-bearing for the claim that the framework is robust and adequate for complex regulatory clauses.
minor comments (2)
  1. [Abstract] Abstract: the phrasing 'leading performance' and specific metric values should be accompanied by the names and scores of the compared baselines to allow immediate assessment.
  2. The manuscript title in the submission metadata does not match the abstract's focus on FinSec; ensure consistency.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The two major comments identify important gaps in experimental transparency and methodological analysis that we will address through targeted revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Experimental Results] The central performance claims (F1 = 90.13%, ASR = 9.09%, AUPRC = 0.9189, 6–14 point gains) are stated without any description of dataset source and size, labeling protocol, train/test split, baseline re-implementations (including prompts and temperatures), or statistical significance tests. This directly undermines attribution of the reported lifts to the four-tier structure rather than to evaluation choices.

    Authors: We agree that the Experimental Results section as currently written lacks the necessary details for reproducibility and for clearly attributing improvements to the four-tier design. In the revised manuscript we will insert a new 'Experimental Setup' subsection that specifies the dataset source and size, the labeling protocol (including inter-annotator agreement), the train/test split ratio, the exact prompts and temperature settings used for all baselines, and the statistical significance tests performed (with p-values). These additions will allow readers to evaluate whether the reported gains are attributable to FinSec rather than evaluation choices. revision: yes

  2. Referee: [Methodology] The four-tier generative rollout is presented at a high level with no analysis of whether the added stages introduce new failure modes (e.g., error propagation across inference steps) or require undisclosed data curation, both of which are load-bearing for the claim that the framework is robust and adequate for complex regulatory clauses.

    Authors: We acknowledge that the current high-level description does not explicitly examine potential new failure modes such as error propagation or the data curation steps required by the staged architecture. In the revision we will add a dedicated 'Robustness and Failure-Mode Analysis' paragraph within the Methodology section. This paragraph will discuss (i) mechanisms intended to limit error propagation (e.g., delayed inference and cross-tier consistency checks), (ii) the data curation process used for each tier, and (iii) observed failure cases from our experiments. The addition will provide a more balanced assessment of the framework's suitability for regulatory financial dialogues. revision: yes
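The significance tests promised in the first response could take the form of a paired bootstrap over per-example predictions. A minimal sketch on toy data (the labels and both detectors are invented; the paper's evaluation is not reproduced):

```python
import random

def f1(y_true, y_pred):
    # F1 from counts, written as 2*TP / (2*TP + FP + FN)
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_p(y_true, pred_a, pred_b, iters=2000, seed=0):
    # paired bootstrap: resample examples with replacement and count how
    # often detector A fails to beat detector B (one-sided p-value)
    rng = random.Random(seed)
    n = len(y_true)
    observed = f1(y_true, pred_a) - f1(y_true, pred_b)
    ties_or_losses = 0
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]
        d = (f1([y_true[i] for i in idx], [pred_a[i] for i in idx])
             - f1([y_true[i] for i in idx], [pred_b[i] for i in idx]))
        ties_or_losses += d <= 0
    return observed, ties_or_losses / iters

y = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0] * 5
a = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0] * 5   # stronger toy detector
b = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0] * 5   # weaker toy detector
diff, p = bootstrap_p(y, a, b)
```

Reporting the observed F1 gap alongside such a p-value would let readers judge whether the 6–14 point lifts survive resampling noise.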

Circularity Check

0 steps flagged

No circularity: empirical metrics reported without self-referential derivation

full rationale

The paper proposes FinSec as a four-tier framework and states experimental outcomes (F1 90.13%, ASR 9.09%, AUPRC 0.9189) as measured results. No equations, fitted parameters, or derivation steps are shown that reduce these metrics to the framework definition by construction. No self-citation chains, ansatzes, or uniqueness theorems are invoked to force the claims. The central performance assertions remain independent empirical statements rather than tautological outputs of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the framework is described at the level of named stages without mathematical formulation or assumptions stated.

pith-pipeline@v0.9.0 · 5534 in / 1264 out tokens · 33628 ms · 2026-05-10T17:51:42.363195+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
