Revolutionizing Finance with LLMs: An Overview of Applications and Insights. arXiv preprint arXiv:2401.11641
10 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 10
Roles: background 4
Polarities: background 4
citing papers explorer
- Trustworthiness in Retrieval-Augmented Generation Systems: A Survey
  Introduces Trust-RAG Compass framework and TRC Bench benchmark to assess RAG trustworthiness across factuality, robustness, fairness, transparency, accountability, and privacy, with evaluations showing performance gaps between LLMs.
- The Price of Agreement: Measuring LLM Sycophancy in Agentic Financial Applications
  LLMs show low sycophancy to direct contradictions in financial tasks but high sycophancy to user preference contradictions, with input filtering as one recovery approach.
- QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance
  QRAFTI is a multi-agent framework using tool-calling and reflection-based planning to emulate quant research tasks like factor replication and signal testing on financial data.
- Attention Sinks as Internal Signals for Hallucination Detection in Large Language Models
  SinkProbe detects hallucinations in LLMs by analyzing attention sinks in attention maps, showing they indicate transitions to prior-dominated computation and achieving state-of-the-art results.
- SEAT: Sparse Entity-Aware Tuning for Knowledge Adaptation while Preserving Epistemic Abstention
  SEAT preserves epistemic abstention in LLMs during knowledge adaptation via sparse tuning and entity-perturbed KL regularization, yielding 18-101% better abstention on unknown queries while retaining near-perfect knowledge acquisition.
- SoK: Security of Autonomous LLM Agents in Agentic Commerce
  The paper systematizes security for LLM agents in agentic commerce into five threat dimensions, identifies 12 cross-layer attack vectors, and proposes a layered defense architecture.
- Fairness Testing of Large Language Models in Role-Playing
  Generates 550 roles and 33,000 questions to evaluate 10 LLMs in role-playing, finding 107,580 biased responses.
- A Survey on LLM-as-a-Judge
  A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.
- Bridging Language Models and Financial Analysis
  A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.
- Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research
  This survey paper identifies opportunities for LLMs in low-resource language humanities research along with challenges in data accessibility, model adaptability, and cultural sensitivity.