FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
hub Canonical reference
Tradingagents: Multi-agents llm financial trading framework
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
fields
cs.AI 7 cs.CE 3 cs.CL 2 cs.LG 2 cs.CR 1 cs.DC 1 cs.MA 1 physics.chem-ph 1 q-fin.PR 1 q-fin.ST 1roles
background 6polarities
background 6representative citing papers
AutoRedTrader generates synthetic financial misinformation via behavioral bias manipulation and agent feedback to red-team LLM trading agents, reaching 69% exposure and 26.67% attack success on Bitcoin data simulations.
Moira parameterizes hierarchical RL policies for pair trading with LLMs and adapts them via prompt updates based on trajectory and episode feedback, outperforming baselines on real market data.
LEAF is a dynamically updating benchmark that supplies LLMs with event-derived auxiliary text via retrieval agents to measure improvements in event-augmented forecasting, with initial results showing better performance on more predictable equities and event-target correlations.
ASR, a new trajectory-fidelity metric, detects that 10 of 18 LLMs skip confirmation steps in payment agents despite perfect scores on prior metrics, and ASR-guided refinements improve task success by up to 93.8 percentage points.
FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.
The paper introduces the Agentic Risk Standard (ARS) as a payment settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.
An LLM agent integrated with AVEVA Process Simulation via MCP enables natural language driven flowsheet analysis, optimization, and construction for chemical separation processes.
TokenCake introduces agent-aware temporal and spatial schedulers for KV cache management in LLM multi-agent serving, claiming over 47% lower end-to-end latency and up to 16.9% better GPU memory utilization than vLLM on representative benchmarks.
Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games, achieving 95-100% success rates when choosing to deceive and 100% deception choice in one setup even without prompting.
CoRT achieves 95% average attack success rate on nine LLMs by using iterative risk-concealing prompts and a controller that scores concealment levels on a new 522-instruction financial risk benchmark.
StockR1 unifies LLM-based financial reasoning and time-series forecasting by emitting verifiable forecast actions that condition a decoder, optimized via consistency-grounded RL to improve accuracy on QA and prediction tasks.
Reported alpha from end-to-end LLM trading agents does not constitute deployment evidence until it passes structural tests for temporal integrity, frictions, robustness, calibration, execution, and disaggregation.
Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.
MadEvolve uses LLMs for evolutionary optimization of trading strategies and reports significant backtest improvements on Bitcoin tasks including signal feature evolution and joint strategy optimization.
A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.
AlphaQuanter introduces a single-agent tool-augmented RL framework for stock trading that learns dynamic policies over a transparent decision workflow and reports state-of-the-art financial metrics.
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
This review synthesizes LLM uses in stock forecasting and catalogs key practical pitfalls from a hedge-fund viewpoint.
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.
A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.
citing papers explorer
-
FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
-
AutoRedTrader: Autonomous Red Teaming of Trading Agents through Synthetic Misinformation Injection
AutoRedTrader generates synthetic financial misinformation via behavioral bias manipulation and agent feedback to red-team LLM trading agents, reaching 69% exposure and 26.67% attack success on Bitcoin data simulations.
-
Moira: Language-driven Hierarchical Reinforcement Learning for Pair Trading
Moira parameterizes hierarchical RL policies for pair trading with LLMs and adapts them via prompt updates based on trajectory and episode feedback, outperforming baselines on real market data.
-
LEAF: A Living Benchmark for Event-Augmented Forecasting
LEAF is a dynamically updating benchmark that supplies LLMs with event-derived auxiliary text via retrieval agents to measure improvements in event-augmented forecasting, with initial results showing better performance on more predictable equities and event-target correlations.
-
Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems
ASR, a new trajectory-fidelity metric, detects that 10 of 18 LLMs skip confirmation steps in payment agents despite perfect scores on prior metrics, and ASR-guided refinements improve task success by up to 93.8 percentage points.
-
Agentic Retrieval-Augmented Generation for Financial Document Question Answering
FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.
-
Quantifying Trust: Financial Risk Management for Trustworthy AI Agents
The paper introduces the Agentic Risk Standard (ARS) as a payment settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.
-
Large Language Model Agent for User-friendly Chemical Process Simulations
An LLM agent integrated with AVEVA Process Simulation via MCP enables natural language driven flowsheet analysis, optimization, and construction for chemical separation processes.
-
TokenCake: A KV-Cache-centric Serving Framework for LLM-based Multi-Agent Applications
TokenCake introduces agent-aware temporal and spatial schedulers for KV cache management in LLM multi-agent serving, claiming over 47% lower end-to-end latency and up to 16.9% better GPU memory utilization than vLLM on representative benchmarks.
-
Scheming Ability in LLM-to-LLM Strategic Interactions
Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games, achieving 95-100% success rates when choosing to deceive and 100% deception choice in one setup even without prompting.
-
Learning to Conceal Risk: Controllable Multi-turn Red Teaming for LLMs in the Financial Domain
CoRT achieves 95% average attack success rate on nine LLMs by using iterative risk-concealing prompts and a controller that scores concealment levels on a new 522-instruction financial risk benchmark.
-
Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs
StockR1 unifies LLM-based financial reasoning and time-series forecasting by emitting verifiable forecast actions that condition a decoder, optimized via consistency-grounded RL to improve accuracy on QA and prediction tasks.
-
The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment Evidence
Reported alpha from end-to-end LLM trading agents does not constitute deployment evidence until it passes structural tests for temporal integrity, frictions, robustness, calibration, execution, and disaggregation.
-
Emergent Social Intelligence Risks in Generative Multi-Agent Systems
Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.
-
MadEvolve: Evolutionary Optimization of Trading Systems with Large Language Models
MadEvolve uses LLMs for evolutionary optimization of trading strategies and reports significant backtest improvements on Bitcoin tasks including signal feature evolution and joint strategy optimization.
-
Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures
A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.
-
AlphaQuanter: An End-to-End Tool-Augmented Agentic Reinforcement Learning Framework for Stock Trading
AlphaQuanter introduces a single-agent tool-augmented RL framework for stock trading that learns dynamic policies over a transparent decision workflow and reports state-of-the-art financial metrics.
-
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
-
A Review of Large Language Models for Stock Price Forecasting from a Hedge-Fund Perspective
This review synthesizes LLM uses in stock forecasting and catalogs key practical pitfalls from a hedge-fund viewpoint.
-
A Survey of Scaling in Large Language Model Reasoning
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.
-
Bridging Language Models and Financial Analysis
A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.