hub Canonical reference

Hongyang Yang, Xiao-Yang Liu, and Christina Dan Wang

TradingAgents: Multi-Agents LLM Financial Trading Framework · 2024 · arXiv 2412.20138

Canonical reference. 100% of citing Pith papers cite this work as background.

38 Pith papers citing it

Background: 100% of classified citations

citation-role summary

background 6

citation-polarity summary

background 6

representative citing papers

CLQT: A Closed-Loop, Cost-Aware, Strategy-Consistent Benchmark for Diagnostic Evaluation of LLM Portfolio-Management Agents

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

CLQT is a new closed-loop, cost-aware benchmark that diagnoses LLM trading agent capabilities through strategy-consistent metrics and hash-verifiable trails rather than outcome rankings.

AI Trading's Alpha Singularity: Emergent Market Reasoning through Agent-to-Agent Self-Evolution

cs.AI · 2026-06-28 · reject · novelty 7.0

The multi-agent LLM system Agora, under Sealed Joint Search conditions, produces a holdout Sharpe of +1.87 on the CSI 1000 over a 91-day sealed period, exceeding the best baseline (+1.334) under a favorable seed.

FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems

cs.CR · 2026-05-12 · unverdicted · novelty 7.0

FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%; FlowGuard, an input-side defense, reduces them by up to 34%.

AutoRedTrader: Autonomous Red Teaming of Trading Agents through Synthetic Misinformation Injection

cs.CE · 2026-05-09 · unverdicted · novelty 7.0

AutoRedTrader generates synthetic financial misinformation via behavioral bias manipulation and agent feedback to red-team LLM trading agents, reaching 69% exposure and 26.67% attack success on Bitcoin data simulations.

Moira: Language-driven Hierarchical Reinforcement Learning for Pair Trading

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

Moira parameterizes hierarchical RL policies for pair trading with LLMs and adapts them via prompt updates based on trajectory and episode feedback, outperforming baselines on real market data.

CSTrader: A Testbed for Language-Grounded Trading in a Community-Driven Virtual Asset Market

cs.AI · 2026-06-30 · unverdicted · novelty 6.0

CSTrader is a multi-agent LLM trading system for CS2 skins that uses specialized agents for liquidity, sentiment reversal, and risk control. With returns of up to 7.58%, it outperforms both a market index that returned -15.62% and single-prompt baselines.

A Systematic Approach to Multi-Agent AI from Advanced Regulatory Control Theory: Safe and Auditable LLM Operator Agents for Process Control

eess.SY · 2026-06-29 · unverdicted · novelty 6.0

An ARC-derived multi-agent LLM framework for safe, auditable process control, with operator agents and a deterministic orchestrator, evaluated on dairy ventilation.

The Interplay of Harness Design and Post-Training in LLM Agents

cs.LG · 2026-06-24 · unverdicted · novelty 6.0

Harness-aware post-training of LLM agents improves both in-distribution performance and robustness to out-of-distribution tool environment shifts, while minimal harness designs cause large drops under shifts.

Harnessing Generalist Agents for Contextualized Time Series

cs.AI · 2026-06-03 · unverdicted · novelty 6.0

TimeClaw is a framework that augments LLM agents with temporal tools, capability evolution, and episodic memory to enable contextualized time series reasoning, with reported gains on benchmarks across energy, finance, weather, and traffic.

Simulate, Reason, Decide: Scientific Reasoning with LLMs for Simulation-Driven Decision Making

cs.AI · 2026-06-03 · unverdicted · novelty 6.0

MechSim is a mechanism-grounded framework that represents simulators via structured schemas and uses constrained LLM agents to generate evidence-based explanations linking outcomes to underlying mechanisms.

Large Language Models Hack Rewards, and Society

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

LLMs discover regulatory loopholes in simulated societal environments through reward hacking during RL training.

POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems

cs.AI · 2026-06-01 · unverdicted · novelty 6.0

POIROT protocol repurposes agents in LLM multi-agent systems as an internal diagnostic layer for failure detection, outperforming single-LLM evaluators with gains that increase with complexity, agent count, and fault types.

LEAF: A Living Benchmark for Event-Augmented Forecasting

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

LEAF is a dynamically updating benchmark that supplies LLMs with event-derived auxiliary text via retrieval agents to measure improvements in event-augmented forecasting, with initial results showing better performance on more predictable equities and event-target correlations.

Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

ASR, a new trajectory-fidelity metric, detects that 10 of 18 LLMs skip confirmation steps in payment agents despite perfect scores on prior metrics, and ASR-guided refinements improve task success by up to 93.8 percentage points.

Agentic Retrieval-Augmented Generation for Financial Document Question Answering

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.

Quantifying Trust: Financial Risk Management for Trustworthy AI Agents

cs.AI · 2026-04-05 · unverdicted · novelty 6.0

The paper introduces the Agentic Risk Standard (ARS) as a payment settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.

Large Language Model Agent for User-friendly Chemical Process Simulations

physics.chem-ph · 2026-01-15 · unverdicted · novelty 6.0

An LLM agent integrated with AVEVA Process Simulation via MCP enables natural language driven flowsheet analysis, optimization, and construction for chemical separation processes.

TokenCake: A KV-Cache-centric Serving Framework for LLM-based Multi-Agent Applications

cs.DC · 2025-10-21 · unverdicted · novelty 6.0

TokenCake introduces agent-aware temporal and spatial schedulers for KV cache management in LLM multi-agent serving, claiming over 47% lower end-to-end latency and up to 16.9% better GPU memory utilization than vLLM on representative benchmarks.

Scheming Ability in LLM-to-LLM Strategic Interactions

cs.CL · 2025-10-11 · conditional · novelty 6.0

Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games, achieving 95-100% success rates when choosing to deceive and 100% deception choice in one setup even without prompting.

Learning to Conceal Risk: Controllable Multi-turn Red Teaming for LLMs in the Financial Domain

cs.CL · 2025-09-07 · unverdicted · novelty 6.0

CoRT achieves 95% average attack success rate on nine LLMs by using iterative risk-concealing prompts and a controller that scores concealment levels on a new 522-instruction financial risk benchmark.

MoCA-Agent: A Market-of-Claims Code Agent for Financial and Numerical Reasoning

cs.AI · 2026-06-10 · unverdicted · novelty 5.0

MoCA-Agent decomposes questions into typed atomic claims, clears them via trader-agent markets into confidence-weighted decisions, synthesizes and verifies executable Python code, and reports strong benchmark scores including 78.3% on FinQA.

Market Regime Council for Dynamic Credit Assignment in Multi-Agent LLM Decision Systems

cs.AI · 2026-05-23 · unverdicted · novelty 5.0

MRC computes coalition Shapley credits from performance histories to weight three LLM agents, stabilized by Bayesian mixture and regime multipliers, achieving SR 1.51 and 440.1% cumulative return over 1037 days on 13 crypto assets.

Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

StockR1 unifies LLM-based financial reasoning and time-series forecasting by emitting verifiable forecast actions that condition a decoder, optimized via consistency-grounded RL to improve accuracy on QA and prediction tasks.

Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

LLM trading agents show detectable pre-failure signatures in planning embeddings and fused risk representations, with structured risk feedback acting as a partial alignment signal without fine-tuning.
