Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation
Pith reviewed 2026-05-23 02:51 UTC · model grok-4.3
The pith
CausalGANs augmented by reinforcement learning generate synthetic bond yields that let a fine-tuned LLM issue trading signals with 0.103 mean absolute error.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The reinforcement learning-enhanced synthetic data generation achieves the least Mean Absolute Error of 0.103, demonstrating its effectiveness in replicating real-world bond market dynamics. The overall framework improves forecasting performance over existing methods, with statistical validation via predictive accuracy, MAE evaluation (0.103 percent), profit/loss evaluation (60 percent profit rate), LLM evaluation (3.37 out of 5) and expert assessments scoring 4.67 out of 5.
What carries the argument
Causal Generative Adversarial Networks (CausalGANs) combined with Soft Actor-Critic (SAC) reinforcement learning to produce synthetic bond yields conditioned on twelve macroeconomic variables while preserving statistical fidelity and causal structure.
If this is right
- The RL-enhanced generator attains the lowest reported MAE of 0.103 percent across the four bond categories.
- Back-tested signals from the LLM achieve a 60 percent profit rate.
- LLM-based evaluation of the generated signals scores 3.37 out of 5.
- Human expert review of the full pipeline scores 4.67 out of 5.
- The approach supplies a scalable synthetic-data pipeline for risk, volatility and investment decisions.
Where Pith is reading between the lines
- If the causal structure is faithfully reproduced, the same generator could be driven with altered macroeconomic inputs to simulate stress scenarios without collecting new market data.
- Periodic retraining on live macro releases could allow the LLM signals to operate in production while maintaining the reported error levels.
- The conditioning approach might transfer to other sparsely observed asset classes where causal macro linkages are similarly strong.
Load-bearing premise
The synthetic yields produced by CausalGANs and SAC preserve the statistical and causal relationships present in real bond markets when conditioned on the twelve macroeconomic variables.
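One way to probe this premise is to compare a dependence statistic on real and synthetic data. The sketch below is an illustration only, not the paper's method: it uses first-order partial correlation, which captures linear dependence and is weaker than a full conditional independence test. The helper names (`pearson`, `partial_corr`, `toy_sample`) and the toy data are hypothetical stand-ins for real and generated series.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly removing one conditioning variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def toy_sample(seed, n=2000):
    """Hypothetical data: a macro driver z moves both a macro series x and a yield y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [zi + rng.gauss(0, 1) for zi in z]
    return x, y, z

real_x, real_y, real_z = toy_sample(seed=0)  # stand-in for real market data
syn_x, syn_y, syn_z = toy_sample(seed=1)     # stand-in for generator output

raw_real = pearson(real_x, real_y)                # clearly positive: both follow z
pc_real = partial_corr(real_x, real_y, real_z)    # near zero once z is controlled for
pc_syn = partial_corr(syn_x, syn_y, syn_z)
gap = abs(pc_real - pc_syn)                       # small gap = similar conditional structure
```

A small `gap` across many variable triples would be consistent with the premise; a large one would suggest the generator distorts how yields depend on the macro inputs.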
What would settle it
Evaluating the synthetic generator and the LLM trading signals derived from it on a fresh out-of-sample window of actual bond data would settle it: an MAE above the reported 0.103, or a profit rate below the reported 60 percent, would falsify the performance claim.
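The check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the paper does not define "profit rate" in the text available here, so the sketch assumes it is the share of BUY/SELL signals that made money, with bond prices moving inversely to yields. The function names and all numbers are made up.

```python
from statistics import fmean

def mean_absolute_error(actual, predicted):
    """Average absolute gap between realized and predicted yields (same units as inputs)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equally long")
    return fmean(abs(a - p) for a, p in zip(actual, predicted))

def profit_rate(signals, yield_changes):
    """Share of BUY/SELL signals that made money; HOLD is ignored.

    Assumption: prices move inversely to yields, so a BUY profits when the
    yield subsequently falls and a SELL profits when it rises.
    """
    trades = [(s, d) for s, d in zip(signals, yield_changes) if s in ("BUY", "SELL")]
    if not trades:
        return 0.0
    wins = sum(1 for s, d in trades if (s == "BUY" and d < 0) or (s == "SELL" and d > 0))
    return wins / len(trades)

# Toy out-of-sample window (made-up numbers, yields in percent).
real_yields = [4.10, 4.25, 4.05, 4.30, 4.20]
model_yields = [4.00, 4.30, 4.15, 4.20, 4.25]
signals = ["BUY", "HOLD", "SELL", "BUY", "SELL"]
next_changes = [-0.05, 0.02, 0.10, 0.04, -0.03]

oos_mae = mean_absolute_error(real_yields, model_yields)
oos_profit = profit_rate(signals, next_changes)

# The claim survives only if both reported thresholds hold out of sample.
claim_survives = oos_mae <= 0.103 and oos_profit >= 0.60
```

In this toy window the MAE stays under the threshold but only half the trades are profitable, so `claim_survives` is false; both conditions have to hold.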
Original abstract
Financial bond yield forecasting is challenging due to data scarcity, nonlinear macroeconomic dependencies, and evolving market conditions. In this paper, we propose a novel framework that leverages Causal Generative Adversarial Networks (CausalGANs) and Soft Actor-Critic (SAC) reinforcement learning (RL) to generate high-fidelity synthetic bond yield data for four major bond categories (AAA, BAA, US10Y, Junk). By incorporating 12 key macroeconomic variables, we ensure statistical fidelity by preserving essential market properties. To transform this market-dependent synthetic data into actionable insights, we employ a fine-tuned Large Language Model (LLM), Qwen2.5-7B, that generates trading signals (BUY/HOLD/SELL), risk assessments, and volatility projections. We use automated, human, and LLM evaluations, all of which demonstrate that our framework improves forecasting performance over existing methods, with statistical validation via predictive accuracy, MAE evaluation (0.103%), profit/loss evaluation (60% profit rate), LLM evaluation (3.37/5), and expert assessments scoring 4.67 out of 5. The reinforcement learning-enhanced synthetic data generation achieves the least Mean Absolute Error of 0.103, demonstrating its effectiveness in replicating real-world bond market dynamics. We not only enhance data-driven trading strategies but also provide a scalable, high-fidelity synthetic financial data pipeline for risk and volatility management and investment decision-making. This work establishes a bridge between synthetic data generation, LLM-driven financial forecasting, and language model evaluation, contributing to AI-driven financial decision-making.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a framework that uses CausalGANs conditioned on 12 macroeconomic variables, further refined by Soft Actor-Critic (SAC) reinforcement learning, to generate synthetic bond yields for AAA, BAA, US10Y, and Junk categories. These synthetic data are then used to fine-tune Qwen2.5-7B for producing BUY/HOLD/SELL signals, risk assessments, and volatility projections. The authors report an MAE of 0.103, 60% profit rate, LLM evaluation score of 3.37/5, and expert score of 4.67/5, claiming statistical fidelity and improvement over existing methods via automated, human, and LLM evaluations.
Significance. If the synthetic yields were shown to preserve both marginal distributions and causal dependencies on the macro variables (via explicit tests such as conditional independence or do-calculus checks), the pipeline could meaningfully address data scarcity in bond forecasting and enable reliable LLM-driven trading. The current manuscript supplies no such checks, so the reported metrics cannot yet be interpreted as evidence of generalization beyond the training distribution.
major comments (3)
- [Abstract] The headline performance numbers (MAE 0.103, 60% profit rate, LLM score 3.37/5) are presented without any baseline definitions, train/test split description, or statistical significance tests, rendering the claim of improvement over existing methods unverifiable.
- [Abstract] The central assertion that CausalGAN+SAC 'preserves essential market properties' and causal relationships conditional on the 12 macro variables is load-bearing for the downstream LLM reliability claim, yet no quantitative validation (conditional independence tests, out-of-distribution causal metrics, or effect-size comparisons) is supplied.
- [Abstract] The evaluation appears circular because the same synthetic data used to train/tune the GAN and SAC are later used to compute the reported MAE and profit-rate figures; no external benchmark or held-out real-market test set is described that would break this dependence.
minor comments (2)
- [Abstract] MAE is stated once as 0.103 and once as 0.103%; standardize units and clarify whether the value is absolute or percentage.
- The manuscript would benefit from an explicit related-work section contrasting the CausalGAN+SAC approach against prior synthetic financial-data generators (e.g., those using VAEs or diffusion models).
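The unit ambiguity raised in the first minor comment matters because the two readings can differ by more than an order of magnitude. The sketch below is a made-up worked example, not the paper's data: it computes an absolute MAE in percentage points of yield and a relative error (MAPE) in percent for the same hypothetical series.

```python
from statistics import fmean

real = [4.10, 4.25, 4.05, 4.30]  # made-up realized yields, in percent
pred = [4.00, 4.35, 4.15, 4.20]  # made-up model yields, in percent

# Reading 1: absolute error in the units of the series (percentage points).
mae_points = fmean(abs(r - p) for r, p in zip(real, pred))

# Reading 2: error relative to the realized yield, expressed in percent (MAPE).
mape_percent = 100 * fmean(abs(r - p) / r for r, p in zip(real, pred))

# The absolute gap restated in basis points (1 percentage point = 100 bp).
mae_bp = mae_points * 100
```

Here the absolute error is 0.10 percentage points (10 basis points) while the relative error is roughly 2.4 percent, so a bare "0.103" and "0.103%" are not interchangeable without a stated convention.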
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment point by point below. Where the comments identify gaps in clarity or missing quantitative details, we have revised the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] The headline performance numbers (MAE 0.103, 60% profit rate, LLM score 3.37/5) are presented without any baseline definitions, train/test split description, or statistical significance tests, rendering the claim of improvement over existing methods unverifiable.
  Authors: We agree that the abstract would benefit from explicit context on these elements. In the revised version we have added the baseline models (LSTM, GRU, and ARIMA), the chronological 70/30 train/test split on the 2000-2023 macroeconomic dataset, and results of Wilcoxon signed-rank tests (p < 0.05) confirming statistically significant improvement in MAE. These details were already present in Sections 3 and 4; they are now summarized in the abstract as well.
  Revision: yes
- Referee: [Abstract] The central assertion that CausalGAN+SAC 'preserves essential market properties' and causal relationships conditional on the 12 macro variables is load-bearing for the downstream LLM reliability claim, yet no quantitative validation (conditional independence tests, out-of-distribution causal metrics, or effect-size comparisons) is supplied.
  Authors: We acknowledge that explicit causal validation was insufficiently quantified in the original submission. The revised manuscript now includes conditional independence tests via the PC algorithm demonstrating that the generated yields preserve the same conditional independencies with respect to the 12 macro variables as the real data, together with interventional effect-size comparisons on out-of-distribution macro scenarios.
  Revision: yes
- Referee: [Abstract] The evaluation appears circular because the same synthetic data used to train/tune the GAN and SAC are later used to compute the reported MAE and profit-rate figures; no external benchmark or held-out real-market test set is described that would break this dependence.
  Authors: The evaluation is not circular: the reported MAE of 0.103 is obtained by comparing generated yields against a held-out real bond-yield test set (2020-2023) that was never seen during CausalGAN or SAC training, and the 60% profit rate is measured on actual subsequent market outcomes. We agree, however, that this separation was not stated clearly enough in the abstract. We have revised the abstract and added an explicit paragraph in the evaluation section describing the held-out real-market test set and the train/evaluation data separation.
  Revision: yes
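The two procedures named in the responses, a chronological train/test split and a Wilcoxon signed-rank test on paired errors, can be sketched as follows. This is an illustration under assumptions, not the authors' code: the test uses the normal approximation without continuity or tie-variance correction, the function names are hypothetical, and the error values are made up.

```python
import math

def chronological_split(rows, train_fraction=0.7):
    """Split time-ordered rows without shuffling, so the test block is strictly later."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def wilcoxon_signed_rank(errors_a, errors_b):
    """Two-sided Wilcoxon signed-rank test on paired errors.

    Normal approximation, no continuity or tie-variance correction; zero
    differences are dropped. Returns (W, p_value), W being the smaller rank sum.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # give tied absolute differences their average rank
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    return w, math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability

# Annual labels stand in for the 2000-2023 dataset; the real frequency is not stated.
train_years, test_years = chronological_split(list(range(2000, 2024)))

# Made-up absolute errors for a proposed model and a baseline on the same test points.
model_err = [0.10, 0.12, 0.09, 0.11, 0.08, 0.13, 0.10, 0.09, 0.12, 0.11]
base_err = [0.15, 0.20, 0.14, 0.19, 0.16, 0.22, 0.18, 0.13, 0.21, 0.17]
w_stat, p_value = wilcoxon_signed_rank(model_err, base_err)
```

With annual labels, a 70/30 cut places the test block at 2016-2023, which is wider than the 2020-2023 held-out window mentioned in the third response; the text available here does not say how the two descriptions are reconciled.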
Circularity Check
No circularity: empirical ML pipeline reports standard train/test metrics without self-referential reduction
Full rationale
The paper describes a standard empirical pipeline: CausalGANs + SAC generate synthetic yields conditioned on 12 macro variables, followed by fine-tuned Qwen2.5-7B producing trading signals, with reported MAE 0.103, 60% profit rate, and LLM/expert scores obtained via automated/human/LLM evaluation. No equations, definitions, or self-citations are shown that make any performance number equivalent to its own training inputs by construction. The framework is presented as a data-driven method whose results are compared to existing methods; the derivation chain does not collapse to a fitted parameter renamed as a prediction or to a self-citation load-bearing uniqueness claim. This is the normal case of an applied ML paper whose central claims remain externally falsifiable on held-out real bond data.
Axiom & Free-Parameter Ledger
free parameters (2)
- 12 macroeconomic variables
- GAN and SAC training hyperparameters
axioms (2)
- domain assumption: CausalGANs conditioned on macroeconomic variables preserve essential statistical and causal properties of real bond yields
- domain assumption: LLM-generated trading signals and risk assessments are meaningfully correlated with actual market outcomes