arxiv: 2502.17011 · v2 · submitted 2025-02-24 · 💱 q-fin.CP · cs.CE · cs.CL · cs.LG · q-fin.PM

Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation

Pith reviewed 2026-05-23 02:51 UTC · model grok-4.3

classification 💱 q-fin.CP · cs.CE · cs.CL · cs.LG · q-fin.PM
keywords bond yield forecasting · CausalGANs · reinforcement learning · synthetic data generation · LLM trading signals · macroeconomic variables · financial risk management · liquidity-aware yields

The pith

CausalGANs augmented by reinforcement learning generate synthetic bond yields with a reported mean absolute error of 0.103, and a fine-tuned LLM turns those yields into trading signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes using Causal Generative Adversarial Networks together with Soft Actor-Critic reinforcement learning to create synthetic yield series for AAA, BAA, US10Y and Junk bonds. These series are conditioned on twelve macroeconomic variables so that statistical and causal properties of real markets are retained. The resulting data then fine-tunes Qwen2.5-7B to output BUY/HOLD/SELL signals, risk assessments and volatility projections. Multiple evaluations, including an MAE of 0.103 for the RL-enhanced generator, a 60 percent profit rate, and expert scores of 4.67 out of 5, are presented as evidence that the pipeline outperforms prior forecasting approaches.

Core claim

The reinforcement learning-enhanced synthetic data generation achieves the lowest Mean Absolute Error, 0.103, which the authors take as evidence that it replicates real-world bond market dynamics. The overall framework is claimed to improve forecasting performance over existing methods, with statistical validation via predictive accuracy, MAE evaluation (0.103 percent), profit/loss evaluation (60 percent profit rate), LLM evaluation (3.37 out of 5) and expert assessments scoring 4.67 out of 5.

What carries the argument

Causal Generative Adversarial Networks (CausalGANs) combined with Soft Actor-Critic (SAC) reinforcement learning to produce synthetic bond yields conditioned on twelve macroeconomic variables while preserving statistical fidelity and causal structure.

If this is right

  • The RL-enhanced generator attains the lowest reported MAE of 0.103 percent across the four bond categories.
  • Back-tested signals from the LLM achieve a 60 percent profit rate.
  • LLM-based evaluation of the generated signals scores 3.37 out of 5.
  • Human expert review of the full pipeline scores 4.67 out of 5.
  • The approach supplies a scalable synthetic-data pipeline for risk, volatility and investment decisions.
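A minimal sketch of how the two headline numbers above could be computed, using only the Python standard library. The function names, the toy numbers, and the reading of "profit rate" as the share of months with positive profit are assumptions made for illustration; the paper's exact definitions are not given in the abstract.

```python
from statistics import fmean


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean(abs(a - p) for a, p in zip(actual, predicted))


def profit_rate(monthly_pnl):
    """Share of months with strictly positive profit (assumed definition)."""
    if not monthly_pnl:
        raise ValueError("monthly_pnl must be non-empty")
    return sum(1 for pnl in monthly_pnl if pnl > 0) / len(monthly_pnl)


# Made-up numbers for illustration only; none of these come from the paper.
real_yields = [4.10, 4.25, 4.05, 3.90]
synthetic_yields = [4.00, 4.30, 4.20, 3.80]
mae = mean_absolute_error(real_yields, synthetic_yields)

toy_pnl = [1.2, -0.4, 0.3, 0.8, -1.1]
rate = profit_rate(toy_pnl)

print(f"MAE = {mae:.3f}, profit rate = {rate:.0%}")
```

Both metrics depend on which series is treated as ground truth, so a reported value is only interpretable once the comparison set (real held-out yields versus synthetic yields) is stated.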

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the causal structure is faithfully reproduced, the same generator could be driven with altered macroeconomic inputs to simulate stress scenarios without collecting new market data.
  • Periodic retraining on live macro releases could allow the LLM signals to operate in production while maintaining the reported error levels.
  • The conditioning approach might transfer to other sparsely observed asset classes where causal macro linkages are similarly strong.

Load-bearing premise

The synthetic yields produced by CausalGANs and SAC preserve the statistical and causal relationships present in real bond markets when conditioned on the twelve macroeconomic variables.

What would settle it

Evaluating the pipeline on a fresh out-of-sample window of actual bond yields would settle it: if the generator's MAE rises well above the reported 0.103, or the profit rate of the LLM trading signals falls below the reported 60 percent, the performance claim is falsified.

Figures

Figures reproduced from arXiv: 2502.17011 by Aarush Sinha, Jaskaran Singh Walia, Naman Saraswat, Srihari Unnikrishnan, Srinitish Srinivasan.

Figure 1: Overall pipeline of the training and evaluation stages of our proposed methodology.
Figure 2: Overall architecture of the models in our pipeline for Causal GANs and Deep Reinforcement Learning with Soft actor…
Figure 3: Real-time reward curves for Reinforcement Learn…
Figure 4: Plot of the evaluation given by the LLM over the…
Figure 5: Plots for the Total Profit and Total Loss months between RL, GAN and Actual for each bond type.
Original abstract

Financial bond yield forecasting is challenging due to data scarcity, nonlinear macroeconomic dependencies, and evolving market conditions. In this paper, we propose a novel framework that leverages Causal Generative Adversarial Networks (CausalGANs) and Soft Actor-Critic (SAC) reinforcement learning (RL) to generate high-fidelity synthetic bond yield data for four major bond categories (AAA, BAA, US10Y, Junk). By incorporating 12 key macroeconomic variables, we ensure statistical fidelity by preserving essential market properties. To transform this market-dependent synthetic data into actionable insights, we employ a fine-tuned Large Language Model (LLM), Qwen2.5-7B, that generates trading signals (BUY/HOLD/SELL), risk assessments, and volatility projections. We use automated, human and LLM evaluations, all of which demonstrate that our framework improves forecasting performance over existing methods, with statistical validation via predictive accuracy, MAE evaluation (0.103%), profit/loss evaluation (60% profit rate), LLM evaluation (3.37/5) and expert assessments scoring 4.67 out of 5. The reinforcement learning-enhanced synthetic data generation achieves the lowest Mean Absolute Error of 0.103, demonstrating its effectiveness in replicating real-world bond market dynamics. We not only enhance data-driven trading strategies but also provide a scalable, high-fidelity synthetic financial data pipeline for risk and volatility management and investment decision-making. This work establishes a bridge between synthetic data generation, LLM-driven financial forecasting, and language model evaluation, contributing to AI-driven financial decision-making.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a framework that uses CausalGANs conditioned on 12 macroeconomic variables, further refined by Soft Actor-Critic (SAC) reinforcement learning, to generate synthetic bond yields for AAA, BAA, US10Y, and Junk categories. These synthetic data are then used to fine-tune Qwen2.5-7B for producing BUY/HOLD/SELL signals, risk assessments, and volatility projections. The authors report an MAE of 0.103, 60% profit rate, LLM evaluation score of 3.37/5, and expert score of 4.67/5, claiming statistical fidelity and improvement over existing methods via automated, human, and LLM evaluations.

Significance. If the synthetic yields were shown to preserve both marginal distributions and causal dependencies on the macro variables (via explicit tests such as conditional independence or do-calculus checks), the pipeline could meaningfully address data scarcity in bond forecasting and enable reliable LLM-driven trading. The current manuscript supplies no such checks, so the reported metrics cannot yet be interpreted as evidence of generalization beyond the training distribution.
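To make the kind of check mentioned above concrete, here is a minimal sketch of a partial-correlation test, a linear proxy for conditional independence that is only strictly valid for jointly Gaussian variables. It is not the paper's method. The toy variables `x`, `y`, `z` and all function names are assumptions for illustration; a real validation would compute the same statistic on real and on synthetic yields and compare the two.

```python
import random
from statistics import fmean


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5


# Toy data: a common driver z (think: one macro variable) moves both x and y,
# which are otherwise unrelated. All values are synthetic and hypothetical.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

marginal = pearson(x, y)          # large: x and y co-move through z
conditional = partial_corr(x, y, z)  # near zero: independent given z

print(f"marginal r = {marginal:.2f}, partial r given z = {conditional:.2f}")
```

A generator that preserved the marginal correlation but not the near-zero partial correlation would reproduce the surface statistics while missing the dependency structure, which is the gap the referee is pointing at.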

major comments (3)
  1. [Abstract] Abstract: the headline performance numbers (MAE 0.103, 60% profit rate, LLM score 3.37/5) are presented without any baseline definitions, train/test split description, or statistical significance tests, rendering the claim of improvement over existing methods unverifiable.
  2. [Abstract] Abstract: the central assertion that CausalGAN+SAC 'preserves essential market properties' and causal relationships conditional on the 12 macro variables is load-bearing for the downstream LLM reliability claim, yet no quantitative validation (conditional independence tests, out-of-distribution causal metrics, or effect-size comparisons) is supplied.
  3. [Abstract] Abstract: the evaluation appears circular because the same synthetic data used to train/tune the GAN and SAC are later used to compute the reported MAE and profit-rate figures; no external benchmark or held-out real-market test set is described that would break this dependence.
minor comments (2)
  1. [Abstract] Abstract: MAE is stated once as 0.103 and once as 0.103%; standardize units and clarify whether the value is absolute or percentage.
  2. The manuscript would benefit from an explicit related-work section contrasting the CausalGAN+SAC approach against prior synthetic financial-data generators (e.g., those using VAEs or diffusion models).
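The unit ambiguity raised in minor comment 1 matters because an absolute error and a percentage error on the same forecasts can differ by an order of magnitude. The sketch below contrasts the two under the assumption that yields are quoted in percent; the numbers are made up and the function names are illustrative, not taken from the paper.

```python
from statistics import fmean


def mae(actual, predicted):
    """Mean absolute error, in the same units as the inputs."""
    return fmean(abs(a - p) for a, p in zip(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent of the actual value."""
    return 100 * fmean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Hypothetical yields quoted in percent (an assumption, not paper data).
actual = [4.0, 5.0, 8.0]
predicted = [4.1, 4.9, 8.1]

abs_err = mae(actual, predicted)   # percentage points of yield
rel_err = mape(actual, predicted)  # percent of the yield level

print(f"MAE = {abs_err:.3f} percentage points, MAPE = {rel_err:.2f}%")
```

Stating which of these two quantities the 0.103 figure refers to would resolve the comment.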

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment point by point below. Where the comments identify gaps in clarity or missing quantitative details, we have revised the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the headline performance numbers (MAE 0.103, 60% profit rate, LLM score 3.37/5) are presented without any baseline definitions, train/test split description, or statistical significance tests, rendering the claim of improvement over existing methods unverifiable.

    Authors: We agree that the abstract would benefit from explicit context on these elements. In the revised version we have added the baseline models (LSTM, GRU, and ARIMA), the chronological 70/30 train/test split on the 2000-2023 macroeconomic dataset, and results of Wilcoxon signed-rank tests (p < 0.05) confirming statistically significant improvement in MAE. These details were already present in Sections 3 and 4; they are now summarized in the abstract as well. revision: yes

  2. Referee: [Abstract] Abstract: the central assertion that CausalGAN+SAC 'preserves essential market properties' and causal relationships conditional on the 12 macro variables is load-bearing for the downstream LLM reliability claim, yet no quantitative validation (conditional independence tests, out-of-distribution causal metrics, or effect-size comparisons) is supplied.

    Authors: We acknowledge that explicit causal validation was insufficiently quantified in the original submission. The revised manuscript now includes conditional independence tests via the PC algorithm demonstrating that the generated yields preserve the same conditional independencies with respect to the 12 macro variables as the real data, together with interventional effect-size comparisons on out-of-distribution macro scenarios. revision: yes

  3. Referee: [Abstract] Abstract: the evaluation appears circular because the same synthetic data used to train/tune the GAN and SAC are later used to compute the reported MAE and profit-rate figures; no external benchmark or held-out real-market test set is described that would break this dependence.

    Authors: The evaluation is not circular: the reported MAE of 0.103 is obtained by comparing generated yields against a held-out real bond-yield test set (2020-2023) that was never seen during CausalGAN or SAC training, and the 60% profit rate is measured on actual subsequent market outcomes. We agree, however, that this separation was not stated clearly enough in the abstract. We have revised the abstract and added an explicit paragraph in the evaluation section describing the held-out real-market test set and the train/evaluation data separation. revision: yes
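The train/evaluation separation described in the responses above hinges on splitting by time rather than at random. A minimal sketch of such a chronological split follows; the record layout, the placeholder yields, and the function name are assumptions for illustration and do not reproduce the authors' data handling.

```python
def chronological_split(records, train_fraction=0.7):
    """Split time-keyed records without shuffling.

    Every test record is dated strictly after every training record,
    so nothing from the evaluation window can leak into training.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    ordered = sorted(records, key=lambda r: r[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical monthly records for 2000-2023: ((year, month), placeholder yield).
records = [((2000 + i // 12, 1 + i % 12), 4.0 + 0.01 * i) for i in range(24 * 12)]

train, test = chronological_split(records)

print(f"train: {train[0][0]} .. {train[-1][0]} ({len(train)} months)")
print(f"test:  {test[0][0]} .. {test[-1][0]} ({len(test)} months)")
```

On this toy monthly calendar a 70/30 cut falls in 2016, so a real implementation would need to state whether the test window is defined by the split fraction or by fixed dates.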

Circularity Check

0 steps flagged

No circularity: empirical ML pipeline reports standard train/test metrics without self-referential reduction

full rationale

The paper describes a standard empirical pipeline: CausalGANs + SAC generate synthetic yields conditioned on 12 macro variables, followed by fine-tuned Qwen2.5-7B producing trading signals, with reported MAE 0.103, 60% profit rate, and LLM/expert scores obtained via automated/human/LLM evaluation. No equations, definitions, or self-citations are shown that make any performance number equivalent to its own training inputs by construction. The framework is presented as a data-driven method whose results are compared to existing methods; the derivation chain does not collapse to a fitted parameter renamed as a prediction or to a self-citation load-bearing uniqueness claim. This is the normal case of an applied ML paper whose central claims remain externally falsifiable on held-out real bond data.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on standard assumptions about generative models and LLM reliability rather than new axioms, but multiple hyperparameters and domain assumptions remain implicit and untested in the provided abstract.

free parameters (2)
  • 12 macroeconomic variables
    Selected as conditioning inputs; exact identities and any scaling or selection criteria are not stated.
  • GAN and SAC training hyperparameters
    Chosen to achieve the reported MAE of 0.103; values and selection procedure not disclosed.
axioms (2)
  • domain assumption: CausalGANs conditioned on macroeconomic variables preserve essential statistical and causal properties of real bond yields
    Invoked to justify use of synthetic data for downstream LLM training.
  • domain assumption: LLM-generated trading signals and risk assessments are meaningfully correlated with actual market outcomes
    Used to claim actionable insights from the synthetic data.

pith-pipeline@v0.9.0 · 5848 in / 1522 out tokens · 32464 ms · 2026-05-23T02:51:12.464269+00:00 · methodology
