MulFSA: Multi-level Financial Sentiment Analysis Framework for Bond Market
Pith reviewed 2026-05-22 21:47 UTC · model grok-4.3
The pith
MulFSA combines firm-level and industry-level sentiments with duration smoothing to cut credit spread forecast errors by over 10 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MulFSA integrates firm-specific micro-level sentiment, industry-specific meso-level sentiment, and duration-aware smoothing, using pre-trained language models (PLMs) and large language models (LLMs). Applied to a Chinese bond-market corpus of 1.35 million texts, it produces a daily composite sentiment index. Adding this index to credit spread forecasting models reduces MAE by 10.25 percent and MAPE by 11.94 percent, and shifts in the index align with major risk events and firm crises.
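The quoted percentages are relative error reductions against a forecasting model without sentiment. The sketch below shows how such figures are usually computed, assuming the standard definitions of MAE and MAPE. The paper's evaluation code is not shown here, and the function names are illustrative.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between observed and forecast values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; assumes no zero-valued targets."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_err: float, model_err: float) -> float:
    """Percentage drop in an error metric relative to the baseline model."""
    return 100.0 * (baseline_err - model_err) / baseline_err
```

Take a hypothetical baseline MAE of 1.0 that falls to 0.8975 once sentiment is added: `relative_reduction(1.0, 0.8975)` then returns about 10.25. Because MAPE divides by the observed spread, it is only defined when no target value is zero.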
What carries the argument
The MulFSA framework that systematically integrates firm-specific micro-level sentiment, industry-specific meso-level sentiment, and duration-aware smoothing to model the latency and persistence of textual impact on bond risk.
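The summary gives no weighting or smoothing formulas. What follows is therefore a minimal sketch under stated assumptions, not MulFSA's actual method:

- Daily firm-level (micro) and industry-level (meso) scores are combined with a fixed linear weight.
- Persistence is modelled by an exponentially decaying average with a chosen half-life.
- Latency is modelled by an integer day lag.

The names `composite_index`, `duration_smooth`, `w_micro`, `half_life`, and `lag` are all hypothetical.

```python
from typing import List, Sequence


def duration_smooth(raw: Sequence[float], half_life: float, lag: int = 0) -> List[float]:
    """Exponentially weighted average of past daily scores, shifted by a lag.

    Persistence: a score's weight halves every `half_life` days.
    Latency: the smoothed value for day t only takes effect on day t + lag;
    the first `lag` days are filled with 0.0 (neutral).
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    decay = 0.5 ** (1.0 / half_life)
    num = den = 0.0
    smoothed: List[float] = []
    for x in raw:
        num = decay * num + x    # decayed sum of past scores
        den = decay * den + 1.0  # decayed sum of weights, for normalisation
        smoothed.append(num / den)
    lag = min(max(lag, 0), len(raw))
    return [0.0] * lag + smoothed[: len(raw) - lag]


def composite_index(micro: Sequence[float], meso: Sequence[float],
                    w_micro: float = 0.5, half_life: float = 5.0,
                    lag: int = 1) -> List[float]:
    """Blend daily micro and meso sentiment, then apply duration-aware smoothing."""
    if len(micro) != len(meso):
        raise ValueError("micro and meso series must be aligned day by day")
    blended = [w_micro * a + (1.0 - w_micro) * b for a, b in zip(micro, meso)]
    return duration_smooth(blended, half_life, lag)
```

With a one-day half-life, a single positive shock `[1.0, 0.0, 0.0]` smooths to roughly `[1.0, 0.333, 0.143]`. The signal therefore fades gradually instead of vanishing the next day, which is the persistence effect the framework describes. The Roadmap point "If the duration smoothing step is the main driver" could be probed by varying `half_life` while holding `w_micro` at 1.0 or 0.0.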
If this is right
- Credit spread forecasting models achieve lower errors when the daily composite sentiment index is included.
- Sentiment index movements track documented social risk events and firm-specific crises in the Chinese bond market.
- The multi-level construction works on a corpus of 1.35 million texts covering 2013 to 2023.
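One way to test the claim that index movements track documented risk events is an event-window comparison. For each event, average the index over a few days after the event and subtract the average over the days before it. The sketch makes three assumptions:

- The index is a list indexed by trading day.
- Event dates have already been mapped to those positions.
- The window lengths are illustrative choices.

A convincing version would also compare the result against randomly placed pseudo-events.

```python
from statistics import mean
from typing import List, Sequence


def event_window_shift(index: Sequence[float], event_days: Sequence[int],
                       pre: int = 5, post: int = 5) -> List[float]:
    """For each event position, return mean(index after) - mean(index before).

    Events without at least one day on each side are skipped. A negative
    value means the sentiment index fell around the event.
    """
    shifts: List[float] = []
    for d in event_days:
        before = list(index[max(0, d - pre):d])
        after = list(index[d:d + post])
        if before and after:
            shifts.append(mean(after) - mean(before))
    return shifts
```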
Where Pith is reading between the lines
- The same layered extraction could be tested on equity or derivative markets to check whether multi-level signals add value outside bonds.
- If the duration smoothing step is the main driver, simpler time-decay adjustments might achieve similar gains without separate micro and meso layers.
- Daily index values might support real-time monitoring dashboards that flag emerging credit events earlier than price data alone.
- The framework leaves open whether the improvements hold when the underlying language models are swapped for lighter or open-source alternatives.
- keywords
Load-bearing premise
The duration-aware smoothing and the separation into micro- and meso-level sentiments each add independent predictive value beyond single-level sentiment, and the 1.35 million texts accurately represent the full Chinese bond market from 2013 to 2023.
What would settle it
Running the same credit-spread forecasting regressions on a new out-of-sample period or market while adding the MulFSA index and finding no MAE or MAPE reduction, or finding that a single-level sentiment index performs equally well.
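The test described above can be written as an ablation report. It uses the same forecasting model, retrained once per index variant, and scores each variant on the same out-of-sample period. The variant names follow the referee's list. The tolerance `tol`, measured in percentage points, and the helper names are hypothetical.

```python
from typing import Dict, Iterable, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between observed and forecast values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def ablation_report(y_test: Sequence[float], preds: Dict[str, Sequence[float]],
                    baseline: str = "no_sentiment") -> Dict[str, float]:
    """Percent MAE reduction of every variant relative to the baseline forecast."""
    base = mae(y_test, preds[baseline])
    return {name: 100.0 * (base - mae(y_test, p)) / base for name, p in preds.items()}


def claim_undermined(report: Dict[str, float], full: str = "full",
                     single_level: Iterable[str] = ("micro_only", "meso_only", "unsmoothed"),
                     tol: float = 0.5) -> bool:
    """True if the full index gives no gain, or a single-level variant is within tol points of it."""
    if report[full] <= 0.0:
        return True
    return any(report[v] >= report[full] - tol for v in single_level if v in report)
```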
Figures
Original abstract
Existing financial sentiment analysis methods often fail to capture the multi-faceted nature of risk in bond markets due to their single-level approach and neglect of temporal dynamics. We propose Multi-level Financial Sentiment Analysis (MulFSA) based on pre-trained language models (PLMs) and large language models (LLMs), a novel framework that systematically integrates firm-specific micro-level sentiment, industry-specific meso-level sentiment, and duration-aware smoothing to model the latency and persistence of textual impact. Applying MulFSA to the comprehensive Chinese bond market corpus constructed by us (2013-2023, 1.35M texts), we extracted a daily composite sentiment index. Empirical results show statistically measurable improvements in credit spread forecasting when incorporating sentiment (10.25% MAE and 11.94% MAPE reduction), with sentiment shifts closely correlating with major social risk events and firm-specific crises. Project Page: https://mulfsa.github.io/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MulFSA, a framework that integrates firm-specific micro-level sentiment, industry-specific meso-level sentiment, and duration-aware smoothing using PLMs and LLMs to construct a daily composite sentiment index from a 1.35M-text Chinese bond market corpus (2013-2023). It claims this index yields statistically measurable improvements in credit spread forecasting (10.25% MAE reduction, 11.94% MAPE reduction) over a no-sentiment baseline and correlates with major risk events.
Significance. If the multi-level decomposition and smoothing demonstrably add predictive value beyond simpler sentiment aggregation, the work would advance financial sentiment analysis by addressing multi-faceted risks and temporal dynamics in bond markets. The scale of the constructed corpus (1.35M texts) is a clear strength for domain-specific empirical work.
Major comments (2)
- [Abstract / Empirical results] Abstract and empirical results section: The headline improvements (10.25% MAE, 11.94% MAPE) are reported only versus a no-sentiment baseline. No ablation results are provided for micro-only, meso-only, or unsmoothed single-level variants, leaving open whether the multi-level architecture and duration-aware smoothing contribute independent value as asserted in the abstract.
- [Abstract] Abstract: The reported percentage reductions supply no information on the base forecasting regressor, train/test splits, cross-validation procedure, or statistical significance tests (e.g., Diebold-Mariano), which are required to substantiate that the gains are attributable to the MulFSA index rather than confounding factors.
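The Diebold-Mariano test mentioned by the referee compares two forecasts' loss series over the same dates. Below is a minimal sketch of the basic asymptotic version. It uses absolute-error loss by default to match MAE and does not include the Harvey-Leybourne-Newbold small-sample correction. Which variant the paper uses, if any, is not stated here.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def diebold_mariano(e1: Sequence[float], e2: Sequence[float],
                    h: int = 1, power: int = 1) -> Tuple[float, float]:
    """Diebold-Mariano test of equal predictive accuracy.

    e1, e2: forecast errors of model 1 (e.g. with sentiment) and model 2
    (e.g. the no-sentiment baseline) on the same test dates.
    Loss is |e| ** power: power=1 matches MAE, power=2 matches MSE.
    Returns (statistic, two-sided p-value) from the asymptotic N(0, 1) law;
    a negative statistic means model 1 has the lower average loss.
    """
    if len(e1) != len(e2) or len(e1) < 2:
        raise ValueError("need two aligned error series of length >= 2")
    n = len(e1)
    d = [abs(a) ** power - abs(b) ** power for a, b in zip(e1, e2)]
    d_bar = sum(d) / n

    def autocov(k: int) -> float:
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Long-run variance of d: autocovariances up to lag h - 1, as in the original test.
    lrv = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0.0:
        raise ValueError("non-positive long-run variance estimate")
    stat = d_bar / math.sqrt(lrv / n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value
```

A significantly negative statistic would support the claim that the sentiment-augmented model, and not chance, produced the lower MAE on the test period.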
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight important aspects of clarity and empirical rigor that we will address in the revision. We respond to each major comment below.
read point-by-point responses
- Referee: [Abstract / Empirical results] Abstract and empirical results section: The headline improvements (10.25% MAE, 11.94% MAPE) are reported only versus a no-sentiment baseline. No ablation results are provided for micro-only, meso-only, or unsmoothed single-level variants, leaving open whether the multi-level architecture and duration-aware smoothing contribute independent value as asserted in the abstract.
Authors: We agree that ablation experiments are required to isolate the incremental value of the multi-level decomposition and duration-aware smoothing. The current manuscript reports only the full MulFSA versus the no-sentiment baseline. In the revised version we will add a dedicated ablation subsection in the empirical results that compares the complete framework against micro-only, meso-only, and unsmoothed single-level variants, using the same forecasting setup and significance tests. Revision: yes.
- Referee: [Abstract] Abstract: The reported percentage reductions supply no information on the base forecasting regressor, train/test splits, cross-validation procedure, or statistical significance tests (e.g., Diebold-Mariano), which are required to substantiate that the gains are attributable to the MulFSA index rather than confounding factors.
Authors: The abstract currently omits these methodological specifics. The full manuscript describes the forecasting setup in Section 4, but to make the claims self-contained we will revise the abstract to include a concise statement of the base regressor, the temporal train/test protocol, the cross-validation approach, and the statistical tests employed. The empirical results section will also be expanded with explicit reporting of these elements. Revision: yes.
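A temporal train/test protocol for daily series usually means walk-forward evaluation with an expanding training window. This keeps the model from fitting on dates later than the ones it forecasts. The generator below is a minimal sketch of that scheme; its name and parameters are illustrative, not taken from the paper.

```python
from typing import Iterator, Optional, Tuple


def expanding_window_splits(n_obs: int, initial_train: int, test_size: int,
                            step: Optional[int] = None) -> Iterator[Tuple[range, range]]:
    """Yield chronological (train, test) index ranges with an expanding training window.

    Observations are assumed to be sorted by date. Each training range ends
    before its test block starts, so no future information leaks into fitting.
    """
    step = test_size if step is None else step
    if initial_train <= 0 or test_size <= 0 or step <= 0:
        raise ValueError("initial_train, test_size and step must be positive")
    start = initial_train
    while start + test_size <= n_obs:
        yield range(0, start), range(start, start + test_size)
        start += step
```

scikit-learn's `TimeSeriesSplit` in `sklearn.model_selection` implements a comparable expanding-window split if a library version is preferred.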
Circularity Check
No significant circularity detected
full rationale
The paper constructs the MulFSA sentiment index from an external 1.35M-text corpus via PLMs/LLMs with micro/meso levels and duration smoothing, then reports empirical forecasting gains on credit spreads versus a no-sentiment baseline. No equations or steps reduce the index construction or the reported MAE/MAPE improvements to self-definition, fitted parameters reused as predictions, or load-bearing self-citations. The chain relies on independent data processing and model application rather than tautological re-use of inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
F. Perri and V . Quadrini, “International recessions,” American Economic Review, vol. 108, no. 4-5, pp. 935–984, 2018
work page 2018
-
[2]
Investor sentiment and the cross-section of stock returns,
M. Baker et al. , “Investor sentiment and the cross-section of stock returns,” The journal of Finance , vol. 61, no. 4, pp. 1645–1680, 2006
work page 2006
-
[3]
News and asset pricing: A high-frequency anatomy of the sdf,
S. Aleti and T. Bollerslev, “News and asset pricing: A high-frequency anatomy of the sdf,” The Review of Financial Studies, p. hhae019, 2024
work page 2024
-
[4]
Emotions in macroeconomic news and their impact on the european bond market,
S. Consoli, L. T. Pezzoli, and E. Tosetti, “Emotions in macroeconomic news and their impact on the european bond market,” Journal of International Money and Finance , vol. 118, p. 102472, 2021
work page 2021
-
[5]
Financial sentiment analysis: Techniques and applica- tions,
K. Du et al. , “Financial sentiment analysis: Techniques and applica- tions,” ACM Computing Surveys , vol. 56, no. 9, pp. 1–42, 2024
work page 2024
-
[6]
Identifying corporate credit risk sentiments from financial news,
N. Ahbali, X. Liu, A. Nanda, J. Stark, A. Talukder, and R. P. Khandpur, “Identifying corporate credit risk sentiments from financial news,” in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, 2022, pp. 362–370
work page 2022
-
[7]
How to gauge investor behavior? a comparison of online investor sentiment measures,
D. Ballinari and S. Behrendt, “How to gauge investor behavior? a comparison of online investor sentiment measures,” Digital Finance , vol. 3, no. 2, pp. 169–204, 2021
work page 2021
-
[8]
Year and industry-level accounting narrative analysis: Readability and tone variation,
E. Efretuei, “Year and industry-level accounting narrative analysis: Readability and tone variation,” Journal of Emerging Technologies in Accounting, vol. 18, no. 2, pp. 53–76, 2021
work page 2021
-
[9]
Z. Liu et al., “Emollms: A series of emotional large language models and annotation tools for comprehensive affective analysis,” in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 5487–5496
work page 2024
-
[10]
B. Mendel and A. Shleifer, “Chasing noise,” Journal of Financial Economics, vol. 104, no. 2, pp. 303–320, 2012
work page 2012
-
[11]
FinBERT: Financial Sentiment Analysis with Pre-trained Language Models
D. Araci, “Finbert: Financial sentiment analysis with pre-trained lan- guage models,” arXiv preprint arXiv:1908.10063 , 2019
work page internal anchor Pith review Pith/arXiv arXiv 1908
-
[12]
Fingpt: Democratizing internet-scale data for financial large language models,
X.-Y . Liu et al., “Fingpt: Democratizing internet-scale data for financial large language models,” arXiv preprint arXiv:2307.10485 , 2023
-
[13]
BloombergGPT: A Large Language Model for Finance
S. Wu, O. Irsoy, S. Lu, V . Dabravolski, M. Dredze, S. Gehrmann, P. Kambadur, D. Rosenberg, and G. Mann, “Bloomberggpt: A large language model for finance,” arXiv preprint arXiv:2303.17564 , 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[14]
Large language models in finance (finllms),
J. Lee, N. Stevens, and S. C. Han, “Large language models in finance (finllms),” Neural Computing and Applications , pp. 1–15, 2025
work page 2025
-
[15]
A survey on aspect-based sentiment analysis: Tasks, methods, and challenges,
W. Zhang, X. Li, Y . Deng, L. Bing, and W. Lam, “A survey on aspect-based sentiment analysis: Tasks, methods, and challenges,” IEEE Transactions on Knowledge and Data Engineering , vol. 35, no. 11, pp. 11 019–11 038, 2022
work page 2022
-
[16]
Big data: Deep learning for financial sentiment analysis,
S. Sohangir, D. Wang, A. Pomeranets, and T. M. Khoshgoftaar, “Big data: Deep learning for financial sentiment analysis,” Journal of Big Data, vol. 5, no. 1, pp. 1–25, 2018
work page 2018
-
[17]
An investigation of investor sentiment and specula- tive bond yield spreads,
G. Cerci et al. , “An investigation of investor sentiment and specula- tive bond yield spreads,” in International Interdisciplinary Business- Economics Advancement Conference , 2015, p. 224
work page 2015
-
[18]
Good debt or bad debt: Detecting semantic orientations in economic texts,
P. Malo et al., “Good debt or bad debt: Detecting semantic orientations in economic texts,” Journal of the Association for Information Science and Technology, vol. 65, no. 4, pp. 782–796, 2014
work page 2014
-
[19]
Www’18 open challenge: financial opinion mining and question answering,
M. Maia, S. Handschuh, A. Freitas, B. Davis, R. McDermott, M. Zarrouk, and A. Balahur, “Www’18 open challenge: financial opinion mining and question answering,” in Companion proceedings of the the web conference 2018 , 2018, pp. 1941–1942
work page 2018
-
[20]
When flue me ets flang: Benchmarks and large pre-trained language model for financial domain
R. S. Shah, K. Chawla, D. Eidnani, A. Shah, W. Du, S. Chava, N. Raman, C. Smiley, J. Chen, and D. Yang, “When flue meets flang: Benchmarks and large pre-trained language model for financial domain,” arXiv preprint arXiv:2211.00083, 2022
-
[21]
Combining enterprise knowledge graph and news sentiment analysis for stock price prediction,
J. Liu, Z. Lu, and W. Du, “Combining enterprise knowledge graph and news sentiment analysis for stock price prediction,”Hawaii International Conference on System Sciences , 2019
work page 2019
-
[22]
Bias propagation in economically linked firms,
T. Jochem and F. S. Peters, “Bias propagation in economically linked firms,” Available at SSRN 2698365 , 2019
work page 2019
-
[23]
Too sensitive to fail: The impact of sentiment connect- edness on stock price crash risk,
J. Cao et al. , “Too sensitive to fail: The impact of sentiment connect- edness on stock price crash risk,” Entropy, vol. 27, no. 4, p. 345, 2025
work page 2025
-
[24]
News sentiment and bank credit risk,
L. A. Smales, “News sentiment and bank credit risk,” Journal of Empirical Finance, vol. 38, pp. 37–61, 2016
work page 2016
-
[25]
The timeliness of the bond market reaction to bad earnings news,
M. L. Defond and J. Zhang, “The timeliness of the bond market reaction to bad earnings news,” Contemporary Accounting Research , vol. 31, no. 3, pp. 911–936, 2014
work page 2014
-
[26]
A survey of sentiment analysis in social media,
L. Yue et al. , “A survey of sentiment analysis in social media,” Knowledge and Information Systems , vol. 60, pp. 617–663, 2019
work page 2019
-
[27]
Learning Universal Sentence Representations with Mean-Max Attention Autoencoder
M. Zhang, Y . Wu, W. Li, and W. Li, “Learning universal sentence representations with mean-max attention autoencoder,” arXiv preprint arXiv:1809.06590, 2018. 12
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[28]
Large lan- guage models are zero-shot reasoners,
T. Kojima, S. S. Gu, M. Reid, Y . Matsuo, and Y . Iwasawa, “Large lan- guage models are zero-shot reasoners,” Advances in neural information processing systems, vol. 35, pp. 22 199–22 213, 2022
work page 2022
-
[29]
Language mod- els are few-shot learners,
T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language mod- els are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020
work page 1901
-
[30]
Chain-of-thought prompting elicits reasoning in large language models,
J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V . Le, D. Zhou et al. , “Chain-of-thought prompting elicits reasoning in large language models,” Advances in neural information processing systems , vol. 35, pp. 24 824–24 837, 2022
work page 2022
-
[31]
Foundations of portfolio theory,
H. M. Markowitz, “Foundations of portfolio theory,” The journal of finance, vol. 46, no. 2, pp. 469–477, 1991
work page 1991
-
[32]
D. Chong and J. N. Druckman, “Framing theory,” Annu. Rev. Polit. Sci., vol. 10, no. 1, pp. 103–126, 2007
work page 2007
-
[33]
M. Nerlove, D. M. Grether, and J. L. Carvalho, Analysis of economic time series: a synthesis . Academic Press, 2014
work page 2014
-
[34]
Credit spreads as predictors of real-time economic activity: a bayesian model-averaging approach,
J. Faust et al. , “Credit spreads as predictors of real-time economic activity: a bayesian model-averaging approach,” Review of Economics and Statistics, vol. 95, no. 5, pp. 1501–1519, 2013
work page 2013
-
[35]
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
P. Wang et al., “Qwen2-vl: Enhancing vision-language model’s percep- tion of the world at any resolution,” arXiv preprint arXiv:2409.12191 , 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[36]
C-pack: Packaged resources to advance general chinese embedding,
S. Xiao, Z. Liu, P. Zhang, and N. Muennighoff, “C-pack: Packaged resources to advance general chinese embedding,” 2023
work page 2023
-
[37]
Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,
DeepSeek-AI, “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,” 2025
work page 2025
-
[38]
No contagion, only interdependence: measuring stock market comovements,
K. J. Forbes and R. Rigobon, “No contagion, only interdependence: measuring stock market comovements,” The journal of Finance, vol. 57, no. 5, pp. 2223–2261, 2002
work page 2002
-
[39]
Clean evidence on peer effects,
A. Falk and A. Ichino, “Clean evidence on peer effects,” Journal of labor economics, vol. 24, no. 1, pp. 39–57, 2006
work page 2006
-
[40]
Stl: A seasonal-trend decomposition,
R. B. Cleveland et al. , “Stl: A seasonal-trend decomposition,” J. off. Stat, vol. 6, no. 1, pp. 3–73, 1990
work page 1990
-
[41]
Permutation tests for studying classifier performance
M. Ojala et al., “Permutation tests for studying classifier performance.” Journal of machine learning research , vol. 11, no. 6, 2010
work page 2010
-
[42]
A. Fisher, C. Rudin, and F. Dominici, “All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously,” Journal of Machine Learning Research, vol. 20, no. 177, pp. 1–81, 2019. Yiwei Liu is an undergraduate student expected to obtain a Bachelor’s degree in Software Engineering, Si...
work page 2019
Excerpts from the paper

Forecasting features:
- Shibor (Shanghai Interbank Offered Rate) in March
- Manufacturing PMI (Purchasing Managers' Index)
- Macroeconomic Prosperity Index: Leading Index
- PPI (Producer Price Index): Year-over-Year for the Current Month
- GDP (Gross Domestic Product): Year-over-Year for the Current Quarter
- CPI (Consumer Price Index): Year-over-Year for the Current Month
- Aggregate Financing to the Real Economy (AFRE): Year-over-Year at Period-End
- Yield on Government Bonds (for the Corresponding Period)
- Industrial Indicator: SWS 3 Primary Industry Index
- Trading Indicator: Trading Volume
- Firm Financial and Operational Indicators: Current Assets, Non-Current Liabilities, Total Shareholders' Equity, Cash Flow from Operations, Cash Flow from Investment, Cash Flow from Finance, Debt-to-Asset Ratio (%), Tangible Net Worth Debt Ratio (%), Gross Profit Margin (%), Net Profit Margin (%), Return on Assets (%), Operating Profit Margin (%), Average Return on Equity (%), Operating Cycle (Days), Inventory Turnover Ratio, Accounts Receivable Turnover Ratio, Current Asset Turnover Ratio, Shareholders' Equity Turnover Ratio, Total Asset Turnover Ratio
- Firm Comprehensive Credit Indicators: Remaining Credit Utilization Ratio, Month-over-Month Change in Credit, Secured Credit Ratio

TABLE VIII: Feature Attribution (rank, feature name: feature importance)
1. PPI: 0.0560
2. Macroeconomic Climate Index: 0.0501
3. AFRE: 0.0487
4. Yield on Government Bonds: 0.0478
5. Shibor: 0.0420
6. CPI: 0.0343
7. GDP: 0.0249
8. Cash Flow from Finance: 0.02...

Sentiment label definitions:
- Pessimistic (-1): The text describes factors reflecting negative market sentiment in macro-financial contexts, particularly the bond market. These include concerns about economic fundamentals (such as slowdowns or rising unemployment), expectations of tighter monetary policy and higher interest rates, reduced market liquidity, and risk events like default...
- Neutral (0): The text describes neutral market sentiment, characterized by stable economic fundamentals, unchanged monetary policy with priced-in expectations, balanced market liquidity, and absence of major risk events. The content is mostly unrelated to macro financial markets, especially the bond market.
- Optimistic (1): The text reflects positive market sentiment, with mentions of stronger economic fundamentals, expectations of rate cuts or accommodative policy, ample liquidity, and easing risk events such as debt relief or rating upgrades. It is related to macro financial markets, particularly the bond market.

Output Requirements: a) First, identif...
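The label definitions above assign each text a score of -1, 0, or 1. The required output format is cut off in the source, so the sketch below makes two assumptions:

- The model's reply contains one of the three label names.
- Texts published on the same date are averaged with equal weights.

Neither the parser nor the aggregation rule is confirmed as the paper's procedure. `LABEL_SCORES`, `parse_label`, and `daily_scores` are hypothetical names.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Score convention from the label definitions: pessimistic -1, neutral 0, optimistic 1.
LABEL_SCORES = {"pessimistic": -1, "neutral": 0, "optimistic": 1}


def parse_label(llm_output: str) -> int:
    """Hypothetical parser: score of the label name that appears first in the reply."""
    text = llm_output.lower()
    hits = [(text.find(name), score) for name, score in LABEL_SCORES.items() if name in text]
    if not hits:
        raise ValueError(f"no sentiment label found in: {llm_output!r}")
    return min(hits)[1]


def daily_scores(records: Iterable[Tuple[date, str]]) -> Dict[date, float]:
    """Equal-weight mean of per-text scores for each publication date, in date order."""
    sums: Dict[date, float] = defaultdict(float)
    counts: Dict[date, int] = defaultdict(int)
    for day, reply in records:
        sums[day] += parse_label(reply)
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sorted(sums)}
```

A daily series built this way could serve as the raw input to a separate firm-level or industry-level smoothing step.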
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.