Bridging Language Models and Financial Analysis

Alejandro Lopez-Lira; Chanyeol Choi; Jihoon Kwon; Jy-yong Sohn; Sangwoon Yoon

arxiv: 2503.22693 · v2 · pith:WT26HGN6new · submitted 2025-03-14 · 💱 q-fin.ST · cs.AI · cs.CL

Pith reviewed 2026-05-23 01:05 UTC · model grok-4.3

classification 💱 q-fin.ST · cs.AI · cs.CL

keywords large language models · financial analysis · survey · financial data processing · LLM applications · natural language processing · finance industry adoption

The pith

Large language models offer new pathways for analyzing financial data in text, tables, and charts, yet adoption in the finance industry lags behind research.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that financial data involves intricate relationships across textual, numerical, and visual elements that traditional methods handle poorly. It shows how recent LLM advances create opportunities for more efficient processing and insight in this domain. A significant gap exists because finance prioritizes cautious integration and long-term validation while LLM research moves quickly, leaving many techniques underexplored. By synthesizing studies on novel LLM methodologies and their relevance to finance, the survey provides direction on research avenues and future opportunities for practitioners and researchers.

Core claim

The survey claims that the emergence of LLMs offers new pathways for processing and analyzing multifaceted financial data with increased efficiency and insight, and that a comprehensive overview of recent LLM developments, building on prior literature, can bridge the adoption gap by highlighting distinctive capabilities and outlining applications in the financial sector.

What carries the argument

A synthesis of insights from studies on novel LLM methodologies, examining their distinctive capabilities and potential relevance to financial data analysis.

If this is right

Researchers receive guidance on promising avenues for LLM applications in finance.
Practitioners obtain direction on future opportunities to advance LLM use in financial analysis.
The survey serves as a resource that can reduce underutilization of recent LLM techniques in the financial domain.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Testing specific LLM techniques from the overview on real financial datasets could reveal efficiency gains over conventional analysis tools.
The pattern of cautious adoption seen here may appear in other regulated data-heavy fields, pointing to a general role for targeted surveys.
Integrating LLMs with existing financial systems might enable analysis of combined text and chart data at scales not previously feasible.

Load-bearing premise

A synthesis of existing studies on LLM developments will effectively close the adoption gap between LLM research and finance industry practice.

What would settle it

A follow-up review or industry report showing no measurable increase in the exploration or implementation of the highlighted LLM techniques in financial applications within two to three years would challenge the bridging effect.

Original abstract

The rapid advancements in Large Language Models (LLMs) have unlocked transformative possibilities in natural language processing, particularly within the financial sector. Financial data is often embedded in intricate relationships across textual content, numerical tables, and visual charts, posing challenges that traditional methods struggle to address effectively. However, the emergence of LLMs offers new pathways for processing and analyzing this multifaceted data with increased efficiency and insight. Despite the fast pace of innovation in LLM research, there remains a significant gap in their practical adoption within the finance industry, where cautious integration and long-term validation are prioritized. This disparity has led to a slower implementation of emerging LLM techniques, despite their immense potential in financial applications. As a result, many of the latest advancements in LLM technology remain underexplored or not fully utilized in this domain. This survey seeks to bridge this gap by providing a comprehensive overview of recent developments in LLM research and examining their applicability to the financial sector. Building on previous survey literature, we highlight several novel LLM methodologies, exploring their distinctive capabilities and their potential relevance to financial data analysis. By synthesizing insights from a broad range of studies, this paper aims to serve as a valuable resource for researchers and practitioners, offering direction on promising research avenues and outlining future opportunities for advancing LLM applications in finance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a standard literature survey on LLMs in finance that organizes existing work but does not supply the validation step the abstract itself says the industry requires.

The paper is a survey that collects recent LLM papers and discusses their potential fit with financial data such as text, tables, and charts. It builds directly on earlier surveys and flags some newer methods plus open directions. That part is straightforward and could save time for someone who needs a map of the area rather than original experiments. Judging from the abstract's description, the synthesis is broad enough to cover the main threads without obvious gaps. Credit is due for keeping the scope practical and for noting that finance moves slower than general LLM research.

The main weakness is the framing. The abstract points out that finance demands cautious integration and long-term validation, then claims the survey bridges the adoption gap by synthesizing studies and outlining opportunities. A review does not perform or organize that validation; it describes what others have done. Without a mechanism that turns the overview into testable steps or industry-ready checks, the bridging claim rests on the premise that awareness alone changes practice, which the paper does not demonstrate. The provided text mentions no new data, no reproduced results, and no formal criteria for paper selection, so the coverage feels more illustrative than exhaustive.

This is the kind of paper that helps a reader who is already working in the intersection and wants pointers, or a practitioner who needs a quick entry point. It does not move the technical frontier or resolve the validation bottleneck it identifies. A serious editor could send it to referees as a survey, with the expectation that they will ask for tighter scope and explicit limits on what the synthesis can claim to achieve.

Referee Report

1 major / 0 minor

Summary. The manuscript is a survey paper that reviews recent developments in Large Language Models (LLMs) and explores their applicability to the financial sector. It identifies a gap between LLM research advancements and their adoption in finance, where cautious integration and long-term validation are emphasized, leading to slower implementation. The survey aims to bridge this gap by synthesizing insights from a broad range of studies, highlighting novel LLM methodologies, and outlining future opportunities for LLM applications in finance.

Significance. A well-executed survey could provide a useful resource for researchers and practitioners by consolidating knowledge on LLM applications in finance and suggesting research directions. However, the paper's claim to bridge the adoption gap is questionable because the method described—an overview and synthesis—does not address the long-term validation needs highlighted in the abstract itself.

major comments (1)

[Abstract] The abstract states that the survey seeks to bridge the gap by providing a comprehensive overview and synthesizing insights, but it also notes that finance prioritizes 'cautious integration and long-term validation', which causes slower adoption. No mechanism is described for supplying this validation, so the bridging claim rests on the unexamined assumption that an overview alone is sufficient to change industry practice.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed feedback on our survey manuscript. We address the major comment below and indicate the corresponding revision.

Referee: [Abstract] The abstract states that the survey seeks to bridge the gap by providing a comprehensive overview and synthesizing insights, but it also notes that finance prioritizes 'cautious integration and long-term validation', which causes slower adoption. No mechanism is described for supplying this validation, so the bridging claim rests on the unexamined assumption that an overview alone is sufficient to change industry practice.

Authors: We agree that the current abstract wording can be read as overstating what a survey can achieve. A review cannot itself supply the long-term empirical validation that the finance industry requires; that would necessitate primary studies. We will revise the abstract to state that the survey bridges the identified gap by synthesizing recent LLM developments, highlighting their potential relevance to financial data, and explicitly noting the need for subsequent validation work. This adjustment removes any implication that the overview alone drives industry adoption while preserving the manuscript's intended contribution as a consolidated resource.

revision: yes

Circularity Check

0 steps flagged

No circularity: survey paper contains no derivations, predictions, or load-bearing self-citations

The paper is explicitly a survey whose stated contribution is a synthesis of existing LLM research and its applicability to finance. No equations, fitted parameters, predictions, or first-principles derivations appear in the abstract or described structure. The bridging claim is presented as the purpose of the overview itself rather than a derived result that reduces to its inputs. Self-citations to prior surveys are standard and not invoked as uniqueness theorems or ansatzes that close the argument. The paper therefore carries no circularity burden under the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper; no free parameters, axioms, or invented entities are introduced.

Reference graph

Works this paper leans on

125 extracted references · 125 canonical work pages · 26 internal anchors

[1]

GPT-4 Technical Report

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Il ge Akkaya, Florencia Leoni Ale- man, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyama l Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 ,

work page internal anchor Pith review Pith/arXiv arXiv
[2]

Eval- uating correctness and faithfulness of instruction-follo wing models for question answering

V aibhav Adlakha, Parishad BehnamGhader, Xing Han Lu, Nicholas Meade, and Siva Reddy. Eval- uating correctness and faithfulness of instruction-follo wing models for question answering. arXiv preprint arXiv:2307.16877,

work page arXiv
[3]

Large language models for mathematical reasoning: Progresses and challenges

Janice Ahn, Rishu V erma, Renze Lou, Di Liu, Rui Zhang, and Wenpeng Yin. Large language models for mathematical reasoning: Progresses and challenges. arXiv preprint arXiv:2402.00157 ,

work page arXiv
[4]

Reproducing and exten ding experiments in behavioral strategy with large language models

Daniel Albert and Stephan Billinger. Reproducing and exten ding experiments in behavioral strategy with large language models. arXiv preprint arXiv:2410.06932 ,

work page arXiv
[5]

Domain adaption of named entity recognition to support credit risk assessment

Julio Cesar Salinas Alvarado, Karin V erspoor, and Timothy B aldwin. Domain adaption of named entity recognition to support credit risk assessment. In Proceedings of the Australasian Language T echnology Association W orkshop 2015, pp. 84–90,

work page 2015
[6]

FinBERT: Financial Sentiment Analysis with Pre-trained Language Models

D Araci. Finbert: Financial sentiment analysis with pre-tr ained language models. arXiv preprint arXiv:1908.10063,

work page internal anchor Pith review Pith/arXiv arXiv 1908
[7]

From numbers to words: Multi- modal bankruptcy prediction using the ecl dataset

Henri Arno, Klaas Mulier, Joke Baeck, and Thomas Demeester. From numbers to words: Multi- modal bankruptcy prediction using the ecl dataset. arXiv preprint arXiv:2401.12652 ,

work page arXiv
[8]

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau. Neural machine translation by jointly le arning to align and translate. arXiv preprint arXiv:1409.0473,

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Fine-tuning language models for predicting the imp act of events associated to ﬁnancial news articles

Neelabha Banerjee, Anubhav Sarkar, Swagata Chakraborty, S ohom Ghosh, and Sudip Kumar Naskar. Fine-tuning language models for predicting the imp act of events associated to ﬁnancial news articles. In Proceedings of the Joint W orkshop of the 7th Financial T echnology and Natural Language Processing, the 5th Knowledge Discovery from Unst ructured Data in F...

work page 2024
[10]

Fintral: A family of gpt-4 level multi- modal financial large language models

Gagan Bhatia, El Moatez Billah Nagoudi, Hasan Cavusoglu, an d Muhammad Abdul-Mageed. Fintral: A family of gpt-4 level multimodal ﬁnancial large l anguage models. arXiv preprint arXiv:2402.10986,

work page arXiv
[11]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jare d D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda As kell, et al. Language models are few-shot learners. Advances in neural information processing systems , 33:1877–1901,

work page 1901
[12]

Risklabs: Predicting ﬁna ncial risk using large language model based on multi-sources data

13 Y upeng Cao, Zhi Chen, Qingyun Pei, Fabrizio Dimino, LorenzoAusiello, Prashant Kumar, KP Sub- balakshmi, and Papa Momar Ndiaye. Risklabs: Predicting ﬁna ncial risk using large language model based on multi-sources data. arXiv preprint arXiv:2404.07452 , 2024a. Y upeng Cao, Zhiyuan Y ao, Zhi Chen, and Zhiyang Deng. Catmemo at the ﬁnllm challenge task: F...

work page arXiv 2003
[13]

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

Chi-Min Chan, Weize Chen, Y usheng Su, Jianxuan Y u, Wei Xue, S hanghang Zhang, Jie Fu, and Zhiyuan Liu. Chateval: Towards better llm-based evaluator s through multi-agent debate. arXiv preprint arXiv:2308.07201,

work page internal anchor Pith review Pith/arXiv arXiv
[14]

Fintextqa: A dataset for long-form ﬁnancial ques tion answering

Jian Chen, Peilin Zhou, Yining Hua, Yingxin Loh, Kehui Chen, Ziyuan Li, Bing Zhu, and Jun- wei Liang. Fintextqa: A dataset for long-form ﬁnancial ques tion answering. arXiv preprint arXiv:2405.09980, 2024a. Tianyu Chen, Yiming Zhang, Guoxin Y u, Dapeng Zhang, Li Zeng, Qing He, and Xiang Ao. Efsa: Towards event-level ﬁnancial sentiment analysis. arXiv prep...

work page arXiv
[15]

Convﬁnqa: Exploring the chain of numerical reasoning in con versational ﬁnance question an- swering

Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameen a Shah, and William Y ang Wang. Convﬁnqa: Exploring the chain of numerical reasoning in con versational ﬁnance question an- swering. arXiv preprint arXiv:2210.03849 ,

work page arXiv
[16]

C hatgpt informed graph neural network for stock movement prediction

Zihan Chen, Lei Nico Zheng, Cheng Lu, Jialu Y uan, and Di Zhu. C hatgpt informed graph neural network for stock movement prediction. arXiv preprint arXiv:2306.03763 , 2023b. Hongrong Cheng, Miao Zhang, and Javen Qinfeng Shi. A survey o n deep neural network pruning: Taxonomy, comparison, analysis, and recommendations. IEEE Transactions on Pattern Analysis...

work page arXiv
[17]

A closer look into using la rge language models for automatic evaluation

Cheng-Han Chiang and Hung-yi Lee. A closer look into using la rge language models for automatic evaluation. In Findings of the Association for Computational Linguistics : EMNLP 2023 , pp. 8928–8942,

work page 2023
[18]

Data-centric ﬁnancial large language models

Zhixuan Chu, Huaiyu Guo, Xinyuan Zhou, Yijia Wang, Fei Y u, Ho ng Chen, Wanqing Xu, Xin Lu, Qing Cui, Longfei Li, et al. Data-centric ﬁnancial large language models. arXiv preprint arXiv:2310.17784,

work page arXiv
[19]

Beyond demogra phics: aligning role-playing llm-based agents using human belief networks

14 Y un-Shiuan Chuang, Krirk Nirunwiroj, Zach Studdiford, Agam Goyal, Vincent V Frigo, Sijia Y ang, Dhavan Shah, Junjie Hu, and Timothy T Rogers. Beyond demogra phics: aligning role-playing llm-based agents using human belief networks. arXiv preprint arXiv:2406.17232 ,

work page arXiv
[20]

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

K Clark. Electra: Pre-training text encoders as discrimina tors rather than generators. arXiv preprint arXiv:2003.10555,

work page internal anchor Pith review Pith/arXiv arXiv 2003
[21]

Mathsensei: A tool- augmented large language model for mathematical reasoning

Debrup Das, Debopriyo Banerjee, Somak Aditya, and Ashish Ku lkarni. Mathsensei: A tool- augmented large language model for mathematical reasoning . arXiv preprint arXiv:2402.17231 ,

work page arXiv
[22]

Paciﬁc: towards proac- tive conversational question answering over tabular and te xtual data in ﬁnance

Y ang Deng, Wenqiang Lei, Wenxuan Zhang, Wai Lam, and Tat-Seng Chua. Paciﬁc: towards proac- tive conversational question answering over tabular and te xtual data in ﬁnance. arXiv preprint arXiv:2210.08817,

work page arXiv
[23]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin. Bert: Pre-training of deep bidirectional tra nsformers for language understanding. arXiv preprint arXiv:1810.04805 ,

work page internal anchor Pith review Pith/arXiv arXiv
[24]

Integrating stock features and global information via larg e language models for enhanced stock return prediction

Y ujie Ding, Shuai Jia, Tianyi Ma, Bingcheng Mao, Xiuze Zhou, Liuliu Li, and Dongming Han. Integrating stock features and global information via larg e language models for enhanced stock return prediction. arXiv preprint arXiv:2310.05627 ,

work page arXiv
[25]

Multiling 2019: Financial narrative summa risation

Mahmoud El-Haj. Multiling 2019: Financial narrative summa risation. In Proceedings of the W ork- shop MultiLing 2019: Summarization Across Languages, Genr es and Sources, pp. 6–10,

work page 2019
[26]

The ﬁnancial narrative summarisation shared task (fns 2020)

Mahmoud El-Haj, Marina Litvak, Nikiforos Pittaras, George Giannakopoulos, et al. The ﬁnancial narrative summarisation shared task (fns 2020). In Proceedings of the 1st Joint W orkshop on Financial Narrative Processing and MultiLing Financial Su mmarisation, pp. 1–12,

work page 2020
[27]

Can large language models beat wall street? unveiling the potential o f ai in stock selection

Georgios Fatouros, Konstantinos Metaxas, John Soldatos, a nd Dimosthenis Kyriazis. Can large language models beat wall street? unveiling the potential o f ai in stock selection. arXiv preprint arXiv:2401.03737,

work page arXiv
[28]

Empowering many, biasing a few: Generalist credit scoring through large language models

Duanyu Feng, Y ongfu Dai, Jimin Huang, Yifang Zhang, Qianqia n Xie, Weiguang Han, Zhengyu Chen, Alejandro Lopez-Lira, and Hao Wang. Empowering many, biasing a few: Generalist credit scoring through large language models. arXiv preprint arXiv:2310.00566 , 2023a. Duanyu Feng, Y ongfu Dai, Jimin Huang, Yifang Zhang, Qianqia n Xie, Weiguang Han, Zhengyu Chen,...

work page arXiv
[29]

N icer than humans: How do large lan- guage models behave in the prisoner’s dilemma? arXiv preprint arXiv:2406.13605 ,

Nicol´ o Fontana, Francesco Pierri, and Luca Maria Aiello. N icer than humans: How do large lan- guage models behave in the prisoner’s dilemma? arXiv preprint arXiv:2406.13605 ,

work page arXiv
[30]

Stream of search (sos): Learning to search in language

Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winso n Cheng, Archit Sharma, and Noah D Goodman. Stream of search (sos): Learning to search in language. arXiv preprint arXiv:2404.03683,

work page arXiv
[31]

Pal: Program-aided language models

Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yi ming Y ang, Jamie Callan, and Graham Neubig. Pal: Program-aided language models. In International Conference on Machine Learning, pp. 10764–10799. PMLR, 2023a. Shen Gao, Y untao Wen, Minghang Zhu, Jianing Wei, Y uhan Cheng , Qunzi Zhang, and Shuo Shang. Simulating ﬁnancial market via large lang...

work page arXiv
[32]

Retrieval-Augmented Generation for Large Language Models: A Survey

Y unfan Gao, Y un Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Y uxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang. Retrieval-augmented generation for large lan guage models: A survey. arXiv preprint arXiv:2312.10997, 2023b. Prashant Garg and Thiemo Fetzer. Causal claims in economics . Technical report, I4R Discussion Paper Series,

work page internal anchor Pith review Pith/arXiv arXiv
[33]

Multilingual and cross-lingual intent detection from spoken data

Daniela Gerz, Pei-Hao Su, Razvan Kusztos, Avishek Mondal, M ichał Lis, Eshan Singhal, Nikola Mrkˇ si´ c, Tsung-Hsien Wen, and Ivan Vuli´ c. Multilingual and cross-lingual intent detection from spoken data. arXiv preprint arXiv:2104.08524 ,

work page arXiv
[34]

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

Zhibin Gou, Zhihong Shao, Y eyun Gong, Y elong Shen, Y ujiu Y ang, Minlie Huang, Nan Duan, and Weizhu Chen. Tora: A tool-integrated reasoning agent for ma thematical problem solving. arXiv preprint arXiv:2309.17452,

work page internal anchor Pith review Pith/arXiv arXiv
[35]

Generating Sequences With Recurrent Neural Networks

Alex Graves. Generating sequences with recurrent neural ne tworks. arXiv preprint arXiv:1308.0850,

work page internal anchor Pith review Pith/arXiv arXiv
[36]

Econnli: Evaluating large language mode ls on economics reasoning

Y ue Guo and Yi Y ang. Econnli: Evaluating large language mode ls on economics reasoning. arXiv preprint arXiv:2407.01212,

work page arXiv
[37]

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. D eberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654 ,

work page internal anchor Pith review Pith/arXiv arXiv 2006
[38]

Tool documentation enabl es zero-shot tool-usage with large language models

Cheng-Y u Hsieh, Si-An Chen, Chun-Liang Li, Y asuhisa Fujii, Alexander Ratner, Chen-Y u Lee, Ranjay Krishna, and Tomas Pﬁster. Tool documentation enabl es zero-shot tool-usage with large language models. arXiv preprint arXiv:2308.00675 ,

work page arXiv
[39]

Chain-of- symbol prompting elicits planning in large langauge models

16 Hanxu Hu, Hongyuan Lu, Huajian Zhang, Y un-Ze Song, Wai Lam, a nd Y ue Zhang. Chain-of- symbol prompting elicits planning in large langauge models . arXiv preprint arXiv:2305.10276 ,

work page arXiv
[40]

Evaluating retrieval- augmented generation models for ﬁnancial report question a nd answering

Ivan Iaroshev, Ramalingam Pillai, Leandro V aglietti, and T homas Hanne. Evaluating retrieval- augmented generation models for ﬁnancial report question a nd answering. Applied Sciences (2076-3417), 14(20),

work page 2076
[41]

Large language model adaptation for ﬁnancial sentiment analysis

Pau Rodriguez Inserte, Mariam Nakhl´ e, Raheel Qader, Ga¨ etan Caillaut, and Jingshu Liu. Large language model adaptation for ﬁnancial sentiment analysis . arXiv preprint arXiv:2401.14777 ,

work page arXiv
[42]

FinanceBench: A New Benchmark for Financial Question Answering

Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, N ino Scherrer, and Bertie Vid- gen. Financebench: A new benchmark for ﬁnancial question an swering. arXiv preprint arXiv:2311.11944,

work page internal anchor Pith review Pith/arXiv arXiv
[43]

OpenAI o1 System Card

Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, et al. O penai o1 system card. arXiv preprint arXiv:2412.16720,

work page internal anchor Pith review Pith/arXiv arXiv
[44]

Active retrieval augmented generation,

Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, J ane Dwivedi-Y u, Yiming Y ang, Jamie Callan, and Graham Neubig. Active retrieval augmente d generation. arXiv preprint arXiv:2305.06983,

work page arXiv
[45]

Multiﬁn: A dataset for multilingual ﬁnancial nlp

Rasmus Jørgensen, Oliver Brandt, Mareike Hartmann, Xiang D ai, Christian Igel, and Desmond Elliott. Multiﬁn: A dataset for multilingual ﬁnancial nlp. In Findings of the Association for Computational Linguistics: EACL 2023 , pp. 894–909,

work page 2023
[46]

From transcripts to insights: Uncovering corpo- rate risks using generative ai

Alex Kim, Maximilian Muhn, and V aleri Nikolaev. From transcripts to insights: Uncovering corpo- rate risks using generative ai. arXiv preprint arXiv:2310.17721 ,

work page arXiv
[47]

Financial statement analysis with large language models

Alex Kim, Maximilian Muhn, and V aleri Nikolaev. Financial statement analysis with large language models. arXiv preprint arXiv:2407.17866 , 2024a. Alex Kim, Maximilian Muhn, and V aleri V Nikolaev. Bloated disclosures: can chatgpt help investors process information? Chicago Booth Research Paper, (23-07):2023–59, 2024b. Alex Gunwoo Kim and Sangwon Y oon. C...

work page arXiv 2023
[48]

Leveraging s emi-supervised learning on a ﬁnancial-specialized pre-trained language model for mult ilingual esg impact duration and type classiﬁcation

Jungdae Kim, Eunkwang Jeon, and Jeon Sang Hyun. Leveraging s emi-supervised learning on a ﬁnancial-specialized pre-trained language model for mult ilingual esg impact duration and type classiﬁcation. In Proceedings of the Joint W orkshop of the 7th Financial T echnology and Natural Language Processing, the 5th Knowledge Discovery from Unst ructured Data i...

work page 2024
[49]

Can ai with high reasoning ability replicate human-like dec ision making in economic experi- ments? arXiv preprint arXiv:2406.11426 ,

A yato Kitadai, Sinndy Dayana Rico Lugo, Y udai Tsurusaki, Y usuke Fukasawa, and Nariaki Nishino. Can ai with high reasoning ability replicate human-like dec ision making in economic experi- ments? arXiv preprint arXiv:2406.11426 ,

work page arXiv
[50]

Lea rning to generate explainable stock predictions using self-reﬂective large language mod els

Kelvin JL Koa, Y unshan Ma, Ritchie Ng, and Tat-Seng Chua. Lea rning to generate explainable stock predictions using self-reﬂective large language mod els. In Proceedings of the ACM on W eb Conference 2024, pp. 4304–4315,

work page 2024
[51]

Causal inference for banking ﬁnance and insurance a survey

Satyam Kumar, Y elleti Vivek, V adlamani Ravi, and Indranil B ose. Causal inference for banking ﬁnance and insurance a survey. arXiv preprint arXiv:2307.16427 ,

work page arXiv
[52]

Sec-qa: A systematic evaluation corpus for ﬁnancial qa

18 Viet Dac Lai, Michael Krumdick, Charles Lovering, V arshiniReddy, Craig Schmidt, and Chris Tan- ner. Sec-qa: A systematic evaluation corpus for ﬁnancial qa . arXiv preprint arXiv:2406.14394 ,

work page arXiv
[53]

Esg2preem: Au- tomated esg grade assessment framework using pre-trained e nsemble models

Haein Lee, Seon Hong Lee, Heungju Park, Jang Hyun Kim, and Hae Sun Jung. Esg2preem: Au- tomated esg grade assessment framework using pre-trained e nsemble models. Heliyon, 10(4), 2024a. Hanwool Lee, Jonghyun Choi, Sohyeon Kwon, and Sungbum Jung. Easyguide: Esg issue iden- tiﬁcation framework leveraging abilities of generative la rge language models. arXiv...

work page arXiv
[54]

Cfgpt: Chinese financial assistant with large lan- guage model

Jiangtong Li, Y uxuan Bian, Guoxuan Wang, Y ang Lei, Dawei Che ng, Zhijun Ding, and Changjun Jiang. Cfgpt: Chinese ﬁnancial assistant with large langua ge model. arXiv preprint arXiv:2309.10654, 2023a. Lezhi Li, Ting-Y u Chang, and Hai Wang. Multimodal gen-ai for fundamental investment research. arXiv preprint arXiv:2401.06164 , 2023b. Nian Li, Chen Gao, ...

work page arXiv
[55]

Are chatgpt and gpt-4 general-purpose solvers for ﬁna ncial text analytics? a study on several typical tasks

Xianzhi Li, Samuel Chan, Xiaodan Zhu, Y ulong Pei, Zhiqiang M a, Xiaomo Liu, and Sameena Shah. Are chatgpt and gpt-4 general-purpose solvers for ﬁna ncial text analytics? a study on several typical tasks. arXiv preprint arXiv:2305.05862 , 2023c. Xin Li, Y unfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, De- qiang Jiang, and Xing...

work page arXiv
[56]

Let's Verify Step by Step

Hunter Lightman, Vineet Kosaraju, Y ura Burda, Harri Edward s, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step. arXiv preprint arXiv:2305.20050,

work page internal anchor Pith review Pith/arXiv arXiv
[57]

Tab-cqa: A tabular co nversational question answering dataset on ﬁnancial reports

Chuang Liu, Junzhuo Li, and Deyi Xiong. Tab-cqa: A tabular co nversational question answering dataset on ﬁnancial reports. In Proceedings of the 61st Annual Meeting of the Association fo r Computational Linguistics (V olume 5: Industry Track), pp. 196–207, 2023a. Pengfei Liu, Weizhe Y uan, Jinlan Fu, Zhengbao Jiang, Hiroak i Hayashi, and Graham Neubig. Pr...

work page arXiv 1907
[58]

Confronting m achine learning with ﬁnancial research

Kristof Lommers, Ouns El Harzli, and Jack Kim. Confronting m achine learning with ﬁnancial research. arXiv preprint arXiv:2103.00366 ,

work page arXiv
[59]

Can chatgpt forecast stock price movements? return pre- dictability and large language models

Alejandro Lopez-Lira and Y uehua Tang. Can chatgpt forecast stock price movements? return pre- dictability and large language models. arXiv preprint arXiv:2304.07619 ,

work page arXiv
[60]

Finer: Finan cial numeric entity recognition for xbrl tagging

Lefteris Loukas, Manos Fergadiotis, Ilias Chalkidis, Eiri ni Spyropoulou, Prodromos Malakasiotis, Ion Androutsopoulos, and Georgios Paliouras. Finer: Finan cial numeric entity recognition for xbrl tagging. arXiv preprint arXiv:2203.06482 ,

work page arXiv
[61]

Blending is all you need: Cheaper, better alterna- tive to trillion-parameters llm.arXiv preprint arXiv:2401.02994,

Xiaoding Lu, Adian Liusie, Vyas Raina, Y uwen Zhang, and Will iam Beauchamp. Blending is all you need: Cheaper, better alternative to trillion-paramet ers llm. arXiv preprint arXiv:2401.02994,

work page arXiv
[62]

Stockgpt: A genai model for stock prediction and tra ding

Dat Mai. Stockgpt: A genai model for stock prediction and tra ding. arXiv preprint arXiv:2404.05101,

work page arXiv
[63]

Champ: A competition-level dataset for fine-grained analyses of llms’ mathematical reasoning capabilities

Yujun Mao, Yoon Kim, and Yilun Zhou. Champ: A competition-level dataset for fine-grained analyses of llms’ mathematical reasoning capabilities. arXiv preprint arXiv:2401.06961,

[64]

Financial document causality detection shared task (fincausal 2020)

Dominique Mariko, Hanna Abi Akl, Estelle Labidurie, Stephane Durfort, Hugues De Mazancourt, and Mahmoud El-Haj. Financial document causality detection shared task (fincausal 2020). arXiv preprint arXiv:2012.02505,

[65]

UniChart: A universal vision-language pretrained model for chart comprehension and reasoning

Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, Enamul Hoque, and Shafiq Joty. Unichart: A universal vision-language pretrained model for chart comprehension and reasoning. arXiv preprint arXiv:2305.14761,

[66]

Large Language Models: A Survey

Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, and Jianfeng Gao. Large language models: A survey. arXiv preprint arXiv:2402.06196,

[67]

Fine-tuning gemma-7b for enhanced sentiment analysis of financial news headlines

Kangtong Mo, Wenyan Liu, Xuanzhen Xu, Chang Yu, Yuelin Zou, and Fangqing Xia. Fine-tuning gemma-7b for enhanced sentiment analysis of financial news headlines. arXiv preprint arXiv:2406.13626,

[68]

Ectsum: A new benchmark dataset for bullet point summarization of long earnings call transcripts

Rajdeep Mukherjee, Abhinav Bohra, Akash Banerjee, Soumya Sharma, Manjunath Hegde, Afreen Shaikh, Shivani Shrivastava, Koustuv Dasgupta, Niloy Ganguly, Saptarshi Ghosh, et al. Ectsum: A new benchmark dataset for bullet point summarization of long earnings call transcripts. arXiv preprint arXiv:2210.12467,

[69]

A survey of large language models for financial applications: Progress, prospects and challenges

Yuqi Nie, Yaxuan Kong, Xiaowen Dong, John M Mulvey, H Vincent Poor, Qingsong Wen, and Stefan Zohren. A survey of large language models for financial applications: Progress, prospects and challenges. arXiv preprint arXiv:2406.11903,

[70]

Multimodal chart retrieval: A comparison of text, table and image based approaches

Averi Nowak, Francesco Piccinno, and Yasemin Altun. Multimodal chart retrieval: A comparison of text, table and image based approaches. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 5488–5505,

[71]

Comparative analysis of chatgpt and the evolution of language models

Oluwatosin Ogundare and Gustavo Quiros Araya. Comparative analysis of chatgpt and the evolution of language models. arXiv preprint arXiv:2304.02468,

[72]

Deep learning vs. traditional computer vision

Niall O’Mahony, Sean Campbell, Anderson Carvalho, Suman Harapanahalli, Gustavo Velasco Hernandez, Lenka Krpalkova, Daniel Riordan, and Joseph Walsh. Deep learning vs. traditional computer vision. In Advances in Computer Vision: Proceedings of the 2019 Computer Vision Conference (CVC), Volume 1 1, pp. 128–144. Springer,

[73]

ART: Automatic multi-step reasoning and tool-use for large language models

Bhargavi Paranjape, Scott Lundberg, Sameer Singh, Hannaneh Hajishirzi, Luke Zettlemoyer, and Marco Tulio Ribeiro. Art: Automatic multi-step reasoning and tool-use for large language models. arXiv preprint arXiv:2303.09014,

[74]

Talm: Tool augmented language models

Aaron Parisi, Yao Zhao, and Noah Fiedel. Talm: Tool augmented language models. arXiv preprint arXiv:2205.12255,

[75]

Leveraging bert language models for multi-lingual esg issue identification

URL https://arxiv.org/abs/2305.07970. Elvys Linhares Pontes, Mohamed Benjannet, and Lam Kim Ming. Leveraging bert language models for multi-lingual esg issue identification. arXiv preprint arXiv:2309.02189,

[76]

Reasoning with language model prompting: A survey

Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, and Huajun Chen. Reasoning with language model prompting: A survey. arXiv preprint arXiv:2212.09597,

[77]

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. Toolllm: Facilitating large language models to master 16000+ real-world apis. arXiv preprint arXiv:2307.16789,

[78]

SQuAD: 100,000+ Questions for Machine Comprehension of Text

P Rajpurkar. Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250,

[79]

Docfinqa: A long-context financial reasoning dataset

Varshini Reddy, Rik Koncel-Kedziorski, Viet Dac Lai, and Chris Tanner. Docfinqa: A long-context financial reasoning dataset. arXiv preprint arXiv:2401.06915,

[80]

Llm economicus? mapping the behavioral biases of llms via utility theory

Jillian Ross, Yoon Kim, and Andrew W Lo. Llm economicus? mapping the behavioral biases of llms via utility theory. arXiv preprint arXiv:2408.02784,
