LLM-Based Financial Sentiment Analysis in Arabic: Evidence from Saudi Markets

arxiv: 2605.19714 · v1 · pith:PAYYMIOV · submitted 2026-05-19 · 💻 cs.CL

Mona H. Albaqawi, Eman M. Albalkhi, Joud A. Albaiti, Enrico Lopedoto

Pith reviewed 2026-05-20 05:24 UTC · model grok-4.3

classification 💻 cs.CL

keywords: Arabic NLP · financial sentiment analysis · Saudi stock market · entity linking · sentiment annotation · investor sentiment · social media analysis · news corpus

The pith

A multi-stage pipeline builds an 84,000-sample Arabic financial sentiment dataset supporting company-level analysis on the Saudi Exchange.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an Arabic NLP framework for financial sentiment analysis tailored to Saudi markets by combining official financial news and social media data. It uses a multi-stage process of collection, cleaning, deduplication, entity linking via transformer NER and lexicon, and five-class sentiment annotation to create a large corpus. The resulting 84K samples enable aggregation of sentiment at the company level and examination of how sentiment relates to stock market behavior. Experimental results indicate that this approach delivers reliable and scalable sentiment analysis in Arabic financial contexts.
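
The deduplication stage named above is not specified in the text available here. A minimal sketch of one common approach, exact-match deduplication on a whitespace-normalized fingerprint, is shown below; the function names, the `text` field, and the choice of SHA-256 are assumptions for illustration, not the authors' method.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash a text after collapsing whitespace and lowercasing (assumed normalization)."""
    collapsed = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record for each fingerprint, preserving input order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        key = fingerprint(record["text"])  # "text" is a hypothetical field name
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

This only removes exact duplicates up to whitespace and case; near-duplicate reposts, which are common on social media, would need a fuzzier technique such as shingling.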

Core claim

By integrating official financial news and social media through a multi-stage pipeline of data collection, cleaning, deduplication, entity linking with transformer-based NER plus a curated company lexicon, and five-class sentiment annotation, the authors construct a dataset of 84K samples that supports company-level sentiment aggregation and analysis of sentiment dynamics relative to stock market behavior on the Saudi Exchange, with experiments demonstrating reliable and scalable Arabic financial sentiment analysis.

What carries the argument

The multi-stage pipeline for Arabic financial corpus construction, with transformer-based NER for entity linking to canonical company identifiers combined with five-class sentiment labeling.
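
To make the lexicon half of the entity-linking step concrete, here is a minimal sketch that maps normalized Arabic surface forms to canonical identifiers. The lexicon entries, the identifiers, and the normalization rules (stripping diacritics and tatweel, unifying alef variants) are illustrative assumptions; the transformer NER component the paper pairs with the lexicon is not reproduced here.

```python
import re
import unicodedata

# Hypothetical lexicon: surface form -> canonical company identifier.
# Entries and identifiers are illustrative only.
COMPANY_LEXICON = {
    "أرامكو": "2222",
    "ارامكو السعودية": "2222",
    "سابك": "2010",
    "مصرف الراجحي": "1120",
}

_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")  # harakat and tatweel
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza


def normalize(text: str) -> str:
    """Apply a light, assumed Arabic normalization before matching."""
    text = unicodedata.normalize("NFKC", text)
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)  # map to bare alef
    return re.sub(r"\s+", " ", text).strip()


NORMALIZED_LEXICON = {normalize(k): v for k, v in COMPANY_LEXICON.items()}


def link_entities(text: str) -> set[str]:
    """Return identifiers of every lexicon surface form found in the text."""
    norm = normalize(text)
    return {cid for surface, cid in NORMALIZED_LEXICON.items() if surface in norm}
```

Plain substring matching will over-match short names that occur inside longer words, which is one reason a pipeline would combine a lexicon with NER spans rather than rely on the lexicon alone.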

If this is right

Sentiment aggregation becomes possible at the level of individual companies listed on the Saudi Exchange.
Sentiment dynamics can be tracked over time in relation to actual stock market movements.
The framework provides a scalable method for financial sentiment analysis in Arabic without relying solely on English resources.
Both institutional investor sentiment from news and public sentiment from social media can be captured and compared.
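
Company-level aggregation can be sketched as follows, assuming the five classes map onto an ordinal scale from -2 to 2. The label names and the score mapping are assumptions; the page does not state the paper's class names or its aggregation rule.

```python
from collections import defaultdict
from statistics import mean

# Assumed label names and ordinal scores; not taken from the paper.
LABEL_SCORES = {
    "very_negative": -2,
    "negative": -1,
    "neutral": 0,
    "positive": 1,
    "very_positive": 2,
}


def aggregate_daily(samples):
    """Average sentiment score per (company_id, day).

    `samples` is an iterable of (company_id, day, label) tuples.
    """
    buckets = defaultdict(list)
    for company_id, day, label in samples:
        buckets[(company_id, day)].append(LABEL_SCORES[label])
    return {key: mean(scores) for key, scores in buckets.items()}
```

The resulting per-day series is the kind of signal that could then be compared against price or return series for the same company.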

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such a dataset could enable the development of Arabic-specific predictive models for stock price movements based on sentiment signals.
Similar pipelines could be applied to other Arabic financial markets to build comparable resources.
The work underscores the importance of language-specific entity linking and annotation for accurate sentiment in financial texts.

Load-bearing premise

The multi-stage pipeline including automated entity linking and sentiment annotation produces labels that truly represent investor sentiment in Arabic financial texts.

What would settle it

If a random sample of the dataset is manually labeled by Arabic-speaking financial experts and shows substantial disagreement with the automated five-class labels, that would undermine the claim of reliable analysis.
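
One way to quantify such disagreement is Cohen's kappa between the expert labels and the automated labels on the same sample. The sketch below computes it from scratch; the function name and input format are assumptions, and any threshold for "substantial disagreement" would have to be chosen by the evaluators.

```python
from collections import Counter


def cohens_kappa(expert: list[str], automated: list[str]) -> float:
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(expert) != len(automated) or not expert:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(expert)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(e == a for e, a in zip(expert, automated)) / n
    # Expected agreement if both raters labeled independently at their own rates.
    expert_counts = Counter(expert)
    auto_counts = Counter(automated)
    expected = sum(expert_counts[c] * auto_counts[c] for c in expert_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

For example, with expert labels `pos, pos, neg, neg` and automated labels `pos, neg, neg, neg`, observed agreement is 0.75, expected agreement is 0.5, and kappa is 0.5.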

Figures

Figures reproduced from arXiv: 2605.19714 by Eman M. Albalkhi, Enrico Lopedoto, Joud A. Albaiti, Mona H. Albaqawi.

**Figure 3.** Hallucination distributions across datasets: News (left) and Social Media (right). [5]. Given the dataset scale (84K samples), manual annotation was impractical. To mitigate bias and enhance reliability, a multi-stage automated labeling framework was employed, progressively refining label quality through model comparison and agreement analysis. The final labeling strategy relied on multiple high-capacit…

**Figure 4.** Correlation matrix of sentiment outputs across evaluated models on the News dataset, highlighting inter-model agreement patterns.
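
A matrix of this kind can be computed by scoring each model's outputs numerically and taking pairwise Pearson correlations. The sketch below assumes each model's labels have already been mapped to numbers; the model names and the data layout are hypothetical, and the paper's exact correlation measure is not stated on this page.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(model_scores: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pairwise correlations keyed by (model_a, model_b)."""
    names = list(model_scores)
    return {
        (a, b): pearson(model_scores[a], model_scores[b])
        for a in names
        for b in names
    }
```

A model that collapses every input to one class has zero variance, so its row is undefined rather than zero; the sketch raises in that case instead of silently returning a number.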

**Figure 5.** The multi-stage consensus labeling process adopted in this study. From §6.1 (Benchmark Results): models are evaluated using Accuracy and Macro-F1 for class-balanced robustness.
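
The details of the consensus process are not visible on this page. As a rough illustration of what a consensus rule can look like, the sketch below takes one label per model and accepts the majority label only when it is unambiguous; the function name, the `min_agreement` parameter, and the tie handling are all assumptions rather than the authors' procedure.

```python
from collections import Counter
from typing import Optional


def consensus_label(votes: list[str], min_agreement: int = 2) -> Optional[str]:
    """Return the majority label, or None when there is a tie or too little agreement."""
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if tied or top_count < min_agreement:
        return None  # leave the sample for a later stage or for manual review
    return top_label
```

Samples that return `None` are the natural input to a further stage, for example re-labeling by a stronger model or review by a human annotator.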

**Figure 6.** Cost–quality trade-off across the evaluated models, using Macro-F1 as a class-balanced performance metric. Statistical significance testing: paired t-tests confirm performance differences are statistically significant: GPT-5’s Macro-F1 (0.829) exceeds DeepSeek R1 Reasoner (0.739) at p < 0.01 (t = 3.47), and DeepSeek R1 Reasoner outperforms DeepSeek R1 Chat (0.360) at p < 0.001 (t = 8.92), conf…
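
For readers unfamiliar with the metric, Macro-F1 is the unweighted mean of per-class F1 scores, so rare classes count as much as frequent ones. The sketch below is a from-scratch illustration and does not reproduce the figures quoted in the caption; scoring an absent class as 0.0 is a convention chosen here.

```python
def macro_f1(y_true: list[str], y_pred: list[str], labels: list[str]) -> float:
    """Unweighted mean of per-class F1 over the given labels."""
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no support and no predictions scores 0.0.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)
```

With true labels `pos, pos, neg, neu` and predictions `pos, neg, neg, neu`, the per-class F1 values are 2/3, 2/3, and 1, giving a Macro-F1 of 7/9, while plain accuracy is 3/4.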

Original abstract

Investor sentiment shapes financial markets, yet modeling sentiment in Arabic financial contexts remains challenging due to linguistic complexity and limited resources. We present an Arabic NLP framework for large-scale financial sentiment analysis tailored to the Saudi market, integrating official financial news and social media to capture institutional and public investor sentiment. The framework constructs a large Arabic financial corpus through a multi-stage pipeline encompassing data collection, cleaning, deduplication, entity linking, and sentiment annotation. Transformer-based NER combined with a curated company lexicon links textual mentions to canonical company identifiers, with sentiment labels assigned using a five-class scheme. The resulting dataset of 84K samples supports company-level sentiment aggregation and analysis of sentiment dynamics relative to stock market behavior on the Saudi Exchange. Experimental results demonstrate reliable and scalable Arabic financial sentiment analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

They built a new 84K Arabic financial sentiment corpus for Saudi companies and markets, but the reliability claims rest on an unvalidated labeling pipeline.

The main takeaway is that this paper delivers a concrete new resource: an 84K-sample Arabic dataset drawn from financial news and social media, linked to Saudi companies via transformer NER plus a lexicon, and labeled with a five-class sentiment scheme that is then aggregated to track stock behavior on the Saudi Exchange. That fills a visible gap in non-English financial NLP and gives researchers something specific to work with for this regional market.

The pipeline description itself is straightforward and practical, covering collection, cleaning, deduplication, entity linking, and annotation in one flow. The choice to mix official and public sources to capture both institutional and retail sentiment also makes sense for the domain. What is actually new is the tailored corpus and the entity-linking step tuned to Saudi firms rather than a general Arabic financial model. The paper does a reasonable job laying out the data-construction steps and showing how the labels can support company-level analysis.

The soft spot is exactly the one the stress test flags. The abstract calls the results reliable and scalable, yet the provided text gives no accuracy numbers, inter-annotator agreement, expert validation, or baseline comparisons for the five-class sentiment step. Without those checks, the downstream aggregation and market-dynamics claims sit on unverified label quality. If the full paper contains held-out evaluations or human review details, they are not visible here and need to be made explicit.

This work is aimed at financial NLP researchers who need Arabic or Middle-East market resources and at practitioners who might use the corpus for monitoring. A reader building similar datasets could extract useful pipeline ideas even if the validation is missing. It deserves a serious referee because the resource addresses a real scarcity, but the review should focus on adding quantitative evidence for label quality and simple baselines before any stronger claims are accepted.

Referee Report

2 major / 1 minor

Summary. The paper presents an Arabic NLP framework for large-scale financial sentiment analysis tailored to Saudi markets. It describes a multi-stage pipeline for constructing an 84K-sample dataset from official financial news and social media, using transformer-based NER combined with a company lexicon for entity linking and a five-class scheme for sentiment annotation. The resulting dataset is positioned to enable company-level sentiment aggregation and analysis of sentiment dynamics relative to stock market behavior on the Saudi Exchange, with experimental results claimed to demonstrate reliable and scalable Arabic financial sentiment analysis.

Significance. If the label quality were demonstrated, the work would offer a substantial empirical contribution by filling a resource gap in Arabic financial NLP and enabling new analyses of investor sentiment in an emerging market. The scale of the 84K dataset and the integration of institutional and public sources represent a clear strength in data-construction efforts.

major comments (2)

[Abstract and §3 (Methodology)] The central claim that the 84K-sample dataset 'supports company-level sentiment aggregation and analysis of sentiment dynamics' and yields 'reliable' results rests on the unverified accuracy of the five-class sentiment annotation step. No accuracy, F1-score, inter-annotator agreement, or expert validation metrics are reported for this component, leaving the downstream aggregation and correlation analyses without grounding.
[§4 (Experiments)] The assertion of 'reliable and scalable' performance is stated without any baseline comparisons, error analysis, or quantitative evaluation of the full pipeline on held-out data, which is load-bearing for the claim that the framework advances Arabic financial sentiment analysis.

minor comments (1)

[§3] The description of the five-class sentiment scheme would benefit from an explicit definition or example labels in the text or a table.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their thorough review and constructive feedback on our manuscript. We address each major comment below and outline the revisions we plan to make to strengthen the presentation of our work.

Referee: [Abstract and §3 (Methodology)] The central claim that the 84K-sample dataset 'supports company-level sentiment aggregation and analysis of sentiment dynamics' and yields 'reliable' results rests on the unverified accuracy of the five-class sentiment annotation step. No accuracy, F1-score, inter-annotator agreement, or expert validation metrics are reported for this component, leaving the downstream aggregation and correlation analyses without grounding.

Authors: We agree that quantitative validation of the sentiment annotation step is necessary to ground the downstream claims. The manuscript describes the five-class scheme and its integration into the pipeline but does not include accuracy, F1, or inter-annotator agreement figures. In the revised version we will add a dedicated subsection reporting inter-annotator agreement computed on a stratified sample of annotations, together with expert validation results on a held-out subset, thereby providing the required empirical support for the company-level aggregation analyses.
revision: yes

Referee: [§4 (Experiments)] The assertion of 'reliable and scalable' performance is stated without any baseline comparisons, error analysis, or quantitative evaluation of the full pipeline on held-out data, which is load-bearing for the claim that the framework advances Arabic financial sentiment analysis.

Authors: The current §4 presents the results of applying the pipeline at scale and initial sentiment-market correlations, yet we acknowledge the absence of explicit baselines, error analysis, and held-out quantitative evaluation. We will revise the section to include (i) comparisons against existing Arabic sentiment baselines, (ii) a detailed error analysis of the full pipeline, and (iii) performance metrics on a held-out test partition, thereby more rigorously substantiating the claims of reliability and scalability.
revision: yes
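
The stratified sample promised in the first response could be drawn along the following lines. This is a sketch under assumptions: the `label` field, the fixed per-class quota, and the seeded generator are illustrative choices, not a procedure described by the authors.

```python
import random
from collections import defaultdict


def stratified_sample(records: list[dict], per_label: int, seed: int = 0) -> list[dict]:
    """Draw up to `per_label` records from each sentiment class, reproducibly."""
    rng = random.Random(seed)  # local generator so the draw is repeatable
    by_label = defaultdict(list)
    for record in records:
        by_label[record["label"]].append(record)  # "label" is a hypothetical field
    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        sample.extend(rng.sample(group, min(per_label, len(group))))
    return sample
```

Sampling an equal quota per class gives rare classes enough items to estimate agreement on them, at the cost of no longer mirroring the corpus label distribution.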

Circularity Check

0 steps flagged

No circularity: empirical dataset construction without derivational reduction

The paper presents an empirical multi-stage pipeline for data collection, cleaning, entity linking via transformer NER plus lexicon, and five-class sentiment annotation to produce an 84K-sample Arabic financial corpus. No equations, mathematical derivations, fitted parameters, or predictions are described that could reduce to inputs by construction. Claims about supporting company-level aggregation and sentiment dynamics analysis rest on the pipeline output and experimental results rather than any self-referential loop or self-citation load-bearing premise. This is self-contained empirical work with no load-bearing steps that match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the full ledger cannot be audited. The approach implicitly relies on standard NLP assumptions about data representativeness and model accuracy for Arabic text.

axioms (2)

domain assumption: Transformer-based NER combined with a curated lexicon can reliably link Arabic textual mentions to canonical company identifiers. Invoked in the entity-linking stage of the pipeline.
domain assumption: Five-class sentiment annotation on the collected corpus accurately reflects investor sentiment. Central to labeling and downstream aggregation.

pith-pipeline@v0.9.0 · 5668 in / 1334 out tokens · 59995 ms · 2026-05-20T05:24:34.516021+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: multi-stage pipeline encompassing data collection, cleaning, deduplication, entity linking, and sentiment annotation... five-class scheme

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Transformer-based NER combined with a curated company lexicon

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 1 internal anchor

[1] Ahmed Abdelali, Maram Hasanain, Hamdy Mubarak, Laura Kallmeyer, Hassan Sajjad, Fahim Dalvi, et al. ALLaM: Large language models for Arabic and English. arXiv preprint arXiv:2407.15390, 2024. SDAIA Arabic foundation model.

[2] Hero O. Ahmad and Shahla U. Umar. Sentiment analysis of financial textual data using machine learning and deep learning models. Informatica, 47(5):153–158, 2023.

[3] Aisha Alansari and Hamzah Luqman. Arahallueval: A fine-grained hallucination evaluation framework for Arabic LLMs. arXiv preprint, 2025.

[4] Saad M Alshahrani, Said A Salloum, and Khaled Shaalan. Borsah: A disruptive framework for the stock market predictions. International Journal of Information Management, 41:117–129, 2018.

[5] Martins Amola. Sentiment analysis in financial news: Enhancing predictive models for stock market behavior. Preprint, 2025. Available at ResearchGate.

[6] Wissam Antoun, Fady Baly, and Hazem Hajj. AraBERT: Transformer-based model for Arabic language understanding. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT), pages 9–15. European Language Resources Association (ELRA), 2020.

[7] Dogu Tan Araci. Finbert: Financial sentiment analysis with pre-trained language models. arXiv preprint, 2019.

[8] Gilbert Badaro, Ramy Baly, Rana Akel, Linda Fayad, Jeffrey Khairallah, Hazem Hajj, Khaled Shaban, and Wassim El-Hajj. A light lexicon-based mobile application for sentiment mining of Arabic tweets. In Nizar Habash, Stephan Vogel, and Kareem Darwish, editors, Proceedings of the Second Workshop on Arabic Natural Language Processing, pages 18–25, Beiji...

[9] Association for Computational Linguistics
[10] Nicholas Barberis, Andrei Shleifer, and Robert Vishny. A model of investor sentiment. Journal of Financial Economics, 49(3):307–343, 1997.

[11] Savita Bhat and Vasudeva Varma. Large language models as annotators: A preliminary evaluation for annotating low-resource language content. In Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems. Association for Computational Linguistics, 2023.

[12] Kelvin Du, Frank Xing, Rui Mao, and Erik Cambria. Financial sentiment analysis: Techniques and applications. ACM Computing Surveys, 56(9):220, 2024.

[13] Ismail El Bazi and Nabil Laachfoubi. Arabic named entity recognition using deep learning approach. International Journal of Electrical and Computer Engineering, 9(3):2025–2032, 2019.

[14] Huang Huang, Fei Zhu, Jianfeng Qin, Yulei Tang, Xuebai Lin, Guo Liu, and Wei Wang. AceGPT: Localizing large language models in Arabic. arXiv preprint arXiv:2309.12053,

[15] Arabic-specialized instruction-tuned model

[16] Go Inoue, Bashar Alhafni, Nurpeiis Baimukan, Houda Bouamor, and Nizar Habash. The interplay of variant, size, and task type in Arabic pre-trained language models. In Proceedings of the Sixth Arabic Natural Language Processing Workshop (WANLP), pages 92–104. Association for Computational Linguistics, 2021. CAMeL-BERT model family.

[17] Haitao Li, Qian Dong, Junjie Chen, Huixue Su, Yujia Zhou, Qingyao Ai, Ziyi Ye, and Yiqun Liu. LLMs-as-judges: A comprehensive survey on LLM-based evaluation methods, 2024.

[18] Neha Sengupta, Sunil Kumar Sharma, Muhammed Masoud, Abbas Akkasi, Karthik Kamur, Shivani Bhatia, Ebtesam Almazrouei, et al. Jais and Jais-chat: Arabic-centric foundation and instruction-tuned open generative large language models. arXiv preprint arXiv:2308.16149, 2023. 13B parameter Arabic-centric LLM from Inception/G42.
[19] Sahar Sohangir, Dingding Wang, Anna Pomeranets, and Taghi M Khoshgoftaar. Big data: Deep learning for financial sentiment analysis. Journal of Big Data, 5(1):1–25, 2018.

[20] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain-of-thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2023.

[21] Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. Bloomberggpt: A large language model for finance. arXiv preprint, 2023.

A. Reproducibility

A.1. Model Configurations

All models were configured with deterministic sampling (temperature = 0.0) to ensure reproducibil...

[22] were not evaluated due to API availability constraints during the evaluation period.

A.2. Production Deployment Requirements

Beyond benchmark metrics, models must satisfy the following requirements for production integration:

[23] Taxonomy Compliance: Output exactly five sentiment classes without category collapse

[24] Structured Output: Return JSON format with sentiment labels and confidence scores

[25] Reproducibility: Generate identical predictions with deterministic sampling (temperature = 0)

[26] Latency: Complete inference within 5 minutes per 1,000 samples

[27] the stock is experiencing technical correction

Cost Efficiency: Maintain inference cost below $0.0012 per sample

A.3. Dataset Availability

The Arabic Financial Sentiment Corpus (AFSC) comprising 84,431 labeled samples will be released under Creative Commons Attribution 4.0 International License upon acceptance. The dataset includes preprocessed Arabic text, five-class sentiment labels with conf...
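
The deployment requirements listed above (five-class taxonomy, JSON output with a confidence score, a latency ceiling of 5 minutes per 1,000 samples, and a cost ceiling of $0.0012 per sample) can be checked mechanically. The sketch below shows one way to do so; the JSON field names `sentiment` and `confidence`, the five label names, and both function names are assumptions, since the paper's actual output schema is not shown on this page.

```python
import json

# Assumed label names; the source only states that there are five classes.
ALLOWED_LABELS = {"very_negative", "negative", "neutral", "positive", "very_positive"}

MAX_SECONDS_PER_1000 = 300.0  # 5 minutes per 1,000 samples (from the requirements)
MAX_COST_PER_SAMPLE = 0.0012  # USD per sample (from the requirements)


def parse_prediction(raw: str) -> dict:
    """Validate one model response against the assumed JSON schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    label = data.get("sentiment")
    confidence = data.get("confidence")
    if label not in ALLOWED_LABELS:
        raise ValueError(f"label outside the five-class taxonomy: {label!r}")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be a number in [0, 1]: {confidence!r}")
    return {"sentiment": label, "confidence": float(confidence)}


def within_budget(n_samples: int, total_seconds: float, total_cost_usd: float) -> bool:
    """Check a batch run against the latency and cost ceilings."""
    latency_ok = total_seconds <= MAX_SECONDS_PER_1000 * n_samples / 1000
    cost_ok = total_cost_usd / n_samples < MAX_COST_PER_SAMPLE
    return latency_ok and cost_ok
```

At the stated corpus size of 84,431 samples, these ceilings work out to roughly 422 minutes of inference and a little over $101 in total.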
