Trust in AI among Middle Eastern CS Students: Investigating Students' Trust and Usage Patterns Across Saudi Arabia, Kuwait and Jordan
Pith reviewed 2026-05-10 18:52 UTC · model grok-4.3
The pith
Among Middle Eastern CS students, language fluency appears to predict trust in AI, while gender patterns vary by country and differ from US results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Replicating a US study of trust in AI, the authors surveyed students in three Arabic-speaking Middle Eastern countries and report four findings: language fluency can predict trust; female students in Saudi Arabia indicated lower trust than their male peers, the opposite of the US result; no noticeable gender differences appeared in Kuwait or Jordan; and English language proficiency showed a generally negative correlation with students' confidence in AI.
What carries the argument
A replicated survey instrument that measures trust in AI and analyzes it against gender, first-generation status, and language proficiency in non-Western student populations.
If this is right
- AI adoption in Middle Eastern computing education is shaped by language fluency and does not simply mirror Western patterns.
- Gender differences in trust appear only in some countries in the region; in Saudi Arabia they run opposite to the US pattern.
- Negative links between English proficiency and confidence point to potential barriers when AI tools are primarily English-based.
- Design of AI systems for education must account for cultural and linguistic differences to support equitable use across regions.
Where Pith is reading between the lines
- Repeating the survey in other non-English-speaking regions could reveal whether language fluency remains a consistent predictor or interacts differently with local languages.
- Making AI tools available in Arabic or other local languages might raise trust levels and reduce the observed negative correlation with English proficiency.
- The country-specific gender findings suggest that broader cultural norms around technology use, not just language, shape trust and could be tested by comparing urban versus rural student samples.
Load-bearing premise
The translated survey instrument measures the same underlying construct of trust in AI across Arabic-speaking cultural contexts as in the original English study, without introducing response biases from language or social norms.
What would settle it
Administering both the original English and the Arabic versions of the survey to the same bilingual students and finding substantially different trust scores or gender patterns would indicate that translation or cultural response styles alter the measured construct.
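One way to picture this check is a paired comparison, sketched below in Python. The scores are invented for illustration and do not come from the paper; the function name `paired_t` and the 1-to-5 scale are assumptions, and the paper does not say which test would be used.

```python
import math
import statistics


def paired_t(first, second):
    """Return (mean difference, paired t statistic) for second - first."""
    if len(first) != len(second):
        raise ValueError("each student needs a score on both versions")
    diffs = [s - f for f, s in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / std_err


# Hypothetical trust scores (1-5 scale) for the same eight bilingual students.
english_scores = [4, 3, 5, 4, 2, 4, 3, 4]
arabic_scores = [5, 3, 5, 5, 3, 4, 3, 5]

mean_diff, t_stat = paired_t(english_scores, arabic_scores)
df = len(english_scores) - 1
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, df = {df}")
```

A mean difference near zero would be consistent with the two versions behaving alike on average, while a large one would point to a translation or response-style effect. This sketch does not address gender patterns, which would need a separate comparison within each language version.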
Original abstract
Background and Context: Artificial intelligence (AI) tools have been reshaping computing and computer science education. Trust in AI is a determining factor in the adoption of these tools. Recent studies have shown different trust factors across gender and first-generation status among students. However, these studies have focused mainly on Western, Educated, Industrialized, Rich, and Democratic (WEIRD) populations, and their generalizability to other populations with different languages and cultures is unclear.
Objective: This study aims to evaluate trust in AI among Middle Eastern computer science students and the factors that can impact it.
Method: We replicate a recent study of trust in four universities in three Middle Eastern, Arabic-speaking countries: Saudi Arabia, Kuwait, and Jordan. We analyze trust among students across different factors such as gender and first-generation status.
Findings: Our results suggest that language fluency can predict trust in AI. Moreover, unlike the results from the US population where female students tended to trust AI more than their male peers, female students in Saudi Arabia indicated lower trust compared to their male counterparts, and we did not observe any noticeable differences across gender in the other countries. We also found a generally negative correlation between English language proficiency and students' confidence.
Implications: This study highlights differences in students' adoption and trust in AI even within the same region. It emphasizes the need for more investigation into students' adoption and interaction in non-WEIRD regions for equitable adoption of this technology. It also suggests a need for efforts in designing effective AI systems tailored to the cultural and linguistic needs of the region.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports a replication of a prior US study on trust in AI, surveying computer science students at universities in Saudi Arabia, Kuwait, and Jordan. It claims that language fluency predicts trust in AI, that gender differences in trust diverge from US patterns (with female students in Saudi Arabia showing lower trust than males, and no noticeable gender differences in the other countries), and that English language proficiency is negatively correlated with students' confidence in AI. The work highlights the need for culturally and linguistically tailored AI systems in non-WEIRD regions.
Significance. If the measurement equivalence of the translated instrument can be established, the results would meaningfully extend AI trust research beyond WEIRD populations by providing primary empirical data from Middle Eastern CS students. The replication design and focus on language fluency and gender as predictors are strengths that could inform equitable AI adoption in computing education, particularly if supported by appropriate statistical reporting and cross-cultural validation.
major comments (3)
- [Methods] Methods section: No information is provided on the translation of the survey instrument into Arabic, including back-translation, pilot testing for comprehension, or cultural adaptation. This is load-bearing for the central claims, as direct numeric comparisons of trust scores and gender effects between the English US study and the Arabic versions assume the 'trust in AI' construct is measured equivalently across languages and contexts.
- [Results] Results/Findings: The manuscript does not report sample sizes per country or group, statistical tests, effect sizes, confidence intervals, or controls for the key directional findings on language fluency predicting trust and country-specific gender differences. Without these details, the robustness of the claims (e.g., lower trust among Saudi female students) cannot be assessed and may be vulnerable to response biases.
- [Results] Results/Findings: No tests of measurement invariance (configural, metric, or scalar) across language groups or countries are reported. If Arabic items introduce different response styles or factor structures, the observed differences in means, correlations, and gender effects could be artifacts rather than substantive cultural or linguistic effects, directly undermining the cross-regional comparisons.
minor comments (2)
- [Abstract] Abstract: Including basic descriptive statistics (e.g., total N, response rate, or mean trust scores) would help contextualize the directional findings for readers.
- [Introduction] The paper would benefit from clearer notation distinguishing the replicated US study from the current data collection to avoid any ambiguity in comparisons.
Simulated Author's Rebuttal
We thank the referee for their detailed and insightful comments on our manuscript. We have addressed each of the major concerns raised, and the revised manuscript incorporates additional details and analyses to strengthen the validity of our findings. Below, we provide point-by-point responses.
-
Referee: [Methods] Methods section: No information is provided on the translation of the survey instrument into Arabic, including back-translation, pilot testing for comprehension, or cultural adaptation. This is load-bearing for the central claims, as direct numeric comparisons of trust scores and gender effects between the English US study and the Arabic versions assume the 'trust in AI' construct is measured equivalently across languages and contexts.
Authors: We appreciate the referee pointing out this important omission in our initial submission. The Methods section has been revised to include a comprehensive description of the survey translation process. The instrument was translated into Arabic by a certified translator, followed by an independent back-translation to English to ensure accuracy. We conducted a pilot test with 15-20 CS students from the target population to evaluate item comprehension and cultural relevance, making minor adjustments for idiomatic expressions. These steps align with established cross-cultural research practices and support the equivalence of the measured construct.
Revision: yes
-
Referee: [Results] Results/Findings: The manuscript does not report sample sizes per country or group, statistical tests, effect sizes, confidence intervals, or controls for the key directional findings on language fluency predicting trust and country-specific gender differences. Without these details, the robustness of the claims (e.g., lower trust among Saudi female students) cannot be assessed and may be vulnerable to response biases.
Authors: We agree that the Results section lacked sufficient statistical detail. In the revised manuscript, we now report the sample sizes broken down by country and relevant subgroups (e.g., gender within each country). We have included the results of regression models testing language fluency as a predictor of trust, with appropriate controls for demographics, along with effect sizes and confidence intervals. For the gender differences, we present t-test results with effect sizes and discuss potential response biases. These additions allow for a more rigorous evaluation of the findings.
Revision: yes
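The gender comparison described in this response could be reported along the following lines. This is a minimal sketch with invented scores, not the authors' analysis: the function names are hypothetical, and the choice of Welch's variant of the t-test and of Cohen's d as the effect size are assumptions, since the response only mentions "t-test results with effect sizes".

```python
import math
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    dof = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, dof


# Hypothetical trust scores (1-5 scale) for two groups within one country.
male_scores = [4, 5, 3, 4, 4]
female_scores = [3, 2, 4, 3, 3]

d = cohens_d(male_scores, female_scores)
t, dof = welch_t(male_scores, female_scores)
print(f"Cohen's d = {d:.2f}, Welch t = {t:.2f}, df = {dof:.1f}")
```

Reporting the effect size next to the test statistic lets readers judge the magnitude of a group difference separately from its statistical significance, which depends heavily on the per-country sample sizes the referee asked for.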
-
Referee: [Results] Results/Findings: No tests of measurement invariance (configural, metric, or scalar) across language groups or countries are reported. If Arabic items introduce different response styles or factor structures, the observed differences in means, correlations, and gender effects could be artifacts rather than substantive cultural or linguistic effects, directly undermining the cross-regional comparisons.
Authors: This is a valid concern for cross-cultural research. We have now performed measurement invariance testing using multi-group structural equation modeling. The revised Results section reports that configural invariance was supported, indicating similar factor structures across groups. Metric invariance was also established, allowing for comparison of relationships. Scalar invariance showed some non-invariance in intercepts, which we acknowledge as a limitation and discuss in the context of potential cultural response styles. We have added this analysis to bolster confidence in our comparative claims.
Revision: yes
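For readers unfamiliar with the three invariance levels named in this exchange, the standard multi-group factor model can be written as follows. The notation is the conventional textbook one and is an assumption here; the response does not give the authors' model specification.

```latex
\begin{align*}
x_{ig} &= \tau_g + \Lambda_g \xi_{ig} + \varepsilon_{ig}
  && \text{(item responses of student $i$ in group $g$)} \\
\text{configural:}\quad & \text{same pattern of zero and non-zero loadings in every } \Lambda_g \\
\text{metric:}\quad & \Lambda_g = \Lambda \ \text{for all } g \\
\text{scalar:}\quad & \Lambda_g = \Lambda,\ \tau_g = \tau \ \text{for all } g \\
\mathrm{E}[x_{ig}] &= \tau + \Lambda \kappa_g
  && \text{(under scalar invariance, with $\kappa_g = \mathrm{E}[\xi_{ig}]$)}
\end{align*}
```

Only under scalar invariance do differences in observed item means reduce to differences in the latent means, which is why non-invariant intercepts limit direct comparison of mean trust scores across groups even when metric invariance holds.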
Circularity Check
No circularity: primary empirical survey replication with direct data analysis
The paper performs new data collection via translated surveys across three countries, followed by standard statistical comparisons (means, correlations, gender/language effects) against an external prior US study. No equations, fitted parameters, or derivations are present; claims such as language fluency predicting trust arise directly from the collected responses rather than being redefined or forced by construction. Self-citations are absent from the load-bearing steps, and the replication cites an independent prior work. The derivation chain is therefore self-contained empirical output with no reduction to inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: self-reported survey responses accurately reflect students' actual trust levels in AI tools.