Advanced Applications of Generative AI in Actuarial Science: Case Studies Beyond ChatGPT
Pith reviewed 2026-05-19 07:55 UTC · model grok-4.3
The pith
Generative AI supports actuarial practice through four concrete case studies in insurance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This article presents four case studies showing generative AI can support actuarial practice: large language models extract informative features from unstructured text for claim cost prediction, retrieval-augmented generation identifies and structures information from insurers' annual reports for market comparisons, fine-tuned vision-enabled large language models classify car damage types and extract contextual details from images, and a multi-agent system autonomously migrates legacy actuarial code from R to Python while validating outputs against the originals.
What carries the argument
Four implemented case studies that apply generative AI techniques—LLMs for text feature extraction, RAG for report structuring, vision LLMs for image classification, and multi-agent systems for code migration—to actuarial tasks.
If this is right
- Claim cost models gain additional predictive power when text descriptions are automatically turned into structured features.
- Market analysis becomes faster when retrieval-augmented generation extracts and organizes comparable data from annual reports.
- Damage assessment in auto insurance can incorporate image-based classification and context extraction without manual review.
- Actuarial teams can reduce technical debt by using multi-agent systems to convert and validate legacy R code into Python.
Where Pith is reading between the lines
- Similar case-study methods could apply to other data-intensive regulated fields such as banking risk modeling.
- The reproducibility issues flagged in the paper suggest that independent audits of the migrated code outputs would be a necessary next step.
- Governance frameworks for these tools will likely need to specify who reviews AI-generated features or code changes before they enter production models.
Load-bearing premise
The described implementations function reliably enough for practical use in regulated insurance environments despite the challenges of reproducibility, privacy, and governance.
What would settle it
A side-by-side comparison where the Python code produced by the multi-agent system yields identical numerical outputs to the original R code across a full set of actuarial test cases would confirm or refute the migration case study.
read the original abstract
This article explores the potential of generative AI (GenAI) to support actuarial practice through four implemented case studies. It situates these case studies within the broader evolution of artificial intelligence in actuarial science, from early neural networks and machine learning to modern transformer-based GenAI systems. The first case study illustrates how large language models (LLMs) can improve claim cost prediction by extracting informative features from unstructured text for use in the underlying supervised learning task. The second case study demonstrates the automation of market comparisons using Retrieval-Augmented Generation to identify, extract, and structure relevant information from insurers' annual reports. The third case study highlights the capabilities of fine-tuned vision-enabled LLMs in classifying car damage types and extracting contextual information from images. The fourth case study presents a multi-agent system that autonomously migrates actuarial legacy code from R to Python and validates the translation against the original code's outputs. In addition to these case studies, we outline further GenAI applications in the insurance industry. Finally, we discuss the regulatory, security, dual-use and fraud, reproducibility, privacy, governance, and organisational challenges associated with deploying GenAI in regulated insurance environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents four implemented case studies demonstrating generative AI applications in actuarial science: (1) LLMs extracting features from unstructured text to improve supervised claim cost prediction models, (2) Retrieval-Augmented Generation (RAG) to automate extraction and structuring of information from insurers' annual reports for market comparisons, (3) fine-tuned vision-enabled LLMs for classifying car damage types and extracting contextual details from images, and (4) a multi-agent system for autonomously migrating legacy actuarial R code to Python while validating output equivalence. The manuscript situates these within the evolution of AI in actuarial work and concludes with a discussion of regulatory, security, reproducibility, privacy, and governance challenges for deployment in insurance.
Significance. If the implementations were shown to deliver measurable improvements with appropriate validation, the work would offer concrete, practical demonstrations of GenAI moving beyond generic chat interfaces into regulated actuarial tasks such as claims modeling, regulatory reporting, damage assessment, and code modernization. The explicit enumeration of deployment challenges (reproducibility, privacy, governance) is a strength that could help frame responsible adoption, but the current lack of supporting evidence substantially reduces the manuscript's contribution to the literature.
major comments (4)
- [First case study] The first case study (LLM feature extraction for claim cost prediction) describes the process at a high level but provides no quantitative results such as accuracy, RMSE, or R² improvement over a baseline model that omits the text-derived features, nor any cross-validation or hold-out performance numbers. This absence makes it impossible to assess whether the approach delivers usable gains in a supervised actuarial task.
- [Second case study] The second case study (RAG on annual reports) claims automation of market comparisons but reports no retrieval precision, recall, or end-to-end accuracy metrics, nor any comparison against manual extraction or simpler keyword-based baselines. Without these, the claim that the pipeline produces reliable structured information for actuarial use cannot be evaluated.
- [Third case study] The third case study (vision-enabled LLMs for car damage classification) states that fine-tuned models classify damage types and extract context, yet supplies no classification metrics (accuracy, F1, confusion matrix), no comparison to non-vision baselines, and no discussion of image quality or edge-case performance. This leaves the practical utility for insurance claims unverified.
- [Fourth case study] The fourth case study (multi-agent R-to-Python migration) asserts autonomous translation and validation against original outputs but gives no success rate, test coverage statistics, or quantitative equivalence measures (e.g., numerical output differences on a test suite). The absence of these figures prevents judgment of whether the system reliably supports legacy code modernization in a regulated environment.
minor comments (2)
- [Abstract] The abstract and introduction would benefit from a brief statement of the quantitative evaluation approach used (or planned) for each case study to set reader expectations.
- [Case study sections] Section headings for the four case studies should be numbered or clearly labeled (e.g., §3.1, §3.2) to facilitate precise reference in future citations.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for recognizing the practical focus of the case studies along with the discussion of deployment challenges. We agree that quantitative validation is necessary to demonstrate measurable improvements and will revise the manuscript to incorporate the requested metrics and comparisons for each case study. Below we respond point by point.
read point-by-point responses
-
Referee: [First case study] The first case study (LLM feature extraction for claim cost prediction) describes the process at a high level but provides no quantitative results such as accuracy, RMSE, or R² improvement over a baseline model that omits the text-derived features, nor any cross-validation or hold-out performance numbers. This absence makes it impossible to assess whether the approach delivers usable gains in a supervised actuarial task.
Authors: We acknowledge that the current manuscript presents the LLM feature extraction process descriptively without reporting performance numbers. Experiments were performed during implementation, and the revised version will add RMSE, R², and other relevant metrics comparing the augmented model against a baseline without text features, together with cross-validation and hold-out results to quantify the gains. revision: yes
-
Referee: [Second case study] The second case study (RAG on annual reports) claims automation of market comparisons but reports no retrieval precision, recall, or end-to-end accuracy metrics, nor any comparison against manual extraction or simpler keyword-based baselines. Without these, the claim that the pipeline produces reliable structured information for actuarial use cannot be evaluated.
Authors: We agree that retrieval and extraction quality must be quantified. The revision will report precision, recall, and end-to-end accuracy for the RAG pipeline, along with direct comparisons to manual extraction and keyword baselines, to substantiate reliability for market comparison tasks. revision: yes
-
Referee: [Third case study] The third case study (vision-enabled LLMs for car damage classification) states that fine-tuned models classify damage types and extract context, yet supplies no classification metrics (accuracy, F1, confusion matrix), no comparison to non-vision baselines, and no discussion of image quality or edge-case performance. This leaves the practical utility for insurance claims unverified.
Authors: The manuscript focused on the fine-tuning approach and contextual extraction capabilities. We will add accuracy, F1 scores, and confusion matrices in the revision, include comparisons to non-vision baselines, and discuss performance across image quality levels and edge cases to better demonstrate utility for claims assessment. revision: yes
-
Referee: [Fourth case study] The fourth case study (multi-agent R-to-Python migration) asserts autonomous translation and validation against original outputs but gives no success rate, test coverage statistics, or quantitative equivalence measures (e.g., numerical output differences on a test suite). The absence of these figures prevents judgment of whether the system reliably supports legacy code modernization in a regulated environment.
Authors: We accept that success rates and equivalence measures are required for a regulated context. The revised manuscript will include success rates, test coverage statistics, and quantitative measures of output equivalence (such as numerical differences on validation suites) to evaluate reliability for legacy code migration. revision: yes
Circularity Check
No circularity: purely descriptive case studies without derivations
full rationale
The manuscript is a descriptive account of four implemented GenAI case studies in actuarial practice (LLM text feature extraction for claims, RAG on annual reports, vision LLMs for car damage, multi-agent R-to-Python migration) plus challenges. It contains no equations, no first-principles derivations, no fitted parameters presented as predictions, and no load-bearing self-citations or uniqueness theorems. The central claims rest on narrative descriptions of processes rather than any reduction of outputs to inputs by construction. This is the normal non-circular outcome for an applications paper that does not attempt mathematical or predictive derivations.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
four implemented case studies... LLMs for extracting features from text in claim cost prediction, RAG for structuring information from annual reports, vision-enabled LLMs for car damage classification, and a multi-agent system for migrating legacy R code to Python
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Results... 18.1% reduction in RMSE... accuracy 0.880... multi-agent system... generated well-structured reports
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Annals of Actuarial Science 15(2), 207–229 (2021)
Richman, R.: AI in actuarial science – a review of recent advances – part 1. Annals of Actuarial Science 15(2), 207–229 (2021)
work page 2021
-
[2]
Annals of Actuarial Science 15(2), 230–258 (2021)
Richman, R.: AI in actuarial science – a review of recent advances – part 2. Annals of Actuarial Science 15(2), 230–258 (2021)
work page 2021
-
[3]
Working paper, SSRN (May 2025)
W¨ uthrich, M.V., Richman, R., Avanzi, B., Lindholm, M., Maggi, M., Mayer, M., Schelldorfer, J., Scognamiglio, S.: AI Tools for Actuaries. Working paper, SSRN (May 2025). https://ssrn.com/abstract=5162304
work page 2025
-
[4]
Research report, Society of Actuaries Research Institute (February 2024)
Carlin, S., Mathys, S.: A Primer on Generative AI for Actuaries. Research report, Society of Actuaries Research Institute (February 2024). https://www.soa.org/ resources/research-reports/2024/generative-ai-for-actuaries/
work page 2024
-
[5]
British Actuarial Journal 29, 1–42 (2024)
Balona, C.: ActuaryGPT: Applications of Large Language Models to Insurance and Actuarial Work. British Actuarial Journal 29, 1–42 (2024)
work page 2024
-
[6]
In: Proceedings of the 31st International Conference on Neural Information Processing Systems
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is All you Need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17, pp. 6000–6010. Curran Associates Inc., Red Hook, NY, USA (2017)
work page 2017
-
[7]
Nature 323(6088), 533–536 (1986)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back- propagating errors. Nature 323(6088), 533–536 (1986)
work page 1986
-
[8]
Journal of the Royal Statistical Society: Series A (General) 135(3), 370–384 (1972) 26
Nelder, J.A., Wedderburn, R.W.M.: Generalized Linear Models. Journal of the Royal Statistical Society: Series A (General) 135(3), 370–384 (1972) 26
work page 1972
-
[9]
In: Proceedings of the Casualty Actuarial Society, vol
Bornhuetter, R.L., Ferguson, R.E.: The Actuary and IBNR. In: Proceedings of the Casualty Actuarial Society, vol. 59, pp. 181–195 (1972). Casualty Actuarial Society
work page 1972
-
[10]
Dugas, C., Bengio, Y., Chapados, N., Vincent, P., Denoncourt, G., Fournier, C.: Statistical Learning Algorithms Applied to Automobile Insurance Ratemaking. CAS Forum 1, 179–213 (2003)
work page 2003
-
[11]
This paper has been integrated into SSRN Manuscript 3822407 (2019)
W¨ uthrich, M.V.: From Generalized Linear Models to Neural Networks, and Back. This paper has been integrated into SSRN Manuscript 3822407 (2019). https: //ssrn.com/abstract=3491790
work page 2019
-
[12]
Denuit, M., Hainaut, D., Trufin, J.: Effective Statistical Learning Methods for Actuaries II: Tree-Based Methods and Extensions. LIDAM Reprints ISBA 2020035, Universit´ e catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA) (September 2020)
work page 2020
-
[13]
North American Actuarial Journal 25(2), 255–285 (2021)
Henckaerts, R., Cˆ ot´ e, M.-P., Antonio, K., Verbelen, R.: Boosting Insights in Insur- ance Tariff Plans with Tree-Based Machine Learning Methods. North American Actuarial Journal 25(2), 255–285 (2021)
work page 2021
-
[14]
Research Paper Series 16–68, Swiss Finance Institute, Geneva, Switzerland (June 2023)
W¨ uthrich, M.V., Buser, C.: Data Analytics for Non-Life Insurance Pricing. Research Paper Series 16–68, Swiss Finance Institute, Geneva, Switzerland (June 2023). https://ssrn.com/abstract=2870308
work page 2023
-
[15]
European Actuarial Journal (2025)
Richman, R., Scognamiglio, S., W¨ uthrich, M.V.: The Credibility Transformer. European Actuarial Journal (2025)
work page 2025
-
[16]
Mitteilungen der Deutschen Aktuarvereinigung e.V.· Sonderausgabe zur DAV/DGVFM Jahrestagung 2024
Cowling, C.: AI – Is there a future for actuaries? Aktuar AktuellSonderausgabe 1, 15 (2024). Mitteilungen der Deutschen Aktuarvereinigung e.V.· Sonderausgabe zur DAV/DGVFM Jahrestagung 2024
work page 2024
-
[17]
Zhang, H., Li, X., Bing, L.: Video-LLaMA: An instruction-tuned audio-visual language model for video understanding. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 543–553. Association for Computational Linguistics, Singapore (2023)
work page 2023
-
[18]
Borsos, Z., Marinier, R., Vincent, D., Kharitonov, E., Pietquin, O., Sharifi, M., Roblek, D., Teboul, O., Grangier, D., Tagliasacchi, M., Zeghidour, N.: AudioLM: A Language Modeling Approach to Audio Generation. IEEE/ACM Trans. Audio, Speech and Lang. Proc. 31, 2523–2533 (2023)
work page 2023
-
[19]
Nature Biotechnology 43(2), 166–169 (2025) 27
Lobentanzer, S., Feng, S., Bruderer, N., Maier, A., Consortium, T.B., Wang, C., Baumbach, J., Abreu-Vicente, J., Krehl, N., Ma, Q., Lemberger, T., Saez- Rodriguez, J.: A Platform for the Biomedical Application of Large Language Models. Nature Biotechnology 43(2), 166–169 (2025) 27
work page 2025
-
[20]
Xu, S., Manathunga, V., Hong, D.: Framework of BERT-Based NLP Models for Frequency and Severity in Insurance Claims. Variance 16(2) (2023)
work page 2023
-
[21]
https://ssrn.com/ abstract=4758296
Richman, R.: An AI Vision for the Actuarial Profession (2024). https://ssrn.com/ abstract=4758296
work page 2024
-
[22]
Yeti¸ stiren, B.,¨Ozsoy, I., Ayerdem, M., T¨ uz¨ un, E.: Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copi- lot, Amazon CodeWhisperer, and ChatGPT (2023). https://arxiv.org/abs/2304. 10778
work page 2023
-
[23]
In: Advances in Neural Information Processing Systems 33 (NeurIPS 2020), pp
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., K¨ uttler, H., Lewis, M., Yih, W., Rockt¨ aschel, T., Riedel, S., Kiela, D.: Retrieval- Augmented Generation for Knowledge-Intensive NLP Tasks. In: Advances in Neural Information Processing Systems 33 (NeurIPS 2020), pp. 9459–9474 (2020)
work page 2020
-
[24]
Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., Metropolitansky, D., Ness, R.O., Larson, J.: From Local to Global: A Graph RAG Approach to Query-Focused Summarization (2025). https://arxiv.org/abs/2404. 16130
work page 2025
-
[25]
Chen, B., Guo, Z., Yang, Z., Chen, Y., Chen, J., Liu, Z., Shi, C., Yang, C.: PathRAG: Pruning Graph-based Retrieval Augmented Generation with Relational Paths (2025). https://arxiv.org/abs/2502.14902
-
[26]
Machine Learning with Applications 9, 100332 (2022)
van Ruitenbeek, R.E., Bhulai, S.: Convolutional Neural Networks for vehicle damage detection. Machine Learning with Applications 9, 100332 (2022)
work page 2022
-
[27]
Applied Sciences 14(20) (2024)
P´ erez-Zarate, S.A., Corzo-Garc´ ıa, D., Pro-Mart´ ın, J.L.,´Alvarez-Garc´ ıa, J.A., Mart´ ınez-del-Amor, M.A., Fern´ andez-Cabrera, D.: Automated Car Damage Assessment Using Computer Vision: Insurance Company Use Case. Applied Sciences 14(20) (2024)
work page 2024
-
[28]
https://arxiv.org/abs/ 2406.09105
Lin, C., Lyu, H., Xu, X., Luo, J.: INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs’ Performance in Insurance (2024). https://arxiv.org/abs/ 2406.09105
-
[29]
Technical report, EIOPA (2021)
European Insurance and Occupational Pensions Authority (EIOPA): Artificial Intelligence Governance Principles: Towards Ethical and Trustworthy AI in the European Insurance Sector. Technical report, EIOPA (2021). https://www.eiopa. europa.eu/system/files/2021-06/eiopa-ai-governance-principles-june-2021.pdf 28
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.