
arxiv: 2506.18942 · v2 · submitted 2025-06-22 · 💻 cs.CY · q-fin.RM

Advanced Applications of Generative AI in Actuarial Science: Case Studies Beyond ChatGPT

Pith reviewed 2026-05-19 07:55 UTC · model grok-4.3

classification 💻 cs.CY q-fin.RM
keywords generative AI · actuarial science · large language models · insurance · RAG · multi-agent systems · legacy code migration · claim prediction

The pith

Generative AI supports actuarial practice through four concrete case studies in insurance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that generative AI can assist actuaries with practical tasks by presenting four implemented case studies that go beyond simple chat interfaces. These applications use large language models to extract features from text for claim cost prediction, retrieval-augmented generation to pull and organize data from annual reports, vision-enabled models to classify car damage from images, and multi-agent systems to translate legacy R code into Python while checking outputs. A sympathetic reader would care because actuaries handle large volumes of unstructured data and compliance requirements where such tools could reduce manual effort if they prove reliable.

Core claim

This article presents four case studies showing generative AI can support actuarial practice: large language models extract informative features from unstructured text for claim cost prediction, retrieval-augmented generation identifies and structures information from insurers' annual reports for market comparisons, fine-tuned vision-enabled large language models classify car damage types and extract contextual details from images, and a multi-agent system autonomously migrates legacy actuarial code from R to Python while validating outputs against the originals.

What carries the argument

Four implemented case studies that apply generative AI techniques—LLMs for text feature extraction, RAG for report structuring, vision LLMs for image classification, and multi-agent systems for code migration—to actuarial tasks.

If this is right

  • Claim cost models gain additional predictive power when text descriptions are automatically turned into structured features.
  • Market analysis becomes faster when retrieval-augmented generation extracts and organizes comparable data from annual reports.
  • Damage assessment in auto insurance can incorporate image-based classification and context extraction without manual review.
  • Actuarial teams can reduce technical debt by using multi-agent systems to convert and validate legacy R code into Python.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar case-study methods could apply to other data-intensive regulated fields such as banking risk modeling.
  • The reproducibility issues flagged in the paper suggest that independent audits of the migrated code outputs would be a necessary next step.
  • Governance frameworks for these tools will likely need to specify who reviews AI-generated features or code changes before they enter production models.

Load-bearing premise

The described implementations function reliably enough for practical use in regulated insurance environments despite the challenges of reproducibility, privacy, and governance.

What would settle it

Running the Python code produced by the multi-agent system side by side with the original R code across a full set of actuarial test cases would settle the migration case study: identical numerical outputs would confirm it, and any divergence would refute it.
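A minimal sketch of such a side-by-side check follows. It is not the paper's validation harness: the function name `compare_outputs`, the tolerances, and the sample figures are all hypothetical, and it assumes the R outputs have already been exported as a list of numbers. Because R and Python can differ in the last floating-point digits, the sketch treats values as equal within a small tolerance rather than bit for bit.

```python
import math


def compare_outputs(r_values, py_values, rel_tol=1e-9, abs_tol=1e-12):
    """Compare reference (R) outputs with migrated (Python) outputs element by element."""
    if len(r_values) != len(py_values):
        raise ValueError("output vectors differ in length")
    # Collect (index, reference, migrated) for every pair outside the tolerance.
    mismatches = [
        (i, r, p)
        for i, (r, p) in enumerate(zip(r_values, py_values))
        if not math.isclose(r, p, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
    # Largest absolute deviation over all pairs (0.0 for empty input).
    max_abs_diff = max(
        (abs(r - p) for r, p in zip(r_values, py_values)), default=0.0
    )
    return {"n": len(r_values), "mismatches": mismatches, "max_abs_diff": max_abs_diff}


# Hypothetical outputs of one test case, e.g. three reserve estimates.
r_reference = [1050.25, 980.10, 1210.00]
py_migrated = [1050.25, 980.10, 1210.50]

report = compare_outputs(r_reference, py_migrated)
print(f"{len(report['mismatches'])} of {report['n']} values differ; "
      f"max abs diff = {report['max_abs_diff']}")
```

In a regulated setting the tolerance itself would need to be justified and documented, since an overly loose setting can hide a genuine translation error.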

Original abstract

This article explores the potential of generative AI (GenAI) to support actuarial practice through four implemented case studies. It situates these case studies within the broader evolution of artificial intelligence in actuarial science, from early neural networks and machine learning to modern transformer-based GenAI systems. The first case study illustrates how large language models (LLMs) can improve claim cost prediction by extracting informative features from unstructured text for use in the underlying supervised learning task. The second case study demonstrates the automation of market comparisons using Retrieval-Augmented Generation to identify, extract, and structure relevant information from insurers' annual reports. The third case study highlights the capabilities of fine-tuned vision-enabled LLMs in classifying car damage types and extracting contextual information from images. The fourth case study presents a multi-agent system that autonomously migrates actuarial legacy code from R to Python and validates the translation against the original code's outputs. In addition to these case studies, we outline further GenAI applications in the insurance industry. Finally, we discuss the regulatory, security, dual-use and fraud, reproducibility, privacy, governance, and organisational challenges associated with deploying GenAI in regulated insurance environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 2 minor

Summary. The paper presents four implemented case studies demonstrating generative AI applications in actuarial science: (1) LLMs extracting features from unstructured text to improve supervised claim cost prediction models, (2) Retrieval-Augmented Generation (RAG) to automate extraction and structuring of information from insurers' annual reports for market comparisons, (3) fine-tuned vision-enabled LLMs for classifying car damage types and extracting contextual details from images, and (4) a multi-agent system for autonomously migrating legacy actuarial R code to Python while validating output equivalence. The manuscript situates these within the evolution of AI in actuarial work and concludes with a discussion of regulatory, security, reproducibility, privacy, and governance challenges for deployment in insurance.

Significance. If the implementations were shown to deliver measurable improvements with appropriate validation, the work would offer concrete, practical demonstrations of GenAI moving beyond generic chat interfaces into regulated actuarial tasks such as claims modeling, regulatory reporting, damage assessment, and code modernization. The explicit enumeration of deployment challenges (reproducibility, privacy, governance) is a strength that could help frame responsible adoption, but the current lack of supporting evidence substantially reduces the manuscript's contribution to the literature.

major comments (4)
  1. [First case study] The first case study (LLM feature extraction for claim cost prediction) describes the process at a high level but provides no quantitative results such as accuracy, RMSE, or R² improvement over a baseline model that omits the text-derived features, nor any cross-validation or hold-out performance numbers. This absence makes it impossible to assess whether the approach delivers usable gains in a supervised actuarial task.
  2. [Second case study] The second case study (RAG on annual reports) claims automation of market comparisons but reports no retrieval precision, recall, or end-to-end accuracy metrics, nor any comparison against manual extraction or simpler keyword-based baselines. Without these, the claim that the pipeline produces reliable structured information for actuarial use cannot be evaluated.
  3. [Third case study] The third case study (vision-enabled LLMs for car damage classification) states that fine-tuned models classify damage types and extract context, yet supplies no classification metrics (accuracy, F1, confusion matrix), no comparison to non-vision baselines, and no discussion of image quality or edge-case performance. This leaves the practical utility for insurance claims unverified.
  4. [Fourth case study] The fourth case study (multi-agent R-to-Python migration) asserts autonomous translation and validation against original outputs but gives no success rate, test coverage statistics, or quantitative equivalence measures (e.g., numerical output differences on a test suite). The absence of these figures prevents judgment of whether the system reliably supports legacy code modernization in a regulated environment.
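The baseline comparison requested in the first major comment can be sketched as follows. This is an illustration only, with made-up hold-out values: `pred_baseline` stands for a model without text-derived features and `pred_augmented` for one with them, and neither reflects results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical hold-out claim costs and predictions (not data from the paper).
y_holdout = [100.0, 200.0, 300.0, 400.0]
pred_baseline = [150.0, 150.0, 350.0, 350.0]   # model without text features
pred_augmented = [110.0, 190.0, 310.0, 390.0]  # model with text-derived features

for name, pred in [("baseline", pred_baseline), ("augmented", pred_augmented)]:
    print(f"{name}: RMSE={rmse(y_holdout, pred):.2f}, R2={r_squared(y_holdout, pred):.3f}")
```

Reporting both models on the same hold-out set, ideally repeated across cross-validation folds, is what would let a reader judge whether the text features add predictive value.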
minor comments (2)
  1. [Abstract] The abstract and introduction would benefit from a brief statement of the quantitative evaluation approach used (or planned) for each case study to set reader expectations.
  2. [Case study sections] Section headings for the four case studies should be numbered or clearly labeled (e.g., §3.1, §3.2) to facilitate precise reference in future citations.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the practical focus of the case studies along with the discussion of deployment challenges. We agree that quantitative validation is necessary to demonstrate measurable improvements and will revise the manuscript to incorporate the requested metrics and comparisons for each case study. Below we respond point by point.

  1. Referee: [First case study] The first case study (LLM feature extraction for claim cost prediction) describes the process at a high level but provides no quantitative results such as accuracy, RMSE, or R² improvement over a baseline model that omits the text-derived features, nor any cross-validation or hold-out performance numbers. This absence makes it impossible to assess whether the approach delivers usable gains in a supervised actuarial task.

    Authors: We acknowledge that the current manuscript presents the LLM feature extraction process descriptively without reporting performance numbers. Experiments were performed during implementation, and the revised version will add RMSE, R², and other relevant metrics comparing the augmented model against a baseline without text features, together with cross-validation and hold-out results to quantify the gains. revision: yes

  2. Referee: [Second case study] The second case study (RAG on annual reports) claims automation of market comparisons but reports no retrieval precision, recall, or end-to-end accuracy metrics, nor any comparison against manual extraction or simpler keyword-based baselines. Without these, the claim that the pipeline produces reliable structured information for actuarial use cannot be evaluated.

    Authors: We agree that retrieval and extraction quality must be quantified. The revision will report precision, recall, and end-to-end accuracy for the RAG pipeline, along with direct comparisons to manual extraction and keyword baselines, to substantiate reliability for market comparison tasks. revision: yes

  3. Referee: [Third case study] The third case study (vision-enabled LLMs for car damage classification) states that fine-tuned models classify damage types and extract context, yet supplies no classification metrics (accuracy, F1, confusion matrix), no comparison to non-vision baselines, and no discussion of image quality or edge-case performance. This leaves the practical utility for insurance claims unverified.

    Authors: The manuscript focused on the fine-tuning approach and contextual extraction capabilities. We will add accuracy, F1 scores, and confusion matrices in the revision, include comparisons to non-vision baselines, and discuss performance across image quality levels and edge cases to better demonstrate utility for claims assessment. revision: yes

  4. Referee: [Fourth case study] The fourth case study (multi-agent R-to-Python migration) asserts autonomous translation and validation against original outputs but gives no success rate, test coverage statistics, or quantitative equivalence measures (e.g., numerical output differences on a test suite). The absence of these figures prevents judgment of whether the system reliably supports legacy code modernization in a regulated environment.

    Authors: We accept that success rates and equivalence measures are required for a regulated context. The revised manuscript will include success rates, test coverage statistics, and quantitative measures of output equivalence (such as numerical differences on validation suites) to evaluate reliability for legacy code migration. revision: yes
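The precision, recall, F1, and confusion-matrix figures promised in responses 2 and 3 could be computed along the following lines. This is a sketch under stated assumptions: the damage labels and predictions are invented for illustration, and the helper names `confusion_counts` and `precision_recall_f1` are hypothetical rather than taken from the paper.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs; this is a sparse confusion matrix."""
    return Counter(zip(y_true, y_pred))


def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground-truth and predicted damage types for six images.
y_true = ["dent", "dent", "scratch", "glass", "scratch", "dent"]
y_pred = ["dent", "scratch", "scratch", "glass", "dent", "dent"]

matrix = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="dent")
print(f"dent: precision={precision:.3f}, recall={recall:.3f}, F1={f1:.3f}")
```

The same one-vs-rest computation applies to the RAG case study if each extracted field is scored as correct or incorrect against a manually labelled reference.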

Circularity Check

0 steps flagged

No circularity: purely descriptive case studies without derivations

The manuscript is a descriptive account of four implemented GenAI case studies in actuarial practice (LLM text feature extraction for claims, RAG on annual reports, vision LLMs for car damage, multi-agent R-to-Python migration) plus challenges. It contains no equations, no first-principles derivations, no fitted parameters presented as predictions, and no load-bearing self-citations or uniqueness theorems. The central claims rest on narrative descriptions of processes rather than any reduction of outputs to inputs by construction. This is the normal non-circular outcome for an applications paper that does not attempt mathematical or predictive derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied case-study paper with no mathematical modeling. No free parameters, axioms, or invented entities are introduced; the claims rest on the practical success of the described AI implementations.


