pith. sign in

arxiv: 2606.29371 · v1 · pith:U4YRYUERnew · submitted 2026-06-28 · 📊 stat.AP

Semantic insurance pricing with large language models

Pith reviewed 2026-06-30 02:10 UTC · model grok-4.3

classification 📊 stat.AP
keywords insurance pricinglarge language modelsembeddingsgeneralized linear modelsclaim frequencyactuarial modelingPoisson regressionFrench motor data
0
0 comments X

The pith

Embeddings from a pre-trained large language model can replace hand-crafted features as inputs to a standard actuarial pricing model for Poisson claim-frequency regression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether embeddings computed from natural-language policyholder descriptions can substitute for manually engineered risk factors in classical insurance pricing. On French motor third-party liability data the embedding inputs fed into a generalized linear model yield lower prediction error than the usual hand-crafted features, with the largest gains appearing when the training sample is small. At bigger sample sizes the advantage becomes sensitive to model choice and embedding dimension. The authors also show that domain-specific fine-tuning of the embeddings helps and that the pipeline changes output when any out-of-template text is added to the prompt.

Core claim

Embeddings from a pre-trained large language model, computed from a natural-language description of each policyholder, can replace hand-crafted features as inputs to a standard actuarial pricing model. Using French motor third-party liability data the embedding-based model outperforms the generalized linear model, especially when data are scarce, whereas at larger sample sizes the comparison is model- and dimension-dependent. Insurance-specific fine-tuning further improves the embeddings, and a prompt-sensitivity diagnostic shows that the pipeline reacts to any appended out-of-template field, making controlled prompts a governance requirement.

What carries the argument

Pre-trained language-model embeddings treated as fixed covariates inside a generalized linear model for Poisson claim-frequency regression.

Load-bearing premise

The supplied natural-language policy descriptions already contain the risk-relevant information that the embeddings extract without systematic bias or loss of actuarial signal.

What would settle it

A new motor-insurance dataset in which the embedding-based generalized linear model shows no improvement over the hand-crafted-feature version on small training samples would falsify the performance claim.

Figures

Figures reproduced from arXiv: 2606.29371 by Christopher Blier-Wong, Derek Kusmenko.

Figure 1
Figure 1. Figure 1: Overview of the embedding-based pricing pipeline: structured covariates are serialized into [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Default instruction-prefixed underwriting-list prompt for generating LLM embeddings [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Learning curves comparing the baseline GLM (48 hand-engineered features) against [PITH_FULL_IMAGE:figures/full_fig_p015_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Effect of embedding dimension and reduction method on test deviance across embedding [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Test mean Poisson deviance for two ways of combining Qwen embeddings with the [PITH_FULL_IMAGE:figures/full_fig_p017_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: shows the main effect of fine-tuning for the two strongest adapters in the evaluated grid. Fine-tuning shifts the Qwen PCA curve downward at every tested dimension, with the largest gain at K = 48. The 100,000-pair, two-epoch adapter provides the strongest low-dimensional embedding-only result, while the 200,000-pair, two-epoch adapter gives the best high-dimensional result. Adding the raw GLM covariates t… view at source ↗
Figure 7
Figure 7. Figure 7: Learning curves for Qwen fine-tuning at K = 48. The left panel shows the full range from n = 1,000 to n = 500,000; the right panel zooms in on the stable medium- and large-sample regime. Raw-plus-LoRA appends the 48 GLM covariates to the 48 fine-tuned PCA components. The 100,000-pair and 200,000-pair adapters shown are the two-epoch variants, matching the labelling of [PITH_FULL_IMAGE:figures/full_fig_p02… view at source ↗
Figure 8
Figure 8. Figure 8: Distribution summaries of individual-level predicted frequency shifts for the five prompt [PITH_FULL_IMAGE:figures/full_fig_p031_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: UMAP projection of the Qwen3-Embedding-0.6B policyholder embeddings, colored by [PITH_FULL_IMAGE:figures/full_fig_p032_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Per-band feature profile. Each cell prints the mean of that feature over the policies in the [PITH_FULL_IMAGE:figures/full_fig_p033_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Exposure-weighted claim frequency by band, computed on the full [PITH_FULL_IMAGE:figures/full_fig_p034_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Prose narrative prompt. The prose format ( [PITH_FULL_IMAGE:figures/full_fig_p035_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Minimal key-value prompt. The minimal format ( [PITH_FULL_IMAGE:figures/full_fig_p036_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Task-instruction list prompt. This format ( [PITH_FULL_IMAGE:figures/full_fig_p036_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Qualitative descriptor prompt. This format ( [PITH_FULL_IMAGE:figures/full_fig_p037_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Long underwriting prompt. The long underwriting prompt ( [PITH_FULL_IMAGE:figures/full_fig_p038_16.png] view at source ↗
read the original abstract

Classical actuarial pricing models, such as the generalized linear model, are valued for transparency and ease of governance, but they use interactions among risk factors only when these are supplied through explicit feature engineering. We study whether embeddings from a pre-trained large language model, computed from a natural-language description of each policyholder, can replace hand-crafted features as inputs to a standard actuarial pricing model, taking Poisson claim-frequency regression as the main example. The language model is used only to construct deterministic embedding covariates; pricing is performed by a standard generalized linear model. Using French motor third-party liability data, the embedding-based model outperforms the generalized linear model, especially when data are scarce, whereas at larger sample sizes the comparison is model- and dimension-dependent. Insurance-specific fine-tuning further improves the embeddings, and a prompt-sensitivity diagnostic shows that the pipeline reacts to any appended out-of-template field, making controlled prompts a governance requirement.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes replacing hand-crafted features with deterministic embeddings from a pre-trained LLM applied to natural-language policyholder descriptions as covariates in a standard GLM for Poisson claim-frequency regression. On French motor third-party liability data, the embedding-based GLM outperforms the baseline GLM (especially in low-data regimes), with further gains from insurance-specific fine-tuning; a prompt-sensitivity check indicates the need for controlled prompts.

Significance. If the performance gains are robust and attributable to extractable actuarial signal in the text, the approach offers a way to incorporate unstructured data into transparent, governance-friendly pricing models without abandoning GLMs. The deterministic embedding step and retention of the GLM are strengths that keep the method within existing actuarial workflows. The low-data regime result, if confirmed with proper controls, would be particularly relevant for new lines or small portfolios.

major comments (2)
  1. [Abstract, §3] Abstract and §3 (data description): the central claim that embeddings 'can replace hand-crafted features' and outperform them rests on the assumption that the natural-language descriptions encode risk-relevant information not already captured by the baseline covariates. The manuscript provides no details on whether these descriptions are free-form applicant text, templated conversions of the structured fields, or richer external text; without this, outperformance cannot be unambiguously attributed to semantic extraction rather than implicit non-linear feature engineering or data leakage.
  2. [§4] §4 (experimental results): the abstract states outperformance 'especially when data are scarce' and 'model- and dimension-dependent' at larger sizes, yet the provided text contains no quantitative metrics, confidence intervals, data-split details, or statistical significance tests. This absence makes the load-bearing empirical claim unverifiable from the given information and requires explicit tables or figures with effect sizes.
minor comments (2)
  1. [Abstract] Abstract: the claim of outperformance should be accompanied by at least one quantitative metric (e.g., relative improvement in Poisson deviance or log-likelihood) even in the abstract.
  2. [§2] Notation: clarify whether the GLM is fit on the raw embeddings or on a reduced-dimensional projection, and state the exact dimension choice and any regularization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important areas for clarification and strengthening of the empirical presentation. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract, §3] Abstract and §3 (data description): the central claim that embeddings 'can replace hand-crafted features' and outperform them rests on the assumption that the natural-language descriptions encode risk-relevant information not already captured by the baseline covariates. The manuscript provides no details on whether these descriptions are free-form applicant text, templated conversions of the structured fields, or richer external text; without this, outperformance cannot be unambiguously attributed to semantic extraction rather than implicit non-linear feature engineering or data leakage.

    Authors: We agree that the source and construction of the natural-language descriptions must be specified to support attribution of performance gains to semantic content rather than leakage or implicit engineering. In the revised manuscript we will expand the data description in §3 to state explicitly how the descriptions are obtained (free-form applicant text versus any templating or conversion from structured fields) and will add a short discussion of how this affects interpretation of the results. revision: yes

  2. Referee: [§4] §4 (experimental results): the abstract states outperformance 'especially when data are scarce' and 'model- and dimension-dependent' at larger sizes, yet the provided text contains no quantitative metrics, confidence intervals, data-split details, or statistical significance tests. This absence makes the load-bearing empirical claim unverifiable from the given information and requires explicit tables or figures with effect sizes.

    Authors: We accept that the current manuscript version does not present the quantitative results with sufficient detail for independent verification. In the revision we will add tables and figures in §4 that report the relevant performance metrics together with confidence intervals, explicit data-split and cross-validation procedures, and statistical significance tests comparing the embedding-based and baseline GLMs. revision: yes

Circularity Check

0 steps flagged

No circularity: direct empirical comparison of covariate sets in standard GLM

full rationale

The paper's central claim is an empirical performance comparison: LLM embeddings computed from natural-language policy descriptions are substituted as covariates into an otherwise standard Poisson GLM and evaluated against a baseline GLM using hand-crafted features on French motor data. No equations, uniqueness theorems, or predictions are derived; the pipeline is a deterministic embedding step followed by off-the-shelf GLM fitting and out-of-sample evaluation. No self-citations are load-bearing for the result, no fitted parameters are relabeled as predictions, and no ansatz is smuggled via prior work. The derivation chain is self-contained against external benchmarks and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that LLM embeddings preserve actuarial signal from text without post-hoc selection or bias; no free parameters, axioms, or invented entities are explicitly introduced beyond standard GLM and pre-trained model usage.

pith-pipeline@v0.9.1-grok · 5676 in / 1092 out tokens · 23533 ms · 2026-06-30T02:10:17.663787+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

72 extracted references · 29 canonical work pages · 6 internal anchors

  1. [1]

    British Actuarial Journal , volume =

    Balona, Caesar , year =. British Actuarial Journal , volume =. doi:10.1017/S1357321724000102 , urldate =

  2. [2]

    Annals of Actuarial Science , volume =

    Richman, Ronald , year =. Annals of Actuarial Science , volume =. doi:10.1017/S1748499520000238 , urldate =

  3. [3]

    Annals of Actuarial Science , volume =

    Richman, Ronald , year =. Annals of Actuarial Science , volume =. doi:10.1017/S174849952000024X , urldate =

  4. [4]

    2023 , journal =

    Non-Life Insurance Risk Classification Using Categorical Embedding , author =. 2023 , journal =. doi:10.1080/10920277.2022.2123361 , urldate =

  5. [5]

    2024 , month = dec, journal =

    Enhancing Actuarial Non-Life Pricing Models via Transformers , author =. 2024 , month = dec, journal =. doi:10.1007/s13385-024-00388-2 , urldate =

  6. [6]

    2021 , month = jan, journal =

    Machine Learning in. 2021 , month = jan, journal =. doi:10.3390/risks9010004 , urldate =

  7. [7]

    arXiv preprint arXiv:2102.05784 , year =

    Rethinking Representations in. arXiv preprint arXiv:2102.05784 , year =

  8. [8]

    2024 , journal =

    High-cardinality categorical covariates in network regressions , author =. 2024 , journal =. doi:10.1007/s42081-024-00243-4 , urldate =

  9. [9]

    Avanzi, Benjamin and Taylor, Greg and Wang, Melantha and Wong, Bernard , year =. Machine. ASTIN Bulletin: The Journal of the IAA , volume =. doi:10.1017/asb.2024.7 , urldate =

  10. [10]

    Lee, Jinhyuk and Dai, Zhuyun and Ren, Xiaoqi and Chen, Blair and Cer, Daniel and Cole, Jeremy R. and Hui, Kai and Boratko, Michael and Kapadia, Rajvi and Ding, Wen and Luan, Yi and Duddu, Sai Meher Karthik and Abrego, Gustavo Hernandez and Shi, Weiqiang and Gupta, Nithi and Kusupati, Aditya and Jain, Prateek and Jonnalagadda, Siddhartha Reddy and Chang, M...

  11. [11]

    Gemini Embedding: Generalizable Embeddings from Gemini

    Lee, Jinhyuk and Chen, Feiyang and Dua, Sahil and Cer, Daniel and Shanbhogue, Madhuri and Naim, Iftekhar and. Gemini Embedding: Generalizable Embeddings from. arXiv preprint arXiv:2503.07891 , year =

  12. [12]

    Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

    Qwen3 Embedding: Advancing Text Embedding and Reranking through Foundation Models , author =. arXiv preprint arXiv:2506.05176 , year =

  13. [13]

    Text and Code Embeddings by Contrastive Pre-Training

    Text and Code Embeddings by Contrastive Pre-Training , author =. arXiv preprint arXiv:2201.10005 , year =

  14. [14]

    Attention Is All You Need , booktitle =

    Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser,. Attention Is All You Need , booktitle =

  15. [15]

    ASTIN Bulletin , volume =

    Geographic Ratemaking with Spatial Embeddings , author =. ASTIN Bulletin , volume =. doi:10.1017/asb.2021.25 , urldate =

  16. [16]

    ASTIN Bulletin: The Journal of the IAA , volume =

    A Representation-Learning Approach for Insurance Pricing with Images , author =. ASTIN Bulletin: The Journal of the IAA , volume =. doi:10.1017/asb.2024.9 , urldate =

  17. [17]

    arXiv preprint arXiv:2511.17954 , doi =

    A Multi-View Contrastive Learning Framework for Spatial Embeddings in Risk Modelling , author =. arXiv preprint arXiv:2511.17954 , doi =

  18. [18]

    North American Actuarial Journal , doi =

    Neural Networks for Insurance Pricing with Frequency and Severity Data: A Benchmark Study from Data Preprocessing to Technical Tariff , author =. North American Actuarial Journal , doi =

  19. [19]

    ASTIN Bulletin , volume =

    Actuarial Applications of Word Embedding Models , author =. ASTIN Bulletin , volume =. doi:10.1017/asb.2019.28 , urldate =

  20. [20]

    doi:10.1016/j.insmatheco.2022.07.013 , urldate =

    Xu, Shuzhe and Zhang, Chuanlong and Hong, Don , year = 2022, month = nov, journal =. doi:10.1016/j.insmatheco.2022.07.013 , urldate =

  21. [21]

    2021 , journal =

    Mining Actuarial Risk Predictors in Accident Descriptions Using Recurrent Neural Networks , author =. 2021 , journal =. doi:10.3390/risks9010007 , urldate =

  22. [22]

    Dong, Panyi and Quan, Zhiyu , journal =

  23. [23]

    Lee, Chankyu and Roy, Rajarshi and Xu, Mengyao and Raiman, Jonathan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei , journal =

  24. [24]

    Babakhin, Yauhen and Osmulski, Radek and Ak, Ronay and Moreira, Gabriel and Xu, Mengyao and Schifferer, Benedikt and Liu, Bo and Oldridge, Even , journal =. Llama-

  25. [25]

    Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin...

  26. [26]

    2023 , journal =

    The use of autoencoders for training neural networks with mixed categorical and numerical features , author =. 2023 , journal =. doi:10.1017/asb.2023.15 , urldate =

  27. [27]

    Text Mining in Insurance: From Unstructured Data to Meaning , shorttitle =

    Zappa, Diego and Borrelli, Mattia and Clemente, Gian Paolo and Savelli, Nino , year = 2021, journal =. Text Mining in Insurance: From Unstructured Data to Meaning , shorttitle =

  28. [28]

    Annals of Actuarial Science , volume =

    On Clustering Levels of a Hierarchical Categorical Risk Factor , author =. Annals of Actuarial Science , volume =. doi:10.1017/S1748499523000283 , urldate =

  29. [29]

    Advanced Applications of Generative

    Hatzesberger, Simon and Nonneman, Iris , journal =. Advanced Applications of Generative

  30. [30]

    arXiv preprint arXiv:2206.02014 , year =

    Actuarial Applications of Natural Language Processing Using Transformers , author =. arXiv preprint arXiv:2206.02014 , year =

  31. [31]

    Operationalizing

    Balona, Caesar , year = 2025, pages =. Operationalizing

  32. [32]

    Hegselmann, Stefan and Buendia, Alejandro and Lang, Hunter and Agrawal, Monica and Jiang, Xiaoyi and Sontag, David , booktitle=. Tab. 2023 , publisher=

  33. [33]

    Dinh, Tuan and Zeng, Yuchen and Zhang, Ruisu and Lin, Ziqian and Gira, Michael and Rajput, Shashank and Sohn, Jy-yong and Papailiopoulos, Dimitris and Lee, Kangwook , booktitle=

  34. [34]

    Large Language Models (

    Fang, Xi and Xu, Weijie and Tan, Fiona Anting and Zhang, Jiani and Hu, Ziqing and Qi, Yanjun and Nickleach, Scott and Sber, Diego and Gorbachev, Artem and Hou, Ellie , journal=. Large Language Models (

  35. [35]

    , title =

    Ono, Kyoka and Lee, Simon A. , title =. Proceedings of the 41st International Conference on Machine Learning,. 2024 , eprint =

  36. [36]

    Koloski, Boshko and Perdih, Timen and Pollak, Senja , journal=

  37. [37]

    Villalobos Carballo, Karina and Joshi, Shaan and Ren, Haoran and Lanusse, Fran. Tab. arXiv preprint arXiv:2206.10381 , year=

  38. [38]

    Enriching Tabular Data with Contextual

    Kasneci, Enkelejda and Kasneci, Gjergji , journal=. Enriching Tabular Data with Contextual

  39. [39]

    Latte: Transferring

    Shi, Han and Gao, Jiahui and Xu, Hang and Liang, Xiaodan and Li, Zhenguo , journal=. Latte: Transferring

  40. [40]

    arXiv preprint arXiv:2406.12031 , year=

    Large Scale Transfer Learning for Tabular Data via Language Modeling , author=. arXiv preprint arXiv:2406.12031 , year=

  41. [41]

    arXiv preprint arXiv:2602.15844 , year=

    Language Model Representations for Efficient Few-Shot Tabular Classification , author=. arXiv preprint arXiv:2602.15844 , year=

  42. [42]

    Wang, Ruijie and Wang, Yumo and Li, Ondrej , journal=. Uni

  43. [43]

    arXiv preprint arXiv:2310.07338 , year=

    From Supervised to Generative: A Novel Paradigm for Tabular Deep Learning with Large Language Models , author=. arXiv preprint arXiv:2310.07338 , year=

  44. [44]

    Unlock the Potential of Large Language Models for Predictive Tabular Tasks in Data Science with Table-Specific Pretraining

    Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science , author=. arXiv preprint arXiv:2403.20208 , year=

  45. [45]

    Haque, Radiah and Goh, Hui-Ngo and Ting, Choo-Yee and Quek, Albert and Hasan, M. D. Rakibul , title =. Computers and Education: Artificial Intelligence , year =

  46. [46]

    Large Language Models for Automated Data Science: Introducing

    Hollmann, Noah and M. Large Language Models for Automated Data Science: Introducing. NeurIPS 2023 Workshop on Synthetic Data Generation with Generative AI , year=

  47. [47]

    Proceedings of the 41st International Conference on Machine Learning , year=

    Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning , author=. Proceedings of the 41st International Conference on Machine Learning , year=

  48. [48]

    , title =

    Abhyankar, Nikhil and Shojaee, Parshin and Reddy, Chandan K. , title =. 2025 , eprint =

  49. [49]

    Nature , volume=

    Accurate Predictions on Small Data with a Tabular Foundation Model , author=. Nature , volume=. 2025 , publisher=

  50. [50]

    Ma, Junwei and Nie, Valentin Thomas and Ri, Taro and Dyer, Chris , journal=. Tab

  51. [51]

    Huang, Xin and Khetan, Ashish and Cella, Milan and Dhir, Sarthak , booktitle=. Tab

  52. [52]

    Advances in Neural Information Processing Systems , volume=

    Revisiting Deep Learning Models for Tabular Data , author=. Advances in Neural Information Processing Systems , volume=

  53. [53]

    Advances in Neural Information Processing Systems , volume=

    Why Do Tree-Based Models Still Outperform Deep Learning on Typical Tabular Data? , author=. Advances in Neural Information Processing Systems , volume=

  54. [54]

    Wang, Zifeng and Sun, Jimeng , journal=. Trans

  55. [55]

    Kim, Myung Jun and Feuerriegel, Stefan and Hatt, Tobias , journal=

  56. [56]

    Sentence-

    Reimers, Nils and Gurevych, Iryna , booktitle=. Sentence-

  57. [57]

    Text Embeddings by Weakly-Supervised Contrastive Pre-training

    Text Embeddings by Weakly-Supervised Contrastive Pre-training , author=. arXiv preprint arXiv:2212.03533 , year=

  58. [58]

    Findings of the Association for Computational Linguistics: ACL 2023 , pages=

    One Embedder, Any Task: Instruction-Finetuned Text Embeddings , author=. Findings of the Association for Computational Linguistics: ACL 2023 , pages=

  59. [59]

    Advances in Neural Information Processing Systems , volume=

    Matryoshka Representation Learning , author=. Advances in Neural Information Processing Systems , volume=

  60. [60]

    , journal=

    Huang, Xiang and Peng, Hao and Zou, Dongcheng and Liu, Zhiwei and Li, Jianxin and Liu, Kay and Wu, Jia and Su, Jianlin and Yu, Philip S. , journal=. Co

  61. [61]

    and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu , journal=

    Hu, Edward J. and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu , journal=

  62. [62]

    Liu, Shih-Yang and Wang, Chien-Yi and Yin, Hongxu and Molchanov, Pavlo and Wang, Yu-Chiang Frank and Cheng, Kwang-Ting and Chen, Min-Hung , journal=

  63. [63]

    Available at SSRN 3491790 , year=

    From Generalized Linear Models to Neural Networks, and Back , author=. Available at SSRN 3491790 , year=

  64. [64]

    2023 , publisher=

    Statistical Foundations of Actuarial Learning and its Applications , author=. 2023 , publisher=

  65. [65]

    Entity Embeddings of Categorical Variables

    Entity Embeddings of Categorical Variables , author=. arXiv preprint arXiv:1604.06737 , year=

  66. [66]

    arXiv preprint arXiv:1910.03072 , year=

    Sequence Embeddings Help to Identify Fraudulent Cases in Healthcare Insurance , author=. arXiv preprint arXiv:1910.03072 , year=

  67. [67]

    Dutang, Christophe and Charpentier, Arthur , journal=. fre. 2020 , note=

  68. [68]

    IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

    Representation Learning: A Review and New Perspectives , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

  69. [69]

    Enhancing Auto Insurance Risk Evaluation with Transformer and

    Sun, Fengyi and Chen, Rui and Wang, Yanyan , journal=. Enhancing Auto Insurance Risk Evaluation with Transformer and

  70. [70]

    arXiv preprint , year=

    Large Language Models for Insurance Intelligence , author=. arXiv preprint , year=

  71. [71]

    arXiv preprint , year=

    Assessing Insurers' Litigation Risk: Claim Dispute Prediction with Actionable Interpretations Using Machine Learning , author=. arXiv preprint , year=

  72. [72]

    Proceedings of the Fourth ACM International Conference on AI in Finance , pages=

    Large language models in finance: A survey , author=. Proceedings of the Fourth ACM International Conference on AI in Finance , pages=