Design, Results and Industry Implications of the World’s First Insurance Large Language Model Evaluation Benchmark (CUFEInse)

Hua Zhou et al. · 2025 · arXiv 2511.07794

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

ActuBench: A Multi-Agent LLM Pipeline for Generation and Evaluation of Actuarial Reasoning Tasks

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

ActuBench is a multi-agent LLM pipeline for generating and evaluating actuarial reasoning tasks, with evaluations of 50 models showing effective verification, competitive local open-weights models, and differing rankings between MCQ and LLM-judge scoring.

citing papers explorer

Showing 1 of 1 citing paper.

ActuBench: A Multi-Agent LLM Pipeline for Generation and Evaluation of Actuarial Reasoning Tasks

cs.AI · 2026-04-22 · unverdicted · none · ref 28
