A cross-domain evaluation pipeline for AI meeting summaries finds no statistically significant accuracy differences among GPT models but shows gpt-5.1 leading on retention metrics like completeness (0.886) and coverage (0.942).
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Evaluating AI Meeting Summaries with a Reusable Cross-Domain Pipeline
A cross-domain evaluation pipeline for AI meeting summaries finds no statistically significant accuracy differences among GPT models but shows gpt-5.1 leading on retention metrics like completeness (0.886) and coverage (0.942).