From Prompt Optimization to Multi-Dimensional Credibility Evaluation: Enhancing Trustworthiness of Chinese LLM-Generated Liver MRI Reports -- with Preliminary Extension to Lung Cancer

Jie Cheng; Qiuli Wang; Wei Chen; Xiaoming Li; Xingpeng Zhang; Xinhuang Sun; Yonglin Chen; Yongxu Liu

arxiv: 2510.23008 · v3 · pith:2IAFQWL7new · submitted 2025-10-27 · 💻 cs.AI


Qiuli Wang, Xinhuang Sun, Yonglin Chen, Jie Cheng, Yongxu Liu, Xingpeng Zhang, Xiaoming Li, Wei Chen

classification 💻 cs.AI

keywords: framework, llm-generated, prompt, reports, trustworthiness, credibility, guidance, liver


Large language models (LLMs) have demonstrated promising performance in generating diagnostic conclusions from imaging findings, thereby supporting radiology reporting, trainee education, and quality control. However, systematic guidance on how to optimize prompt design across different clinical contexts remains underexplored. Moreover, a comprehensive and standardized framework for assessing the trustworthiness of LLM-generated radiology reports is yet to be established. This study aims to enhance the trustworthiness of LLM-generated liver MRI reports by introducing a Multi-Dimensional Credibility Assessment (MDCA) framework and providing guidance on institution-specific prompt optimization. The proposed framework is applied to evaluate and compare the performance of several advanced LLMs, including Kimi-K2-Instruct-0905, Qwen3-235B-A22B-Instruct-2507, DeepSeek-V3, and ByteDance-Seed-OSS-36B-Instruct, using the SiliconFlow platform.
