Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis

Andrea Sikora; Huaqin Zhao; Peng Shu; Shaochen Xu; Sheng Li; Tianming Liu; Wenxiong Liao; Xiang Li; Zhengliang Liu; Zihao Wu

arxiv: 2402.11398 · v2 · pith:CGKKVPBL · submitted 2024-02-17 · 💻 cs.CL · cs.AI

Shaochen Xu, Zihao Wu, Huaqin Zhao, Peng Shu, Zhengliang Liu, Wenxiong Liao, Sheng Li, Andrea Sikora, Tianming Liu, Xiang Li

keywords: similarity, analysis, metrics, semantic, text, framework, specialized, data

Abstract

In this study, we leverage large language models (LLMs) to enhance the semantic analysis of texts and to develop text similarity metrics, addressing the limitations of traditional unsupervised NLP metrics such as ROUGE and BLEU. We develop a framework in which LLMs such as GPT-4 are employed for zero-shot text identification and label generation for radiology reports; the generated labels are then used to measure text similarity. Testing the proposed framework on MIMIC data, we find that GPT-4-generated labels significantly improve semantic similarity assessment, yielding scores more closely aligned with the clinical ground truth than those of traditional NLP metrics. Our work demonstrates the possibility of conducting semantic analysis of text data in highly specialized domains using semi-quantitative reasoning results produced by LLMs. Although the framework is implemented for radiology report similarity analysis, its concept can be extended to other specialized domains.
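The abstract does not specify how the generated labels are turned into a similarity score. The sketch below assumes one simple option, Jaccard similarity over normalized label sets, and contrasts it with a simplified ROUGE-1-style unigram overlap on the raw text. The two example reports and their label lists are hypothetical; the labels are hard-coded stand-ins for what an LLM such as GPT-4 might return (no model is called), and the function names `unigram_overlap_f1` and `label_jaccard` are illustrative, not taken from the paper.

```python
import re
from typing import Iterable


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens; a deliberately simple tokenizer (assumption)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def unigram_overlap_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1-style F1 on unique tokens (no stemming, no token counts)."""
    ref, cand = tokens(reference), tokens(candidate)
    common = ref & cand
    if not common:
        return 0.0
    precision = len(common) / len(cand)
    recall = len(common) / len(ref)
    return 2 * precision * recall / (precision + recall)


def label_jaccard(labels_a: Iterable[str], labels_b: Iterable[str]) -> float:
    """Jaccard similarity between two sets of normalized labels (assumed metric)."""
    a = {label.strip().lower() for label in labels_a}
    b = {label.strip().lower() for label in labels_b}
    if not a and not b:
        return 1.0  # two empty label sets are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical reports describing the same findings with different wording.
report_a = "No pleural effusion. Mild cardiomegaly is present."
report_b = "Heart size is mildly enlarged without any fluid in the pleural space."

# Hypothetical labels standing in for zero-shot LLM output.
labels_a = ["cardiomegaly", "no pleural effusion"]
labels_b = ["Cardiomegaly", "no pleural effusion"]

print(f"unigram F1:    {unigram_overlap_f1(report_a, report_b):.3f}")  # 0.211
print(f"label Jaccard: {label_jaccard(labels_a, labels_b):.3f}")       # 1.000
```

In this toy case the surface-overlap score is low because the reports share only two tokens ("is", "pleural"), while the label-based score is maximal because both reports map to the same findings. This mirrors the motivation stated in the abstract; the actual metric and label schema used in the paper may differ.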

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
cs.CL 2026-05 unverdicted novelty 8.0

REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...

TelcoAgent-Bench: A Multilingual Benchmark for Telecom AI Agents
cs.CL 2026-03 unverdicted novelty 7.0

TelcoAgent-Bench is a new framework that evaluates how well multilingual LLM agents recognize intents, execute troubleshooting steps, and stay consistent across variations in telecom scenarios.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
cs.CL 2026-05 unverdicted novelty 6.0

REALISTA generates semantically coherent adversarial prompts via latent-space optimization over input-dependent editing directions, achieving stronger hallucination elicitation than prior realistic attacks on open-sou...