arXiv preprint arXiv:2409.11704
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Fully Automated Identification of Lexical Alignment and Preference-Stage Shifts in Large Language Models
  Introduces Lexical Alignment Score and Triangulated Preference Shift metrics to automatically identify lexical overuse in LLMs and attribute portions to preference learning stages via windowed prevalence on PubMed data.
- Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning
  Introduces a triangulation-based metric to quantify lexical shifts attributable to preference tuning without requiring manual curation of examples.