Qualitative study of 19 practitioners reveals ten LLM product evaluation practices and introduces the results-actionability gap as a key barrier to turning findings into improvements.
Zamfirescu-Pereira, Björn Hartmann, Aditya G
3 Pith papers cite this work.
Citing papers by year: 2026 (3)
Citing papers
- Results-Actionability Gap: Understanding How Practitioners Evaluate LLM Products in the Wild
Qualitative study of 19 practitioners reveals ten LLM product evaluation practices and introduces the results-actionability gap as a key barrier to turning findings into improvements.
- Judging the Judges: A Systematic Evaluation of Bias Mitigation Strategies in LLM-as-a-Judge Pipelines
Style bias dominates LLM-as-a-Judge systems far more than position bias, with debiasing strategies providing model-dependent gains and public tools released for replication.
- Enhanced Self-Learning with Epistemologically-Informed LLM Dialogue
CausaDisco integrates Aristotle's Four Causes into LLM prompts to produce more engaging, exploratory, and multifaceted self-learning dialogues, as evidenced by controlled user studies.