Evalet applies functional fragmentation to deliver fragment-level qualitative analysis of LLM evaluations, with a user study showing 48% more misalignment detections than holistic scoring.
5 Pith papers cite this work.
Citation roles: background (2)
Representative citing papers:
A survey of 457 papers yields a six-dimensional design space for abstraction in interactive systems that reframes gulfs of execution and evaluation while articulating cognitive and design processes for bridging abstraction gaps.
An LLM-in-the-loop study with 17 interviewers identifies five ethical concerns with AI-generated follow-up questions and translates them into design and governance implications.
User study finds that task difficulty affects keystroke dynamics during LLM prompting as a marker of cognitive effort, while device type has weaker effects and keystrokes do not predict perceived output usefulness.
Position paper proposing Model Science as a discipline to systematically analyze AI model behavior beyond benchmarks, drawing analogies from cognitive science, neuroscience, medicine, and agriculture.
Evalet: Evaluating Large Language Models through Functional Fragmentation