Computing inter-rater reliability and its variance in the presence of high agreement
2 Pith papers cite this work.
Citing papers by year: 2026 (2)
Citing papers
- WASIL: In-the-Wild Arabic Spoken Interactions with LLMs
  WASIL is a released dataset of 8,529 in-the-wild Arabic spoken LLM interactions with audio, ASR hypotheses, responses, explicit like/dislike feedback, answerability annotations, a 2,000-turn MSA and dialect test set, and a reference-free multi-judge LLM evaluation method.
- Linking Behaviour and Perception to Evaluate Meaningful Human Control over Partially Automated Driving
  Simulator experiments revealed correlations between steering conflicts, reaction times, and drivers' sense of control in partial automation, highlighting design needs for better intention alignment and intervention ease.