GPT-4o identified only 21.2% of the usability issues that human experts found in heuristic evaluation. It also surfaced 27 additional issues, struggled with certain heuristics, and generated false positives.
arXiv preprint arXiv:2305.13014 (2023)
2 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Can GPT-4o Evaluate Usability Like Human Experts? A Comparative Study on Issue Identification in Heuristic Evaluation
  GPT-4o identified only 21.2% of the usability issues that human experts found in heuristic evaluation. It also surfaced 27 additional issues, struggled with certain heuristics, and generated false positives.
- A Computational Method for Measuring "Open Codes" in Qualitative Analysis
  A method merges codebooks via an LLM and evaluates human and AI inductive coding with four new metrics on an online conversation dataset.