pith. machine review for the scientific record.

arxiv: 2508.16165 · v2 · submitted 2025-08-22 · 💻 cs.SE · cs.AI · cs.HC

Recognition: unknown

Investigating Multimodal Large Language Models to Support Usability Evaluation

Authors on Pith no claims yet
classification 💻 cs.SE cs.AI cs.HC
keywords usability · evaluation · mllms · support · issues · language · large · models
abstract

Usability evaluation is an essential method to support the design of effective and intuitive user interfaces (UIs). However, it commonly relies on resource-intensive, expert-driven methods, which limits its accessibility, especially for small organizations. Recent multimodal large language models (MLLMs) have the potential to support usability evaluation by analyzing textual instructions together with visual UI context. This paper investigates the use of MLLMs as assistive tools for usability evaluation by framing the task as a prioritization problem: the approach identifies and explains usability issues and ranks them by severity. We report a study that compares the evaluations generated by multiple MLLMs with assessments from usability experts. The results demonstrate that MLLMs can offer complementary insights and support the efficient prioritization of critical issues. Additionally, we present an interactive visualization tool that enables the transparent review and validation of model-generated findings. Based on this, we outline concepts for integrating MLLM-based usability evaluation into real-world development workflows.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Recommending Usability Improvements with Multimodal Large Language Models

    cs.SE · 2026-04 · unverdicted · novelty 6.0

    Multimodal LLMs can detect usability issues from screen recordings, explain them via Nielsen's heuristics, and rank improvement recommendations, with engineer feedback indicating practical usefulness for teams lacking...