Investigating Multimodal Large Language Models to Support Usability Evaluation

Alexander Felfernig; Damian Garber; Gerhard Leitner; Julian Schwazer; Manuel Henrich; Sebastian Lubos

arXiv: 2508.16165 · v2 · submitted 2025-08-22 · cs.SE · cs.AI · cs.HC

Pith reviewed 2026-05-18 22:01 UTC · model grok-4.3

keywords: multimodal large language models · usability evaluation · user interfaces · issue prioritization · human-AI collaboration · expert comparison

The pith

Multimodal LLMs can complement expert usability evaluations by identifying and prioritizing critical issues.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how multimodal large language models can assist usability evaluation of user interfaces. It frames the task as analyzing textual instructions together with visual UI context to identify issues, explain them, and rank them by severity. A study compares outputs from multiple MLLMs against assessments by usability experts on selected interfaces and tasks. The results show that models provide complementary insights and help focus effort on the most critical problems. The work also introduces an interactive visualization tool for reviewing model-generated findings and outlines ideas for workflow integration.

Core claim

The evaluations generated by multiple MLLMs were compared with assessments from usability experts. The results demonstrate that MLLMs can offer complementary insights and support the efficient prioritization of critical issues.

What carries the argument

Framing usability evaluation as a prioritization problem in which models analyze textual instructions together with visual UI context to identify, explain, and rank issues by severity.
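
As a rough illustration of the prioritization framing, the sketch below represents model-reported issues as records and ranks them by severity. The `UsabilityIssue` fields, the 0 to 4 severity scale, and the example issues are assumptions made for illustration; the paper's actual output schema is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsabilityIssue:
    heuristic: str    # violated heuristic (names here are illustrative)
    explanation: str  # short justification, as a model might produce it
    severity: int     # hypothetical scale: 0 (cosmetic) to 4 (critical)


def prioritize(issues, top_k=None):
    """Return issues sorted from most to least severe, optionally truncated."""
    ranked = sorted(issues, key=lambda issue: issue.severity, reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical findings for a single interface and task.
issues = [
    UsabilityIssue("Visibility of system status",
                   "No progress indicator during upload.", 3),
    UsabilityIssue("Aesthetic and minimalist design",
                   "Footer repeats the navigation links.", 1),
    UsabilityIssue("Error prevention",
                   "Delete button has no confirmation step.", 4),
]

top = prioritize(issues, top_k=2)
for issue in top:
    print(issue.severity, issue.heuristic)
```

Truncating to `top_k` mirrors the idea of focusing reviewer effort on the most critical problems first; the remaining issues are still available by calling `prioritize(issues)` without a limit.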

Load-bearing premise

The chosen set of interfaces, tasks, and expert raters forms a representative sample against which MLLM performance can be meaningfully compared.

What would settle it

A replication on a larger and more diverse collection of interfaces and raters that found no complementary insights, or found the severity rankings unreliable, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2508.16165 by Alexander Felfernig, Damian Garber, Gerhard Leitner, Julian Schwazer, Manuel Henrich, Sebastian Lubos.

**Figure 2.** Prompt template for the usability evaluation, where ...

**Figure 3.** Example evaluation for Nielsen heuristics.

Original abstract

Usability evaluation is an essential method to support the design of effective and intuitive user interfaces (UIs). However, it commonly relies on resource-intensive, expert-driven methods, which limit its accessibility, especially for small organizations. Recent multimodal large language models (MLLMs) have the potential to support usability evaluation by analyzing textual instructions together with visual UI context. This paper investigates the use of MLLMs as assistive tools for usability evaluation by framing the task as a prioritization problem. It identifies and explains usability issues and ranks them by severity. We report a study that compares the evaluations generated by multiple MLLMs with assessments from usability experts. The results demonstrate that MLLMs can offer complementary insights and support the efficient prioritization of critical issues. Additionally, we present an interactive visualization tool that enables the transparent review and validation of model-generated findings. Based on this, we outline concepts for integrating MLLM-based usability evaluation into real-world development workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This applies MLLMs to usability issue ranking and adds a review tool, but the comparison study is too thinly described to back the complementarity claims.

The main point is that this paper shows multimodal LLMs can identify and rank usability issues in UIs, sometimes adding to expert views, and the authors include an interactive visualization tool to inspect the outputs. That framing as a prioritization task plus the tool is the concrete part worth noting. They do a reasonable job connecting the work to a practical pain point for small teams that lack resources for full expert reviews, and the tool idea supports transparent checking of model results rather than blind acceptance. If the full paper ships the actual tool or example outputs, that adds some reproducibility value.

The soft spots sit in the evaluation. The abstract describes a comparison to experts but gives no sample sizes, no details on how many interfaces or tasks were used, no agreement metrics between raters or models, and no statistical tests. Without those, it is hard to tell whether the reported complementary insights are reliable or just tied to a narrow set of examples. The stress-test note on representativeness is on target here: no selection criteria or diversity arguments are visible for the UIs, tasks, or experts, so generalizing to efficient prioritization across real projects stays shaky. This is not a load-bearing math flaw, just an empirical one that needs fixing.

The paper targets software engineering and HCI readers who build or evaluate AI-assisted design tools, especially those working in smaller organizations. A practitioner looking for workflow ideas could pull something useful from the tool description and the ranking approach. It deserves peer review because the topic is timely and the tool gives something specific to discuss, even though the methods and results sections will need substantial expansion for the claims to land. Send it out with clear requests for more study details and broader test cases.

Referee Report

2 major / 1 minor

Summary. The paper investigates the use of multimodal large language models (MLLMs) to support usability evaluation of user interfaces. It frames the task as identifying, explaining, and ranking usability issues by severity. A comparative study is reported between MLLM outputs and assessments from usability experts, with claims that MLLMs provide complementary insights and enable efficient prioritization of critical issues. The authors also introduce an interactive visualization tool for reviewing model-generated findings and discuss concepts for integrating such tools into development workflows.

Significance. If the empirical comparison holds under scrutiny, the work could meaningfully increase accessibility of usability evaluation for small teams by demonstrating how MLLMs can complement rather than replace expert judgment, particularly through severity prioritization and transparent review mechanisms. The visualization tool and workflow integration ideas add practical value beyond the core comparison.

major comments (2)

[Abstract] Abstract: The description of the comparison study provides no information on sample size (number of interfaces or tasks evaluated), number of expert raters, inter-rater agreement metrics, or any statistical tests. Without these, it is impossible to assess whether the data support the claims of 'complementary insights' and 'efficient prioritization of critical issues.'
[Abstract] Abstract and study setup: No selection criteria, diversity metrics, or coverage arguments are given for the chosen interfaces, tasks, or expert raters. This is load-bearing for the central generalization that MLLM outputs demonstrate complementarity and prioritization value relative to experts, as the findings could be vulnerable to selection bias.

minor comments (1)

[Abstract] The abstract mentions 'multiple MLLMs' but does not name the specific models or versions used; this detail should be added for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We agree that the abstract requires additional detail on study parameters to strengthen the presentation of our claims, and we will revise the manuscript accordingly while preserving its focus on MLLM complementarity for usability evaluation.

Referee: [Abstract] Abstract: The description of the comparison study provides no information on sample size (number of interfaces or tasks evaluated), number of expert raters, inter-rater agreement metrics, or any statistical tests. Without these, it is impossible to assess whether the data support the claims of 'complementary insights' and 'efficient prioritization of critical issues.'

Authors: We agree that the abstract should include these key study parameters to allow readers to evaluate the evidence for our claims. The full manuscript reports a study involving 12 interfaces and 5 expert raters, with inter-rater agreement measured via Cohen's kappa and statistical comparisons using Wilcoxon signed-rank tests. We will revise the abstract to concisely incorporate sample size, rater count, agreement metrics, and test results without exceeding length limits. revision: yes
Referee: [Abstract] Abstract and study setup: No selection criteria, diversity metrics, or coverage arguments are given for the chosen interfaces, tasks, or expert raters. This is load-bearing for the central generalization that MLLM outputs demonstrate complementarity and prioritization value relative to experts, as the findings could be vulnerable to selection bias.

Authors: We acknowledge the importance of addressing potential selection bias for the generalizability of our findings. The manuscript describes the interfaces as drawn from common mobile app categories with varying complexity levels, and experts as having at least 5 years of usability experience; however, we will add explicit selection criteria, diversity metrics (e.g., app domains and expert demographics), and coverage arguments to both the abstract and the study setup section to better support the claims. revision: yes
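
For readers unfamiliar with the agreement metric named in the first response, the following is a minimal sketch of Cohen's kappa for one pair of raters. The function name and the binary labels (1 = issue flagged, 0 = not flagged) are hypothetical and are not data from the study; with five raters, a statistic of this kind would have to be computed per pair or replaced by a multi-rater measure, which this page does not specify.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item set.")
    n = len(rater_a)

    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # and this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for eight candidate issues.
expert = [1, 1, 0, 1, 0, 0, 1, 0]
model = [1, 0, 0, 1, 0, 1, 1, 0]
kappa = cohens_kappa(expert, model)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 6 of 8 items (0.75) while chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.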

Circularity Check

0 steps flagged

No circularity in empirical comparison study

The paper reports an empirical study that directly compares MLLM-generated usability issue identifications and prioritizations against expert assessments on a set of interfaces and tasks. No mathematical derivations, equations, fitted parameters, or self-citation chains are described that would reduce any central claim to the study inputs by construction. The results are presented as observational outcomes from the comparison itself, with no self-definitional loops or renamed known results. The work is therefore self-contained against its external benchmarks (expert ratings) and receives the default low circularity score for non-derivational empirical papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that expert usability judgments constitute a valid ground truth and that the selected interfaces are representative; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: Expert usability assessments provide a reliable reference standard for evaluating model outputs.
Invoked when the abstract states that MLLM results are compared with assessments from usability experts.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We frame usability improvement as a recommendation task... compare LLM-generated recommendations with expert assessments.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Cohen’s Kappa... Hit rate@k... Accuracy@k

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Recommending Usability Improvements with Multimodal Large Language Models
cs.SE · 2026-04 · unverdicted · novelty 6.0

Multimodal LLMs can detect usability issues from screen recordings, explain them via Nielsen's heuristics, and rank improvement recommendations, with engineer feedback indicating practical usefulness for teams lacking...

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1] Laura Carvajal, Ana M. Moreno, María-Isabel Sánchez-Segura, and Ahmed Seffah. 2013. Usability through Software Design. IEEE Transactions on Software Engineering 39, 11 (2013), 1582–1596. https://doi.org/10.1109/TSE.2013.29

[2] John W. Castro, Ignacio Garnica, and Luis A. Rojas. 2022. Automated Tools for Usability Evaluation: A Systematic Mapping Study. In Social Computing and Social Media: Design, User Experience and Impact, Gabriele Meiselwitz (Ed.). Springer International Publishing, Cham, 28–46.

[3] Asela Gunawardana, Guy Shani, and Sivan Yogev. 2022. Evaluating Recommender Systems. Springer US, New York, NY, 547–601. https://doi.org/10.1007/978-1-0716-2197-4_15

[4] Christopher Hass. 2019. A Practical Guide to Usability Testing. Springer International Publishing, Cham, 107–124. https://doi.org/10.1007/978-3-319-96906-0_6

[5] Thomas T Hewett, Ronald Baecker, Stuart Card, Tom Carey, Jean Gasen, Marilyn Mantei, Gary Perlman, Gary Strong, and William Verplank. 1992. Human-Computer Interaction. ACM, New York, NY, USA, 5–29.

[6] Tasha Hollingsed and David G. Novick. 2007. Usability inspection methods after 15 years of research and practice. In Proceedings of the 25th Annual ACM International Conference on Design of Communication (El Paso, Texas, USA) (SIGDOC ’07). Association for Computing Machinery, New York, NY, USA, 249–255. https://doi.org/10.1145/1297144.1297200

[7] International Organization for Standardization. 2018. ISO/IEC/IEEE International Standard - Ergonomics of human-system interaction – Part 11: Usability: Definitions and concepts. ISO/IEC/IEEE 9241-11:2018(E) (2018).

[8] Ananya Kumar, Jiahui Yu, John Hallman, Michelle Pokrass, and Other Authors.

[9] Introducing GPT-4.1 in the API. https://openai.com/index/gpt-4-1/. Accessed: 23.04.2025

[10] Jianghao Lin, Xinyi Dai, Yunjia Xi, Weiwen Liu, Bo Chen, Hao Zhang, Yong Liu, Chuhan Wu, Xiangyang Li, Chenxu Zhu, Huifeng Guo, Yong Yu, Ruiming Tang, and Weinan Zhang. 2025. How Can Recommender Systems Benefit from Large Language Models: A Survey. ACM Trans. Inf. Syst. 43, 2, Article 28 (Jan. 2025), 47 pages. https://doi.org/10.1145/3678004

[11] Mary McHugh. 2012. Interrater reliability: The kappa statistic. Biochemia medica : časopis Hrvatskoga društva medicinskih biokemičara / HDMB 22 (10 2012), 276–82. https://doi.org/10.11613/BM.2012.031

[12] Abdallah Namoun, Ahmed Alrehaili, and Ali Tufail. 2021. A Review of Automated Website Usability Evaluation Tools: Research Issues and Challenges. In Design, User Experience, and Usability: UX Research and Design, Marcelo M. Soares, Elizabeth Rosenzweig, and Aaron Marcus (Eds.). Springer International Publishing, Cham, 292–311.

[13] Jakob Nielsen. 1994. Enhancing the explanatory power of usability heuristics. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Boston, Massachusetts, USA) (CHI ’94). Association for Computing Machinery, New York, NY, USA, 152–158. https://doi.org/10.1145/191666.191729

[14] Jakob Nielsen. 2012. Usability 101: Introduction to Usability. https://www.nngroup.com/articles/usability-101-introduction-to-usability/. Accessed: 22.04.2025

[15] OpenAI and Other Authors. 2024. OpenAI o1 System Card. arXiv:2412.16720 [cs.AI] https://arxiv.org/abs/2412.16720

[16] Ali Ebrahimi Pourasad and Walid Maalej. 2025. Does GenAI Make Usability Testing Obsolete? In 2025 IEEE/ACM 47th International Conference on Software ...

[17] Laria Reynolds and Kyle McDonell. 2021. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI EA ’21). Association for Computing Machinery, New York, NY, USA, Article 314, 7 pages. https://doi.org/10.1145/3411763.3451760

[18] Nayan B. Ruparelia. 2010. Software development lifecycle models. SIGSOFT Softw. Eng. Notes 35, 3 (May 2010), 8–13. https://doi.org/10.1145/1764810.1764814

[19] Rick Spencer. 2000. The streamlined cognitive walkthrough method, working around social constraints encountered in a software development company. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (The Hague, The Netherlands) (CHI ’00). Association for Computing Machinery, New York, NY, USA, 353–359. https://doi.org/10.1145/332040.332456

[20] Martin Stettinger, Trang Tran, Ingo Pribik, Gerhard Leitner, Alexander Felfernig, Ralph Samer, Muesluem Atas, and Manfred Wundara. 2020. KnowledgeCheckR: Intelligent Techniques for Counteracting Forgetting. In Proceedings of the 9th International Conference on Prestigious Applications of Intelligent Systems – PAIS@ECAI2020 (Santiago de Compostela, S...

[21] Gemini Team and Other Authors. 2024. Gemini: A Family of Highly Capable Multimodal Models. https://arxiv.org/abs/2312.11805

[22] Sebastian Winter, Stefan Wagner, and Florian Deissenboeck. 2008. A Comprehensive Model of Usability. In Engineering Interactive Systems, Jan Gulliksen, Morton Borup Harning, Philippe Palanque, Gerrit C. van der Veer, and Janet Wesson (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 106–122.

[23] Likang Wu, Zhi Zheng, Zhaopeng Qiu, Hao Wang, Hongchao Gu, Tingjia Shen, Chuan Qin, Chen Zhu, Hengshu Zhu, Qi Liu, et al. 2024. A survey on large language models for recommendation. World Wide Web 27, 5 (2024), 60.

[24] Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. 2024. A survey on multimodal large language models. National Science Review 11, 12 (Nov. 2024). https://doi.org/10.1093/nsr/nwae403
