Exploring the Output of Software Testing Tools through a Visual Comparative Analysis

Anthony Maocheia-Ricci; Brandon Lit; Thomas Driscoll

arXiv: 2605.04189 · v1 · submitted 2026-05-05 · 💻 cs.HC · cs.SE


Pith reviewed 2026-05-08 17:35 UTC · model grok-4.3

classification 💻 cs.HC cs.SE

keywords: software testing · test result visualization · HCI · CLI output · GUI output · comparative analysis · output formatting · color usage


The pith

A comparison of 50 testing tools shows shared patterns in how they format and visualize test results.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper performs a visual comparative analysis on the outputs of 50 software testing tools and harnesses, with 44 using command-line interfaces and 6 using graphical ones, drawn from four programming languages. It identifies recurring interface elements, the ways test results are displayed and visualized, and the detailed composition of those outputs. A sympathetic reader would care because earlier studies indicate that good visualizations help testers make decisions, yet no prior HCI work has mapped the common elements across tools. The results point to consistent trends in formatting and color use that developers of new tools could follow.

Core claim

Our analysis reveals the common interface elements in software testing tools, how these tools display and visualize test results, as well as the specific make-up of the output. Our findings provide insight on how visual testing output is formatted and how colour is used across both CLI and GUI environments, identifying trends that can be applied by developers of testing tools.

What carries the argument

The visual comparative analysis of outputs from 44 CLI and 6 GUI testing tools, which surfaces recurring elements, display methods, and formatting details.

If this is right

Testing tool developers can adopt the observed formatting and color conventions to align with existing patterns.
Shared display methods for results can be used to make test output more consistent across tools.
Trends identified in both CLI and GUI settings can inform interface design choices for new harnesses.
The specific composition of outputs can guide how results are structured to support tester decisions.
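Here is a hypothetical illustration of the first point. A tool developer might emit a one-line suite summary that colors failures red and passes green, while keeping the status word in plain text so the result never depends on color alone. This is a sketch under stated assumptions. The color mapping, the wording, and the `format_summary` name are illustrative and are not conventions reported by the paper. The `NO_COLOR` check follows the informal environment-variable convention that many CLI tools honor.

```python
import os
import sys

# ANSI SGR escape sequences: 31 = red foreground, 32 = green foreground, 0 = reset.
RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"


def format_summary(passed, failed, use_color=None):
    """One-line test-suite summary; the status word carries the result without color."""
    if use_color is None:
        # Assumed policy: color only when writing to a terminal and NO_COLOR is unset or empty.
        use_color = sys.stdout.isatty() and not os.environ.get("NO_COLOR")
    status = "FAILED" if failed else "PASSED"
    text = f"{status}: {passed} passed, {failed} failed, {passed + failed} total"
    if not use_color:
        return text
    color = RED if failed else GREEN
    return f"{color}{text}{RESET}"


if __name__ == "__main__":
    print(format_summary(12, 0))
    print(format_summary(10, 2))
```

The status word is kept separate from the color so that readers with color-vision deficiencies, and logs that strip escape codes, still see the outcome.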

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If these patterns prove stable, integrated development environments could standardize result views based on them.
The same analysis method could be repeated on tools for other programming languages to test whether the trends generalize.
Designers might explore whether adopting the common elements reduces the time testers spend interpreting outputs.

Load-bearing premise

The 50 chosen tools are representative enough of the broader population of testing tools to support claims about general trends.

What would settle it

A follow-up survey of testing tools in additional languages or domains that reveals substantially different visualization patterns or color usage would show the identified common elements are not general.

Figures

Figures reproduced from arXiv: 2605.04189 by Anthony Maocheia-Ricci, Brandon Lit, Thomas Driscoll.

**Figure 1.** Our visual comparative analysis methodology phases, adapted from Frappier et al. [...]

**Figure 2.** The full mosaic of all testing outputs, each composed of the 8 common interface elements. The CLIs and GUIs are ...

**Figure 3.** A screenshot of CUnit CLI with interface element ...

**Figure 4.** A screenshot of QUnit’s GUI with interface element ...

**Figure 6.** An example of the “details-on-the-outside” class of ...

**Figure 7.** CUnit’s interactive CLI mode, with an example ...

**Figure 8.** Examples of error location identifiers and code lines/blocks: (a) Pytest displaying line numbers, carets, and code block; ...

**Figure 9.** Example showing a differing amount of detail be...

**Figure 10.** Example test suite summary blocks from our sample: (a) CHEAT, using two colours with period and colon symbols ...

Original abstract

Software testing is a fundamental process of software development, and prior work has shown that visualizations of test results support testers' decision-making. However, Human-Computer Interaction research on software testing has yet to explore and understand the shared interface elements and patterns in visualization of testing outputs. To address this, we conducted a visual comparative analysis of the output of 50 software testing tools and harnesses (44 with CLI output, 6 with GUI output) across four popular programming languages. Our analysis reveals the common interface elements in software testing tools, how these tools display and visualize test results, as well as the specific make-up of the output. Our findings provide insight on how visual testing output is formatted and how colour is used across both CLI and GUI environments, identifying trends that can be applied by developers of testing tools.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper conducts a visual comparative analysis of outputs from 50 software testing tools (44 CLI, 6 GUI) across four programming languages. It claims to identify common interface elements, how test results are displayed and visualized, the specific composition of outputs, and trends in formatting and color usage that can inform testing tool development.

Significance. If the methodology were transparent and the sample justified, the work could usefully map design patterns in testing visualizations for HCI researchers and tool developers, addressing a noted gap in understanding shared elements that support tester decision-making. The descriptive approach has practical potential but currently lacks the rigor needed for reliable insights or generalization.

major comments (2)

[Methods] Methods section: The visual comparative analysis is described only at a high level in the abstract and introduction, with no details on the procedure, coding scheme for identifying 'common' elements, criteria for patterns, or validation steps such as inter-rater checks. This is load-bearing for the central claims, as the reported findings on interface elements, visualization, and trends cannot be assessed or replicated without it.
[Tool selection] Tool selection (likely §3 or equivalent): The sample of 50 tools (44 CLI, 6 GUI) across four languages is presented without explicit selection criteria, popularity metrics, stratification, or diversity audit. The severe CLI/GUI imbalance risks the observed 'common' patterns being artifacts of over-represented open-source CLI harnesses rather than general trends.

minor comments (1)

[Abstract] Abstract: 'Four popular programming languages' is stated without naming them (e.g., Java, Python), which reduces immediate clarity for readers.
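The "common elements" and CLI/GUI imbalance concerns can be made concrete with a minimal sketch. It assumes each tool is coded for the presence or absence of interface elements and that an element counts as "common" when it clears a simple prevalence threshold. The tool names, element labels, and the 0.5 cutoff are hypothetical placeholders, not the paper's data or criteria. The sketch reports prevalence both pooled and within each output type, so that an element common only among GUI tools is not drowned out by the many CLI tools.

```python
from collections import Counter

# Hypothetical presence/absence codings: tool -> (output type, elements observed).
# Names are placeholders for illustration, not the paper's sample or element set.
CODED_TOOLS = {
    "cli_tool_1": ("CLI", {"summary_block", "error_location", "code_block"}),
    "cli_tool_2": ("CLI", {"summary_block", "error_location"}),
    "cli_tool_3": ("CLI", {"summary_block"}),
    "gui_tool_1": ("GUI", {"summary_block", "progress_bar"}),
    "gui_tool_2": ("GUI", {"progress_bar", "error_location"}),
}


def prevalence(tools, output_type=None):
    """Share of tools showing each element, pooled or within one output type."""
    selected = [elems for kind, elems in tools.values()
                if output_type is None or kind == output_type]
    if not selected:
        return {}
    counts = Counter(element for elems in selected for element in elems)
    return {element: n / len(selected) for element, n in counts.items()}


def common_elements(tools, threshold=0.5, output_type=None):
    """Elements present in at least `threshold` of the selected tools, sorted."""
    shares = prevalence(tools, output_type)
    return sorted(element for element, share in shares.items() if share >= threshold)


if __name__ == "__main__":
    for stratum in (None, "CLI", "GUI"):
        label = stratum or "pooled"
        print(label, common_elements(CODED_TOOLS, output_type=stratum))
```

With these placeholder codings, `progress_bar` reaches the threshold among the GUI tools (2 of 2) but not in the pooled sample (2 of 5). That is the kind of artifact the tool-selection comment warns about when one output type dominates the sample.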

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which highlights important areas for improving the transparency and rigor of our work. We address each major comment point by point below and commit to revisions that strengthen the manuscript without altering its core contributions.

Point-by-point responses

Referee: [Methods] Methods section: The visual comparative analysis is described only at a high level in the abstract and introduction, with no details on the procedure, coding scheme for identifying 'common' elements, criteria for patterns, or validation steps such as inter-rater checks. This is load-bearing for the central claims, as the reported findings on interface elements, visualization, and trends cannot be assessed or replicated without it.

Authors: We agree that the current Methods section lacks sufficient detail for full assessment and replication. In the revised manuscript, we will expand this section to describe the full analysis procedure step by step, the coding scheme used to categorize interface elements and visualization formats, explicit criteria for identifying 'common' patterns and trends (including thresholds for commonality), and validation measures such as inter-rater reliability checks between coders. These additions will directly address the load-bearing nature of the claims. revision: yes
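One common way to report an inter-rater check of this kind is Cohen's kappa, which corrects raw agreement between two coders for the agreement expected by chance. The sketch below assumes that two coders each assign one categorical label (here 1 for present, 0 for absent) to the same list of tools. The function name and sample labels are illustrative only. McDonald et al. (2019), in the reference list, discuss when such agreement measures are appropriate for qualitative HCI work at all.

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(coder_a)
    # Observed agreement: share of items where both coders chose the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal share, summed over labels.
    labels = set(coder_a) | set(coder_b)
    expected = sum((coder_a.count(lab) / n) * (coder_b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is 0/0 here,
        # and this sketch simply reports perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical codings: 1 = element present, 0 = absent, for ten tools.
coder_a = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
coder_b = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
print(round(cohens_kappa(coder_a, coder_b), 3))
```

For these ten hypothetical codings, the coders agree on 8 of 10 tools (0.8) and chance agreement is 0.6 × 0.6 + 0.4 × 0.4 = 0.52. Kappa therefore comes out to 0.28 / 0.48, or about 0.58.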
Referee: [Tool selection] Tool selection (likely §3 or equivalent): The sample of 50 tools (44 CLI, 6 GUI) across four languages is presented without explicit selection criteria, popularity metrics, stratification, or diversity audit. The severe CLI/GUI imbalance risks the observed 'common' patterns being artifacts of over-represented open-source CLI harnesses rather than general trends.

Authors: We acknowledge that the tool selection process requires more explicit justification. The revised manuscript will include a dedicated subsection detailing the selection criteria (e.g., popularity via GitHub stars, download metrics, and official documentation), any stratification by language or output type, and a diversity audit. We will also explain the rationale for the CLI/GUI distribution, noting that it mirrors the real-world prevalence of CLI-based testing tools, while discussing limitations and potential impacts on generalizability. Where feasible, we will explore adding more GUI examples to mitigate the imbalance. revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive empirical survey with no derivations or self-referential claims

Full rationale

This paper performs a visual comparative analysis of outputs from 50 testing tools (44 CLI, 6 GUI) across four languages. The central claims consist of observed common interface elements, display patterns, and color usage identified through direct inspection. There are no equations, fitted parameters, predictions, uniqueness theorems, or self-citations invoked to justify core results. The analysis is self-contained as an empirical description; any concern about sample representativeness is a question of external validity, not a reduction of the reported findings to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical comparative study of existing tools; no mathematical models, free parameters, axioms, or new entities are introduced or required.



Reference graph

Works this paper leans on

[1] 2026. List of unit testing frameworks. https://en.wikipedia.org/w/index.php?title=List_of_unit_testing_frameworks&oldid=1343929439 Page Version ID: 1343929439

[2] Abdulaziz Alaboudi and Thomas D. Latoza. 2023. Hypothesizer: A Hypothesis-Based Debugger to Find and Test Debugging Hypotheses. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23). Association for Computing Machinery, New York, NY, USA, 1–14. doi:10.1145/3586183.3606781

[3] Paul Ayres and John Sweller. 2005. The Split-Attention Principle in Multimedia Learning. In The Cambridge Handbook of Multimedia Learning, Richard Mayer (Ed.). Cambridge University Press, Cambridge, 135–146. doi:10.1017/CBO9780511816819.009

[4] Benjamin Bach, Zezhong Wang, Matteo Farinella, Dave Murray-Rust, and Nathalie Henry Riche. 2018. Design Patterns for Data Comics. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18). Association for Computing Machinery, New York, NY, USA, 1–12. doi:10.1145/3173574.3173612

[5] Andrea Borg, Chris Porter, and Mark Micallef. 2015. Is Carmen better than George? testing the exploratory tester using HCI techniques. In Proceedings of the 37th International Conference on Software Engineering - Volume 2 (ICSE ’15). IEEE Press, Florence, Italy, 815–816. https://dl.acm.org/doi/10.5555/2819009.2819181

[6] Yuanliang Chen, Yu Jiang, Fuchen Ma, Jie Liang, Mingzhe Wang, Chijin Zhou, Xun Jiao, and Zhuo Su. 2019. EnFuzz: Ensemble Fuzzing with Seed Synchronization among Diverse Fuzzers. 1967–1983. https://www.usenix.org/conference/usenixsecurity19/presentation/chen-yuanliang

[7] Yan Chen, Maulishree Pandey, Jean Y. Song, Walter S. Lasecki, and Steve Oney. 2020. Improving Crowd-Supported GUI Testing with Structural Guidance. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–13. doi:10.1145/3313831.3376835

[8] Yiqun T. Chen, Rahul Gopinath, Anita Tadakamalla, Michael D. Ernst, Reid Holmes, Gordon Fraser, Paul Ammann, and René Just. 2021. Revisiting the relationship between fault detection, test adequacy criteria, and test set size. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE ’20). Association for Computing... doi:10.1145/3324884.3416667

[9] Paul T. Chiou, Ali S. Alotaibi, and William G.J. Halfond. 2023. BAGEL: An Approach to Automatically Detect Navigation-Based Web Accessibility Barriers for Keyboard Users. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). Association for Computing Machinery, New York, NY, USA, 1–17. doi:10.1145/3544548.3580749

[10] Lisa G Dirks, Miranda Belarde-Lewis, and Wanda Pratt. 2025. Amplifying Cultural Values with Collaborative Photo-Elicitation: Strengths-Focused Co-Design with Alaska Native People. In Proceedings of the 2025 ACM Designing Interactive Systems Conference (DIS ’25). Association for Computing Machinery, New York, NY, USA, 1349–1365. doi:10.1145/3715336.3735688

[11] Isabel Evans, Chris Porter, and Mark J. Micallef. 2024. Breaking Tester Stereotypes: who is testing and why it matters. BCS Learning & Development, 115–126. doi:10.14236/ewic/BCSHCI2024.11

[12] Tallullah Frappier, Nathalie Bressa, and Samuel Huron. 2024. Jumping to Conclusions: A Visual Comparative Analysis of Online Debate Platform Layouts. In Proceedings of the 13th Nordic Conference on Human-Computer Interaction (NordiCHI ’24). Association for Computing Machinery, New York, NY, USA, 1–15. doi:10.1145/3679318.3685377

[13] Xiaoxiao Gan, Huayu Liang, and Chris Brown. 2025. Challenges, Strategies, and Impacts: A Qualitative Study on UI Testing in CI/CD Processes from GitHub Developers’ Perspectives. In 2025 IEEE Conference on Software Testing, Verification and Validation (ICST). 186–197. doi:10.1109/ICST62969.2025.10988972 ISSN: 2159-4848

[14] Nanna Gorm and Irina Shklovski. 2017. Participant Driven Photo Elicitation for Understanding Activity Tracking: Benefits and Limitations. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW ’17). Association for Computing Machinery, New York, NY, USA, 1350–1361. doi:10.1145/2998181.2998214

[15] Nina Hollender, Cristian Hofmann, Michael Deneke, and Bernhard Schmitz. 2010. Integrating cognitive load theory and concepts of human–computer interaction. Computers in Human Behavior 26, 6 (Nov. 2010), 1278–1288. doi:10.1016/j.chb.2010.05.031

[16] Waqas Javed and Niklas Elmqvist. 2012. Exploring the design space of composite visualization. In 2012 IEEE Pacific Visualization Symposium. 1–8. doi:10.1109/PacificVis.2012.6183556 ISSN: 2165-8773

[17] Alla Katsnelson. 2021. Colour me better: fixing figures for colour blindness. Nature 598, 7879 (Oct. 2021), 224–225. doi:10.1038/d41586-021-02696-z

[18] George Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and Michael Hicks. 2018. Evaluating Fuzz Testing. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS ’18). Association for Computing Machinery, New York, NY, USA, 2123–2138. doi:10.1145/3243734.3243804

[19] Zhe Liu, Chunyang Chen, Junjie Wang, Yuekai Huang, Jun Hu, and Qing Wang. 2022. Guided Bug Crush: Assist Manual GUI Testing of Android Apps via Hint Moves. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22). Association for Computing Machinery, New York, NY, USA, 1–14. doi:10.1145/3491102.3501903

[20] Vsevolod Livinskii, Dmitry Babokin, and John Regehr. 2020. Random testing for C and C++ compilers with YARPGen. Proc. ACM Program. Lang. 4, OOPSLA (Nov. 2020), 196:1–196:25. doi:10.1145/3428264

[21] Nora McDonald, Sarita Schoenebeck, and Andrea Forte. 2019. Reliability and Inter-rater Reliability in Qualitative Research: Norms and Guidelines for CSCW and HCI Practice. Proc. ACM Hum.-Comput. Interact. 3, CSCW (Nov. 2019), 72:1–72:23. doi:10.1145/3359174

[22] Miriah Meyer and Jason Dykes. 2019. Criteria for Rigor in Visualization Design Study. IEEE Transactions on Visualization and Computer Graphics (2019), 1–1. doi:10.1109/TVCG.2019.2934539

[23] Inês Coimbra Morgado and Ana C. R. Paiva. 2019. The iMPAcT Tool for Android Testing. Proc. ACM Hum.-Comput. Interact. 3, EICS (June 2019), 4:1–4:23. doi:10.1145/3300963

[24] Xianfei Ou, Cong Li, Yanyan Jiang, and Chang Xu. 2025. The Mutators Reloaded: Fuzzing Compilers with Large Language Model Generated Mutation Operators. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4 (ASPLOS ’24). Association for Computing Machinery, New York, NY... doi:10.1145/3622781.3674171

[25] Marllos Paiva Prado and Auri Marcelo Rizzo Vincenzi. 2018. Towards cognitive support for unit testing: A qualitative study with practitioners. Journal of Systems and Software 141 (July 2018), 66–84. doi:10.1016/j.jss.2018.03.052

[26] Gillian Rose. 2023. Visual Methodologies: An Introduction to Researching with Visual Materials (fifth ed.). SAGE Publications Ltd, 55 City Road. doi:10.4135/9781036231576

[27] Clive Seale, Giampietro Gobo, Jaber F. Gubrium, David Silverman, and Sarah Pink. Visual Methods. In Qualitative Research Practice. SAGE Publications Ltd, 361–377. doi:10.4135/9781848608191

[28] B. Shneiderman. 1996. The eyes have it: a task by data type taxonomy for information visualizations. In Proceedings 1996 IEEE Symposium on Visual Languages. 336–343. doi:10.1109/VL.1996.545307 ISSN: 1049-2615

[29] Per Erik Strandberg, Wasif Afzal, and Daniel Sundmark. 2018. Decision making and visualizations based on test results. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM ’18). Association for Computing Machinery, New York, NY, USA, 1–10. doi:10.1145/3239235.3268921

[30] Per Erik Strandberg, Eduard Paul Enoiu, Wasif Afzal, Daniel Sundmark, and Robert Feldt. 2019. Information Flow in Software Testing – An Interview Study With Embedded Software Engineering Practitioners. IEEE Access 7 (2019), 46434–46453. doi:10.1109/ACCESS.2019.2909093

[31] Zezhong Wang, Samuel Huron, Miriam Sturdee, and Sheelagh Carpendale. 2024. Summary of the Workshop on Visual Methods and Analyzing Visual Data in Human Computer Interaction. In Companion Proceedings of the 2024 Conference on Interactive Surfaces and Spaces (ISS Companion ’24). Association for Computing Machinery, New York, NY, USA, 29–32. doi:10.1145/3696762.3698047

[32] Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian, Michael Pradel, and Lingming Zhang. 2024. Fuzz4All: Universal Fuzzing with Large Language Models. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24). Association for Computing Machinery, New York, NY, USA, 1–13. doi:10.1145/3597503.3639121

[33] Chenyuan Yang, Yinlin Deng, Runyu Lu, Jiayi Yao, Jiawei Liu, Reyhaneh Jabbarvand, and Lingming Zhang. 2024. WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models. Proc. ACM Program. Lang. 8, OOPSLA2 (Oct. 2024), 296:709–296:735. doi:10.1145/3689736

[34] Leni Yang, Xian Xu, XingYu Lan, Ziyan Liu, Shunan Guo, Yang Shi, Huamin Qu, and Nan Cao. 2022. A Design Space for Applying the Freytag’s Pyramid Structure to Data Stories. IEEE Transactions on Visualization and Computer Graphics 28, 1 (Jan. 2022), 922–932. doi:10.1109/TVCG.2021.3114774

Appendix A. Included Programs. Table A.1: List of all programs inc...
