Towards Automated Crowdsourced Testing via Personified-LLM
Pith reviewed 2026-05-15 00:48 UTC · model grok-4.3
The pith
PersonaTester injects human tester personas, defined along three orthogonal dimensions, into LLM-based agents to simulate diverse crowdworker behaviors for automated GUI testing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PersonaTester is a framework that automates crowdsourced GUI testing by injecting representative personas—defined along the three orthogonal dimensions of testing mindset, exploration strategy, and interaction habit—into LLM-based agents. This injection produces controllable, repeatable simulations of human-like testing behaviors. The resulting agents exhibit strong behavioral fidelity to real crowdworkers and generate more effective test events, triggering over 100 crashes and 11 functional bugs beyond what non-persona baselines achieve.
What carries the argument
Persona injection into LLM agents along three orthogonal dimensions (testing mindset, exploration strategy, interaction habit) that drive distinct action sequences and coverage patterns.
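The injection mechanism itself is not detailed in the material reviewed here. A minimal sketch of how the three dimensions might be composed into a system prompt for a testing agent; all dimension values, names, and prompt wording are hypothetical, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Persona:
    """One point in the three-dimensional persona space (field names hypothetical)."""
    mindset: str   # e.g. "destructive: hunt for crashes"
    strategy: str  # e.g. "depth-first: exhaust one screen before moving on"
    habit: str     # e.g. "fast taps, frequent back-button use"

def persona_prompt(p: Persona) -> str:
    """Render a persona as a system-prompt fragment for an LLM testing agent."""
    return (
        "You are simulating a human crowdworker testing a mobile app GUI.\n"
        f"Testing mindset: {p.mindset}\n"
        f"Exploration strategy: {p.strategy}\n"
        f"Interaction habit: {p.habit}\n"
        "At each step, choose the next GUI action consistent with this persona."
    )

cautious = Persona(
    mindset="verification-oriented: confirm expected behavior",
    strategy="breadth-first: visit every top-level screen once",
    habit="slow, deliberate taps; reads labels before acting",
)
print(persona_prompt(cautious))
```

Because the prompt is assembled from three independent fields, any one dimension can be varied while the other two are held fixed, which is what makes the dimensions behave as orthogonal controls.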
If this is right
- GUI testing coverage can be scaled with controlled behavioral diversity without recruiting new human testers for every session.
- Test runs become fully reproducible while still spanning multiple distinct user profiles.
- Higher rates of crash and functional-bug detection become achievable in automated pipelines.
- Testing across varied devices and environments can be simulated by varying only the persona parameters rather than the underlying agent code.
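The last point amounts to a parameter sweep over the persona space while the agent code stays fixed. A sketch under that assumption (the dimension values are illustrative, not taken from the paper):

```python
from itertools import product

# Illustrative values for each persona dimension; the paper's actual
# taxonomy is not given in the reviewed text.
MINDSETS = ["destructive", "verification-oriented", "exploratory"]
STRATEGIES = ["depth-first", "breadth-first", "random-walk"]
HABITS = ["rapid taps", "deliberate taps", "gesture-heavy"]

def persona_grid():
    """Enumerate every persona combination; the agent code itself never changes."""
    for mindset, strategy, habit in product(MINDSETS, STRATEGIES, HABITS):
        yield {"mindset": mindset, "strategy": strategy, "habit": habit}

personas = list(persona_grid())
print(len(personas))  # 27 distinct test profiles from 3 x 3 x 3 dimension values
```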
Where Pith is reading between the lines
- The same persona framework could be applied to other interactive testing domains such as web or desktop applications.
- Periodic recalibration of persona parameters against fresh crowdworker data might maintain fidelity as LLMs evolve.
- Combining persona-driven agents with lightweight human oversight could further reduce the cost of crowdsourced campaigns while preserving coverage quality.
Load-bearing premise
The three chosen persona dimensions, when injected into current LLMs, produce faithful and generalizable simulations of real human crowdworker behavior rather than LLM-specific artifacts.
What would settle it
Running the same persona agents on a new set of unseen mobile apps and comparing their generated test-event distributions and bug-trigger rates against fresh logs from actual crowdworkers on those same apps.
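In its simplest form, such a comparison reduces to a divergence between action-type distributions. A sketch assuming traces are flattened to lists of action labels; the smoothing constant and labels are illustrative:

```python
import math
from collections import Counter

def action_kl(agent_trace, human_trace, eps=1e-6):
    """KL(human || agent) over action-type frequencies, with additive smoothing
    so that action types seen in only one trace do not produce infinities."""
    vocab = set(agent_trace) | set(human_trace)
    a, h = Counter(agent_trace), Counter(human_trace)
    pa = {t: (a[t] + eps) / (len(agent_trace) + eps * len(vocab)) for t in vocab}
    ph = {t: (h[t] + eps) / (len(human_trace) + eps * len(vocab)) for t in vocab}
    return sum(ph[t] * math.log(ph[t] / pa[t]) for t in vocab)

agent = ["tap", "tap", "scroll", "back", "tap"]
human = ["tap", "scroll", "scroll", "back", "type"]
print(round(action_kl(agent, human), 4))  # > 0: the two distributions differ
```

A divergence near zero on unseen apps would support the fidelity claim; a large divergence would suggest the personas produce LLM-specific artifacts rather than human-like behavior.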
Original abstract
The rapid proliferation and increasing complexity of software demand robust quality assurance, with graphical user interface (GUI) testing playing a pivotal role. Crowdsourced testing has proven effective in this context by leveraging the diversity of human testers to achieve rich, scenario-based coverage across varied devices, user behaviors, and usage environments. In parallel, automated testing, particularly with the advent of large language models (LLMs), offers significant advantages in controllability, reproducibility, and efficiency, enabling scalable and systematic exploration. However, automated approaches often lack the behavioral diversity characteristic of human testers, limiting their capability to fully simulate real-world testing dynamics. To address this gap, we present PersonaTester, a novel personified-LLM-based framework designed to automate crowdsourced GUI testing. By injecting representative personas, defined along three orthogonal dimensions: testing mindset, exploration strategy, and interaction habit, into LLM-based agents, PersonaTester enables the simulation of diverse human-like testing behaviors in a controllable and repeatable manner. Experimental results demonstrate that PersonaTester faithfully reproduces the behavioral patterns of real crowdworkers, exhibiting strong intra-persona consistency and clear inter-persona variability (117.86% -- 126.23% improvement over the baseline). Moreover, persona-guided testing agents consistently generate more effective test events and trigger more crashes (100+) and functional bugs (11) than the baseline without persona, thus substantially advancing the realism and effectiveness of automated crowdsourced GUI testing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PersonaTester, a framework that injects LLMs with personas defined along three orthogonal dimensions (testing mindset, exploration strategy, and interaction habit) to automate crowdsourced GUI testing. It claims this produces human-like behavioral diversity, with strong intra-persona consistency, inter-persona variability, 117.86%–126.23% improvement over a non-persona baseline, and higher effectiveness in generating test events that trigger 100+ crashes and 11 functional bugs.
Significance. If the central empirical claims are substantiated with direct evidence, the work could meaningfully advance automated GUI testing by offering a controllable way to simulate diverse human tester behaviors at scale, potentially reducing costs of crowdsourcing while preserving realism. The orthogonal persona design is a clear strength for reproducibility.
Major comments (2)
- Abstract and Experimental Results: The headline claim that PersonaTester 'faithfully reproduces the behavioral patterns of real crowdworkers' is supported only by indirect evidence (baseline improvements and higher crash/bug counts); no direct quantitative fidelity metrics (e.g., sequence edit distance, action-type KL divergence, or coverage overlap) comparing generated traces to real crowdworker logs on the same apps are reported. This leaves the reproduction claim dependent on the untested assumption that performance gains equal human fidelity rather than LLM artifacts.
- Experimental Results: The manuscript provides no details on experimental setup, including app selection criteria, number of runs, statistical significance tests, or how behavioral fidelity and consistency were measured against real human data. Without these, the reported 117–126% gains and crash/bug counts cannot be properly evaluated for robustness or generalizability.
Minor comments (2)
- [Methodology] Clarify the exact prompting mechanism used to inject the three persona dimensions into the LLM agents, including any example prompts or templates, to aid reproducibility.
- [Experimental Setup] The baseline description should explicitly state whether it uses the same LLM backbone and prompting structure as PersonaTester (minus personas) to ensure a fair comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the empirical claims and experimental transparency. We address each point below and will revise the manuscript to incorporate additional details and clarifications where appropriate.
Point-by-point responses
- Referee (Abstract and Experimental Results): The headline claim that PersonaTester 'faithfully reproduces the behavioral patterns of real crowdworkers' is supported only by indirect evidence (baseline improvements and higher crash/bug counts); no direct quantitative fidelity metrics (e.g., sequence edit distance, action-type KL divergence, or coverage overlap) comparing generated traces to real crowdworker logs on the same apps are reported. This leaves the reproduction claim dependent on the untested assumption that performance gains equal human fidelity rather than LLM artifacts.
  Authors: We acknowledge that the current manuscript relies on indirect evidence for the reproduction claim, specifically the quantified intra-persona consistency and inter-persona variability (via the 117.86%–126.23% gains) together with superior crash and bug detection rates. These metrics were chosen because they directly capture the expected properties of crowdworker behavior (consistent patterns within a tester style and diversity across styles) rather than assuming performance gains alone imply fidelity. We agree that direct metrics would provide stronger substantiation. In the revision we will add action-type KL divergence, sequence similarity measures, and coverage overlap analyses using the real crowdworker logs collected during our experiments, and we will tone down the abstract wording to 'reproduces key behavioral patterns' pending those results. Revision: partial.
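Of the promised sequence similarity measures, edit distance over action sequences is the most standard. A stdlib sketch; the trace encoding shown is an assumption, since the paper does not specify one:

```python
def edit_distance(a, b):
    """Levenshtein distance between two action sequences (lists of action labels)."""
    m, n = len(a), len(b)
    # prev[j] holds the distance between a[:i-1] and b[:j] as we sweep rows.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # delete from a
                         cur[j - 1] + 1,     # insert into a
                         prev[j - 1] + cost) # substitute
        prev = cur
    return prev[n]

agent = ["launch", "tap:login", "type:user", "tap:submit"]
human = ["launch", "tap:login", "type:user", "type:pass", "tap:submit"]
print(edit_distance(agent, human))  # 1: the human trace has one extra action
```

Normalizing by the longer trace length would turn this into a bounded fidelity score comparable across apps of different sizes.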
- Referee (Experimental Results): The manuscript provides no details on experimental setup, including app selection criteria, number of runs, statistical significance tests, or how behavioral fidelity and consistency were measured against real human data. Without these, the reported 117–126% gains and crash/bug counts cannot be properly evaluated for robustness or generalizability.
  Authors: We agree that the Experimental Results section lacks sufficient methodological detail. In the revised manuscript we will expand this section to specify: (i) the criteria and rationale for selecting the evaluated apps, (ii) the exact number of independent runs per condition, (iii) the statistical tests (e.g., paired t-tests or Wilcoxon signed-rank tests with p-values) used to assess significance of the reported gains, and (iv) the precise operational definitions and formulas for measuring intra-persona consistency (variance within persona runs) and inter-persona variability (divergence across personas), including how these were validated against the real human traces. Revision: yes.
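For point (iii), with a small app sample an exact paired test is feasible. As a dependency-free stand-in for the t-test or Wilcoxon test the authors name, an exact sign-flip permutation test can be sketched; all counts below are placeholders, not results from the paper:

```python
from itertools import product

def paired_permutation_p(x, y):
    """Exact two-sided paired permutation (sign-flip) test on the summed difference.

    Enumerates all 2^n sign assignments, so it is exact for small app samples.
    """
    diffs = [a - b for a, b in zip(x, y)]
    observed = abs(sum(diffs))
    hits = sum(
        1 for signs in product((1, -1), repeat=len(diffs))
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed
    )
    return hits / 2 ** len(diffs)

# Hypothetical per-app crash counts under each condition (same apps, paired).
persona_crashes = [12, 9, 15, 7, 11, 14, 8, 10]
baseline_crashes = [8, 7, 10, 6, 9, 11, 7, 8]
p = paired_permutation_p(persona_crashes, baseline_crashes)
print(f"p = {p:.4f}")  # a small p indicates the persona gain is unlikely by chance
```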
Circularity Check
No significant circularity; empirical results stand independently
Full rationale
The paper presents PersonaTester as a framework that injects three defined persona dimensions into LLM agents and evaluates it via direct experimental comparison against a non-persona baseline. Reported gains (117-126% improvement, more crashes and bugs) are measured outcomes from testing runs on apps, not quantities derived from or forced by the persona definitions themselves. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text that would reduce the fidelity or effectiveness claims to tautologies. The central assumption about human-like behavior is tested (or claimed tested) through observable metrics rather than presupposed by construction.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Personas defined along the testing mindset, exploration strategy, and interaction habit dimensions can be faithfully represented and followed by LLMs.