WebTestPilot: Agentic End-to-End Web Testing against Natural Language Specification by Inferring Oracles with Symbolized GUI Elements
Pith reviewed 2026-05-16 05:59 UTC · model grok-4.3
The pith
WebTestPilot uses a symbolization layer on GUI elements to infer pre- and post-condition oracles that let LLM agents test web apps against natural language specifications while separating hallucinations from real bugs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WebTestPilot is an LLM-based agent for end-to-end web testing that first detects and symbolizes critical GUI elements into variables and then translates a natural language specification into a sequence of steps, each equipped with inferred pre- and post-conditions over those symbols. These oracles capture dependencies that allow the agent to act as its own validator and distinguish inconsistencies caused by model hallucinations from genuine application bugs. Existing approaches either accept any crash-free navigation or examine states in isolation and therefore miss context-dependent failures.
What carries the argument
Symbolization layer that converts critical GUI elements into variables, paired with inference of pre- and post-conditions over those variables to form per-step oracles.
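One way to picture a per-step oracle is the minimal sketch below. It assumes the symbols are held in a plain dictionary and that each step carries a pre-condition and a post-condition as Python predicates. Every name in it (`Symbols`, `Step`, `run_step`, `add_to_cart`, the `cart_count` symbol) is hypothetical and chosen for illustration; the paper's actual representation is not given in this summary.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical symbol table: variable name -> value read from a GUI element.
Symbols = Dict[str, object]


@dataclass
class Step:
    description: str
    action: Callable[[Symbols], Symbols]      # returns the state after acting
    pre: Callable[[Symbols], bool]            # must hold before the action
    post: Callable[[Symbols, Symbols], bool]  # relates state before and after


def run_step(step: Step, before: Symbols) -> str:
    """Return 'precondition_failed', 'postcondition_failed', or 'ok'."""
    if not step.pre(before):
        return "precondition_failed"
    # Act on a copy so the pre-state stays available to the post-condition.
    after = step.action(dict(before))
    return "ok" if step.post(before, after) else "postcondition_failed"


def add_to_cart(state: Symbols) -> Symbols:
    # Stand-in for a real browser action on a toy, in-memory "page".
    state["cart_count"] = state["cart_count"] + 1
    return state


step = Step(
    description="Click 'Add to cart'",
    action=add_to_cart,
    pre=lambda s: s["add_button_enabled"],
    post=lambda before, after: after["cart_count"] == before["cart_count"] + 1,
)

print(run_step(step, {"cart_count": 0, "add_button_enabled": True}))
```

Keeping the pre-state around is the design point: the post-condition is a relation between two symbol tables, not a property of the final page alone.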
If this is right
- LLM agents can now perform reliable end-to-end testing against natural language specifications without needing manually written oracles.
- Context-dependent bugs that span multiple steps become detectable because oracles track data, temporal, and causal dependencies.
- The same agent generalizes across different natural language phrasings and across model scales without retraining.
- A reproducible benchmark of bug-injected web applications now exists for systematic comparison of NL-to-E2E testing methods.
- The approach directly addresses the hallucination problem that previously made LLM agents untrustworthy as oracles.
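The difference between checking a state in isolation and checking it against earlier steps can be shown with a toy checkout flow. The sketch below is an illustration under assumed names (`Snapshot`, `isolated_check`, `contextual_check`, and the symbol names are all hypothetical) and uses made-up data; it is not the paper's oracle logic.

```python
from typing import Dict, List

Snapshot = Dict[str, object]  # hypothetical: symbol name -> observed value


def isolated_check(final: Snapshot) -> bool:
    # Looks only at the last state: the summary is non-empty and has a total.
    return len(final["summary_items"]) > 0 and final["summary_total"] >= 0


def contextual_check(history: List[Snapshot]) -> bool:
    # Data dependency: what was added to the cart earlier must be what the
    # summary shows now, and the total must match the prices seen earlier.
    cart = history[0]
    final = history[-1]
    same_items = sorted(final["summary_items"]) == sorted(cart["cart_items"])
    same_total = final["summary_total"] == sum(cart["cart_prices"])
    return same_items and same_total


history = [
    {"cart_items": ["pen", "notebook"], "cart_prices": [2, 5]},
    {"shipping_form_filled": True},
    # Injected fault for illustration: one item silently dropped.
    {"summary_items": ["pen"], "summary_total": 2},
]

print(isolated_check(history[-1]))  # True
print(contextual_check(history))    # False
```

The final page looks plausible on its own, so the isolated check passes; only the check that remembers the cart contents from the first step flags the dropped item.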
Where Pith is reading between the lines
- The same symbolization-plus-oracle pattern could be adapted to mobile or desktop interfaces where GUI elements are also extractable as structured variables.
- Adding an explicit symbolic layer may reduce error propagation in other long-horizon LLM agent tasks that require consistent state tracking.
- The benchmark could serve as a test bed for comparing LLM agents against traditional scripted or model-based testing frameworks.
- If symbolization accuracy improves with better vision models, overall bug detection rates could rise further without changes to the oracle logic.
Load-bearing premise
The symbolization layer must correctly identify the critical GUI elements, and the inferred pre- and post-conditions must accurately capture the implicit requirements without overlooking context-dependent failures.
What would settle it
A web application and natural language specification where WebTestPilot either reports a bug that is not present in the code or fails to report a real bug that violates the specification because the symbolization or oracle inference missed the relevant dependency.
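The two failure modes named here map onto the standard precision and recall definitions: a reported bug that is not present is a false positive and lowers precision, while a missed real bug is a false negative and lowers recall. The sketch below uses made-up counts purely to show the arithmetic; the function name is hypothetical and the numbers are not taken from the paper.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    reported = true_pos + false_pos   # everything the tool flagged as a bug
    actual = true_pos + false_neg     # every bug that really exists
    precision = true_pos / reported if reported else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Illustrative counts only: 8 real bugs caught, 2 false alarms, 2 bugs missed.
p, r = precision_recall(true_pos=8, false_pos=2, false_neg=2)
print(p, r)  # 0.8 0.8
```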
Original abstract
Visual language model (VLM) agents show great promise in automating end-to-end (E2E) web testing against requirements in natural language. However, the probabilistic nature of language models can have inherent hallucinations. Therefore, given a detected inconsistency between the requirement and the web application, it is hard to distinguish whether it stems from the hallucination or a real application bug. Addressing this issue presents two core technical challenges: the implicit oracle inference challenge, where the agent must act as its own oracle to implicitly decide if the application's behavior is correct without guidance, and the probabilistic inference challenge, where an LLM's inconsistent reasoning undermines its trustworthiness as an oracle. Existing LLM-based approaches fail to capture such implicit oracles, either by treating any page navigation that doesn't crash as a success, or by checking each state in isolation, thus missing bugs dependent on context from prior steps. We introduce WebTestPilot, an LLM-based agent designed to address these challenges. WebTestPilot uses (1) a symbolization layer which detects and symbolizes critical GUI elements on the web application into symbols (i.e., variables) and (2) translates natural language specification into a sequence of steps, each of which is equipped with inferred pre- and post-conditions over the symbols as an oracle. This oracle captures data, temporal, and causal dependencies, enabling the validation of implicit requirements. To advance research in this area, we build a benchmark of bug-injected web apps for evaluating NL-to-E2E testing. The results show that WebTestPilot achieves a task completion rate of 99%, with 96% precision and 96% recall in bug detection, outperforming the best baseline (+70 precision, +27 recall). The agent generalizes across diverse natural language inputs and model scales.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces WebTestPilot, an LLM-based agent for end-to-end web testing from natural language specifications. It uses a symbolization layer to map GUI elements to stable symbols and infers pre- and post-conditions over those symbols to act as oracles that capture data, temporal, and causal dependencies. This is intended to distinguish real bugs from model hallucinations. The authors construct a new benchmark of bug-injected web apps and report 99% task completion, 96% precision, and 96% recall in bug detection, outperforming the strongest baseline by +70 precision and +27 recall while generalizing across NL inputs and model scales.
Significance. If the performance claims hold after proper validation, the work would advance automated web testing by providing a concrete mechanism for implicit oracle inference that existing methods lack. The symbolization-plus-oracle approach and the new benchmark are useful contributions that could support follow-on research. The reported generalization across model scales is a positive signal for practical deployment.
major comments (3)
- [Evaluation] Evaluation section: benchmark construction details (including the bug injection procedure, how ground-truth oracles are established, and the selection criteria for the injected bugs) are not provided. These details are load-bearing for the 96% precision/recall claims, as the metrics cannot be interpreted without knowing whether the injected bugs are representative or whether the evaluation inadvertently favors the proposed oracle inference.
- [Method] Method section (symbolization and oracle inference): no accuracy metric, error analysis, or ablation is reported for the symbolization layer itself. Because the central claim rests on the assumption that symbolization reliably extracts critical elements and enables valid pre/post-condition inference, the absence of this analysis leaves the source of the +70/+27 gains unclear.
- [Results] Results section: no statistical tests, run-to-run variance, or breakdown of failure cases (e.g., symbolization errors vs. oracle mis-inference) are supplied for the headline metrics. This omission prevents assessment of whether the reported superiority over baselines is robust.
minor comments (2)
- [Abstract] Abstract: the baseline comparison states absolute gains but does not name the strongest baseline or report its absolute scores, making the improvement harder to contextualize.
- [Method] Notation: the mapping from GUI elements to symbols is described at a high level; a small example showing an actual page state, the extracted symbols, and the resulting pre/post-conditions would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that the manuscript requires additional details and analyses to support the claims and will revise accordingly.
- Referee: [Evaluation] Evaluation section: benchmark construction details (including the bug injection procedure, how ground-truth oracles are established, and the selection criteria for the injected bugs) are not provided. These details are load-bearing for the 96% precision/recall claims, as the metrics cannot be interpreted without knowing whether the injected bugs are representative or whether the evaluation inadvertently favors the proposed oracle inference.
  Authors: We agree that the benchmark details are insufficient. In the revised manuscript we will add a dedicated subsection in Evaluation describing the bug injection procedure (with concrete examples of data, temporal, and causal bugs), the process for establishing ground-truth oracles via independent expert annotation of each test case, and the selection criteria used to ensure the injected bugs are representative of real-world web faults and not biased toward our oracle mechanism.
  Revision: yes
- Referee: [Method] Method section (symbolization and oracle inference): no accuracy metric, error analysis, or ablation is reported for the symbolization layer itself. Because the central claim rests on the assumption that symbolization reliably extracts critical elements and enables valid pre/post-condition inference, the absence of this analysis leaves the source of the +70/+27 gains unclear.
  Authors: We acknowledge the absence of direct validation for the symbolization layer. We will augment the Method section with (1) an accuracy metric for symbolization on a held-out set of pages, (2) a qualitative error analysis of common failure modes, and (3) an ablation that removes the symbolization layer to quantify its contribution to the observed gains over baselines.
  Revision: yes
- Referee: [Results] Results section: no statistical tests, run-to-run variance, or breakdown of failure cases (e.g., symbolization errors vs. oracle mis-inference) are supplied for the headline metrics. This omission prevents assessment of whether the reported superiority over baselines is robust.
  Authors: We will strengthen the Results section by adding statistical significance tests (e.g., McNemar’s test for paired comparisons), reporting run-to-run variance obtained by re-executing the experiments with different random seeds, and providing a breakdown of failure cases categorized by source (symbolization errors, oracle mis-inference, navigation failures, etc.). These additions will allow readers to assess the robustness of the +70 precision and +27 recall improvements.
  Revision: yes
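For readers unfamiliar with the test named in the last response, the sketch below shows the exact (binomial) form of McNemar's test for two methods evaluated on the same test cases. It assumes only the discordant pairs are known: `b` cases that one method got right and the other got wrong, and `c` for the reverse. The function name and the counts are illustrative, not results from the paper.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: cases only method A got right; c: cases only method B got right.
    Under the null hypothesis each discordant pair is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Probability of a split at least as lopsided as the one observed.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(mcnemar_exact(0, 5))  # 0.0625
print(mcnemar_exact(3, 3))  # 1.0
```

The test ignores cases where both methods agree, which is why a large headline gap still needs enough discordant cases before it counts as statistically robust.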
Circularity Check
No significant circularity in derivation chain
The paper describes an LLM-based agent with a symbolization layer and inferred pre/post-condition oracles, evaluated empirically on a newly constructed benchmark of bug-injected apps. No equations, fitted parameters, or self-citations are presented that reduce the reported 99% task completion or 96% precision/recall metrics to inputs by construction. The performance claims rest on external evaluation rather than definitional equivalence or load-bearing self-references.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: symbolized GUI elements can be used to infer pre- and post-conditions that capture data, temporal, and causal dependencies in web applications.
invented entities (1)
- Symbolization layer: no independent evidence