5 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.SE (5)
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (1)
polarities: background (1)
representative citing papers
- Automated Functional Testing for Malleable Mobile Application Driven from User Intent
  ALADDIN is a user-requirement-driven GUI test generation framework that incrementally navigates mobile app UIs and builds LLM-guided oracles to validate both correct and faulty user-requested functionalities across six apps.
- Contextualized Code Pretraining for Code Generation
  Introduces contextualized code pretraining with caller-callee pairs from static analysis to train CallerGen models that outperform baselines on the new CallerEval benchmark.
- Hallucination Inspector: A Fact-Checking Judge for API Migration
  Hallucination Inspector verifies symbols in LLM-generated API migration code against a documentation-derived knowledge base using AST extraction, identifying scaffolding hallucinations and cutting false positives versus standard metrics in preliminary Android tests.
- Characterizing Faults in Agentic AI: A Taxonomy of Types, Symptoms, and Root Causes
  An empirical study of real-world issues yields a taxonomy of 34 fault types, symptoms, and root causes in agentic AI systems, validated by 145 practitioners.
- TDD Governance for Multi-Agent Code Generation via Prompt Engineering
  An AI-native TDD framework operationalizes classical TDD principles as prompt-level and workflow-level governance mechanisms in a layered multi-agent architecture to improve stability and reproducibility of LLM code generation.