
arXiv: 2605.16544 · v1 · pith:QHCRUC5Enew · submitted 2026-05-15 · cs.SE · cs.GR · cs.HC

TARIPlay: A Test Framework for AR Applications based on Interactive Area Tracking in Playback Videos

Pith reviewed 2026-05-20 16:07 UTC · model grok-4.3

classification cs.SE · cs.GR · cs.HC
keywords AR testing · playback videos · automated testing · interactive area tracking · test coverage · AR applications · branch coverage · video analysis

The pith

TARIPlay identifies stable, visible interactive areas in AR playback videos and uses them to guide automated tests, achieving higher branch coverage than Monkey.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TARIPlay as a framework that analyzes playback videos of AR apps to detect, track, and filter interactive areas using stability and visibility criteria. This addresses the core difficulty that AR interfaces come from volatile real-world environments rather than fixed layouts, so recorded videos must supply both timing and location data for tests. The system then supplies these filtered areas to an automated testing engine that simulates taps and other inputs at the right moments. A sympathetic reader cares because reliable automated testing for AR could improve quality and safety without requiring constant live sessions in changing physical spaces. Evaluation on four open-source apps and nine videos shows 55.8 percent branch coverage of AR-related code compared with 41.98 percent from the existing Monkey tool.

Core claim

TARIPlay analyzes playback videos to detect, track, and filter proper interactive areas over time for automated testing. In particular, TARIPlay identifies viable test opportunities based on criteria like stability and visibility, then feeds this information to an automated testing engine to simulate user interactions. Evaluation results with four open-source AR apps and nine playback videos show that TARIPlay significantly outperforms the existing tool Monkey in test coverage of AR-related code, achieving 55.8 percent branch coverage versus 41.98 percent, and can also be used to assess the quality of playback videos for testing suitability.
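For scale, the two reported figures can be compared directly. This is plain arithmetic on the numbers above, not an additional result from the paper:

```latex
\[
55.8\% - 41.98\% = 13.82 \text{ percentage points},
\qquad
\frac{55.8}{41.98} \approx 1.33
\]
```

That is roughly a one-third relative improvement in branch coverage over Monkey on AR-related code.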

What carries the argument

An interactive area tracking mechanism that applies stability and visibility criteria to playback video frames, identifying and following dynamic, irregular test opportunities for input simulation.
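A minimal Python sketch of how such a filter might look, assuming each tracked area has already been reduced to one visibility flag per frame. The names `Span` and `stable_spans`, and the per-frame representation, are hypothetical and are not taken from the paper; the two-second default mirrors the threshold mentioned in the text accompanying the paper's Figure 5.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """A continuous run of frames in which an area stays visible (hypothetical type)."""
    start_s: float
    end_s: float

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def stable_spans(visible: List[bool], fps: float, min_duration_s: float = 2.0) -> List[Span]:
    """Return runs of consecutive visible frames lasting at least min_duration_s."""
    spans: List[Span] = []
    run_start = None
    # A trailing False acts as a sentinel so a run reaching the last frame is closed.
    for i, flag in enumerate(visible + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            span = Span(run_start / fps, i / fps)
            if span.duration_s >= min_duration_s:
                spans.append(span)
            run_start = None
    return spans
```

Calling `stable_spans(flags, fps=30.0)` on a list of per-frame flags returns only the time windows long enough to count as test opportunities under this assumed criterion; shorter flickers of visibility are dropped.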

If this is right

  • Automated testing engines receive timed and located inputs from video analysis instead of attempting to guess dynamic AR surfaces.
  • Playback videos can be scored for testing value by counting how many stable and visible interactive areas they contain.
  • AR apps can reuse real-world recordings for repeated test runs without re-executing the physical scenario each time.
  • Branch coverage on AR-specific code improves because tests focus on environment-derived interaction points rather than generic random inputs.
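The first point above can be illustrated with a small Python sketch that turns one filtered area into timed tap commands. It assumes the area is available as a screen-space polygon with a start and end time; `Opportunity`, `vertex_mean`, and `tap_schedule` are hypothetical names, and the one-second spacing is an arbitrary choice. The command string uses Android's standard `adb shell input tap x y`; the paper's own testing engine may drive inputs differently.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Opportunity:
    """A filtered interactive area and the time window in which it is usable."""
    start_s: float
    end_s: float
    polygon: List[Point]  # screen-space vertices in pixels


def vertex_mean(polygon: List[Point]) -> Point:
    """Average of the vertices; lies inside the polygon when it is convex."""
    n = len(polygon)
    return (sum(x for x, _ in polygon) / n, sum(y for _, y in polygon) / n)


def tap_schedule(opp: Opportunity, interval_s: float = 1.0) -> List[Tuple[float, str]]:
    """Return (time, command) pairs that tap the area while it is available."""
    x, y = vertex_mean(opp.polygon)
    command = f"adb shell input tap {round(x)} {round(y)}"
    schedule = []
    k = 0
    # Index-based stepping avoids accumulating floating-point error in the times.
    while opp.start_s + k * interval_s < opp.end_s:
        schedule.append((opp.start_s + k * interval_s, command))
        k += 1
    return schedule
```

The sketch only builds the schedule; a real harness would still need to synchronize these timestamps with the playback position of the recorded session.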

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same video-based tracking could support testing of other apps whose interfaces depend on sensor data or external scenes.
  • Developers might first run the framework on candidate recordings to decide which real-world captures are worth keeping for regression suites.
  • Combining the area tracker with existing GUI testing tools could extend coverage to mixed AR and traditional screen elements.
  • If the stability criteria hold across varied lighting and motion, the method could reduce the total number of live recordings needed during development.

Load-bearing premise

Stability and visibility criteria applied to playback video frames will reliably identify interactive areas that correspond to actual user-testable opportunities in live AR sessions.

What would settle it

Applying TARIPlay to a new set of AR playback videos where the detected areas miss major user interactions that occur in the corresponding live sessions, yielding branch coverage no better than Monkey.

Figures

Figures reproduced from arXiv: 2605.16544 by Seyed Amir Mousavi, Xiaoyin Wang.

Figure 1
Figure 1: Dynamics of Planes in 2 seconds
Figure 2
Figure 2: Illustration of Steps in Sutherland-Hodgman Algorithm
Figure 3
Figure 3: Overview of TARIPlay
Accompanying text: Android Y-axis correction:

\[
x_{\text{screen}} = \frac{x_{\text{ndc}} + 1}{2} \cdot W,
\qquad
y_{\text{screen}} = \left(1 - \frac{y_{\text{ndc}} + 1}{2}\right) \cdot H
\]

where $W$ and $H$ are screen dimensions, and $(x_{\text{ndc}}, y_{\text{ndc}}) = (x_{\text{screen}}/w_{\text{screen}},\ y_{\text{screen}}/w_{\text{screen}})$. After all vertices of a trackable polygon are projected to the phone screen plane, we can connect them to form the projection area.
  • Polygon Clipping of Projection Areas: after projected areas are calculated,…
Figure 5
Figure 5: An Exemplar Gantt Chart Output of TARIPlay
Accompanying text: … shorter than two seconds and report the remaining as test opportunities. We choose two seconds as the default threshold because it takes about 0.6 seconds for a human being to react to a visual stimulus and give a hand response [10] [37], so two is the smallest whole number of seconds that allows two consecutive UI events to be triggered naturally. In addition, the s…
Figure 4
Figure 4: Visible Box Intersection in Life Span Analysis
Figure 6
Figure 6: Coverage Comparison on Interaction Code
Accompanying text: Our performance varies across videos because some videos do not have enough test opportunities for complicated gestures (e.g., dragging for precise movements) supported in the app.
Figure 7
Figure 7: Correlation Between Coverage and Video Factors, …
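The projection and clipping steps named in the Figure 2 and Figure 3 material can be sketched in Python as follows. The first function applies the NDC-to-screen mapping with the Y-axis flip quoted under Figure 3; the second is a standard Sutherland-Hodgman clip. Clipping against the full screen rectangle is an assumption made here for illustration, and the function names are hypothetical rather than taken from the paper.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def ndc_to_screen(x_ndc: float, y_ndc: float, width: float, height: float) -> Point:
    """Map NDC in [-1, 1] to pixels, flipping y so the origin is at the top-left."""
    x = (x_ndc + 1.0) / 2.0 * width
    y = (1.0 - (y_ndc + 1.0) / 2.0) * height
    return (x, y)


def _clip_edge(points: List[Point],
               inside: Callable[[Point], bool],
               intersect: Callable[[Point, Point], Point]) -> List[Point]:
    """One Sutherland-Hodgman pass: clip the polygon against a single boundary."""
    out: List[Point] = []
    for i, cur in enumerate(points):
        prev = points[i - 1]  # wraps to the last vertex when i == 0
        if inside(cur):
            if not inside(prev):
                out.append(intersect(prev, cur))  # entering the clip region
            out.append(cur)
        elif inside(prev):
            out.append(intersect(prev, cur))      # leaving the clip region
    return out


def _cross_x(x_edge: float) -> Callable[[Point, Point], Point]:
    def f(p: Point, q: Point) -> Point:
        t = (x_edge - p[0]) / (q[0] - p[0])
        return (x_edge, p[1] + t * (q[1] - p[1]))
    return f


def _cross_y(y_edge: float) -> Callable[[Point, Point], Point]:
    def f(p: Point, q: Point) -> Point:
        t = (y_edge - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y_edge)
    return f


def clip_to_screen(polygon: List[Point], width: float, height: float) -> List[Point]:
    """Clip a projected polygon against the rectangle [0, width] x [0, height]."""
    edges = [
        (lambda p: p[0] >= 0.0, _cross_x(0.0)),
        (lambda p: p[0] <= width, _cross_x(width)),
        (lambda p: p[1] >= 0.0, _cross_y(0.0)),
        (lambda p: p[1] <= height, _cross_y(height)),
    ]
    result = list(polygon)
    for inside, intersect in edges:
        if not result:
            break  # polygon lies entirely off-screen
        result = _clip_edge(result, inside, intersect)
    return result
```

The intersection helpers are only called when the two endpoints lie on opposite sides of a boundary, so the divisions never hit zero. An empty result means the projected area is not visible on screen at all.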
Original abstract

As Augmented Reality (AR) becomes more and more embedded in daily life, ensuring the quality, safety, and reliability of AR applications is increasingly important. However, AR apps present unique challenges for automated testing. Unlike static GUI layouts in traditional mobile apps, AR apps acquire their interaction interface from the surrounding environment, which is volatile and non-deterministic. Recent advancements like ARCore Playback and ARKit Replay allow developers to reuse real-world scenarios by recording and playing back enriched videos, enabling more feasible automated AR testing. However, using playback videos introduces two major challenges: test inputs must be timed precisely, and interactive areas in the video are dynamic, irregular, and difficult to identify. To address these challenges, we propose TARIPlay, a framework that analyzes playback videos to detect, track, and filter proper interactive areas over time for automated testing. In particular, TARIPlay identifies viable test opportunities based on criteria like stability and visibility, then feeds this information to an automated testing engine to simulate user interactions. We perform an experiment with four open-source AR apps and nine playback videos. Evaluation results show that TARIPlay significantly outperforms the existing tool Monkey in test coverage (55.8% over 41.98% on branch coverage) of AR-related code, and can also be used to assess the quality of playback videos for testing suitability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces TARIPlay, a framework for testing AR applications using playback videos. It detects and tracks interactive areas in videos by applying stability and visibility criteria to identify viable test opportunities, which are then used by an automated testing engine to simulate user interactions. Experiments with four open-source AR apps and nine playback videos demonstrate that TARIPlay achieves higher branch coverage (55.8%) of AR-related code compared to the Monkey tool (41.98%).

Significance. If the central claim holds after addressing validation gaps, this could meaningfully advance automated testing for AR apps by leveraging playback features from ARCore and ARKit to handle dynamic interfaces. The use of external open-source apps as benchmarks supports reproducibility. The reported coverage numbers provide a concrete basis for comparison, though the lack of live-session grounding limits immediate impact.

major comments (2)
  1. [Evaluation] Evaluation section: The abstract and results report concrete branch coverage (55.8% vs. 41.98%) but supply no details on how coverage was measured for AR-related code, which code was isolated, how the nine videos were selected, or any statistical tests. This is load-bearing for the outperformance claim over Monkey.
  2. [Approach] Approach section: The core step filters tracked areas using stability and visibility criteria on playback video frames to identify 'viable test opportunities.' No ground-truth comparison to live AR sessions, user-study alignment, or replay validation is described to confirm these areas match actual user-testable interactions in volatile, sensor-driven environments.
minor comments (2)
  1. [Approach] Clarify the precise definitions and thresholds for stability and visibility (e.g., via pseudocode or parameter values) to improve reproducibility.
  2. [Abstract] The abstract claims 'significantly outperforms' without effect sizes or variance measures; add these in the evaluation for precision.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We have addressed each of the major comments below and will incorporate revisions where appropriate to improve the clarity and rigor of the paper.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: The abstract and results report concrete branch coverage (55.8% vs. 41.98%) but supply no details on how coverage was measured for AR-related code, which code was isolated, how the nine videos were selected, or any statistical tests. This is load-bearing for the outperformance claim over Monkey.

    Authors: We agree with the referee that more methodological details are necessary to substantiate the reported coverage improvements. In the revised manuscript, we will add a dedicated subsection in the Evaluation section detailing: the coverage measurement tool and process (using code coverage frameworks compatible with Android AR apps); the isolation of AR-related code by identifying packages and classes that directly interface with ARCore APIs; the rationale and criteria for selecting the nine playback videos to represent a variety of real-world AR scenarios; and the application of statistical tests, such as the Wilcoxon signed-rank test, to assess the significance of the 55.8% vs. 41.98% difference. This will strengthen the evaluation claims. revision: yes

  2. Referee: [Approach] Approach section: The core step filters tracked areas using stability and visibility criteria on playback video frames to identify 'viable test opportunities.' No ground-truth comparison to live AR sessions, user-study alignment, or replay validation is described to confirm these areas match actual user-testable interactions in volatile, sensor-driven environments.

    Authors: We acknowledge that a direct validation against live AR sessions is not provided in the current manuscript. The playback videos are derived from real AR sessions recorded with ARCore, preserving the sensor-driven dynamics. The stability criterion ensures areas persist across frames despite minor movements, and visibility ensures they are not occluded, which are key properties for interactive areas in AR. We will revise the Approach section to include a more detailed justification of these criteria based on AR literature and add a limitations paragraph discussing the challenges of live vs. playback validation due to environmental volatility. Future work could involve user studies for alignment. revision: partial

Circularity Check

0 steps flagged

No significant circularity in TARIPlay evaluation

The paper describes an empirical testing framework that applies stability and visibility filters to tracked areas in playback videos, then evaluates the resulting test coverage on four independent open-source AR applications using nine playback videos. The central result (55.8% branch coverage versus Monkey's 41.98%) is obtained by direct execution and measurement against external benchmarks rather than any fitted parameter, self-referential definition, or derivation that reduces to the method's own inputs. No equations, uniqueness theorems, or ansatzes appear in the provided description, and the evaluation chain remains self-contained against the chosen open-source apps and deterministic recordings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that recorded playback videos faithfully capture the timing and spatial properties needed for realistic AR testing; no free parameters or invented entities are mentioned.

axioms (1)
  • domain assumption: ARCore Playback and ARKit Replay videos provide sufficiently accurate and reusable representations of real-world AR interaction scenarios
    Invoked when the paper states that these recordings enable more feasible automated AR testing.


Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages

  1. [1]

    2024. AutoWalk: Automated Accessibility Testing. https://github.com/. Accessed December 2024

  2. [2]

    Amazon. 2024. Amazon Shopping. https://www.amazon.com/. Accessed December 2024

  3. [3]

    Apple. 2024. ARKit: Recording and Replaying AR Session Data. https://developer.apple.com/documentation/arkit/recording-and-replaying-ar-session-data. Accessed December 2024

  4. [4]

    Apple. 2025. ARKit Recording and Replaying. https://developer.apple.com/documentation/arkit/arsession/recording_and_replaying_ar_session_data. Accessed March 2025

  5. [5]

    Andrea Arcuri and Lionel Briand. 2011. A practical guide for using statistical tests to assess randomized algorithms in software engineering. In Proceedings of the 33rd International Conference on Software Engineering (Waikiki, Honolulu, HI, USA) (ICSE ’11). Association for Computing Machinery, New York, NY, USA, 1–10. https://doi.org/10.1145/1985793.1985795

  6. [6]

    Young-Min Baek and Doo-Hwan Bae. 2016. Automated model-based Android GUI testing using multi-level GUI comparison criteria. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. 238–249

  7. [7]

    Diego Correa, María Cecilia Bastarrica, and Renzo Angles. 2018. Automated GUI Testing of Android Apps: A Systematic Mapping Study. In Proceedings of the 2018 IEEE International Conference on Software Quality, Reliability and Security (QRS 2018). IEEE, 358–365

  8. [8]

    Pedro Costa, Ana CR Paiva, and Miguel Nabuco. 2014. Pattern based GUI testing for mobile applications. In 2014 9th International Conference on the Quality of Information and Communications Technology. IEEE, 66–74

  9. [9]

    Zhicheng Ding, Zhixin Lai, Siyang Li, Panfeng Li, Qikai Yang, and Edward Wong. 2024. Confidence Trigger Detection: Accelerating Real-Time Tracking-by-Detection Systems. In 2024 5th International Conference on Electronic Communication and Artificial Intelligence (ICECAI). 587–592. https://doi.org/10.1109/ICECAI62591.2024.10674884

  10. [10]

    Alastair G Gale. 1997. Human response to visual stimuli. In The perception of visual information. Springer, 127–147

  11. [11]

    Google. 2024. ARCore: Google Play Services for AR. https://developers.google.com/ar. Accessed October 2024

  12. [12]

    Google. 2024. Google ARCore Playback. https://developers.google.com/ar/develop/recording-and-playback. Accessed December 2024

  13. [13]

    Google. 2024. Google Lens. https://lens.google/. Accessed December 2024

  14. [14]

    Google. 2025. Plane. https://developers.google.com/ar/reference/java/com/google/ar/core/Plane. Accessed March 2025

  15. [15]

    Google. 2025. UI Automator. https://developer.android.com/training/testing/ui-automator.html. Accessed May 2025

  16. [16]

    Google. 2025. UI/Application Exerciser Monkey. https://developer.android.com/studio/test/other-testing-tools/monkey. Accessed June 2025

  17. [17]

    Tianxiao Gu, Chengnian Sun, Xiaoxing Ma, Chun Cao, Chang Xu, Yuan Yao, Qirun Zhang, Jian Lu, and Zhendong Su. 2019. Practical GUI testing of Android applications via model abstraction and refinement. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 269–280

  18. [18]

    Lin Gui, Chunrong Fang, Zhihong Zhao, and Qingkai Shi. 2022. An Empirical Study of Automated Testing for Mobile Games. In Proceedings of the 44th International Conference on Software Engineering (ICSE 2022). ACM, 1–12

  19. [19]

    Patrick Harms. 2019. Automated Usability Evaluation of Virtual Reality Applications. In Proceedings of the 11th ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS 2019). ACM, 1–12

  20. [20]

    Donald Hearn. 1997. Computer graphics, C version. Pearson Education India

  21. [21]

    IKEA. 2024. IKEA. https://www.ikea.com/. Accessed December 2024

  22. [22]

    Youngjun Kim, Hannah Kim, and Yong Oock Kim. 2017. Virtual reality and augmented reality in plastic surgery: a review. Archives of Plastic Surgery 44, 03 (2017), 179–187

  23. [23]

    Pavneet Singh Kochhar, Ferdian Thung, and David Lo. 2015. Code coverage and test suite effectiveness: Empirical study with real bugs in large systems. In 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER). IEEE, 560–564

  24. [24]

    Shuqing Li, Cuiyun Gao, Jianping Zhang, Yujia Zhang, Yepang Liu, Jiazhen Gu, Yun Peng, and Michael R Lyu. 2024. Less cybersickness, please: Demystifying and detecting stereoscopic visual inconsistencies in virtual reality apps. Proceedings of the ACM on Software Engineering 1, FSE (2024), 2167–2189

  25. [25]

    Shuqing Li, Qisheng Zheng, Cuiyun Gao, Jia Feng, and Michael R Lyu. 2025. Extended Reality Cybersickness Assessment via User Review Analysis. Proceedings of the ACM on Software Engineering 2, ISSTA (2025), 1303–1325

  26. [26]

    Chen Liu, Kihwan Kim, Jinwei Gu, Yasutaka Furukawa, and Jan Kautz. 2019. PlaneRCNN: 3d plane detection and reconstruction from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4450–4459

  27. [27]

    Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, and Qing Wang. 2024. Make llm a testing expert: Bringing human-like interaction to mobile gui testing via functionality-aware decisions. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–13

  28. [28]

    Sascha Minor, Vix Kemanji Ketoma, and Gerrit Meixner. 2023. Test automation for augmented reality applications: a development process model and case study. i-com (2023). https://doi.org/10.1515/icom-2023-0029

  29. [29]

    Jacinto Molina, Xue Qin, and Xiaoyin Wang. 2021. Automatic extraction of code dependency in virtual reality software. In 2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC). IEEE, 381–385

  30. [30]

    Gail C. Murphy, Eleni Stroulia, and Paul Sorenson. 2014. Cowboys, Ankle Sprains, and Keepers of Quality: How Is Video Game Development Different from Software Development? In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, 1–11

  31. [31]

    Niantic. 2024. Pokémon GO. https://pokemongolive.com/. Accessed December 2024

  32. [32]

    Erik G Nilsson. 2009. Design patterns for user interface for mobile applications. Advances in Engineering Software 40, 12 (2009), 1318–1328

  33. [33]

    Xintao Niu, Changhai Nie, Hareton Leung, Yu Lei, Xiaoyin Wang, Jiaxi Xu, and Yan Wang. 2018. An interleaving approach to combinatorial testing and failure-inducing interaction identification. IEEE Transactions on Software Engineering 46, 6 (2018), 584–615

  34. [34]

    Pranav Parekh, Shireen Patel, Nivedita Patel, and Manan Shah. 2020. Systematic review and meta-analysis of augmented reality in medicine, retail, and games. Visual Computing for Industry, Biomedicine, and Art 3, 1 (2020), 21. https://doi.org/10.1186/s42492-020-00057-7

  35. [35]

    Nikolai Pärsch, Clemens Harnischmacher, Martin Baumann, Arnd Engeln, and Lutz Krauß. 2019. Designing Augmented Reality Navigation Visualizations for the Vehicle: A Question of Real World Object Coverage? In HCI in Mobility, Transport, and Automotive Systems: First International Conference, MobiTAS 2019, Held as Part of the 21st HCI International Conferenc...

  36. [36]

    Luca Pascarella, Franz Schwerfeger, Fabio Palomba, and Alberto Bacchelli. 2018. Video-based Reproducing of User Interaction for Android Apps. In Proceedings of the 5th IEEE/ACM International Conference on Mobile Software Engineering and Systems (MOBILESoft 2018). ACM, 9–19

  37. [37]

    Marcel Pfister, Jaw-Chyng L Lue, Francisco R Stefanini, Paulo Falabella, Laurie Dustin, Michael J Koss, and Mark S Humayun. 2014. Comparison of reaction response time between hand and foot controlled devices in simulated microsurgical testing. BioMed Research International 2014, 1 (2014), 769296

  38. [38]

    Xue Qin, Hao Zhong, and Xiaoyin Wang. 2019. TestMig: Migrating GUI Test Cases from iOS to Android. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2019). ACM, 270–281

  39. [39]

    Tahmid Rafi, Xueling Zhang, and Xiaoyin Wang. 2022. PreDART: Towards automatic oracle prediction of object placements in augmented reality testing. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. 1–13

  40. [40]

    Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. 2019. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 658–666

  41. [41]

    Irving Rodriguez and Xiaoyin Wang. 2017. An empirical study of open source virtual reality software projects. In 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE, 474–475

  42. [42]

    Dhia Eddine Rzig, Foyzul Hassan, and Chakkrit Tantithamthavorn. 2023. VRGuide: Efficient Testing of VR Scenes via Dynamic Cut Edges. In Proceedings of the 45th IEEE/ACM International Conference on Software Engineering (ICSE 2023). IEEE, 1–13

  43. [43]

    Kabir S Said, Liming Nie, Adekunle A Ajibode, and Xueyi Zhou. 2020. GUI testing for mobile applications: objectives, approaches and challenges. In Proceedings of the 12th Asia-Pacific Symposium on Internetware. 51–60

  44. [44]

    Juliana Saraiva, Eduardo Aranha, and Eduardo de Almeida. 2020. Automated Functional Testing for Mobile Applications: A Systematic Mapping Study. In Proceedings of the 28th International Conference on Program Comprehension (ICPC 2020). ACM, 283–294

  45. [45]

    Rocky Slavin, Xiaoyin Wang, Mitra Bokaei Hosseini, James Hester, Ram Krishnan, Jaspreet Bhatia, Travis D Breaux, and Jianwei Niu. 2016. Toward a framework for detecting privacy policy violations in android application code. In Proceedings of the 38th International Conference on Software Engineering. 25–36

  46. [46]

    Andrea Stocco, Michael Weiss, Marco Calzana, and Paolo Tonella. 2020. Misbehaviour Prediction for Autonomous Driving Systems. In 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). IEEE Computer Society, Los Alamitos, CA, USA, 359–371. https://doi.org/10.1145/3377811.3380353

  47. [47]

    Ivan E Sutherland and Gary W Hodgman. 1974. Reentrant polygon clipping. Commun. ACM 17, 1 (1974), 32–42

  48. [48]

    Unity Technologies. Accessed in June 2025. Unity MARS. https://unity.com/products/unity-mars

  49. [49]

    Simon Thorpe, Denis Fize, and Catherine Marlot. 1996. Speed of processing in the human visual system. Nature 381 (1996), 520–522

  50. [50]

    Rufin VanRullen and Simon J. Thorpe. 2001. The time course of visual processing: from early perception to decision-making. Journal of Cognitive Neuroscience 13, 4 (2001), 454–461

  51. [51]

    Arihant Singh Verma, Aditya Singh Verma, Sourabh Singh Verma, and Harish Sharma. 2023. 3 A comprehensive study for recent trends of AR/VR technology in real world scenarios. Handbook of Augmented and Virtual Reality 1 (2023), 31

  52. [52]

    Xiaoyin Wang, Xue Qin, Mitra Bokaei Hosseini, Rocky Slavin, Travis D Breaux, and Jianwei Niu. 2018. Guileak: Tracing privacy policy claims on user input data for android applications. In Proceedings of the 40th International Conference on Software Engineering. 37–47

  53. [53]

    Xiaoyin Wang, Tahmid Rafi, and Na Meng. 2023. Vrguide: Efficient testing of virtual reality scenes via dynamic cut coverage. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 951–962

  54. [54]

    Xusheng Xiao, Xiaoyin Wang, Zhihao Cao, Hanlin Wang, and Peng Gao. 2019. Iconintent: automatic identification of sensitive ui widgets based on icon classification for android apps. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 257–268

  55. [55]

    Xiaoyi Yang, Yuxing Wang, Tahmid Rafi, Dongfang Liu, Xiaoyin Wang, and Xueling Zhang. 2024. Towards automatic oracle prediction for ar testing: Assessing virtual object placement quality under real-world scenes. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 717–729

  56. [56]

    Xiaoyi Yang, Xueling Zhang, Tahmid Rafi, and Xiaoyin Wang. 2023. Augmented Reality Testing Bank. Microsoft Corp. Available: https://github.com/ARTBankManager/ARTBank [Accessed: May 10, 2025]

  57. [57]

    Weicai Ye, Hai Li, Tianxiang Zhang, Xiaowei Zhou, Hujun Bao, and Guofeng Zhang. 2021. SuperPlane: 3D Plane Detection and Description from a Single Image. In 2021 IEEE Virtual Reality and 3D User Interfaces (VR). 207–215. https://doi.org/10.1109/VR50410.2021.00042

  58. [58]

    Shengcheng Yu, Chunrong Fang, Ziyuan Tuo, Quanjun Zhang, Chunyang Chen, Zhenyu Chen, and Zhendong Su. 2023. Vision-based mobile app gui testing: A survey. arXiv preprint arXiv:2310.13518 (2023)

  59. [59]

    Nusrat Zahan, Thomas Zimmermann, Patrice Godefroid, Brendan Murphy, Chaiyong Ragkhitwetsagul, and Titus Barik. 2022. What are Weak Links in the npm Supply Chain? In Proceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP 2022). ACM, 1–10

  60. [60]

    Wenjie Zhang, Bei Li, Zhengwei Qi, and Hao Zhong. 2022. VRTest: An Extensible Framework for Automatic Testing of Virtual Reality Scenes. In Proceedings of the 44th International Conference on Software Engineering: Companion Proceedings (ICSE 2022). ACM, 158–162

  61. [61]

    Xueling Zhang, Rocky Slavin, Xiaoyin Wang, and Jianwei Niu. 2019. Privacy assurance for android augmented reality apps. In 2019 IEEE 24th Pacific Rim International Symposium on Dependable Computing (PRDC). IEEE, 114–1141

  62. [62]

    Xueling Zhang, Xiaoyin Wang, Rocky Slavin, Travis Breaux, and Jianwei Niu

  63. [63]

    How does misconfiguration of analytic services compromise mobile privacy? In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 1572–1583

  64. [64]

    Xueling Zhang, Xiaoyin Wang, Rocky Slavin, and Jianwei Niu. 2021. Condysta: Context-aware dynamic supplement to static taint analysis. In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 796–812

  65. [65]

    Yan Zhao, Enyi Tang, Haipeng Cai, Xi Guo, Xiaoyin Wang, and Na Meng. 2022. A lightweight approach of human-like playtest for android apps. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 309–320