Android Instrumentation Testing in Continuous Integration: Practices, Patterns, and Performance
Pith reviewed 2026-05-13 18:24 UTC · model grok-4.3
The pith
Community-based emulator setups prove most reliable and efficient for running Android instrumentation tests in everyday CI, while custom scripts increase reruns and third-party labs raise costs for regressions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By examining CI workflow files, scripts, and Gradle configurations in 4,518 repositories, we find that instrumentation tests run in CI in only 481 cases (10.6 percent), typically via community components or repository-specific custom scripts; these setups remain stable over time with a gradual shift toward reusable components; and performance metrics from GitHub Actions metadata indicate community-based setups are most reliable and efficient for daily checks, third-party labs suit regressions despite higher costs and failures, and custom scripting provides flexibility but correlates with more reruns.
What carries the argument
Classification of CI setup styles (community reusable components, custom scripts, third-party device labs) and their measured outcomes via GitHub Actions run-level and step-level metadata on success, duration, reruns, and queue delay.
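The classification step above can be sketched in code. This is a minimal illustration, not the paper's actual taxonomy rules: it assumes setup style can be detected by matching known markers in workflow text, such as ReactiveCircus's android-emulator-runner action (cited in the reference graph) for community components and device-lab CLI invocations for third-party labs; the specific marker strings are assumptions.

```python
# Hypothetical sketch: classify one CI workflow's setup style from its raw
# YAML text. Marker strings are illustrative assumptions, not the paper's
# actual classification rules.

COMMUNITY_ACTIONS = (
    "reactivecircus/android-emulator-runner",  # reusable emulator action
)
DEVICE_LAB_MARKERS = (
    "gcloud firebase test android run",  # Firebase Test Lab CLI
    "browserstack",
    "saucelabs",
)

def classify_setup(workflow_text: str) -> str:
    """Return 'community', 'device-lab', or 'custom-script' for one workflow."""
    text = workflow_text.lower()
    if any(marker in text for marker in COMMUNITY_ACTIONS):
        return "community"
    if any(marker in text for marker in DEVICE_LAB_MARKERS):
        return "device-lab"
    # Anything that still runs instrumentation tests but matches no known
    # reusable component is treated as a repository-specific custom script.
    return "custom-script"
```

A workflow step like `uses: ReactiveCircus/android-emulator-runner@v2` would classify as "community", while a bespoke `./scripts/start_emulator.sh` step would fall through to "custom-script".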
If this is right
- Projects gain reliability and lower rerun rates by adopting community-based emulator setups for routine CI runs.
- Third-party device labs become practical only when full regression coverage outweighs their added cost and failure frequency.
- Custom scripting remains useful when flexibility is required but demands extra effort to handle higher rerun rates.
- Changes in CI setup are most often driven by the desire to expand test coverage rather than performance alone.
- Setups tend to stabilize once chosen, with migration toward community components when evolution occurs.
Where Pith is reading between the lines
- Open-source Android teams could reduce maintenance by contributing more reusable community components that standardize emulator setup.
- The observed patterns may generalize to other mobile ecosystems where device testing is similarly fragile.
- Teams could mix setup styles within one project, using community components for daily checks and third-party labs only for periodic full suites.
- Future studies could track how project scale and test suite size influence the choice and success of each approach.
Load-bearing premise
The GitHub Actions metadata and single repository snapshot capture unbiased performance differences across setup styles without major effects from varying project sizes or test complexities.
What would settle it
Run identical instrumentation test suites on the same set of Android projects using each of the three setup styles in parallel CI pipelines and compare measured rerun rates, durations, and failure counts.
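If such a parallel-pipeline experiment were run, its per-run records could be compared with the ordinal statistics the paper's reference graph points to (Mann-Whitney, Cliff's delta). The sketch below, with invented data shapes, shows the two metrics such a comparison would hinge on; it is not the paper's analysis code.

```python
# Sketch of the proposed head-to-head comparison: given per-run metadata
# collected from parallel CI pipelines (record shapes are assumptions),
# compare two setup styles by rerun rate and an ordinal effect size.

def cliffs_delta(xs, ys):
    """Cliff's delta in [-1, 1]: positive when values in xs tend to exceed ys."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def rerun_rate(runs):
    """Fraction of workflow runs that needed at least one rerun."""
    return sum(1 for r in runs if r["reruns"] > 0) / len(runs)
```

For example, `cliffs_delta(community_durations, lab_durations)` near -1 would indicate community-setup runs are almost uniformly faster, while values near 0 indicate no ordinal difference.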
Original abstract
Android instrumentation tests (end-to-end tests that run on a device or emulator) can catch problems that simpler tests miss. However, running these tests automatically in continuous integration (CI) is often difficult because emulator setup is fragile and configurations tend to drift over time. We study how open-source Android apps run instrumentation tests in CI by analyzing 4,518 repositories that use CI (snapshot: Aug. 10, 2025). We examine CI workflow files, scripts, and build configurations to identify cases where device setup is defined in Gradle (e.g., Gradle Managed Devices). Our results answer three questions about adoption, evolution, and outcomes. First, only about one in ten repositories (481/4,518; 10.6%) run instrumentation tests in CI, typically using either reusable community components or repository-specific custom scripts to set up emulators. Second, these setups usually stay the same over time; when changes happen, projects tend to move from custom scripts toward reusable community components. Third, we study why projects change their CI setup by analyzing their commits, pull requests, and issue messages. We evaluate how different setup styles perform using GitHub Actions run- and step-level metadata (e.g., outcomes, duration, reruns, and queue delay). We find that teams often change approaches to expand test coverage, and that each approach fits different needs: community-based setups are typically the most reliable and efficient for everyday checks on new code, third-party device labs suit scheduled regression testing but can be costlier and fail more often, and custom scripting provides flexibility but is associated with more reruns.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports an empirical analysis of Android instrumentation testing practices in continuous integration across 4,518 GitHub repositories. It identifies low adoption rates (10.6%), common setup approaches including community-based components, custom scripts, and third-party device labs, examines their temporal evolution, and evaluates performance differences in terms of reliability, efficiency, reruns, and queue delays using GitHub Actions metadata. The central conclusion is that community-based setups offer the best reliability and efficiency for routine checks, third-party labs are suitable for regression testing despite higher costs and failure rates, and custom scripts provide flexibility at the cost of more reruns.
Significance. If the observed performance differences are not confounded by project characteristics, the study offers actionable insights for practitioners selecting CI configurations for Android instrumentation tests. The scale of the repository analysis and the focus on real-world GitHub Actions data contribute to its relevance in software engineering research on testing practices. The identification of evolution patterns from custom to community setups is particularly noteworthy.
major comments (2)
- [Results (performance evaluation)] The performance comparison (results section on outcomes, duration, reruns, and queue delay) attributes differences in reliability and reruns to setup style without stratification or multivariate controls for confounders such as repository size, commit volume, test-suite scale, or app complexity. Projects adopting custom scripts may systematically differ in these dimensions, so reported associations could reflect selection effects rather than causal properties of the setup approach.
- [Methodology] The methodology provides limited detail on repository selection criteria, inclusion/exclusion filters, and how the snapshot of 4,518 repositories was constructed to ensure representativeness. This weakens the generalizability of the 10.6% adoption rate and the evolution findings.
minor comments (2)
- [Abstract] The snapshot date 'Aug. 10, 2025' appears to be in the future; confirm whether this is a typographical error for 2024.
- [Abstract] Clarify in the abstract and results whether any statistical tests (e.g., chi-square or regression) were used to compare metrics across setup styles, or if comparisons are purely descriptive.
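To make the referee's second point concrete: one way the paper could move beyond purely descriptive comparison is a chi-square test on a contingency table of setup style versus run outcome. The sketch below computes the Pearson statistic in pure Python; the counts are invented for illustration and do not come from the paper.

```python
# Illustrative Pearson chi-square statistic for a contingency table of
# observed counts (rows = setup styles, columns = run outcomes).
# The example counts are invented, not the paper's data.

def chi_square(table):
    """Pearson chi-square statistic for a 2D list of observed counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat

# e.g. two setup styles x (success, failure):
observed = [[30, 10], [20, 40]]
```

In practice the statistic would be passed to a chi-square distribution with (rows-1)(cols-1) degrees of freedom for a p-value, e.g. via SciPy, which the paper already cites.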
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below, acknowledging where the manuscript can be strengthened through additional analysis and detail.
Point-by-point responses
- Referee: [Results (performance evaluation)] The performance comparison (results section on outcomes, duration, reruns, and queue delay) attributes differences in reliability and reruns to setup style without stratification or multivariate controls for confounders such as repository size, commit volume, test-suite scale, or app complexity. Projects adopting custom scripts may systematically differ in these dimensions, so reported associations could reflect selection effects rather than causal properties of the setup approach.
  Authors: We agree that the performance analysis is observational and does not include multivariate controls, raising the possibility that differences partly reflect project characteristics rather than setup style. In the revision we will add stratification by repository size (stars and commit count) and a multivariate regression controlling for commit volume, test-suite scale (where measurable from build files), and app complexity proxies. We will also explicitly note the observational nature of the findings and avoid causal language. revision: yes
- Referee: [Methodology] The methodology provides limited detail on repository selection criteria, inclusion/exclusion filters, and how the snapshot of 4,518 repositories was constructed to ensure representativeness. This weakens the generalizability of the 10.6% adoption rate and the evolution findings.
  Authors: We acknowledge the need for greater transparency. The 4,518 repositories were obtained via GitHub API queries for projects with Android Gradle files and GitHub Actions workflows containing instrumentation-test steps, filtered to non-fork, non-archived repositories with at least 10 commits in the preceding year; the snapshot was taken on 10 August 2025. In the revision we will expand the Methodology section with the exact search criteria, inclusion/exclusion rules, and any checks performed for representativeness. revision: yes
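The inclusion filter the rebuttal describes can be sketched as a predicate. Field names here follow the GitHub REST API's repository object (`fork`, `archived`); `commits_last_year` is a hypothetical stand-in for a count that would be computed separately, since the API does not return it directly.

```python
# Hypothetical sketch of the rebuttal's inclusion filter: non-fork,
# non-archived repositories with at least 10 commits in the preceding year.
# "commits_last_year" is an assumed field computed in a separate step.

MIN_COMMITS_LAST_YEAR = 10

def include_repo(repo: dict) -> bool:
    """Return True if the repository passes the stated inclusion criteria."""
    return (
        not repo.get("fork", False)
        and not repo.get("archived", False)
        and repo.get("commits_last_year", 0) >= MIN_COMMITS_LAST_YEAR
    )
```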
Circularity Check
No circularity: purely observational repository analysis
full rationale
The paper conducts an empirical snapshot study of 4,518 GitHub repositories, classifying CI setups from workflow files and measuring outcomes via direct GitHub Actions metadata. No equations, derivations, fitted parameters, or predictions appear; claims about reliability, efficiency, and reruns are presented as observed associations from the data itself rather than reductions to prior inputs or self-citations. The analysis is self-contained against the external repository corpus with no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The 4,518 repositories represent a valid sample of Android apps using CI.
Reference graph
Works this paper leans on
- [1] Pingfan Kong, Li Li, Jun Gao, Kui Liu, Tegawendé F. Bissyandé, and Jacques Klein. Automated testing of Android apps: A systematic literature review. IEEE Transactions on Reliability, 68(1):45–66, 2019.
- [2] Fabiano Pecorelli, Gemma Catolino, Filomena Ferrucci, Andrea De Lucia, and Fabio Palomba. Software testing and Android applications: A large-scale empirical study. Empirical Software Engineering, 27(2):31, 2022.
- [3] Tarek Mahmud, Meiru Che, Anne H. H. Ngu, and Guowei Yang. Why Android app testing falls short: Empirical insights from open-source projects and a practitioner survey. Empirical Software Engineering, 30(6):163, 2025.
- [4] Qingzhou Luo, Farah Hariri, Lamyaa Eloussi, and Darko Marinov. An empirical analysis of flaky tests. In Proceedings of the 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering, pages 643–653, 2014.
- [5] Wing Lam, Kıvanç Muşlu, Hitesh Sajnani, and Suresh Thummalapenta. A study on the lifecycle of flaky tests. In Proceedings of the 42nd International Conference on Software Engineering, pages 1471–1482, 2020.
- [6] Michael Hilton, Timothy Tunnell, Kevin Huang, Darko Marinov, and Danny Dig. Usage, costs, and benefits of continuous integration in open-source projects. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pages 426–437, 2016.
- [7] Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar T. Devanbu, and Vladimir Filkov. Quality and productivity outcomes relating to continuous integration in GitHub. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering, pages 805–816, 2015.
- [8] Taher A. Ghaleb, Osamah Abduljalil, and Safwat Hassan. CI/CD configuration practices in open-source Android apps: An empirical study. ACM Transactions on Software Engineering and Methodology, 2025.
- [9] Fiorella Zampetti, Simone Scalabrino, and Rocco Oliveto. CI/CD pipelines evolution and restructuring. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2021.
- [10] Parisa Reza Mazrae, Tom Mens, Mahshid Golzadeh, and Alexandre Decan. On the usage, co-usage and migration of CI/CD tools: A qualitative analysis. Empirical Software Engineering, 2023.
- [11] Android Developers. Build instrumented tests, 2025. Accessed: 2025-08-19.
- [12] Android Developers. AndroidJUnitRunner | Test your app on Android. Accessed: 2025-08-19.
- [13]
- [14] Android Developers. Test from the command line, 2024. Accessed: 2025-08-19.
- [15] Android Developers. Run apps on the Android Emulator, 2024. Accessed: 2025-01-15.
- [16] ReactiveCircus. GitHub Action - Android Emulator Runner. Accessed: 2025-10-16.
- [17]
- [18] Android Developers. Scale your tests with build-managed devices, 2026. Accessed: 2025-10-16.
- [19]
- [20] Mattia Fazzini and Alessandro Orso. Managing app testing device clouds: Issues and opportunities. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, pages 1257–1259, 2020.
- [21] Hao Lin, Jiaxing Qiu, Hongyi Wang, Zhenhua Li, Liangyi Gong, Di Gao, Yunhao Liu, Feng Qian, Zhao Zhang, Ping Yang, and Tianyin Xu. Virtual device farms for mobile app testing at scale: A pursuit for fidelity, efficiency, and accessibility. In Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 2023.
- [22] GitHub Docs. Workflow syntax for GitHub Actions, 2025. Accessed: 2025-12-15.
- [23] GitHub Docs. Using pre-written building blocks in your workflow, 2025. Documents that workflow steps can use actions defined in other public repositories or container images, which can encapsulate environment setup outside the caller repository.
- [24] GitHub Docs. Creating a composite action, 2025. Accessed: 2025-12-15.
- [25]
- [26] GitHub Docs. Self-hosted runners, 2025. Describes self-hosted runners and environment customization to run GitHub Actions jobs, which can make parts of environment setup external to repository artifacts.
- [27] GitHub. GitHub Actions is generally available. GitHub Changelog, November 2019.
- [28] Lianyu Zheng, Shuang Li, Xi Huang, Jiangnan Huang, Bin Lin, Jinfu Chen, and Jifeng Xuan. Why do GitHub Actions workflows fail? An empirical study. ACM Transactions on Software Engineering and Methodology, 2025.
- [29] Islem Bouzenia and Michael Pradel. Resource usage and optimization opportunities in workflows of GitHub Actions. In Proceedings of the IEEE/ACM International Conference on Software Engineering, 2024.
- [30] Sakina Fatima, Taher A. Ghaleb, and Lionel Briand. Flakify: A black-box, language model-based predictor for flaky tests. IEEE Transactions on Software Engineering, 49(4):1912–1927, 2023.
- [31] Swapna Thorve, Chandani Sreshtha, and Na Meng. An empirical study of flaky tests in Android apps. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2018.
- [32] Alan Romano, Zihe Song, Sampath Grandhi, Wei Yang, and Weihang Wang. An empirical analysis of UI-based flaky tests. In Proceedings of the IEEE/ACM International Conference on Software Engineering, 2021.
- [33] Valeria Pontillo, Fabio Palomba, and Filomena Ferrucci. Test code flakiness in mobile apps: The developer's perspective. Information and Software Technology, 168:107394, 2024.
- [34] Taher A. Ghaleb, Safwat Hassan, and Ying Zou. Studying the interplay between the durations and breakages of continuous integration builds. IEEE Transactions on Software Engineering, 49(4):2476–2497, 2023.
- [35] Xiaoxin Zhou, Taher A. Ghaleb, and Safwat Hassan. Role of CI adoption in mobile app success: An empirical study of open-source Android projects. In Proceedings of the 23rd International Conference on Mining Software Repositories (MSR '26), pages 1–12, New York, NY, USA.
- [36]
- [37] Taher A. Ghaleb and Dulina Rathnayake. Can LLMs write CI? A study on automatic generation of GitHub Actions configurations. In 2025 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 767–772. IEEE, 2025.
- [38]
- [39] Nitika Chopra and Taher A. Ghaleb. From first use to final commit: Studying the evolution of multi-CI service adoption. In 2025 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 773–778. IEEE, 2025.
- [40] Marcus Emmanuel Barnes, Taher A. Ghaleb, and Safwat Hassan. LogSieve: Task-aware CI log reduction for sustainable LLM-based analysis. In Proceedings of the 23rd International Conference on Mining Software Repositories (MSR '26), pages 1–12, New York, NY, USA.
- [41] Marcus Emmanuel Barnes, Taher A. Ghaleb, and Safwat Hassan. Task-aware reduction for scalable LLM-database systems. In 2025 IEEE International Conference on Collaborative Advances in Software and COmputiNg (CASCON), pages 631–635, 2025.
- [42] Jun-Wei Lin, Navid Salehnamadi, and Sam Malek. Test automation in open-source Android apps: A large-scale empirical study. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, pages 1078–1089, 2020.
- [43] Tarek Mahmud, Meiru Che, Anne H. H. Ngu, and Guowei Yang. An empirical investigation on Android app testing practices. In Proceedings of the 2024 IEEE 35th International Symposium on Software Reliability Engineering, pages 355–366, 2024.
- [44] Dingbang Wang, Yu Zhao, Lu Xiao, and Tingting Yu. An empirical study of regression testing for Android apps in continuous integration environment. In Proceedings of the 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2023.
- [45] GitHub Docs. REST API endpoints for search, 2025. Accessed: 2025-12-22.
- [46] Travis CI Documentation. Build an Android project, 2025. See "Default Test Commands": for Gradle projects, Travis CI runs gradle build connectedCheck (or ./gradlew build connectedCheck when gradlew is present).
- [47] Junji Shimagaki, Yasutaka Kamei, Shane McIntosh, David Pursehouse, and Naoyasu Ubayashi. Why are commits being reverted? A comparative study of industrial and open source projects. In Proceedings of the 2016 IEEE International Conference on Software Maintenance and Evolution, pages 301–310, 2016.
- [48] Wil M. P. van der Aalst. Process Mining: Data Science in Action. Springer, 2nd edition, 2016.
- [49] Andreas Mauczka, Markus Huber, Christian Schanes, Wolfgang Schramm, Mario Bernhart, and Thomas Grechenig. Tracing your maintenance work—a cross-project validation of an automated classification dictionary for commit messages. In International Conference on Fundamental Approaches to Software Engineering, pages 301–315. Springer, 2012.
- [50] Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988.
- [51] Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11–21, 1972.
- [52] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, Cambridge, UK, 2008.
- [53] Eirini Papagiannopoulou and Grigorios Tsoumakas. A review of keyphrase extraction, 2019. arXiv:1905.05044 [cs.IR].
- [54] Karl Pearson. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 50:157–175, 1900.
- [55] Alan Agresti. Categorical Data Analysis. Wiley, 3rd edition, 2013.
- [56] Jacob Cohen. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates, 2nd edition, 1988.
- [57] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995.
- [58] Henry B. Mann and Donald R. Whitney. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18(1):50–60, 1947.
- [59] Norman Cliff. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, 114(3):494–509, 1993.
- [60] Jeanine Romano, Jeffrey D. Kromrey, Jesse Coraggio, Jeff Skowronek, and Lindsey Devine. Exploring methods for evaluating group differences on the NSSE and other surveys: Are the t-test and Cohen's d indices the most appropriate choices? In Proceedings of the Annual Meeting of the Southern Association for Institutional Research, 2006.
- [61] Pauli Virtanen et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17(3):261–272, 2020.