pith. machine review for the scientific record.

arxiv: 2604.03438 · v1 · submitted 2026-04-03 · 💻 cs.SE

Recognition: no theorem link

Android Instrumentation Testing in Continuous Integration: Practices, Patterns, and Performance

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 18:24 UTC · model grok-4.3

classification 💻 cs.SE
keywords Android instrumentation testing · continuous integration · CI workflows · emulator setup · GitHub Actions · test reliability · open source repositories · software testing practices

The pith

Community-based emulator setups prove most reliable and efficient for running Android instrumentation tests in everyday CI, while custom scripts increase reruns and third-party labs raise costs for regressions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how open-source Android projects integrate instrumentation tests into continuous integration by reviewing workflow files and build configurations across 4,518 repositories that use CI. Only about 10.6 percent of these projects run such tests on devices or emulators, relying mainly on reusable community components or custom scripts for setup. These configurations rarely evolve, but when they do, projects shift toward community components. Performance data from GitHub Actions shows that community setups deliver the best combination of speed and reliability for routine checks on new code, third-party device labs handle scheduled regressions but at higher cost and failure rates, and custom scripting offers flexibility at the price of more reruns.
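
To make the "community component vs. custom script vs. third-party lab" classification concrete, here is a minimal sketch of how a single workflow file could be bucketed, assuming PyYAML and a few illustrative marker strings (the two community actions named in the paper's references, plus assumed device-lab and emulator-CLI markers); the paper's actual classification rules are richer than this heuristic.

```python
import yaml  # PyYAML

# Illustrative markers; assumptions, not the paper's taxonomy.
COMMUNITY_ACTIONS = ("reactivecircus/android-emulator-runner",
                     "malinskiy/action-android")
LAB_MARKERS = ("gcloud firebase test android", "browserstack", "saucelabs")
CUSTOM_MARKERS = ("avdmanager", "sdkmanager", "emulator -avd")

def classify_setup(workflow_text: str) -> str:
    """Bucket one GitHub Actions workflow file into a coarse setup style."""
    wf = yaml.safe_load(workflow_text) or {}
    steps = [s for job in wf.get("jobs", {}).values()
             for s in job.get("steps", [])]
    # Community components appear as `uses:` references to public actions.
    for step in steps:
        uses = (step.get("uses") or "").lower()
        if any(action in uses for action in COMMUNITY_ACTIONS):
            return "community component"
    # Device labs and hand-rolled emulator setup appear in `run:` scripts.
    for step in steps:
        run = (step.get("run") or "").lower()
        if any(m in run for m in LAB_MARKERS):
            return "third-party device lab"
        if any(m in run for m in CUSTOM_MARKERS):
            return "custom script"
    return "no emulator setup detected"
```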

Core claim

By examining CI workflow files, scripts, and Gradle configurations in 4,518 repositories, we find that instrumentation tests run in CI in only 481 cases (10.6 percent), typically via community components or repository-specific custom scripts. These setups remain stable over time, with a gradual shift toward reusable components. Performance metrics from GitHub Actions metadata indicate that community-based setups are the most reliable and efficient for daily checks, third-party labs suit regressions despite higher costs and failures, and custom scripting provides flexibility but correlates with more reruns.

What carries the argument

Classification of CI setup styles (community reusable components, custom scripts, third-party device labs) and their measured outcomes via GitHub Actions run-level and step-level metadata on success, duration, reruns, and queue delay.
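
As a sketch of where those signals can come from: the GitHub Actions REST API returns run_attempt, created_at, run_started_at, updated_at, and conclusion for each workflow run, which is enough to approximate all four metrics. The aggregation below (attempt counts as reruns, medians for delay and duration) is an assumption about plausible definitions, not the paper's exact procedure.

```python
from datetime import datetime
from statistics import median

def _ts(s: str) -> datetime:
    # GitHub timestamps look like "2025-08-10T12:34:56Z".
    return datetime.fromisoformat(s.replace("Z", "+00:00"))

def run_metrics(runs: list[dict]) -> dict:
    """Aggregate signals from GET /repos/{owner}/{repo}/actions/runs items."""
    success = [r["conclusion"] == "success" for r in runs]
    reruns = [r.get("run_attempt", 1) > 1 for r in runs]
    # Queue delay: run creation until a runner picks it up.
    queue = [(_ts(r["run_started_at"]) - _ts(r["created_at"])).total_seconds()
             for r in runs]
    # Duration: start until last update; a proxy, since the run object
    # carries no explicit finished-at field.
    dur = [(_ts(r["updated_at"]) - _ts(r["run_started_at"])).total_seconds()
           for r in runs]
    return {"success_rate": sum(success) / len(runs),
            "rerun_rate": sum(reruns) / len(runs),
            "median_queue_delay_s": median(queue),
            "median_duration_s": median(dur)}
```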

If this is right

  • Projects gain reliability and lower rerun rates by adopting community-based emulator setups for routine CI runs.
  • Third-party device labs become practical only when full regression coverage outweighs their added cost and failure frequency.
  • Custom scripting remains useful when flexibility is required but demands extra effort to handle higher rerun rates.
  • Changes in CI setup are most often driven by the desire to expand test coverage rather than performance alone.
  • Setups tend to stabilize once chosen, with migration toward community components when evolution occurs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Open-source Android teams could reduce maintenance by contributing more reusable community components that standardize emulator setup.
  • The observed patterns may generalize to other mobile ecosystems where device testing is similarly fragile.
  • Teams could mix setup styles within one project, using community components for daily checks and third-party labs only for periodic full suites.
  • Future studies could track how project scale and test suite size influence the choice and success of each approach.

Load-bearing premise

The GitHub Actions metadata and the single repository snapshot capture performance differences across setup styles without major bias from varying project sizes or test complexities.

What would settle it

Run identical instrumentation test suites on the same set of Android projects using each of the three setup styles in parallel CI pipelines and compare measured rerun rates, durations, and failure counts.
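
If such a controlled comparison were run, the nonparametric machinery already cited in the paper's reference list (Mann-Whitney U, Cliff's delta, SciPy) would fit the per-run measurements. A minimal sketch with invented duration samples:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def cliffs_delta(a, b) -> float:
    """Cliff's delta: P(A > B) - P(A < B), an ordinal effect size."""
    a, b = np.asarray(a), np.asarray(b)
    return ((a[:, None] > b[None, :]).mean()
            - (a[:, None] < b[None, :]).mean())

# Invented per-run durations (seconds) for two setup styles.
community = [612, 580, 655, 630, 601, 590, 618, 597]
custom = [720, 810, 695, 760, 905, 840, 775, 730]

u, p = mannwhitneyu(community, custom, alternative="two-sided")
print(f"U={u:.1f}  p={p:.4f}  delta={cliffs_delta(community, custom):+.2f}")
```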

Figures

Figures reproduced from arXiv: 2604.03438 by Hamid Parsazadeh, Safwat Hassan, Taher A. Ghaleb.

Figure 1: Overview of the dataset construction and analysis pipeline.
Figure 2: Dominant CI service provider by execution …
Figure 3: Episode-to-episode state machine of execution …
Figure 4: Time-sliced execution-style presence aggregated into …
Original abstract

Android instrumentation tests (end-to-end tests that run on a device or emulator) can catch problems that simpler tests miss. However, running these tests automatically in continuous integration (CI) is often difficult because emulator setup is fragile and configurations tend to drift over time. We study how open-source Android apps run instrumentation tests in CI by analyzing 4,518 repositories that use CI (snapshot: Aug. 10, 2025). We examine CI workflow files, scripts, and build configurations to identify cases where device setup is defined in Gradle (e.g., Gradle Managed Devices). Our results answer three questions about adoption, evolution, and outcomes. First, only about one in ten repositories (481/4,518; 10.6%) run instrumentation tests in CI, typically using either reusable community components or repository-specific custom scripts to set up emulators. Second, these setups usually stay the same over time; when changes happen, projects tend to move from custom scripts toward reusable community components. Third, we study why projects change their CI setup by analyzing their commits, pull requests, and issue messages. We evaluate how different setup styles perform using GitHub Actions run- and step-level metadata (e.g., outcomes, duration, reruns, and queue delay). We find that teams often change approaches to expand test coverage, and that each approach fits different needs: community-based setups are typically the most reliable and efficient for everyday checks on new code, third-party device labs suit scheduled regression testing but can be costlier and fail more often, and custom scripting provides flexibility but is associated with more reruns.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript reports an empirical analysis of Android instrumentation testing practices in continuous integration across 4,518 GitHub repositories. It identifies low adoption rates (10.6%), common setup approaches including community-based components, custom scripts, and third-party device labs, examines their temporal evolution, and evaluates performance differences in terms of reliability, efficiency, reruns, and queue delays using GitHub Actions metadata. The central conclusion is that community-based setups offer the best reliability and efficiency for routine checks, third-party labs are suitable for regression testing despite higher costs and failure rates, and custom scripts provide flexibility at the cost of more reruns.

Significance. If the observed performance differences are not confounded by project characteristics, the study offers actionable insights for practitioners selecting CI configurations for Android instrumentation tests. The scale of the repository analysis and the focus on real-world GitHub Actions data contribute to its relevance in software engineering research on testing practices. The identification of evolution patterns from custom to community setups is particularly noteworthy.

major comments (2)
  1. [Results (performance evaluation)] The performance comparison (results section on outcomes, duration, reruns, and queue delay) attributes differences in reliability and reruns to setup style without stratification or multivariate controls for confounders such as repository size, commit volume, test-suite scale, or app complexity. Projects adopting custom scripts may systematically differ in these dimensions, so reported associations could reflect selection effects rather than causal properties of the setup approach.
  2. [Methodology] The methodology provides limited detail on repository selection criteria, inclusion/exclusion filters, and how the snapshot of 4,518 repositories was constructed to ensure representativeness. This weakens the generalizability of the 10.6% adoption rate and the evolution findings.
minor comments (2)
  1. [Abstract] Confirm the snapshot date 'Aug. 10, 2025' is correct and not a typographical error, and state it consistently across the abstract and methodology.
  2. [Abstract] Clarify in the abstract and results whether any statistical tests (e.g., chi-square or regression) were used to compare metrics across setup styles, or if comparisons are purely descriptive.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below, acknowledging where the manuscript can be strengthened through additional analysis and detail.

Point-by-point responses
  1. Referee: [Results (performance evaluation)] The performance comparison (results section on outcomes, duration, reruns, and queue delay) attributes differences in reliability and reruns to setup style without stratification or multivariate controls for confounders such as repository size, commit volume, test-suite scale, or app complexity. Projects adopting custom scripts may systematically differ in these dimensions, so reported associations could reflect selection effects rather than causal properties of the setup approach.

    Authors: We agree that the performance analysis is observational and does not include multivariate controls, raising the possibility that differences partly reflect project characteristics rather than setup style. In the revision we will add stratification by repository size (stars and commit count) and a multivariate regression controlling for commit volume, test-suite scale (where measurable from build files), and app complexity proxies (a sketch of such a regression appears after these responses). We will also explicitly note the observational nature of the findings and avoid causal language. revision: yes

  2. Referee: [Methodology] The methodology provides limited detail on repository selection criteria, inclusion/exclusion filters, and how the snapshot of 4,518 repositories was constructed to ensure representativeness. This weakens the generalizability of the 10.6% adoption rate and the evolution findings.

    Authors: We acknowledge the need for greater transparency. The 4,518 repositories were obtained via GitHub API queries for projects with Android Gradle files and GitHub Actions workflows containing instrumentation-test steps, filtered to non-fork, non-archived repositories with at least 10 commits in the preceding year; the snapshot was taken on 10 August 2025. In the revision we will expand the Methodology section with the exact search criteria, inclusion/exclusion rules, and any checks performed for representativeness. revision: yes
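
Illustrating response 1 above: a minimal sketch of the promised multivariate control, using statsmodels with hypothetical column names (rerun_rate, setup, log_commits, log_tests) and invented values; the authors' actual model specification may differ.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-repository data; column names and values are illustrative.
df = pd.DataFrame({
    "rerun_rate": [0.05, 0.12, 0.30, 0.08, 0.22, 0.15, 0.10, 0.27],
    "setup": ["community", "community", "custom", "lab",
              "custom", "lab", "community", "custom"],
    "log_commits": [6.2, 7.1, 5.4, 8.0, 6.8, 7.5, 6.0, 5.9],  # commit volume
    "log_tests": [3.1, 4.0, 2.2, 4.5, 3.3, 3.9, 2.8, 2.5],    # suite scale
})

# OLS with setup style as a categorical predictor plus confounder controls.
model = smf.ols("rerun_rate ~ C(setup) + log_commits + log_tests",
                data=df).fit()
print(model.summary())
```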
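Illustrating response 2: a sketch of corpus-construction queries of the sort described, against GitHub's search endpoints. The qualifiers and the connectedAndroidTest marker are assumptions; the 10-commits-per-year filter needs a separate pass over each repository's commit history.

```python
import requests

HEADERS = {"Accept": "application/vnd.github+json",
           "Authorization": "Bearer <token>"}  # placeholder; search needs auth

# Step 1: candidate Android repositories, excluding forks and archives.
repos = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "topic:android fork:false archived:false", "per_page": 100},
    headers=HEADERS,
).json().get("items", [])

# Step 2: per repository, look for an instrumentation-test step in its
# workflows (GitHub code search must be scoped to a repo, user, or org).
for repo in repos:
    query = f"connectedAndroidTest path:.github/workflows repo:{repo['full_name']}"
    hits = requests.get(
        "https://api.github.com/search/code",
        params={"q": query},
        headers=HEADERS,
    ).json().get("total_count", 0)
    if hits:
        print(repo["full_name"])  # candidate for commit-count filtering
```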

Circularity Check

0 steps flagged

No circularity: purely observational repository analysis

Full rationale

The paper conducts an empirical snapshot study of 4,518 GitHub repositories, classifying CI setups from workflow files and measuring outcomes via direct GitHub Actions metadata. No equations, derivations, fitted parameters, or predictions appear; claims about reliability, efficiency, and reruns are presented as observed associations from the data itself rather than reductions to prior inputs or self-citations. The analysis is self-contained against the external repository corpus with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on assumptions about data representativeness and the accuracy of CI metadata in reflecting real performance rather than formal axioms or free parameters.

axioms (1)
  • domain assumption The 4,518 repositories represent a valid sample of Android apps using CI.
    Based on a snapshot taken on Aug. 10, 2025 from open-source repositories.

pith-pipeline@v0.9.0 · 5596 in / 1233 out tokens · 21806 ms · 2026-05-13T18:24:20.110832+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages

  1. Pingfan Kong, Li Li, Jun Gao, Kui Liu, Tegawendé F. Bissyandé, and Jacques Klein. Automated testing of Android apps: A systematic literature review. IEEE Transactions on Reliability, 68(1):45–66, 2019.
  2. Fabiano Pecorelli, Gemma Catolino, Filomena Ferrucci, Andrea De Lucia, and Fabio Palomba. Software testing and Android applications: A large-scale empirical study. Empirical Software Engineering, 27(2):31, 2022.
  3. Tarek Mahmud, Meiru Che, Anne H. H. Ngu, and Guowei Yang. Why Android app testing falls short: Empirical insights from open-source projects and a practitioner survey. Empirical Software Engineering, 30(6):163, 2025.
  4. Qingzhou Luo, Farah Hariri, Lamyaa Eloussi, and Darko Marinov. An empirical analysis of flaky tests. In Proceedings of the 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering, pages 643–653, 2014.
  5. Wing Lam, Kıvanç Muşlu, Hitesh Sajnani, and Suresh Thummalapenta. A study on the lifecycle of flaky tests. In Proceedings of the 42nd International Conference on Software Engineering, pages 1471–1482, 2020.
  6. Michael Hilton, Timothy Tunnell, Kevin Huang, Darko Marinov, and Danny Dig. Usage, costs, and benefits of continuous integration in open-source projects. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pages 426–437, 2016.
  7. Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar T. Devanbu, and Vladimir Filkov. Quality and productivity outcomes relating to continuous integration in GitHub. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering, pages 805–816, 2015.
  8. Taher A. Ghaleb, Osamah Abduljalil, and Safwat Hassan. CI/CD configuration practices in open-source Android apps: An empirical study. ACM Transactions on Software Engineering and Methodology, 2025.
  9. Fiorella Zampetti, Simone Scalabrino, and Rocco Oliveto. CI/CD pipelines evolution and restructuring. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2021.
  10. Parisa Reza Mazrae, Tom Mens, Mahshid Golzadeh, and Alexandre Decan. On the usage, co-usage and migration of CI/CD tools: A qualitative analysis. Empirical Software Engineering, 2023.
  11. Android Developers. Build instrumented tests, 2025. Accessed: 2025-08-19.
  12. Android Developers. AndroidJUnitRunner | test your app on Android.
  13. Accessed: 2025-08-19.
  14. Android Developers. Test from the command line, 2024. Accessed: 2025-08-19.
  15. Android Developers. Run apps on the Android Emulator, 2024. Accessed: 2025-01-15.
  16. ReactiveCircus. GitHub Action - Android Emulator Runner. Accessed: 2025-10-16.
  17. Malinskiy. action-android. Accessed: 2025-10-16.
  18. Android Developers. Scale your tests with build-managed devices, 2026. Accessed: 2025-10-16.
  19. Google. Firebase Test Lab, 2025. Accessed: 2025-10-16.
  20. Mattia Fazzini and Alessandro Orso. Managing app testing device clouds: Issues and opportunities. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, pages 1257–1259, 2020.
  21. Hao Lin, Jiaxing Qiu, Hongyi Wang, Zhenhua Li, Liangyi Gong, Di Gao, Yunhao Liu, Feng Qian, Zhao Zhang, Ping Yang, and Tianyin Xu. Virtual device farms for mobile app testing at scale: A pursuit for fidelity, efficiency, and accessibility. In Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 2023.
  22. GitHub Docs. Workflow syntax for GitHub Actions, 2025. Accessed: 2025-12-15.
  23. GitHub Docs. Using pre-written building blocks in your workflow, 2025. Documents that workflow steps can use actions defined in other public repositories or container images, which can encapsulate environment setup outside the caller repository.
  24. GitHub Docs. Creating a composite action, 2025. Accessed: 2025-12-15.
  25. GitHub Docs. Reuse workflows, 2025. Accessed: 2025-12-15.
  26. GitHub Docs. Self-hosted runners, 2025. Describes self-hosted runners and environment customization to run GitHub Actions jobs, which can make parts of environment setup external to repository artifacts.
  27. GitHub. GitHub Actions is generally available. GitHub Changelog, November 2019.
  28. Lianyu Zheng, Shuang Li, Xi Huang, Jiangnan Huang, Bin Lin, Jinfu Chen, and Jifeng Xuan. Why do GitHub Actions workflows fail? An empirical study. ACM Transactions on Software Engineering and Methodology, 2025.
  29. Islem Bouzenia and Michael Pradel. Resource usage and optimization opportunities in workflows of GitHub Actions. In Proceedings of the IEEE/ACM International Conference on Software Engineering, 2024.
  30. Sakina Fatima, Taher A. Ghaleb, and Lionel Briand. Flakify: A black-box, language model-based predictor for flaky tests. IEEE Transactions on Software Engineering, 49(4):1912–1927, 2023.
  31. Swapna Thorve, Chandani Sreshtha, and Na Meng. An empirical study of flaky tests in Android apps. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2018.
  32. Alan Romano, Zihe Song, Sampath Grandhi, Wei Yang, and Weihang Wang. An empirical analysis of UI-based flaky tests. In Proceedings of the IEEE/ACM International Conference on Software Engineering, 2021.
  33. Valeria Pontillo, Fabio Palomba, and Filomena Ferrucci. Test code flakiness in mobile apps: The developer's perspective. Information and Software Technology, 168:107394, 2024.
  34. Taher A. Ghaleb, Safwat Hassan, and Ying Zou. Studying the interplay between the durations and breakages of continuous integration builds. IEEE Transactions on Software Engineering, 49(4):2476–2497, 2023.
  35. Xiaoxin Zhou, Taher A. Ghaleb, and Safwat Hassan. Role of CI adoption in mobile app success: An empirical study of open-source Android projects. In Proceedings of the 23rd International Conference on Mining Software Repositories (MSR '26), pages 1–12, New York, NY, USA.
  36. Edward Abrokwah and Taher A. Ghaleb. An empirical study of complexity, heterogeneity, and compliance of GitHub Actions workflows. arXiv preprint arXiv:2507.18062, 2025.
  37. Taher A. Ghaleb and Dulina Rathnayake. Can LLMs write CI? A study on automatic generation of GitHub Actions configurations. In 2025 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 767–772. IEEE, 2025.
  38. Md Nazmul Hossain and Taher A. Ghaleb. CIgrate: Automating CI service migration with large language models. arXiv preprint arXiv:2507.20402, 2025.
  39. Nitika Chopra and Taher A. Ghaleb. From first use to final commit: Studying the evolution of multi-CI service adoption. In 2025 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 773–778. IEEE, 2025.
  40. Marcus Emmanuel Barnes, Taher A. Ghaleb, and Safwat Hassan. LogSieve: Task-aware CI log reduction for sustainable LLM-based analysis. In Proceedings of the 23rd International Conference on Mining Software Repositories (MSR '26), pages 1–12, New York, NY, USA.
  41. Marcus Emmanuel Barnes, Taher A. Ghaleb, and Safwat Hassan. Task-aware reduction for scalable LLM-database systems. In 2025 IEEE International Conference on Collaborative Advances in Software and COmputiNg (CASCON), pages 631–635, 2025.
  42. Jun-Wei Lin, Navid Salehnamadi, and Sam Malek. Test automation in open-source Android apps: A large-scale empirical study. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, pages 1078–1089, 2020.
  43. Tarek Mahmud, Meiru Che, Anne H. H. Ngu, and Guowei Yang. An empirical investigation on Android app testing practices. In Proceedings of the 2024 IEEE 35th International Symposium on Software Reliability Engineering, pages 355–366, 2024.
  44. Dingbang Wang, Yu Zhao, Lu Xiao, and Tingting Yu. An empirical study of regression testing for Android apps in continuous integration environment. In Proceedings of the 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2023.
  45. GitHub Docs. REST API endpoints for search, 2025. Accessed: 2025-12-22.
  46. Travis CI Documentation. Build an Android project, 2025. See "Default Test Commands": for Gradle projects Travis CI runs gradle build connectedCheck (or ./gradlew build connectedCheck when gradlew is present).
  47. Junji Shimagaki, Yasutaka Kamei, Shane McIntosh, David Pursehouse, and Naoyasu Ubayashi. Why are commits being reverted? A comparative study of industrial and open source projects. In Proceedings of the 2016 IEEE International Conference on Software Maintenance and Evolution, pages 301–310, 2016.
  48. Wil M. P. van der Aalst. Process Mining: Data Science in Action. Springer, 2nd edition, 2016.
  49. Andreas Mauczka, Markus Huber, Christian Schanes, Wolfgang Schramm, Mario Bernhart, and Thomas Grechenig. Tracing your maintenance work—a cross-project validation of an automated classification dictionary for commit messages. In International Conference on Fundamental Approaches to Software Engineering, pages 301–315. Springer, 2012.
  50. Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988.
  51. Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11–21, 1972.
  52. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, Cambridge, UK, 2008.
  53. Eirini Papagiannopoulou and Grigorios Tsoumakas. A review of keyphrase extraction, 2019. arXiv:1905.05044 [cs.IR].
  54. Karl Pearson. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 50:157–175, 1900.
  55. Alan Agresti. Categorical Data Analysis. Wiley, 3rd edition, 2013.
  56. Jacob Cohen. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates, 2nd edition, 1988.
  57. Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995.
  58. Henry B. Mann and Donald R. Whitney. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18(1):50–60, 1947.
  59. Norman Cliff. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, 114(3):494–509, 1993.
  60. Jeanine Romano, Jeffrey D. Kromrey, Jesse Coraggio, Jeff Skowronek, and Lindsey Devine. Exploring methods for evaluating group differences on the NSSE and other surveys: Are the t-test and Cohen's d indices the most appropriate choices? In Proceedings of the Annual Meeting of the Southern Association for Institutional Research, 2006.
  61. Pauli Virtanen et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17(3):261–272, 2020.