
arXiv: 1907.11073 · v2 · pith:M2PEY5LS · submitted 2019-07-25 · cs.SE · cs.CY · physics.soc-ph

An Empirical Analysis of the Python Package Index (PyPI)

keywords: package, packages, pypi, python, releases, find, software, authors

In this research, we provide a comprehensive empirical summary of the Python Package Repository, PyPI, including both package metadata and source code covering 178,592 packages, 1,745,744 releases, 76,997 contributors, and 156,816,750 import statements. We provide counts and trends for packages, releases, dependencies, category classifications, licenses, and package imports, as well as authors, maintainers, and organizations. As one of the largest and oldest software repositories as of publication, PyPI provides insight not just into the Python ecosystem today, but also trends in software development and licensing more broadly over time. Within PyPI, we find that the growth of the repository has been robust under all measures, with a compound annual growth rate of 47% for active packages, 39% for new authors, and 61% for new import statements over the last 15 years. As with many similar social systems, we find a number of highly right-skewed distributions, including the distribution of releases per package, packages and releases per author, imports per package, and size per package and release. However, we also find that most packages are contributed by single individuals, not multiple individuals or organizations. The data, methods, and calculations herein provide an anchor for public discourse on PyPI and serve as a foundation for future research on the Python software ecosystem.
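The growth figures in the abstract are compound annual growth rates. The abstract does not say how they were computed, so the sketch below assumes the standard definition, CAGR = (end / start)^(1 / years) - 1. The function names and the sample counts are hypothetical and are not taken from the paper's data.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate under the standard definition (assumed)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def growth_multiple(rate: float, years: float) -> float:
    """Total multiple implied by compounding `rate` once per year for `years` years."""
    return (1.0 + rate) ** years


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    start_count, end_count, span_years = 1_000, 8_000, 3
    print(f"CAGR: {cagr(start_count, end_count, span_years):.1%}")

    # Multiple implied by the 47% rate quoted for active packages over 15 years.
    print(f"Implied multiple: {growth_multiple(0.47, 15):.0f}x")
```

Read this way, a 47% rate sustained over 15 years corresponds to a several-hundred-fold increase in active packages. The reported rate summarizes growth between the endpoints of the period and does not imply that every year grew at that pace.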


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

    cs.SE · 2025-06 · conditional · novelty 8.0

    First study of 1,899 MCP servers finds eight distinct vulnerabilities (only three traditional), 7.2% with general issues, 5.5% with tool poisoning, and 66% with code smells, urging MCP-specific security practices.

  2. Analyzing the Availability of E-Mail Addresses for PyPI Libraries

    cs.SE · 2026-01 · unverdicted · novelty 3.0

    79.1% of PyPI libraries provide at least one valid email address, primarily from PyPI metadata, with high coverage extending to dependency chains.