
arxiv: 2604.17668 · v1 · submitted 2026-04-19 · 💻 cs.CR

Original Sin of npm: A Study on Vulnerability Propagation in JavaScript Dependency Networks

Pith reviewed 2026-05-10 05:10 UTC · model grok-4.3

classification 💻 cs.CR
keywords npm · vulnerability propagation · JavaScript packages · dependency networks · software security · transitive dependencies · package vulnerabilities · open source security

The pith

21.6 percent of npm packages carry at least one known vulnerability in their dependency networks, and High is the most common severity rating among them (42 percent).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates how vulnerabilities spread across JavaScript packages on npm by examining dependency relationships among more than one million entries. It shows that 21.6 percent of packages become vulnerable through direct or transitive links, with 42 percent of those cases rated high severity. Fixes for vulnerable packages take nearly five years on average after the first bad version is released. A small set of vulnerabilities drives much of the exposure, as the top 23 account for half of all instances. The results point to the value of targeting root causes in a few packages to reduce widespread risk.

Core claim

By tracing dependency networks across 1,077,946 JavaScript packages, the study establishes that 232,836 packages, or 21.60 percent, have at least one known vulnerability in their networks, with 42 percent of those rated high severity. The average interval from publication of the first vulnerable version to a fix is 4 years and 11 months, while vulnerability reports appear about 19 days after fixes become available. A small number of vulnerabilities are highly concentrated, with the top 7 covering 25 percent and the top 23 covering 50 percent of cases.
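
As a quick arithmetic check, the headline shares can be recomputed from the raw counts. This is a minimal sketch: the counts are the ones quoted in the paper's abstract, while the constant and function names are illustrative and not taken from the paper.

```python
# Counts quoted in the paper's abstract.
TOTAL_PACKAGES = 1_077_946
WITH_DEPENDENCIES = 660_748
WITH_KNOWN_VULNERABILITY = 232_836


def share_percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


dependent_share = share_percent(WITH_DEPENDENCIES, TOTAL_PACKAGES)
vulnerable_share = share_percent(WITH_KNOWN_VULNERABILITY, TOTAL_PACKAGES)
print(f"rely on dependencies: {dependent_share:.2f}%")
print(f"vulnerable network:   {vulnerable_share:.2f}%")
```

Both values agree with the reported 61.30 percent and 21.60 percent after rounding.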

What carries the argument

Dependency networks that connect packages through direct and transitive links, allowing measurement of how vulnerabilities from a few sources reach many others.
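
To make the propagation idea concrete, here is a minimal sketch that walks a toy dependency graph breadth-first and flags every package whose network contains a vulnerable package. The graph, the package names, and the choice to count a package's own vulnerability as exposure are all assumptions for illustration; the paper builds its graphs from deps.dev rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical toy graph: package -> packages it depends on directly.
DEPENDS_ON = {
    "app": ["web-framework", "logger"],
    "web-framework": ["parser"],
    "logger": [],
    "parser": [],
    "cli-tool": ["logger"],
}
# Hypothetical set of packages that have a known vulnerability of their own.
DIRECTLY_VULNERABLE = {"parser"}


def dependency_network(package: str) -> set[str]:
    """Collect every direct and transitive dependency of `package` (BFS)."""
    seen: set[str] = set()
    queue = deque(DEPENDS_ON.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:  # already visited; also guards against cycles
            continue
        seen.add(dep)
        queue.extend(DEPENDS_ON.get(dep, []))
    return seen


def is_exposed(package: str) -> bool:
    """True if the package or anything in its network is vulnerable."""
    network = {package} | dependency_network(package)
    return bool(network & DIRECTLY_VULNERABLE)


exposed = sorted(p for p in DEPENDS_ON if is_exposed(p))
print(exposed)
```

In this toy graph a single vulnerable leaf ("parser") exposes three of the five packages, which is the amplification effect the paper measures at ecosystem scale.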

If this is right

  • Remediation efforts focused on the small set of high-frequency vulnerabilities could address half of all observed cases.
  • Shortening the average five-year window between vulnerable release and fix would reduce overall exposure time across the network.
  • Developers can lower risk by tracking and updating packages that appear frequently in many dependency chains.
  • Package managers could add alerts that flag transitive dependencies tied to the concentrated top vulnerabilities.
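
The concentration claim behind the first bullet (top 7 covering 25 percent of cases, top 23 covering 50 percent) is a cumulative-share calculation over vulnerability frequencies. The sketch below shows that calculation on made-up counts; the identifiers and numbers are hypothetical and do not reproduce the paper's data.

```python
from collections import Counter


def top_k_for_share(occurrences: Counter, target_share: float) -> int:
    """Smallest k such that the k most frequent items cover `target_share` of all cases."""
    total = sum(occurrences.values())
    covered = 0
    for k, (_, count) in enumerate(occurrences.most_common(), start=1):
        covered += count
        if covered >= target_share * total:
            return k
    return len(occurrences)


# Hypothetical counts: how often each vulnerability appears across dependency networks.
toy_counts = Counter(
    {"VULN-A": 30, "VULN-B": 20, "VULN-C": 20, "VULN-D": 10, "VULN-E": 10, "VULN-F": 10}
)
print(top_k_for_share(toy_counts, 0.25))  # vulnerabilities needed to reach 25% of cases
print(top_k_for_share(toy_counts, 0.50))  # vulnerabilities needed to reach 50% of cases
```

Run against real occurrence counts, the same function would return the k values that the paper reports as 7 and 23.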

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same concentration pattern could be checked in other ecosystems such as Python or Java to see whether a few root packages drive most risk there too.
  • The 19-day gap between fixes and reports suggests testing whether faster coordinated disclosure would shrink exposure windows.
  • Surveying actual deployed applications and comparing their real vulnerability counts to the network-wide 21.6 percent figure would test how well the model matches production use.

Load-bearing premise

The assembled vulnerability data and dependency connections correctly capture all relevant known issues and reflect actual usage patterns in practice.

What would settle it

Repeating the full analysis with an alternative vulnerability database or dependency mapping tool and obtaining markedly different shares of affected packages would show the reported percentages do not hold.

Figures

Figures reproduced from arXiv: 2604.17668 by Hyoungshick Kim, Michael Robinson, Muhammad Ejaz Ahmed, Muhammad Ikram, Sajal Halder, Seyit Camtepe.

Figure 1: Example dependency graph for parent package
Figure 2: Pipeline describing our methodology for the cre…
Figure 3: Accumulation of the number of published vulner…
Figure 4: Trend of various (LOW, MEDIUM, HIGH and CRIT…
Figure 5: Heatmap of Base Severity and Exploitability Score.
Figure 7: Kaplan-Meier survival distribution for event “vul…
Figure 8: Distribution of vulnerabilities by severity at the…
Figure 9: Survival probability for event “vulnerability is fixed”
Figure 11: Distribution of CVE vulnerabilities
Figure 12: An example of a package dependency graph.
Figure 13: Distribution of total and average affected packages
Figure 14: Distribution of top 23 CVE vulnerabilities fre…
Figure 15: Total number and average distribution of packages across dependency levels for top-5, top-10, and top-20 CVEs based…

Original abstract

Understanding vulnerability propagation is essential for assessing how vulnerabilities spread across components of a software package. This supports more accurate impact analysis and enhances threat detection and mitigation. In this paper, we investigate how a small number of vulnerable JavaScript packages contribute to the creation of a disproportionately large number of vulnerable packages. This paper presents insights from 1,515 reported vulnerabilities gathered from a custom-built vulnerability database containing 1,077,946 JavaScript packages sourced from "npm-follower" and their associated dependency networks. Dependency networks were constructed using the deps.dev API, with vulnerabilities identified by parsing package names and version numbers through the Google Open Source Vulnerability API. Our findings reveal that 61.30% (660,748) of packages are reliant on one or more dependency packages, and 21.60% (232,836) of total packages have at least one known vulnerability throughout their dependency networks, of which most (42%) are of High severity. We also found that it takes, on average, approximately 4 years and 11 months to fix a vulnerable package from when the first vulnerable version is published on npm, although publication times of vulnerabilities occur approximately 19 days after a fix is available. Finally, we observe a high concentration of frequently present vulnerabilities throughout dependency networks, with the top-7 most frequent vulnerabilities accounting for 25% of vulnerability cases and the top-23 most frequent accounting for 50%. Based on these findings, we propose recommendations for developers and package managers to mitigate the threat and occurrence of vulnerabilities within the npm dependency network and the broader software repository community.
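
The abstract names two public services, the deps.dev API and the OSV API, but the page does not show how they were called. The sketch below only builds the requests one might send for a single npm package version; it performs no network access. The endpoint shapes reflect the public documentation of those APIs as best understood here and should be treated as assumptions, and the helper names are hypothetical rather than the authors' code.

```python
import json
from urllib.parse import quote

OSV_QUERY_URL = "https://api.osv.dev/v1/query"  # queried with an HTTP POST
DEPS_DEV_BASE = "https://api.deps.dev/v3"


def deps_dev_dependencies_url(name: str, version: str) -> str:
    """URL for the resolved dependency graph of one npm package version."""
    # Scoped names such as "@scope/pkg" must be percent-encoded in the path.
    return (
        f"{DEPS_DEV_BASE}/systems/npm/packages/{quote(name, safe='')}"
        f"/versions/{quote(version, safe='')}:dependencies"
    )


def osv_query_body(name: str, version: str) -> str:
    """JSON body asking OSV which known vulnerabilities affect one npm version."""
    return json.dumps(
        {"version": version, "package": {"name": name, "ecosystem": "npm"}}
    )


print(deps_dev_dependencies_url("@colors/colors", "1.5.0"))
print(osv_query_body("lodash", "4.17.20"))
```

A pipeline in the spirit of the abstract would fetch the dependency graph for each package, then send one OSV query per (name, version) node in that graph.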

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper analyzes vulnerability propagation in the npm JavaScript ecosystem using a custom database of 1,077,946 packages from npm-follower, dependency graphs from deps.dev, and data on 1,515 reported vulnerabilities from the Google OSV API. It claims that 61.30% of packages rely on one or more dependencies; that 21.60% (232,836) have at least one known vulnerability in their networks, 42% of them high severity; that the average time to fix a vulnerable package is 4 years and 11 months after the first vulnerable version is published; and that vulnerabilities are concentrated (the top 7 account for 25% of cases, the top 23 for 50%). It also offers mitigation recommendations.

Significance. If the data pipeline is accurate, the work provides large-scale empirical evidence on transitive vulnerability exposure and remediation delays in npm, which could inform package manager policies and developer practices. The scale of the dataset and focus on concentration effects are strengths.

major comments (3)
  1. [Methods (data collection and vulnerability identification)] Methods section on data collection and vulnerability identification: the pipeline for matching package names/versions to OSV entries (including transitive dependencies) is described at a high level but provides no validation of matching accuracy, version-range resolution, false-positive rates, or sample-based precision checks. This directly underpins the central 21.60% statistic and severity distributions.
  2. [Results (time-to-fix analysis)] Results on time-to-fix: the reported average of 4 years and 11 months (and the 19-day offset for vulnerability publication) depends on precise dating of first vulnerable version vs. fixing version, yet no details are given on date extraction, handling of version ordering, or assumptions about when a fix becomes available. This affects the reliability of the remediation timeline claim.
  3. [Dependency network construction] Dependency network construction: while deps.dev is used to build graphs, there is no assessment of graph completeness, potential missing transitive edges, or how version ranges are resolved when counting affected packages. Any systematic incompleteness would alter the 232,836 count and concentration findings.
minor comments (2)
  1. [Abstract] Abstract and results: ensure consistent reporting of total unique vulnerabilities vs. the 1,515 reported ones when discussing frequency distributions.
  2. [Results (vulnerability concentration)] Results on concentration: a table listing the top-7 and top-23 vulnerabilities with their frequencies would improve transparency and allow readers to assess the claim.
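
The version-range resolution raised in major comments 1 and 3 can be illustrated with a small matcher. This is a sketch under stated assumptions: it handles only plain MAJOR.MINOR.PATCH versions and only "introduced" and "fixed" events in the style of OSV affected ranges, and it ignores pre-release tags and other event kinds. The event list is hypothetical, and nothing here is claimed to be the paper's matching logic.

```python
Version = tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse MAJOR.MINOR.PATCH; the special value "0" means the very first version."""
    if text == "0":
        return (0, 0, 0)
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_affected(version: str, events: list[dict[str, str]]) -> bool:
    """Replay introduced/fixed events in version order, stopping past `version`."""
    target = parse_version(version)
    # Flatten [{"introduced": "0"}, {"fixed": "1.2.0"}] into sorted (bound, kind) pairs.
    bounds = sorted(
        (parse_version(text), kind) for event in events for kind, text in event.items()
    )
    affected = False
    for bound, kind in bounds:
        if target < bound:
            break
        affected = kind == "introduced"
    return affected


# Hypothetical advisory with two separate vulnerable ranges.
EVENTS = [
    {"introduced": "0"},
    {"fixed": "1.2.0"},
    {"introduced": "2.0.0"},
    {"fixed": "2.10.0"},
]
print(is_affected("2.9.0", EVENTS))   # inside the second range
print(is_affected("2.10.0", EVENTS))  # the fixing version itself
```

Comparing versions as integer tuples matters: a plain string comparison would order "2.9.0" after "2.10.0" and misclassify it as fixed, which is one way false negatives could enter a count like the 21.60% figure.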

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments identify important areas where additional methodological transparency and validation would strengthen the manuscript. We address each major comment below and commit to revisions that improve rigor without altering the core findings.

Point-by-point responses
  1. Referee: Methods section on data collection and vulnerability identification: the pipeline for matching package names/versions to OSV entries (including transitive dependencies) is described at a high level but provides no validation of matching accuracy, version-range resolution, false-positive rates, or sample-based precision checks. This directly underpins the central 21.60% statistic and severity distributions.

    Authors: We agree that the current description is high-level and that explicit validation would increase confidence in the 21.60% figure. In the revised manuscript we will expand the Methods section with: (i) the exact procedure for querying the OSV API using package name and resolved version, (ii) how version ranges are interpreted according to OSV's affected range syntax, and (iii) results of a manual precision audit on a random sample of 200 packages (reporting match accuracy and estimated false-positive rate). These additions will directly support the reported statistics. revision: yes

  2. Referee: Results on time-to-fix: the reported average of 4 years and 11 months (and the 19-day offset for vulnerability publication) depends on precise dating of first vulnerable version vs. fixing version, yet no details are given on date extraction, handling of version ordering, or assumptions about when a fix becomes available. This affects the reliability of the remediation timeline claim.

    Authors: We acknowledge the need for greater detail on the temporal calculations. The revised Results section will include a dedicated subsection describing: (1) extraction of publication dates from npm registry metadata, (2) identification of the earliest vulnerable version using OSV ranges and semantic versioning, (3) determination of the fixing version as the first version outside the vulnerable range, and (4) the precise computation of the 19-day offset between fix availability and vulnerability disclosure. This will make the 4-year-11-month average fully reproducible. revision: yes

  3. Referee: Dependency network construction: while deps.dev is used to build graphs, there is no assessment of graph completeness, potential missing transitive edges, or how version ranges are resolved when counting affected packages. Any systematic incompleteness would alter the 232,836 count and concentration findings.

    Authors: We recognize that an explicit assessment of deps.dev graph quality is warranted. In the revision we will add to the Methods section: (i) a description of how deps.dev resolves version ranges to concrete versions, (ii) a sample-based completeness check (cross-validation of dependency trees for 100 packages against direct npm queries), and (iii) a limitations paragraph noting that while deps.dev provides broad coverage, isolated missing transitive edges remain possible. These changes will contextualize the 232,836 count and concentration results. revision: yes
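
The temporal quantities discussed in response 2 reduce to date differences once the three dates per vulnerability are known. The sketch below computes a mean time-to-fix and a mean fix-to-disclosure lag on invented records; the field names and dates are hypothetical. It uses a plain mean over fixed vulnerabilities only, whereas the figure captions indicate the paper also uses Kaplan-Meier survival analysis, which this sketch does not attempt.

```python
from datetime import date
from statistics import mean

# Hypothetical records: one per vulnerability, with the three relevant dates.
RECORDS = [
    {"first_vulnerable": date(2016, 1, 1), "fix": date(2021, 1, 1), "disclosed": date(2021, 1, 20)},
    {"first_vulnerable": date(2018, 1, 1), "fix": date(2019, 1, 1), "disclosed": date(2019, 1, 11)},
]


def to_years_months(days: float) -> tuple[int, int]:
    """Convert a day count to (years, months) using an average month of 30.4375 days."""
    total_months = round(days / 30.4375)
    return divmod(total_months, 12)


# Days from the first vulnerable release to the fixing release.
fix_days = [(r["fix"] - r["first_vulnerable"]).days for r in RECORDS]
# Days from the fixing release to public disclosure of the vulnerability.
lag_days = [(r["disclosed"] - r["fix"]).days for r in RECORDS]

mean_fix_days = mean(fix_days)
mean_lag_days = mean(lag_days)
print(to_years_months(mean_fix_days), mean_lag_days)
```

Because a mean like this is sensitive to how the earliest vulnerable version is dated, the referee's request for details on date extraction and version ordering bears directly on the 4-year-11-month figure.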

Circularity Check

0 steps flagged

No circularity: direct empirical counts from external APIs

Full rationale

The paper's core results (21.60% vulnerable packages, 4y11m average fix time, severity distributions, top-vulnerability concentrations) are computed as straightforward counts, percentages, and means over a dataset assembled from npm-follower, deps.dev graphs, and OSV API lookups. No equations, fitted parameters, predictions, or first-principles derivations appear; the reported figures do not reduce to any internal definition or self-citation chain. The methodology section describes data ingestion and matching steps, but these are external data-processing operations rather than self-referential constructions. This is a standard empirical measurement study whose claims remain falsifiable against the same public APIs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the completeness and accuracy of three external data sources and on the assumption that package names plus version strings uniquely identify vulnerable instances across the entire npm corpus.

axioms (2)
  • domain assumption: npm-follower and the deps.dev API return complete and accurate dependency graphs for the sampled packages.
    Used to construct the dependency networks whose vulnerability counts are reported.
  • domain assumption: the Google Open Source Vulnerability API correctly maps package names and versions to reported vulnerabilities without significant false positives or omissions.
    Used to label packages as vulnerable.


