Cross-Ecosystem Vulnerability Analysis for Python Applications
read the original abstract
Python applications depend on third-party native libraries that may be vendored within package distributions or installed on the host system. When vulnerabilities are discovered in these native libraries, determining which Python packages are affected requires analysis across ecosystem boundaries, from Python dependency graphs to OS distribution packages. Current vulnerability scanners produce false negatives by overlooking vulnerabilities in vendored native libaries and false positives by failing to account for security patches backported by OS distributions. We present a provenance-aware vulnerability analysis approach that resolves vendored libraries to specific OS package versions or upstream project releases. Our approach queries vendored libraries against a database of historical OS package artifacts using content-based hashing, and applies library-specific dynamic analyses to extract version information from binaries built from upstream source. We then construct cross-ecosystem call graphs by stitching together Python and binary call graphs across dependency boundaries, enabling reachability analysis of vulnerable functions. Evaluating on 100,000 Python packages and 10 known CVEs associated with third-party native dependencies, We identify 39 directly vulnerable packages (47M+ monthly downloads) and 312 indirectly vulnerable client packages affected through dependency chains. Our analysis reduces false positives by 52% on average compared to upstream version matching, and by up to 97% for heavily-patched libraries. We responsibly disclosed all findings to maintainers; 54 issues have been fixed to date.
This paper has not been read by Pith yet.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.