
arxiv: 2606.22827 · v2 · pith:5A7AA3XWnew · submitted 2026-06-22 · 💻 cs.CR · cs.SE

What You See Is Not What You Execute: Memory-Based Runtime SBOM Generation for Supply Chain Security

Pith reviewed 2026-06-29 05:03 UTC · model grok-4.3

classification 💻 cs.CR cs.SE
keywords SBOM · memory forensics · runtime analysis · Python applications · supply chain security · dependency graphs · vulnerability assessment · Volatility plugins

The pith

MEM-SBOM generates runtime SBOMs for Python applications by parsing volatile memory without prior instrumentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework for creating Software Bills of Materials (SBOMs) that reflect the components actually loaded and executed during a Python program's run. Standard SBOM methods rely on static metadata or filesystem artifacts and often miss dynamically loaded packages in languages like Python. MEM-SBOM instead examines memory dumps to recover modules directly from the interpreter, determine versions, map dependencies, and flag vulnerable code paths. This matters for supply chain security because it works in production settings and after-the-fact investigations where advance monitoring is impossible. In tests on 51 real-world applications, it reportedly recovered all runtime packages that existing SBOM tools missed.

Core claim

MEM-SBOM is the first memory forensics framework that generates SBOMs directly from the runtime state of Python applications. It recovers the modules from the interpreter's internal structures, resolves package versions, and analyzes bytecode to build dependency graphs and identify vulnerable functions. Implemented as Volatility 3 plugins, it achieves 100 percent extraction accuracy on 51 real-world applications, identifies cases where vulnerable routines are called, and recovers all runtime packages missed by existing SBOM tools.

What carries the argument

MEM-SBOM, a suite of Volatility 3 plugins that read Python interpreter internal structures from memory dumps to extract loaded modules, resolve versions, and construct executed dependency graphs.

If this is right

  • Runtime SBOMs can show precisely which functions from a dependency are invoked, enabling targeted vulnerability checks.
  • Supply chain analysis becomes possible in incident response without requiring any pre-installed monitoring.
  • Applications receive more complete dependency graphs than those produced by metadata or filesystem-based tools.
  • Of the 51 tested applications, only Streamlit was found to call the vulnerable routines of its tornado dependency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same memory-parsing technique might extend to other interpreted languages whose runtime keeps similar internal records.
  • Forensic teams could combine MEM-SBOM output with existing memory-analysis tools to trace execution history after an incident.
  • Repeated testing across more Python versions would be needed to confirm the parsing method remains reliable as interpreters evolve.

Load-bearing premise

The Python interpreter's internal data structures stay consistent enough in memory dumps that they can be parsed to recover every actually executed module and its version.

What would settle it

A memory dump from a running Python application in which MEM-SBOM fails to report a loaded package that independent evidence confirms was executed.
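A check of this kind reduces to a set comparison between the packages a tool reports and an independently obtained ground truth. The sketch below is a minimal illustration under that assumption; the function name and the package names are made up and are not data from the paper.

```python
def compare_to_ground_truth(reported, ground_truth):
    """Return (missed, spurious, recall) for a set of reported package names."""
    reported, ground_truth = set(reported), set(ground_truth)
    # Executed but not reported: a single entry here is the refuting case.
    missed = ground_truth - reported
    # Reported but never independently confirmed as loaded.
    spurious = reported - ground_truth
    if ground_truth:
        recall = len(ground_truth & reported) / len(ground_truth)
    else:
        recall = 1.0  # nothing to find, so nothing was missed
    return missed, spurious, recall


# Illustrative, made-up inputs; "example_pkg" is a placeholder name.
missed, spurious, recall = compare_to_ground_truth(
    reported={"tornado", "zmq"},
    ground_truth={"tornado", "zmq", "example_pkg"},
)
print(sorted(missed), sorted(spurious), round(recall, 2))
# prints: ['example_pkg'] [] 0.67
```

A claim of 100% extraction accuracy corresponds to an empty `missed` set for every evaluated dump.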

Figures

Figures reproduced from arXiv: 2606.22827 by Andrew Case, Hala Ali, Irfan Ahmed.

Figure 1: Overview of the MEM-SBOM Framework.
Accompanying text (truncated in the source): cache (i.e., sys.path_importer_cache), which maps filesystem paths to finder objects responsible for module discovery (see Section 2.2). Each FileFinder instance maintains a persistent _path_cache containing all recognized directory entries, including source files (.py), bytecode files (.pyc), extension modules (.so, .pyd), and package-metadata directories (.dist-info,…
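The importer cache mentioned alongside Figure 1 can be inspected in a live interpreter, which gives a feel for the data the framework is described as reading from memory. The sketch below is an assumption-laden illustration: `_path_cache` is a private CPython attribute that may be empty until a finder has been used, the helper names are hypothetical, and splitting a `.dist-info` name on its last hyphen is a heuristic.

```python
import sys
from importlib.machinery import FileFinder


def split_dist_info(entry):
    """Split 'name-version.dist-info' into (name, version) heuristically."""
    stem = entry[: -len(".dist-info")]
    name, _, version = stem.rpartition("-")
    return name, version


def dist_info_entries():
    """List (directory, distribution name, version) seen by cached finders."""
    found = set()
    for directory, finder in list(sys.path_importer_cache.items()):
        if not isinstance(finder, FileFinder):
            continue  # e.g. zip importers or None placeholders
        # Private attribute: a set of directory entry names, filled lazily.
        for entry in set(getattr(finder, "_path_cache", ())):
            if entry.endswith(".dist-info"):
                name, version = split_dist_info(entry)
                found.add((directory, name, version))
    return sorted(found)


if __name__ == "__main__":
    for directory, name, version in dist_info_entries():
        print(f"{name} {version}  ({directory})")
```

Because the cache is filled only for directories that were actually searched, an empty result in a fresh interpreter is expected rather than an error.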
Figure 2: Transitive dependency path from spyder to the vulnerable tornado package via zmq, showing that no vulnerable functions are reached.
Figure 3: Direct call path from streamlit to the vulnerable tornado.httputil.parse_body_arguments() function via the put() method, showing that the vulnerable function is reached.
Accompanying text (truncated in the source): … server, and upload_file_request_handler modules. The bytecode of put() confirms a direct call to the vulnerable function.
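The contrast between Figures 2 and 3 is a reachability question: a dependency on a vulnerable package matters most when some call path actually ends at the vulnerable function. The sketch below runs a breadth-first search over a small hand-written call graph. Node names are borrowed from the captions, but the edges (including the `tornado.ioloop` node) are illustrative assumptions, not graph data recovered by MEM-SBOM.

```python
from collections import deque


def find_call_path(graph, start, target):
    """Breadth-first search; return the shortest call path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for callee in graph.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


VULNERABLE = "tornado.httputil.parse_body_arguments"

# Hand-written, illustrative edges (caller -> callees).
CALLS = {
    "streamlit": ["upload_file_request_handler.put"],
    "upload_file_request_handler.put": [VULNERABLE],
    "spyder": ["zmq"],
    "zmq": ["tornado.ioloop"],  # hypothetical non-vulnerable entry point
}

print(find_call_path(CALLS, "streamlit", VULNERABLE))
# prints: ['streamlit', 'upload_file_request_handler.put', 'tornado.httputil.parse_body_arguments']
print(find_call_path(CALLS, "spyder", VULNERABLE))
# prints: None
```

Returning the path rather than a boolean keeps the evidence an analyst would need to confirm the finding.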
Original abstract

Modern software development relies heavily on third-party components from public repositories, expanding the software supply chain attack surface. In response to these growing risks, federal initiatives have advanced the Software Bill of Materials (SBOM) as a standardized mechanism for improving transparency by describing software components, dependencies, and their relationships. However, SBOMs built from metadata or filesystem artifacts fail to capture the components loaded and executed at runtime, especially in dynamic ecosystems such as Python. Moreover, generating runtime SBOMs through instrumentation requires monitoring to be deployed in advance and the system to remain observable throughout execution. Such conditions are difficult to satisfy in production environments and incident-response scenarios. Volatile memory, in contrast, provides a reliable source for recovering the actual runtime state of a running application without requiring prior instrumentation. Therefore, this paper presents MEM-SBOM, the first memory forensics framework that generates SBOMs directly from the runtime state of Python applications. It recovers the modules from the interpreter's internal structures, resolves package versions, and analyzes bytecode to build dependency graphs and identify vulnerable functions. We implemented MEM-SBOM as a suite of Volatility 3 plugins and evaluated it against 51 real-world Python applications. It achieves 100% extraction accuracy, identifies Streamlit as the only application that calls the vulnerable routines of the tornado dependency, and recovers all runtime packages missed by existing SBOM tools, providing more accurate dependency graphs and better vulnerability assessment. These capabilities make MEM-SBOM a practical foundation for software supply chain security and incident response by providing a forensically sound runtime view of what is executed on a system.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents MEM-SBOM, the first memory forensics framework that generates runtime SBOMs for Python applications directly from volatile memory using a suite of Volatility 3 plugins. It recovers loaded modules from the CPython interpreter's internal structures, resolves package versions, analyzes bytecode to construct dependency graphs, and identifies vulnerable functions. The evaluation on 51 real-world applications reports 100% extraction accuracy, recovery of all runtime packages missed by existing SBOM tools, and a specific finding that Streamlit is the only tested application calling vulnerable routines in the tornado dependency.

Significance. If the empirical results hold, the contribution is significant for software supply chain security and incident response. It provides a forensically sound method to obtain an accurate runtime view of executed components in dynamic Python environments without requiring prior instrumentation, addressing limitations of metadata- or filesystem-based SBOMs and enabling better vulnerability assessment through recovered dependency graphs.

major comments (2)
  1. [Evaluation] Evaluation section: The central claim of 100% extraction accuracy on 51 applications is load-bearing for the paper's contribution, yet the provided description (including the abstract) supplies no details on methodology for establishing ground truth, selection criteria for the applications, handling of edge cases such as dynamically generated modules or varying Python interpreter versions, or error conditions in memory parsing. This omission prevents assessment of whether the reported accuracy supports the claims of reliable runtime SBOM generation.
  2. [Implementation] Implementation description: The claims that the framework 'recovers the modules from the interpreter's internal structures' and 'analyzes bytecode to build dependency graphs' require concrete technical details on the Volatility plugin logic (e.g., specific data structures walked or bytecode traversal algorithm) to substantiate that the approach works across the tested applications without hidden assumptions about memory layout stability.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by a one-sentence summary of the evaluation methodology and any limitations considered.
  2. Consider adding a table listing the 51 applications with key characteristics (e.g., Python version, number of dependencies) to contextualize the 100% accuracy result.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment below and will revise the manuscript to incorporate additional details as outlined.

  1. Referee: [Evaluation] Evaluation section: The central claim of 100% extraction accuracy on 51 applications is load-bearing for the paper's contribution, yet the provided description (including the abstract) supplies no details on methodology for establishing ground truth, selection criteria for the applications, handling of edge cases such as dynamically generated modules or varying Python interpreter versions, or error conditions in memory parsing. This omission prevents assessment of whether the reported accuracy supports the claims of reliable runtime SBOM generation.

    Authors: We agree that the Evaluation section requires expanded methodological details to substantiate the 100% accuracy claim. In the revised manuscript, we will add a new subsection that explicitly describes: ground truth establishment through direct comparison against runtime sys.modules inspection and importlib.metadata package data captured at dump time; selection criteria for the 51 applications (a mix of popular PyPI packages and real-world tools such as Streamlit, chosen for diversity in size, dependency complexity, and usage patterns); handling of edge cases including dynamically generated modules via exec() and importlib (tested in our evaluation with no accuracy loss); support across Python interpreter versions 3.8–3.11; and error conditions in memory parsing (e.g., partial structure recovery due to memory fragmentation), with none encountered in the reported experiments. These additions will enable readers to fully evaluate the reliability of the results. revision: yes

  2. Referee: [Implementation] Implementation description: The claims that the framework 'recovers the modules from the interpreter's internal structures' and 'analyzes bytecode to build dependency graphs' require concrete technical details on the Volatility plugin logic (e.g., specific data structures walked or bytecode traversal algorithm) to substantiate that the approach works across the tested applications without hidden assumptions about memory layout stability.

    Authors: We acknowledge that the Implementation section provides only high-level descriptions and will expand it with concrete technical details in the revision. The module recovery plugin walks the PyInterpreterState structure (accessed via the _PyRuntime symbol) to locate the modules dictionary (a PyDictObject), then iterates its PyModuleObject entries to extract __name__, __file__, and version strings from __version__ attributes or PKG-INFO metadata in memory. The bytecode analysis plugin traverses PyCodeObject instances by following the co_code field, emulating Python's opcode parsing to identify IMPORT_NAME, LOAD_CONST, and CALL_FUNCTION opcodes (CALL on CPython 3.11, which removed CALL_FUNCTION) for dependency graph construction; this logic is version-aware via offsets derived from the Python version string recovered from the interpreter. These specifics will clarify the data structures and algorithms used and confirm stability assumptions across the tested CPython builds. revision: yes
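The opcode scan described in the second response can be approximated on live code objects with the standard `dis` module. The sketch below is only an analogue: the plugin is described as parsing PyCodeObject data from a dump, whereas this code relies on the running interpreter to decode instructions. It matches call opcodes by name prefix because the names differ across CPython versions (for example, CALL_FUNCTION up to 3.10 and CALL from 3.11). The function name `scan_code` and the sample source are assumptions for illustration.

```python
import dis
import types


def scan_code(code):
    """Return (imported module names, call-instruction count) for a code object.

    Recurses into nested code objects (function and class bodies) stored
    in co_consts, so imports inside functions are found as well.
    """
    imports, calls = set(), 0
    for ins in dis.get_instructions(code):
        if ins.opname == "IMPORT_NAME":
            imports.add(ins.argval)  # argval is the resolved module name
        elif ins.opname.startswith("CALL") and not ins.opname.startswith(
            "CALL_INTRINSIC"
        ):
            calls += 1  # CALL_FUNCTION, CALL_METHOD, CALL, ... by version
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            sub_imports, sub_calls = scan_code(const)
            imports |= sub_imports
            calls += sub_calls
    return imports, calls


SOURCE = '''
import json
from os import path

def handler(body):
    import base64
    return json.loads(base64.b64decode(body))
'''

imports, calls = scan_code(compile(SOURCE, "<example>", "exec"))
print(sorted(imports))
# prints: ['base64', 'json', 'os']
```

Resolving which function each call instruction targets takes more work (tracking the preceding load instructions), which is where a dependency graph builder would need version-specific handling.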

Circularity Check

0 steps flagged

No circularity; empirical engineering result with independent evaluation


The paper describes an implementation of Volatility 3 plugins that walk CPython interpreter structures in memory dumps to extract loaded modules, resolve versions, build dependency graphs from bytecode, and identify vulnerable functions. The central claims rest on this engineering pipeline and its empirical performance (100% extraction accuracy on 51 real-world applications). No equations, fitted parameters renamed as predictions, self-citation load-bearing premises, uniqueness theorems, or ansatzes are invoked. The evaluation is presented as direct testing against real applications rather than any reduction to prior fitted quantities or self-referential definitions. This is a standard non-circular engineering contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework relies on standard memory forensics assumptions and existing Volatility tooling without introducing fitted parameters or new postulated entities.

axioms (1)
  • domain assumption: Volatile memory contains sufficient and accurate information about the Python interpreter's loaded modules and bytecode state at the time of the dump.
    This premise underpins the entire recovery process described in the abstract.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Skills Are Not Islands: Measuring Dependency and Risk in Agent Skill Supply Chains

    cs.SE 2026-07 unverdicted novelty 7.0

    The paper defines Agent Skill Supply Chains (ASSCs) and SkillDepAnalyzer to extract and analyze dependency graphs from over 1.43 million LLM agent skills, revealing structural patterns and security signals.

Source note: Hala Ali, Andrew Case, Irfan Ahmed. Preprint submitted to Elsevier, 19 pages. Appendix A of the paper includes Table 12, the evaluation dataset of the 51 Python applications.