
arxiv: 2604.20015 · v1 · submitted 2026-04-21 · 💻 cs.SE

FIKA: Expanding Dependency Reachability with Executability Guarantees

Pith reviewed 2026-05-10 01:29 UTC · model grok-4.3

classification 💻 cs.SE
keywords: dependency reachability · executability guarantees · third-party libraries · Java projects · vulnerability analysis · static analysis · test coverage · call site reachability

The pith

FIKA generates executable code snippets to prove reachability for third-party library call sites that tests miss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

FIKA is a pipeline that creates and runs code to generate execution traces confirming that call sites to third-party libraries are actually reachable. Static analysis tools lack this execution evidence and thus overstate potential dependency problems. On eight Java projects, existing tests already cover 54 percent of call sites on average, but FIKA adds another 20 percent and confirms executability for 2363 dependency methods. In six projects it reaches strong guarantees that more than 75 percent of call sites can execute, and it resolves inconclusive cases from static vulnerability tools such as Semgrep.

Core claim

By generating executable snippets for dependency call sites, running them, and inspecting the resulting traces, FIKA supplies concrete evidence that the sites are reachable. This expands coverage beyond the 54 percent already shown by test suites, demonstrates executability for 2363 dependency methods, provides strong guarantees that more than 75 percent of call sites are executable in six of the eight projects, and strengthens the output of static reachability tools when they remain inconclusive.

What carries the argument

The FIKA pipeline: it statically identifies third-party call sites, generates runnable snippets (reachability scenarios) for the sites that tests do not cover, executes them, and analyzes the resulting traces to certify executability.
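
The stages named above can be sketched in miniature. What follows is a Python analog, not FIKA itself: FIKA targets Java, and nothing here reflects its actual implementation. The standard-library `json` module stands in for a third-party dependency, and `APP_SOURCE`, `find_call_sites`, `run_with_trace`, and `scenario` are hypothetical names invented for illustration. The static step lists call sites, the dynamic step runs a scenario under a line tracer, and a call site counts as confirmed only if its line actually executed.

```python
import ast
import sys

# Hypothetical application code; `json` stands in for a third-party library.
APP_SOURCE = '''
import json

def export(record):
    return json.dumps(record)

def load(text, strict):
    if strict:
        return json.loads(text)
    return None
'''


def find_call_sites(source, library):
    """Static step: list (line, 'library.function') for calls into `library`."""
    sites = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == library):
            sites.append((node.lineno, f"{library}.{node.func.attr}"))
    return sorted(sites)


def run_with_trace(source, scenario):
    """Dynamic step: run `scenario` and record which application lines ran."""
    filename = "<app>"
    namespace = {}
    exec(compile(source, filename, "exec"), namespace)
    executed = set()

    def tracer(frame, event, arg):
        # Only line events inside the application source count as evidence.
        if event == "line" and frame.f_code.co_filename == filename:
            executed.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        scenario(namespace)
    finally:
        sys.settrace(None)
    return executed


def scenario(app):
    """Hypothetical generated scenario: drives the application entry points."""
    app["export"]({"id": 1})
    app["load"]("{}", strict=False)


call_sites = find_call_sites(APP_SOURCE, "json")
executed_lines = run_with_trace(APP_SOURCE, scenario)
confirmed = [site for site in call_sites if site[0] in executed_lines]
unconfirmed = [site for site in call_sites if site[0] not in executed_lines]
```

In this toy run, `confirmed` holds the `json.dumps` site and `unconfirmed` holds the `json.loads` site, because the scenario never takes the `strict` branch. An unconfirmed site is not shown to be unreachable; it simply lacks execution evidence.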

If this is right

  • Coverage of executable dependency call sites rises by 20 percent on average across projects.
  • Over 75 percent of call sites receive strong executability guarantees in six of the eight evaluated projects.
  • 2363 additional dependency methods gain demonstrated executability beyond what tests alone provide.
  • Static vulnerability reachability results become more decisive when FIKA supplies execution evidence for inconclusive cases.
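
The coverage figures in these bullets can be read in more than one way, since the text does not say whether the 20 percent gain is measured in percentage points or relative to the baseline. A minimal sketch, using made-up call-site sets rather than the paper's data, shows how both readings and the "more than 75 percent" threshold would be computed. All names and numbers below are hypothetical.

```python
def coverage_percent(covered, all_sites):
    """Share of call sites with execution evidence, in percent."""
    return 100 * len(covered & all_sites) / len(all_sites)


# Hypothetical project with ten third-party call sites.
all_sites = {f"site-{i}" for i in range(10)}
covered_by_tests = {f"site-{i}" for i in range(5)}
# Generated scenarios may re-confirm a site that tests already cover (site-4).
confirmed_by_scenarios = {"site-4", "site-5", "site-6", "site-7"}

baseline = coverage_percent(covered_by_tests, all_sites)
combined = coverage_percent(covered_by_tests | confirmed_by_scenarios, all_sites)

gain_points = combined - baseline              # gain in percentage points
relative_gain = 100 * gain_points / baseline   # gain relative to the baseline
meets_threshold = combined > 75                # "more than 75 percent" check
```

Taking the union of the two sets avoids counting a call site twice when both the existing tests and a generated scenario reach it.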

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same generate-and-execute approach could be adapted to languages other than Java if equivalent snippet generation is feasible.
  • Integrating FIKA with broader automated testing frameworks might raise coverage even higher in future work.
  • Dependency management platforms could eventually use such guarantees to suppress alerts that lack any execution path.

Load-bearing premise

The generated snippets and their traces give sound, non-spurious evidence of reachability without creating false positives or overlooking execution contexts that would occur in real runs.
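
One way to picture this premise is to separate "the library method ran" from "the library method ran from the original call site". The sketch below is a Python analog under stated assumptions, not FIKA's mechanism: `library_parse` stands in for a dependency method, `handle_request` for the application method that contains the call site, and `callers_observed` is a hypothetical helper that records which function directly invoked the library method during a run.

```python
import sys


def library_parse(text):
    """Stand-in for a third-party library method."""
    return text.strip()


def handle_request(raw):
    """Application method that contains the call site of interest."""
    return library_parse(raw)


def callers_observed(target, scenario):
    """Run `scenario` and return the names of functions that called `target`."""
    callers = set()

    def tracer(frame, event, arg):
        # A 'call' event fires for every new frame; keep those entering `target`.
        if (event == "call"
                and frame.f_code is target.__code__
                and frame.f_back is not None):
            callers.add(frame.f_back.f_code.co_name)
        return None  # no per-line tracing needed

    sys.settrace(tracer)
    try:
        scenario()
    finally:
        sys.settrace(None)
    return callers


def direct_driver():
    """Synthetic driver: invokes the library method directly."""
    library_parse(" x ")


def context_scenario():
    """Scenario that reaches the library through the application method."""
    handle_request(" x ")


direct = callers_observed(library_parse, direct_driver)
in_context = callers_observed(library_parse, context_scenario)
```

Both runs execute `library_parse`, but only `in_context` records `handle_request` as the caller. A trace from `direct_driver` shows that the library method can run, not that the application's call site is reachable.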

What would settle it

A case where a call site labeled executable by FIKA never actually runs in the live application, or a reachable site that FIKA misses because snippet generation cannot produce a valid execution path.

Figures

Figures reproduced from arXiv: 2604.20015 by Benoit Baudry, Martin Monperrus, Meriem Ben Chaaben, Yogya Gamage.

Figure 1: The architecture of FIKA. It takes the project source code as input. Static analysis produces a set of call sites of third-party libraries, which is then used to create reachability scenarios for invoking non-covered call sites. The generated reachability scenarios are validated and integrated following successful validation.

Figure 4: An example generated reachability scenario by …

Original abstract

Automated third-party library analysis tools help developers by addressing key dependency management challenges, such as automating version updates, detecting vulnerabilities, and detecting breaking updates. Dependency reachability analysis aims at improving the precision of dependency management, by reducing the space of dependency issues to the ones that actually matter. Most tools for dependency reachability analysis are static and fundamentally limited by the absence of execution. In this paper, we propose FIKA, a pipeline for providing guarantees of executability for third-party library call sites. FIKA generates code that is executed, and whose execution trace provides guarantees that a third-party library call site is actually reachable. We apply our approach to a dataset of eight Java projects to empirically evaluate the effectiveness of FIKA. On average, 54% of these call sites are covered by the existing test suites, and therefore, have evidence for their executability. FIKA further improves this coverage by 20% and is able to demonstrate executability for 2363 dependency methods. In six out of eight projects, FIKA provides strong guarantees that more than 75% of call sites are executable. We further demonstrate that FIKA is capable of improving the results provided by Semgrep, a state-of-the-art static vulnerability reachability analysis tool. We show that FIKA can help prioritize the vulnerability updates with stronger guarantees of executability in cases where Semgrep yields inconclusive reachability results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper presents FIKA, a pipeline that generates executable code snippets whose execution traces provide dynamic guarantees of reachability for third-party library call sites in Java projects. On a dataset of eight projects, existing test suites cover 54% of call sites on average; FIKA claims to improve this by 20%, demonstrate executability for 2363 dependency methods, achieve >75% executable call sites in six projects, and resolve inconclusive cases from the static tool Semgrep.

Significance. If the executability evidence is sound and context-preserving, the work could meaningfully strengthen dependency reachability analysis by supplying dynamic confirmation where static tools are inconclusive, aiding vulnerability prioritization. The scale of the evaluation (eight projects, thousands of methods) and the concrete coverage gains suggest practical relevance for software engineering tools if the methodological details hold.

major comments (3)
  1. [Evaluation] Evaluation section: The reported 20% coverage improvement, 54% baseline, and 2363 executable methods are presented without details on project selection criteria, how additional coverage is measured (e.g., via new test execution or trace analysis), or any statistical significance testing; this undermines assessment of the central empirical claims.
  2. [Approach] Approach section (snippet generation): The description does not specify how generated snippets preserve original call-site data-flow and control dependencies versus using synthetic drivers or direct invocations; without this, execution traces may confirm executability only under artificial conditions, risking spurious reachability evidence.
  3. [Evaluation] Comparison with Semgrep: The claim that FIKA improves inconclusive Semgrep results lacks concrete examples of those cases, the exact resolution mechanism, and evidence that FIKA does not introduce its own false positives or miss contexts that static analysis would capture.
minor comments (1)
  1. [Abstract] The abstract and evaluation could more explicitly state the total number of call sites analyzed and the per-project breakdown to allow readers to verify the 'six out of eight' and 'more than 75%' statements.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, providing clarifications and indicating revisions that will be incorporated to improve the paper.

point-by-point responses
  1. Referee: Evaluation section: The reported 20% coverage improvement, 54% baseline, and 2363 executable methods are presented without details on project selection criteria, how additional coverage is measured (e.g., via new test execution or trace analysis), or any statistical significance testing; this undermines assessment of the central empirical claims.

    Authors: We agree that the Evaluation section would benefit from greater detail on these aspects. In the revised manuscript, we will add explicit criteria for project selection (open-source Java projects using Maven or Gradle with available test suites and substantial third-party dependencies), clarify that additional coverage is measured by executing FIKA-generated snippets for uncovered call sites and analyzing their traces for reachability (distinct from existing test executions), and report per-project variability. Formal statistical significance testing was not applied, as the study is an empirical evaluation on a fixed set of eight projects rather than a controlled experiment; we will explicitly note this limitation and the rationale. revision: yes

  2. Referee: Approach section (snippet generation): The description does not specify how generated snippets preserve original call-site data-flow and control dependencies versus using synthetic drivers or direct invocations; without this, execution traces may confirm executability only under artificial conditions, risking spurious reachability evidence.

    Authors: We acknowledge that the current description of snippet generation could be more explicit on this point. The FIKA approach uses static analysis to extract the call site's enclosing context, including relevant control-flow paths and data dependencies, and constructs snippets that initialize necessary variables and replicate the original invocation environment rather than relying on synthetic drivers or isolated direct calls. We will revise the Approach section to include a more detailed explanation, an illustrative example, and pseudocode to demonstrate how dependencies are preserved, thereby reducing the risk of artificial conditions. revision: yes

  3. Referee: Evaluation section: Comparison with Semgrep: The claim that FIKA improves inconclusive Semgrep results lacks concrete examples of those cases, the exact resolution mechanism, and evidence that FIKA does not introduce its own false positives or miss contexts that static analysis would capture.

    Authors: We will expand the comparison with Semgrep in the revised manuscript by including concrete examples of inconclusive cases and detailing the resolution mechanism: FIKA generates executable snippets for those call sites and uses execution traces to confirm reachability where Semgrep could not. We will also add a discussion of limitations, noting that dynamic execution provides positive evidence of executability but may not cover all possible contexts (potentially missing some static reachability paths) and does not introduce false positives for reachability claims since it relies on actual execution; this positions FIKA as a complement to static tools. revision: yes
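
The complementary role described in the third response can be sketched as a ranking step. This is an illustration under assumptions, not the paper's procedure and not Semgrep's output format: the `Finding` record, the verdict strings, and the ranking policy in `prioritize` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    advisory: str        # hypothetical advisory identifier
    call_site: str       # hypothetical "file:line" location
    static_verdict: str  # "reachable", "unreachable", or "inconclusive"


def prioritize(findings, executed_sites):
    """Order findings so that those with execution evidence come first."""

    def rank(finding):
        if finding.call_site in executed_sites:
            return 0  # an execution trace confirms the call site
        if finding.static_verdict == "reachable":
            return 1  # static evidence only
        if finding.static_verdict == "inconclusive":
            return 2  # no evidence either way
        return 3      # statically judged unreachable

    # sorted() is stable, so ties keep their input order.
    return sorted(findings, key=rank)


findings = [
    Finding("ADV-1", "A.java:10", "inconclusive"),
    Finding("ADV-2", "B.java:22", "inconclusive"),
    Finding("ADV-3", "C.java:7", "reachable"),
    Finding("ADV-4", "D.java:3", "unreachable"),
]
executed_sites = {"B.java:22"}  # call sites confirmed by a generated scenario

ordered = [finding.advisory for finding in prioritize(findings, executed_sites)]
```

With this policy, `ADV-2` moves ahead of `ADV-1` even though both are statically inconclusive, because only `ADV-2` has a confirming trace. Findings without a trace are ranked lower, not dismissed, which matches the point that dynamic execution gives positive evidence only.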

Circularity Check

0 steps flagged

No significant circularity in empirical pipeline

The paper describes an empirical pipeline (FIKA) that generates executable snippets from dependency call sites in eight external Java projects, executes them to produce traces, and reports coverage improvements (20% average, 2363 methods) plus comparison to the independent static tool Semgrep. No equations, fitted parameters, self-definitional relations, or load-bearing self-citations appear in the provided text. Claims rest on direct experimental outcomes against external benchmarks rather than any reduction to inputs by construction. This matches the default expectation for a non-circular empirical software engineering study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach rests on the assumption that generated executable code can be run safely inside the project's test harness and that successful execution constitutes a sound witness of reachability. No free parameters, axioms, or invented entities are visible in the abstract.

