FIKA: Expanding Dependency Reachability with Executability Guarantees
Pith reviewed 2026-05-10 01:29 UTC · model grok-4.3
The pith
FIKA generates executable code snippets to prove reachability for third-party library call sites that tests miss.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By generating executable snippets for dependency call sites, running them, and inspecting the resulting traces, FIKA supplies concrete evidence that the sites are reachable. According to the abstract, existing test suites already cover 54 percent of call sites on average. FIKA improves this coverage by 20 percent, demonstrates executability for 2363 dependency methods, provides strong guarantees that more than 75 percent of call sites are executable in six of the eight projects, and improves the results of Semgrep, a static vulnerability reachability tool, in cases where those results are inconclusive.
What carries the argument
The FIKA pipeline, which identifies call sites, produces runnable snippets that invoke them, executes the snippets, and analyzes traces to certify executability.
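The execute-and-inspect step can be illustrated with a minimal sketch. FIKA targets Java, so the following is only a Python analogue and not the paper's implementation; `library_method`, `client_code`, and the helper names are hypothetical stand-ins. It runs a snippet under a trace hook and accepts the call site as executed only if the trace contains the caller-to-callee edge.

```python
import sys


def library_method(x):
    """Stand-in for a third-party library method (hypothetical)."""
    return x * 2


def client_code(flag):
    """Stand-in for project code that contains the dependency call site."""
    if flag:
        return library_method(21)  # call site under study
    return None


def trace_call_edges(snippet):
    """Run `snippet` and return the (caller, callee) name pairs that executed."""
    edges = set()

    def tracer(frame, event, arg):
        # The global trace function receives a "call" event for each new frame.
        if event == "call" and frame.f_back is not None:
            edges.add((frame.f_back.f_code.co_name, frame.f_code.co_name))
        return None  # no line-level tracing is needed

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        snippet()
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return edges


def is_call_site_executed(snippet, caller, callee):
    """True only if `caller` actually invoked `callee` while the snippet ran."""
    return (caller, callee) in trace_call_edges(snippet)
```

With these stand-ins, `is_call_site_executed(lambda: client_code(True), "client_code", "library_method")` holds, while the same check with `client_code(False)` does not, because the guarded call never runs. Matching on function names alone is a simplification; a real tool would need to identify the exact call site.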
If this is right
- Coverage of executable dependency call sites rises by 20 percent on average across projects.
- Over 75 percent of call sites receive strong executability guarantees in six of the eight evaluated projects.
- Executability is demonstrated for 2363 dependency methods, supplying evidence that the existing tests alone do not provide.
- Static vulnerability reachability results become more decisive when FIKA supplies execution evidence for inconclusive cases.
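How the coverage figures combine can be shown with toy numbers. The abstract does not say whether the 20 percent improvement is meant in percentage points or relative to the baseline, so the sketch below assumes the percentage-point reading and uses hypothetical data (50 call sites with integer IDs) chosen to match a 54 percent baseline.

```python
def coverage_pct(covered, all_sites):
    """Percentage of call sites that have execution evidence."""
    return 100.0 * len(covered & all_sites) / len(all_sites)


# Hypothetical data: 50 dependency call sites, identified by integer IDs.
all_sites = set(range(50))
covered_by_tests = set(range(27))         # 27 of 50 sites
covered_by_snippets = set(range(20, 37))  # overlaps the tests on IDs 20..26

# Only sites without prior test evidence count as newly covered.
newly_covered = covered_by_snippets - covered_by_tests

baseline = coverage_pct(covered_by_tests, all_sites)
combined = coverage_pct(covered_by_tests | covered_by_snippets, all_sites)

gain_points = combined - baseline             # gain in percentage points
gain_relative = 100.0 * gain_points / baseline  # gain relative to the baseline

print(f"baseline {baseline:.1f}%, combined {combined:.1f}%")
print(f"gain: {gain_points:.1f} points, {gain_relative:.1f}% relative")
```

In this toy setting the baseline is 54 percent, the combined coverage is 74 percent, and the same gain reads as 20 percentage points or roughly 37 percent relative to the baseline. Under the other reading, a 20 percent relative improvement on 54 percent would give 64.8 percent, so the distinction matters when interpreting the headline figure.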
Where Pith is reading between the lines
- The same generate-and-execute approach could be adapted to languages other than Java if equivalent snippet generation is feasible.
- Integrating FIKA with broader automated testing frameworks might raise coverage even higher in future work.
- Dependency management platforms could eventually use such guarantees to suppress alerts that lack any execution path.
Load-bearing premise
The generated snippets and their traces give sound, non-spurious evidence of reachability without creating false positives or overlooking execution contexts that would occur in real runs.
What would settle it
A case where a call site labeled executable by FIKA never actually runs in the live application, or a reachable site that FIKA misses because snippet generation cannot produce a valid execution path.
Original abstract
Automated third-party library analysis tools help developers by addressing key dependency management challenges, such as automating version updates, detecting vulnerabilities, and detecting breaking updates. Dependency reachability analysis aims at improving the precision of dependency management, by reducing the space of dependency issues to the ones that actually matter. Most tools for dependency reachability analysis are static and fundamentally limited by the absence of execution. In this paper, we propose FIKA, a pipeline for providing guarantees of executability for third-party library call sites. FIKA generates code that is executed, and whose execution trace provides guarantees that a third-party library call site is actually reachable. We apply our approach to a dataset of eight Java projects to empirically evaluate the effectiveness of FIKA. On average, 54% of these call sites are covered by the existing test suites, and therefore, have evidence for their executability. FIKA further improves this coverage by 20% and is able to demonstrate executability for 2363 dependency methods. In six out of eight projects, FIKA provides strong guarantees that more than 75% of call sites are executable. We further demonstrate that FIKA is capable of improving the results provided by Semgrep, a state-of-the-art static vulnerability reachability analysis tool. We show that FIKA can help prioritize the vulnerability updates with stronger guarantees of executability in cases where Semgrep yields inconclusive reachability results.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents FIKA, a pipeline that generates executable code snippets whose execution traces provide dynamic guarantees of reachability for third-party library call sites in Java projects. On a dataset of eight projects, existing test suites cover 54% of call sites on average; FIKA claims to improve this by 20%, demonstrate executability for 2363 dependency methods, achieve >75% executable call sites in six projects, and resolve inconclusive cases from the static tool Semgrep.
Significance. If the executability evidence is sound and context-preserving, the work could meaningfully strengthen dependency reachability analysis by supplying dynamic confirmation where static tools are inconclusive, aiding vulnerability prioritization. The scale of the evaluation (eight projects, thousands of methods) and the concrete coverage gains suggest practical relevance for software engineering tools if the methodological details hold.
major comments (3)
- [Evaluation] Evaluation section: The reported 20% coverage improvement, 54% baseline, and 2363 executable methods are presented without details on project selection criteria, how additional coverage is measured (e.g., via new test execution or trace analysis), or any statistical significance testing; this undermines assessment of the central empirical claims.
- [Approach] Approach section (snippet generation): The description does not specify how generated snippets preserve original call-site data-flow and control dependencies versus using synthetic drivers or direct invocations; without this, execution traces may confirm executability only under artificial conditions, risking spurious reachability evidence.
- [Evaluation] Comparison with Semgrep: The claim that FIKA improves inconclusive Semgrep results lacks concrete examples of those cases, the exact resolution mechanism, and evidence that FIKA does not introduce its own false positives or miss contexts that static analysis would capture.
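The concern about synthetic drivers versus context-preserving snippets can be made concrete with a small sketch. It is a Python analogue under stated assumptions, not a description of how FIKA builds snippets: `library_parse`, `handle_request`, and both drivers are hypothetical. The library stand-in records who called it, so evidence counts for the call site only when the call arrives through the enclosing project method.

```python
import inspect

observed_callers = []


def library_parse(text):
    """Stand-in for a third-party method; records the name of its caller."""
    caller = inspect.currentframe().f_back.f_code.co_name
    observed_callers.append(caller)
    return text.upper()


def handle_request(payload, legacy_mode=False):
    """Project code: the dependency call site is guarded by `legacy_mode`."""
    if legacy_mode:
        return library_parse(payload)  # call site under study
    return payload


def direct_driver():
    """Synthetic driver: invokes the library method and bypasses the guard."""
    return library_parse("x")


def context_driver():
    """Context-preserving driver: enters through the project's own method."""
    return handle_request("x", legacy_mode=True)


def evidence_for_call_site(driver, enclosing="handle_request"):
    """True if the library was reached from the enclosing project method."""
    observed_callers.clear()
    driver()
    return enclosing in observed_callers
```

Here `evidence_for_call_site(context_driver)` holds and `evidence_for_call_site(direct_driver)` does not, since the direct call shows only that the library method runs, not that the project's call site does. Even the context-preserving driver picks `legacy_mode=True` itself, so the trace does not establish that this configuration occurs in real runs.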
minor comments (1)
- [Abstract] The abstract and evaluation could more explicitly state the total number of call sites analyzed and the per-project breakdown to allow readers to verify the 'six out of eight' and 'more than 75%' statements.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below, providing clarifications and indicating revisions that will be incorporated to improve the paper.
Point-by-point responses
- Referee: Evaluation section: The reported 20% coverage improvement, 54% baseline, and 2363 executable methods are presented without details on project selection criteria, how additional coverage is measured (e.g., via new test execution or trace analysis), or any statistical significance testing; this undermines assessment of the central empirical claims.
Authors: We agree that the Evaluation section would benefit from greater detail on these aspects. In the revised manuscript, we will add explicit criteria for project selection (open-source Java projects using Maven or Gradle with available test suites and substantial third-party dependencies), clarify that additional coverage is measured by executing FIKA-generated snippets for uncovered call sites and analyzing their traces for reachability (distinct from existing test executions), and report per-project variability. Formal statistical significance testing was not applied, as the study is an empirical evaluation on a fixed set of eight projects rather than a controlled experiment; we will explicitly note this limitation and the rationale. revision: yes
- Referee: Approach section (snippet generation): The description does not specify how generated snippets preserve original call-site data-flow and control dependencies versus using synthetic drivers or direct invocations; without this, execution traces may confirm executability only under artificial conditions, risking spurious reachability evidence.
Authors: We acknowledge that the current description of snippet generation could be more explicit on this point. The FIKA approach uses static analysis to extract the call site's enclosing context, including relevant control-flow paths and data dependencies, and constructs snippets that initialize necessary variables and replicate the original invocation environment rather than relying on synthetic drivers or isolated direct calls. We will revise the Approach section to include a more detailed explanation, an illustrative example, and pseudocode to demonstrate how dependencies are preserved, thereby reducing the risk of artificial conditions. revision: yes
- Referee: Evaluation section: Comparison with Semgrep: The claim that FIKA improves inconclusive Semgrep results lacks concrete examples of those cases, the exact resolution mechanism, and evidence that FIKA does not introduce its own false positives or miss contexts that static analysis would capture.
Authors: We will expand the comparison with Semgrep in the revised manuscript by including concrete examples of inconclusive cases and detailing the resolution mechanism: FIKA generates executable snippets for those call sites and uses execution traces to confirm reachability where Semgrep could not. We will also add a discussion of limitations, noting that dynamic execution provides positive evidence of executability but may not cover all possible contexts (potentially missing some static reachability paths) and does not introduce false positives for reachability claims since it relies on actual execution; this positions FIKA as a complement to static tools. revision: yes
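The complementary use of static verdicts and execution evidence described in the last response can be sketched as a simple triage rule. This is an illustration under assumptions: the verdict labels, the `Finding` record, and the advisory IDs are hypothetical and do not reflect Semgrep's output format or FIKA's interface.

```python
from dataclasses import dataclass
from enum import Enum


class StaticVerdict(Enum):
    REACHABLE = "reachable"
    UNREACHABLE = "unreachable"
    INCONCLUSIVE = "inconclusive"


@dataclass(frozen=True)
class Finding:
    advisory_id: str               # hypothetical identifier
    static_verdict: StaticVerdict  # result of the static tool
    executed_in_trace: bool        # True if a snippet run hit the call site


def priority(finding):
    """Lower numbers are handled first."""
    if finding.executed_in_trace:
        return 0  # concrete execution evidence outranks any static verdict
    if finding.static_verdict is StaticVerdict.REACHABLE:
        return 1
    if finding.static_verdict is StaticVerdict.INCONCLUSIVE:
        return 2
    return 3


def triage(findings):
    """Order findings by priority; sorted() keeps ties in input order."""
    return sorted(findings, key=priority)


findings = [
    Finding("ADV-3", StaticVerdict.UNREACHABLE, False),
    Finding("ADV-2", StaticVerdict.INCONCLUSIVE, False),
    Finding("ADV-1", StaticVerdict.INCONCLUSIVE, True),
    Finding("ADV-4", StaticVerdict.REACHABLE, False),
]
ordered_ids = [f.advisory_id for f in triage(findings)]
```

In this ordering, the inconclusive finding with execution evidence (`ADV-1`) moves ahead of the one without it (`ADV-2`). A missing trace is treated as absence of evidence rather than proof of unreachability, which is why it only lowers priority and never removes a finding.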
Circularity Check
No significant circularity in empirical pipeline
The paper describes an empirical pipeline (FIKA) that generates executable snippets from dependency call sites in eight external Java projects, executes them to produce traces, and reports coverage improvements (20% average, 2363 methods) plus comparison to the independent static tool Semgrep. No equations, fitted parameters, self-definitional relations, or load-bearing self-citations appear in the provided text. Claims rest on direct experimental outcomes against external benchmarks rather than any reduction to inputs by construction. This matches the default expectation for a non-circular empirical software engineering study.