Revisiting Code Debloating with Ground Truth-based Evaluation

Ashish Gehani; Fahad Shaon; Fareed Zaffar; Mohit Kumar; Moiz Ali; Muhammad Bilal; Sazzadur Rahaman

arxiv: 2604.17717 · v2 · submitted 2026-04-20 · 💻 cs.SE

Muhammad Bilal, Moiz Ali, Mohit Kumar, Fareed Zaffar, Fahad Shaon, Ashish Gehani, Sazzadur Rahaman

Pith reviewed 2026-05-10 04:58 UTC · model grok-4.3

classification 💻 cs.SE

keywords code debloating · ground truth evaluation · dynamic analysis · static analysis · application security · code reduction · software maintenance

The pith

Ground-truth evaluation shows dynamic debloaters remove needed code while static ones retain excess or add variants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Program debloating removes unused code to shrink attack surfaces and overhead, yet prior assessments used indirect proxies such as test suites or gadget counts. This paper applies direct ground-truth comparison, where the exact code required for correct behavior is known in advance, to eight representative tools spanning source, intermediate-representation, and binary transformations. The results indicate that dynamic-analysis tools discard up to 94 percent of code that must be kept, while static-analysis tools over-approximate dependencies, retain unnecessary portions, and sometimes emit specialized function copies. These mismatches produce functional failures, inconsistent behavior, and new vulnerabilities rather than the intended security gains.

Core claim

Using ground truth that precisely identifies which code must be retained or removed, dynamic-analysis debloaters eliminate up to 94 percent of necessary code, static-analysis debloaters exhibit high false-retention rates from coarse dependency over-approximation, and some static passes introduce additional specialized function variants; the resulting programs therefore suffer functional incorrectness, systematic inconsistency, robustness failures, and exploitable vulnerabilities.

What carries the argument

Ground-truth-based evaluation paradigm that directly measures retained versus required code across eight tools and three transformation levels.
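
One way to make "retained versus required" concrete is to treat the original program, the ground truth, and a debloater's output as sets of code units. The sketch below assumes function-level granularity and assumes simple rate definitions (false removal as the share of required units that are missing, false retention as the share of unneeded units that survive). The paper's exact metric definitions are not reproduced here, and every name and sample value is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebloatReport:
    false_removal_rate: float    # share of required units the tool deleted
    false_retention_rate: float  # share of unneeded units the tool kept
    added_units: frozenset       # units in the output that the original never had


def evaluate(original: set, required: set, output: set) -> DebloatReport:
    """Compare a debloater's output against ground truth (assumed definitions)."""
    if not required <= original:
        raise ValueError("ground truth must be a subset of the original program")
    unneeded = original - required
    falsely_removed = required - output
    falsely_retained = unneeded & output
    added = output - original  # e.g. specialized function variants
    return DebloatReport(
        false_removal_rate=len(falsely_removed) / len(required) if required else 0.0,
        false_retention_rate=len(falsely_retained) / len(unneeded) if unneeded else 0.0,
        added_units=frozenset(added),
    )


# Hypothetical toy program with five functions.
original = {"main", "parse", "compress", "decompress", "help"}
required = {"main", "parse", "compress"}
dynamic_output = {"main", "compress"}  # over-removal: drops parse
static_output = {"main", "parse", "compress", "decompress", "help", "parse.spec1"}

dyn = evaluate(original, required, dynamic_output)
sta = evaluate(original, required, static_output)
print(dyn.false_removal_rate, dyn.false_retention_rate)
print(sta.false_removal_rate, sta.false_retention_rate, sorted(sta.added_units))
```

In this toy case the dynamic-style output loses one of three required functions, while the static-style output keeps both unneeded functions and adds a variant that the original program never contained. A size-reduction metric alone would not separate these three effects.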

If this is right

Incorrect removals produce programs that no longer behave as originally intended.
False retentions leave attack surfaces larger than the debloated size suggests.
Added specialized function variants create new potential entry points for exploits.
Imprecise debloating can introduce systematic inconsistencies and robustness failures across program executions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Hybrid dynamic-static techniques may reduce the complementary error patterns observed in the two families of tools.
Ground-truth benchmarks could replace size-reduction or gadget-count metrics as the primary correctness check in future debloating studies.
Production use of debloated binaries would benefit from independent verification against full behavioral specifications rather than tool output alone.
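
The verification idea in the last extension can be sketched as a differential check. This is a minimal illustration, assuming the original program serves as a stand-in for a behavioral specification (a weaker oracle than a full specification) and modeling both programs as Python callables. The function names and inputs are hypothetical and are not taken from the paper.

```python
def differential_check(original, debloated, inputs):
    """Return the inputs on which the two implementations disagree."""

    def run(fn, x):
        # Capture either the result or the exception type, so that a crash
        # in the debloated version counts as an observable difference.
        try:
            return ("ok", fn(x))
        except Exception as exc:
            return ("error", type(exc).__name__)

    mismatches = []
    for x in inputs:
        expected, actual = run(original, x), run(debloated, x)
        if expected != actual:
            mismatches.append((x, expected, actual))
    return mismatches


def original_abs(x):
    return -x if x < 0 else x


def debloated_abs(x):
    # Hypothetical over-removal: the negative branch was never exercised
    # during debloating, so it was cut.
    if x < 0:
        raise NotImplementedError("branch removed")
    return x


mismatches = differential_check(original_abs, debloated_abs, [3, 0, -2])
print(mismatches)
```

The check can only report disagreements on the inputs it is given, so it inherits the same coverage limits as any test-driven proxy.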

Load-bearing premise

The ground truth constructed for each program correctly captures all intended behaviors and true code dependencies.

What would settle it

A program whose complete input space and exact dependency set are known, for which one of the evaluated debloaters produces a binary that preserves exactly the ground-truth code and behaves correctly on every possible input.

Original abstract

Program debloating aims to remove unused code to reduce performance overhead, attack surfaces, and maintenance costs. Over time, debloating has evolved across multiple layers (container, library, and application), each building on the principles of application-level debloating. Despite its central role, application-level debloating continues to rely on imperfect proxies for measuring performance, such as test-case-driven evaluation for correctness, code size for runtime efficiency, and gadget count reduction for estimating security posture. While there is widespread skepticism about using such imperfect proxies, the community still lacks standardized methodologies or benchmarks to assess the true performance of application-level software debloating. This experience paper aims to address the gap. We revisit the foundations of application-level debloating through a ground-truth-based evaluation paradigm. Our analysis of eight state-of-the-art debloaters - Blade, Chisel, Cov, CovA, Lmcas, Trimmer, Occam, and Razor - uncovers insights previously unattainable through traditional evaluations. These tools collectively span the spectrum of source-to-source, IR-to-IR, and binary-to-binary transformation paradigms, characterizing a holistic reassessment across abstraction levels. Our analysis reveals that while dynamic analysis-based tools often remove up to 94% of code that should be retained, static analysis-based approaches exhibit the opposite behavior, showing high false retention rates due to coarse-grained dependency over-approximation. Additionally, static analyses may add code by introducing specialized variants of functions. False retentions and removals not only cause functional incorrectness but may also lead to systematic inconsistency, robustness failures, and exploitable vulnerabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's real contribution is a ground-truth comparison of eight debloating tools that quantifies how dynamic ones over-remove needed code and static ones over-retain or add variants, but the 94% figure stands or falls on how that ground truth was actually built.

The main thing to know is that this experience paper moves beyond the usual proxy metrics like test-case pass rates or gadget counts and instead measures eight tools directly against ground truth. It reports that dynamic-analysis debloaters (Blade, Chisel, etc.) can drop up to 94% of code that should have stayed, while static ones keep too much due to coarse over-approximation and sometimes insert specialized function variants. That contrast across source-to-source, IR, and binary levels is new and gives a clearer picture of where current tools fail in practice, including risks of functional bugs and vulnerabilities.

Referee Report

1 major / 0 minor

Summary. The manuscript is an experience paper that revisits application-level code debloating evaluation. It analyzes eight tools (Blade, Chisel, Cov, CovA, Lmcas, Trimmer, Occam, Razor) spanning source-to-source, IR-to-IR, and binary-to-binary paradigms using a ground-truth-based approach rather than traditional proxies such as test-case coverage, code size, or gadget counts. The central findings are that dynamic-analysis tools remove up to 94% of code that should be retained, while static-analysis tools exhibit high false-retention rates from coarse dependency over-approximation and can even introduce additional code via specialized function variants, potentially causing functional incorrectness, robustness issues, and vulnerabilities.

Significance. If the ground-truth construction proves robust, independent, and reproducible, the work could meaningfully advance the debloating field by replacing imperfect evaluation proxies with a more reliable paradigm. It provides concrete evidence of systematic weaknesses in both dynamic and static debloaters that prior proxy-based studies could not surface, potentially guiding future tool development toward better preservation of intended behavior.

major comments (1)

Abstract: The claim that dynamic tools remove up to 94% of code that should be retained (and the contrasting static-tool behavior) is load-bearing for the entire dynamic-vs-static contrast and the paper's call for better evaluation. However, no details are provided on how the ground-truth set of 'code that should be retained' was constructed, whether it was derived independently of the tools' own dynamic or static approximations, or how it ensures coverage of all intended behaviors without test-case bias. This directly affects the validity of the reported rates.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the opportunity to clarify key aspects of our ground-truth evaluation approach. We address the major comment below and will revise the manuscript to improve transparency.

Referee: Abstract: The claim that dynamic tools remove up to 94% of code that should be retained (and the contrasting static-tool behavior) is load-bearing for the entire dynamic-vs-static contrast and the paper's call for better evaluation. However, no details are provided on how the ground-truth set of 'code that should be retained' was constructed, whether it was derived independently of the tools' own dynamic or static approximations, or how it ensures coverage of all intended behaviors without test-case bias. This directly affects the validity of the reported rates.

Authors: We agree that the abstract would benefit from a concise description of the ground-truth methodology to better support the central claims. The construction process is detailed in Section 3 of the manuscript: the ground-truth set is built independently by first executing each original (undebloated) program under a broad input corpus that includes the standard test suites plus additional inputs generated to exercise all documented features, public APIs, and edge cases derived from program specifications. Dynamic tracing records all executed code, while static reachability analysis identifies any additional code that could be required for intended behavior. This process occurs prior to and without reference to any debloater's internal approximations, ensuring independence. We acknowledge that relying on inputs introduces some potential for incomplete coverage, but the extended corpus is designed to reduce test-case bias below the level present in prior proxy-based studies. We will revise the abstract to incorporate a brief clause summarizing this independent construction, e.g., 'Ground truth is established independently via dynamic tracing and static reachability on the original programs to identify all code necessary for intended functionality.'

revision: yes
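
The construction described in the simulated response above (executed code plus statically reachable code) can be sketched as a set union. This is only one reading of that description, not the authors' implementation: the combination rule, the function-level call graph, and the choice of feature entry points as reachability roots are all assumptions, and every name is hypothetical.

```python
from collections import deque


def reachable(call_graph: dict, roots: set) -> set:
    """Breadth-first search over a caller -> callees mapping."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def build_ground_truth(call_graph: dict, traced: set, feature_roots: set) -> set:
    """Union of executed functions and functions statically reachable from the
    entry points of the intended features (an assumed combination rule)."""
    return traced | reachable(call_graph, feature_roots)


# Hypothetical call graph: caller -> callees.
call_graph = {
    "main": ["parse", "compress", "decompress"],
    "compress": ["write_out", "on_error"],
    "decompress": ["write_out"],
}
traced = {"main", "parse", "compress", "write_out"}  # seen while running inputs
feature_roots = {"compress"}                         # feature that must be kept

ground_truth = build_ground_truth(call_graph, traced, feature_roots)
print(sorted(ground_truth))
```

Here `on_error` never executes under the sample inputs but is still kept because it is reachable from the retained feature, while `decompress` is excluded. How the roots are chosen largely decides whether the result over- or under-approximates, which is the point the referee's comment presses on.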

Circularity Check

0 steps flagged

No significant circularity in empirical tool evaluation

The paper is an experience report that evaluates eight external debloating tools (Blade, Chisel, Cov, CovA, Lmcas, Trimmer, Occam, Razor) by comparing their outputs to a separately constructed ground-truth set of necessary code. No derivation chain, equations, fitted parameters, or self-citations are presented as load-bearing; the central claims rest on direct observational contrasts (dynamic tools removing up to 94% of ground-truth code, static tools showing false retention) against an external benchmark. This structure is self-contained against the tools and ground truth and contains none of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the accuracy of ground-truth labels for code retention and the assumption that the chosen tools and programs are representative.

axioms (1)

Domain assumption: Ground truth for which code should be retained or removed can be reliably established for the evaluated programs.
The entire evaluation paradigm rests on the existence and correctness of this ground truth.

