On Interaction Effects in Greybox Fuzzing

Alberto Bacchelli; Konstantinos Kitsios; Marcel B\"ohme

arxiv: 2510.19984 · v1 · submitted 2025-10-22 · 💻 cs.SE

On Interaction Effects in Greybox Fuzzing

Konstantinos Kitsios , Marcel B\"ohme , Alberto Bacchelli This is my paper

Pith reviewed 2026-05-18 04:17 UTC · model grok-4.3

classification 💻 cs.SE

keywords greybox fuzzingmutator interactionsinteraction effectssoftware testingcode coveragebug detectionfuzz testingmutation sequences

0 comments

The pith

Mutator order matters in greybox fuzzing because pairwise interactions affect coverage gains, and learning conditional probabilities to pick sequences improves results over fixed or independent choices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether the sequence of mutators applied to a seed input changes how much new code a greybox fuzzer covers. The authors collect data on every possible pair of mutators, fit a linear model, and observe clear interaction effects where the success of one mutator depends on the one applied before it. They then build MuoFuzz, which estimates the conditional probability that the next mutator will produce an interesting input and draws full sequences from this model using a random walk. If the approach holds, fuzzers that currently pick mutators with fixed probabilities or optimize each one alone can be outperformed by explicitly accounting for these ordered dependencies.

Core claim

Greybox fuzzers become more effective when they learn and exploit interaction effects among mutators by estimating the conditional probability that a given mutator will yield coverage gains after the previous one, then sampling promising sequences via random walk rather than using fixed selection probabilities or independent optimization.

What carries the argument

A conditional probability model over ordered mutator pairs, sampled by random walk to produce full mutation sequences for seed inputs.

If this is right

MuoFuzz records the highest code coverage on both the FuzzBench and MAGMA benchmark suites.
MuoFuzz discovers four bugs that AFL++ misses and one bug missed by both AFL++ and MOPT.
Choosing mutator sequences according to learned conditional probabilities is more efficient than fixed-probability selection or per-mutator optimization alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pairwise-interaction modeling could be applied to other mutation-driven search processes such as evolutionary test generation or program repair.
If the learned probabilities shift as the corpus grows, periodic re-estimation during a campaign might further increase gains.
The linear-model evidence for interactions suggests that similar dependencies may exist between other fuzzing decisions, such as seed scheduling or energy assignment.

Load-bearing premise

Interaction effects measured from short, isolated mutator-pair trials on a small set of seeds will still guide better choices when the model runs inside full-length fuzzing campaigns on many different programs.

What would settle it

Running MuoFuzz, AFL++, and MOPT head-to-head on the FuzzBench and MAGMA suites and finding that MuoFuzz does not produce the highest coverage or the unique bugs reported would show the learned sequences add no practical gain.

Figures

Figures reproduced from arXiv: 2510.19984 by Alberto Bacchelli, Konstantinos Kitsios, Marcel B\"ohme.

**Figure 2.** Figure 2: High-level overview of MuoFuzz. The division with the sum turns each row into a probability distribution that sums to 1. The distribution for the freetype program is shown on the right side of [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

read the original abstract

A greybox fuzzer is an automated software testing tool that generates new test inputs by applying randomly chosen mutators (e.g., flipping a bit or deleting a block of bytes) to a seed input in random order and adds all coverage-increasing inputs to the corpus of seeds. We hypothesize that the order in which mutators are applied to a seed input has an impact on the effectiveness of greybox fuzzers. In our experiments, we fit a linear model to a dataset that contains the effectiveness of all possible mutator pairs and indeed observe the conjectured interaction effect. This points us to more efficient fuzzing by choosing the most promising mutator sequence with a higher likelihood. We propose MuoFuzz, a greybox fuzzer that learns and chooses the most promising mutator sequences. MuoFuzz learns the conditional probability that the next mutator will yield an interesting input, given the previously selected mutator. Then, it samples from the learned probability using a random walk to generate mutator sequences. We compare the performance of MuoFuzz to AFL++, which uses a fixed selection probability, and MOPT, which optimizes the selection probability of each mutator in isolation. Experimental results on the FuzzBench and MAGMA benchmarks show that MuoFuzz achieves the highest code coverage and finds four bugs missed by AFL++ and one missed by both AFL++ and MOPT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper finds interaction effects among mutators via pair-wise data and builds MuoFuzz to sample conditional probabilities, but the jump from those measurements to gains in long campaigns is not yet tightly shown.

read the letter

The main thing to know is that this work treats mutator order as dependent rather than independent and gives a practical way to learn and use those dependencies in a greybox fuzzer. They collect effectiveness numbers for every mutator pair on limited seeds, fit a linear model, and see clear interaction terms. That observation leads to MuoFuzz, which estimates the conditional probability that the next mutator will produce new coverage given the previous one and then draws sequences with a random walk. The approach sits between AFL++'s static probabilities and MOPT's per-mutator tuning, and the benchmarks show higher coverage plus a few extra bugs on FuzzBench and MAGMA. That is the concrete advance and the part that is worth looking at closely. The linear model is only used to motivate the hypothesis, not to drive the final numbers, so the performance claims rest on the end-to-end experiments. The soft spot is exactly the one the stress-test note flags: the pair-wise measurements come from isolated runs on restricted seeds, while real fuzzing runs evolve large, changing corpora over long periods. Nothing in the abstract or the reported results directly checks whether the learned conditionals stay predictive once the corpus diversifies or whether the observed gains could be explained by the random-walk sampling mechanics alone. Run counts, variance estimates, and online update details are also light. This paper is for people who build or tune coverage-guided fuzzers and want a new knob for mutator selection. A reader who already works on AFL++ variants or MOPT-style optimizers will get the most out of the comparison. It is coherent enough and has enough empirical grounding to deserve a serious referee, even if the revision will need tighter statistical controls and some ablation on the source of the gains. I would send it to review.

Referee Report

2 major / 2 minor

Summary. The paper hypothesizes that the sequence in which mutators are applied affects greybox fuzzer effectiveness due to interaction effects. It fits a linear model to pairwise mutator effectiveness data collected on limited seeds and reports observing such interactions. It then proposes MuoFuzz, which learns conditional probabilities P(next mutator is interesting | previous mutator) from coverage data and samples mutator sequences via random walk. On FuzzBench and MAGMA, MuoFuzz reports higher coverage than AFL++ (fixed probabilities) and MOPT (independent optimization) and discovers bugs missed by the baselines.

Significance. If the results hold, the work supplies empirical support for the value of modeling mutator interactions rather than treating selection probabilities as independent. The evaluation on standard, reproducible benchmarks (FuzzBench, MAGMA) is a strength and enables direct comparison with prior fuzzers. The approach of learning conditionals from observed coverage outcomes is falsifiable and grounded in data rather than hand-tuned parameters.

major comments (2)

[Evaluation] The pairwise interaction measurements are performed on restricted seed sets in isolation, yet the central performance claims rest on MuoFuzz using the resulting conditional probabilities inside full, long-running campaigns on evolving corpora. No ablation or transfer experiment is reported that checks whether the learned conditionals remain predictive once the corpus diversifies and runtimes extend, leaving open whether observed gains are due to interaction modeling or other adaptive mechanics.
[MuoFuzz Design] The manuscript does not state whether the conditional probability table is computed once from an initial dataset or is updated online as new coverage-increasing inputs are discovered during a campaign. This detail is load-bearing for the claim that interaction effects observed in the linear model translate to sustained gains in the deployed fuzzer.

minor comments (2)

[Abstract] The abstract states that MuoFuzz finds four bugs missed by AFL++ and one missed by both AFL++ and MOPT, but the results section should list the specific bugs, the programs on which they were found, and the exact run counts used for each comparison.
[Experiments] Statistical controls (number of independent runs, variance reporting, or significance tests) for the coverage curves are not described in sufficient detail to allow readers to judge the robustness of the reported improvements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of our evaluation and design that we address point by point below. We provide the strongest honest clarifications supported by the manuscript while indicating revisions where appropriate.

read point-by-point responses

Referee: [Evaluation] The pairwise interaction measurements are performed on restricted seed sets in isolation, yet the central performance claims rest on MuoFuzz using the resulting conditional probabilities inside full, long-running campaigns on evolving corpora. No ablation or transfer experiment is reported that checks whether the learned conditionals remain predictive once the corpus diversifies and runtimes extend, leaving open whether observed gains are due to interaction modeling or other adaptive mechanics.

Authors: The pairwise measurements on restricted seed sets were intentionally performed in isolation to establish the existence of interaction effects via the linear model, as stated in the manuscript. The performance claims for MuoFuzz are evaluated exclusively through full, long-running campaigns on the FuzzBench and MAGMA benchmarks, where the learned conditional probabilities are applied directly to evolving corpora over extended runtimes. These experiments demonstrate higher coverage and additional unique bugs compared to AFL++ (fixed probabilities) and MOPT (independent optimization). While we did not report a dedicated ablation isolating the transferability of the conditionals, the end-to-end results on standard benchmarks already reflect usage in realistic, diversifying settings and support that interaction modeling contributes to the observed gains beyond other adaptive mechanics. We will add a clarifying paragraph in the Evaluation section discussing the applicability of the pre-learned conditionals during long campaigns. revision: partial
Referee: [MuoFuzz Design] The manuscript does not state whether the conditional probability table is computed once from an initial dataset or is updated online as new coverage-increasing inputs are discovered during a campaign. This detail is load-bearing for the claim that interaction effects observed in the linear model translate to sustained gains in the deployed fuzzer.

Authors: We thank the referee for identifying this missing detail. In MuoFuzz, the conditional probability table is computed once from the initial coverage dataset collected on limited seeds (the same data underlying the linear model) and is not updated online during the campaign. This static design directly translates the observed pairwise interactions into sequence sampling via random walk without additional runtime adaptation. We will revise the MuoFuzz Design section to explicitly describe the one-time computation process, its grounding in the initial data, and how it enables sustained use of the learned interactions. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's derivation begins with a hypothesis on mutator ordering, supports it via an empirical linear model fit to a dataset of mutator-pair effectiveness (used only to observe interaction effects), and then implements MuoFuzz to learn conditional probabilities for sequencing and evaluates the resulting fuzzer directly on FuzzBench and MAGMA benchmarks. No equation or claim reduces a reported performance number or 'prediction' to the initial model fit by construction; the benchmark results are generated from independent full fuzzing campaigns rather than being statistically forced by the motivating linear model. The approach remains self-contained against external benchmarks with no load-bearing self-citation or ansatz smuggling.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach depends on empirical data fitting for both the linear model and the conditional probabilities; no new physical or mathematical axioms are introduced beyond standard assumptions in coverage-guided fuzzing.

free parameters (1)

conditional probabilities for mutator transitions
These are learned from coverage outcomes during the training phase and directly determine the random-walk sampling behavior.

axioms (1)

domain assumption Coverage increase is a reliable proxy for interesting inputs that advance the fuzzing campaign.
Invoked when defining the target of the learned conditional probability.

pith-pipeline@v0.9.0 · 5780 in / 1288 out tokens · 29794 ms · 2026-05-18T04:17:44.276534+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We fit a linear model ... interaction term ... two-way ANOVA ... Pr(m_n = j | m_{n-1} = i) ... Markov random walk
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

interaction effect between two mutators on the number of interesting inputs

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 2 internal anchors

[1]

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Floren- cia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report

work page 2023
[2]

Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem.Machine learning47 (2002), 235–256

work page 2002
[3]

Yoav Benjamini and Yosef Hochberg. 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.Journal of the Royal Statistical Society: Series B (Methodological)57, 1 (1995), 289–300. https: //doi.org/10.1111/j.2517-6161.1995.tb02031.x

work page doi:10.1111/j.2517-6161.1995.tb02031.x 1995
[4]

Marcel Böhme and Brandon Falk. 2020. Fuzzing: On the exponential cost of vulnerability discovery. InProceedings of the 28th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 713–724

work page 2020
[5]

Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury

work page
[6]

InProceedings of the 2017 ACM SIGSAC conference on computer and communications security

Directed greybox fuzzing. InProceedings of the 2017 ACM SIGSAC conference on computer and communications security. 2329–2344

work page 2017
[7]

Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. 2016. Coverage- based greybox fuzzing as markov chain. InProceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 1032–1043. ICSE 2026, April 12–18, 2026, Rio de Janeiro, Brazil Konstantinos Kitsios, Marcel Böhme, and Alberto Bacchelli

work page 2016
[8]

Brown, Vincent J

Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jennifer C. Lai, and Robert L. Mercer. 1992. Class-Based n-gram Models of Natural Language.Compu- tational Linguistics18, 4 (1992), 467–479. https://www.aclweb.org/anthology/J92- 4003/

work page 1992
[9]

2006.Direct methods for sparse linear systems

Timothy A Davis. 2006.Direct methods for sparse linear systems. SIAM

work page 2006
[10]

Draper and Harry Smith

Norman R. Draper and Harry Smith. 1998. Applied Regression Analysis.Wiley Series in Probability and Statistics3 (1998)

work page 1998
[11]

Jeffrey L. Elman. 1990. Finding Structure in Time.Cognitive Science14, 2 (1990), 179–211. https://doi.org/10.1207/s15516709cog1402_1

work page doi:10.1207/s15516709cog1402_1 1990
[12]

Andrea Fioraldi, Dominik Maier, Heiko Eißfeldt, and Marc Heuse. 2020. {AFL++}: Combining incremental steps of fuzzing research. In14th USENIX Workshop on Offensive Technologies (WOOT 20)

work page 2020
[13]

R. A. Fisher. 1925. Statistical Methods for Research Workers.Edinburgh Oliver and Boyd12 (1925), 356–369

work page 1925
[14]

Vasudev Gohil, Rahul Kande, Chen Chen, Ahmad-Reza Sadeghi, and Jeyavijayan Rajendran. 2024. Mabfuzz: Multi-armed bandit algorithms for fuzzing processors. In2024 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1–6

work page 2024
[15]

2016.Deep Learning

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016.Deep Learning. MIT Press. http://www.deeplearningbook.org

work page 2016
[16]

Google. n.d.. libFuzzer – A Library for Coverage-Guided Fuzz Testing. https: //llvm.org/docs/LibFuzzer.html. Accessed: 2024-08-25

work page 2024
[17]

Ahmad Hazimeh, Adrian Herrera, and Mathias Payer. 2020. Magma: A ground- truth fuzzing benchmark.Proceedings of the ACM on Measurement and Analysis of Computing Systems4, 3 (2020), 1–29

work page 2020
[18]

Laura Inozemtseva and Reid Holmes. 2014. Coverage is not strongly correlated with test suite effectiveness. InProceedings of the 36th international conference on software engineering. 435–445

work page 2014
[19]

Patrick Jauernig, Domagoj Jakobovic, Stjepan Picek, Emmanuel Stapf, and Ahmad- Reza Sadeghi. 2022. DARWIN: Survival of the fittest fuzzing mutators.arXiv preprint arXiv:2210.11783(2022)

work page arXiv 2022
[20]

On Interaction Effects in Greybox Fuzzing

Konstantinos Kitsios, Marcel Böhme, and Alberto Bacchelli. 2025.Replication Package for "On Interaction Effects in Greybox Fuzzing". https://doi.org/10.5281/ zenodo.17391101

work page 2025
[21]

George Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and Michael Hicks. 2018. Evaluating fuzz testing. InProceedings of the 2018 ACM SIGSAC conference on computer and communications security. 2123–2138

work page 2018
[22]

Yuki Koike, Hiroyuki Katsura, Hiromu Yakura, and Yuma Kurogome. 2022. Slopt: Bandit optimization framework for mutation-based fuzzing. InProceedings of the 38th Annual Computer Security Applications Conference. 519–533

work page 2022
[23]

Mario Köppen. 2000. The curse of dimensionality. In5th online world conference on soft computing in industrial applications (WSC5), Vol. 1. 4–8

work page 2000
[24]

Kutner, Christopher J

Michael H. Kutner, Christopher J. Nachtsheim, John Neter, and William Li. 2005. Applied Linear Statistical Models(5th ed.). McGraw-Hill/Irwin, Boston, MA

work page 2005
[25]

2020.Bandit Algorithms

Tor Lattimore and Csaba Szepesvári. 2020.Bandit Algorithms. Cambridge Univer- sity Press, Cambridge, UK. https://tor-lattimore.com/downloads/book/book.pdf

work page 2020
[26]

Myungho Lee, Sooyoung Cha, and Hakjoo Oh. 2023. Learning seed-adaptive mutation strategies for greybox fuzzing. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 384–396

work page 2023
[27]

Caroline Lemieux and Koushik Sen. 2018. Fairfuzz: A targeted mutation strategy for increasing greybox fuzz testing coverage. InProceedings of the 33rd ACM/IEEE international conference on automated software engineering. 475–485

work page 2018
[28]

Yuekang Li, Yinxing Xue, Hongxu Chen, Xiuheng Wu, Cen Zhang, Xiaofei Xie, Haijun Wang, and Yang Liu. 2019. Cerebro: context-aware adaptive fuzzing for effective vulnerability detection. InProceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 533–544

work page 2019
[29]

Jie Liang, Mingzhe Wang, Chijin Zhou, Zhiyong Wu, Jianzhong Liu, and Yu Jiang

work page
[30]

InCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering

Dodrio: Parallelizing Taint Analysis Based Fuzzing via Redundancy-Free Scheduling. InCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 244–254

work page
[31]

Chenyang Lyu, Shouling Ji, Chao Zhang, Yuwei Li, Wei-Han Lee, Yu Song, and Raheem Beyah. 2019. MOPT: Optimized Mutation Scheduling for Fuzzers. In28th USENIX Security Symposium (USENIX Security 19). USENIX Association, Santa Clara, CA, 1949–1966. https://www.usenix.org/conference/usenixsecurity19/ presentation/lyu

work page 2019
[32]

Mann and Donald R

Henry B. Mann and Donald R. Whitney. 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other.The Annals of Mathematical Statistics18, 1 (1947), 50–60. https://doi.org/10.1214/aoms/ 1177730491

work page doi:10.1214/aoms/ 1947
[33]

Jonathan Metzman, László Szekeres, Laurent Simon, Read Sprabery, and Abhishek Arya. 2021. Fuzzbench: an open fuzzer benchmarking platform and service. In Proceedings of the 29th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 1393–1403

work page 2021
[34]

Maria-Irina Nicolae, Max Eisele, and Andreas Zeller. 2023. Revisiting neural program smoothing for fuzzing. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 133–145

work page 2023
[35]

J. R. Norris. 1998.Markov Chains. Cambridge University Press, Cambridge, UK

work page 1998
[36]

Pennsylvania State University. 2025. ANOVA Assumptions. https://online.stat. psu.edu/stat500/lesson/10/10.2/10.2.1 Accessed: 2025-03-05

work page 2025
[37]

Ruixiang Qian, Quanjun Zhang, Chunrong Fang, Ding Yang, Shun Li, Binyu Li, and Zhenyu Chen. 2024. DiPri: Distance-based Seed Prioritization for Greybox Fuzzing.ACM Transactions on Software Engineering and Methodology(2024)

work page 2024
[38]

Sanjay Rawat, Vivek Jain, Ashish Kumar, Lucian Cojocar, Cristiano Giuffrida, and Herbert Bos. 2017. VUzzer: Application-aware evolutionary fuzzing.. InNDSS, Vol. 17. 1–14

work page 2017
[39]

Moritz Schloegel, Nils Bars, Nico Schiller, Lukas Bernhard, Tobias Scharnowski, Addison Crump, Arash Ale Ebrahim, Nicolai Bissantz, Marius Muench, and Thorsten Holz. 2024. SoK: Prudent Evaluation Practices for Fuzzing.arXiv preprint arXiv:2405.10220(2024)

work page arXiv 2024
[40]

Skipper Seabold and Josef Perktold. 2010. statsmodels: Econometric and statis- tical modeling with python. In9th Python in Science Conference. https://www. statsmodels.org/stable/generated/statsmodels.stats.anova.anova_lm.html

work page 2010
[41]

Dongdong She, Rahul Krishna, Lu Yan, Suman Jana, and Baishakhi Ray. 2020. MTFuzz: fuzzing with a multi-task neural network. InProceedings of the 28th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 737–749

work page 2020
[42]

Dongdong She, Kexin Pei, Dave Epstein, Junfeng Yang, Baishakhi Ray, and Suman Jana. 2019. Neuzz: Efficient fuzzing with neural program smoothing. In2019 IEEE Symposium on Security and Privacy (SP). IEEE, 803–817

work page 2019
[43]

Robert Swiecki. 2015. Honggfuzz: A general-purpose, easy-to-use fuzzer with simple, command-line interface. https://github.com/google/honggfuzz. Accessed: 2024-08-25

work page 2015
[44]

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[45]

András Vargha and Harold D Delaney. 2000. A critique and improvement of the CL common language effect size statistics of McGraw and Wong.Journal of Educational and Behavioral Statistics25, 2 (2000), 101–132

work page 2000
[46]

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. InProceedings of the 31st International Conference on Neural In- formation Processing Systems (NeurIPS). Curran Associates, Inc., 6000–6010. https://arxiv.org/abs/1706.03762

work page internal anchor Pith review Pith/arXiv arXiv 2017
[47]

Xiaohong Wang, Chengcheng Hu, Ruopeng Ma, et al. 2021. CMFuzz: context- aware adaptive mutation for fuzzers.Empirical Software Engineering26, 10 (2021), 1–28. https://doi.org/10.1007/s10664-020-09927-3

work page doi:10.1007/s10664-020-09927-3 2021
[48]

Mingyuan Wu, Ling Jiang, Jiahong Xiang, Yanwei Huang, Heming Cui, Lingming Zhang, and Yuqun Zhang. 2022. One fuzzing strategy to rule them all. InPro- ceedings of the 44th International Conference on Software Engineering. 1634–1645

work page 2022
[49]

Michal Zalewski. 2014. American Fuzzy Lop (AFL). https://github.com/google/ AFL. Accessed: 2024-08-25

work page 2014
[50]

Andreas Zeller, Rahul Gopinath, Marcel Böhme, Christian Holler, Caroline Lemieux, and Zhendong Su. 2024. Greybox Fuzzing. InThe Fuzzing Book, An- dreas Zeller, Rahul Gopinath, Marcel Böhme, Christian Holler, Caroline Lemieux, and Zhendong Su (Eds.). Fuzzingbook.org. https://www.fuzzingbook.org/html/ GreyboxFuzzer.html

work page 2024
[51]

Han Zheng, Flavio Toffalini, Marcel Böhme, and Mathias Payer. 2025. MendelFuzz: The Return of the Deterministic Stage.Proceedings of the ACM on Software Engineering2, FSE (2025), 44–64

work page 2025

[1] [1]

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Floren- cia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report

work page 2023

[2] [2]

Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem.Machine learning47 (2002), 235–256

work page 2002

[3] [3]

Yoav Benjamini and Yosef Hochberg. 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.Journal of the Royal Statistical Society: Series B (Methodological)57, 1 (1995), 289–300. https: //doi.org/10.1111/j.2517-6161.1995.tb02031.x

work page doi:10.1111/j.2517-6161.1995.tb02031.x 1995

[4] [4]

Marcel Böhme and Brandon Falk. 2020. Fuzzing: On the exponential cost of vulnerability discovery. InProceedings of the 28th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 713–724

work page 2020

[5] [5]

Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury

work page

[6] [6]

InProceedings of the 2017 ACM SIGSAC conference on computer and communications security

Directed greybox fuzzing. InProceedings of the 2017 ACM SIGSAC conference on computer and communications security. 2329–2344

work page 2017

[7] [7]

Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. 2016. Coverage- based greybox fuzzing as markov chain. InProceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 1032–1043. ICSE 2026, April 12–18, 2026, Rio de Janeiro, Brazil Konstantinos Kitsios, Marcel Böhme, and Alberto Bacchelli

work page 2016

[8] [8]

Brown, Vincent J

Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jennifer C. Lai, and Robert L. Mercer. 1992. Class-Based n-gram Models of Natural Language.Compu- tational Linguistics18, 4 (1992), 467–479. https://www.aclweb.org/anthology/J92- 4003/

work page 1992

[9] [9]

2006.Direct methods for sparse linear systems

Timothy A Davis. 2006.Direct methods for sparse linear systems. SIAM

work page 2006

[10] [10]

Draper and Harry Smith

Norman R. Draper and Harry Smith. 1998. Applied Regression Analysis.Wiley Series in Probability and Statistics3 (1998)

work page 1998

[11] [11]

Jeffrey L. Elman. 1990. Finding Structure in Time.Cognitive Science14, 2 (1990), 179–211. https://doi.org/10.1207/s15516709cog1402_1

work page doi:10.1207/s15516709cog1402_1 1990

[12] [12]

Andrea Fioraldi, Dominik Maier, Heiko Eißfeldt, and Marc Heuse. 2020. {AFL++}: Combining incremental steps of fuzzing research. In14th USENIX Workshop on Offensive Technologies (WOOT 20)

work page 2020

[13] [13]

R. A. Fisher. 1925. Statistical Methods for Research Workers.Edinburgh Oliver and Boyd12 (1925), 356–369

work page 1925

[14] [14]

Vasudev Gohil, Rahul Kande, Chen Chen, Ahmad-Reza Sadeghi, and Jeyavijayan Rajendran. 2024. Mabfuzz: Multi-armed bandit algorithms for fuzzing processors. In2024 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1–6

work page 2024

[15] [15]

2016.Deep Learning

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016.Deep Learning. MIT Press. http://www.deeplearningbook.org

work page 2016

[16] [16]

Google. n.d.. libFuzzer – A Library for Coverage-Guided Fuzz Testing. https: //llvm.org/docs/LibFuzzer.html. Accessed: 2024-08-25

work page 2024

[17] [17]

Ahmad Hazimeh, Adrian Herrera, and Mathias Payer. 2020. Magma: A ground- truth fuzzing benchmark.Proceedings of the ACM on Measurement and Analysis of Computing Systems4, 3 (2020), 1–29

work page 2020

[18] [18]

Laura Inozemtseva and Reid Holmes. 2014. Coverage is not strongly correlated with test suite effectiveness. InProceedings of the 36th international conference on software engineering. 435–445

work page 2014

[19] [19]

Patrick Jauernig, Domagoj Jakobovic, Stjepan Picek, Emmanuel Stapf, and Ahmad- Reza Sadeghi. 2022. DARWIN: Survival of the fittest fuzzing mutators.arXiv preprint arXiv:2210.11783(2022)

work page arXiv 2022

[20] [20]

On Interaction Effects in Greybox Fuzzing

Konstantinos Kitsios, Marcel Böhme, and Alberto Bacchelli. 2025.Replication Package for "On Interaction Effects in Greybox Fuzzing". https://doi.org/10.5281/ zenodo.17391101

work page 2025

[21] [21]

George Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and Michael Hicks. 2018. Evaluating fuzz testing. InProceedings of the 2018 ACM SIGSAC conference on computer and communications security. 2123–2138

work page 2018

[22] [22]

Yuki Koike, Hiroyuki Katsura, Hiromu Yakura, and Yuma Kurogome. 2022. Slopt: Bandit optimization framework for mutation-based fuzzing. InProceedings of the 38th Annual Computer Security Applications Conference. 519–533

work page 2022

[23] [23]

Mario Köppen. 2000. The curse of dimensionality. In5th online world conference on soft computing in industrial applications (WSC5), Vol. 1. 4–8

work page 2000

[24] [24]

Kutner, Christopher J

Michael H. Kutner, Christopher J. Nachtsheim, John Neter, and William Li. 2005. Applied Linear Statistical Models(5th ed.). McGraw-Hill/Irwin, Boston, MA

work page 2005

[25] [25]

2020.Bandit Algorithms

Tor Lattimore and Csaba Szepesvári. 2020.Bandit Algorithms. Cambridge Univer- sity Press, Cambridge, UK. https://tor-lattimore.com/downloads/book/book.pdf

work page 2020

[26] [26]

Myungho Lee, Sooyoung Cha, and Hakjoo Oh. 2023. Learning seed-adaptive mutation strategies for greybox fuzzing. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 384–396

work page 2023

[27] [27]

Caroline Lemieux and Koushik Sen. 2018. Fairfuzz: A targeted mutation strategy for increasing greybox fuzz testing coverage. InProceedings of the 33rd ACM/IEEE international conference on automated software engineering. 475–485

work page 2018

[28] [28]

Yuekang Li, Yinxing Xue, Hongxu Chen, Xiuheng Wu, Cen Zhang, Xiaofei Xie, Haijun Wang, and Yang Liu. 2019. Cerebro: context-aware adaptive fuzzing for effective vulnerability detection. InProceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 533–544

work page 2019

[29] [29]

Jie Liang, Mingzhe Wang, Chijin Zhou, Zhiyong Wu, Jianzhong Liu, and Yu Jiang

work page

[30] [30]

InCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering

Dodrio: Parallelizing Taint Analysis Based Fuzzing via Redundancy-Free Scheduling. InCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 244–254

work page

[31] [31]

Chenyang Lyu, Shouling Ji, Chao Zhang, Yuwei Li, Wei-Han Lee, Yu Song, and Raheem Beyah. 2019. MOPT: Optimized Mutation Scheduling for Fuzzers. In28th USENIX Security Symposium (USENIX Security 19). USENIX Association, Santa Clara, CA, 1949–1966. https://www.usenix.org/conference/usenixsecurity19/ presentation/lyu

work page 2019

[32] [32]

Mann and Donald R

Henry B. Mann and Donald R. Whitney. 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other.The Annals of Mathematical Statistics18, 1 (1947), 50–60. https://doi.org/10.1214/aoms/ 1177730491

work page doi:10.1214/aoms/ 1947

[33] [33]

Jonathan Metzman, László Szekeres, Laurent Simon, Read Sprabery, and Abhishek Arya. 2021. Fuzzbench: an open fuzzer benchmarking platform and service. In Proceedings of the 29th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 1393–1403

work page 2021

[34] [34]

Maria-Irina Nicolae, Max Eisele, and Andreas Zeller. 2023. Revisiting neural program smoothing for fuzzing. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 133–145

work page 2023

[35] [35]

J. R. Norris. 1998.Markov Chains. Cambridge University Press, Cambridge, UK

work page 1998

[36] [36]

Pennsylvania State University. 2025. ANOVA Assumptions. https://online.stat. psu.edu/stat500/lesson/10/10.2/10.2.1 Accessed: 2025-03-05

work page 2025

[37] [37]

Ruixiang Qian, Quanjun Zhang, Chunrong Fang, Ding Yang, Shun Li, Binyu Li, and Zhenyu Chen. 2024. DiPri: Distance-based Seed Prioritization for Greybox Fuzzing.ACM Transactions on Software Engineering and Methodology(2024)

work page 2024

[38] [38]

Sanjay Rawat, Vivek Jain, Ashish Kumar, Lucian Cojocar, Cristiano Giuffrida, and Herbert Bos. 2017. VUzzer: Application-aware evolutionary fuzzing.. InNDSS, Vol. 17. 1–14

work page 2017

[39] [39]

Moritz Schloegel, Nils Bars, Nico Schiller, Lukas Bernhard, Tobias Scharnowski, Addison Crump, Arash Ale Ebrahim, Nicolai Bissantz, Marius Muench, and Thorsten Holz. 2024. SoK: Prudent Evaluation Practices for Fuzzing.arXiv preprint arXiv:2405.10220(2024)

work page arXiv 2024

[40] [40]

Skipper Seabold and Josef Perktold. 2010. statsmodels: Econometric and statis- tical modeling with python. In9th Python in Science Conference. https://www. statsmodels.org/stable/generated/statsmodels.stats.anova.anova_lm.html

work page 2010

[41] [41]

Dongdong She, Rahul Krishna, Lu Yan, Suman Jana, and Baishakhi Ray. 2020. MTFuzz: fuzzing with a multi-task neural network. InProceedings of the 28th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 737–749

work page 2020

[42] [42]

Dongdong She, Kexin Pei, Dave Epstein, Junfeng Yang, Baishakhi Ray, and Suman Jana. 2019. Neuzz: Efficient fuzzing with neural program smoothing. In2019 IEEE Symposium on Security and Privacy (SP). IEEE, 803–817

work page 2019

[43] [43]

Robert Swiecki. 2015. Honggfuzz: A general-purpose, easy-to-use fuzzer with simple, command-line interface. https://github.com/google/honggfuzz. Accessed: 2024-08-25

work page 2015

[44] [44]

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[45] [45]

András Vargha and Harold D Delaney. 2000. A critique and improvement of the CL common language effect size statistics of McGraw and Wong.Journal of Educational and Behavioral Statistics25, 2 (2000), 101–132

work page 2000

[46] [46]

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. InProceedings of the 31st International Conference on Neural In- formation Processing Systems (NeurIPS). Curran Associates, Inc., 6000–6010. https://arxiv.org/abs/1706.03762

work page internal anchor Pith review Pith/arXiv arXiv 2017

[47] [47]

Xiaohong Wang, Chengcheng Hu, Ruopeng Ma, et al. 2021. CMFuzz: context- aware adaptive mutation for fuzzers.Empirical Software Engineering26, 10 (2021), 1–28. https://doi.org/10.1007/s10664-020-09927-3

work page doi:10.1007/s10664-020-09927-3 2021

[48] [48]

Mingyuan Wu, Ling Jiang, Jiahong Xiang, Yanwei Huang, Heming Cui, Lingming Zhang, and Yuqun Zhang. 2022. One fuzzing strategy to rule them all. InPro- ceedings of the 44th International Conference on Software Engineering. 1634–1645

work page 2022

[49] [49]

Michal Zalewski. 2014. American Fuzzy Lop (AFL). https://github.com/google/ AFL. Accessed: 2024-08-25

work page 2014

[50] [50]

Andreas Zeller, Rahul Gopinath, Marcel Böhme, Christian Holler, Caroline Lemieux, and Zhendong Su. 2024. Greybox Fuzzing. InThe Fuzzing Book, An- dreas Zeller, Rahul Gopinath, Marcel Böhme, Christian Holler, Caroline Lemieux, and Zhendong Su (Eds.). Fuzzingbook.org. https://www.fuzzingbook.org/html/ GreyboxFuzzer.html

work page 2024

[51] [51]

Han Zheng, Flavio Toffalini, Marcel Böhme, and Mathias Payer. 2025. MendelFuzz: The Return of the Deterministic Stage.Proceedings of the ACM on Software Engineering2, FSE (2025), 44–64

work page 2025