On Interaction Effects in Greybox Fuzzing
Pith reviewed 2026-05-18 04:17 UTC · model grok-4.3
The pith
Mutator order matters in greybox fuzzing because pairwise interactions affect coverage gains, and learning conditional probabilities to pick sequences improves results over fixed or independent choices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Greybox fuzzers become more effective when they learn and exploit interaction effects among mutators by estimating the conditional probability that a given mutator will yield coverage gains after the previous one, then sampling promising sequences via random walk rather than using fixed selection probabilities or independent optimization.
What carries the argument
A conditional probability model over ordered mutator pairs, sampled by random walk to produce full mutation sequences for seed inputs.
If this is right
- MuoFuzz records the highest code coverage on both the FuzzBench and MAGMA benchmark suites.
- MuoFuzz discovers four bugs that AFL++ misses and one bug missed by both AFL++ and MOPT.
- Choosing mutator sequences according to learned conditional probabilities is more efficient than fixed-probability selection or per-mutator optimization alone.
Where Pith is reading between the lines
- The same pairwise-interaction modeling could be applied to other mutation-driven search processes such as evolutionary test generation or program repair.
- If the learned probabilities shift as the corpus grows, periodic re-estimation during a campaign might further increase gains.
- The linear-model evidence for interactions suggests that similar dependencies may exist between other fuzzing decisions, such as seed scheduling or energy assignment.
Load-bearing premise
Interaction effects measured from short, isolated mutator-pair trials on a small set of seeds will still guide better choices when the model runs inside full-length fuzzing campaigns on many different programs.
What would settle it
Running MuoFuzz, AFL++, and MOPT head-to-head on the FuzzBench and MAGMA suites and finding that MuoFuzz does not produce the highest coverage or the unique bugs reported would show the learned sequences add no practical gain.
Figures
read the original abstract
A greybox fuzzer is an automated software testing tool that generates new test inputs by applying randomly chosen mutators (e.g., flipping a bit or deleting a block of bytes) to a seed input in random order and adds all coverage-increasing inputs to the corpus of seeds. We hypothesize that the order in which mutators are applied to a seed input has an impact on the effectiveness of greybox fuzzers. In our experiments, we fit a linear model to a dataset that contains the effectiveness of all possible mutator pairs and indeed observe the conjectured interaction effect. This points us to more efficient fuzzing by choosing the most promising mutator sequence with a higher likelihood. We propose MuoFuzz, a greybox fuzzer that learns and chooses the most promising mutator sequences. MuoFuzz learns the conditional probability that the next mutator will yield an interesting input, given the previously selected mutator. Then, it samples from the learned probability using a random walk to generate mutator sequences. We compare the performance of MuoFuzz to AFL++, which uses a fixed selection probability, and MOPT, which optimizes the selection probability of each mutator in isolation. Experimental results on the FuzzBench and MAGMA benchmarks show that MuoFuzz achieves the highest code coverage and finds four bugs missed by AFL++ and one missed by both AFL++ and MOPT.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper hypothesizes that the sequence in which mutators are applied affects greybox fuzzer effectiveness due to interaction effects. It fits a linear model to pairwise mutator effectiveness data collected on limited seeds and reports observing such interactions. It then proposes MuoFuzz, which learns conditional probabilities P(next mutator is interesting | previous mutator) from coverage data and samples mutator sequences via random walk. On FuzzBench and MAGMA, MuoFuzz reports higher coverage than AFL++ (fixed probabilities) and MOPT (independent optimization) and discovers bugs missed by the baselines.
Significance. If the results hold, the work supplies empirical support for the value of modeling mutator interactions rather than treating selection probabilities as independent. The evaluation on standard, reproducible benchmarks (FuzzBench, MAGMA) is a strength and enables direct comparison with prior fuzzers. The approach of learning conditionals from observed coverage outcomes is falsifiable and grounded in data rather than hand-tuned parameters.
major comments (2)
- [Evaluation] The pairwise interaction measurements are performed on restricted seed sets in isolation, yet the central performance claims rest on MuoFuzz using the resulting conditional probabilities inside full, long-running campaigns on evolving corpora. No ablation or transfer experiment is reported that checks whether the learned conditionals remain predictive once the corpus diversifies and runtimes extend, leaving open whether observed gains are due to interaction modeling or other adaptive mechanics.
- [MuoFuzz Design] The manuscript does not state whether the conditional probability table is computed once from an initial dataset or is updated online as new coverage-increasing inputs are discovered during a campaign. This detail is load-bearing for the claim that interaction effects observed in the linear model translate to sustained gains in the deployed fuzzer.
minor comments (2)
- [Abstract] The abstract states that MuoFuzz finds four bugs missed by AFL++ and one missed by both AFL++ and MOPT, but the results section should list the specific bugs, the programs on which they were found, and the exact run counts used for each comparison.
- [Experiments] Statistical controls (number of independent runs, variance reporting, or significance tests) for the coverage curves are not described in sufficient detail to allow readers to judge the robustness of the reported improvements.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of our evaluation and design that we address point by point below. We provide the strongest honest clarifications supported by the manuscript while indicating revisions where appropriate.
read point-by-point responses
-
Referee: [Evaluation] The pairwise interaction measurements are performed on restricted seed sets in isolation, yet the central performance claims rest on MuoFuzz using the resulting conditional probabilities inside full, long-running campaigns on evolving corpora. No ablation or transfer experiment is reported that checks whether the learned conditionals remain predictive once the corpus diversifies and runtimes extend, leaving open whether observed gains are due to interaction modeling or other adaptive mechanics.
Authors: The pairwise measurements on restricted seed sets were intentionally performed in isolation to establish the existence of interaction effects via the linear model, as stated in the manuscript. The performance claims for MuoFuzz are evaluated exclusively through full, long-running campaigns on the FuzzBench and MAGMA benchmarks, where the learned conditional probabilities are applied directly to evolving corpora over extended runtimes. These experiments demonstrate higher coverage and additional unique bugs compared to AFL++ (fixed probabilities) and MOPT (independent optimization). While we did not report a dedicated ablation isolating the transferability of the conditionals, the end-to-end results on standard benchmarks already reflect usage in realistic, diversifying settings and support that interaction modeling contributes to the observed gains beyond other adaptive mechanics. We will add a clarifying paragraph in the Evaluation section discussing the applicability of the pre-learned conditionals during long campaigns. revision: partial
-
Referee: [MuoFuzz Design] The manuscript does not state whether the conditional probability table is computed once from an initial dataset or is updated online as new coverage-increasing inputs are discovered during a campaign. This detail is load-bearing for the claim that interaction effects observed in the linear model translate to sustained gains in the deployed fuzzer.
Authors: We thank the referee for identifying this missing detail. In MuoFuzz, the conditional probability table is computed once from the initial coverage dataset collected on limited seeds (the same data underlying the linear model) and is not updated online during the campaign. This static design directly translates the observed pairwise interactions into sequence sampling via random walk without additional runtime adaptation. We will revise the MuoFuzz Design section to explicitly describe the one-time computation process, its grounding in the initial data, and how it enables sustained use of the learned interactions. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper's derivation begins with a hypothesis on mutator ordering, supports it via an empirical linear model fit to a dataset of mutator-pair effectiveness (used only to observe interaction effects), and then implements MuoFuzz to learn conditional probabilities for sequencing and evaluates the resulting fuzzer directly on FuzzBench and MAGMA benchmarks. No equation or claim reduces a reported performance number or 'prediction' to the initial model fit by construction; the benchmark results are generated from independent full fuzzing campaigns rather than being statistically forced by the motivating linear model. The approach remains self-contained against external benchmarks with no load-bearing self-citation or ansatz smuggling.
Axiom & Free-Parameter Ledger
free parameters (1)
- conditional probabilities for mutator transitions
axioms (1)
- domain assumption Coverage increase is a reliable proxy for interesting inputs that advance the fuzzing campaign.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We fit a linear model ... interaction term ... two-way ANOVA ... Pr(m_n = j | m_{n-1} = i) ... Markov random walk
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
interaction effect between two mutators on the number of interesting inputs
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Floren- cia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report
work page 2023
-
[2]
Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem.Machine learning47 (2002), 235–256
work page 2002
-
[3]
Yoav Benjamini and Yosef Hochberg. 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.Journal of the Royal Statistical Society: Series B (Methodological)57, 1 (1995), 289–300. https: //doi.org/10.1111/j.2517-6161.1995.tb02031.x
-
[4]
Marcel Böhme and Brandon Falk. 2020. Fuzzing: On the exponential cost of vulnerability discovery. InProceedings of the 28th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 713–724
work page 2020
-
[5]
Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury
-
[6]
InProceedings of the 2017 ACM SIGSAC conference on computer and communications security
Directed greybox fuzzing. InProceedings of the 2017 ACM SIGSAC conference on computer and communications security. 2329–2344
work page 2017
-
[7]
Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. 2016. Coverage- based greybox fuzzing as markov chain. InProceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 1032–1043. ICSE 2026, April 12–18, 2026, Rio de Janeiro, Brazil Konstantinos Kitsios, Marcel Böhme, and Alberto Bacchelli
work page 2016
-
[8]
Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jennifer C. Lai, and Robert L. Mercer. 1992. Class-Based n-gram Models of Natural Language.Compu- tational Linguistics18, 4 (1992), 467–479. https://www.aclweb.org/anthology/J92- 4003/
work page 1992
-
[9]
2006.Direct methods for sparse linear systems
Timothy A Davis. 2006.Direct methods for sparse linear systems. SIAM
work page 2006
-
[10]
Norman R. Draper and Harry Smith. 1998. Applied Regression Analysis.Wiley Series in Probability and Statistics3 (1998)
work page 1998
-
[11]
Jeffrey L. Elman. 1990. Finding Structure in Time.Cognitive Science14, 2 (1990), 179–211. https://doi.org/10.1207/s15516709cog1402_1
-
[12]
Andrea Fioraldi, Dominik Maier, Heiko Eißfeldt, and Marc Heuse. 2020. {AFL++}: Combining incremental steps of fuzzing research. In14th USENIX Workshop on Offensive Technologies (WOOT 20)
work page 2020
-
[13]
R. A. Fisher. 1925. Statistical Methods for Research Workers.Edinburgh Oliver and Boyd12 (1925), 356–369
work page 1925
-
[14]
Vasudev Gohil, Rahul Kande, Chen Chen, Ahmad-Reza Sadeghi, and Jeyavijayan Rajendran. 2024. Mabfuzz: Multi-armed bandit algorithms for fuzzing processors. In2024 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1–6
work page 2024
-
[15]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016.Deep Learning. MIT Press. http://www.deeplearningbook.org
work page 2016
-
[16]
Google. n.d.. libFuzzer – A Library for Coverage-Guided Fuzz Testing. https: //llvm.org/docs/LibFuzzer.html. Accessed: 2024-08-25
work page 2024
-
[17]
Ahmad Hazimeh, Adrian Herrera, and Mathias Payer. 2020. Magma: A ground- truth fuzzing benchmark.Proceedings of the ACM on Measurement and Analysis of Computing Systems4, 3 (2020), 1–29
work page 2020
-
[18]
Laura Inozemtseva and Reid Holmes. 2014. Coverage is not strongly correlated with test suite effectiveness. InProceedings of the 36th international conference on software engineering. 435–445
work page 2014
- [19]
-
[20]
On Interaction Effects in Greybox Fuzzing
Konstantinos Kitsios, Marcel Böhme, and Alberto Bacchelli. 2025.Replication Package for "On Interaction Effects in Greybox Fuzzing". https://doi.org/10.5281/ zenodo.17391101
work page 2025
-
[21]
George Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and Michael Hicks. 2018. Evaluating fuzz testing. InProceedings of the 2018 ACM SIGSAC conference on computer and communications security. 2123–2138
work page 2018
-
[22]
Yuki Koike, Hiroyuki Katsura, Hiromu Yakura, and Yuma Kurogome. 2022. Slopt: Bandit optimization framework for mutation-based fuzzing. InProceedings of the 38th Annual Computer Security Applications Conference. 519–533
work page 2022
-
[23]
Mario Köppen. 2000. The curse of dimensionality. In5th online world conference on soft computing in industrial applications (WSC5), Vol. 1. 4–8
work page 2000
-
[24]
Michael H. Kutner, Christopher J. Nachtsheim, John Neter, and William Li. 2005. Applied Linear Statistical Models(5th ed.). McGraw-Hill/Irwin, Boston, MA
work page 2005
-
[25]
Tor Lattimore and Csaba Szepesvári. 2020.Bandit Algorithms. Cambridge Univer- sity Press, Cambridge, UK. https://tor-lattimore.com/downloads/book/book.pdf
work page 2020
-
[26]
Myungho Lee, Sooyoung Cha, and Hakjoo Oh. 2023. Learning seed-adaptive mutation strategies for greybox fuzzing. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 384–396
work page 2023
-
[27]
Caroline Lemieux and Koushik Sen. 2018. Fairfuzz: A targeted mutation strategy for increasing greybox fuzz testing coverage. InProceedings of the 33rd ACM/IEEE international conference on automated software engineering. 475–485
work page 2018
-
[28]
Yuekang Li, Yinxing Xue, Hongxu Chen, Xiuheng Wu, Cen Zhang, Xiaofei Xie, Haijun Wang, and Yang Liu. 2019. Cerebro: context-aware adaptive fuzzing for effective vulnerability detection. InProceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 533–544
work page 2019
-
[29]
Jie Liang, Mingzhe Wang, Chijin Zhou, Zhiyong Wu, Jianzhong Liu, and Yu Jiang
-
[30]
Dodrio: Parallelizing Taint Analysis Based Fuzzing via Redundancy-Free Scheduling. InCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 244–254
-
[31]
Chenyang Lyu, Shouling Ji, Chao Zhang, Yuwei Li, Wei-Han Lee, Yu Song, and Raheem Beyah. 2019. MOPT: Optimized Mutation Scheduling for Fuzzers. In28th USENIX Security Symposium (USENIX Security 19). USENIX Association, Santa Clara, CA, 1949–1966. https://www.usenix.org/conference/usenixsecurity19/ presentation/lyu
work page 2019
-
[32]
Henry B. Mann and Donald R. Whitney. 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other.The Annals of Mathematical Statistics18, 1 (1947), 50–60. https://doi.org/10.1214/aoms/ 1177730491
-
[33]
Jonathan Metzman, László Szekeres, Laurent Simon, Read Sprabery, and Abhishek Arya. 2021. Fuzzbench: an open fuzzer benchmarking platform and service. In Proceedings of the 29th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 1393–1403
work page 2021
-
[34]
Maria-Irina Nicolae, Max Eisele, and Andreas Zeller. 2023. Revisiting neural program smoothing for fuzzing. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 133–145
work page 2023
-
[35]
J. R. Norris. 1998.Markov Chains. Cambridge University Press, Cambridge, UK
work page 1998
-
[36]
Pennsylvania State University. 2025. ANOVA Assumptions. https://online.stat. psu.edu/stat500/lesson/10/10.2/10.2.1 Accessed: 2025-03-05
work page 2025
-
[37]
Ruixiang Qian, Quanjun Zhang, Chunrong Fang, Ding Yang, Shun Li, Binyu Li, and Zhenyu Chen. 2024. DiPri: Distance-based Seed Prioritization for Greybox Fuzzing.ACM Transactions on Software Engineering and Methodology(2024)
work page 2024
-
[38]
Sanjay Rawat, Vivek Jain, Ashish Kumar, Lucian Cojocar, Cristiano Giuffrida, and Herbert Bos. 2017. VUzzer: Application-aware evolutionary fuzzing.. InNDSS, Vol. 17. 1–14
work page 2017
- [39]
-
[40]
Skipper Seabold and Josef Perktold. 2010. statsmodels: Econometric and statis- tical modeling with python. In9th Python in Science Conference. https://www. statsmodels.org/stable/generated/statsmodels.stats.anova.anova_lm.html
work page 2010
-
[41]
Dongdong She, Rahul Krishna, Lu Yan, Suman Jana, and Baishakhi Ray. 2020. MTFuzz: fuzzing with a multi-task neural network. InProceedings of the 28th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 737–749
work page 2020
-
[42]
Dongdong She, Kexin Pei, Dave Epstein, Junfeng Yang, Baishakhi Ray, and Suman Jana. 2019. Neuzz: Efficient fuzzing with neural program smoothing. In2019 IEEE Symposium on Security and Privacy (SP). IEEE, 803–817
work page 2019
-
[43]
Robert Swiecki. 2015. Honggfuzz: A general-purpose, easy-to-use fuzzer with simple, command-line interface. https://github.com/google/honggfuzz. Accessed: 2024-08-25
work page 2015
-
[44]
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971(2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[45]
András Vargha and Harold D Delaney. 2000. A critique and improvement of the CL common language effect size statistics of McGraw and Wong.Journal of Educational and Behavioral Statistics25, 2 (2000), 101–132
work page 2000
-
[46]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. InProceedings of the 31st International Conference on Neural In- formation Processing Systems (NeurIPS). Curran Associates, Inc., 6000–6010. https://arxiv.org/abs/1706.03762
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[47]
Xiaohong Wang, Chengcheng Hu, Ruopeng Ma, et al. 2021. CMFuzz: context- aware adaptive mutation for fuzzers.Empirical Software Engineering26, 10 (2021), 1–28. https://doi.org/10.1007/s10664-020-09927-3
-
[48]
Mingyuan Wu, Ling Jiang, Jiahong Xiang, Yanwei Huang, Heming Cui, Lingming Zhang, and Yuqun Zhang. 2022. One fuzzing strategy to rule them all. InPro- ceedings of the 44th International Conference on Software Engineering. 1634–1645
work page 2022
-
[49]
Michal Zalewski. 2014. American Fuzzy Lop (AFL). https://github.com/google/ AFL. Accessed: 2024-08-25
work page 2014
-
[50]
Andreas Zeller, Rahul Gopinath, Marcel Böhme, Christian Holler, Caroline Lemieux, and Zhendong Su. 2024. Greybox Fuzzing. InThe Fuzzing Book, An- dreas Zeller, Rahul Gopinath, Marcel Böhme, Christian Holler, Caroline Lemieux, and Zhendong Su (Eds.). Fuzzingbook.org. https://www.fuzzingbook.org/html/ GreyboxFuzzer.html
work page 2024
-
[51]
Han Zheng, Flavio Toffalini, Marcel Böhme, and Mathias Payer. 2025. MendelFuzz: The Return of the Deterministic Stage.Proceedings of the ACM on Software Engineering2, FSE (2025), 44–64
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.