Build It, Break It, Fix It: Contesting Secure Development
Pith reviewed 2026-05-25 10:34 UTC · model grok-4.3
The pith
Statically type-safe languages produced 11 times fewer security flaws than C/C++ in the BIBIFI contest.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The BIBIFI contest format shows that language choice and team experience correlate with security outcomes: statically type-safe language submissions were 11 times less likely to contain security flaws than C/C++ submissions, C/C++ produced the most efficient builds, and break-it teams that also succeeded at build-it were significantly better at finding security bugs.
What carries the argument
The BIBIFI contest structure, in which teams build specified software and then break other submissions to expose flaws.
If this is right
- Statically type-safe languages correlate with substantially lower rates of security flaws even when teams can choose any language or tools.
- Teams with experience succeeding at both building secure code and breaking insecure code identify more bugs than teams specialized in one role.
- C/C++ submissions can achieve higher performance but at the cost of elevated security risk compared to type-safe alternatives.
Where Pith is reading between the lines
- Training programs that require participants to both construct and attack software could strengthen vulnerability detection skills.
- The contest format offers a controlled way to measure effects of other variables like specific tools or methodologies on security outcomes.
- Industry teams might reduce vulnerabilities by prioritizing type-safe languages where performance trade-offs allow.
Load-bearing premise
The three specific programming problems and contest rules produce security flaw rates and breaking performance that generalize to real-world secure development tasks outside the artificial contest constraints.
What would settle it
A replication of the contest using different programming problems that fails to show the same 11-fold difference in flaw rates by language, or a field study of real projects that finds no language-based difference in vulnerability counts.
Figures
read the original abstract
Typical security contests focus on breaking or mitigating the impact of buggy systems. We present the Build-it, Break-it, Fix-it (BIBIFI) contest, which aims to assess the ability to securely build software, not just break it. In BIBIFI, teams build specified software with the goal of maximizing correctness, performance, and security. The latter is tested when teams attempt to break other teams' submissions. Winners are chosen from among the best builders and the best breakers. BIBIFI was designed to be open-ended; teams can use any language, tool, process, etc. that they like. As such, contest outcomes shed light on factors that correlate with successfully building secure software and breaking insecure software. We ran three contests involving a total of 156 teams and three different programming problems. Quantitative analysis from these contests found that the most efficient build-it submissions used C/C++, but submissions coded in a statically-type safe language were 11 times less likely to have a security flaw than C/C++ submissions. Break-it teams that were also successful build-it teams were significantly better at finding security bugs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Build-it, Break-it, Fix-it (BIBIFI) contest to evaluate secure software development through teams building specified software and then attempting to break others' submissions. Across three contests with 156 teams and three programming problems, the authors report that C/C++ submissions were the most efficient but that code in statically type-safe languages was 11 times less likely to contain security flaws; additionally, teams successful at both building and breaking performed significantly better at discovering security bugs.
Significance. If the quantitative claims hold after methodological clarification, the work provides empirical evidence on language choice and dual build/break experience as correlates of secure development outcomes. The open-ended contest format, permitting arbitrary languages, tools, and processes, is a strength that distinguishes it from more constrained studies and enables observation of real correlations in a semi-controlled setting with a sizable participant pool.
major comments (2)
- [Abstract] Abstract: The central claim that 'submissions coded in a statically-type safe language were 11 times less likely to have a security flaw than C/C++ submissions' is presented with no accompanying information on security flaw classification criteria, inter-rater reliability, the statistical model or controls used to compute the ratio, or any uncertainty estimates, which are required to assess the robustness of this load-bearing quantitative result.
- [Results section] Results section: No analysis or discussion addresses potential confounds such as self-selection bias, correlation between language choice and prior team experience, or differential problem fit, any of which could produce the observed 11x difference without reflecting inherent language security properties.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review. We address each major comment below. We will revise the manuscript to provide greater methodological transparency in the abstract and to add explicit discussion of potential confounds.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that 'submissions coded in a statically-type safe language were 11 times less likely to have a security flaw than C/C++ submissions' is presented with no accompanying information on security flaw classification criteria, inter-rater reliability, the statistical model or controls used to compute the ratio, or any uncertainty estimates, which are required to assess the robustness of this load-bearing quantitative result.
Authors: The abstract is space-constrained, but the full manuscript details the flaw classification criteria (based on CWE categories for memory safety, injection, and access control issues), reports inter-rater agreement via Cohen's kappa in the 'Security Flaw Classification' subsection, describes the negative binomial regression model with controls for team experience and problem, and supplies 95% confidence intervals for the incidence rate ratio. We will revise the abstract to include a short parenthetical note on the model and uncertainty to improve accessibility without exceeding length limits. revision: yes
-
Referee: [Results section] Results section: No analysis or discussion addresses potential confounds such as self-selection bias, correlation between language choice and prior team experience, or differential problem fit, any of which could produce the observed 11x difference without reflecting inherent language security properties.
Authors: We agree that the results section would benefit from explicit treatment of these issues. The regression already includes controls for self-reported prior experience and problem type, but we did not dedicate space to self-selection or differential problem fit. We will add a dedicated limitations paragraph in the Discussion acknowledging these confounds, noting that the contest's open-ended design and pre-contest surveys provide partial mitigation, while recognizing that observational data cannot fully eliminate them. revision: yes
Circularity Check
No circularity: empirical observations from contest data stand alone
full rationale
The paper reports direct quantitative findings from three BIBIFI contests (156 teams, three problems): C/C++ submissions were most efficient but statically safe languages showed 11x fewer security flaws, and dual build/break teams performed better at bug finding. These are presented as observed correlations in the collected submissions and break attempts, with no equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations. The central multiplier is a straightforward ratio computed from flaw counts in the contest data, not derived from prior author work or ansatzes. Generalization concerns exist but are external-validity issues, not circularity. The derivation chain is self-contained as raw empirical reporting.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The three programming problems used in the contests are representative of typical secure development challenges.
- domain assumption Security flaws discovered in the break-it phase accurately reflect the security properties of the built submissions.
Reference graph
Works this paper leans on
-
[1]
Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. 2009. Control-flow integrity principles, implementations, and applications. ACM Transactions on Information and System Security (TISSEC) 13, 1 (2009), 4:1–4:40
work page 2009
-
[2]
Y. Acar, M. Backes, S. Fahl, S. Garfinkel, D. Kim, M. L. Mazurek, and C. Stransky. 2017. Comparing the Usability of Cryptographic APIs. In 2017 IEEE Symposium on Security and Privacy (SP)
work page 2017
-
[3]
Yasemin Acar, Christian Stransky, Dominik Wermke, Michelle L. Mazurek, and Sascha Fahl. 2017. Security Developer Studies with GitHub Users: Exploring a Convenience Sample. In Thirteenth Symposium on Usable Privacy and Security (SOUPS 2017)
work page 2017
-
[4]
Yasemin Acar, Christian Stransky, Dominik Wermke, Charles Weir, Michelle L. Mazurek, and Sascha Fahl. 2017. Developers Need Support Too: A Survey of Security Advice for Software Developers. In IEEE Secure Development Conference (SecDev 2017)
work page 2017
-
[5]
acm [n. d.]. The ACM-ICPC International Collegiate Programming Contest. http://icpc.baylor.edu. ([n. d.])
-
[6]
AFL 2018. American Fuzzing Lop (AFL). http://lcamtuf.coredump.cx/afl/. (2018)
work page 2018
-
[7]
Daniele Antonioli, Hamid Reza Ghaeini, Sridhar Adepu, Martin Ochoa, and Nils Ole Tippenhauer. 2017. Gamifying ICS Security Training and Research: Design, Implementation, and Results of S3. In Proceedings of the 2017 Workshop on Cyber-Physical Systems Security and PrivaCy (CPS ’17)
work page 2017
-
[8]
Ingolf Becker, Simon Parkin, and M. Angela Sasse. 2017. Finding Security Champions in Blends of Security Culture. In 2nd European Workshop on Usable Security (Euro USEC 2017) . Internet Society
work page 2017
-
[9]
Daniel J Bernstein, Tanja Lange, and Peter Schwabe. 2012. The security impact of a new cryptographic library. In International Conference on Cryptology and Information Security in Latin America . Springer, 159–176
work page 2012
-
[10]
Black, Lee Badger, Barbara Guttman, and Elizabeth Fong
Paul E. Black, Lee Badger, Barbara Guttman, and Elizabeth Fong. 2016. Dramatically Reducing Software Vulnerabilities: Report to the White House Office of Science and Technology Policy. Technical Report Draft NISTIR 8151. National Institute of Standards and Technology. http://csrc.nist.gov/publications/drafts/nistir-8151/nistir8151_draft.pdf
work page 2016
-
[11]
Kevin Bock, George Hughey, and Dave Levin. 2018. King of the Hill: A Novel Cybersecurity Competition for Teaching Penetration Testing. In 2018 USENIX Workshop on Advances in Security Education (ASE 18)
work page 2018
-
[12]
bsimm [n. d.]. Building Security In Maturity Model (BSIMM). http://bsimm.com. ([n. d.])
-
[13]
Kenneth P Burnham, David R Anderson, and Kathryn P Huyvaert. 2011. AIC model selection and multimodel inference in behavioral ecology: some background, observations, and comparisons. Behavioral Ecology and Sociobiology 65, 1 (2011), 23–35
work page 2011
-
[14]
Peter Chapman, Jonathan Burket, and David Brumley. 2014. PicoCTF: A game-based computer security competition for high school students. In 2014 USENIX Summit on Gaming, Games, and Gamification in Security Education (3GSE 14)
work page 2014
-
[15]
Brian Chess and Jacob West. 2007. Secure Programming with Static Analysis . Addison-Wesley
work page 2007
-
[16]
Nicholas Childers, Bryce Boe, Lorenzo Cavallaro, Ludovico Cavedon, Marco Cova, Manuel Egele, and Giovanni Vigna
- [17]
-
[18]
Jacob Cohen. 1988. Statistical Power Analysis for the Behavioral Sciences . Lawrence Erlbaum Associates
work page 1988
-
[19]
Art Conklin. 2005. The Use of a Collegiate Cyber Defense Competition in Information Security Education. InInfoSecCD
work page 2005
-
[20]
Art Conklin. 2006. Cyber defense competitions and information security education: An active learning solution for a capstone course. In HICSS
work page 2006
-
[21]
Gregory Conti, Thomas Babbitt, and John Nelson. 2011. Hacking competitions and their untapped potential for security education. Security & Privacy 9, 3 (2011), 56–59. ACM Transactions on Privacy and Security, Vol. 9, No. 4, Article 39. Publication date: March 2010. 39:34 J. Parker et al
work page 2011
-
[22]
DEF CON Communications, Inc. [n. d.]. Capture the Flag Archive. https://www.defcon.org/html/links/dc-ctf.html. ([n. d.])
-
[23]
Adam Doupé, Manuel Egele, Benjamin Caillat, Gianluca Stringhini, Gorkem Yakin, Ali Zand, Ludovico Cavedon, and Giovanni Vigna. 2011. Hit ’Em Where It Hurts: A Live Security Exercise on Cyber Situational Awareness. InACSAC
work page 2011
-
[24]
dragostech.com inc. [n. d.]. CanSecWest Applied Security Conference. http://cansecwest.com. ([n. d.])
-
[25]
Chris Eagle. 2013. Computer security competitions: Expanding educational outcomes. Security & Privacy 11, 4 (2013), 69–71
work page 2013
-
[26]
Anne Edmundson, Brian Holtkamp, Emanuel Rivera, Matthew Finifter, Adrian Mettler, and David Wagner. 2013. An Empirical Study on the Effectiveness of Security Code Review. In International Symposium on Engineering Secure Software and Systems (ESSoS)
work page 2013
-
[27]
Manuel Egele, David Brumley, Yanick Fratantonio, and Christopher Kruegel. 2013. An empirical study of cryptographic misuse in android applications. In the 2013 ACM SIGSAC conference . ACM Press, 73–84. https://www.cs.ucsb.edu/ ~chris/research/doc/ccs13_cryptolint.pdf
work page 2013
- [28]
-
[29]
Matthew Finifter and David Wagner. 2011. Exploring the relationship betweenweb application development tools and security. In USENIX Conference on Web Application Development (WebApps)
work page 2011
-
[30]
Martin Georgiev, Subodh Iyengar, Suman Jana, Rishita Anubhai, Dan Boneh, and Vitaly Shmatikov. 2012. The most dangerous code in the world: validating SSL certificates in non-browser software. In CCS ’12: Proceedings of the 2012 ACM conference on Computer and communications security . ACM. https://doi.org/10.1145/2382196.2382204
-
[31]
git [n. d.]. Git – distributed version control management system. http://git-scm.com. ([n. d.])
-
[32]
google [n. d.]. Google Code Jam. http://code.google.com/codejam. ([n. d.])
-
[33]
Keith Harrison and Gregory White. 2010. An empirical study on the effectiveness of common security measures. In Hawaii International Conference on System Sciences (HICSS)
work page 2010
-
[34]
Hoffman, Tim Rosenberg, and Ronald Dodge
Lance J. Hoffman, Tim Rosenberg, and Ronald Dodge. 2005. Exploring a national cybersecurity exercise for universities. Security & Privacy 3, 5 (2005), 27–33
work page 2005
-
[35]
Michael Howard and David LeBlanc. 2003. Writing Secure Code. Microsoft Press
work page 2003
-
[36]
Michael Howard and Steve Lipner. 2006. The Security Development Lifecycle . Microsoft Press
work page 2006
-
[37]
icfp [n. d.]. ICFP Programming Contest. http://icfpcontest.org. ([n. d.])
-
[38]
DEF CON Communications Inc. [n. d.]. DEF CON Hacking Conference. http://www.defcon.org. ([n. d.])
-
[39]
Queena Kim. 2014. Want to learn cybersecurity? Head to Def Con. http://www.marketplace.org/2014/08/25/tech/ want-learn-cybersecurity-head-def-con. (2014)
work page 2014
-
[40]
George Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and Michael Hicks. 2018. Evaluating Fuzz Testing. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS ’18)
work page 2018
-
[41]
Gary McGraw. 2006. Software Security: Building Security In . Addison-Wesley
work page 2006
-
[42]
mdc3 [n. d.]. Maryland Cyber Challenge & Competition. http://www.fbcinc.com/e/cybermdconference/competitorinfo. aspx. ([n. d.])
-
[43]
David Molnar, Xue Cong Li, and David A. Wagner. 2009. Dynamic Test Generation to Find Integer Bugs in x86 Binary Linux Programs. In USENIX Security Symposium
work page 2009
-
[44]
N J D Nagelkerke. 1991. A note on a general definition of the coefficient of determination. Biometrika 78, 3 (09 1991), 691–692
work page 1991
-
[45]
National Collegiate Cyber Defense Competition. [n. d.]. http://www.nationalccdc.org. ([n. d.])
-
[46]
Daniela Oliveira, Marissa Rosenthal, Nicole Morin, Kuo-Chuan Yeh, Justin Cappos, and Yanyan Zhuang. 2014. It’s the Psychology Stupid: How Heuristics Explain Software Vulnerabilities and How Priming Can Illuminate Developer’s Blind Spots. In ACSAC
work page 2014
-
[47]
DeLong, Justin Cappos, and Yuriy Brun
Daniela Seabra Oliveira, Tian Lin, Muhammad Sajidur Rahman, Rad Akefirad, Donovan Ellis, Eliany Perez, Rahul Bobhate, Lois A. DeLong, Justin Cappos, and Yuriy Brun. 2018. API Blindspots: Why Experienced Developers Write Vulnerable Code. In Fourteenth Symposium on Usable Privacy and Security (SOUPS 2018)
work page 2018
-
[48]
OWASP. 2010. Secure Coding Practices - Quick Reference Guide. (2010). https://www.owasp.org/images/0/08/OWASP_ SCP_Quick_Reference_Guide_v2.pdf
work page 2010
-
[49]
James Parker, Niki Vazou, and Michael Hicks. 2019. LWeb: Information Flow Security for Multi-tier Web Applications. Proc. ACM Program. Lang. 3, POPL (Jan. 2019)
work page 2019
-
[50]
Van-Thuan Pham, Sakaar Khurana, Subhajit Roy, and Abhik Roychoudhury. 2017. Bucketing Failing Tests via Symbolic Analysis. In International Conference on Fundeamental Approaches to Software Engineering (FASE)
work page 2017
-
[51]
Polytechnic Institute of New York University. [n. d.]. CSAW - CyberSecurity Competition 2012. http://www.poly.edu/ csaw2012/csaw-CTF. ([n. d.]). ACM Transactions on Privacy and Security, Vol. 9, No. 4, Article 39. Publication date: March 2010. Build It, Break It, Fix It 39:35
work page 2012
- [52]
-
[53]
psql [n. d.]. PostgreSQL: The world’s most advanced open source database. http://www.postgresql.org. ([n. d.])
-
[54]
Andrew Ruef, Michael Hicks, James Parker, Dave Levin, Michelle L. Mazurek, and Piotr Mardziel. 2016. Build It, Break It, Fix It: Contesting Secure Development. In CCS
work page 2016
-
[55]
Andrew Ruef, Michael Hicks, James Parker, Dave Levin, Atif Memon, Jandelyn Plane, and Piotr Mardziel. 2015. Build It Break It: Measuring and Comparing Development Security. In CSET
work page 2015
-
[56]
Jerome H. Saltzer and Michael D. Schroeder. 1975. The Protection of Information in Computer Systems. Proc. IEEE 63, 9 (1975), 1278–1308
work page 1975
-
[57]
Riccardo Scandariato, James Walden, and Wouter Joosen. 2013. Static analysis versus penetration testing: A controlled experiment. In IEEE International Symposium on Reliability Engineering (ISSRE)
work page 2013
-
[58]
Robert C. Seacord. 2013. Secure Coding in C and C++ . Addison-Wesley
work page 2013
-
[59]
Deian Stefan, Alejandro Russo, John Mitchell, and David Mazieres. 2011. Flexible Dynamic Information Flow Control in Haskell. In ACM SIGPLAN Haskell Symposium
work page 2011
-
[60]
Redmiles, Michael Backes, Simson Garfinkel, Michelle L
Christian Stransky, Yasemin Acar, Duc Cuong Nguyen, Dominik Wermke, Doowon Kim, Elissa M. Redmiles, Michael Backes, Simson Garfinkel, Michelle L. Mazurek, and Sascha Fahl. 2017. Lessons Learned from Using an Online Platform to Conduct Large-Scale, Online Controlled Security Experiments with Software Developers. In 10th USENIX Workshop on Cyber Security Ex...
work page 2017
-
[61]
Positive Technologies. 2018. ATM logic attacks: scenarios, 2018. https://www.ptsecurity.com/upload/corporate/ww-en/ analytics/ATM-Vulnerabilities-2018-eng.pdf. (Nov. 2018)
work page 2018
-
[62]
Christopher Thompson and David Wagner. 2017. A Large-Scale Study of Modern Code Review and Security in Open Source Projects. In Proceedings of the 13th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE)
work page 2017
-
[63]
topcoder [n. d.]. Top Coder competitions. http://apps.topcoder.com/wiki/display/tc/Algorithm+Overview. ([n. d.])
-
[64]
Erik Trickel, Francesco Disperati, Eric Gustafson, Faezeh Kalantari, Mike Mabey, Naveen Tiwari, Yeganeh Safaei, Adam Doupé, and Giovanni Vigna. 2017. Shell We Play A Game? CTF-as-a-service for Security Education. In 2017 USENIX Workshop on Advances in Security Education (ASE 17)
work page 2017
-
[65]
Úlfar Erlingsson. 2012. personal communication stating that CFI was not deployed at Microsoft due to its overhead exceeding 10%. (2012)
work page 2012
-
[66]
Rijnard van Tonder, John Kotheimer, and Claire Le Goues. 2018. Semantic Crash Bucketing. In IEEE International Conference on Automated Software Engineering (ASE)
work page 2018
-
[67]
John Viega and Gary McGraw. 2001. Building Secure Software: How to A void Security Problems the Right Way . Addison- Wesley
work page 2001
-
[68]
James Walden, Jeff Stuckman, and Riccardo Scandariato. 2014. Predicting Vulnerable Components: Software Metrics vs Text Mining. In IEEE International Symposium on Software Reliability Engineering
work page 2014
-
[69]
Charles Weir, Awais Rashid, and James Noble. 2017. I’d Like to Have an Argument, Please: Using Dialectic for Effective App Security. In 2nd European Workshop on Usable Security (Euro USEC 2017) . Internet Society
work page 2017
-
[70]
SeongIl Wi, Jaeseung Choi, and Sang Kil Cha. 2018. Git-based CTF: A Simple and Effective Approach to Organizing In-Course Attack-and-Defense Security Competition. In 2018 USENIX Workshop on Advances in Security Education (ASE 18)
work page 2018
-
[71]
Glenn Wurster and P C van Oorschot. 2008. The developer is the enemy. In NSPW. 89
work page 2008
-
[72]
J Xie, H R Lipford, and B Chu. 2011. Why do programmers make security errors?. In 2011 IEEE Symposium on Visual Languages and Human-Centric Computing . http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6070393
work page 2011
-
[73]
Muhammad Mudassar Yamin, Basel Katt, Espen Torseth, Vasileios Gkioulos, and Stewart James Kowalski. 2018. Make It and Break It: An IoT Smart Home Testbed Case Study. InProceedings of the 2Nd International Symposium on Computer Science and Intelligent Control (ISCSIC ’18)
work page 2018
-
[74]
Joonseok Yang, Duksan Ryu, and Jongmoon Baik. 2016. Improving vulnerability prediction accuracy with Secure Coding Standard violation measures. In International Conference on Big Data and Smart Computing (BigComp)
work page 2016
-
[75]
yesodweb [n. d.]. Yesod Web Framework for Haskell. http://www.yesodweb.com. ([n. d.]). Received February 2007; revised March 2009; accepted June 2009 ACM Transactions on Privacy and Security, Vol. 9, No. 4, Article 39. Publication date: March 2010
work page 2007
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.