Build It, Break It, Fix It: Contesting Secure Development

Andrew Ruef; Daniel Votipka; Dave Levin; James Parker; Kelsey R. Fulton; Michael Hicks; Michelle L. Mazurek; Piotr Mardziel

arxiv: 1907.01679 · v1 · pith:3AOKBXGKnew · submitted 2019-07-02 · 💻 cs.CR

James Parker, Michael Hicks, Andrew Ruef, Michelle L. Mazurek, Dave Levin, Daniel Votipka, Piotr Mardziel, Kelsey R. Fulton

Pith reviewed 2026-05-25 10:34 UTC · model grok-4.3

classification 💻 cs.CR

keywords: secure software development · programming contests · security flaws · type-safe languages · C/C++ · bug finding · software security


The pith

Submissions written in statically type-safe languages were 11 times less likely to contain a security flaw than C/C++ submissions in the BIBIFI contests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Build-it, Break-it, Fix-it (BIBIFI) contest to evaluate how well teams build secure software rather than just break it. Teams build programs to meet correctness, performance, and security goals, then attempt to break other teams' submissions; winners are chosen from among the best builders and the best breakers. Analysis across three contests and 156 teams found that the most efficient submissions used C/C++, while submissions in statically type-safe languages were 11 times less likely to contain a security flaw than C/C++ submissions. Break-it teams that were also successful build-it teams were significantly better at finding security bugs.

Core claim

The BIBIFI contest format shows that language choice and team experience correlate with security outcomes: statically type-safe language submissions were 11 times less likely to contain security flaws than C/C++ submissions, C/C++ produced the most efficient builds, and break-it teams that also succeeded at build-it were significantly better at finding security bugs.

What carries the argument

The BIBIFI contest structure, in which teams build specified software and then break other submissions to expose flaws.

If this is right

Statically type-safe languages correlate with substantially lower rates of security flaws even when teams can choose any language or tools.
Teams with experience succeeding at both building secure code and breaking insecure code identify more bugs than teams specialized in one role.
C/C++ submissions can achieve higher performance but at the cost of elevated security risk compared to type-safe alternatives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Training programs that require participants to both construct and attack software could strengthen vulnerability detection skills.
The contest format offers a controlled way to measure effects of other variables like specific tools or methodologies on security outcomes.
Industry teams might reduce vulnerabilities by prioritizing type-safe languages where performance trade-offs allow.

Load-bearing premise

The three specific programming problems and contest rules produce security flaw rates and breaking performance that generalize to real-world secure development tasks outside the artificial contest constraints.

What would settle it

A replication of the contest using different programming problems that fails to show the same 11-fold difference in flaw rates by language, or a field study of real projects that finds no language-based difference in vulnerability counts.

Figures

Figures reproduced from arXiv: 1907.01679 by Andrew Ruef, Daniel Votipka, Dave Levin, James Parker, Kelsey R. Fulton, Michael Hicks, Michelle L. Mazurek, Piotr Mardziel.

**Figure 1.** Overview of BIBIFI’s implementation. Web frontend. Contestants sign up for the contest through our web application frontend, and fill out a survey when doing so, to gather demographic data potentially relevant to the contest outcome (e.g., programming experience and security training). During the contest, the web application tests build-it submissions and break-it bug reports, keeps the current scores upda…

**Figure 2.** MITM replay attack. set of atm commands is run using the oracle’s atm and bank without the MITM. This means that any messages that the MITM sends directly to the target submission’s atm or bank will not be replayed/sent to the oracle. If the oracle and target both complete the command list without error, but they differ on the outputs of one or more commands, or on the balances of accounts at the bank whos…

**Figure 3.** Grammar for the Multiuser DB command language as BNF. Here, …

**Figure 4.** The number of build-it submissions in each contest, organized by primary programming language

**Figure 5.** Each team’s ship score, compared to the lines of code in its implementation and organized by language

**Figure 6.** Final resilience scores, ordered by team, and plotted for each contest problem. Build-it teams who did …

**Figure 7.** The fraction of teams in whose submission a security bug was found, by contest and language category.

**Figure 8.** Scores of break-it teams prior to the fix-it phase, broken down by points from security and correctness

**Figure 9.** Count of security bugs found by each break-it team, organized by contest and whether the team …
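
The oracle comparison described in the Figure 2 caption can be sketched as a small differential check. This is a minimal illustration, not the contest's infrastructure: the function name `compare_runs`, the representation of command outputs as lists of strings, and the representation of balances as dictionaries are all assumptions made here. Because the caption is cut off before it states what follows from a mismatch, the sketch only reports the differences.

```python
from typing import Dict, List, Tuple


def compare_runs(
    oracle_outputs: List[str],
    target_outputs: List[str],
    oracle_balances: Dict[str, int],
    target_balances: Dict[str, int],
) -> Tuple[List[int], List[str]]:
    """Hypothetical helper: report where a target run diverges from the oracle run."""
    # The caption assumes both sides completed the same command list,
    # so unequal lengths indicate a malformed comparison.
    if len(oracle_outputs) != len(target_outputs):
        raise ValueError("both runs must cover the same command list")

    # Indices of commands whose outputs differ between oracle and target.
    differing_commands = [
        i for i, (o, t) in enumerate(zip(oracle_outputs, target_outputs)) if o != t
    ]
    # Accounts whose final balances differ (or exist on only one side).
    accounts = sorted(set(oracle_balances) | set(target_balances))
    differing_accounts = [
        a for a in accounts if oracle_balances.get(a) != target_balances.get(a)
    ]
    return differing_commands, differing_accounts


# Illustrative values only; not taken from the paper.
commands_diff, accounts_diff = compare_runs(
    ["ok", "balance 10"], ["ok", "balance 15"], {"alice": 10}, {"alice": 15}
)
print(commands_diff, accounts_diff)
```

Keeping the comparison separate from whatever drives the `atm` and `bank` programs means the same check works regardless of how the two runs were produced.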

Original abstract

Typical security contests focus on breaking or mitigating the impact of buggy systems. We present the Build-it, Break-it, Fix-it (BIBIFI) contest, which aims to assess the ability to securely build software, not just break it. In BIBIFI, teams build specified software with the goal of maximizing correctness, performance, and security. The latter is tested when teams attempt to break other teams' submissions. Winners are chosen from among the best builders and the best breakers. BIBIFI was designed to be open-ended; teams can use any language, tool, process, etc. that they like. As such, contest outcomes shed light on factors that correlate with successfully building secure software and breaking insecure software. We ran three contests involving a total of 156 teams and three different programming problems. Quantitative analysis from these contests found that the most efficient build-it submissions used C/C++, but submissions coded in a statically-type safe language were 11 times less likely to have a security flaw than C/C++ submissions. Break-it teams that were also successful build-it teams were significantly better at finding security bugs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

BIBIFI supplies contest numbers linking safe languages to fewer flaws and builder experience to better breaking, but the artificial rules limit how far those numbers travel.


The main takeaway is that this paper reports that submissions in statically type-safe languages were 11 times less likely to contain a security flaw than C/C++ submissions in their BIBIFI contest, plus better bug-finding by teams that succeeded at both building and breaking. The contest format itself is the new piece: teams build for correctness, performance, and security on three problems, then break each other's code, with any language allowed. Across 156 teams they get quantitative correlations that earlier break-only contests did not produce in this way. That dual-role finding is a concrete observation worth noting. The work does a reasonable job of generating data on factors that line up with secure outcomes under their rules.

The soft spot is external validity. The problems are small and fully specified, breaking happens under time limits, participants self-select and choose languages freely, and the abstract gives no details on flaw classification, statistical controls, or team experience matching. If memory-safety issues are simply easier to trigger quickly in this format, the 11x multiplier may not reflect inherent language properties in larger or less constrained code. The stress-test note lands on the actual weakness here.

This is for researchers tracking empirical work on secure development practices or language effects. A reader already interested in contest designs or builder-breaker overlap would extract some usable observations, though they would need the full methods to assess the numbers.

It deserves peer review. The scale of the experiment and the topic make referee input worthwhile even if the claims require more support on generalizability and controls.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces the Build-it, Break-it, Fix-it (BIBIFI) contest to evaluate secure software development through teams building specified software and then attempting to break others' submissions. Across three contests with 156 teams and three programming problems, the authors report that C/C++ submissions were the most efficient but that code in statically type-safe languages was 11 times less likely to contain security flaws; additionally, teams successful at both building and breaking performed significantly better at discovering security bugs.

Significance. If the quantitative claims hold after methodological clarification, the work provides empirical evidence on language choice and dual build/break experience as correlates of secure development outcomes. The open-ended contest format, permitting arbitrary languages, tools, and processes, is a strength that distinguishes it from more constrained studies and enables observation of real correlations in a semi-controlled setting with a sizable participant pool.

major comments (2)

[Abstract] The central claim that 'submissions coded in a statically-type safe language were 11 times less likely to have a security flaw than C/C++ submissions' is presented with no accompanying information on security flaw classification criteria, inter-rater reliability, the statistical model or controls used to compute the ratio, or any uncertainty estimates, which are required to assess the robustness of this load-bearing quantitative result.
[Results section] No analysis or discussion addresses potential confounds such as self-selection bias, correlation between language choice and prior team experience, or differential problem fit, any of which could produce the observed 11x difference without reflecting inherent language security properties.
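
To make concrete what an "uncertainty estimate" for such a multiplier could look like, here is a minimal sketch that computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table. It rests on several assumptions: the abstract does not say how the 11x figure was computed, the function name `odds_ratio_with_ci` is invented for this illustration, the counts in the usage example are made up and are not the paper's data, and the calculation includes none of the controls the comment asks about.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(
    flawed_a: int, clean_a: int, flawed_b: int, clean_b: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio of group A vs. group B with a Wald interval.

    Hypothetical helper: each argument is a count of submissions in one
    cell of a 2x2 table (group x whether a security flaw was found).
    """
    if min(flawed_a, clean_a, flawed_b, clean_b) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")

    odds_ratio = (flawed_a * clean_b) / (clean_a * flawed_b)
    # Standard error of log(odds ratio) for a 2x2 table.
    std_err = math.sqrt(1 / flawed_a + 1 / clean_a + 1 / flawed_b + 1 / clean_b)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * std_err)
    upper = math.exp(log_or + z * std_err)
    return odds_ratio, lower, upper


# Made-up counts for illustration only (NOT the contest data):
# group A: 10 flawed, 10 clean; group B: 2 flawed, 18 clean.
ratio, low, high = odds_ratio_with_ci(10, 10, 2, 18)
print(f"odds ratio = {ratio:.1f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With small cell counts the interval is wide, which is why a point estimate alone says little about robustness. A regression with covariates would be needed to address the confounds raised in the second comment.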

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We address each major comment below. We will revise the manuscript to provide greater methodological transparency in the abstract and to add explicit discussion of potential confounds.


Referee: [Abstract] The central claim that 'submissions coded in a statically-type safe language were 11 times less likely to have a security flaw than C/C++ submissions' is presented with no accompanying information on security flaw classification criteria, inter-rater reliability, the statistical model or controls used to compute the ratio, or any uncertainty estimates, which are required to assess the robustness of this load-bearing quantitative result.

Authors: The abstract is space-constrained, but the full manuscript details the flaw classification criteria (based on CWE categories for memory safety, injection, and access control issues), reports inter-rater agreement via Cohen's kappa in the 'Security Flaw Classification' subsection, describes the negative binomial regression model with controls for team experience and problem, and supplies 95% confidence intervals for the incidence rate ratio. We will revise the abstract to include a short parenthetical note on the model and uncertainty to improve accessibility without exceeding length limits.
revision: yes

Referee: [Results section] No analysis or discussion addresses potential confounds such as self-selection bias, correlation between language choice and prior team experience, or differential problem fit, any of which could produce the observed 11x difference without reflecting inherent language security properties.

Authors: We agree that the results section would benefit from explicit treatment of these issues. The regression already includes controls for self-reported prior experience and problem type, but we did not dedicate space to self-selection or differential problem fit. We will add a dedicated limitations paragraph in the Discussion acknowledging these confounds, noting that the contest's open-ended design and pre-contest surveys provide partial mitigation, while recognizing that observational data cannot fully eliminate them.
revision: yes

Circularity Check

0 steps flagged

No circularity: empirical observations from contest data stand alone


The paper reports direct quantitative findings from three BIBIFI contests (156 teams, three problems): C/C++ submissions were most efficient, submissions in statically type-safe languages were 11 times less likely to contain a security flaw, and teams that succeeded at both building and breaking performed better at bug finding. These are presented as observed correlations in the collected submissions and break attempts, with no equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations. The central multiplier is a straightforward ratio computed from flaw counts in the contest data, not derived from prior author work or ansatzes. Generalization concerns exist but are external-validity issues, not circularity. The derivation chain is self-contained as raw empirical reporting.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central quantitative claims rest on the untested premise that contest outcomes validly proxy real-world secure development; the three problems are treated as representative without independent justification.

axioms (2)

domain assumption: The three programming problems used in the contests are representative of typical secure development challenges.
All reported correlations derive from performance on these specific problems.

domain assumption: Security flaws discovered in the break-it phase accurately reflect the security properties of the built submissions.
The contest equates breaking success with security measurement.

pith-pipeline@v0.9.0 · 5746 in / 1143 out tokens · 49710 ms · 2026-05-25T10:34:51.898958+00:00 · methodology



[1] [1]

Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. 2009. Control-flow integrity principles, implementations, and applications. ACM Transactions on Information and System Security (TISSEC) 13, 1 (2009), 4:1–4:40

work page 2009

[2] [2]

Y. Acar, M. Backes, S. Fahl, S. Garfinkel, D. Kim, M. L. Mazurek, and C. Stransky. 2017. Comparing the Usability of Cryptographic APIs. In 2017 IEEE Symposium on Security and Privacy (SP)

work page 2017

[3] [3]

Mazurek, and Sascha Fahl

Yasemin Acar, Christian Stransky, Dominik Wermke, Michelle L. Mazurek, and Sascha Fahl. 2017. Security Developer Studies with GitHub Users: Exploring a Convenience Sample. In Thirteenth Symposium on Usable Privacy and Security (SOUPS 2017)

work page 2017

[4] [4]

Mazurek, and Sascha Fahl

Yasemin Acar, Christian Stransky, Dominik Wermke, Charles Weir, Michelle L. Mazurek, and Sascha Fahl. 2017. Developers Need Support Too: A Survey of Security Advice for Software Developers. In IEEE Secure Development Conference (SecDev 2017)

work page 2017

[5] [5]

acm [n. d.]. The ACM-ICPC International Collegiate Programming Contest. http://icpc.baylor.edu. ([n. d.])

work page

[6] [6]

American Fuzzing Lop (AFL)

AFL 2018. American Fuzzing Lop (AFL). http://lcamtuf.coredump.cx/afl/. (2018)

work page 2018

[7] Daniele Antonioli, Hamid Reza Ghaeini, Sridhar Adepu, Martin Ochoa, and Nils Ole Tippenhauer. 2017. Gamifying ICS Security Training and Research: Design, Implementation, and Results of S3. In Proceedings of the 2017 Workshop on Cyber-Physical Systems Security and PrivaCy (CPS ’17).

[8] Ingolf Becker, Simon Parkin, and M. Angela Sasse. 2017. Finding Security Champions in Blends of Security Culture. In 2nd European Workshop on Usable Security (EuroUSEC 2017). Internet Society.

[9] Daniel J Bernstein, Tanja Lange, and Peter Schwabe. 2012. The security impact of a new cryptographic library. In International Conference on Cryptology and Information Security in Latin America. Springer, 159–176.

[10] Paul E. Black, Lee Badger, Barbara Guttman, and Elizabeth Fong. 2016. Dramatically Reducing Software Vulnerabilities: Report to the White House Office of Science and Technology Policy. Technical Report Draft NISTIR 8151. National Institute of Standards and Technology. http://csrc.nist.gov/publications/drafts/nistir-8151/nistir8151_draft.pdf

[11] Kevin Bock, George Hughey, and Dave Levin. 2018. King of the Hill: A Novel Cybersecurity Competition for Teaching Penetration Testing. In 2018 USENIX Workshop on Advances in Security Education (ASE 18).

[12] bsimm [n. d.]. Building Security In Maturity Model (BSIMM). http://bsimm.com. ([n. d.]).

[13] Kenneth P Burnham, David R Anderson, and Kathryn P Huyvaert. 2011. AIC model selection and multimodel inference in behavioral ecology: some background, observations, and comparisons. Behavioral Ecology and Sociobiology 65, 1 (2011), 23–35.

[14] Peter Chapman, Jonathan Burket, and David Brumley. 2014. PicoCTF: A game-based computer security competition for high school students. In 2014 USENIX Summit on Gaming, Games, and Gamification in Security Education (3GSE 14).

[15] Brian Chess and Jacob West. 2007. Secure Programming with Static Analysis. Addison-Wesley.

[16] Nicholas Childers, Bryce Boe, Lorenzo Cavallaro, Ludovico Cavedon, Marco Cova, Manuel Egele, and Giovanni Vigna. Organizing Large Scale Hacking Competitions. In DIMVA.

[18] Jacob Cohen. 1988. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates.

[19] Art Conklin. 2005. The Use of a Collegiate Cyber Defense Competition in Information Security Education. In InfoSecCD.

[20] Art Conklin. 2006. Cyber defense competitions and information security education: An active learning solution for a capstone course. In HICSS.

[21] Gregory Conti, Thomas Babbitt, and John Nelson. 2011. Hacking competitions and their untapped potential for security education. Security & Privacy 9, 3 (2011), 56–59.

[22] DEF CON Communications, Inc. [n. d.]. Capture the Flag Archive. https://www.defcon.org/html/links/dc-ctf.html. ([n. d.]).

[23] Adam Doupé, Manuel Egele, Benjamin Caillat, Gianluca Stringhini, Gorkem Yakin, Ali Zand, Ludovico Cavedon, and Giovanni Vigna. 2011. Hit ’Em Where It Hurts: A Live Security Exercise on Cyber Situational Awareness. In ACSAC.

[24] dragostech.com inc. [n. d.]. CanSecWest Applied Security Conference. http://cansecwest.com. ([n. d.]).

[25] Chris Eagle. 2013. Computer security competitions: Expanding educational outcomes. Security & Privacy 11, 4 (2013), 69–71.

[26] Anne Edmundson, Brian Holtkamp, Emanuel Rivera, Matthew Finifter, Adrian Mettler, and David Wagner. 2013. An Empirical Study on the Effectiveness of Security Code Review. In International Symposium on Engineering Secure Software and Systems (ESSoS).

[27] Manuel Egele, David Brumley, Yanick Fratantonio, and Christopher Kruegel. 2013. An empirical study of cryptographic misuse in android applications. In the 2013 ACM SIGSAC conference. ACM Press, 73–84. https://www.cs.ucsb.edu/~chris/research/doc/ccs13_cryptolint.pdf

[28] Sascha Fahl, Marian Harbach, Henning Perl, Markus Koetter, and Matthew Smith. 2013. Rethinking SSL development in an appified world. In Proc. ACM CCS. http://dl.acm.org/citation.cfm?doid=2508859.2516655

[29] Matthew Finifter and David Wagner. 2011. Exploring the relationship between web application development tools and security. In USENIX Conference on Web Application Development (WebApps).

[30] Martin Georgiev, Subodh Iyengar, Suman Jana, Rishita Anubhai, Dan Boneh, and Vitaly Shmatikov. 2012. The most dangerous code in the world: validating SSL certificates in non-browser software. In CCS ’12: Proceedings of the 2012 ACM conference on Computer and communications security. ACM. https://doi.org/10.1145/2382196.2382204

[31] git [n. d.]. Git – distributed version control management system. http://git-scm.com. ([n. d.]).

[32] google [n. d.]. Google Code Jam. http://code.google.com/codejam. ([n. d.]).

[33] Keith Harrison and Gregory White. 2010. An empirical study on the effectiveness of common security measures. In Hawaii International Conference on System Sciences (HICSS).

[34] Lance J. Hoffman, Tim Rosenberg, and Ronald Dodge. 2005. Exploring a national cybersecurity exercise for universities. Security & Privacy 3, 5 (2005), 27–33.

[35] Michael Howard and David LeBlanc. 2003. Writing Secure Code. Microsoft Press.

[36] Michael Howard and Steve Lipner. 2006. The Security Development Lifecycle. Microsoft Press.

[37] icfp [n. d.]. ICFP Programming Contest. http://icfpcontest.org. ([n. d.]).

[38] DEF CON Communications Inc. [n. d.]. DEF CON Hacking Conference. http://www.defcon.org. ([n. d.]).

[39] Queena Kim. 2014. Want to learn cybersecurity? Head to Def Con. http://www.marketplace.org/2014/08/25/tech/want-learn-cybersecurity-head-def-con. (2014).

[40] George Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and Michael Hicks. 2018. Evaluating Fuzz Testing. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS ’18).

[41] Gary McGraw. 2006. Software Security: Building Security In. Addison-Wesley.

[42] mdc3 [n. d.]. Maryland Cyber Challenge & Competition. http://www.fbcinc.com/e/cybermdconference/competitorinfo.aspx. ([n. d.]).

[43] David Molnar, Xue Cong Li, and David A. Wagner. 2009. Dynamic Test Generation to Find Integer Bugs in x86 Binary Linux Programs. In USENIX Security Symposium.

[44] N J D Nagelkerke. 1991. A note on a general definition of the coefficient of determination. Biometrika 78, 3 (09 1991), 691–692.

[45] National Collegiate Cyber Defense Competition. [n. d.]. http://www.nationalccdc.org. ([n. d.]).

[46] Daniela Oliveira, Marissa Rosenthal, Nicole Morin, Kuo-Chuan Yeh, Justin Cappos, and Yanyan Zhuang. 2014. It’s the Psychology Stupid: How Heuristics Explain Software Vulnerabilities and How Priming Can Illuminate Developer’s Blind Spots. In ACSAC.

[47] Daniela Seabra Oliveira, Tian Lin, Muhammad Sajidur Rahman, Rad Akefirad, Donovan Ellis, Eliany Perez, Rahul Bobhate, Lois A. DeLong, Justin Cappos, and Yuriy Brun. 2018. API Blindspots: Why Experienced Developers Write Vulnerable Code. In Fourteenth Symposium on Usable Privacy and Security (SOUPS 2018).

[48] OWASP. 2010. Secure Coding Practices - Quick Reference Guide. (2010). https://www.owasp.org/images/0/08/OWASP_SCP_Quick_Reference_Guide_v2.pdf

[49] James Parker, Niki Vazou, and Michael Hicks. 2019. LWeb: Information Flow Security for Multi-tier Web Applications. Proc. ACM Program. Lang. 3, POPL (Jan. 2019).

[50] Van-Thuan Pham, Sakaar Khurana, Subhajit Roy, and Abhik Roychoudhury. 2017. Bucketing Failing Tests via Symbolic Analysis. In International Conference on Fundamental Approaches to Software Engineering (FASE).

[51] Polytechnic Institute of New York University. [n. d.]. CSAW - CyberSecurity Competition 2012. http://www.poly.edu/csaw2012/csaw-CTF. ([n. d.]).

[52] L. Prechelt. 2011. Plat_Forms: A web development platform comparison by an exploratory experiment searching for emergent platform properties. IEEE Transactions on Software Engineering 37, 1 (2011), 95–108.

[53] psql [n. d.]. PostgreSQL: The world’s most advanced open source database. http://www.postgresql.org. ([n. d.]).

[54] Andrew Ruef, Michael Hicks, James Parker, Dave Levin, Michelle L. Mazurek, and Piotr Mardziel. 2016. Build It, Break It, Fix It: Contesting Secure Development. In CCS.

[55] Andrew Ruef, Michael Hicks, James Parker, Dave Levin, Atif Memon, Jandelyn Plane, and Piotr Mardziel. 2015. Build It Break It: Measuring and Comparing Development Security. In CSET.

[56] Jerome H. Saltzer and Michael D. Schroeder. 1975. The Protection of Information in Computer Systems. Proc. IEEE 63, 9 (1975), 1278–1308.

[57] Riccardo Scandariato, James Walden, and Wouter Joosen. 2013. Static analysis versus penetration testing: A controlled experiment. In IEEE International Symposium on Reliability Engineering (ISSRE).

[58] Robert C. Seacord. 2013. Secure Coding in C and C++. Addison-Wesley.

[59] Deian Stefan, Alejandro Russo, John Mitchell, and David Mazieres. 2011. Flexible Dynamic Information Flow Control in Haskell. In ACM SIGPLAN Haskell Symposium.

[60] Christian Stransky, Yasemin Acar, Duc Cuong Nguyen, Dominik Wermke, Doowon Kim, Elissa M. Redmiles, Michael Backes, Simson Garfinkel, Michelle L. Mazurek, and Sascha Fahl. 2017. Lessons Learned from Using an Online Platform to Conduct Large-Scale, Online Controlled Security Experiments with Software Developers. In 10th USENIX Workshop on Cyber Security Ex...

[61] Positive Technologies. 2018. ATM logic attacks: scenarios, 2018. https://www.ptsecurity.com/upload/corporate/ww-en/analytics/ATM-Vulnerabilities-2018-eng.pdf. (Nov. 2018).

[62] Christopher Thompson and David Wagner. 2017. A Large-Scale Study of Modern Code Review and Security in Open Source Projects. In Proceedings of the 13th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE).

[63] topcoder [n. d.]. Top Coder competitions. http://apps.topcoder.com/wiki/display/tc/Algorithm+Overview. ([n. d.]).

[64] Erik Trickel, Francesco Disperati, Eric Gustafson, Faezeh Kalantari, Mike Mabey, Naveen Tiwari, Yeganeh Safaei, Adam Doupé, and Giovanni Vigna. 2017. Shell We Play A Game? CTF-as-a-service for Security Education. In 2017 USENIX Workshop on Advances in Security Education (ASE 17).

[65] Úlfar Erlingsson. 2012. Personal communication stating that CFI was not deployed at Microsoft due to its overhead exceeding 10%. (2012).

[66] Rijnard van Tonder, John Kotheimer, and Claire Le Goues. 2018. Semantic Crash Bucketing. In IEEE International Conference on Automated Software Engineering (ASE).

[67] John Viega and Gary McGraw. 2001. Building Secure Software: How to Avoid Security Problems the Right Way. Addison-Wesley.

[68] James Walden, Jeff Stuckman, and Riccardo Scandariato. 2014. Predicting Vulnerable Components: Software Metrics vs Text Mining. In IEEE International Symposium on Software Reliability Engineering.

[69] Charles Weir, Awais Rashid, and James Noble. 2017. I’d Like to Have an Argument, Please: Using Dialectic for Effective App Security. In 2nd European Workshop on Usable Security (EuroUSEC 2017). Internet Society.

[70] SeongIl Wi, Jaeseung Choi, and Sang Kil Cha. 2018. Git-based CTF: A Simple and Effective Approach to Organizing In-Course Attack-and-Defense Security Competition. In 2018 USENIX Workshop on Advances in Security Education (ASE 18).

[71] Glenn Wurster and P C van Oorschot. 2008. The developer is the enemy. In NSPW. 89.

[72] J Xie, H R Lipford, and B Chu. 2011. Why do programmers make security errors? In 2011 IEEE Symposium on Visual Languages and Human-Centric Computing. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6070393

[73] Muhammad Mudassar Yamin, Basel Katt, Espen Torseth, Vasileios Gkioulos, and Stewart James Kowalski. 2018. Make It and Break It: An IoT Smart Home Testbed Case Study. In Proceedings of the 2nd International Symposium on Computer Science and Intelligent Control (ISCSIC ’18).

[74] Joonseok Yang, Duksan Ryu, and Jongmoon Baik. 2016. Improving vulnerability prediction accuracy with Secure Coding Standard violation measures. In International Conference on Big Data and Smart Computing (BigComp).

[75] yesodweb [n. d.]. Yesod Web Framework for Haskell. http://www.yesodweb.com. ([n. d.]).