One Size Fits All? An Empirical Comparison of ADR Templates regarding Comprehension, Usability, and Ease of Adoption
Pith reviewed 2026-05-07 07:56 UTC · model grok-4.3
The pith
Nygard's ADR template outperforms MADR on the Overall Score in a controlled experiment on the two templates that survived an expert screening of five.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
After expert DESMET Feature Analysis selected Nygard and MADR as the top two templates among Tyree/Akerman, Nygard, arc42, Y-statements, and MADR, a controlled experiment with undergraduate students showed Nygard's template achieving a higher Overall Score. Qualitative feedback indicated that Nygard encourages concise and objective documentation while MADR better supports structural details and specific architectural requirements. The authors conclude that these differences provide a practical decision aid for selecting ADR templates that reduce overhead and improve architectural knowledge retention.
What carries the argument
A two-step empirical process: expert DESMET Feature Analysis to rank five ADR templates, followed by a controlled experiment measuring comprehension, usability, and ease of adoption on the top two (Nygard and MADR).
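As a rough illustration of the screening step, a feature-analysis ranking can be sketched as a weighted sum of per-criterion scores. The criterion names, weights, scores, and the 0 to 5 scale below are hypothetical placeholders, not values from the paper, and the weighted-sum aggregation is one common way to apply a feature analysis rather than the authors' confirmed procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical scale: each criterion is scored from 0 to MAX_SCORE.
MAX_SCORE = 5


def rank_templates(
    weights: Dict[str, float],
    scores: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Return (template, weighted percentage score) pairs, best first."""
    max_total = sum(weights.values()) * MAX_SCORE
    ranked = []
    for template, per_criterion in scores.items():
        # Weighted sum over every criterion that has a weight.
        total = sum(weights[c] * per_criterion[c] for c in weights)
        ranked.append((template, 100.0 * total / max_total))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up inputs for three anonymous candidates (not the paper's data).
example_weights = {"comprehension": 3, "usability": 2, "ease_of_adoption": 2}
example_scores = {
    "template_a": {"comprehension": 4, "usability": 5, "ease_of_adoption": 5},
    "template_b": {"comprehension": 4, "usability": 3, "ease_of_adoption": 4},
    "template_c": {"comprehension": 3, "usability": 2, "ease_of_adoption": 2},
}

for name, pct in rank_templates(example_weights, example_scores):
    print(f"{name}: {pct:.1f}%")
```

In a screening like the one described, the top two entries of such a ranking would move on to the controlled experiment.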
If this is right
- Teams needing quick, objective records can adopt Nygard's template to reduce documentation time.
- Projects requiring explicit structural and requirement details can choose MADR, accepting its lower Overall Score in exchange for that structure.
- The comparison supplies a concrete decision guide that aligns template choice with project constraints to limit overhead.
- Adopting the higher-scoring template should improve retention of architectural knowledge across team changes.
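To make the conciseness point concrete, here is a minimal sketch of a record in the style of Nygard's template, which uses the sections Title, Status, Context, Decision, and Consequences. The `NygardAdr` class, its field names, and the example decision are hypothetical and written only for illustration; they are not taken from the paper or from any ADR tool.

```python
from dataclasses import dataclass


@dataclass
class NygardAdr:
    """One decision record with the five sections of Nygard's template."""

    number: int
    title: str
    status: str
    context: str
    decision: str
    consequences: str

    def to_markdown(self) -> str:
        """Render the record as a short Markdown document."""
        sections = [
            f"# {self.number}. {self.title}",
            f"## Status\n\n{self.status}",
            f"## Context\n\n{self.context}",
            f"## Decision\n\n{self.decision}",
            f"## Consequences\n\n{self.consequences}",
        ]
        return "\n\n".join(sections) + "\n"


# Hypothetical example content.
example = NygardAdr(
    number=1,
    title="Use PostgreSQL for order data",
    status="Accepted",
    context="Orders need transactional updates across several tables.",
    decision="We will store order data in PostgreSQL.",
    consequences="The team must operate and back up a relational database.",
)

print(example.to_markdown())
```

MADR, by comparison, adds further sections such as decision drivers and considered options, which matches the feedback that it supports more structural detail at the cost of a longer record.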
Where Pith is reading between the lines
- Repeating the tasks with industry professionals on evolving systems could reveal whether the preference for conciseness holds when long-term maintenance is the main use case.
- Embedding the preferred template into common issue trackers or wikis might increase adoption rates beyond what voluntary use achieves.
- The same evaluation approach could be applied to newer or domain-specific templates to keep the decision guide current.
Load-bearing premise
Undergraduate students completing a lab task accurately represent how professional developers would comprehend, use, and prefer these templates during real project maintenance.
What would settle it
A replication study using professional developers on live codebases that measures the same overall score and finds MADR preferred or equal to Nygard.
Original abstract
Context: Documenting Architectural Design Decisions (ADDs) is a critical factor in the software lifecycle, essential for efficient system maintenance, developer onboarding, and preventing knowledge vaporization. Although various templates for Architectural Decision Records (ADRs) have been proposed, there is a lack of empirical evidence comparing them. Goal: To address this gap, this paper aims to identify which ADR template best supports comprehension, usability, and ease of adoption: Tyree/Akerman's template, Nygard's ADR, arc42, Y-statements, and MADR. Method: We compared these templates using the DESMET FA method in a two-step evaluation. First, the two primary authors evaluated the five templates through the DESMET FA, based on their software architecture expertise. The two top-performing templates were then used as treatments in a controlled experiment conducted with undergraduate students. Results: In the preliminary screening by experts, the top-performing templates were those of Nygard and MADR. In the subsequent controlled experiment, Nygard's template outperformed MADR in terms of the Overall Score. Qualitative analysis of participant feedback revealed the factors influencing template preference. The findings indicate that Nygard supports concise and objective documentation, while MADR facilitates structural details and specific architectural requirements. Conclusion: This paper provides an evidence-based strategy for ADR template adoption by offering a comparison between them. The findings present a decision-making guide that assists practitioners and researchers in selecting ADR templates aligned with project constraints, aiming to minimize documentation overhead and increase architectural knowledge retention.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a two-phase empirical comparison of five ADR templates (Tyree/Akerman, Nygard, arc42, Y-statements, MADR) using the DESMET Feature Analysis method. The authors first perform an expert screening, identifying Nygard and MADR as top performers, then conduct a controlled experiment with undergraduate students comparing these two, finding that Nygard's template yields a higher overall score in terms of comprehension, usability, and ease of adoption. Qualitative feedback suggests Nygard supports concise documentation while MADR aids structural details. The paper concludes with a decision-making guide for practitioners selecting ADR templates.
Significance. If the findings hold after addressing methodological gaps, this work supplies useful empirical evidence on ADR template effectiveness, filling a gap in software architecture documentation research. The two-phase DESMET approach combined with qualitative analysis of participant feedback is a methodological strength that could support practitioner guidance. However, the exclusive use of undergraduate students in a lab setting substantially limits the significance for the stated goal of aiding real-world professional projects, as no validation or sensitivity analysis bridges this proxy gap.
major comments (3)
- [Section 3] Section 3 (Expert Screening): The DESMET FA screening was conducted solely by the two primary authors. This self-evaluation introduces subjectivity and potential bias in selecting Nygard and MADR for the experiment; the selection step is load-bearing because it determines which templates receive the decisive empirical test.
- [Section 4] Section 4 (Controlled Experiment): The description supplies no sample size, no statistical tests or effect sizes for the Overall Score comparison, no details on the comprehension tasks or usability instruments, and no controls for prior knowledge or order effects. These omissions make it impossible to evaluate the reliability of the central claim that Nygard outperformed MADR.
- [Section 5] Section 5 (Discussion/Conclusion): The manuscript claims to deliver an 'evidence-based strategy for ADR template adoption' for practitioners, yet contains no limitations discussion or analysis of whether undergraduate lab results generalize to professional developers facing maintenance responsibilities and time pressures. This proxy assumption is load-bearing for any practitioner-facing recommendation.
minor comments (2)
- [Abstract] Abstract: The term 'Overall Score' is used without definition or reference to its construction; a brief parenthetical or cross-reference to the metrics in Section 4 would improve clarity.
- [Tables/Figures] Figures/Tables: Any tables reporting scores or qualitative themes should include confidence intervals or inter-rater agreement metrics where applicable to strengthen the quantitative presentation.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, indicating the revisions we plan to make. Our responses focus on strengthening the methodological transparency and acknowledging limitations without misrepresenting the study design or findings.
- Referee: [Section 3] Section 3 (Expert Screening): The DESMET FA screening was conducted solely by the two primary authors. This self-evaluation introduces subjectivity and potential bias in selecting Nygard and MADR for the experiment; the selection step is load-bearing because it determines which templates receive the decisive empirical test.
  Authors: We acknowledge that the DESMET Feature Analysis screening was performed exclusively by the two primary authors. This approach follows the standard application of DESMET FA, which relies on domain-expert judgment for initial feature evaluation, and the authors have relevant expertise in software architecture documentation. However, we agree that greater transparency is warranted given the load-bearing nature of the selection. In the revised manuscript, we will expand Section 3 with a detailed table or breakdown showing the scores assigned to each template on every DESMET criterion. We will also add an explicit discussion of potential subjectivity and author bias to the limitations section, noting that independent expert validation of the screening could be pursued in follow-up studies. We do not plan to re-execute the screening with additional raters for this revision, as the original criteria application was consistent and documented.
  Revision: partial
- Referee: [Section 4] Section 4 (Controlled Experiment): The description supplies no sample size, no statistical tests or effect sizes for the Overall Score comparison, no details on the comprehension tasks or usability instruments, and no controls for prior knowledge or order effects. These omissions make it impossible to evaluate the reliability of the central claim that Nygard outperformed MADR.
  Authors: We agree that the current text of Section 4 omits essential methodological details required to assess the reliability of the results. We will revise Section 4 to include the exact sample size, the specific statistical tests and effect sizes used for the Overall Score comparison, complete descriptions of the comprehension tasks and usability instruments, and the measures taken to control for prior knowledge and order effects. These additions will allow readers to fully evaluate the central claim that Nygard's template outperformed MADR on the measured dimensions.
  Revision: yes
- Referee: [Section 5] Section 5 (Discussion/Conclusion): The manuscript claims to deliver an 'evidence-based strategy for ADR template adoption' for practitioners, yet contains no limitations discussion or analysis of whether undergraduate lab results generalize to professional developers facing maintenance responsibilities and time pressures. This proxy assumption is load-bearing for any practitioner-facing recommendation.
  Authors: We concur that the manuscript would be strengthened by an explicit limitations discussion addressing the generalizability of undergraduate lab results to professional settings. In the revised version, we will add a dedicated limitations subsection that directly discusses the use of students as proxies, the controlled lab environment versus real-world time pressures and maintenance responsibilities, and the absence of sensitivity analysis or industry validation. We will also moderate the language in the conclusion and decision-making guide to present the findings as initial empirical evidence rather than a fully validated practitioner strategy, while retaining the guide as a synthesis of the observed differences in comprehension, usability, and adoption factors. This revision will qualify the claims appropriately without altering the reported results or the two-phase DESMET approach.
  Revision: yes
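The effect-size reporting requested for Section 4 could take many forms; this review does not say which tests the paper actually used. As one hedged possibility, the sketch below computes the Vargha-Delaney A statistic, a nonparametric effect size often used in software engineering experiments. The function name and the score lists are hypothetical and are not data from the study.

```python
from typing import Sequence


def vargha_delaney_a(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Estimate the probability that a random treatment score exceeds a
    random control score, counting ties as half a win.

    0.5 means no difference; values near 1.0 favor the treatment group
    and values near 0.0 favor the control group.
    """
    if not treatment or not control:
        raise ValueError("both groups need at least one observation")
    wins = 0.0
    for x in treatment:
        for y in control:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(treatment) * len(control))


# Made-up Overall Score samples for two groups (illustration only).
group_one = [3.0, 4.0, 5.0]
group_two = [1.0, 2.0, 3.0]
print(f"A = {vargha_delaney_a(group_one, group_two):.3f}")
```

Reporting a statistic like this alongside the sample size and the significance test would let readers judge how large the difference between the two templates is, not only whether one exists.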
Circularity Check
No circularity: direct empirical comparison without derivations or self-referential reductions
The paper performs a two-stage empirical evaluation: DESMET FA screening by the two primary authors followed by a controlled experiment measuring student comprehension, usability, and preference scores for the top two templates. No equations, fitted parameters, predictions, or first-principles derivations appear. Results are reported as direct measurements and qualitative feedback from participants. No self-citations form a load-bearing chain, and the templates evaluated originate from prior independent work. The derivation chain is therefore self-contained against the experiment data; the undergraduate proxy assumption affects external validity but does not create circularity by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Undergraduate students can serve as reasonable proxies for evaluating the comprehension and usability of ADR templates by professional developers.
- domain assumption: The DESMET Feature Analysis method provides a reliable preliminary screening of software engineering templates before human-subject evaluation.