arxiv: 2604.27333 · v1 · submitted 2026-04-30 · 💻 cs.SE

One Size Fits All? An Empirical Comparison of ADR Templates regarding Comprehension, Usability, and Ease of Adoption

Pith reviewed 2026-05-07 07:56 UTC · model grok-4.3

classification 💻 cs.SE
keywords Architectural Decision Records · ADR templates · Software architecture documentation · Empirical comparison · Nygard ADR · MADR · DESMET Feature Analysis · Controlled experiment

The pith

Nygard's ADR template outperforms MADR on overall comprehension, usability, and adoption scores in a controlled comparison of five options.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Documenting the reasons behind architectural choices prevents knowledge loss during maintenance and onboarding, yet different templates for recording those decisions vary in how easy they are to read and follow. This study first screened five templates with expert reviewers using a structured feature analysis method, then ran a controlled experiment with undergraduates to compare the two highest-ranked ones on actual task performance. Nygard's template produced higher overall scores than MADR, with participants noting its support for short, objective entries versus MADR's strength in capturing detailed structure and requirements. The work therefore supplies an evidence-based guide that lets teams choose a template matching their project's need for brevity or depth. Practitioners can use the results to cut unnecessary documentation effort while preserving essential design rationale.

Core claim

After expert DESMET Feature Analysis selected Nygard and MADR as the top two templates among Tyree/Akerman, Nygard, arc42, Y-statements, and MADR, a controlled experiment with undergraduate students showed Nygard's template achieving a higher Overall Score. Qualitative feedback indicated that Nygard encourages concise and objective documentation while MADR better supports structural details and specific architectural requirements. The authors conclude that these differences provide a practical decision aid for selecting ADR templates that reduce overhead and improve architectural knowledge retention.
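To make the brevity-versus-depth contrast concrete, the sketch below renders an empty Markdown skeleton for each of the two templates. The section headings are an assumption taken from the publicly documented Nygard and MADR templates; the study may have used different versions or wording, and `render_adr_skeleton` is a hypothetical helper, not something from the paper.

```python
# Section headings are assumptions based on the publicly documented templates;
# the versions used in the study may differ.
NYGARD_SECTIONS = ["Status", "Context", "Decision", "Consequences"]
MADR_SECTIONS = [
    "Context and Problem Statement",
    "Decision Drivers",
    "Considered Options",
    "Decision Outcome",
    "Pros and Cons of the Options",
]


def render_adr_skeleton(title: str, sections: list[str]) -> str:
    """Return an empty Markdown ADR with one second-level heading per section."""
    lines = [f"# {title}", ""]
    for section in sections:
        # Each section gets a heading and a placeholder body.
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # The decision title is a made-up example.
    print(render_adr_skeleton("Use PostgreSQL for order data", NYGARD_SECTIONS))
    print(render_adr_skeleton("Use PostgreSQL for order data", MADR_SECTIONS))
```

Under these assumed headings, the MADR skeleton asks the writer to list options and their trade-offs explicitly, which is consistent with the participants' remark that it captures more structural detail.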

What carries the argument

A two-step empirical process: expert DESMET Feature Analysis to rank five ADR templates, followed by a controlled experiment measuring comprehension, usability, and ease of adoption on the top two (Nygard and MADR).
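The screening step can be pictured as a weighted scoring exercise. The following is a minimal sketch, assuming each feature carries an importance weight and each template receives a conformance score per feature. The feature names, weights, scale, and scores are all hypothetical placeholders, and the candidates are deliberately named `template_a` to `template_c` so that no result from the paper is implied.

```python
# Hypothetical importance weights for the three quality attributes.
WEIGHTS = {"comprehension": 3, "usability": 2, "ease_of_adoption": 1}

# Hypothetical conformance scores on an assumed 0..5 scale.
SCORES = {
    "template_a": {"comprehension": 5, "usability": 4, "ease_of_adoption": 3},
    "template_b": {"comprehension": 3, "usability": 3, "ease_of_adoption": 5},
    "template_c": {"comprehension": 4, "usability": 4, "ease_of_adoption": 4},
}


def weighted_score(scores: dict[str, int], weights: dict[str, int], max_score: int = 5) -> float:
    """Return the weighted score as a percentage of the maximum achievable."""
    achieved = sum(weights[f] * scores[f] for f in weights)
    possible = sum(weights[f] * max_score for f in weights)
    return 100.0 * achieved / possible


def rank_candidates(all_scores: dict[str, dict[str, int]], weights: dict[str, int], top_n: int = 2) -> list[str]:
    """Return the names of the top_n candidates by weighted score, best first."""
    ranked = sorted(
        all_scores,
        key=lambda name: weighted_score(all_scores[name], weights),
        reverse=True,
    )
    return ranked[:top_n]


if __name__ == "__main__":
    for name in SCORES:
        print(name, round(weighted_score(SCORES[name], WEIGHTS), 1))
    print("advance to experiment:", rank_candidates(SCORES, WEIGHTS))
```

Because the weights and scores come from the evaluators, the ranking is only as independent as the people assigning them, which is why the referee report treats the two-author screening as a load-bearing step.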

If this is right

  • Teams needing quick, objective records can adopt Nygard's template to reduce documentation time.
  • Projects requiring explicit structural and requirement details can choose MADR, which still ranked above the three templates eliminated in the expert screening.
  • The comparison supplies a concrete decision guide that aligns template choice with project constraints to limit overhead.
  • Adopting the higher-scoring template should improve retention of architectural knowledge across team changes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Repeating the tasks with industry professionals on evolving systems could reveal whether the preference for conciseness holds when long-term maintenance is the main use case.
  • Embedding the preferred template into common issue trackers or wikis might increase adoption rates beyond what voluntary use achieves.
  • The same evaluation approach could be applied to newer or domain-specific templates to keep the decision guide current.

Load-bearing premise

Undergraduate students completing a lab task accurately represent how professional developers would comprehend, use, and prefer these templates during real project maintenance.

What would settle it

A replication study using professional developers on live codebases that measures the same overall score and finds MADR preferred or equal to Nygard.

Figures

Figures reproduced from arXiv: 2604.27333 by Fernando Nogueira, Nabson Silva, Tayana Conte.

Figure 1: ADR example of a generic architectural decision

Figure 2: Two-step research design
    Surrounding text: 3.1 Goal and Research Questions. The goal of this study is to evaluate and compare ADR templates for their ability to effectively support documentation and understanding of architectural decisions. The evaluation focuses on key quality attributes that influence the practical use of ADRs, namely comprehension, usability, and ease of adoption. Based on this, we formulate the followin…

Figure 3: Experiment Design Illustration.
    Surrounding text: 3.3.6 Instrumentation. The instruments for this experiment include a set of forms, training materials, architectural decision scenarios, a software platform, and specific task guidelines. We describe each instrument as follows: • Training: A presentation delivered during a class session explaining the core concepts of Architectural Decisions (ADs) and the AD documentation…

Figure 4: Example of Report Log generated by the platform.

Figure 5: Participants distribution.
    Surrounding text: in two different laboratories, where we equally followed the experiment's guidelines. In the first session, both groups documented the first architectural decision: Group A utilized the Nygard template, while Group B used the MADR template. In the second session, the treatments were crossed: Group A used the MADR template and Group B used the Nygard template to document a second…

Figure 6: Boxplots presenting the data distribution of the…
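The text accompanying Figure 5 describes a two-session crossover: Group A uses Nygard then MADR, and Group B uses MADR then Nygard. A minimal sketch of that schedule follows. How participants were actually allocated to groups is not stated in this excerpt, so the seeded random split, the participant identifiers, and the function names are assumptions made purely for illustration.

```python
import random

# Template order per group, as described in the text accompanying Figure 5.
SEQUENCES = {"A": ("Nygard", "MADR"), "B": ("MADR", "Nygard")}


def assign_groups(participants: list[str], seed: int = 0) -> dict[str, list[str]]:
    """Split participants into two groups (random split is an assumption)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}


def schedule(groups: dict[str, list[str]]) -> list[tuple[str, int, str]]:
    """Return (participant, session, template) rows for the crossover."""
    rows = []
    for group, members in groups.items():
        for participant in members:
            for session, template in enumerate(SEQUENCES[group], start=1):
                rows.append((participant, session, template))
    return rows


if __name__ == "__main__":
    # Participant identifiers are placeholders.
    groups = assign_groups(["p1", "p2", "p3", "p4", "p5", "p6"])
    for row in schedule(groups):
        print(row)
```

Every participant sees both templates, and both templates appear in each session, which is what lets a crossover separate template effects from session or order effects, provided the analysis accounts for the sequence.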
Original abstract

Context: Documenting Architectural Design Decisions (ADDs) is a critical factor in the software lifecycle, essential for efficient system maintenance, developer onboarding, and preventing knowledge vaporization. Although various templates for Architectural Decision Records (ADRs) have been proposed, there is a lack of empirical evidence comparing them. Goal: To address this gap, this paper aims to identify which ADR template best supports comprehension, usability, and ease of adoption: Tyree/Akerman's template, Nygard's ADR, arc42, Y-statements, and MADR. Method: We compared these templates using the DESMET FA method in a two-step evaluation. First, the two primary authors evaluated the five templates through the DESMET FA, based on their software architecture expertise. The two top-performing templates were then used as treatments in a controlled experiment conducted with undergraduate students. Results: In the preliminary screening by experts, the top-performing templates were those of Nygard and MADR. In the subsequent controlled experiment, Nygard's template outperformed MADR in terms of the Overall Score. Qualitative analysis of participant feedback revealed the factors influencing template preference. The findings indicate that Nygard supports concise and objective documentation, while MADR facilitates structural details and specific architectural requirements. Conclusion: This paper provides an evidence-based strategy for ADR template adoption by offering a comparison between them. The findings present a decision-making guide that assists practitioners and researchers in selecting ADR templates aligned with project constraints, aiming to minimize documentation overhead and increase architectural knowledge retention.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a two-phase empirical comparison of five ADR templates (Tyree/Akerman, Nygard, arc42, Y-statements, MADR) using the DESMET Feature Analysis method. The authors first perform an expert screening, identifying Nygard and MADR as top performers, then conduct a controlled experiment with undergraduate students comparing these two, finding that Nygard's template yields a higher overall score in terms of comprehension, usability, and ease of adoption. Qualitative feedback suggests Nygard supports concise documentation while MADR aids structural details. The paper concludes with a decision-making guide for practitioners selecting ADR templates.

Significance. If the findings hold after addressing methodological gaps, this work supplies useful empirical evidence on ADR template effectiveness, filling a gap in software architecture documentation research. The two-phase DESMET approach combined with qualitative analysis of participant feedback is a methodological strength that could support practitioner guidance. However, the exclusive use of undergraduate students in a lab setting substantially limits the significance for the stated goal of aiding real-world professional projects, as no validation or sensitivity analysis bridges this proxy gap.

major comments (3)
  1. [Section 3] Section 3 (Expert Screening): The DESMET FA screening was conducted solely by the two primary authors. This self-evaluation introduces subjectivity and potential bias in selecting Nygard and MADR for the experiment; the selection step is load-bearing because it determines which templates receive the decisive empirical test.
  2. [Section 4] Section 4 (Controlled Experiment): The description supplies no sample size, no statistical tests or effect sizes for the Overall Score comparison, no details on the comprehension tasks or usability instruments, and no controls for prior knowledge or order effects. These omissions make it impossible to evaluate the reliability of the central claim that Nygard outperformed MADR.
  3. [Section 5] Section 5 (Discussion/Conclusion): The manuscript claims to deliver an 'evidence-based strategy for ADR template adoption' for practitioners, yet contains no limitations discussion or analysis of whether undergraduate lab results generalize to professional developers facing maintenance responsibilities and time pressures. This proxy assumption is load-bearing for any practitioner-facing recommendation.
minor comments (2)
  1. [Abstract] Abstract: The term 'Overall Score' is used without definition or reference to its construction; a brief parenthetical or cross-reference to the metrics in Section 4 would improve clarity.
  2. [Tables/Figures] Figures/Tables: Any tables reporting scores or qualitative themes should include confidence intervals or inter-rater agreement metrics where applicable to strengthen the quantitative presentation.
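As an illustration of the kind of effect size major comment 2 asks for, the sketch below computes the Vargha-Delaney A statistic, a nonparametric measure often reported in software engineering experiments. Whether the paper used this or any other statistic is not established by the material reviewed here. The function name is hypothetical, and this independent-samples form is only an approximation for a crossover design, where a paired analysis may be more appropriate.

```python
def vargha_delaney_a(x: list[float], y: list[float]) -> float:
    """Estimate P(X > Y) + 0.5 * P(X == Y) over all pairs from x and y.

    A value of 0.5 means no tendency for either sample to score higher;
    values near 1.0 mean x tends to exceed y, values near 0.0 the reverse.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for xi in x for yj in y if xi > yj)
    ties = sum(1 for xi in x for yj in y if xi == yj)
    return (greater + 0.5 * ties) / (len(x) * len(y))


if __name__ == "__main__":
    # Placeholder numbers, not data from the study.
    scores_first = [7.0, 8.0, 6.5, 9.0]
    scores_second = [6.0, 7.0, 6.5, 8.0]
    print(vargha_delaney_a(scores_first, scores_second))
```

Reporting a value like this alongside the test statistic and sample size would let readers judge how large the Overall Score difference is, not only whether one exists.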

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, indicating the revisions we plan to make. Our responses focus on strengthening the methodological transparency and acknowledging limitations without misrepresenting the study design or findings.

Point-by-point responses
  1. Referee: [Section 3] Section 3 (Expert Screening): The DESMET FA screening was conducted solely by the two primary authors. This self-evaluation introduces subjectivity and potential bias in selecting Nygard and MADR for the experiment; the selection step is load-bearing because it determines which templates receive the decisive empirical test.

    Authors: We acknowledge that the DESMET Feature Analysis screening was performed exclusively by the two primary authors. This approach follows the standard application of DESMET FA, which relies on domain-expert judgment for initial feature evaluation, and the authors have relevant expertise in software architecture documentation. However, we agree that greater transparency is warranted given the load-bearing nature of the selection. In the revised manuscript, we will expand Section 3 with a detailed table or breakdown showing the scores assigned to each template on every DESMET criterion. We will also add an explicit discussion of potential subjectivity and author bias to the limitations section, noting that independent expert validation of the screening could be pursued in follow-up studies. We do not plan to re-execute the screening with additional raters for this revision, as the original criteria application was consistent and documented. revision: partial

  2. Referee: [Section 4] Section 4 (Controlled Experiment): The description supplies no sample size, no statistical tests or effect sizes for the Overall Score comparison, no details on the comprehension tasks or usability instruments, and no controls for prior knowledge or order effects. These omissions make it impossible to evaluate the reliability of the central claim that Nygard outperformed MADR.

    Authors: We agree that the current text of Section 4 omits essential methodological details required to assess the reliability of the results. We will revise Section 4 to include the exact sample size, the specific statistical tests and effect sizes used for the Overall Score comparison, complete descriptions of the comprehension tasks and usability instruments, and the measures taken to control for prior knowledge and order effects. These additions will allow readers to fully evaluate the central claim that Nygard's template outperformed MADR on the measured dimensions. revision: yes

  3. Referee: [Section 5] Section 5 (Discussion/Conclusion): The manuscript claims to deliver an 'evidence-based strategy for ADR template adoption' for practitioners, yet contains no limitations discussion or analysis of whether undergraduate lab results generalize to professional developers facing maintenance responsibilities and time pressures. This proxy assumption is load-bearing for any practitioner-facing recommendation.

    Authors: We concur that the manuscript would be strengthened by an explicit limitations discussion addressing the generalizability of undergraduate lab results to professional settings. In the revised version, we will add a dedicated limitations subsection that directly discusses the use of students as proxies, the controlled lab environment versus real-world time pressures and maintenance responsibilities, and the absence of sensitivity analysis or industry validation. We will also moderate the language in the conclusion and decision-making guide to present the findings as initial empirical evidence rather than a fully validated practitioner strategy, while retaining the guide as a synthesis of the observed differences in comprehension, usability, and adoption factors. This revision will qualify the claims appropriately without altering the reported results or the two-phase DESMET approach. revision: yes

Circularity Check

0 steps flagged

No circularity: direct empirical comparison without derivations or self-referential reductions

The paper performs a two-stage empirical evaluation: DESMET FA screening by the two primary authors followed by a controlled experiment measuring student comprehension, usability, and preference scores for the top two templates. No equations, fitted parameters, predictions, or first-principles derivations appear. Results are reported as direct measurements and qualitative feedback from participants. No self-citations form a load-bearing chain, and the templates evaluated originate from prior independent work. The derivation chain is therefore self-contained against the experiment data; the undergraduate proxy assumption affects external validity but does not create circularity by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two standard but untested assumptions in empirical software engineering: that student participants adequately represent professional developers and that the DESMET Feature Analysis criteria validly capture real-world template usability. No free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Undergraduate students can serve as reasonable proxies for evaluating the comprehension and usability of ADR templates by professional developers.
    The controlled experiment used undergraduate students as participants.
  • domain assumption: The DESMET Feature Analysis method provides a reliable preliminary screening of software engineering templates before human-subject evaluation.
    Used by the two primary authors to select the top two templates for the experiment.

pith-pipeline@v0.9.0 · 5580 in / 1361 out tokens · 56416 ms · 2026-05-07T07:56:23.743484+00:00 · methodology

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

  1. [1]

    Bardha Ahmeti, Maja Linder, Raffaela Groner, and Rebekka Wohlrab. 2024. Architecture decision records in practice: An action research study. In European Conference on Software Architecture. Springer, 333–349

  2. [2]

    Muhammad Ali Babar, Barbara Kitchenham, and Piyush Maheshwari. 2006. Assessing the value of architectural information extracted from patterns for architecting. In 10th International Conference on Evaluation and Assessment in Software Engineering (EASE). BCS Learning & Development

  3. [3]

    Terese Besker, Antonio Martini, and Jan Bosch. 2017. Impact of architectural technical debt on daily software development work - a survey of software practitioners. In 2017 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 278–287

  4. [4]

    Terese Besker, Antonio Martini, and Jan Bosch. 2018. Managing architectural technical debt: A unified model and systematic literature review. Journal of Systems and Software 135 (2018), 1–16

  5. [5]

    Klara Borowa, Rafał Lewanczyk, Klaudia Stpiczyńska, Patryk Stradomski, and Andrzej Zalewski. 2023. What rationales drive architectural decisions? An empirical inquiry. In European Conference on Software Architecture. Springer, 303–318

  6. [6]

    Rafael Capilla, Anton Jansen, Antony Tang, Paris Avgeriou, and Muhammad Ali Babar. 2016. 10 years of software architecture knowledge management: Practice and future. Journal of Systems and Software 116 (2016), 191–205

  7. [7]

    Lucas Carvalho and Tayana Conte. 2025. Software architecture decision-making process: The practitioners’ view from the Brazilian industry. Science of Computer Programming 244 (2025), 103302

  8. [8]

    Paul Clements, David Garlan, Reed Little, Robert Nord, and Judith Stafford. 2003. Documenting software architectures: views and beyond. In 25th International Conference on Software Engineering, 2003. Proceedings. IEEE, 740–741

  9. [9]

    Juliet Corbin and Anselm Strauss. 2014. Basics of qualitative research: Techniques and procedures for developing grounded theory. Sage publications

  10. [10]

    Wei Ding, Peng Liang, Antony Tang, Hans Van Vliet, and Mojtaba Shahin

  11. [11]

    How do open source communities document software architecture: An exploratory survey. In 2014 19th International conference on engineering of complex computer systems. IEEE, 136–145

  12. [12]

    Davide Falessi, Natalia Juristo, Claes Wohlin, Burak Turhan, Jürgen Münch, Andreas Jedlitschka, and Markku Oivo. 2018. Empirical software engineering experts on the use of students and professionals in experiments. Empirical Software Engineering 23, 1 (2018), 452–489

  13. [13]

    David Garlan. 2000. Software architecture: a roadmap. In Proceedings of the Conference on the Future of Software Engineering. 91–101

  14. [14]

    Ibrahim Habli and Tim Kelly. 2007. Capturing and replaying architectural knowledge through derivational analogy. In Second Workshop on Sharing and Reusing Architectural Knowledge-Architecture, Ra...

    Running header: One Size Fits All? An Empirical Comparison of ADR Templates regarding Comprehension, Usability, and Ease of Adoption. EASE’26, June 2026, Glasgow, United Kingdom

  15. [15]

    Anton Jansen and Jan Bosch. 2005. Software architecture as a set of architectural design decisions. In 5th Working IEEE/IFIP Conference on Software Architecture (WICSA’05). IEEE, 109–120

  16. [16]

    Anton Jansen, Jan Van Der Ven, Paris Avgeriou, and Dieter K Hammer. 2007. Tool support for architectural decisions. In 2007 Working IEEE/IFIP Conference on Software Architecture (WICSA’07). IEEE, 4–4

  17. [17]

    Barbara Kitchenham, Stephen Linkman, and David Law. 1996. DESMET: A method for evaluating software engineering methods and tools. Keele University (1996)

  18. [18]

    Barbara Kitchenham, Lech Madeyski, David Budgen, Jacky Keung, Pearl Brereton, Stuart Charters, Shirley Gibbs, and Amnart Pohthong. 2017. Robust statistical methods for empirical software engineering. Empirical Software Engineering 22, 2 (2017), 579–630

  19. [19]

    Oliver Kopp, Anita Armbruster, and Olaf Zimmermann. 2018. Markdown Architectural Decision Records: Format and Tool Support. In ZEUS. 55–62

  20. [20]

    Helena Chmura Kraemer and David J Kupfer. 2006. Size of treatment effects and their importance to clinical research and practice. Biological Psychiatry 59, 11 (2006), 990–996

  21. [21]

    Philippe Kruchten. 2004. An ontology of architectural design decisions in software intensive systems. In 2nd Groningen workshop on software variability. Groningen, The Netherlands, 54–61

  22. [22]

    Philippe Kruchten. 2012. Strategic management of technical debt: Tutorial synopsis. In 2012 12th International Conference on Quality Software. IEEE, 282–284

  23. [23]

    Philippe Kruchten, Patricia Lago, and Hans Van Vliet. 2006. Building up and reasoning about architectural knowledge. In International conference on the quality of software architectures. Springer, 43–58

  24. [24]

    Philippe B Kruchten. 2002. The 4+1 view model of architecture. IEEE Software 12, 6 (2002), 42–50

  25. [25]

    Mathieu Nassif and Martin P Robillard. 2025. Evaluating interactive documentation for programmers. Empirical Software Engineering 30, 3 (2025), 73

  26. [26]

    Michael Nygard. 2011. Documenting Architecture Decisions. Cognitect Blog. Retrieved January 4, 2026 from https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions

  27. [27]

    Dewayne E Perry and Alexander L Wolf. 1992. Foundations for the study of software architecture. ACM SIGSOFT Software Engineering Notes 17, 4 (1992), 40–52

  28. [28]

    Mojtaba Shahin, Peng Liang, and Mohammad Reza Khayyambashi. 2009. Architectural design decision: Existing models and tools. In 2009 Joint Working IEEE/IFIP Conference on Software Architecture & European Conference on Software Architecture. IEEE, 293–296

  29. [29]

    Gernot Starke and Peter Hruschka. 2024. arc42: The Template for Software Architecture Documentation. Retrieved January 4, 2026 from https://arc42.org/overview

  30. [30]

    Klaas-Jan Stol, Paris Avgeriou, and Muhammad Ali Babar. 2010. Identifying Architectural Patterns Used in Open Source Software: Approaches and Challenges. In 14th International Conference on Evaluation and Assessment in Software Engineering, EASE 2010, Keele University, UK, 12-13 April 2010 (Workshops in Computing), Mark Turner and Mahmood Niazi (Eds....

  31. [31]

    Jeff Tyree and Art Akerman. 2005. Architecture decisions: Demystifying architecture. IEEE Software 22, 2 (2005), 19–27

  32. [32]

    András Vargha and Harold D Delaney. 2000. A critique and improvement of the CL common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics 25, 2 (2000), 101–132

  33. [33]

    Sira Vegas, Cecilia Apa, and Natalia Juristo. 2015. Crossover designs in software engineering experiments: Benefits and perils. IEEE Transactions on Software Engineering 42, 2 (2015), 120–135

  34. [34]

    Zhiyuan Wan, Yun Zhang, Xin Xia, Yi Jiang, and David Lo. 2023. Software architecture in practice: Challenges and opportunities. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1457–1469

  35. [35]

    Claes Wohlin, Per Runeson, Martin Höst, Magnus C Ohlsson, Björn Regnell, Anders Wesslén, et al. 2012. Experimentation in software engineering. Vol. 236. Springer

  36. [36]

    Chen Yang, Peng Liang, and Paris Avgeriou. 2019. Integrating agile practices into architectural assumption management: An industrial survey. In Proceedings of the 23rd International Conference on Evaluation and Assessment in Software Engineering. 156–165

  37. [37]

    Uwe Zdun, Rafael Capilla, Huy Tran, and Olaf Zimmermann. 2013. Sustainable Architectural Design Decisions. IEEE Software 30, 6 (2013), 46–53. doi:10.1109/MS.2013.97

  38. [38]

    Olaf Zimmermann, Lukas Wegmann, Heiko Koziolek, and Thomas Goldschmidt

  39. [39]

    Architectural decision guidance across projects-problem space modeling, decision backlog management and cloud computing knowledge. In 2015 12th Working IEEE/IFIP Conference on Software Architecture. IEEE, 85–94