arxiv: 1907.05564 · v1 · pith:ZCLUCRKQnew · submitted 2019-07-12 · 💻 cs.SE

Framework Code Samples: How Are They Maintained and Used by Developers?

Pith reviewed 2026-05-24 22:47 UTC · model grok-4.3

classification 💻 cs.SE
keywords: framework code samples, software maintenance, code evolution, developer usage, Android, SpringBoot, code forking, build tools

The pith

Framework code samples are typically small and simple, update often to match new versions, and get forked by clients who rarely modify them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how code samples for frameworks like Android and SpringBoot are created, maintained, and used in practice. It analyzes 233 samples for their size, reliance on build tools, evolution over time, popularity, and client behaviors such as forking. The work shows that these samples mostly serve as ready working environments rather than complex starting points. A sympathetic reader would care because code samples are intended to speed up learning, yet little was previously known about whether they achieve that in real developer workflows. The findings point to patterns that could inform how samples are designed and supported going forward.

Core claim

Analysis of 233 code samples from the Android and SpringBoot frameworks shows that most are small and simple, provide a working environment to clients, and rely on automated build tools. The samples change frequently, often to adapt to new framework versions. Clients commonly fork the code samples but rarely modify them after forking.

What carries the argument

Empirical measurement of source code size and complexity, evolution history, popularity indicators, and client usage patterns across the selected samples.

If this is right

  • Creators should prioritize keeping code samples small, simple, and equipped with automated build tools to match how they are actually used.
  • Frequent updates to align with new framework versions are necessary to keep samples relevant over time.
  • Since clients fork samples but seldom change them, design should emphasize ready-to-run examples over highly customizable templates.
  • Lessons from this analysis can guide better maintenance practices for both creators and users of code samples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The fork-but-rarely-modify pattern suggests code samples function more as reference implementations than as bases for substantial customization.
  • Similar studies on samples from other frameworks could test whether the observed characteristics hold beyond Android and SpringBoot.
  • If the pattern is widespread, framework maintainers might consider investing in interactive or version-specific sample delivery mechanisms instead of static repositories.

Load-bearing premise

The 233 code samples chosen from Android and SpringBoot represent framework code samples in general, and the chosen metrics for size, evolution, popularity, and client usage capture maintenance and usage behaviors accurately.

What would settle it

Finding a broad set of code samples from additional frameworks where most are large and complex, updated infrequently, or frequently modified by clients after forking.

Figures

Figures reproduced from arXiv: 1907.05564 by Andre Hora, Bruno Cafeo, Gabriel Menezes.

Figure 1: Example of code sample (SpringBoot framework).
Figure 2: Code sample statistics (SpringBoot framework).
Figure 3: Basic metrics of the Android and SpringBoot code samples.
Figure 4: Source code analysis (RQ1).
1. Source code metrics: We first assess the current state of the samples by computing source code metrics with the support of the software analysis tool Understand (https://scitools.com). Particularly, we focus on four metrics: number of Java files, lines of code, cyclomatic complexity, and commented code lines. Rationale: Small code … (Footnote 21: https://github.com/spring-guides)
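The size metrics named in the Figure 4 excerpt can be approximated with a short script. The following is a minimal sketch in Python, not the paper's tooling: the paper uses Understand, whereas this sketch counts `.java` files, non-blank code lines, and comment lines with simple line-prefix rules. The function name `java_source_metrics`, the reading of "commented code lines" as comment lines, and the omission of cyclomatic complexity are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def java_source_metrics(root):
    """Return rough size metrics for the .java files under root.

    Hypothetical helper: a line counts as a comment only if it starts
    with // or lies inside a /* ... */ block that begins at line start.
    """
    files = sorted(Path(root).rglob("*.java"))
    code_lines = 0
    comment_lines = 0
    for path in files:
        in_block = False  # reset block-comment state for every file
        text = path.read_text(encoding="utf-8", errors="replace")
        for raw in text.splitlines():
            line = raw.strip()
            if not line:
                continue  # blank lines are not counted
            if in_block:
                comment_lines += 1
                if "*/" in line:
                    in_block = False
            elif line.startswith("//"):
                comment_lines += 1
            elif line.startswith("/*"):
                comment_lines += 1
                in_block = "*/" not in line
            else:
                code_lines += 1
    return {
        "java_files": len(files),
        "code_lines": code_lines,
        "comment_lines": comment_lines,
    }


# Usage on a throwaway directory containing one tiny sample.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Hello.java").write_text(
        "// greeting sample\n"
        "public class Hello {\n"
        "    /* entry\n"
        "       point */\n"
        "    public static void main(String[] args) {\n"
        "        System.out.println(\"hi\");\n"
        "    }\n"
        "}\n",
        encoding="utf-8",
    )
    print(java_source_metrics(tmp))
```

A dedicated analysis tool would also handle trailing comments, string literals that contain comment markers, and complexity metrics; the sketch only illustrates what "small" means operationally.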
Figure 5: Evolutionary analysis (RQ2).
1. Evolutionary metrics: We compute metrics to assess the evolution of the code samples. Specifically, we extract two evolutionary metrics: frequency of commits and lifetime. Lifetime is computed as the number of days between the first and the last project commit. Rationale: To cope with API evolution [10]–[13], ideally, the code samples should change over time. Code samples w…
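The lifetime definition in the Figure 5 excerpt (days between the first and the last project commit) is simple enough to state as code. The sketch below is illustrative only: the function name `evolutionary_metrics` and the example dates are invented, and treating "frequency of commits" as a plain commit count is an assumption, since the excerpt does not define it.

```python
from datetime import datetime


def evolutionary_metrics(commit_dates):
    """Return (number_of_commits, lifetime_in_days) for one code sample.

    Lifetime follows the excerpt: days between first and last commit.
    The commit count stands in for "frequency of commits" (assumption).
    """
    if not commit_dates:
        return 0, 0
    first, last = min(commit_dates), max(commit_dates)
    return len(commit_dates), (last - first).days


# Hypothetical commit history, not data from the paper.
commits = [
    datetime(2018, 1, 10),
    datetime(2018, 3, 1),
    datetime(2018, 9, 15),
]
n_commits, lifetime = evolutionary_metrics(commits)
print(n_commits, lifetime)  # 3 248
```

In practice the dates would come from the repository history, for example from the author or committer timestamps of each commit.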
Figure 6: Popularity analysis (RQ3). (Footnote 23: https://cran.r-project.org/web/packages/effsize)
Figure 7: Client usage analysis (RQ4).
1. Fork metrics: We compute three metrics to assess how the code samples are forked: number of forks, number of forks with commits, and number of commits in forked code samples. Rationale: Fork can be seen as a measure of popularity [21]. After forking, the client developer can update the code or simply not perform any change. In the case the forked project is updated, this …
Figure 9: Evolutionary metrics (RQ2).
File extension changes: Table III presents the changes per file extension. We clearly see that the code samples are not static: they are updated over the years. In both cases, xml files are the most changed, followed by Java, properties, and jar files. Table IV shows another view of this data: the actions performed on the files: addition, modification, or removal. While in Andro…
Figure 10: Migration delay (RQ2).
Figure 12: Dependency to the framework in number of imports (left) and ratio …
Figure 11 shows the versions that the code samples are adopting. We see that the Android samples mostly rely on the API level 26 (i.e., Android 8.0, Oreo), 27 (i.e., 8.1, Oreo), and 28 (i.e., 9.0, Pie); however, many samples also rely on other API levels, which represent older versions of Android. Regarding SpringBoot, the majority of the samples are based on version 2.0.5; in this case, we found no sample relyi…
Figure 13: Code sample forks (RQ4).
The fact that there is a fork does not necessarily mean that it changes over time. Indeed, in Android, only 3% (871 out of 25,106) of forked projects are ahead of the base project, i.e., they performed at least one commit; in SpringBoot this ratio is 15% (1,055 out of 7,025).
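The percentages in the Figure 13 excerpt follow directly from the reported counts. As a quick arithmetic check, the sketch below recomputes the share of forks that are ahead of the base project; the function name `ahead_ratio` is hypothetical, and only the four counts are taken from the excerpt.

```python
def ahead_ratio(forks_with_commits, total_forks):
    """Share of forks that have at least one commit of their own."""
    if total_forks <= 0:
        raise ValueError("total_forks must be positive")
    return forks_with_commits / total_forks


# Counts as reported in the Figure 13 excerpt.
reported = {"Android": (871, 25_106), "SpringBoot": (1_055, 7_025)}
for framework, (ahead, total) in reported.items():
    print(f"{framework}: {ahead_ratio(ahead, total):.1%}")
# Android: 3.5%
# SpringBoot: 15.0%
```

The Android value is about 3.5% to one decimal place, so the "3%" in the excerpt is consistent with rounding 3.47% to the nearest whole percent.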
Figure 14 presents the frequency of commits per forked code sample; here, we only show the forks with at least one commit. In this case, 7% and 9% of the forked Android and SpringBoot code samples have 10 or more commits. In both frameworks, the majority of the forked code samples have a single commit (46% and 47%). In Android, 29% of the forked code samples have 2–3 commits, while 16% have 4–10. In SpringBoot, th…
Original abstract

Background: Modern software systems are commonly built on the top of frameworks. To accelerate the learning process of features provided by frameworks, code samples are made available to assist developers. However, we know little about how code samples are actually developed. Aims: In this paper, we aim to fill this gap by assessing the characteristics of framework code samples. We provide insights on how code samples are maintained and used by developers. Method: We analyze 233 code samples of Android and SpringBoot, and assess aspects related to their source code, evolution, popularity, and client usage. Results: We find that most code samples are small and simple, provide a working environment to the clients, and rely on automated build tools. They change frequently over time, for example, to adapt to new framework versions. We also detect that clients commonly fork the code samples, however, they rarely modify them. Conclusions: We provide a set of lessons learned and implications to creators and clients of code samples to improve maintenance and usage activities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper reports an empirical analysis of 233 code samples drawn from the Android and Spring Boot frameworks. It examines source-code characteristics, evolution history, popularity metrics, and client usage patterns (forking and modification). The central claims are that the samples are predominantly small and simple, supply a working environment via automated build tools, evolve frequently (often to track framework releases), and are commonly forked by clients yet rarely modified inside those forks. The authors extract lessons and implications for sample creators and users.

Significance. If the reported patterns hold within the studied frameworks, the work supplies concrete observational data on an under-studied artifact class (framework code samples). The repository-mining approach yields falsifiable, quantitative findings on size, change frequency, and fork-versus-modify behavior that could inform tooling and documentation practices. No parameter-free derivations or machine-checked proofs are present, but the study is purely data-driven and avoids circular definitions.

major comments (2)
  1. [Method] Method section (as summarized in the abstract): the study selects 233 samples exclusively from Android and Spring Boot yet provides no sampling frame, inclusion/exclusion criteria, or justification for why these two frameworks are representative of framework code samples in general. This untested representativeness assumption is load-bearing for all general claims about maintenance and usage behaviors.
  2. [Results] Results (evolution and client-usage paragraphs): fork counts and change-frequency metrics are presented as direct indicators of usage and maintenance intent without validation or discussion of confounds (e.g., forks may proxy visibility rather than active use; change cadence may simply track upstream framework release schedules). No sensitivity analysis or alternative operationalizations are reported.
minor comments (1)
  1. [Abstract] Abstract: 'built on the top of frameworks' is non-standard phrasing; 'built on top of frameworks' is the conventional form.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback. The comments highlight important aspects of methodological transparency and interpretation of results. We address each major comment below, indicating planned revisions where appropriate. Our responses focus on clarifying the scope of the study and strengthening the discussion of limitations.

Point-by-point responses
  1. Referee: [Method] Method section (as summarized in the abstract): the study selects 233 samples exclusively from Android and Spring Boot yet provides no sampling frame, inclusion/exclusion criteria, or justification for why these two frameworks are representative of framework code samples in general. This untested representativeness assumption is load-bearing for all general claims about maintenance and usage behaviors.

    Authors: We agree that the original manuscript lacks an explicit description of the sampling process. The 233 samples were drawn from the official GitHub repositories of Android and Spring Boot (the two most prominent frameworks with publicly available code samples at the time of data collection), using a combination of repository search and manual verification for the presence of build files and documentation. We did not intend to claim statistical representativeness across all possible frameworks; the study is positioned as an in-depth examination of these two widely adopted cases. In the revision we will add a dedicated 'Sample Selection' subsection to the Method section that details the inclusion criteria (e.g., presence of build automation, public availability, and minimum activity threshold) and will include an explicit limitations paragraph stating that findings should be interpreted within the context of these two frameworks rather than generalized to all framework code samples. revision: yes

  2. Referee: [Results] Results (evolution and client-usage paragraphs): fork counts and change-frequency metrics are presented as direct indicators of usage and maintenance intent without validation or discussion of confounds (e.g., forks may proxy visibility rather than active use; change cadence may simply track upstream framework release schedules). No sensitivity analysis or alternative operationalizations are reported.

    Authors: The referee is correct that the manuscript presents fork counts and commit frequency without sufficient discussion of alternative explanations. Fork counts serve as a readily available proxy for interest and potential usage, yet they may also reflect repository visibility or promotional activity. Likewise, many updates coincide with new framework releases. We will revise the Results and Discussion sections to acknowledge these confounds explicitly, add a paragraph on threats to validity that covers alternative interpretations, and report supplementary metrics (e.g., proportion of forks with subsequent commits by the forker) that were available in our dataset. A full sensitivity analysis is not feasible with the current observational data, but the added discussion will temper the claims accordingly. revision: partial

Circularity Check

0 steps flagged

No circularity: purely observational empirical study

Full rationale

The paper conducts a direct repository analysis of 233 code samples from Android and Spring Boot, reporting descriptive statistics on size, evolution, popularity, and client usage. No equations, fitted parameters, predictions, or derivations appear in the method or results. Claims are computed from the collected data without self-referential definitions, self-citation load-bearing premises, or renaming of known results. The representativeness limitation noted by the skeptic is an external-validity concern, not a circular reduction in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study rests on the assumption that the chosen samples and metrics reflect real-world usage; no free parameters, invented entities, or mathematical axioms are involved.

axioms (1)
  • domain assumption: The 233 code samples analyzed are representative of how framework code samples are maintained and used by developers.
    Selection of Android and SpringBoot samples is presented without justification of representativeness or sampling frame in the abstract.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

  1. [1]

    The effect of object-oriented frameworks on developer productivity

    S. Moser and O. Nierstrasz, “The effect of object-oriented frameworks on developer productivity,” Computer, vol. 29, no. 9, 1996

  2. [2]

    Best principles in the design of shared software

    D. Konstantopoulos, J. Marien, M. Pinkerton, and E. Braude, “Best principles in the design of shared software,” in International Computer Software and Applications Conference, 2009, pp. 287–292

  3. [3]

    Measuring software library stability through historical version analysis

    S. Raemaekers, A. van Deursen, and J. Visser, “Measuring software library stability through historical version analysis,” in International Conference on Software Maintenance, 2012, pp. 378–387

  4. [4]

    Code example guidelines

    D. Vincent, “Code example guidelines,” https://developer.mozilla.org/en-US/docs/MDN/Contribute/Guidelines/Code_guidelines, 2018

  5. [5]

    Clean code: a handbook of agile software craftsmanship

    R. C. Martin, Clean code: a handbook of agile software craftsmanship. Pearson Education, 2009

  6. [6]

    How software engineers use documentation: The state of the practice

    T. C. Lethbridge, J. Singer, and A. Forward, “How software engineers use documentation: The state of the practice,” IEEE Software, no. 6, pp. 35–39, 2003

  7. [7]

    Continuous Integration: Improving Software Quality and Reducing Risk

    P. Duvall, S. M. Matyas, and A. Glover, Continuous Integration: Improving Software Quality and Reducing Risk, ser. Addison-Wesley Signature Series. Addison-Wesley, 2007

  8. [8]

    Continuous integration and its tools

    M. Meyer, “Continuous integration and its tools,” IEEE Software, vol. 31, no. 3, pp. 14–16, May 2014

  9. [9]

    Quality and Productivity Outcomes Relating to Continuous Integration in GitHub

    B. Vasilescu, Y. Yu, H. Wang, P. Devanbu, and V. Filkov, “Quality and Productivity Outcomes Relating to Continuous Integration in GitHub,” in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, 2015, pp. 805–816

  10. [10]

    How do APIs evolve? A story of refactoring

    D. Dig and R. Johnson, “How do APIs evolve? A story of refactoring,” Journal of software maintenance and evolution: Research and Practice, vol. 18, no. 2, pp. 83–107, 2006

  11. [11]

    Historical and impact analysis of API breaking changes: A large scale study

    L. Xavier, A. Brito, A. Hora, and M. T. Valente, “Historical and impact analysis of API breaking changes: A large scale study,” in International Conference on Software Analysis, Evolution and Reengineering, 2017, pp. 138–147

  12. [12]

    An empirical study on the impact of refactoring activities on evolving client-used apis

    R. G. Kula, A. Ouni, D. M. German, and K. Inoue, “An empirical study on the impact of refactoring activities on evolving client-used apis,” Information and Software Technology, vol. 93, pp. 186–199, 2018

  13. [13]

    Assessing the threat of untracked changes in software evolution

    A. Hora, D. Silva, R. Robbes, and M. T. Valente, “Assessing the threat of untracked changes in software evolution,” in 40th International Conference on Software Engineering (ICSE), 2018, pp. 1102–1113

  14. [14]

    An empirical study of API stability and adoption in the Android ecosystem

    T. McDonnell, B. Ray, and M. Kim, “An empirical study of API stability and adoption in the Android ecosystem,” in International Conference on Software Maintenance, 2013, pp. 70–79

  15. [15]

    How do developers react to API deprecation? the case of a Smalltalk ecosystem

    R. Robbes, M. Lungu, and D. Röthlisberger, “How do developers react to API deprecation? the case of a Smalltalk ecosystem,” in International Symposium on the Foundations of Software Engineering, 2012

  16. [16]

    How do developers react to API evolution? a large-scale empirical study

    A. Hora, R. Robbes, M. T. Valente, N. Anquetil, A. Etien, and S. Ducasse, “How do developers react to API evolution? a large-scale empirical study,” Software Quality Journal, vol. 26, no. 1, pp. 161–191, 2018

  17. [17]

    Do developers update their library dependencies?

    R. G. Kula, D. M. German, A. Ouni, T. Ishio, and K. Inoue, “Do developers update their library dependencies?” Empirical Software Engineering, vol. 23, no. 1, pp. 384–417, 2018

  18. [18]

    Appropriate statistics for ordinal level data: Should we really be using t-test and Cohen's d for evaluating group differences on the nsse and other surveys

    J. Romano, J. D. Kromrey, J. Coraggio, and J. Skowronek, “Appropriate statistics for ordinal level data: Should we really be using t-test and Cohen's d for evaluating group differences on the nsse and other surveys,” in Florida Association of Institutional Research, 2006, pp. 1–33

  19. [19]

    What are the characteristics of high-rated apps? a case study on free Android applications

    Y. Tian, M. Nagappan, D. Lo, and A. E. Hassan, “What are the characteristics of high-rated apps? a case study on free Android applications,” in International Conference on Software Maintenance and Evolution, 2014, pp. 301–310

  20. [20]

    On the use of replacement messages in API deprecation: An empirical study

    G. Brito, A. Hora, M. T. Valente, and R. Robbes, “On the use of replacement messages in API deprecation: An empirical study,” Journal of Systems and Software, vol. 137, pp. 306–321, 2018

  21. [21]

    Understanding the factors that impact the popularity of GitHub repositories

    H. Borges, A. Hora, and M. T. Valente, “Understanding the factors that impact the popularity of GitHub repositories,” in International Conference on Software Maintenance and Evolution, 2016, pp. 334–344

  22. [22]

    Experimentation in Software Engineering

    C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén, Experimentation in Software Engineering. Springer Publishing Company, Incorporated, 2012

  23. [23]

    Using an information retrieval system to retrieve source code samples

    R. Sindhgatta, “Using an information retrieval system to retrieve source code samples,” in International Conference on Software Engineering, 2006, pp. 905–908

  24. [24]

    Using structural context to recommend source code examples

    R. Holmes and G. C. Murphy, “Using structural context to recommend source code examples,” in International Conference on Software Engineering, 2005, pp. 117–125

  25. [25]

    Jungloid mining: Helping to navigate the api jungle

    D. Mandelin, L. Xu, R. Bodík, and D. Kimelman, “Jungloid mining: Helping to navigate the api jungle,” in Conference on Programming Language Design and Implementation, 2005, pp. 48–61

  26. [26]

    Jiriss - an eclipse plug-in for source code exploration

    D. Poshyvanyk and A. M. and, “Jiriss - an eclipse plug-in for source code exploration,” in International Conference on Program Comprehension, 2006, pp. 252–255

  27. [27]

    Xsnippet: Mining for sample code

    N. Sahavechaphan and K. Claypool, “Xsnippet: Mining for sample code,” in Conference on Object-oriented Programming Systems, Languages, and Applications, 2006, pp. 413–430

  28. [28]

    Mining api usage examples from test code

    Z. Zhu, Y. Zou, B. Xie, Y. Jin, Z. Lin, and L. Zhang, “Mining api usage examples from test code,” in International Conference on Software Maintenance and Evolution, 2014, pp. 301–310

  29. [29]

    Documenting apis with examples: Lessons learned with the apiminer platform

    J. E. Montandon, H. Borges, D. Felix, and M. T. Valente, “Documenting apis with examples: Lessons learned with the apiminer platform,” in Working Conference on Reverse Engineering, 2013, pp. 401–408

  30. [30]

    How Can I Use this Method?

    L. Moreno, G. Bavota, M. Di Penta, R. Oliveto, and A. Marcus, “How Can I Use this Method?” in International Conference on Software Engineering, 2015, pp. 880–890

  31. [31]

    Synthesizing api usage examples

    R. P. L. Buse and W. Weimer, “Synthesizing api usage examples,” in International Conference on Software Engineering, 2012, pp. 782–792

  32. [32]

    Spotting working code examples

    I. Keivanloo, J. Rilling, and Y. Zou, “Spotting working code examples,” in International Conference on Software Engineering, 2014, pp. 664–675

  33. [33]

    Learning to rank code examples for code search engines

    H. Niu, I. Keivanloo, and Y. Zou, “Learning to rank code examples for code search engines,” Empirical Software Engineering, vol. 22, no. 1, pp. 259–291, Feb. 2017

  34. [34]

    What Makes a Good Code Example?: A Study of Programming Q&A in StackOverflow

    J. Sillito, F. Maurer, S. M. Nasehi, and C. Burns, “What Makes a Good Code Example?: A Study of Programming Q&A in StackOverflow,” in International Conference on Software Maintenance, 2012, pp. 25–34

  35. [35]

    From Query to Usable Code: An Analysis of Stack Overflow Code Snippets

    D. Yang, A. Hussain, and C. V. Lopes, “From Query to Usable Code: An Analysis of Stack Overflow Code Snippets,” in International Conference on Mining Software Repositories, 2016, pp. 391–402

  36. [36]

    Near-miss function clones in open source software: An empirical study

    C. K. Roy and J. R. Cordy, “Near-miss function clones in open source software: An empirical study,” Journal of Software: Evolution and Process, vol. 22, no. 3, pp. 165–189, 2010

  37. [37]

    On the Extent and Nature of Software Reuse in Open Source Java Projects

    L. Heinemann, F. Deissenboeck, M. Gleirscher, B. Hummel, and M. Irlbeck, “On the Extent and Nature of Software Reuse in Open Source Java Projects,” in International Conference on Top Productivity Through Software Reuse, 2011, pp. 207–222

  38. [38]

    Stack Overflow in Github: Any Snippets There?

    D. Yang, P. Martins, V. Saini, and C. Lopes, “Stack Overflow in Github: Any Snippets There?” in International Conference on Mining Software Repositories, 2017, pp. 280–290