How Do Developers Use Migration Guides? A Case Study of Log4j

Kazumasa Shimari; Kazuma Yamasaki; Kenichi Matsumoto; Takahiro Monno; Tetsuya Kanda

arxiv: 2604.24072 · v2 · pith:LXLO7LG7new · submitted 2026-04-27 · 💻 cs.SE

Takahiro Monno, Kazumasa Shimari, Tetsuya Kanda, Kazuma Yamasaki, Kenichi Matsumoto

Pith reviewed 2026-05-21 01:21 UTC · model grok-4.3

keywords: migration guides · breaking changes · software documentation · pull requests · Log4j · library updates · developer practices

The pith

Developers most often reference migration guides in pull request descriptions, linking to the full guide 82.81 percent of the time and consulting them during both major updates and later maintenance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates how developers actually use migration guides that document breaking changes when libraries update to new versions. It begins by checking whether libraries with known incompatibilities supply such guides, then focuses on a detailed case study of Log4j to examine real pull requests. Analysis shows references appear most frequently in the pull request description itself rather than in pull request comments. Most of those references point to the entire guide instead of individual sections. The study also finds that developers keep turning to the guides after the initial major version change, treating them as an ongoing resource.

Core claim

In the Log4j case study, pull request authors most frequently reference the migration guide in the pull request description, and most references (82.81%) link to the entire guide rather than specific sections. Developers use migration guides not only during major version updates but also during subsequent maintenance tasks, suggesting that the guides serve as a resource throughout the entire migration process.

What carries the argument

Empirical counting and classification of links and textual references to the official Log4j migration guide inside pull request descriptions and comments.
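The counting step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' tooling: the guide URL (`GUIDE_HOST`, `GUIDE_PATH`), the pull request dictionary layout, and the rule that a URL fragment marks a section-level link are all assumptions introduced here.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical guide location, used only for illustration.
GUIDE_HOST = "example.org"
GUIDE_PATH = "/docs/migration-guide.html"

# Rough URL matcher; stops at whitespace and common closing brackets.
URL_RE = re.compile(r"https?://[^\s)>\]]+")


def classify_reference(url):
    """Return 'entire_guide', 'specific_section', or None for non-guide URLs."""
    parts = urlparse(url)
    if parts.netloc != GUIDE_HOST or parts.path != GUIDE_PATH:
        return None
    # Assumption: a fragment such as #api-changes signals a section-level link.
    return "specific_section" if parts.fragment else "entire_guide"


def count_references(pull_requests):
    """Count guide references by (location, kind) across pull requests."""
    counts = Counter()
    for pr in pull_requests:
        texts = [("description", pr.get("description", ""))]
        texts += [("comment", body) for body in pr.get("comments", [])]
        for location, text in texts:
            for url in URL_RE.findall(text):
                kind = classify_reference(url.rstrip(".,;"))
                if kind is not None:
                    counts[(location, kind)] += 1
    return counts


# Made-up pull requests, not data from the study.
sample_prs = [
    {
        "description": "Migrate to v2, see https://example.org/docs/migration-guide.html.",
        "comments": ["Relevant part: https://example.org/docs/migration-guide.html#api-changes"],
    },
    {
        "description": "Follow-up fix (https://example.org/docs/migration-guide.html)",
        "comments": ["See https://example.org/other.html"],
    },
]
counts = count_references(sample_prs)
```

A tally like `counts` records only that a link was written down. It says nothing about whether the linked content was read, which is the gap the load-bearing premise below makes explicit.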

Load-bearing premise

References and links appearing in pull request descriptions and comments accurately capture how and when developers actually consult and apply the migration guide content during their work.

What would settle it

A direct observation study that records the exact sections developers open and read while performing a migration and then checks whether those sections match the links later placed in their pull requests.

Figures

Figures reproduced from arXiv: 2604.24072 by Kazumasa Shimari, Kazuma Yamasaki, Kenichi Matsumoto, Takahiro Monno, Tetsuya Kanda.

**Figure 1.** Example PR Body with Migration Guide Reference

Original abstract

Migration guides are a form of software documentation that helps developers address breaking changes introduced in library version updates. Prior studies have examined documents such as release notes, API reference manuals, and patch notes. However, research that focuses specifically on migration guides remains limited. Improving the usability and coverage of migration guides is essential for helping developers resolve breaking changes efficiently. Yet, we still lack a clear understanding of how migration guides are currently provided and how developers use them in practice. To fill this gap, we first investigate whether libraries known to introduce incompatibilities provide migration guides. We then conduct a detailed case study on Log4j, a library that has experienced large-scale breaking updates in the past. We empirically analyze how developers refer to and use the official migration guide in real-world projects. We find that pull request authors most frequently reference the migration guide in the pull request description, and that most references (82.81%) link to the entire guide rather than specific sections. We also find that developers use migration guides not only during major version updates but also during subsequent maintenance tasks, suggesting that the guides serve as a resource throughout the entire migration process.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This Log4j case study gives concrete counts on PR references to the migration guide but the proxy for actual consultation remains unvalidated.

The main things to take from this paper are the specific numbers on how developers reference the Log4j migration guide in pull requests (most often in the description, with 82.81% pointing to the entire guide rather than sections) and the observation that references continue during later maintenance tasks, not just the initial major update. These are presented as new empirical details on a documentation type that has received less attention than release notes or API docs. The case study approach on a library with well-known breaking changes is a reasonable way to generate targeted data, and the straightforward counting from public PRs keeps the claims grounded in observable artifacts. That part of the work is useful for anyone who maintains libraries or writes migration documentation.

The soft spot is exactly the one flagged in the stress test. Counting visible links does not confirm that developers read or applied the linked content, nor does it capture cases where the guide was consulted without a reference. The maintenance-task usage claim therefore rests on an assumption that is not checked with interviews, change tracing, or other validation. The paper would be tighter if it acknowledged this measurement gap more explicitly and discussed how the sample of PRs was filtered. The scope is also narrow to one library, which limits how far the percentages can be generalized.

This is the sort of paper that would interest empirical software engineering researchers who study documentation practices and library maintainers who want practical feedback on their guides. A reader looking for quantitative patterns in real projects would get value from the reference-location breakdown and the timing observation. It has enough clear data collection and a well-defined question to deserve peer review, though reviewers will probably push on the validity of the reference proxy and ask for more on reproducibility of the dataset.

Referee Report

1 major / 2 minor

Summary. The paper examines migration guides as documentation for handling breaking changes in libraries. It first checks whether libraries with incompatibilities provide such guides, then presents a case study of Log4j analyzing references to its official migration guide within pull requests from real-world projects. Key findings include that PR authors most often reference the guide in the description (rather than comments), that 82.81% of references point to the entire guide instead of specific sections, and that references appear not only during major version migrations but also in later maintenance tasks.

Significance. If the observational measurements hold, the work supplies concrete empirical data on an under-studied form of documentation, showing that migration guides continue to be consulted after initial upgrades. This could guide improvements in how guides are structured and linked. The study draws on external project data without fitted parameters or self-referential derivations, providing a falsifiable count-based characterization of reference patterns.

major comments (1)

[Results / Empirical Analysis] The central observational claims (frequency of references in PR descriptions, 82.81% entire-guide links, and usage during maintenance tasks) rest on treating textual links and mentions in PR descriptions/comments as direct evidence of consultation and application. No validation is provided that a reference implies the author read or followed the linked content, nor that non-referenced changes occurred without the guide. This proxy assumption is load-bearing for interpreting both the section-link statistic and the maintenance-task claim.

minor comments (2)

[Abstract] The abstract states the 82.81% figure but does not indicate the total number of references or PRs analyzed; adding these counts (and confidence intervals) would improve precision.
[Methodology] Clarify the exact criteria used to classify a reference as 'to the entire guide' versus 'specific section' and how inter-rater agreement was assessed if multiple coders were involved.
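The two minor comments ask for quantities that are simple to compute once the raw counts and coder labels are available. The sketch below shows one way to do it, under stated assumptions: the counts 53 of 64 are hypothetical, chosen only because they reproduce 82.81%, and the actual totals are not given here. The Wilson score interval and Cohen's kappa are swapped in as common choices; whether the authors used either is not stated.

```python
import math
from collections import Counter


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each coder's label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n**2
    if expected == 1:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 53 whole-guide links out of 64 references.
low, high = wilson_interval(53, 64)

# Hypothetical labels from two coders for four references.
coder_a = ["entire", "entire", "section", "entire"]
coder_b = ["entire", "section", "section", "entire"]
kappa = cohen_kappa(coder_a, coder_b)
```

With a sample of this hypothetical size the interval spans well over ten percentage points, which is why reporting the denominator alongside the percentage matters.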

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for highlighting an important methodological point regarding the interpretation of our observational data. We address the comment below and outline revisions that will improve the precision of the manuscript.

Referee: [Results / Empirical Analysis] The central observational claims (frequency of references in PR descriptions, 82.81% entire-guide links, and usage during maintenance tasks) rest on treating textual links and mentions in PR descriptions/comments as direct evidence of consultation and application. No validation is provided that a reference implies the author read or followed the linked content, nor that non-referenced changes occurred without the guide. This proxy assumption is load-bearing for interpreting both the section-link statistic and the maintenance-task claim.

Authors: We agree that references in pull requests constitute an indirect proxy for consultation and that we lack direct validation (e.g., via surveys or interaction logs) that developers read the linked content or that unreferenced changes were performed without the guide. This is an inherent limitation of repository-mining studies that rely on public artifacts. At the same time, explicitly including a link to the migration guide in a PR description provides observable evidence that the developer identified the guide as relevant to the changes under review. We classify maintenance-task PRs by examining titles, descriptions, and commit messages that indicate post-migration work rather than the initial upgrade. To address the concern, we will (1) add an explicit paragraph in a Threats to Validity section discussing the proxy nature of the measure and the possibility of unreferenced usage, and (2) revise wording in the abstract, introduction, and results to emphasize observed reference patterns rather than unobservable reading or application behaviors. These changes will be incorporated in the revised manuscript.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity in observational case study

The paper is a purely empirical case study that collects and classifies references to an external migration guide within pull requests from open-source projects. No derivations, equations, fitted parameters, or predictions are present that could reduce findings to inputs by construction. Claims rest on direct counts and classifications of observable artifacts rather than any self-definitional, self-citation load-bearing, or ansatz-smuggling steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the assumption that pull-request metadata serves as a valid proxy for real developer consultation behavior and that the Log4j case generalizes to migration guide usage more broadly.

axioms (1)

Domain assumption: References in pull requests and their descriptions reliably indicate actual developer use of the migration guide.
The study infers usage patterns directly from these references without additional validation such as surveys or logs.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages

[1] Aline Brito, Marco Valente, Laerte Xavier, and Andre Hora. 2020. You Broke My Code: Understanding the Motivations for Breaking Changes in APIs. Empirical Softw. Engg. 25 (03 2020), 1458–1492. doi:10.1007/s10664-019-09756-z

[2] Farbod Daneshyan, Runzhi He, Jianyu Wu, and Minghui Zhou. 2025. SmartNote: An LLM-Powered, Personalised Release Note Generator That Just Works. Proc. ACM Softw. Eng. 2, FSE, Article FSE075 (June 2025), 24 pages. doi:10.1145/3729345

[3] Erik Derr, Sven Bugiel, Sascha Fahl, Yasemin Acar, and Michael Backes. 2017. Keep me Updated: An Empirical Study of Third-Party Library Updatability on Android. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS ’17). 2187–2200. doi:10.1145/3133956.3134059

[4] Abram Hindle, Daniel German, Michael Godfrey, and Richard Holt. 2009. Automatic Classification of Large Changes into Maintenance Categories. In Proceedings of the 17th International Conference on Program Comprehension (ICPC 2009). 30 –

[5] doi:10.1109/ICPC.2009.5090025

[6] IEEE. 1998. IEEE Standard for Software Maintenance. Technical Report IEEE Std 1219-1998. IEEE.

[7] Dhanushka Jayasuriya, Samuel Ou, Saakshi Hegde, Valerio Terragni, Jens Dietrich, and Kelly Blincoe. 2024. An extended study of syntactic breaking changes in the wild. Empirical Softw. Engg. 30, 2 (Dec. 2024), 45 pages. doi:10.1007/s10664-024-10563-4

[8] Chia Hung Kao, Cheng-Ying Chang, and Hewijin Christine Jiau. 2022. Towards cost-effective API deprecation: A win–win strategy for API developers and API users. Information and Software Technology 142 (2022), 106746.

[9] Deokyoon Ko, Kyeongwook Ma, Sooyong Park, Suntae Kim, Dongsun Kim, and Yves Le Traon. 2014. API Document Quality for Resolving Deprecated APIs. In Proceedings of the 2014 21st Asia-Pacific Software Engineering Conference, Vol. 2. 27–30. doi:10.1109/APSEC.2014.87

[10] Sai Pranav Koyyada, Denim Deshmukh Deepika Badampudi, Vida Ahmadi, and Muhammad Usman. 2022. Towards automated open source assessment – An empirical study. arXiv:2212.00087 [cs.SE] https://arxiv.org/abs/2212.00087

[11] Jun Li, Yingfei Xiong, Xuanzhe Liu, and Lu Zhang. 2013. How Does Web Service API Evolution Affect Clients?. In Proceedings of the 2013 IEEE 20th International Conference on Web Services. 300–307. doi:10.1109/ICWS.2013.48

[12] Courtney Miller, Christian Kästner, and Bogdan Vasilescu. 2023. “We Feel Like We’re Winging It:” A Study on Navigating Open-Source Dependency Abandonment. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023). 1281–1293. doi:10.1145/3611643.3616293

[13] David Novick and Karen Ward. 2006. Why don’t people read the manual? Departmental Papers (CS) (10 2006). doi:10.1145/1166324.1166329

[14] Luca Ponzanelli, Gabriele Bavota, Massimiliano Di Penta, Rocco Oliveto, and Michele Lanza. 2014. Mining StackOverflow to turn the IDE into a self-confident programming prompter. In Proceedings of the 11th Working Conference on Mining Software Repositories (Hyderabad, India) (MSR 2014). 102–111. doi:10.1145/2597073.2597077

[15] Frank Reyes, Yogya Gamage, Gabriel Skoglund, Benoit Baudry, and Martin Monperrus. 2024. BUMP: A Benchmark of Reproducible Breaking Dependency Updates. In Proceedings of the 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 159–170. doi:10.1109/SANER60148.2024.00024

[16] J. Richard Landis and Gary G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33, 1 (1977), 159–174.

[17] E. Burton Swanson. 1976. The dimensions of maintenance. In Proceedings of the 2nd International Conference on Software Engineering. https://api.semanticscholar.org/CorpusID:17035728

[18] Hidetake Tanaka, Kazuma Yamasaki, Momoka Hirose, Takashi Nakano, Youmei Fan, Kazumasa Shimari, Raula Gaikovina Kula, and Kenichi Matsumoto. 2025. Mining for Lags in Updating Critical Security Threats: A Case Study of Log4j Library. In Proceedings of the 22nd International Conference on Mining Software Repositories (MSR 2025). 319–323. doi:10.1109/MSR66628.2025.00058

[19] Daniel Venturini, Filipe Roseiro Cogo, Ivanilton Polato, Marco A Gerosa, and Igor Scaliante Wiese. 2023. I depended on you and you broke me: An empirical study of manifesting breaking changes in client packages. ACM Transactions on Software Engineering and Methodology 32, 4 (2023), 1–26.

[20] Laerte Xavier, Aline Brito, Andre Hora, and Marco Tulio Valente. 2017. Historical and impact analysis of API breaking changes: A large-scale study. In Proceedings of the 24th International Conference on Software Analysis, Evolution and Reengineering (SANER 2017). IEEE, 138–147.

[21] Jerin Yasmin, Yuan Tian, and Jinqiu Yang. 2020. A First Look at the Deprecation of RESTful APIs: An Empirical Study. In Proceedings of the 36th International Conference on Software Maintenance and Evolution (ICSME 2020). 151–161. doi:10.1109/ICSME46990.2020.00024

[22] Fiorella Zampetti, Luca Ponzanelli, Gabriele Bavota, Andrea Mocci, Massimiliano Di Penta, and Michele Lanza. 2017. How developers document pull requests with external references. In Proceedings of the 25th International Conference on Program Comprehension (ICPC 2017). IEEE, 23–33.
