Toward an Understanding of Developer Behaviour while Using Bug Localization Tools

Michel Wermelinger; Pablo Diaz Pedreira; Tamara Lopez

arxiv: 2605.04828 · v1 · submitted 2026-05-06 · 💻 cs.SE



Pith reviewed 2026-05-08 16:14 UTC · model grok-4.3


keywords: bug localization, developer behavior, qualitative study, tool adoption, software debugging, think-aloud protocol, problem solving


The pith

Developers use bug localization tools through complex interactions that go beyond the tools' reported accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper describes a qualitative study in which eleven developers completed four realistic bug localization tasks while thinking aloud, using a tool that provided varying levels of support information. The sessions revealed three main aspects of practice: specific ways developers interact with the tool, the significant role played by social and contextual information, and individual problem-solving strategies. The work indicates that bug localization is a multifaceted activity and that tool adoption decisions hinge on factors other than accuracy alone. This matters because it may help explain why technically strong tools often see limited uptake in actual development work.

Core claim

The study shows that developers interact with bug localization tools in particular patterns, draw heavily on social and contextual information outside the tool's output, and employ diverse problem-solving approaches during the task. These observations establish that bug localization is complex and that the adoption of effective tools depends on more than their accuracy.

What carries the argument

Qualitative analysis of think-aloud protocols collected during controlled bug localization tasks that supplied different amounts of tool support information.

If this is right

Tool interfaces should support access to social and contextual information alongside code suggestions.
Evaluation of bug localization tools must include measures of workflow fit beyond precision and recall.
Developers apply varied problem-solving styles, so tools need flexibility rather than a single prescribed workflow.
Adoption decisions involve factors outside the tool itself, such as team communication channels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The findings point to a possible reason many research prototypes remain unused despite strong benchmark scores.
Future designs could embed lightweight sharing features so contextual notes travel with the tool output.
Replicating the study across different team sizes and cultures could test whether the observed social influences vary systematically.

Load-bearing premise

The interaction patterns and decision factors observed among eleven participants, in a lab setting with think-aloud instructions, are assumed to represent how developers behave with such tools in everyday professional work.

What would settle it

A field observation of many developers in their normal projects showing that a high-accuracy bug localization tool is adopted or rejected primarily according to its precision score rather than social or contextual factors.

Figures

Figures reproduced from arXiv: 2605.04828 by Michel Wermelinger, Pablo Diaz Pedreira, Tamara Lopez.

**Figure 1.** IntelliJ IDE showing BR 54095 for Task 4. Tool view.

Original abstract

Bug fixing is a complex and time-consuming task in software development. Bug localization research tends to focus on the accuracy of automated tools that suggest source code files for developers to look at. However, little is known about how developers use these tools in practice. This paper reports on an ongoing qualitative user study. Eleven participants worked through four realistic bug localization tasks in a controlled environment and were given varying levels of support information offered by a specialized tool. Participants were asked to think aloud in a semi-structured interview session. The preliminary findings provide insight into three aspects of practice: how developers interact with tools, the role social and contextual information plays, and problem solving. The study demonstrates that bug localization is complex and suggests that the adoption of effective tools depends on more than their accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a small preliminary qualitative study offering observations on how developers interact with bug localization tools, but the n=11 lab setup limits what it can say about real adoption.


The main thing here is a set of preliminary observations from eleven developers working through four realistic bug localization tasks with varying levels of tool support. The authors used think-aloud protocols and semi-structured interviews to look at tool interactions, the role of social and contextual information, and problem-solving approaches. The paper notes that bug localization is complex and that tool adoption likely depends on more than raw accuracy. That framing is a modest new angle beyond this subfield's usual focus on algorithm performance.

The setup, with controlled but realistic tasks and different support levels, is a reasonable way to surface behavioral patterns that accuracy-focused papers often miss. It gives tool designers something concrete to consider about usability and context.

The soft spots are exactly what the stress test flags. With eleven participants in a lab under a think-aloud protocol, it is hard to separate general patterns from individual quirks or study artifacts, and there is no production time pressure or team setting to test against. The abstract says little about how the data were coded or checked for consistency, which leaves the themes feeling tentative. Since the work is ongoing, the claims stay suggestive rather than firm.

This is mainly for software engineering researchers who build or evaluate developer tools and want to think about adoption factors. Someone already running qualitative studies in SE might pick up method details for their own designs. It deserves a serious referee who checks the full analysis and limitations sections, even though revisions on scope and generalizability will be needed.

Referee Report

2 major / 1 minor

Summary. The manuscript reports preliminary results from an ongoing qualitative user study in which eleven participants completed four realistic bug localization tasks in a controlled lab environment. Participants used a specialized tool offering varying levels of support information, thought aloud during the tasks, and participated in semi-structured interviews. The study examines three aspects of practice—tool interactions, the role of social and contextual information, and problem-solving strategies—and concludes that bug localization is complex and that effective tool adoption depends on factors beyond accuracy.

Significance. If the observations hold under further validation, the work provides useful qualitative insight into how developers actually engage with bug localization tools, moving beyond the dominant focus on algorithmic accuracy. The use of realistic tasks and standard methods (think-aloud protocols and semi-structured interviews) is a positive feature that grounds the findings in observable behavior.

major comments (2)

Abstract: the central claim that the study 'demonstrates that bug localization is complex' and that 'adoption of effective tools depends on more than their accuracy' rests on observations from only eleven participants in a lab setting; this small sample and artificial environment (no production stakes, team dynamics, or time pressure) limits the ability to distinguish genuine practice from study artifacts and weakens generalizability to real-world adoption decisions.
Abstract / implied methods section: no details are supplied on qualitative analysis procedures, theme derivation, inter-rater reliability, or how preliminary findings were validated, making it difficult to assess the rigor and trustworthiness of the reported insights on tool interaction, social/contextual information, and problem solving.

minor comments (1)

The manuscript should explicitly discuss the preliminary status and planned next steps (e.g., additional participants or validation) so readers can properly calibrate the strength of the current claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our preliminary qualitative study. We agree that the current manuscript would benefit from clearer framing of its limitations and expanded methodological detail. We address each major comment below and will revise the manuscript accordingly.

Point-by-point responses

Referee: Abstract: the central claim that the study 'demonstrates that bug localization is complex' and that 'adoption of effective tools depends on more than their accuracy' rests on observations from only eleven participants in a lab setting; this small sample and artificial environment (no production stakes, team dynamics, or time pressure) limits the ability to distinguish genuine practice from study artifacts and weakens generalizability to real-world adoption decisions.

Authors: We accept that the abstract's phrasing is too strong for a preliminary study. Qualitative work in software engineering commonly uses small samples to surface phenomena that warrant further investigation; the lab setting was deliberately chosen to enable think-aloud protocols and controlled task exposure. We will revise the abstract to replace 'demonstrates' with 'suggests' and to explicitly note the preliminary nature of the findings. We will also expand the discussion and limitations sections to address the absence of production stakes, team dynamics, and time pressure, and to clarify that the results are intended to generate hypotheses rather than to generalize directly to industrial practice. revision: yes
Referee: Abstract / implied methods section: no details are supplied on qualitative analysis procedures, theme derivation, inter-rater reliability, or how preliminary findings were validated, making it difficult to assess the rigor and trustworthiness of the reported insights on tool interaction, social/contextual information, and problem solving.

Authors: The referee is correct that the current version provides insufficient methodological transparency. Because this is a preliminary report from an ongoing study, the analysis description was kept brief. In the revision we will add a dedicated subsection under Methods that describes the thematic analysis process (iterative coding of think-aloud transcripts and interview data), how themes were derived and refined, and the steps taken for trustworthiness (e.g., reflexive memoing and peer debriefing). We will also note that, with a single analyst at this stage, formal inter-rater reliability metrics were not computed; any future multi-analyst validation will be reported when the full study is completed. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on direct empirical observations

full rationale

The paper is a qualitative user study reporting preliminary findings from think-aloud sessions with eleven participants on four bug localization tasks. It contains no equations, derivations, fitted parameters, or predictions that reduce to inputs by construction. Claims about complexity of bug localization and factors beyond tool accuracy are presented as direct interpretations of observed participant behavior and interactions, without self-definitional loops, self-citation load-bearing premises, or renaming of known results. The derivation chain is self-contained against the collected data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a qualitative empirical study with no mathematical derivations. No free parameters, axioms, or invented entities are introduced; the central claims rest on the interpretive validity of the collected interview and observation data.

pith-pipeline@v0.9.0 · 5424 in / 1082 out tokens · 51701 ms · 2026-05-08T16:14:58.660615+00:00 · methodology


Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

[1] Jorge Aranda and Gina Venolia. 2009. The secret life of bugs: Going past the errors and omissions in software repositories. In Proc. 31st Int’l Conf. on Software Engineering. IEEE, 298–308.

[2] Andrea J. Bingham. 2023. From Data Management to Actionable Findings: A Five-Phase Process of Qualitative Data Analysis. International Journal of Qualitative Methods 22 (Oct. 2023), 1–11.

[3] L. Braz, C. Aeberhard, G. Çalikli, and A. Bacchelli. 2022. Less is More: Supporting Developers in Vulnerability Detection during Code Review. In Proc. 44th Int’l Conf. on Software Engineering. ACM, 1317–1329.

[4] Oscar Chaparro, Juan Manuel Florez, and Andrian Marcus. 2019. Using bug descriptions to reformulate queries during text-retrieval-based bug localization. Empirical Software Engineering 24, 5 (Oct. 2019), 2947–3007.

[5] Bogdan Dit, Meghan Revelle, Malcom Gethers, and Denys Poshyvanyk. 2013. Feature location in source code: A taxonomy and survey. Journal of Software: Evolution and Process 25, 1 (Nov. 2013), 53–95.

[6] Abram Hindle and Curtis Onuczko. 2019. Preventing duplicate bug reports by continuously querying bug reports. Empirical Software Engineering 24, 2 (April 2019), 902–936.

[7] Thomas Hirsch and Birgit Hofer. 2021. What we can learn from how programmers debug their code. In Proc. 8th Int’l Workshop on Softw. Eng. Research and Industrial Practice. IEEE, 37–40.

[8] Pavneet Singh Kochhar, Xin Xia, David Lo, and Shanping Li. 2016. Practitioners’ expectations on automated fault localization. In Proc. 25th Int’l Symposium on Software Testing and Analysis. ACM, 165–176.

[9] An Ngoc Lam, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen. 2017. Bug Localization with Combination of Deep Learning and Information Retrieval. In Proc. 25th Int’l Conf. on Program Comprehension. IEEE, 218–229.

[10] Tien-Duy B. Le, Ferdian Thung, and David Lo. 2017. Will this localization tool be effective for this bug? Mitigating the impact of unreliability of information retrieval based bug localization tools. Empirical Software Engineering 22, 4 (Aug. 2017), 2237–2279.

[11] Wei Li, Qingan Li, Yunlong Ming, Weijiao Dai, Shi Ying, and Mengting Yuan. 2022. An empirical study of the effectiveness of IR-based bug localization for large-scale industrial projects. Empirical Software Engineering 27, 2 (2022), 47.

[12] Zhengmao Luo, Wenyao Wang, and Caichun Cen. 2023. Improving Bug Localization With Effective Contrastive Learning Representation. IEEE Access 11 (2023), 32523–32533.

[13] Binhang Qi, Hailong Sun, Wei Yuan, Hongyu Zhang, and Xiangxin Meng. 2022. DreamLoc: A Deep Relevance Matching-Based Framework for bug Localization. IEEE Transactions on Reliability 71, 1 (March 2022), 235–249.

[14] Paige Rodeghero, Cheng Liu, Paul W. McBurney, and Collin McMillan. 2015. An Eye-Tracking Study of Java Programmers and Application to Source Code Summarization. IEEE Transactions on Software Engineering 41, 11 (Nov. 2015), 1038–1054.

[15] Jonathan Sillito, Gail C. Murphy, and Kris De Volder. 2008. Asking and Answering Questions during a Programming Change Task. IEEE Transactions on Software Engineering 34, 4 (July 2008), 434–451.

[16] Shaohua Wang and David Lo. 2016. AmaLgam+: Composing Rich Information Sources for Accurate Bug Localization. Journal of Software: Evolution and Process 28, 10 (Oct. 2016), 921–942.

[17] Jacqueline Whalley and Nadia Kasto. 2014. A qualitative think-aloud study of novice programmers’ code writing strategies. In Conf. on Innovation & Technology in Computer Science Education. ACM, 279–284.

[18] Xi Xiao, Renjie Xiao, Qing Li, Jianhui Lv, Shunyan Cui, and Qixu Liu. 2023. BugRadar: Bug localization by knowledge graph link prediction. Information and Software Technology 162 (Oct. 2023), 107274.

[19] Yan Xiao, Jacky Keung, Qing Mi, and Kwabena E. Bennin. 2017. Improving Bug Localization with an Enhanced Convolutional Neural Network. In Proc. 24th Asia-Pacific Software Engineering Conf. IEEE, 338–347.

[20] Xin Ye, Razvan Bunescu, and Chang Liu. 2014. Learning to rank relevant files for bug reports using domain knowledge. In 22nd ACM SIGSOFT Int’l Symposium on Foundations of Software Engineering. ACM, 689–699.

[21] Xia Zhang, Ziye Zhu, and Yun Li. 2023. Enhancing Bug Localization through Bug Report Summarization. In Proc. Int’l Conf. on Data Mining. IEEE, 1541–1546.

[22] Jian Zhou, Hongyu Zhang, and David Lo. 2012. Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In Proc. 34th Int’l Conf. on Software Engineering. IEEE, 14–24.

[23] Thomas Zimmermann, Rahul Premraj, Nicolas Bettenburg, Sascha Just, Adrian Schröter, and Cathrin Weiss. 2010. What Makes a Good Bug Report? IEEE Transactions on Software Engineering 36, 5 (Sept. 2010), 618–643.
