Enhancing Python Compiler Error Messages via Stack Overflow
Pith reviewed 2026-05-25 14:55 UTC · model grok-4.3
The pith
Stack Overflow threads can be automatically mined and summarized to enhance Python compiler error messages inside an IDE.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Pycee automatically queries Stack Overflow to provide customised and summarised information about Python compiler errors within the Sublime Text IDE. When evaluated in a user study, the majority of the 16 participants agreed that Pycee was helpful, and they generally preferred it to a baseline using official Python documentation due to its concrete suggestions for fixes and example code.
What carries the argument
Pycee, an IDE plugin that automatically queries Stack Overflow and repackages relevant thread content as enhanced error messages.
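The two ends of such a pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Pycee's actual implementation: the function names `build_query` and `summarise`, the stopword list, and the frequency-based sentence scoring are all hypothetical choices made here, and the retrieval step (fetching threads from Stack Overflow) is left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this sketch.
STOPWORDS = {"the", "a", "an", "is", "it", "to", "of", "and", "you", "for", "in", "that"}


def build_query(error_line: str) -> str:
    """Turn the last line of a traceback into a generic search query."""
    error_type, _, message = error_line.partition(":")
    # Drop quoted, program-specific identifiers so the query generalises.
    message = re.sub(r"'[^']*'", "", message)
    words = re.findall(r"[A-Za-z]+", message)
    return " ".join(["python", error_type.strip()] + words)


def summarise(answer_text: str, max_sentences: int = 2) -> str:
    """Keep the sentences whose non-stopword terms are most frequent overall."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer_text) if s.strip()]
    all_words = [w for w in re.findall(r"[a-z]+", answer_text.lower()) if w not in STOPWORDS]
    freq = Counter(all_words)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)
```

For example, `build_query("NameError: name 'foo' is not defined")` yields `python NameError name is not defined`. Stripping the quoted identifier is a design choice of this sketch: a name that exists only in the user's program is unlikely to match any thread.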
If this is right
- Programmers receive fix suggestions and examples directly in the editor instead of searching separately.
- Official documentation is no longer the only source for improving error messages.
- Time spent resolving common Python errors can decrease for users of the enhanced messages.
- The same reuse of online Q&A content becomes feasible for other programming tasks beyond error messages.
Where Pith is reading between the lines
- The approach could be tested on languages other than Python where Stack Overflow has dense error discussions.
- If the summarization step is made more robust, the same pipeline might apply to runtime errors or warnings.
- Integration into other editors would let the benefit reach programmers who do not use Sublime Text.
Load-bearing premise
Stack Overflow threads contain accurate, relevant, and summarizable information about Python errors that improves programmer understanding without introducing new confusion or incorrect advice.
What would settle it
A controlled study in which programmers using the Stack Overflow summaries take longer to fix errors or introduce more new bugs than those using only the official documentation would falsify the central claim.
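One way such a comparison could be analysed is an exact permutation test on time-to-fix. The sketch below is only an illustration of that idea: the function name `permutation_p_value` is hypothetical, the paper reports no timing data, and the numbers in the usage note are placeholders invented purely to exercise the code.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test.

    Returns the fraction of relabelings of the pooled data in which
    mean(A) - mean(B) is at least as large as the observed difference.
    """
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    at_least_as_extreme = 0
    splits = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for indices in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in indices)
        diff = a_sum / n_a - (total - a_sum) / n_b
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
        splits += 1
    return at_least_as_extreme / splits
```

With placeholder times in minutes, `permutation_p_value([9, 11, 12, 14], [5, 6, 7, 8])` asks how often a random relabeling would make the first group (here standing in for the Stack Overflow condition) look at least this much slower than the second. Exhaustive enumeration is only practical for small groups; a larger study would sample relabelings instead.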
Original abstract
Background: Compilers tend to produce cryptic and uninformative error messages, leaving programmers confused and requiring them to spend precious time to resolve the underlying error. To find help, programmers often take to online question-and-answer forums such as Stack Overflow to start discussion threads about the errors they encountered.
Aims: We conjecture that information from Stack Overflow threads which discuss compiler errors can be automatically collected and repackaged to provide programmers with enhanced compiler error messages, thus saving programmers' time and energy.
Method: We present Pycee, a plugin integrated with the popular Sublime Text IDE to provide enhanced compiler error messages for the Python programming language. Pycee automatically queries Stack Overflow to provide customised and summarised information within the IDE. We evaluated two Pycee variants through a think-aloud user study during which 16 programmers completed Python programming tasks while using Pycee.
Results: The majority of participants agreed that Pycee was helpful while completing the study tasks. When compared to a baseline relying on the official Python documentation to enhance compiler error messages, participants generally preferred Pycee in terms of helpfulness, citing concrete suggestions for fixes and example code as major benefits.
Conclusions: Our results confirm that data from online sources such as Stack Overflow can be successfully used to automatically enhance compiler error messages. Our work opens up venues for future work to further enhance compiler error messages as well as to automatically reuse content from Stack Overflow for other aspects of programming.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Pycee, a Sublime Text IDE plugin that automatically queries Stack Overflow to retrieve, summarize, and display customized information alongside Python compiler error messages. Two variants are evaluated in a think-aloud study with 16 participants who completed programming tasks; results show that the majority found Pycee helpful and generally preferred it to a baseline that enhanced messages using official Python documentation, primarily due to concrete fix suggestions and example code. The authors conclude that Stack Overflow data can be successfully reused to enhance compiler error messages.
Significance. If the central claim holds, the work demonstrates a practical approach to repurposing online Q&A content for IDE tooling, supported by an empirical user study with direct baseline comparison. This provides qualitative evidence of user preference and opens directions for similar applications to other languages or programming activities. The inclusion of a controlled think-aloud protocol with participant feedback is a positive aspect of the evaluation design.
major comments (2)
- [Results / User Study] §Results / User Study: The claim that Stack Overflow data can be 'successfully used' to enhance error messages is supported only by subjective reports of helpfulness and preference from 16 participants. No objective metrics (task completion rates, time-to-fix, or pre/post understanding scores) are reported, so the evidence does not directly address whether the SO-derived content improves error resolution or merely appears appealing.
- [Method] §Method: The baseline condition uses 'official Python documentation to enhance compiler error messages,' yet the paper provides no description of how this baseline was implemented or how its content was selected and presented, preventing assessment of whether the observed preference is attributable to SO content specifically or to differences in summarization style.
minor comments (1)
- [Abstract] Abstract: The abstract states that 'two Pycee variants' were evaluated but does not indicate what distinguishes the variants or which results apply to each.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments below, indicating planned revisions where appropriate.
Point-by-point responses
- Referee: [Results / User Study] §Results / User Study: The claim that Stack Overflow data can be 'successfully used' to enhance error messages is supported only by subjective reports of helpfulness and preference from 16 participants. No objective metrics (task completion rates, time-to-fix, or pre/post understanding scores) are reported, so the evidence does not directly address whether the SO-derived content improves error resolution or merely appears appealing.
  Authors: We agree that the evaluation relies on subjective participant reports from a think-aloud study with 16 programmers rather than objective measures such as task completion time or error resolution accuracy. The study design prioritised qualitative insights into perceived helpfulness and preference under realistic conditions, which aligns with the goal of assessing tool usability. However, the concluding claim that Stack Overflow data can be 'successfully used' is stronger than the subjective evidence warrants. We will revise the Conclusions section to state that the results provide evidence of user preference for the SO-enhanced messages, without claiming objective improvements in error resolution.
  revision: partial
- Referee: [Method] §Method: The baseline condition uses 'official Python documentation to enhance compiler error messages,' yet the paper provides no description of how this baseline was implemented or how its content was selected and presented, preventing assessment of whether the observed preference is attributable to SO content specifically or to differences in summarization style.
  Authors: We accept this criticism. The baseline was created by selecting relevant excerpts from the official Python documentation for each encountered error and formatting them similarly to the Pycee output. We will expand the Method section with a full description of baseline content selection, summarisation approach, and presentation format to enable clearer comparison.
  revision: yes
Circularity Check
No circularity: empirical evaluation with external user feedback
The paper describes an empirical tool (Pycee) that queries Stack Overflow and a think-aloud user study with 16 participants comparing it to official documentation. No equations, fitted parameters, predictions, or derivations appear in the abstract or described method. The central claim rests on participant preference ratings, which constitute external feedback rather than any self-referential reduction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Stack Overflow threads contain accurate and relevant information about Python compiler errors that can be automatically retrieved and summarized to help programmers.