Identifying Algorithm Names in Code Comments

Arnon Rungsawang; Bundit Manaskasemsak; Hideaki Hata; Jakapong Klainongsuang; Kenichi Matsumoto; Pattara Leelaprute; Yusuf Sulistyo Nugroho

arXiv: 1907.04557 · v1 · pith:5NAIJWPN · submitted 2019-07-10 · 💻 cs.SE

Jakapong Klainongsuang, Yusuf Sulistyo Nugroho, Hideaki Hata, Bundit Manaskasemsak, Arnon Rungsawang, Pattara Leelaprute, Kenichi Matsumoto

Pith reviewed 2026-05-24 23:50 UTC · model grok-4.3

classification 💻 cs.SE

keywords algorithm identification, code comments, N-grams, part of speech, open source projects, rule-based method, software comments

The pith

Algorithm names in code comments can be automatically extracted using N-grams ending with the word 'algorithm' and part-of-speech patterns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an automatic method to identify algorithm names in code comments. Developers frequently mention the algorithms they use in comments, which could supply data for machine learning tasks like generating API sequences or comments. The method extracts N-gram phrases ending with 'algorithm' and applies rules based on part-of-speech patterns to select appropriate names. Evaluation shows these rules reach precision and recall above 0.70. The rules are then used on comments from active open-source projects in seven languages to find commonly mentioned algorithms.
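The candidate-generation step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The tokenizer regex, the function names `tokenize` and `candidate_ngrams`, and the `max_n=4` window are hypothetical choices, and only the singular word 'algorithm' is matched.

```python
import re

# Hypothetical tokenizer: lowercase word tokens, keeping internal apostrophes
# and hyphens so that "Dijkstra's" or "divide-and-conquer" stay whole.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:['\-][a-z0-9]+)*")


def tokenize(comment: str) -> list[str]:
    """Split a comment into lowercase word tokens; punctuation is dropped."""
    return TOKEN_RE.findall(comment.lower())


def candidate_ngrams(comment: str, max_n: int = 4) -> list[str]:
    """Return every N-gram (2 <= N <= max_n) whose last word is 'algorithm'.

    Only candidates are produced here; deciding which ones are real
    algorithm names is left to a separate filtering step.
    """
    tokens = tokenize(comment)
    candidates = []
    for i, token in enumerate(tokens):
        if token != "algorithm":
            continue
        # Extend leftwards from 'algorithm' one word at a time.
        for n in range(2, max_n + 1):
            start = i - n + 1
            if start < 0:
                break  # not enough preceding words for a longer N-gram
            candidates.append(" ".join(tokens[start : i + 1]))
    return candidates


print(candidate_ngrams("// Uses Dijkstra's shortest path algorithm here"))
```

For the sample comment, this yields `path algorithm`, `shortest path algorithm`, and `dijkstra's shortest path algorithm`. Choosing among these overlapping candidates is the job of the part-of-speech rules.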

Core claim

The paper claims that N-grams ending with 'algorithm' combined with part-of-speech patterns produce rules that identify algorithm names in code comments with high precision and recall, allowing extraction from large comment collections in C, C++, Java, JavaScript, Python, PHP, and Ruby.

What carries the argument

N-grams whose final word is 'algorithm', refined by part-of-speech patterns into identification rules.
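One possible shape for the POS refinement is sketched below. It assumes the comment has already been tagged with Penn Treebank tags, for example by NLTK's `pos_tag`. The tag sets `NAME_TAGS` and `NOUN_TAGS`, the function name, and the rule itself are illustrative assumptions, not the paper's actual patterns. The rule keeps the maximal run of modifiers before 'algorithm', but only if that run contains a noun.

```python
# Hypothetical tag sets (Penn Treebank tags) allowed inside an algorithm name.
NAME_TAGS = {"NN", "NNS", "NNP", "NNPS", "JJ", "JJS", "POS", "CD"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_names(tagged: list[tuple[str, str]]) -> list[str]:
    """Return algorithm-name phrases found in one POS-tagged comment.

    For each 'algorithm' token, take the maximal run of NAME_TAGS tokens
    directly before it, and keep the phrase only if that run has a noun.
    """
    names = []
    for i, (word, _tag) in enumerate(tagged):
        if word.lower() != "algorithm":
            continue
        start = i
        while start > 0 and tagged[start - 1][1] in NAME_TAGS:
            start -= 1
        modifiers = tagged[start:i]
        if any(tag in NOUN_TAGS for _word, tag in modifiers):
            names.append(" ".join(w for w, _t in tagged[start : i + 1]))
    return names


examples = [
    [("Uses", "VBZ"), ("the", "DT"), ("quick", "JJ"), ("sort", "NN"), ("algorithm", "NN")],
    [("this", "DT"), ("algorithm", "NN")],
    [("a", "DT"), ("simple", "JJ"), ("algorithm", "NN")],
]
for tagged in examples:
    print(extract_names(tagged))
```

This hypothetical rule keeps `quick sort algorithm` but rejects `this algorithm` and `simple algorithm`, whose modifiers contain no noun. It would equally reject genuine names built only from adjectives, such as `greedy algorithm`. That is the kind of miss a recall figure has to absorb.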

If this is right

Code comments become a source of labeled data for machine learning in software engineering.
Commonly used algorithms can be listed from real projects across multiple languages.
The approach works on a large scale without needing manual labeling for each project.
Similar techniques might identify other specific terms in comments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the rules generalize, they could support better training of models for code documentation.
Analysis of extracted names might show differences in algorithm usage by language or project type.
The list of names could aid in developing algorithm recommendation tools for developers.

Load-bearing premise

The assumption that N-grams ending with 'algorithm' and POS patterns will capture algorithm names reliably without many false positives or misses across varied code.

What would settle it

Finding that manual inspection of extracted names from additional projects yields precision or recall below 0.70.
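Checking that threshold is simple arithmetic once extracted names have been compared against manually labeled ones. The sketch below scores predictions as `(comment_id, phrase)` pairs. The pair representation, the function names, and the toy data are hypothetical.

```python
Pair = tuple[int, str]  # (comment_id, extracted phrase)


def precision_recall(predicted: set[Pair], gold: set[Pair]) -> tuple[float, float]:
    """Precision and recall of predicted pairs against manually labeled pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def meets_bar(precision: float, recall: float, threshold: float = 0.70) -> bool:
    """True only if both metrics reach the threshold."""
    return precision >= threshold and recall >= threshold


# Toy data: 4 predictions, 3 of them correct; 5 gold names in total.
predicted = {(1, "quick sort algorithm"), (2, "hashing algorithm"),
             (3, "simple algorithm"), (4, "dijkstra's algorithm")}
gold = {(1, "quick sort algorithm"), (2, "hashing algorithm"),
        (4, "dijkstra's algorithm"), (5, "rsa algorithm"), (6, "lzw algorithm")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} meets_bar={meets_bar(p, r)}")
```

On the toy data this prints `precision=0.75 recall=0.60 meets_bar=False`. Precision clears 0.70 but recall does not, so both metrics must be reported to support the claim.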

Original abstract

For recent machine-learning-based tasks like API sequence generation, comment generation, and document generation, a large amount of data is needed. When software developers implement algorithms in code, we find that they often mention algorithm names in code comments. Code annotated with such algorithm names can be a valuable data source. In this paper, we propose an automatic method of algorithm name identification. The key idea is extracting important N-grams whose last word is 'algorithm'. We also consider part-of-speech patterns to derive rules for appropriate algorithm name identification. Our rule evaluation produced high precision and recall values (more than 0.70). We apply our rules to extract algorithm names from a large number of comments in active FLOSS projects written in seven programming languages (C, C++, Java, JavaScript, Python, PHP, and Ruby) and report commonly mentioned algorithm names in code comments.
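The final reporting step, counting commonly mentioned names, could be sketched with `collections.Counter`. The per-language breakdown, the `(language, name)` record format, the function name, and the sample records are all assumptions for illustration. None of them is data from the paper.

```python
from collections import Counter, defaultdict


def top_names_by_language(records, k=3):
    """Return the k most frequent names per language.

    records: iterable of (language, algorithm_name) pairs, one per mention.
    Ties keep the order in which names were first encountered.
    """
    per_language = defaultdict(Counter)
    for language, name in records:
        per_language[language][name] += 1
    return {lang: counts.most_common(k) for lang, counts in per_language.items()}


records = [
    ("Java", "quick sort algorithm"),
    ("Java", "dijkstra's algorithm"),
    ("Java", "quick sort algorithm"),
    ("Python", "md5 algorithm"),
    ("Python", "md5 algorithm"),
    ("Python", "quick sort algorithm"),
]
print(top_names_by_language(records, k=1))
```

Counting per mention, rather than per distinct comment or project, is one design choice among several. A single project that repeats a name many times would dominate these counts.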

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A simple N-gram plus POS rule set for algorithm names in comments that might help dataset curation, but the evaluation looks under-specified and risks overstated generalization.

The paper's core contribution is a small set of hand-crafted rules: pull N-grams that end with the word 'algorithm', then filter them with part-of-speech patterns to decide whether the preceding words name an actual algorithm. The authors ran the rules on comments from active projects in seven languages and listed the most frequent names that surfaced. That part is straightforward and could be useful if you need quick labels for ML4SE datasets on API sequences or comment generation. They also report precision and recall above 0.70 on their rule evaluation, which is the headline number. The work is narrow by design and does not claim to solve broader open problems in mining or NLP for code; it delivers one targeted heuristic and shows its output on real FLOSS data.

The soft spot is the evaluation. The description leaves unclear whether the rules were developed and tuned on the same comments later used for the precision/recall numbers, or whether any held-out or cross-project split was used. Without that, the claim that the method works reliably across coding styles and the seven languages rests on thin evidence. No baselines are mentioned in the abstract, and there is no indication of released code or data that would let someone reproduce the extraction. The citation pattern is light and mostly points to prior mining work rather than deep NLP literature.

This paper is for readers who build datasets from GitHub comments and want a quick filter rather than a general method. It is coherent on its own terms and engages honestly with a practical task, so it clears the bar for serious refereeing even if the evaluation section needs strengthening. I would bring it to a reading group only if someone were specifically looking for rule-based mining examples. I would not cite it myself in the next year unless the full methods show proper independent testing.

Referee Report

3 major / 2 minor

Summary. The manuscript describes a rule-based method for identifying algorithm names in source code comments. The approach extracts N-grams that end with the word 'algorithm' and filters them using part-of-speech patterns. The authors report that their rules achieve precision and recall values exceeding 0.70 on an evaluation set. They then apply the rules to comments from active free/libre open source software (FLOSS) projects in seven languages (C, C++, Java, JavaScript, Python, PHP, Ruby) and present commonly mentioned algorithm names.

Significance. If the evaluation methodology is sound, the work could support creation of large annotated datasets for downstream ML tasks in software engineering such as comment generation and API sequence prediction. The multi-language application to real FLOSS projects is a positive aspect of the large-scale extraction step.

major comments (3)

[Evaluation] Evaluation section: The manuscript reports precision and recall >0.70 but provides no information on the size or construction of the labeled evaluation set, nor whether this set was held out from the data used to derive the N-gram and POS rules. This is load-bearing for the generalization claim across seven languages and coding styles.
[Large-scale extraction] Large-scale extraction section: No manual validation, sampling, or error analysis is reported for the algorithm names extracted from the FLOSS comment corpus. Without this, the claim that the rules reliably identify algorithm names in active projects cannot be assessed.
[Method and evaluation] Method and evaluation sections: The paper presents no baseline comparisons (e.g., keyword matching without POS filtering or simple frequency-based extraction) against which the contribution of the POS patterns can be measured.
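The first major comment asks whether the evaluation data were held out from rule development. A minimal sketch of a cross-project split is below. It assumes each comment record carries a `project` field, which is hypothetical, and it keeps every project entirely on one side. That way, rules tuned on development comments are never scored on comments from the same project. The 30% evaluation fraction, the seed, and the function name are arbitrary illustrative choices.

```python
import random


def split_by_project(comments, eval_fraction=0.3, seed=0):
    """Split comment records so that no project appears on both sides.

    comments: list of dicts with a (hypothetical) "project" key.
    Returns (development, held_out) lists of records.
    """
    projects = sorted({c["project"] for c in comments})
    random.Random(seed).shuffle(projects)  # deterministic for a fixed seed
    n_eval = max(1, round(len(projects) * eval_fraction))
    eval_projects = set(projects[:n_eval])
    development = [c for c in comments if c["project"] not in eval_projects]
    held_out = [c for c in comments if c["project"] in eval_projects]
    return development, held_out


comments = [{"project": f"proj{i % 10}", "text": f"comment {i}"} for i in range(50)]
dev, held_out = split_by_project(comments)
print(len(dev), len(held_out))
```

With 10 toy projects of 5 comments each, this prints `35 15`: three whole projects go to the held-out side. The same idea extends to splitting by language if cross-language generalization is the claim being tested.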

minor comments (2)

[Abstract] Abstract: The phrase 'high precision and recall values (more than 0.70)' should be replaced with the exact measured values and the size of the evaluation set for clarity.
The manuscript would benefit from a short related-work subsection situating the heuristic against prior NLP techniques applied to code comments.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly identify areas where additional detail would strengthen the paper. We address each point below and will revise the manuscript to incorporate the requested information.

Referee: [Evaluation] Evaluation section: The manuscript reports precision and recall >0.70 but provides no information on the size or construction of the labeled evaluation set, nor whether this set was held out from the data used to derive the N-gram and POS rules. This is load-bearing for the generalization claim across seven languages and coding styles.

Authors: We agree that the Evaluation section lacks necessary details. In the revised manuscript we will add a description of the labeled set size, its construction via manual annotation of N-grams from code comments, and explicit confirmation that the evaluation instances were held out from the development data used to formulate the N-gram and POS rules. revision: yes
Referee: [Large-scale extraction] Large-scale extraction section: No manual validation, sampling, or error analysis is reported for the algorithm names extracted from the FLOSS comment corpus. Without this, the claim that the rules reliably identify algorithm names in active projects cannot be assessed.

Authors: We acknowledge the absence of validation for the large-scale results. We will add a new subsection reporting a manual sampling and error analysis of extracted names from the FLOSS corpus (e.g., precision on a random sample of 100 instances) to support the reliability claim. revision: yes
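The proposed manual check can be made reproducible by fixing the random seed. The function names, the seed, and the True/False label format below are hypothetical. The precision it reports applies only to the sampled instances.

```python
import random


def sample_for_validation(extracted, k=100, seed=42):
    """Draw a reproducible simple random sample (without replacement)."""
    items = list(extracted)
    if len(items) <= k:
        return items  # nothing to sample from; inspect everything
    return random.Random(seed).sample(items, k)


def sample_precision(labels):
    """labels: one bool per sampled instance (True = a real algorithm name)."""
    return sum(labels) / len(labels) if labels else 0.0


extracted = [(i, f"candidate {i} algorithm") for i in range(1000)]
sample = sample_for_validation(extracted)
print(len(sample), len(set(sample)))
```

This prints `100 100`, meaning 100 distinct instances were drawn. With a sample of this size, the resulting precision estimate still carries a sampling margin of several percentage points, which is worth reporting alongside it.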
Referee: [Method and evaluation] Method and evaluation sections: The paper presents no baseline comparisons (e.g., keyword matching without POS filtering or simple frequency-based extraction) against which the contribution of the POS patterns can be measured.

Authors: We agree that baselines would clarify the value of the POS patterns. In the revision we will add a comparison subsection evaluating a keyword-only baseline (N-grams ending in 'algorithm' without POS filtering) and a frequency-based extraction method on the same evaluation set, reporting their precision and recall. revision: yes
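The proposed ablation can be sketched as two extractors scored on the same labeled set. Both extractors and the five-comment dataset are hypothetical and deliberately tiny. The keyword-only baseline takes the word before every 'algorithm'. The POS-filtered variant keeps that word only when it is tagged as a noun.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def keyword_baseline(tagged):
    """Predict '<previous word> algorithm' for every 'algorithm' token."""
    return {
        f"{tagged[i - 1][0]} algorithm".lower()
        for i, (word, _tag) in enumerate(tagged)
        if i > 0 and word.lower() == "algorithm"
    }


def pos_filtered(tagged):
    """Same as the baseline, but only if the previous word is tagged as a noun."""
    return {
        f"{tagged[i - 1][0]} algorithm".lower()
        for i, (word, _tag) in enumerate(tagged)
        if i > 0 and word.lower() == "algorithm" and tagged[i - 1][1] in NOUN_TAGS
    }


def evaluate(extractor, dataset):
    """Micro-averaged precision and recall over (comment_id, name) pairs."""
    predicted = {(cid, name) for cid, tagged, _gold in dataset for name in extractor(tagged)}
    gold = {(cid, name) for cid, _tagged, names in dataset for name in names}
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical labeled set: (comment_id, POS-tagged tokens, gold names).
DATASET = [
    (1, [("this", "DT"), ("algorithm", "NN")], set()),
    (2, [("RSA", "NNP"), ("algorithm", "NN")], {"rsa algorithm"}),
    (3, [("a", "DT"), ("simple", "JJ"), ("algorithm", "NN")], set()),
    (4, [("hashing", "NN"), ("algorithm", "NN")], {"hashing algorithm"}),
    (5, [("greedy", "JJ"), ("algorithm", "NN")], {"greedy algorithm"}),
]

for label, extractor in [("keyword-only", keyword_baseline), ("POS-filtered", pos_filtered)]:
    p, r = evaluate(extractor, DATASET)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

On this toy set, the baseline prints `precision=0.60 recall=1.00` and the POS-filtered variant prints `precision=1.00 recall=0.67`. This illustrates the precision/recall trade-off such a comparison would quantify. The numbers say nothing about the paper's data.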

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper proposes a heuristic rule set that extracts N-grams ending in the word 'algorithm' and filters them via POS patterns. These rules are evaluated on external comment data from FLOSS projects to report precision and recall above 0.70, then applied to a larger corpus across seven languages. No equations, fitted parameters, self-citations, or self-definitional steps appear in the provided description. The central claim rests on direct evaluation against held-out or external data rather than any reduction of outputs to inputs by construction, so the derivation is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard NLP assumptions about the utility of N-grams and POS tagging for extracting technical terms from comments. No free parameters or invented entities are described in the abstract.

axioms (1)

domain assumption: Part-of-speech taggers produce reliable tags on code comments.
The method derives identification rules from POS patterns.

pith-pipeline@v0.9.0 · 5709 in / 1166 out tokens · 45449 ms · 2026-05-24T23:50:23.866868+00:00 · methodology
