Deterministic vs. Probabilistic Summarisation: An Empirical Trade-off Study in Design Pattern Centric Java Code

Christoph Treude; Najam Nazar

arxiv: 2605.21943 · v1 · pith:6C5L3GLJnew · submitted 2026-05-21 · 💻 cs.SE

Pith reviewed 2026-05-22 05:14 UTC · model grok-4.3

keywords code summarization · deterministic methods · probabilistic methods · large language models · design patterns · Java code · empirical study · trade-off analysis

The pith

Probabilistic code summarizers deliver richer semantic context than deterministic ones, which produce shorter and fully reproducible outputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper conducts a controlled comparison of deterministic heuristic pipelines and probabilistic LLM pipelines for summarizing design-pattern-centric Java code. It tests three approaches on 150 files from three open-source projects covering nine patterns, measuring performance with BERTScore, cosine similarity, and an LLM judge scoring accuracy, conciseness, adequacy, code-context awareness, and design-pattern fidelity. The results identify a clear trade-off: probabilistic outputs align better with human references on meaning and context, while deterministic outputs remain shorter and identical across runs. This matters for developers choosing summarization tools because it clarifies when depth or consistency should take priority. Statistical tests and sensitivity checks support the observed differences while noting variability in the LLM results.

Core claim

The study shows that probabilistic summaries generated by the Mixtral LLM achieve stronger semantic alignment and broader contextual coverage than those from a rule-based NLG pipeline or a SWUM-based approach. Deterministic methods, by contrast, consistently produce more concise and exactly reproducible summaries. These differences hold across automated similarity metrics and rubric judgments, with relative trends remaining stable despite prompt sensitivity in the probabilistic case.

What carries the argument

The empirical comparison of a rule-based natural language generation pipeline, a Software Word Usage Model approach, and an LLM pipeline using Mixtral on 150 design-pattern Java files, evaluated through BERTScore, cosine similarity, and five-dimension rubric scoring by Llama 3.
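
To make the cosine-similarity part of that evaluation concrete, here is a minimal Python sketch. It assumes plain term-frequency vectors over lowercased tokens; the text above does not say which representation the authors used, so `tokenize`, `cosine_similarity`, and the two example sentences are illustrative only. BERTScore is left out because it needs a pretrained model.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(candidate: str, reference: str) -> float:
    """Cosine of the angle between two term-frequency vectors (assumed representation)."""
    a, b = Counter(tokenize(candidate)), Counter(tokenize(reference))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty summary shares nothing with the reference
    return dot / (norm_a * norm_b)


# Hypothetical human reference and a short candidate summary.
reference = "Observer pattern notifies registered listeners of state changes"
short = "Notifies listeners of changes"
print(round(cosine_similarity(short, reference), 3))
```

With term-frequency vectors, a summary that covers only part of the reference vocabulary scores below 1.0 even when every word it uses appears in the reference.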

If this is right

Teams prioritizing semantic depth and contextual accuracy in code summaries should favor probabilistic LLM pipelines.
Teams needing brief, identical outputs across runs should select deterministic rule-based or SWUM pipelines.
LLM-based summarization requires multiple runs or prompt tuning to manage output variability.
The identified trade-off supplies concrete selection rules for choosing summarization techniques in documentation tools.
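
The reproducibility side of this trade-off can be checked mechanically. The Python sketch below counts distinct outputs over repeated runs on one input; `rule_based` and `make_sampler` are hypothetical stand-ins for a deterministic pipeline and a sampling LLM, not the pipelines evaluated in the paper.

```python
import random
from typing import Callable


def distinct_outputs(summarise: Callable[[str], str], source: str, runs: int = 5) -> int:
    """Run a summariser repeatedly on one input and count distinct outputs."""
    return len({summarise(source) for _ in range(runs)})


def rule_based(source: str) -> str:
    # Hypothetical deterministic stand-in: the same input always gives the same text.
    return f"Declares {source.count('class ')} class(es)."


def make_sampler(seed: int) -> Callable[[str], str]:
    # Hypothetical stochastic stand-in: wording is sampled on every call.
    rng = random.Random(seed)
    verbs = ["Implements", "Provides", "Defines", "Encapsulates"]

    def summarise(source: str) -> str:
        return f"{rng.choice(verbs)} {source.count('class ')} class(es)."

    return summarise


java_source = "public class Subject { } class Listener { }"
print(distinct_outputs(rule_based, java_source))
print(distinct_outputs(make_sampler(seed=0), java_source, runs=20))
```

A count of 1 means the pipeline is exactly reproducible on that input; anything higher signals the run-to-run variability that multiple runs or prompt tuning would need to manage.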

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hybrid pipelines that combine deterministic brevity with selective probabilistic enrichment could mitigate the observed limitations of each approach.
Repeating the comparison on non-Java languages or non-design-pattern code would test whether the trade-off generalizes beyond the current testbed.
Replacing the LLM judge with human evaluators on the same summaries would provide a direct check on the reliability of the rubric proxy used here.

Load-bearing premise

That rubric scores assigned by Llama 3 across accuracy, conciseness, adequacy, code-context awareness, and design-pattern fidelity serve as a reliable stand-in for human expert judgments of summary quality.
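
If human ratings were available for the same summaries, this premise could be probed with a rank correlation and a mean absolute difference between the two sets of rubric scores. The Python sketch below shows the arithmetic only; the score lists are made-up placeholders, not data from the paper, and the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (var_x * var_y) ** 0.5


def mean_absolute_difference(x: list[float], y: list[float]) -> float:
    """Average gap between paired scores, in rubric points."""
    return mean(abs(a - b) for a, b in zip(x, y))


# Placeholder 1-5 rubric scores for six summaries (not real data).
judge_scores = [4, 3, 5, 2, 4, 1]
human_scores = [5, 3, 4, 2, 4, 1]
print(spearman(judge_scores, human_scores))
print(mean_absolute_difference(judge_scores, human_scores))
```

A high correlation with a small absolute difference would support using the LLM judge as a proxy; a high correlation with a large, one-sided difference would point to a systematic offset instead.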

What would settle it

Direct human ratings in which experts assign higher accuracy or context scores to the deterministic summaries than to the probabilistic ones would undermine the claim that probabilistic methods hold an advantage in semantic depth.

Original abstract

Background: Automated code summarisation supports program comprehension and documentation, yet the relative strengths and limitations of deterministic (heuristic-based) and probabilistic (LLM-based) pipelines remain unclear. Aims: This paper presents a controlled empirical comparison of these paradigms for intent-oriented design-pattern code summarisation. Method: Using design-pattern-centric Java code as a structured testbed (150 files from three open-source repositories covering nine patterns), we compare a rule-based natural language generation (NLG) pipeline, a Software Word Usage Model (SWUM)-based approach, and a probabilistic pipeline based on the Mixtral LLM. Summaries are evaluated against human references using BERTScore and cosine similarity, complemented by rubric-based judgements produced by Llama 3 across five dimensions: accuracy, conciseness, adequacy, code-context awareness, and design-pattern fidelity. Statistical analysis includes Wilcoxon signed-rank tests (with effect sizes), Friedman tests with post-hoc corrections, and Spearman correlation for sensitivity analysis of rubric consistency. Results: Probabilistic summaries show stronger semantic alignment and richer contextual coverage, while deterministic approaches produce more concise and fully reproducible outputs. Prompt-sensitivity and multi-run analyses indicate variability in LLM outputs, though relative trends remain stable. Conclusions: A clear trade-off emerges: probabilistic methods favour semantic depth and contextual accuracy, whereas deterministic pipelines are preferable for brevity and reproducibility. These findings provide practical guidance for selecting code summarisation techniques.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper runs a controlled comparison of rule-based and LLM code summarizers on design-pattern Java files and surfaces the expected depth-versus-reproducibility trade-off, but its quality judgments rest on an unvalidated LLM rubric.

The main takeaway is a straightforward empirical head-to-head on 150 design-pattern Java files: probabilistic summaries from Mixtral score higher on semantic alignment and context, while the deterministic pipelines (rule-based NLG and SWUM) win on brevity and perfect reproducibility. The authors back this with BERTScore, cosine similarity, and Llama 3 rubric scores across accuracy, conciseness, adequacy, code-context awareness, and design-pattern fidelity, plus Wilcoxon, Friedman, and Spearman tests. Prompt-sensitivity checks are included, and the relative trends hold.

Referee Report

2 major / 2 minor

Summary. The paper conducts a controlled empirical comparison of deterministic (rule-based NLG and SWUM-based) versus probabilistic (Mixtral LLM) pipelines for intent-oriented summarization of design-pattern-centric Java code. It uses a corpus of 150 files from three open-source repositories covering nine patterns, evaluating summaries against human references via BERTScore and cosine similarity, supplemented by Llama 3 rubric scores on five dimensions (accuracy, conciseness, adequacy, code-context awareness, design-pattern fidelity). Statistical tests include Wilcoxon signed-rank (with effect sizes), Friedman with post-hoc corrections, and Spearman correlation. The central claim is a clear trade-off: probabilistic methods favor semantic depth and contextual accuracy, while deterministic pipelines are preferable for brevity and reproducibility.

Significance. If the results hold after addressing validation gaps, the work offers practical guidance for selecting summarization techniques in software engineering contexts such as documentation and program comprehension. The structured testbed using design patterns is a methodological strength, as is the combination of automatic metrics with rubric evaluation and the examination of LLM output variability. These elements could inform tool selection where reproducibility or semantic richness is prioritized.

major comments (2)

[Evaluation section (rubric-based judgements)] The manuscript relies on Llama 3 rubric scores across the five dimensions as a primary proxy for human quality judgments to support the trade-off claim, but reports no human-Llama 3 correlation study, inter-rater agreement metrics, or calibration against human coders on the 150-file corpus. This is load-bearing because one comparator is itself an LLM (Mixtral), raising the risk that any systematic bias in Llama 3 could inflate the apparent advantage of the probabilistic arm without reflecting genuine summary quality.
[Results section (statistical reporting)] The abstract and results describe stronger semantic alignment for probabilistic summaries via BERTScore/cosine and rubric scores, yet the manuscript does not report raw metric values, exact p-values, or effect sizes from the Wilcoxon signed-rank and Friedman tests for each dimension. Without these, the practical magnitude and reliability of the claimed trade-off cannot be fully assessed.

minor comments (2)

[Abstract] The abstract refers to 'three open-source repositories' without naming them; adding the repository identifiers would enhance reproducibility.
[Throughout] Acronyms such as SWUM and NLG are not consistently expanded on first use in each section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. The comments highlight important aspects of evaluation validity and statistical transparency that we will address in the revision to strengthen the work.

Referee: Evaluation section (rubric-based judgements): The manuscript relies on Llama 3 rubric scores across the five dimensions as a primary proxy for human quality judgments to support the trade-off claim, but reports no human-Llama 3 correlation study, inter-rater agreement metrics, or calibration against human coders on the 150-file corpus. This is load-bearing because one comparator is itself an LLM (Mixtral), raising the risk that any systematic bias in Llama 3 could inflate the apparent advantage of the probabilistic arm without reflecting genuine summary quality.

Authors: We agree that the absence of direct validation between Llama 3 rubric scores and human judgments represents a limitation, especially when evaluating outputs from another LLM. Our design used Llama 3 to enable scalable, consistent assessment across the full corpus where full human annotation would be resource-intensive. In the revised manuscript we will add a calibration subsection reporting results from a human study on a stratified subset of 30 files (balanced across patterns and repositories). Human raters will apply the same five-dimension rubric, and we will report Spearman correlations, mean absolute differences, and any inter-rater agreement statistics between Llama 3 and human scores. This addition will allow readers to assess the degree of alignment and any potential bias. Revision: yes.
Referee: Results section (statistical reporting): The abstract and results describe stronger semantic alignment for probabilistic summaries via BERTScore/cosine and rubric scores, yet the manuscript does not report raw metric values, exact p-values, or effect sizes from the Wilcoxon signed-rank and Friedman tests for each dimension. Without these, the practical magnitude and reliability of the claimed trade-off cannot be fully assessed.

Authors: We accept that fuller numerical reporting is necessary for readers to judge effect magnitude and statistical reliability. The revised Results section will include supplementary tables presenting (i) mean and standard deviation for every automatic metric and rubric dimension, (ii) exact (Bonferroni-corrected) p-values, and (iii) effect sizes (rank-biserial correlation for Wilcoxon tests and Kendall’s W for Friedman tests) for all pairwise and omnibus comparisons. These values will be cross-referenced in the text so that the practical significance of the reported trade-offs can be directly evaluated. Revision: yes.
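
As a rough illustration of the pairwise effect size named in this response, the Python sketch below computes the matched-pairs rank-biserial correlation from the Wilcoxon signed-rank sums and applies a Bonferroni adjustment. It assumes zero differences are dropped and tied absolute differences share average ranks; the function names and sample scores are hypothetical, and the p-values themselves would come from a statistics library.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def signed_rank_sums(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (W+, W-): rank sums of positive and negative paired differences."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


def rank_biserial(a: list[float], b: list[float]) -> float:
    """Matched-pairs rank-biserial correlation: (W+ - W-) / (W+ + W-)."""
    w_plus, w_minus = signed_rank_sums(a, b)
    total = w_plus + w_minus
    if total == 0:
        raise ValueError("all paired differences are zero")
    return (w_plus - w_minus) / total


def bonferroni(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical per-file similarity scores for two pipelines (not real data).
llm = [0.74, 0.71, 0.69, 0.77, 0.66]
nlg = [0.61, 0.55, 0.70, 0.57, 0.60]
print(rank_biserial(llm, nlg))
print(bonferroni([0.01, 0.04, 0.5]))
```

The effect size ranges from -1 to 1, with positive values meaning the first pipeline tends to score higher on the paired files.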

Circularity Check

0 steps flagged

No circularity: empirical comparison rests on external human references and standard metrics

The paper performs a direct empirical comparison of three distinct summarization pipelines (rule-based NLG, SWUM-based, Mixtral LLM) on a fixed corpus of 150 Java files. Results are derived from BERTScore and cosine similarity against human-written references, plus separate Llama 3 rubric scores on five dimensions, followed by standard non-parametric statistical tests (Wilcoxon, Friedman, Spearman). None of these steps involve self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the central trade-off claim to its own inputs by construction. The evaluation chain is anchored in external benchmarks (human references and established similarity metrics) rather than internal re-use of the paper's own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of standard statistical test assumptions and the representativeness of the 150-file testbed without introducing new free parameters, axioms beyond domain standards, or invented entities.

axioms (1)

standard math: Wilcoxon signed-rank test and Friedman test assumptions hold for the paired summary scores and multi-method comparisons.
Invoked for statistical analysis of differences between deterministic and probabilistic summaries.
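
For orientation, the Friedman statistic and a Kendall's W effect size derived from it can be sketched as follows. This Python version ranks the methods within each file and uses the textbook formula without a tie correction, which is a simplifying assumption; the score rows and function names are invented for illustration.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def friedman_chi_square(rows: list[list[float]]) -> float:
    """Friedman statistic for n rows (files) and k columns (methods), no tie correction."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        # Rank the k methods within this file, then accumulate per method.
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def kendalls_w(rows: list[list[float]]) -> float:
    """Kendall's W from the Friedman statistic: chi2 / (n * (k - 1))."""
    n, k = len(rows), len(rows[0])
    return friedman_chi_square(rows) / (n * (k - 1))


# Hypothetical scores: one row per file, columns = (rule-based NLG, SWUM, LLM).
scores = [
    [0.61, 0.58, 0.74],
    [0.55, 0.60, 0.71],
    [0.63, 0.59, 0.69],
    [0.57, 0.54, 0.77],
]
print(friedman_chi_square(scores), kendalls_w(scores))
```

W lies between 0 (the files disagree completely on how the methods rank) and 1 (every file ranks the methods identically).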
