TreeRanker: Fast and Model-agnostic Ranking System for Code Suggestions in IDEs

Arie van Deursen; Daniele Cipollone; Egor Bogomolov; Maliheh Izadi

arXiv: 2508.02455 · v1 · submitted 2025-08-04 · 💻 cs.SE · cs.AI · cs.IR


Pith reviewed 2026-05-19 01:07 UTC · model grok-4.3

Classification: 💻 cs.SE · cs.AI · cs.IR

Keywords: code completion, ranking, prefix tree, language models, IDEs, token scoring, model agnostic, greedy decoding


The pith

A prefix tree of completions lets any language model rank IDE suggestions via one greedy pass.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes ranking static code completions by first placing all valid suggestions into a prefix tree and then running a single greedy decoding pass with a language model to extract token-level scores along the tree. This produces a context-sensitive ranking that reflects actual token probabilities without requiring beam search, custom prompts, or any changes to the underlying model. The method stays lightweight enough to run inside existing IDE tools and works with models already deployed for code completion. If the ranking proves useful in practice, developers would see better suggestions higher in the list, increasing the chance that correct completions are noticed and accepted during normal coding.

Core claim

Static completions collected from analysis can be ranked by organizing them into a prefix tree and executing one greedy decoding pass that accumulates token scores across every branch. The resulting order reflects fine-grained token likelihoods in the current context while remaining fast and compatible with any language model architecture.
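To make the scoring step concrete, here is a minimal Python sketch under stated assumptions. Each completion is already tokenized, and its score is the sum of token log-probabilities along its root-to-leaf path in the prefix tree. The language model is replaced by a hypothetical `toy_lm` function that returns next-token log-probabilities for a prefix. The walk visits every node, so it is not the authors' greedy procedure, whose details this summary does not give. It does show the key property: the model is queried once per tree node that has children, not once per completion.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    completion: Optional[str] = None  # set when a full suggestion ends at this node


def build_trie(tokenized: Dict[str, List[str]]) -> TrieNode:
    """Insert every completion's token sequence; shared prefixes share nodes."""
    root = TrieNode()
    for completion, tokens in tokenized.items():
        node = root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
        node.completion = completion
    return root


def score_completions(
    root: TrieNode,
    context: List[str],
    next_token_logprobs: Callable[[List[str]], Dict[str, float]],
) -> Dict[str, float]:
    """Sum token log-probabilities along each root-to-completion path.

    The model is queried once per node that has children, so a prefix shared
    by many completions is scored only once.
    """
    scores: Dict[str, float] = {}
    stack: List[Tuple[TrieNode, List[str], float]] = [(root, list(context), 0.0)]
    while stack:
        node, prefix, total = stack.pop()
        if node.completion is not None:
            scores[node.completion] = total
        if not node.children:
            continue
        dist = next_token_logprobs(prefix)  # one call serves every child
        for tok, child in node.children.items():
            stack.append((child, prefix + [tok], total + dist.get(tok, -math.inf)))
    return scores


# Hypothetical toy "model" that conditions only on the last token.
TABLE = {
    ".": {"get": math.log(0.6), "set": math.log(0.3), "size": math.log(0.1)},
    "get": {"Name": math.log(0.7), "Id": math.log(0.3)},
    "set": {"Name": math.log(0.5), "Id": math.log(0.5)},
}
calls: List[List[str]] = []


def toy_lm(prefix: List[str]) -> Dict[str, float]:
    calls.append(prefix)  # record each model query
    return TABLE.get(prefix[-1], {})


completions = {
    "getName": ["get", "Name"],
    "getId": ["get", "Id"],
    "setName": ["set", "Name"],
    "size": ["size"],
}
scores = score_completions(build_trie(completions), ["obj", "."], toy_lm)
ranking = sorted(scores, key=scores.get, reverse=True)
# ranking: getName (0.42), getId (0.18), setName (0.15), size (0.10); 3 model calls
```

With these toy distributions, `getName` ranks first (0.6 × 0.7 = 0.42), and only three model calls are made for four completions. Summed log-probabilities tend to favor shorter completions, because each extra token adds a non-positive term. Whether the paper normalizes scores by length is not stated here.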

What carries the argument

The prefix tree that holds all valid completions, allowing one forward greedy traversal to score every token in every suggestion.
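The saving comes from shared prefixes. The small sketch below assumes a hypothetical subword tokenization of four getter and setter completions. It counts how many token scores are needed when each completion is scored on its own, and how many are needed when the completions are merged into a trie. The trie here is built from nested dictionaries, with one node per distinct prefix.

```python
from typing import Dict, List


def trie_node_count(tokenized: List[List[str]]) -> int:
    """Number of trie nodes (excluding the root) needed to hold all sequences."""
    root: Dict[str, dict] = {}
    count = 0
    for tokens in tokenized:
        node = root
        for tok in tokens:
            if tok not in node:  # a new prefix needs a new node
                node[tok] = {}
                count += 1
            node = node[tok]
    return count


completions = [["get", "Name"], ["get", "Id"], ["get", "Parent"], ["set", "Name"]]
flat_tokens = sum(len(t) for t in completions)  # token scores if scored one by one
trie_nodes = trie_node_count(completions)       # token scores with a shared trie
```

Here 8 token positions shrink to 6 trie nodes, because `get` is shared by three completions. The gap grows as more completions share a prefix, which is common for member completions such as families of `get…` and `set…` methods.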

If this is right

Existing language models already running in IDEs can be reused for ranking without retraining or extra infrastructure.
The ranking step avoids the latency of beam search while still using full token-level information from the model.
The approach remains independent of any particular model architecture or training data.
Integration requires only the ability to run the model once over a tree of completions, fitting within current IDE performance budgets.
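There are several ways to run the model "once over a tree". One option, assumed here and not confirmed by the summary, is tree attention as used in some speculative-decoding systems. The trie is flattened into a single sequence, and each token gets an attention mask that lets it see only itself and its ancestors. Its position id is its depth after the context. The sketch builds that mask and those positions from a parent-index array. The context columns, which every tree token may attend to, are left out for brevity.

```python
from typing import List, Optional


def tree_attention_mask(parents: List[Optional[int]]) -> List[List[bool]]:
    """mask[i][j] is True when node j is node i itself or one of its ancestors."""
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j: Optional[int] = i
        while j is not None:  # walk up to the root, marking every ancestor
            mask[i][j] = True
            j = parents[j]
    return mask


def tree_position_ids(parents: List[Optional[int]], offset: int) -> List[int]:
    """Position = context length + depth; assumes parents are listed before children."""
    depth: List[int] = []
    for p in parents:
        depth.append(0 if p is None else depth[p] + 1)
    return [offset + d for d in depth]


# Flattened trie for getName / getId / setName (node index -> parent index).
tokens = ["get", "Name", "Id", "set", "Name"]
parents: List[Optional[int]] = [None, 0, 0, None, 3]
mask = tree_attention_mask(parents)
positions = tree_position_ids(parents, offset=2)  # 2 context tokens precede the tree
```

Siblings such as `Name` and `Id` under `get` share a position id but cannot see each other. Up to numerical differences, each token therefore sees the same context it would see in a separate forward pass. This only works with runtimes that accept a custom 2-D attention mask and explicit position ids.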

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same tree-based scoring could be applied to rank suggestions in other structured domains such as API usage or configuration files.
If token scores correlate with acceptance, the method could replace many hand-tuned heuristic rankers in production IDEs.
Extending the tree to include partial edits or multi-token rewrites might further improve suggestion quality without changing the core pass.

Load-bearing premise

Token scores collected from a single greedy pass over the prefix tree will produce rankings that developers find more useful than existing heuristics.

What would settle it

An A/B test in a live IDE that measures whether the new ranking increases the rate at which developers accept or edit the top suggestion compared with the current system.
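Such an A/B test would typically compare top-suggestion acceptance rates between a control arm and a treatment arm. The minimal sketch below analyzes this with a pooled two-proportion z-test under a normal approximation. The counts are invented purely for illustration. The test also assumes independent impressions, which IDE logs may violate because each developer triggers many completions. A per-developer or clustered analysis could therefore be more appropriate.

```python
import math


def two_proportion_z_test(accept_a: int, shown_a: int, accept_b: int, shown_b: int):
    """Two-sided z-test for a difference between two acceptance rates (pooled SE)."""
    p_a = accept_a / shown_a
    p_b = accept_b / shown_b
    pooled = (accept_a + accept_b) / (shown_a + shown_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf; doubled tail gives the two-sided p-value.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, z, p_value


# Hypothetical counts: top suggestion accepted 300/1000 times (control)
# versus 360/1000 times (tree-based ranking).
diff, z, p = two_proportion_z_test(300, 1000, 360, 1000)
```

With these made-up numbers, the 6-point lift gives z ≈ 2.85 and a two-sided p ≈ 0.004.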

Figures

Figures reproduced from arXiv: 2508.02455 by Arie van Deursen, Daniele Cipollone, Egor Bogomolov, Maliheh Izadi.

**Figure 2.** Distribution of ranking times across models and decoding strategies

**Figure 3.** Recall@1 comparison of model outputs w/ and w/o constrained

Original abstract

Token-level code completion is one of the most critical features in modern Integrated Development Environments (IDEs). It assists developers by suggesting relevant identifiers and APIs during coding. While completions are typically derived from static analysis, their usefulness depends heavily on how they are ranked, as correct predictions buried deep in the list are rarely seen by users. Most current systems rely on hand-crafted heuristics or lightweight machine learning models trained on user logs, which can be further improved to capture context information and generalize across projects and coding styles. In this work, we propose a new scoring approach to ranking static completions using language models in a lightweight and model-agnostic way. Our method organizes all valid completions into a prefix tree and performs a single greedy decoding pass to collect token-level scores across the tree. This enables a precise token-aware ranking without needing beam search, prompt engineering, or model adaptations. The approach is fast, architecture-agnostic, and compatible with already deployed models for code completion. These findings highlight a practical and effective pathway for integrating language models into already existing tools within IDEs, and ultimately providing smarter and more responsive developer assistance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

TreeRanker describes a prefix-tree plus single greedy pass trick for ranking static code completions with off-the-shelf LMs, but the paper supplies no experiments or baseline comparisons to show the rankings actually help developers.


The paper's main contribution is a concrete way to score and rank completions from static analysis: stuff all valid ones into a prefix tree, run one greedy decode over the tree with a language model, and use the resulting token scores for ordering. This avoids beam search, prompt changes, and any model adaptation, which keeps it lightweight and compatible with models already running in IDEs. That combination of tree organization and single-pass token scoring is not laid out in the prior heuristics or small ML rankers the authors cite, so the engineering detail is the genuinely new piece here. It is presented clearly enough that someone could implement it from the description. The approach is also architecture-agnostic by design, which is a practical plus for real tool integration.

The soft spot is the missing evidence. The abstract and method claim that the token scores will produce better rankings than current heuristics or lightweight models, yet there are no quantitative results, no ablation of the greedy pass, and no comparison of acceptance rates, position of the correct item, or IDE telemetry. Without those numbers the utility argument remains an assumption. If the full paper has experiments, they need to be shown up front; otherwise the central claim does not yet hold up.

This is aimed at people building or improving code completion inside IDEs. A practitioner or applied researcher who wants a fast way to layer LM signals on top of existing static completions would get the most from the details. It deserves a serious referee because the technique is implementable and the problem is real, even though the evaluation needs work. I would send it for review and ask the authors to add direct comparisons against the baselines they mention.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes TreeRanker, a ranking system for static code completions in IDEs. Valid completions are organized into a prefix tree, and a single greedy decoding pass over this tree collects token-level scores from an unmodified language model. The resulting scores are used to rank suggestions. The manuscript claims this yields a precise, token-aware ordering that is fast, architecture-agnostic, and requires no beam search, prompt engineering, or model adaptation.

Significance. If the claimed ranking quality and runtime hold under realistic IDE workloads, the technique would offer a lightweight route for injecting existing code LMs into production completion engines without retraining or architectural changes. The prefix-tree-plus-greedy-pass construction is a concrete, implementable idea that could reduce reliance on hand-crafted heuristics.

major comments (2)

[Evaluation] The central claim that token-level scores obtained from one greedy pass over the prefix tree produce rankings more useful to developers than hand-crafted heuristics or lightweight ML models is unsupported. No quantitative results, acceptance-rate metrics, position-of-correct-completion statistics, or IDE-telemetry comparisons appear anywhere in the manuscript.
[Approach] §3 (Approach): the description of the prefix-tree traversal and score aggregation is given at a high level; it is unclear how ties are broken, how the final ranking is extracted from the collected token scores, or whether the procedure remains exact when the LM vocabulary and the static-analysis vocabulary differ.
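The vocabulary question matters because static analysis returns whole identifiers, while the language model scores subword tokens. Each identifier must therefore be split into the model's tokens before it is inserted into the tree. The sketch below uses a toy greedy longest-match tokenizer and a hypothetical vocabulary as a stand-in. Real BPE or unigram tokenizers generally segment text differently, and a faithful implementation would call the deployed model's own tokenizer.

```python
from typing import List, Set


def greedy_tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Split `text` by repeatedly taking the longest vocabulary entry that matches."""
    tokens: List[str] = []
    i = 0
    max_len = max(len(v) for v in vocab)
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            tokens.append(text[i])  # no match: fall back to a single character
            i += 1
    return tokens


# Hypothetical vocabulary; "Name" wins over "Na" + "me" because it is longer.
vocab = {"get", "set", "Name", "Na", "me", "Id", "Parent", "Par"}
name_tokens = greedy_tokenize("getName", vocab)        # ["get", "Name"]
parent_tokens = greedy_tokenize("getParentId", vocab)  # ["get", "Parent", "Id"]
fallback_tokens = greedy_tokenize("getX", vocab)       # ["get", "X"]
```

Because `getName` and `getParentId` both begin with the token `get`, they would share a trie node. An identifier with an out-of-vocabulary piece (`getX`) still gets a path through the single-character fallback. A further complication, not handled here, is that tokenizing an identifier on its own can differ from how the model tokenizes it in context.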

minor comments (2)

[Abstract] The abstract and introduction repeatedly use the phrase 'precise token-aware ranking' without defining what 'precise' means in this context or how it is measured.
[Approach] No runtime or memory figures are supplied even for the prefix-tree construction step, making the 'fast' claim difficult to assess.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for noting the potential practical value of TreeRanker. We agree that the manuscript requires stronger empirical support and more precise algorithmic details to substantiate its claims. We address each major comment below and will revise the paper accordingly.


Referee: [Evaluation] The central claim that token-level scores obtained from one greedy pass over the prefix tree produce rankings more useful to developers than hand-crafted heuristics or lightweight ML models is unsupported. No quantitative results, acceptance-rate metrics, position-of-correct-completion statistics, or IDE-telemetry comparisons appear anywhere in the manuscript.

Authors: We acknowledge that the present manuscript does not contain quantitative evaluation results to support the ranking-quality claims. The current version emphasizes the design of the prefix-tree approach and its computational advantages. In the revised manuscript we will add an evaluation section that reports comparisons against hand-crafted heuristics and lightweight ML baselines on standard code-completion benchmarks, using metrics including mean reciprocal rank, top-k accuracy, and simulated acceptance rates derived from developer telemetry where possible. revision: yes
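For reference, mean reciprocal rank and top-k accuracy can be computed from ranked lists as shown below. This is a generic sketch that assumes exactly one correct completion per query. In that single-target setting, top-k accuracy is what is often reported as Recall@k. The rankings and targets are hypothetical.

```python
from typing import List, Sequence


def reciprocal_rank(ranking: Sequence[str], correct: str) -> float:
    """1 / (1-based position of the correct completion), or 0.0 if it is missing."""
    for position, item in enumerate(ranking, start=1):
        if item == correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: List[List[str]], targets: List[str]) -> float:
    return sum(reciprocal_rank(r, t) for r, t in zip(rankings, targets)) / len(targets)


def top_k_accuracy(rankings: List[List[str]], targets: List[str], k: int) -> float:
    """Fraction of queries whose correct completion appears in the first k items."""
    hits = sum(1 for r, t in zip(rankings, targets) if t in r[:k])
    return hits / len(targets)


rankings = [
    ["getName", "getId", "size"],
    ["size", "getId", "getName"],
    ["getId", "size", "getName"],
]
targets = ["getName", "getName", "size"]
mrr = mean_reciprocal_rank(rankings, targets)  # (1 + 1/3 + 1/2) / 3 = 11/18
top1 = top_k_accuracy(rankings, targets, k=1)  # 1/3
top2 = top_k_accuracy(rankings, targets, k=2)  # 2/3
```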
Referee: [Approach] §3 (Approach): the description of the prefix-tree traversal and score aggregation is given at a high level; it is unclear how ties are broken, how the final ranking is extracted from the collected token scores, or whether the procedure remains exact when the LM vocabulary and the static-analysis vocabulary differ.

Authors: We thank the referee for highlighting the insufficient detail in §3. We will revise this section to include pseudocode and step-by-step explanations of the traversal and aggregation procedure. Ties will be broken by the order of completion generation from the static analyzer, with a secondary lexical sort. The final ranking is obtained by sorting completions in descending order of their aggregated log-probability scores. For vocabulary mismatch we will describe an explicit token-mapping step that aligns static-analysis identifiers with the LM's subword vocabulary while preserving the exactness of the collected scores. revision: yes
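The ordering rule described in this response can be written as a single sort key. The minimal sketch below assumes `scores` maps each completion to its aggregated log-probability, and `analyzer_order` lists completions in the order the static analyzer emitted them. Both names are illustrative. Completions are sorted by descending score, ties are broken by analyzer position, and lexical order is the final fallback.

```python
from typing import Dict, List


def rank_completions(scores: Dict[str, float], analyzer_order: List[str]) -> List[str]:
    """Descending aggregated log-probability; ties by analyzer order, then lexically."""
    position = {c: i for i, c in enumerate(analyzer_order)}
    missing = len(analyzer_order)  # tied completions absent from the list sort last
    return sorted(scores, key=lambda c: (-scores[c], position.get(c, missing), c))


scores = {"getName": -1.2, "getId": -2.0, "size": -2.0, "isEmpty": -3.5}
analyzer_order = ["size", "getName", "getId", "isEmpty"]
ranking = rank_completions(scores, analyzer_order)
# ["getName", "size", "getId", "isEmpty"]: the -2.0 tie goes to the earlier analyzer item
```

Since analyzer positions are distinct, the lexical key in this sketch only decides between tied completions that are missing from `analyzer_order`. The rebuttal does not say in which other cases the secondary sort would apply.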

Circularity Check

0 steps flagged

No circularity: procedural method described without self-referential reductions or fitted predictions

Full rationale

The paper introduces TreeRanker as a prefix-tree organization of completions followed by a single greedy decoding pass to obtain token-level scores for ranking. This is presented as a direct algorithmic construction that is model-agnostic and requires no beam search or adaptations. No equations, parameter fitting, or self-citations are shown to define the ranking output in terms of itself or prior author work. The central claim remains an independent procedural proposal whose utility would be evaluated externally rather than by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the provided abstract; the method appears to rely on standard language-model inference and static-analysis validity checks.

pith-pipeline@v0.9.0 · 5744 in / 1079 out tokens · 25602 ms · 2026-05-19T01:07:01.484429+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

Our method organizes all valid completions into a prefix tree and performs a single greedy decoding pass to collect token-level scores across the tree.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 6 internal anchors

[1]

How are Java software developers using the Eclipse IDE?

G. Murphy, M. Kersten, and L. Findlater, “How are Java software developers using the Eclipse IDE?” IEEE Software , vol. 23, no. 4, pp. 76–83, Jul. 2006. [Online]. Available: https://ieeexplore.ieee.org/ document/1657944

work page arXiv 2006
[2]

A Study of Visual Studio Usage in Practice,

S. Amann, S. Proksch, S. Nadi, and M. Mezini, “A Study of Visual Studio Usage in Practice,” in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1, Mar. 2016, pp. 124–134. [Online]. Available: https://ieeexplore.ieee.org/document/7476636

work page arXiv 2016
[3]

Language models for code completion: A practical evaluation,

M. Izadi, J. Katzy, T. Van Dam, M. Otten, R. M. Popescu, and A. Van Deursen, “Language models for code completion: A practical evaluation,” in Proceedings of the IEEE/ACM 46th International Con- ference on Software Engineering , 2024, pp. 1–13

work page 2024
[4]

DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

D. Guo, Q. Zhu, D. Yang, Z. Xie, K. Dong, W. Zhang, G. Chen, X. Bi, Y . Wu, Y . K. Li, F. Luo, Y . Xiong, and W. Liang, “DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence,” Jan. 2024, arXiv:2401.14196 [cs]. [Online]. Available: http://arxiv.org/abs/2401.14196

work page internal anchor Pith review Pith/arXiv arXiv 2024
[5]

Practitioners’ expectations on code completion,

C. Wang, J. Hu, C. Gao, Y . Jin, T. Xie, H. Huang, Z. Lei, and Y . Deng, “Practitioners’ expectations on code completion,” arXiv preprint arXiv:2301.03846, 2023

work page arXiv 2023
[6]

Codefill: Multi-token code completion by jointly learning from structure and naming sequences,

M. Izadi, R. Gismondi, and G. Gousios, “Codefill: Multi-token code completion by jointly learning from structure and naming sequences,” in Proceedings of the 44th international conference on software engi- neering, 2022, pp. 401–412

work page 2022
[7]

Better context makes better code language models: a case study on function call argument completion,

H. Pei, J. Zhao, L. Lausen, S. Zha, and G. Karypis, “Better context makes better code language models: a case study on function call argument completion,” in Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational ...

work page doi:10.1609/aaai.v37i4.25653 2023
[8]

Repository-level prompt generation for large language models of code,

D. Shrivastava, H. Larochelle, and D. Tarlow, “Repository-level prompt generation for large language models of code,” in Proceedings of the 40th International Conference on Machine Learning , ser. ICML’23. JMLR.org, 2023

work page 2023
[9]

All You Need Is Logs: Improving Code Completion by Learning from Anonymous IDE Usage Logs,

V . Bibaev, A. Kalina, V . Lomshakov, Y . Golubev, A. Bezzubov, N. Povarov, and T. Bryksin, “All You Need Is Logs: Improving Code Completion by Learning from Anonymous IDE Usage Logs,” Sep. 2022, arXiv:2205.10692 [cs]. [Online]. Available: http://arxiv.org/abs/ 2205.10692

work page arXiv 2022
[10]

Beam Search Strategies for Neural Machine Translation

M. Freitag and Y . Al-Onaizan, “Beam Search Strategies for Neural Machine Translation,” in Proceedings of the First Workshop on Neural Machine Translation, 2017, pp. 56–60, arXiv:1702.01806 [cs]. [Online]. Available: http://arxiv.org/abs/1702.01806

work page internal anchor Pith review Pith/arXiv arXiv 2017
[11]

When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size),

L. Huang, K. Zhao, and M. Ma, “When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size),” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, M. Palmer, R. Hwa, and S. Riedel, Eds. Copenhagen, Denmark: Association for Computational Linguistics, Sep. 2017, pp. 2134–2139. [Online]. Available: h...

work page 2017
[12]

A Survey on RAG Meeting LLMs: Towards Retrieval- Augmented Large Language Models,

W. Fan, Y . Ding, L. Ning, S. Wang, H. Li, D. Yin, T.-S. Chua, and Q. Li, “A Survey on RAG Meeting LLMs: Towards Retrieval- Augmented Large Language Models,” in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining . Barcelona Spain: ACM, Aug. 2024, pp. 6491–6501. [Online]. Available: https://dl.acm.org/doi/10.1145/3637528.3671470

work page doi:10.1145/3637528.3671470 2024
[13]

Make llm a testing expert: Bringing human-like interaction to mobile gui testing via functionality- aware decisions

T. Ahmed, K. S. Pai, P. Devanbu, and E. Barr, “Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization),” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering , ser. ICSE ’24. New York, NY , USA: Association for Computing Machinery, Apr. 2024, pp. 1–13. [Online]. Available: https://dl.acm.org/doi/1...

work page doi:10.1145/3597503.3639183 2024
[14]

Full Line Code Completion: Bringing AI to Desktop,

A. Semenkin, V . Bibaev, Y . Sokolov, K. Krylov, A. Kalina, A. Khannanova, D. Savenkov, D. Rovdo, I. Davidenko, K. Karnaukhov, M. Vakhrushev, M. Kostyukov, M. Podvitskii, P. Surkov, Y . Golubev, N. Povarov, and T. Bryksin, “Full Line Code Completion: Bringing AI to Desktop,” Oct. 2024, arXiv:2405.08704 [cs]. [Online]. Available: http://arxiv.org/abs/2405.08704

work page arXiv 2024
[15]

FIRST: Faster improved listwise reranking with single token decoding,

R. Gangi Reddy, J. Doo, Y . Xu, M. A. Sultan, D. Swain, A. Sil, and H. Ji, “FIRST: Faster improved listwise reranking with single token decoding,” in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , Y . Al-Onaizan, M. Bansal, and Y .-N. Chen, Eds. Miami, Florida, USA: Association for Computational Linguistics, Nov. ...

work page 2024
[16]

Web API change- proneness prediction,

Y . Chen, C. Gao, M. Zhu, Q. Liao, Y . Wang, and G. Xu, “ APIGen: Generative API Method Recommendation ,” in 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) . Los Alamitos, CA, USA: IEEE Computer Society, Mar. 2024, pp. 171–182. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/SANER60148.2024.00025

work page doi:10.1109/saner60148.2024.00025 2024
[17]

Large language models are effective text rankers with pairwise ranking prompting,

Z. Qin, R. Jagerman, K. Hui, H. Zhuang, J. Wu, L. Yan, J. Shen, T. Liu, J. Liu, D. Metzler, X. Wang, and M. Bendersky, “Large language models are effective text rankers with pairwise ranking prompting,” in Findings of the Association for Computational Linguistics: NAACL 2024, K. Duh, H. Gomez, and S. Bethard, Eds. Mexico City, Mexico: Association for Comp...

work page 2024
[18]

Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context,

L. A. Agrawal, A. Kanade, N. Goyal, S. Lahiri, and S. Rajamani, “Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context,” Advances in Neural Information Processing Systems, vol. 36, pp. 32 270–32 298, Dec. 2023. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/2023/hash/ 662b1774ba8845fc1fa3d1fc0177ceeb-Abstrac...

work page 2023
[19]

Long Code Arena: a Set of Benchmarks for Long-Context Code Models,

E. Bogomolov, A. Eliseeva, T. Galimzyanov, E. Glukhov, A. Shapkin, M. Tigina, Y . Golubev, A. Kovrigin, A. van Deursen, M. Izadi, and T. Bryksin, “Long Code Arena: a Set of Benchmarks for Long-Context Code Models,” Jun. 2024, arXiv:2406.11612 [cs]. [Online]. Available: http://arxiv.org/abs/2406.11612

work page arXiv 2024
[20]

Attention is All you Need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is All you Need,” in Advances in Neural Information Processing Systems , I. Guyon, U. V . Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. [Online]. Available: https://pro...

work page 2017
[21]

Grammar-constrained decoding for structured NLP tasks without finetuning,

S. Geng, M. Josifoski, M. Peyrard, and R. West, “Grammar-constrained decoding for structured NLP tasks without finetuning,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J. Pino, and K. Bali, Eds. Singapore: Association for Computational Linguistics, Dec. 2023, pp. 10 932– 10 952. [Online]. Availabl...

work page 2023
[22]

A literature review on reaction time,

R. J. Kosinski, “A literature review on reaction time,” Clemson Univer- sity, vol. 10, no. 1, pp. 337–344, 2008

work page 2008
[23]

Greed is All You Need: An Evaluation of Tokenizer Inference Methods,

O. Uzan, C. W. Schmidt, C. Tanner, and Y . Pinter, “Greed is All You Need: An Evaluation of Tokenizer Inference Methods,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , L.-W. Ku, A. Martins, and V . Srikumar, Eds. Bangkok, Thailand: Association for Computational Linguistics, Aug. 2024,...

work page 2024
[24]

tree-sitter/tree-sitter,

“tree-sitter/tree-sitter,” May 2025, original-date: 2013-11-06T06:16:00Z. [Online]. Available: https://github.com/tree-sitter/tree-sitter

work page 2025
[25]

API Overview — Jedi 0.19.2 documentation

“API Overview — Jedi 0.19.2 documentation.” [Online]. Available: https://jedi.readthedocs.io/en/latest/docs/api.html

work page
[26]

ONNX | Home

“ONNX | Home.” [Online]. Available: https://onnx.ai/

work page
[27]

ggml-org/llama.cpp,

“ggml-org/llama.cpp,” May 2025, original-date: 2023-03-10T18:58:00Z. [Online]. Available: https://github.com/ggml-org/llama.cpp

work page 2025
[28]

LLaMA: Open and Efficient Foundation Language Models

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozi `ere, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, “LLaMA: Open and Efficient Foundation Language Models,” Feb. 2023, arXiv:2302.13971 [cs]. [Online]. Available: http://arxiv.org/abs/2302.13971

work page internal anchor Pith review Pith/arXiv arXiv 2023
[29]

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

L. B. Allal, A. Lozhkov, E. Bakouch, G. M. Bl ´azquez, G. Penedo, L. Tunstall, A. Marafioti, H. Kydl ´ıˇcek, A. P. Lajar ´ın, V . Srivastav, J. Lochner, C. Fahlgren, X.-S. Nguyen, C. Fourrier, B. Burtenshaw, H. Larcher, H. Zhao, C. Zakka, M. Morlon, C. Raffel, L. v. Werra, and T. Wolf, “SmolLM2: When Smol Goes Big – Data-Centric Training of a Small Langua...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis

E. Nijkamp, B. Pang, H. Hayashi, L. Tu, H. Wang, Y . Zhou, S. Savarese, and C. Xiong, “CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis,” Feb. 2023, arXiv:2203.13474 [cs]. [Online]. Available: http://arxiv.org/abs/2203.13474

work page internal anchor Pith review Pith/arXiv arXiv 2023
[31]

IntelliJ IDEA – the IDE for Pro Java and Kotlin Development

“IntelliJ IDEA – the IDE for Pro Java and Kotlin Development.” [Online]. Available: https://www.jetbrains.com/idea/

work page
[32]

Visual Studio Code - Code Editing. Redefined

“Visual Studio Code - Code Editing. Redefined.” [Online]. Available: https://code.visualstudio.com/

work page
[33]

Learning to rank: from pairwise approach to listwise approach,

Z. Cao, T. Qin, T.-Y . Liu, M.-F. Tsai, and H. Li, “Learning to rank: from pairwise approach to listwise approach,” in Proceedings of the 24th international conference on Machine learning , ser. ICML ’07. New York, NY , USA: Association for Computing Machinery, Jun. 2007, pp. 129–136. [Online]. Available: https://doi.org/10.1145/1273496.1273513

work page doi:10.1145/1273496.1273513 2007
[34]

Ranking: objectives and metrics |

“Ranking: objectives and metrics |.” [Online]. Available: https: //catboost.ai/docs/en/concepts/loss-functions-ranking#QuerySoftMax

work page
[35]

Introducing Visual Studio IntelliCode,

A. Silver, “Introducing Visual Studio IntelliCode,” May

work page
[36]

Available: https://devblogs.microsoft.com/visualstudio/ introducing-visual-studio-intellicode/

[Online]. Available: https://devblogs.microsoft.com/visualstudio/ introducing-visual-studio-intellicode/

work page
[37]

ML-Enhanced Code Completion Improves Developer Productivity

M. Tabachnyk, “ML-Enhanced Code Completion Improves Developer Productivity.” [Online]. Available: https://research.google/ blog/ml-enhanced-code-completion-improves-developer-productivity/

work page
[38]

Toward Less Hidden Cost of Code Completion with Acceptance and Ranking Models,

J. Li, R. Huang, W. Li, K. Yao, and W. Tan, “Toward Less Hidden Cost of Code Completion with Acceptance and Ranking Models,” Jun. 2021, arXiv:2106.13928 [cs]. [Online]. Available: http://arxiv.org/abs/2106.13928

work page arXiv 2021
[39]

HuggingFace's Transformers: State-of-the-art Natural Language Processing

T. Wolf, L. Debut, V . Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. v. Platen, C. Ma, Y . Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush, “HuggingFace’s Transformers: State- of-the-art Natural Language Processing,” Jul. 2020, arXiv:1910.03771 [cs]. [Onli...

work page internal anchor Pith review Pith/arXiv arXiv 2020
[40]

The Probable Error of a Mean,

Student, “The Probable Error of a Mean,” Biometrika, vol. 6, no. 1, pp. 1–25, 1908, publisher: [Oxford University Press, Biometrika Trust]. [Online]. Available: https://www.jstor.org/stable/2331554

work page arXiv 1908
[41]

Pythia: Ai-assisted code completion system,

A. Svyatkovskiy, Y . Zhao, S. Fu, and N. Sundaresan, “Pythia: Ai-assisted code completion system,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , ser. KDD ’19. New York, NY , USA: Association for Computing Machinery, 2019, p. 2727–2735. [Online]. Available: https://doi-org.tudelft.idm.oclc.org/10.1145...

work page doi:10.1145/3292500.3330699 2019
[42]

Cscc: Simple, efficient, context sensitive code completion,

M. Asaduzzaman, C. K. Roy, K. A. Schneider, and D. Hou, “Cscc: Simple, efficient, context sensitive code completion,” in Proceedings of the 2014 IEEE International Conference on Software Maintenance and Evolution, ser. ICSME ’14. USA: IEEE Computer Society, 2014, p. 71–80. [Online]. Available: https://doi-org.tudelft.idm.oclc.org/10.1109/ ICSME.2014.29

work page 2014
[43]

Code Prediction by Feeding Trees to Transformers,

S. Kim, J. Zhao, Y . Tian, and S. Chandra, “Code Prediction by Feeding Trees to Transformers,” in 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) , May 2021, pp. 150–162, iSSN: 1558-1225. [Online]. Available: https: //ieeexplore.ieee.org/document/9402114/

work page arXiv 2021

[1] [1]

How are Java software developers using the Eclipse IDE?

G. Murphy, M. Kersten, and L. Findlater, “How are Java software developers using the Eclipse IDE?” IEEE Software , vol. 23, no. 4, pp. 76–83, Jul. 2006. [Online]. Available: https://ieeexplore.ieee.org/ document/1657944

work page arXiv 2006

[2] [2]

A Study of Visual Studio Usage in Practice,

S. Amann, S. Proksch, S. Nadi, and M. Mezini, “A Study of Visual Studio Usage in Practice,” in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1, Mar. 2016, pp. 124–134. [Online]. Available: https://ieeexplore.ieee.org/document/7476636

work page arXiv 2016

[3] [3]

Language models for code completion: A practical evaluation,

M. Izadi, J. Katzy, T. Van Dam, M. Otten, R. M. Popescu, and A. Van Deursen, “Language models for code completion: A practical evaluation,” in Proceedings of the IEEE/ACM 46th International Con- ference on Software Engineering , 2024, pp. 1–13

work page 2024

[4] [4]

DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

D. Guo, Q. Zhu, D. Yang, Z. Xie, K. Dong, W. Zhang, G. Chen, X. Bi, Y . Wu, Y . K. Li, F. Luo, Y . Xiong, and W. Liang, “DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence,” Jan. 2024, arXiv:2401.14196 [cs]. [Online]. Available: http://arxiv.org/abs/2401.14196

work page internal anchor Pith review Pith/arXiv arXiv 2024

[5] [5]

Practitioners’ expectations on code completion,

C. Wang, J. Hu, C. Gao, Y . Jin, T. Xie, H. Huang, Z. Lei, and Y . Deng, “Practitioners’ expectations on code completion,” arXiv preprint arXiv:2301.03846, 2023

work page arXiv 2023

[6] [6]

Codefill: Multi-token code completion by jointly learning from structure and naming sequences,

M. Izadi, R. Gismondi, and G. Gousios, “Codefill: Multi-token code completion by jointly learning from structure and naming sequences,” in Proceedings of the 44th international conference on software engi- neering, 2022, pp. 401–412

work page 2022

[7] [7]

Better context makes better code language models: a case study on function call argument completion,

H. Pei, J. Zhao, L. Lausen, S. Zha, and G. Karypis, “Better context makes better code language models: a case study on function call argument completion,” in Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational ...

work page doi:10.1609/aaai.v37i4.25653 2023

[8] D. Shrivastava, H. Larochelle, and D. Tarlow, “Repository-level prompt generation for large language models of code,” in Proceedings of the 40th International Conference on Machine Learning, ser. ICML’23. JMLR.org, 2023.

[9] V. Bibaev, A. Kalina, V. Lomshakov, Y. Golubev, A. Bezzubov, N. Povarov, and T. Bryksin, “All You Need Is Logs: Improving Code Completion by Learning from Anonymous IDE Usage Logs,” Sep. 2022, arXiv:2205.10692 [cs]. [Online]. Available: http://arxiv.org/abs/2205.10692

[10] M. Freitag and Y. Al-Onaizan, “Beam Search Strategies for Neural Machine Translation,” in Proceedings of the First Workshop on Neural Machine Translation, 2017, pp. 56–60, arXiv:1702.01806 [cs]. [Online]. Available: http://arxiv.org/abs/1702.01806

[11] L. Huang, K. Zhao, and M. Ma, “When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size),” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, M. Palmer, R. Hwa, and S. Riedel, Eds. Copenhagen, Denmark: Association for Computational Linguistics, Sep. 2017, pp. 2134–2139. [Online]. Available: h...

[12] W. Fan, Y. Ding, L. Ning, S. Wang, H. Li, D. Yin, T.-S. Chua, and Q. Li, “A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models,” in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Barcelona, Spain: ACM, Aug. 2024, pp. 6491–6501. [Online]. Available: https://dl.acm.org/doi/10.1145/3637528.3671470

[13] T. Ahmed, K. S. Pai, P. Devanbu, and E. Barr, “Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization),” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, ser. ICSE ’24. New York, NY, USA: Association for Computing Machinery, Apr. 2024, pp. 1–13. [Online]. Available: https://dl.acm.org/doi/1...

[14] A. Semenkin, V. Bibaev, Y. Sokolov, K. Krylov, A. Kalina, A. Khannanova, D. Savenkov, D. Rovdo, I. Davidenko, K. Karnaukhov, M. Vakhrushev, M. Kostyukov, M. Podvitskii, P. Surkov, Y. Golubev, N. Povarov, and T. Bryksin, “Full Line Code Completion: Bringing AI to Desktop,” Oct. 2024, arXiv:2405.08704 [cs]. [Online]. Available: http://arxiv.org/abs/2405.08704

[15] R. Gangi Reddy, J. Doo, Y. Xu, M. A. Sultan, D. Swain, A. Sil, and H. Ji, “FIRST: Faster improved listwise reranking with single token decoding,” in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Y. Al-Onaizan, M. Bansal, and Y.-N. Chen, Eds. Miami, Florida, USA: Association for Computational Linguistics, Nov. ...

[16] Y. Chen, C. Gao, M. Zhu, Q. Liao, Y. Wang, and G. Xu, “APIGen: Generative API Method Recommendation,” in 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). Los Alamitos, CA, USA: IEEE Computer Society, Mar. 2024, pp. 171–182. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/SANER60148.2024.00025

[17] Z. Qin, R. Jagerman, K. Hui, H. Zhuang, J. Wu, L. Yan, J. Shen, T. Liu, J. Liu, D. Metzler, X. Wang, and M. Bendersky, “Large language models are effective text rankers with pairwise ranking prompting,” in Findings of the Association for Computational Linguistics: NAACL 2024, K. Duh, H. Gomez, and S. Bethard, Eds. Mexico City, Mexico: Association for Comp...

[18] L. A. Agrawal, A. Kanade, N. Goyal, S. Lahiri, and S. Rajamani, “Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context,” Advances in Neural Information Processing Systems, vol. 36, pp. 32270–32298, Dec. 2023. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2023/hash/662b1774ba8845fc1fa3d1fc0177ceeb-Abstrac...

[19] E. Bogomolov, A. Eliseeva, T. Galimzyanov, E. Glukhov, A. Shapkin, M. Tigina, Y. Golubev, A. Kovrigin, A. van Deursen, M. Izadi, and T. Bryksin, “Long Code Arena: a Set of Benchmarks for Long-Context Code Models,” Jun. 2024, arXiv:2406.11612 [cs]. [Online]. Available: http://arxiv.org/abs/2406.11612

[20] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is All you Need,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. [Online]. Available: https://pro...

[21] S. Geng, M. Josifoski, M. Peyrard, and R. West, “Grammar-constrained decoding for structured NLP tasks without finetuning,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J. Pino, and K. Bali, Eds. Singapore: Association for Computational Linguistics, Dec. 2023, pp. 10932–10952. [Online]. Availabl...

[22] R. J. Kosinski, “A literature review on reaction time,” Clemson University, vol. 10, no. 1, pp. 337–344, 2008.

[23] O. Uzan, C. W. Schmidt, C. Tanner, and Y. Pinter, “Greed is All You Need: An Evaluation of Tokenizer Inference Methods,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), L.-W. Ku, A. Martins, and V. Srikumar, Eds. Bangkok, Thailand: Association for Computational Linguistics, Aug. 2024,...

[24] “tree-sitter/tree-sitter,” May 2025, original-date: 2013-11-06T06:16:00Z. [Online]. Available: https://github.com/tree-sitter/tree-sitter

[25] “API Overview — Jedi 0.19.2 documentation.” [Online]. Available: https://jedi.readthedocs.io/en/latest/docs/api.html

[26] “ONNX | Home.” [Online]. Available: https://onnx.ai/

[27] “ggml-org/llama.cpp,” May 2025, original-date: 2023-03-10T18:58:00Z. [Online]. Available: https://github.com/ggml-org/llama.cpp

[28] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, “LLaMA: Open and Efficient Foundation Language Models,” Feb. 2023, arXiv:2302.13971 [cs]. [Online]. Available: http://arxiv.org/abs/2302.13971

[29] L. B. Allal, A. Lozhkov, E. Bakouch, G. M. Blázquez, G. Penedo, L. Tunstall, A. Marafioti, H. Kydlíček, A. P. Lajarín, V. Srivastav, J. Lochner, C. Fahlgren, X.-S. Nguyen, C. Fourrier, B. Burtenshaw, H. Larcher, H. Zhao, C. Zakka, M. Morlon, C. Raffel, L. v. Werra, and T. Wolf, “SmolLM2: When Smol Goes Big – Data-Centric Training of a Small Langua...

[30] E. Nijkamp, B. Pang, H. Hayashi, L. Tu, H. Wang, Y. Zhou, S. Savarese, and C. Xiong, “CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis,” Feb. 2023, arXiv:2203.13474 [cs]. [Online]. Available: http://arxiv.org/abs/2203.13474

[31] “IntelliJ IDEA – the IDE for Pro Java and Kotlin Development.” [Online]. Available: https://www.jetbrains.com/idea/

[32] “Visual Studio Code - Code Editing. Redefined.” [Online]. Available: https://code.visualstudio.com/

[33] Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li, “Learning to rank: from pairwise approach to listwise approach,” in Proceedings of the 24th International Conference on Machine Learning, ser. ICML ’07. New York, NY, USA: Association for Computing Machinery, Jun. 2007, pp. 129–136. [Online]. Available: https://doi.org/10.1145/1273496.1273513

[34] “Ranking: objectives and metrics.” [Online]. Available: https://catboost.ai/docs/en/concepts/loss-functions-ranking#QuerySoftMax

[35] A. Silver, “Introducing Visual Studio IntelliCode,” May. [Online]. Available: https://devblogs.microsoft.com/visualstudio/introducing-visual-studio-intellicode/

[37] M. Tabachnyk, “ML-Enhanced Code Completion Improves Developer Productivity.” [Online]. Available: https://research.google/blog/ml-enhanced-code-completion-improves-developer-productivity/

[38] J. Li, R. Huang, W. Li, K. Yao, and W. Tan, “Toward Less Hidden Cost of Code Completion with Acceptance and Ranking Models,” Jun. 2021, arXiv:2106.13928 [cs]. [Online]. Available: http://arxiv.org/abs/2106.13928

[39] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. v. Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush, “HuggingFace’s Transformers: State-of-the-art Natural Language Processing,” Jul. 2020, arXiv:1910.03771 [cs]. [Onli...

[40] Student, “The Probable Error of a Mean,” Biometrika, vol. 6, no. 1, pp. 1–25, 1908, publisher: [Oxford University Press, Biometrika Trust]. [Online]. Available: https://www.jstor.org/stable/2331554

[41] A. Svyatkovskiy, Y. Zhao, S. Fu, and N. Sundaresan, “Pythia: AI-assisted code completion system,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ser. KDD ’19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 2727–2735. [Online]. Available: https://doi.org/10.1145...

[42] M. Asaduzzaman, C. K. Roy, K. A. Schneider, and D. Hou, “CSCC: Simple, efficient, context sensitive code completion,” in Proceedings of the 2014 IEEE International Conference on Software Maintenance and Evolution, ser. ICSME ’14. USA: IEEE Computer Society, 2014, pp. 71–80. [Online]. Available: https://doi.org/10.1109/ICSME.2014.29

[43] S. Kim, J. Zhao, Y. Tian, and S. Chandra, “Code Prediction by Feeding Trees to Transformers,” in 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), May 2021, pp. 150–162, ISSN: 1558-1225. [Online]. Available: https://ieeexplore.ieee.org/document/9402114/
