TreeRanker: Fast and Model-agnostic Ranking System for Code Suggestions in IDEs
Pith reviewed 2026-05-19 01:07 UTC · model grok-4.3
The pith
A prefix tree of completions lets any language model rank IDE suggestions via one greedy pass.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Static completions collected from analysis can be ranked by organizing them into a prefix tree and executing one greedy decoding pass that accumulates token scores across every branch. The resulting order reflects fine-grained token likelihoods in the current context while remaining fast and compatible with any language model architecture.
What carries the argument
The prefix tree that holds all valid completions, allowing one forward greedy traversal to score every token in every suggestion.
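One possible reading of this mechanism, sketched under stated assumptions rather than taken from the paper: completions are token sequences stored in a nested-dict trie, the model is a hypothetical callable `score_fn(prefix)` returning next-token log-probabilities, and `END` is an invented terminator. The walk follows the greedy path, calls the model once per step, and records the log-probabilities of every sibling token at each node it visits.

```python
from typing import Callable, Dict, Sequence, Tuple

Token = str
# Hypothetical model interface: token prefix -> log-probabilities of next tokens.
ScoreFn = Callable[[Tuple[Token, ...]], Dict[Token, float]]

END = "<end>"  # hypothetical end-of-completion marker, not from the paper


def build_trie(completions: Sequence[Sequence[Token]]) -> dict:
    """Nested-dict prefix tree; every completion path terminates in END."""
    root: dict = {}
    for tokens in completions:
        node = root
        for tok in list(tokens) + [END]:
            node = node.setdefault(tok, {})
    return root


def greedy_tree_scores(
    completions: Sequence[Sequence[Token]], score_fn: ScoreFn
) -> Dict[Tuple[Token, ...], float]:
    """Score all completions with one greedy walk over the trie.

    Assumption of this sketch: a completion's score is the sum of recorded
    log-probs along its path, up to and including the first token at which
    it leaves the greedy path.
    """
    trie = build_trie(completions)
    recorded: Dict[Tuple[Tuple[Token, ...], Token], float] = {}
    node, prefix = trie, ()
    while node:  # an END node is an empty dict, which stops the walk
        logprobs = score_fn(prefix)  # one model call per greedy step
        for tok in node:
            recorded[(prefix, tok)] = logprobs.get(tok, float("-inf"))
        best = max(node, key=lambda t: recorded[(prefix, t)])
        node, prefix = node[best], prefix + (best,)

    scores: Dict[Tuple[Token, ...], float] = {}
    for tokens in completions:
        total, path = 0.0, ()
        for tok in list(tokens) + [END]:
            if (path, tok) not in recorded:
                break  # beyond the part of the tree the greedy walk observed
            total += recorded[(path, tok)]
            path += (tok,)
        scores[tuple(tokens)] = total
    return scores
```

In this sketch a completion that leaves the greedy path early sums fewer terms than one that stays on it, so some length handling would be needed in practice; the page does not say how the paper treats this.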
If this is right
- Existing language models already running in IDEs can be reused for ranking without retraining or extra infrastructure.
- The ranking step avoids the latency of beam search while still using full token-level information from the model.
- The approach remains independent of any particular model architecture or training data.
- Integration requires only the ability to run the model once over a tree of completions, fitting within current IDE performance budgets.
Where Pith is reading between the lines
- The same tree-based scoring could be applied to rank suggestions in other structured domains such as API usage or configuration files.
- If token scores correlate with acceptance, the method could replace many hand-tuned heuristic rankers in production IDEs.
- Extending the tree to include partial edits or multi-token rewrites might further improve suggestion quality without changing the core pass.
Load-bearing premise
Token scores collected from a single greedy pass over the prefix tree will produce rankings that developers find more useful than existing heuristics.
What would settle it
An A/B test in a live IDE that measures whether the new ranking increases the rate at which developers accept or edit the top suggestion compared with the current system.
Original abstract
Token-level code completion is one of the most critical features in modern Integrated Development Environments (IDEs). It assists developers by suggesting relevant identifiers and APIs during coding. While completions are typically derived from static analysis, their usefulness depends heavily on how they are ranked, as correct predictions buried deep in the list are rarely seen by users. Most current systems rely on hand-crafted heuristics or lightweight machine learning models trained on user logs, which can be further improved to capture context information and generalize across projects and coding styles. In this work, we propose a new scoring approach to ranking static completions using language models in a lightweight and model-agnostic way. Our method organizes all valid completions into a prefix tree and performs a single greedy decoding pass to collect token-level scores across the tree. This enables a precise token-aware ranking without needing beam search, prompt engineering, or model adaptations. The approach is fast, architecture-agnostic, and compatible with already deployed models for code completion. These findings highlight a practical and effective pathway for integrating language models into already existing tools within IDEs, and ultimately providing smarter and more responsive developer assistance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes TreeRanker, a ranking system for static code completions in IDEs. Valid completions are organized into a prefix tree; a single greedy decoding pass over this tree collects token-level scores from an unmodified language model. The resulting scores are used to rank suggestions, with the claim that this yields a precise, token-aware ordering that is fast, architecture-agnostic, and requires neither beam search, prompt engineering, nor model adaptation.
Significance. If the claimed ranking quality and runtime hold under realistic IDE workloads, the technique would offer a lightweight route for injecting existing code LMs into production completion engines without retraining or architectural changes. The prefix-tree-plus-greedy-pass construction is a concrete, implementable idea that could reduce reliance on hand-crafted heuristics.
major comments (2)
- [Evaluation] The central claim that token-level scores obtained from one greedy pass over the prefix tree produce rankings more useful to developers than hand-crafted heuristics or lightweight ML models is unsupported. No quantitative results, acceptance-rate metrics, position-of-correct-completion statistics, or IDE-telemetry comparisons appear anywhere in the manuscript.
- [Approach] §3 (Approach): the description of the prefix-tree traversal and score aggregation is given at a high level; it is unclear how ties are broken, how the final ranking is extracted from the collected token scores, or whether the procedure remains exact when the LM vocabulary and the static-analysis vocabulary differ.
minor comments (2)
- [Abstract] The abstract and introduction repeatedly use the phrase 'precise token-aware ranking' without defining what 'precise' means in this context or how it is measured.
- [Approach] No runtime or memory figures are supplied even for the prefix-tree construction step, making the 'fast' claim difficult to assess.
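The position-of-correct-completion statistics the report asks for can be illustrated with two standard ranking metrics, mean reciprocal rank and top-k accuracy. This is a minimal sketch with hypothetical function names and input shapes; it is not the paper's evaluation code, and it assumes each test case pairs a ranked suggestion list with the single completion the developer actually wanted.

```python
from typing import List, Sequence, Tuple

# One evaluation case: (ranked suggestions, the completion the developer wanted).
Case = Tuple[Sequence[str], str]


def reciprocal_rank(ranked: Sequence[str], correct: str) -> float:
    """1 / (1-based position of the correct item), or 0.0 if it is absent."""
    for position, item in enumerate(ranked, start=1):
        if item == correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(cases: List[Case]) -> float:
    """Average reciprocal rank over all cases."""
    return sum(reciprocal_rank(r, c) for r, c in cases) / len(cases)


def top_k_accuracy(cases: List[Case], k: int) -> float:
    """Fraction of cases whose correct item appears in the first k suggestions."""
    return sum(c in list(r)[:k] for r, c in cases) / len(cases)
```

Both functions assume a non-empty list of cases. Comparing a heuristic ranker with a language-model ranker would mean computing these on the same cases for each ranker.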
Simulated Author's Rebuttal
We thank the referee for the constructive review and for noting the potential practical value of TreeRanker. We agree that the manuscript requires stronger empirical support and more precise algorithmic details to substantiate its claims. We address each major comment below and will revise the paper accordingly.
- Referee: [Evaluation] The central claim that token-level scores obtained from one greedy pass over the prefix tree produce rankings more useful to developers than hand-crafted heuristics or lightweight ML models is unsupported. No quantitative results, acceptance-rate metrics, position-of-correct-completion statistics, or IDE-telemetry comparisons appear anywhere in the manuscript.
  Authors: We acknowledge that the present manuscript does not contain quantitative evaluation results to support the ranking-quality claims. The current version emphasizes the design of the prefix-tree approach and its computational advantages. In the revised manuscript we will add an evaluation section that reports comparisons against hand-crafted heuristics and lightweight ML baselines on standard code-completion benchmarks, using metrics including mean reciprocal rank, top-k accuracy, and simulated acceptance rates derived from developer telemetry where possible.
  Revision: yes
- Referee: [Approach] §3 (Approach): the description of the prefix-tree traversal and score aggregation is given at a high level; it is unclear how ties are broken, how the final ranking is extracted from the collected token scores, or whether the procedure remains exact when the LM vocabulary and the static-analysis vocabulary differ.
  Authors: We thank the referee for highlighting the insufficient detail in §3. We will revise this section to include pseudocode and step-by-step explanations of the traversal and aggregation procedure. Ties will be broken by the order of completion generation from the static analyzer, with a secondary lexical sort. The final ranking is obtained by sorting completions in descending order of their aggregated log-probability scores. For vocabulary mismatch we will describe an explicit token-mapping step that aligns static-analysis identifiers with the LM's subword vocabulary while preserving the exactness of the collected scores.
  Revision: yes
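The ordering rule stated in the rebuttal (descending aggregated log-probability, ties broken by static-analyzer order, then lexically) can be sketched as a single composite sort key. The `Candidate` class and its field names are hypothetical and introduced only for illustration; the manuscript's own pseudocode is not available on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    text: str           # the completion string
    analyzer_rank: int  # position in the static analyzer's output (hypothetical field)
    score: float        # aggregated log-probability from the model


def rank(candidates: List[Candidate]) -> List[str]:
    """Sort by score descending, then analyzer order, then lexical order."""
    ordered = sorted(
        candidates,
        key=lambda c: (-c.score, c.analyzer_rank, c.text),
    )
    return [c.text for c in ordered]


suggestions = [
    Candidate("setName", analyzer_rank=0, score=-1.8),
    Candidate("getName", analyzer_rank=1, score=-0.8),
    Candidate("getId", analyzer_rank=2, score=-0.8),
]
ranking = rank(suggestions)
```

The lexical component only matters when two candidates share both score and analyzer rank, for example if the analyzer reports groups rather than a strict order.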
Circularity Check
No circularity: procedural method described without self-referential reductions or fitted predictions
The paper introduces TreeRanker as a prefix-tree organization of completions followed by a single greedy decoding pass to obtain token-level scores for ranking. This is presented as a direct algorithmic construction that is model-agnostic and requires no beam search or adaptations. No equations, parameter fitting, or self-citations are shown to define the ranking output in terms of itself or prior author work. The central claim remains an independent procedural proposal whose utility would be evaluated externally rather than by construction.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Our method organizes all valid completions into a prefix tree and performs a single greedy decoding pass to collect token-level scores across the tree."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.