pith. sign in

arxiv: 2511.22277 · v2 · submitted 2025-11-27 · 💻 cs.LG

TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation

Pith reviewed 2026-05-17 04:16 UTC · model grok-4.3

classification 💻 cs.LG
keywords LLM code generationdecoding strategiesconstraint optimizationtree searchMBPP benchmarkSQL-Spidersystematic tuning
0
0 comments X

The pith

TreeCoder models LLM decoding for code as an optimizable tree search over constraints to raise output accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TreeCoder as a framework that casts the decoding step in large language models for code generation as a tree search across possible program candidates. Decoding strategies and constraint functions for syntax, style, and execution are made into adjustable, first-class parts that can be tuned automatically with standard optimization methods. This moves the enforcement of correctness into the generation process itself instead of relying only on prompts. If the method works as described, models would produce more valid code on tasks such as Python function writing and SQL query construction without extra training. Reported tests on the MBPP and SQL-Spider sets show higher success rates for several open models compared with ordinary decoding.

Core claim

TreeCoder represents decoding as a tree search over candidate programs, where both decoding strategies and constraint functions such as style, syntax, and execution are treated as first-class, optimisable components that enable systematic exploration and automatic tuning using standard optimisation techniques, resulting in improved accuracy on the MBPP Python and SQL-Spider benchmarks for models including CodeLlama, Mistral, and DeepSeek.

What carries the argument

TreeCoder, a framework that represents decoding as a tree search over candidate programs and treats decoding strategies and constraint functions as first-class optimisable components.

Load-bearing premise

That turning decoding into a tree search with constraints as tunable components will deliver accuracy gains at acceptable computational cost and without losing flexibility.

What would settle it

If experiments on the same benchmarks show that tuned TreeCoder configurations produce no accuracy lift over unconstrained decoding or incur runtimes too high for practical use.

Figures

Figures reproduced from arXiv: 2511.22277 by Arindam Sharma, Cristina David, Henrijs Princis.

Figure 1
Figure 1. Figure 1: High-level illustration of our framework. [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Expand without rollouts (left) and expand with rollouts (right). The [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Update step of all five decoding functions [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Prune and move steps of all five decoding functions [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Partial order of constraints. These constraints can act individually or jointly during decoding, guiding the search toward syntactically valid, well-structured, and semantically meaningful programs (more details on the constraints and their use are given in the experimental evaluation in Section 7). Formally, they are composed under the product-of-experts formulation introduced earlier, where the overall c… view at source ↗
Figure 6
Figure 6. Figure 6: Constraint function examples: Syntax (checks syntactic validity), No-Comments (suppress comments), No-Explanations (encourage code over natural language explanations), Unit-Tests (validate generated code via assertions). function. As a result, TreeCoder enables complex constraint behaviours—such as enforcing syntax, style, and functional correctness simultaneously—without altering the underlying decoding l… view at source ↗
read the original abstract

Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and flexible framework to date for exploring decoding strategies, constraints, and hyperparameters in LLMs, and use it in code generation to enforce correctness and structure during decoding rather than relying on prompt engineering. TreeCoder represents decoding as a tree search over candidate programs, where both decoding strategies and constraint functions - such as style, syntax, execution - are treated as first-class, optimisable components. This design enables systematic exploration and automatic tuning of decoding configurations using standard optimisation techniques. Experiments on the MBPP (Python) and SQL-Spider benchmarks show that TreeCoder consistently improves accuracy across open-source models such as CodeLlama, Mistral and DeepSeek, often outperforming their unconstrained baselines by considerable margins.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces TreeCoder, a framework that models LLM decoding for code generation as a tree search over candidate programs. Both decoding strategies and constraint functions (e.g., style, syntax, execution) are treated as first-class, optimizable components, enabling systematic exploration and automatic tuning via standard optimization methods. Experiments on the MBPP (Python) and SQL-Spider benchmarks are reported to show consistent accuracy gains across open-source models including CodeLlama, Mistral, and DeepSeek, often exceeding unconstrained baselines by considerable margins.

Significance. If the empirical results and efficiency claims hold, the work could offer a more principled alternative to prompt engineering for enforcing syntactic and semantic constraints in code generation. Treating constraints as tunable components within a tree-search formulation is a potentially generalizable idea. However, the abstract supplies no quantitative results, implementation details, or overhead measurements, so the practical significance remains difficult to assess from the provided text.

major comments (2)
  1. [Abstract] Abstract: The central claim that TreeCoder 'consistently improves accuracy' and 'outperforms their unconstrained baselines by considerable margins' is stated without any numerical results, baseline comparisons, statistical tests, or even the magnitude of the reported gains. This absence makes it impossible to evaluate whether the experimental evidence supports the claim.
  2. [Methods] Methods (tree-search formulation): No description is given of branching factor, pruning criteria, beam width, or the average number of LLM calls required per sample. Because the central claim requires that the search remains tractable on MBPP and Spider programs, the lack of these controls is load-bearing for the scalability and practicality assertions.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by including at least one concrete accuracy delta or table reference so readers can immediately gauge effect size.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment point by point below, indicating where revisions have been made to improve clarity and completeness.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that TreeCoder 'consistently improves accuracy' and 'outperforms their unconstrained baselines by considerable margins' is stated without any numerical results, baseline comparisons, statistical tests, or even the magnitude of the reported gains. This absence makes it impossible to evaluate whether the experimental evidence supports the claim.

    Authors: We agree that the abstract would benefit from including specific quantitative results to allow readers to immediately assess the strength of the claims. In the revised version, we have updated the abstract to report key accuracy improvements (e.g., absolute gains on MBPP and SQL-Spider for CodeLlama, Mistral, and DeepSeek) along with direct comparisons to the unconstrained baselines. revision: yes

  2. Referee: [Methods] Methods (tree-search formulation): No description is given of branching factor, pruning criteria, beam width, or the average number of LLM calls required per sample. Because the central claim requires that the search remains tractable on MBPP and Spider programs, the lack of these controls is load-bearing for the scalability and practicality assertions.

    Authors: The referee is correct that these implementation parameters are essential for evaluating tractability. We have added a dedicated paragraph in the Methods section specifying the branching factor, pruning criteria, beam width, and the measured average number of LLM calls per sample on both benchmarks, confirming that the search remains practical within the reported experimental setup. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces TreeCoder as an empirical framework that represents LLM decoding as tree search with first-class optimizable constraints and validates it through accuracy improvements on the MBPP and SQL-Spider benchmarks across multiple models. No mathematical derivation chain, fitted-parameter predictions, or self-citation load-bearing steps are present in the abstract or described methods; the central claims rest on external benchmark results rather than reducing to inputs by construction or renaming known patterns. The work is therefore self-contained against external benchmarks with no evidence of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient detail to enumerate specific free parameters or axioms; the approach implicitly relies on the assumption that standard optimization techniques can effectively tune tree-search configurations and that constraints can be applied during decoding without major efficiency trade-offs.

pith-pipeline@v0.9.0 · 5455 in / 1116 out tokens · 47885 ms · 2026-05-17T04:16:18.615375+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AdverMCTS: Combating Pseudo-Correctness in Code Generation via Adversarial Monte Carlo Tree Search

    cs.SE 2026-04 unverdicted novelty 7.0

    AdverMCTS frames code generation as a minimax game where an attacker evolves tests to expose flaws in solver-generated code, yielding more robust outputs than static-test baselines.

  2. Diagnosing CFG Interpretation in LLMs

    cs.AI 2026-04 unverdicted novelty 6.0

    LLMs maintain surface syntax for novel CFGs but fail to preserve semantics under recursion and branching, relying on keyword bootstrapping rather than pure symbolic reasoning.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · cited by 2 Pith papers · 11 internal anchors

  1. [1]

    Kareem Ahmed, Kai-Wei Chang, and Guy Van den Broeck. 2024. Controllable Generation via Locally Constrained Resampling.arXiv preprint arXiv:2410.13111(2024)

  2. [2]

    Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. Optuna: A Next-generation Hyperparameter Optimization Framework. arXiv:1907.10902 [cs.LG] https://arxiv.org/abs/1907.10902

  3. [3]

    Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. Optuna: A Next-Generation Hyperparameter Optimization Framework. InThe 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2623–2631

  4. [5]

    Zhamri Che Ani, Zauridah Abdul Hamid, and Nur Nazifa Zhamri. 2024. The Recent Trends of Research on GitHub Copilot: A Systematic Review. InComputing and Informatics, Nur Haryani Zakaria, Nur Suhaili Mansor, Husniza Husni, and Fathey Mohammed (Eds.). Springer Nature Singapore, Singapore, 355–366

  5. [6]

    Program Synthesis with Large Language Models

    Jacob Austin, Augustus Odena, Maxwell I. Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie J. Cai, Michael Terry, Quoc V. Le, and Charles Sutton. 2021. Program Synthesis with Large Language Models. CoRRabs/2108.07732 (2021). arXiv:2108.07732 https://arxiv.org/abs/2108.07732

  6. [7]

    Debangshu Banerjee, Tarun Suresh, Shubham Ugare, Sasa Misailovic, and Gagandeep Singh. 2025. CRANE: Reasoning with constrained LLM generation.arXiv preprint arXiv:2502.09061(2025)

  7. [8]

    Yehoshua Bar-Hillel, Micha Perles, and Eli Shamir. 1961. On formal properties of simple phrase structure grammars. Sprachtypologie und Universalienforschung14 (1961), 143–172

  8. [9]

    Browne, Edward Powley, Daniel Whitehouse, Simon M

    Cameron B. Browne, Edward Powley, Daniel Whitehouse, Simon M. Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton. 2012. A Survey of Monte Carlo Tree Search Methods.IEEE Transactions on Computational Intelligence and AI in Games4, 1 (2012), 1–43. doi:10.1109/TCIAIG.2012. 2186810

  9. [10]

    Godsill, and Eric Moulines

    Olivier Cappe, Simon J. Godsill, and Eric Moulines. 2007. An Overview of Existing Methods and Recent Advances in Sequential Monte Carlo.Proc. IEEE95, 5 (2007), 899–924. doi:10.1109/JPROC.2007.893250

  10. [11]

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

  11. [12]

    Evaluating Large Language Models Trained on Code

    Evaluating Large Language Models Trained on Code. arXiv:2107.03374 [cs.LG] https://arxiv.org/abs/2107.03374

  12. [13]

    Xiang Chen, Jiacheng Xue, Xiaofei Xie, Caokai Liang, and Xiaolin Ju. 2025. A Systematic Literature Review on Neural Code Translation. arXiv:2505.07425 [cs.SE] https://arxiv.org/abs/2505.07425

  13. [14]

    Microsoft Corporation. 2025. Pyright: Static Type Checker for Python. https://github.com/microsoft/pyright. Accessed: 2025-11-13

  14. [15]

    DeepSeek-V3 Technical Report

    DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Haowei Zhang, Honghui Ding, Huaj...

  15. [16]

    2001.An Introduction to Sequential Monte Carlo Methods

    Arnaud Doucet, Nando de Freitas, and Neil Gordon. 2001.An Introduction to Sequential Monte Carlo Methods. Springer New York, New York, NY, 3–14. doi:10.1007/978-1-4757-3437-9_1

  16. [17]

    Saibo Geng, Berkay Döner, Chris Wendler, Martin Josifoski, and Robert West. 2024. Sketch-guided constrained decoding for boosting blackbox large language models without logit access.arXiv preprint arXiv:2401.09967(2024)

  17. [18]

    Saibo Geng, Martin Josifoski, Maxime Peyrard, and Robert West. 2024. Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning. arXiv:2305.13971 [cs.CL] https://arxiv.org/abs/2305.13971

  18. [19]

    Emmanuel Anaya Gonzalez, Sairam Vaidya, Kanghee Park, Ruyi Ji, Taylor Berg-Kirkpatrick, and Loris D’Antoni

  19. [20]

    arXiv:2506.05754 [cs.AI] https://arxiv.org/abs/2506.05754

    Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective. arXiv:2506.05754 [cs.AI] https://arxiv.org/abs/2506.05754

  20. [21]

    Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, and Mao Yang. 2025. rStar- Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking. arXiv:2501.04519 [cs.CL] https: //arxiv.org/abs/2501.04519

  21. [22]

    Jianfei He, Shichao Sun, Xiaohua Jia, and Wenjie Li. 2023. Empirical Analysis of Beam Search Curse and Search Errors with Model Errors in Neural Machine Translation. InProceedings of the 24th Annual Conference of the European Association for Machine Translation, Mary Nurminen, Judith Brenner, Maarit Koponen, Sirkku Latomaa, Mikhail Mikhailov, Frederike Sc...

  22. [23]

    Corentin Hokamp and Qun Liu. 2017. Lexically Constrained Decoding for Neural Machine Translation via Grid Beam Search. InProceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL). 1535–1546

  23. [24]

    Liang Huang, Kai Zhao, and Mingbo Ma. 2018. When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size). arXiv:1809.00069 [cs.CL] https://arxiv.org/abs/1809.00069

  24. [25]

    Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. 2023. Mistral 7B. arXiv:2310.06825...

  25. [26]

    Mankowitz, Esme Sutherland Robson, Pushmeet Kohli, Nando De Freitas, Koray Kavukcuoglu, and Oriol Vinyals

    Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, Thomas Hubert, Peter Choy, Cyprien de Masson d’Autume, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson, Pushm...

  26. [28]

    Syntactic and semantic control of large language models via sequential

    Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo. arXiv:2504.13139 [cs.CL] https://arxiv.org/abs/2504.13139

  27. [29]

    Lew, Tim Vieira, and Timothy J

    João Loula, Benjamin LeBrun, Li Du, Ben Lipkin, Clemente Pasti, Gabriel Grand, Tianyu Liu, Yahya Emara, Marjorie Freedman, Jason Eisner, Ryan Cotterell, Vikash Mansinghka, Alexander K. Lew, Tim Vieira, and Timothy J. O’Donnell

  28. [30]

    InInternational Conference on Learning Representations (ICLR)

    Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo. InInternational Conference on Learning Representations (ICLR)

  29. [31]

    Tingwei Lu, Yangning Li, Liyuan Wang, Binghuai Lin, Jiwei Tang, Qingsong Lv, Wanshi Xu, Hai-Tao Zheng, Yinghui Li, Xin Su, and Zifei Shan. 2025. LSR-MCTS: Alleviating Long Range Dependency in Code Generation. arXiv:2504.07433 [cs.CL] https://arxiv.org/abs/2504.07433

  30. [32]

    Ximing Lu, Peter West, Rowan Zellers, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. 2021. NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints. InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Kristina Toutanova, An...

  31. [33]

    Clara Meister, Tim Vieira, and Ryan Cotterell. 2021. If beam search is the answer, what was the question? arXiv:2010.02650 [cs.CL] https://arxiv.org/abs/2010.02650 TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation 111:23

  32. [34]

    Kanghee Park, Jiayu Wang, Taylor Berg-Kirkpatrick, Nadia Polikarpova, and Loris D’Antoni. 2024. Grammar-Aligned Decoding. arXiv:2405.21047 [cs.AI] https://arxiv.org/abs/2405.21047

  33. [35]

    Kanghee Park, Jiayu Wang, Taylor Berg-Kirkpatrick, Nadia Polikarpova, and Loris D’Antoni. 2025. Grammar-aligned decoding. InProceedings of the 38th International Conference on Neural Information Processing Systems(Vancouver, BC, Canada)(NIPS ’24). Curran Associates Inc., Red Hook, NY, USA, Article 774, 22 pages

  34. [36]

    Gabriel Poesia, Oleksandr Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, and Sumit Gulwani

  35. [37]

    arXiv:2201.11227 [cs.LG] https: //arxiv.org/abs/2201.11227

    Synchromesh: Reliable code generation from pre-trained language models. arXiv:2201.11227 [cs.LG] https: //arxiv.org/abs/2201.11227

  36. [38]

    Gabriel Poesia, Oleksandr Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, and Sumit Gulwani. 2022. Synchromesh: Reliable Code Generation from Pre-Trained Language Models.arXiv preprint arXiv:2204.02805(2022)

  37. [39]

    Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, and Akash Srivastava. 2025. Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods. arXiv:2502.01618 [cs.LG] https://arxiv.org/abs/2502.01618

  38. [40]

    Christopher D. Rosin. 2011. Multi-armed bandits with episode context.Annals of Mathematics and Artificial Intelligence 61, 3 (2011), 203–230. doi:10.1007/s10472-011-9258-6

  39. [41]

    Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nico...

  40. [42]

    Torsten Scholak, Nathan Schucher, and Dzmitry Bahdanau. 2021. PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. arXiv:2109.05093 [cs.CL] https://arxiv.org/abs/2109.05093

  41. [43]

    Torsten Scholak, Nathan Schücher, and Dzmitry Bahdanau. 2021. PICARD: Parsing Incrementally for Constrained Autoregressive Decoding from Language Models. InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). 7517–7535

  42. [44]

    Shubham Ugare, Rohan Gumaste, Tarun Suresh, Gagandeep Singh, and Sasa Misailovic. 2025. IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking. arXiv:2410.07295 [cs.SE] https://arxiv.org/abs/2410. 07295

  43. [45]

    Shubham Ugare, Tarun Suresh, Hangoo Kang, Sasa Misailovic, and Gagandeep Singh. 2024. Improving LLM Code Generation with Grammar Augmentation.CoRRabs/2403.01632 (2024). https://doi.org/10.48550/arXiv.2403.01632

  44. [46]

    Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush

  45. [47]

    HuggingFace's Transformers: State-of-the-art Natural Language Processing

    HuggingFace’s Transformers: State-of-the-art Natural Language Processing. arXiv:1910.03771 [cs.CL] https: //arxiv.org/abs/1910.03771

  46. [48]

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, ...

  47. [49]

    Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2019. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. arXiv:1809.08887 [cs.CL] https://arxiv.org/abs/1809.08887

  48. [50]

    Yuqi Zhu, Jia Li, Ge Li, YunFei Zhao, Jia Li, Zhi Jin, and Hong Mei. 2023. Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models. arXiv:2309.02772 [cs.SE] https://arxiv.org/abs/2309.02772