A Study of LLMs' Preferences for Libraries and Programming Languages
Pith reviewed 2026-05-22 22:47 UTC · model grok-4.3
The pith
Large language models prefer popular libraries like NumPy and default to Python even when other choices are more suitable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The study reveals that LLMs exhibit a strong tendency to overuse widely adopted libraries such as NumPy, with this usage being unnecessary in up to 45% of cases and deviating from ground-truth solutions. The models also demonstrate a significant preference for Python as the default language, selecting it in 58% of high-performance project initialization tasks where it is not optimal, and never choosing Rust in those cases. This highlights how LLMs prioritize familiarity and popularity over suitability and task-specific optimality.
What carries the argument
Empirical measurement of library and language choices in generated code, scored against ground-truth solutions across multiple tasks and eight LLMs.
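As a rough illustration of this kind of measurement, and not the authors' actual pipeline, the sketch below parses each generated Python solution, collects the top-level modules it imports, and counts how often a library appears although the ground-truth solution does not import it. The function names, the choice of denominator (all samples), and the toy data are assumptions made for the example.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python code."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Skip relative imports (level > 0); they refer to local packages.
            modules.add(node.module.split(".")[0])
    return modules


def unnecessary_use_rate(samples, library: str) -> float:
    """Share of samples that import `library` although the ground truth does not.

    `samples` is a list of (generated_code, ground_truth_code) pairs.
    """
    if not samples:
        return 0.0
    flagged = sum(
        1
        for generated, truth in samples
        if library in imported_modules(generated)
        and library not in imported_modules(truth)
    )
    return flagged / len(samples)


# Hypothetical toy data, not taken from the paper.
samples = [
    ("import numpy as np\nprint(np.sum([1, 2]))", "print(sum([1, 2]))"),
    ("print(sum([1, 2]))", "print(sum([1, 2]))"),
]
rate = unnecessary_use_rate(samples, "numpy")
```

Parsing with `ast` rather than searching for the string "import" avoids counting imports that only appear in comments or string literals.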
If this is right
- Generated code may be less efficient in performance-critical settings because of language and library biases.
- Targeted fine-tuning and data diversification could reduce unnecessary selection of popular options.
- Evaluation benchmarks for code generation need to measure language and library selection fidelity in addition to correctness.
- Existing correctness-focused tests may overlook design choices that affect real-world code quality.
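The benchmark point above could be made concrete roughly as follows. This is a minimal sketch under stated assumptions: the `Result` record, the `language_fidelity` name, and the sample values are hypothetical, and the paper may define selection fidelity differently.

```python
from dataclasses import dataclass


@dataclass
class Result:
    passed_tests: bool       # functional correctness of the generated code
    chosen_language: str     # language the model picked
    reference_language: str  # language of the ground-truth solution


def summarize(results: list[Result]) -> dict[str, float]:
    """Report correctness and language-selection fidelity side by side."""
    n = len(results)
    if n == 0:
        return {"correctness": 0.0, "language_fidelity": 0.0}
    correct = sum(r.passed_tests for r in results)
    matched = sum(
        r.chosen_language.lower() == r.reference_language.lower() for r in results
    )
    return {"correctness": correct / n, "language_fidelity": matched / n}


# Hypothetical results, not data from the paper.
results = [
    Result(True, "Python", "Rust"),
    Result(True, "C++", "C++"),
    Result(False, "Python", "Python"),
    Result(True, "Python", "Go"),
]
summary = summarize(results)
```

In this toy data three of four solutions pass their tests while only two use the reference language, which is the kind of gap a correctness-only benchmark would not surface.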
Where Pith is reading between the lines
- Prompt engineering that explicitly requests consideration of alternative languages might mitigate the observed defaults.
- The bias could slow adoption of efficient languages in AI-assisted projects if not addressed.
- Extending the evaluation to additional domains and languages would test whether the preference pattern holds more broadly.
Load-bearing premise
The ground-truth solutions used for comparison represent the required or optimal choices for the evaluated tasks.
What would settle it
A new task set in which ground-truth solutions require a less popular but more suitable library or language, with LLMs still selecting popular alternatives at the same rates.
Original abstract
Despite the rapid progress of large language models (LLMs) in code generation, existing evaluations focus on functional correctness or syntactic validity, overlooking how LLMs make critical design choices such as which library or programming language to use. To fill this gap, we perform the first empirical study of LLMs' preferences for libraries and programming languages when generating code, covering eight diverse LLMs. We observe a strong tendency to overuse widely adopted libraries such as NumPy; in up to 45% of cases, this usage is not required and deviates from the ground-truth solutions. The LLMs we study also show a significant preference toward Python as their default language. For high-performance project initialisation tasks where Python is not the optimal language, it remains the dominant choice in 58% of cases, and Rust is not used once. These results highlight how LLMs prioritise familiarity and popularity over suitability and task-specific optimality; underscoring the need for targeted fine-tuning, data diversification, and evaluation benchmarks that explicitly measure language and library selection fidelity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports the first empirical study of library and language selection preferences across eight LLMs in code-generation tasks. It claims that models overuse popular libraries (e.g., NumPy is used unnecessarily, deviating from the ground-truth solutions, in up to 45% of cases) and default to Python even for high-performance initialization tasks where it is not the optimal language (58% of cases, with Rust never selected), concluding that LLMs systematically favor familiarity and popularity over task-specific optimality and suitability.
Significance. If the central empirical observations hold once the methodological concerns below are addressed, the work fills a clear gap in LLM code-generation evaluation by moving beyond functional correctness to design-choice fidelity. The multi-model coverage and concrete deviation percentages provide a useful baseline for future benchmarks and fine-tuning efforts aimed at diversifying language and library usage. The observational design is appropriate for the question posed.
major comments (2)
- [Abstract / Results] Abstract and results sections: The central claim that observed deviations demonstrate prioritization of familiarity over suitability requires that the ground-truth solutions are in fact the required or optimal choices for the tasks. No expert validation, performance benchmarking, or alternative-optimality metric is supplied to rule out the possibility that the LLMs are simply returning other valid (if different) solutions; this assumption is load-bearing for the interpretation.
- [Methodology] Methodology (task and ground-truth definition): Sample sizes, task definitions, and the precise criteria used to label a library or language choice as “not required” or “not optimal” are not detailed enough in the provided abstract to allow independent assessment of whether the 45% and 58% figures support the prioritization conclusion.
minor comments (1)
- [Abstract] Abstract: The sentence beginning “These results highlight…” places a semicolon before the participial phrase “underscoring…”; a comma or rephrasing would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of our methodology and claims.
- Referee: [Abstract / Results] Abstract and results sections: The central claim that observed deviations demonstrate prioritization of familiarity over suitability requires that the ground-truth solutions are in fact the required or optimal choices for the tasks. No expert validation, performance benchmarking, or alternative-optimality metric is supplied to rule out the possibility that the LLMs are simply returning other valid (if different) solutions; this assumption is load-bearing for the interpretation.
  Authors: We agree this is a substantive point. The ground-truth solutions were selected based on standard reference implementations that minimize dependencies or optimize for the task constraints (e.g., built-in functions only for library tasks; performance-oriented languages for initialization). However, to make this assumption more robust, we will add a dedicated paragraph in the methodology section describing the task construction process, include references to established performance comparisons for the language tasks, and expand the limitations section to explicitly discuss alternative valid solutions and their potential impact on the reported percentages.
  Revision: yes
- Referee: [Methodology] Methodology (task and ground-truth definition): Sample sizes, task definitions, and the precise criteria used to label a library or language choice as “not required” or “not optimal” are not detailed enough in the provided abstract to allow independent assessment of whether the 45% and 58% figures support the prioritization conclusion.
  Authors: The full manuscript details these elements in Section 3 (Methodology), including sample sizes (100 tasks per library category across 8 models, 50 tasks for language preference), task definitions (e.g., data processing without external libs, high-performance init), and labeling criteria (a choice is labeled 'not required' if the reference solution completes the task using only language builtins or standard library, with no external imports needed). We will revise the abstract to include a concise summary of these details and add explicit cross-references from the results to the methodology section.
  Revision: yes
Circularity Check
No circularity: purely observational empirical comparison with no derivations or self-referential predictions.
This paper performs an empirical study by generating code with LLMs and directly comparing library/language choices against provided ground-truth solutions. No equations, fitted parameters, predictions derived from inputs, or self-citation chains appear in the derivation of the central claims. The analysis is self-contained as direct observation of outputs versus external benchmarks (ground-truth), with no reduction of results to the paper's own definitions or prior author work by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The selected coding tasks and ground-truth solutions are representative of real-world requirements and optimality.
Forward citations
Cited by 9 Pith papers
- FormalRewardBench: A Benchmark for Formal Theorem Proving Reward Models
  FormalRewardBench is the first benchmark for reward models in formal theorem proving, consisting of 250 Lean 4 preference pairs that show frontier LLMs scoring 59.8% while specialized provers score only 24.4%.
- ReplicatorBench: Benchmarking LLM Agents for Replicability in Social and Behavioral Sciences
  ReplicatorBench evaluates LLM agents on replicating social and behavioral science claims across retrieval, computation, and interpretation stages, finding strength in experiment execution but weakness in resource retrieval.
- The software space of science
  A network analysis of software mentions in 1.3 million papers identifies 520 tools in eight communities and shows disciplines maintain distinct, stable tool portfolios that are crystallizing toward common sets.
- CodeSpecBench: Benchmarking LLMs for Executable Behavioral Specification Generation
  CodeSpecBench shows LLMs achieve at most 20.2% pass rate on repository-level executable behavioral specification generation, revealing that strong code generation does not imply deep semantic understanding.
- Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations
  Agentic LLMs remain robust to renaming and insertion but degrade on composed transformations and deeper obfuscation in CTF tasks, enabled by a new Evolve-CTF tool for generating equivalent challenge families.
- Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries
  A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%,...
- Task Abstention for Large Language Models in Code Generation
  A distribution-free abstention rule grounded in multiple hypothesis testing uses execution consistency to let code LLMs avoid hallucination-prone tasks with theoretical guarantees.
- FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
  FlexSQL reaches 65.4% on Spider2-Snow by allowing agents to flexibly explore schemas, generate diverse plans, choose SQL or Python execution, and apply two-tiered repair.
- Quality and Security Signals in AI-Generated Python Refactoring Pull Requests
  Empirical analysis of AI refactoring PRs shows quality attribute improvements in 22.5% of cases with new Pylint issues in 24.17% and Bandit findings in 4.7%, yet 73.5% developer acceptance.