A Study of LLMs' Preferences for Libraries and Programming Languages
Despite the rapid progress of large language models (LLMs) in code generation, existing evaluations focus on functional correctness or syntactic validity, overlooking how LLMs make critical design choices such as which library or programming language to use. To fill this gap, we perform the first empirical study of LLMs' preferences for libraries and programming languages when generating code, covering eight diverse LLMs. We observe a strong tendency to overuse widely adopted libraries such as NumPy; in up to 45% of cases, this usage is not required and deviates from the ground-truth solutions. The LLMs we study also show a significant preference for Python as their default language. In high-performance project initialisation tasks where Python is not the optimal language, it remains the dominant choice in 58% of cases, while Rust is never selected. These results highlight how LLMs prioritise familiarity and popularity over suitability and task-specific optimality, underscoring the need for targeted fine-tuning, data diversification, and evaluation benchmarks that explicitly measure language and library selection fidelity.
Forward citations
Cited by 4 Pith papers
- FormalRewardBench: A Benchmark for Formal Theorem Proving Reward Models
  FormalRewardBench is the first benchmark for reward models in formal theorem proving, consisting of 250 Lean 4 preference pairs that show frontier LLMs scoring 59.8% while specialized provers score only 24.4%.
- The software space of science
  A network analysis of software mentions in 1.3 million papers identifies 520 tools in eight communities and shows that disciplines maintain distinct, stable tool portfolios that are crystallizing toward common sets.
- CodeSpecBench: Benchmarking LLMs for Executable Behavioral Specification Generation
  CodeSpecBench shows LLMs achieve at most a 20.2% pass rate on repository-level executable behavioral specification generation, revealing that strong code generation does not imply deep semantic understanding.
- FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
  FlexSQL reaches 65.4% on Spider2-Snow by allowing agents to flexibly explore schemas, generate diverse plans, choose SQL or Python execution, and apply two-tiered repair.