pith. machine review for the scientific record.

arxiv: 2503.17181 · v3 · submitted 2025-03-21 · 💻 cs.SE · cs.AI

Recognition: unknown

A Study of LLMs' Preferences for Libraries and Programming Languages

Authors on Pith: no claims yet
classification: 💻 cs.SE · cs.AI
keywords: LLMs · language · libraries · programming · cases · code · languages · library
Abstract

Despite the rapid progress of large language models (LLMs) in code generation, existing evaluations focus on functional correctness or syntactic validity, overlooking how LLMs make critical design choices such as which library or programming language to use. To fill this gap, we perform the first empirical study of LLMs' preferences for libraries and programming languages when generating code, covering eight diverse LLMs. We observe a strong tendency to overuse widely adopted libraries such as NumPy; in up to 45% of cases, this usage is not required and deviates from the ground-truth solutions. The LLMs we study also show a significant preference for Python as their default language. For high-performance project initialisation tasks where Python is not the optimal language, it remains the dominant choice in 58% of cases, and Rust is never used. These results highlight how LLMs prioritise familiarity and popularity over suitability and task-specific optimality, underscoring the need for targeted fine-tuning, data diversification, and evaluation benchmarks that explicitly measure language and library selection fidelity.
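The overuse finding implies a simple, checkable criterion: a generated solution imports a library that the ground-truth solution does not need. Below is a minimal sketch of that check, assuming Python sources on both sides; the function names and example snippets are illustrative, not the authors' actual pipeline.

    # Illustrative check, not the authors' pipeline: flag libraries that a
    # generated solution imports but the ground-truth solution does without.
    import ast

    def imported_modules(source: str) -> set[str]:
        """Top-level module names imported by a piece of Python source."""
        modules: set[str] = set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                modules.add(node.module.split(".")[0])
        return modules

    def unnecessary_imports(generated: str, ground_truth: str) -> set[str]:
        """Libraries the model pulled in that the reference solution avoids."""
        return imported_modules(generated) - imported_modules(ground_truth)

    # Example: the model reaches for NumPy where the reference uses builtins.
    generated = "import numpy as np\ndef total(xs):\n    return np.sum(xs)"
    reference = "def total(xs):\n    return sum(xs)"
    print(unnecessary_imports(generated, reference))  # {'numpy'}

Counting how often this set is non-empty over a benchmark would yield a statistic of the same shape as the abstract's 45% figure.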

This paper has not been read by Pith yet.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FormalRewardBench: A Benchmark for Formal Theorem Proving Reward Models

    cs.AI · 2026-05 · conditional · novelty 8.0

    FormalRewardBench is the first benchmark for reward models in formal theorem proving, consisting of 250 Lean 4 preference pairs on which frontier LLMs score 59.8% while specialized provers reach only 24.4%.

  2. The software space of science

    cs.DL · 2026-04 · unverdicted · novelty 7.0

    A network analysis of software mentions in 1.3 million papers identifies 520 tools in eight communities and shows disciplines maintain distinct, stable tool portfolios that are crystallizing toward common sets.

  3. CodeSpecBench: Benchmarking LLMs for Executable Behavioral Specification Generation

    cs.SE · 2026-04 · accept · novelty 7.0

    CodeSpecBench shows LLMs achieve at most a 20.2% pass rate on repository-level executable behavioral specification generation, revealing that strong code generation does not imply deep semantic understanding.

  4. FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents

    cs.CL · 2026-05 · unverdicted · novelty 6.0

    FlexSQL reaches 65.4% on Spider2-Snow by allowing agents to flexibly explore schemas, generate diverse plans, choose SQL or Python execution, and apply two-tiered repair.
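The FlexSQL entry above describes a concrete control flow: explore the schema, draft several plans, execute each as SQL or Python, and repair failures in two tiers. A minimal sketch of that loop follows; every function here is a hypothetical stand-in, not the paper's actual components, prompts, or engines.

    # Hypothetical sketch of the two-tiered explore/execute/repair loop the
    # FlexSQL summary describes; all components are illustrative stand-ins.
    from dataclasses import dataclass

    @dataclass
    class Result:
        ok: bool
        output: str

    def explore_schema(question: str) -> str:
        # Stand-in for LLM-driven exploration of tables, columns, and samples.
        return "schema notes for: " + question

    def draft_plans(question: str, schema: str, n: int = 3) -> list[str]:
        # Stand-in for generating n diverse candidate query plans.
        return [f"plan {i}: {question}" for i in range(n)]

    def execute(plan: str, engine: str) -> Result:
        # Stand-in executor; engine is "sql" or "python".
        return Result(ok=False, output=f"{engine} failed on {plan!r}")

    def quick_repair(plan: str, error: str) -> str:
        # Tier 1: cheap local patch guided by the error message.
        return plan + " [patched]"

    def replan(question: str, schema: str, error: str) -> str:
        # Tier 2: regenerate the plan from scratch with the error as context.
        return "replanned: " + question

    def answer(question: str) -> Result:
        schema = explore_schema(question)
        for plan in draft_plans(question, schema):
            for engine in ("sql", "python"):      # flexible execution backend
                result = execute(plan, engine)
                if result.ok:
                    return result
                for fixed in (quick_repair(plan, result.output),
                              replan(question, schema, result.output)):
                    result = execute(fixed, engine)
                    if result.ok:
                        return result
        return Result(ok=False, output="no plan succeeded")

The two tiers trade cost for coverage: a cheap local patch handles most syntactic failures before the agent pays for a full replanning pass.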