
arxiv: 2502.15835 · v5 · pith:MPPBMNZInew · submitted 2025-02-20 · 💻 cs.CL · cs.AI · cs.SE

Pragmatic Reasoning improves LLM Code Generation

classification 💻 cs.CL · cs.AI · cs.SE
keywords pragmatic · codersa · instructions · reasoning · alternative · candidates · code · generation

Pragmatic reasoning helps interlocutors infer intended meaning from ambiguous or underspecified messages by considering shared context and counterfactual alternatives. Similar challenges arise in natural language-to-code generation, where user instructions often admit multiple plausible candidate programs. However, direct RSA-style inference is difficult because it requires probability estimation over large spaces of programs and alternative instructions. We propose CodeRSA, an RSA-motivated reranking method that makes pragmatic reasoning tractable through local pragmatic contests among sampled code candidates. CodeRSA constructs candidate-induced alternative instructions and estimates which candidates are most distinctively supported by the original instruction, avoiding global normalization over the full program-instruction space. We evaluate CodeRSA on HumanEval+, MBPP+, and BigCodeBench using four open-weight instruction-following models. CodeRSA achieves the strongest average accuracy in 10 of 12 model-benchmark settings and remains competitive in the remaining cases. Further analyses show that its gains come from combining local pairwise pragmatic comparison with broader global support, suggesting a scalable direction for language-to-code reranking under natural-language uncertainty.
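The abstract does not give the scoring formulas, so the following is only a minimal sketch of generic RSA-style reranking over a small candidate set, not the authors' CodeRSA procedure. It assumes a hypothetical score matrix `logp[i][j]` holding a model's log-score for candidate program `i` under instruction `j`, with column 0 as the original instruction and the other columns as alternative instructions. The function name `rsa_rerank`, the rationality parameter `alpha`, and the uniform prior over candidates are all illustrative assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of log-scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def rsa_rerank(logp, alpha=1.0):
    """Generic RSA-style reranking (illustrative, not the paper's exact method).

    logp[i][j]: assumed log-score of candidate i under instruction j.
    Column 0 is the original instruction; other columns are alternatives.
    Returns one pragmatic-listener score per candidate for the original
    instruction, normalized only over the sampled candidates.
    """
    n_cand = len(logp)
    n_instr = len(logp[0])

    # Literal listener L0(c | u): normalize each instruction column over candidates.
    l0 = [[0.0] * n_instr for _ in range(n_cand)]
    for j in range(n_instr):
        col = softmax([logp[i][j] for i in range(n_cand)])
        for i in range(n_cand):
            l0[i][j] = col[i]

    # Pragmatic speaker S1(u | c): normalize each candidate row over instructions.
    s1 = []
    for i in range(n_cand):
        row = [l0[i][j] ** alpha for j in range(n_instr)]
        z = sum(row)
        s1.append([r / z for r in row])

    # Pragmatic listener L1(c | u0) under a uniform prior over candidates.
    z = sum(s1[i][0] for i in range(n_cand))
    return [s1[i][0] / z for i in range(n_cand)]


# Toy scores: candidate 0 is slightly preferred under the original instruction
# but fits the alternative instruction equally well; candidate 1 fits the
# alternative poorly, so the original instruction supports it more distinctively.
scores = rsa_rerank([[-1.0, -1.0], [-1.2, -4.0]])
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy case the literal scores favor candidate 0, while the pragmatic scores favor candidate 1, because candidate 1 is the one most distinctively supported by the original instruction. All normalizations run over the sampled candidates and the listed instructions only, which mirrors the abstract's point about avoiding global normalization over the full program-instruction space.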

