Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval
2 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.SE (2)
Years: 2026 (2)
Citation roles: background (1)
Citation polarities: background (1)
Citing papers explorer

- Demystifying the Silence of Correctness Bugs in PyTorch Compiler
  The first empirical study of correctness bugs in torch.compile. It characterizes their patterns and proposes AlignGuard, which found 23 new, confirmed bugs via LLM-guided test mutation.
- Large Language Models for Multilingual Code Intelligence: A Survey
  A survey of methods, benchmarks, and open challenges for large language models in multilingual code generation and translation.