Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval

Jiarong Wu, Songqiang Chen, Jialun Cao, Hau Ching Lo, Shing-Chi Cheung · 2025 · arXiv 2502.19149

2 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Demystifying the Silence of Correctness Bugs in PyTorch Compiler

cs.SE · 2026-04-09 · conditional · novelty 8.0

First empirical study of correctness bugs in torch.compile characterizes their patterns and proposes AlignGuard, which found 23 confirmed new bugs via LLM-guided test mutation.

Large Language Models for Multilingual Code Intelligence: A Survey

cs.SE · 2026-04-27 · unverdicted · novelty 4.0

A survey of methods, benchmarks, and open challenges for large language models in multilingual code generation and translation.

citing papers explorer

Showing 2 of 2 citing papers.

Demystifying the Silence of Correctness Bugs in PyTorch Compiler
cs.SE · 2026-04-09 · conditional · none · ref 44
First empirical study of correctness bugs in torch.compile characterizes their patterns and proposes AlignGuard, which found 23 confirmed new bugs via LLM-guided test mutation.

Large Language Models for Multilingual Code Intelligence: A Survey
cs.SE · 2026-04-27 · unverdicted · none · ref 10
A survey of methods, benchmarks, and open challenges for large language models in multilingual code generation and translation.
