Canonical reference

Title resolution pending

Ruchir Puri, David S · 2021 · arXiv 2105.12655

Canonical reference. 18 Pith papers cite this work; 80% of the classified citations use it as background.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 5

citation-polarity summary

background 4 · unclear 1
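As a worked check, the 80% background share quoted at the top of the page is consistent with the polarity counts above, assuming the percentage is taken over the five classified citations rather than over all 18 citing papers. The sketch below is illustrative only; the `share` helper and the `polarity_counts` name are hypothetical and not part of the hub.

```python
from collections import Counter

# Counts copied from the citation-polarity summary above.
polarity_counts = Counter({"background": 4, "unclear": 1})


def share(counts, label):
    """Return the percentage of classified citations carrying `label`."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classified citations")
    return 100.0 * counts[label] / total


background_share = share(polarity_counts, "background")
total_classified = sum(polarity_counts.values())
print(f"background: {background_share:.0f}% of {total_classified} classified citations")
```

Under that assumption, 4 of 5 classified citations gives the 80% figure.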

representative citing papers

Show Your Work: Scratchpads for Intermediate Computation with Language Models

cs.LG · 2021-11-30 · unverdicted · novelty 8.0

Training language models to generate intermediate computation steps on a scratchpad enables them to perform multi-step tasks such as long addition and arbitrary program execution that they otherwise fail at.

MACAA: Belief-Revision Multi-Agent Reasoning for Code Authorship Verification

cs.SE · 2026-05-10 · unverdicted · novelty 7.0 · 2 refs

MACAA is a belief-revision multi-agent framework for training-free code authorship verification that reports 89.15% F1 on same-language benchmarks and 80% on cross-language pairs while outperforming baselines.

Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation

cs.SE · 2026-04-09 · unverdicted · novelty 7.0

LLM deobfuscation of binaries to pseudocode depends more on reasoning ability and task-specific fine-tuning than on model size, with reasoning models showing robustness across ISAs and obfuscation levels on the new BinDeObfBench.

FormulaCode: Evaluating Agentic Optimization on Large Codebases

cs.SE · 2026-03-16 · unverdicted · novelty 7.0

FormulaCode is a new benchmark for repository-level LLM agent optimization using 957 mined bottlenecks, expert patches, and multi-objective metrics from real scientific Python repositories.

NESA: Relational Neuro-Symbolic Static Program Analysis

cs.PL · 2024-12-18 · conditional · novelty 7.0

NESA presents a neuro-symbolic framework that decomposes static analyses into policy-defined sub-problems solved by parsers and LLMs to enable compilation-free customizable analysis with reduced hallucinations.

CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora

cs.SE · 2026-04-20 · unverdicted · novelty 6.0

CodePivot uses Python as a pivot language plus an Aggressive-Partial-Functional RL reward to train a 7B model that outperforms much larger LLMs on multilingual code transpilation without parallel corpora.

Fine-Tuning Code Language Models to Detect Cross-Language Bugs

cs.SE · 2025-07-29 · conditional · novelty 6.0

Fine-tuning 13 CodeLMs on a constructed CLB dataset with nine interaction types improves detection, with UniXcoder-base reaching F1 0.7407 and small models outperforming large ones.

SafeTrans: LLM-assisted Transpilation from C to Rust

cs.CR · 2025-05-15 · accept · novelty 6.0

SafeTrans achieves up to 80% successful C-to-Rust translations via LLM iterative repair on 2653 programs and two real projects, with some C vulnerabilities carrying over to the Rust output.

Neural Code Translation of Legacy Code: APL to C#

cs.SE · 2026-05-12 · unverdicted · novelty 5.0

Guided LLM strategies with custom datasets and execution-based verification enable functional APL-to-C# translation across a range of program complexities.

Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection

cs.AI · 2026-05-04 · unverdicted · novelty 5.0

Reasoning-oriented knowledge distillation from DeepSeek-R1 plus response stabilization improves reliability and often performance of compact models for cross-language code clone detection on pairs like Python-Java and Rust-Java.

Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation

cs.SE · 2026-05-04 · unverdicted · novelty 5.0 · 2 refs

A large-scale study finds that many LLM code translation failures are false negatives due to improper evaluation configurations rather than incorrect translations.

PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection

cs.SE · 2026-04-28 · unverdicted · novelty 5.0

Controlled experiments show PLM-GNN hybrids improve code tasks over GNN-only baselines, with PLM source having larger impact than GNN backbone.

An Initial Exploration of Contrastive Prompt Tuning to Generate Energy-Efficient Code

cs.LG · 2026-03-03 · unverdicted · novelty 5.0

Contrastive Prompt Tuning raises code accuracy on two of three tested models but produces inconsistent energy-efficiency gains that depend on model, language, and task.

TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment

cs.SE · 2024-09-30 · unverdicted · novelty 5.0

TransAgent improves LLM code translation by up to 33.3% via multi-agent fine-grained execution alignment on a new benchmark of recent tasks.

Self-Refine: Iterative Refinement with Self-Feedback

cs.CL · 2023-03-30 · unverdicted · novelty 5.0

Self-Refine boosts LLM outputs by ~20% on average across seven tasks by having the same model iteratively generate, critique, and refine its own responses.

Large Language Models for Multilingual Code Intelligence: A Survey

cs.SE · 2026-04-27 · unverdicted · novelty 4.0

A survey of methods, benchmarks, and open challenges for large language models in multilingual code generation and translation.

LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning

cs.SE · 2026-04-17 · unverdicted · novelty 4.0

LLMSniffer improves detection of LLM-generated code on GPTSniffer and Whodunit benchmarks by fine-tuning GraphCodeBERT via two-stage supervised contrastive learning plus preprocessing and MLP classification.

Specification-Driven Code Translation Powered by Large Language Models: How Far Are We?

cs.SE · 2024-12-05 · unverdicted · novelty 4.0

NL specifications alone do not improve LLM code translation performance, but combining them with source code yields gains in select language pairs with no overall consistent benefit.

citing papers explorer

Showing 18 of 18 citing papers.

Show Your Work: Scratchpads for Intermediate Computation with Language Models
cs.LG · 2021-11-30 · unverdicted · none · ref 13
Training language models to generate intermediate computation steps on a scratchpad enables them to perform multi-step tasks such as long addition and arbitrary program execution that they otherwise fail at.

MACAA: Belief-Revision Multi-Agent Reasoning for Code Authorship Verification
cs.SE · 2026-05-10 · unverdicted · none · ref 1 · 2 links
MACAA is a belief-revision multi-agent framework for training-free code authorship verification that reports 89.15% F1 on same-language benchmarks and 80% on cross-language pairs while outperforming baselines.

Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation
cs.SE · 2026-04-09 · unverdicted · none · ref 57
LLM deobfuscation of binaries to pseudocode depends more on reasoning ability and task-specific fine-tuning than on model size, with reasoning models showing robustness across ISAs and obfuscation levels on the new BinDeObfBench.

FormulaCode: Evaluating Agentic Optimization on Large Codebases
cs.SE · 2026-03-16 · unverdicted · none · ref 1
FormulaCode is a new benchmark for repository-level LLM agent optimization using 957 mined bottlenecks, expert patches, and multi-objective metrics from real scientific Python repositories.

NESA: Relational Neuro-Symbolic Static Program Analysis
cs.PL · 2024-12-18 · conditional · none · ref 34
NESA presents a neuro-symbolic framework that decomposes static analyses into policy-defined sub-problems solved by parsers and LLMs to enable compilation-free customizable analysis with reduced hallucinations.

CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora
cs.SE · 2026-04-20 · unverdicted · none · ref 51
CodePivot uses Python as a pivot language plus an Aggressive-Partial-Functional RL reward to train a 7B model that outperforms much larger LLMs on multilingual code transpilation without parallel corpora.

Fine-Tuning Code Language Models to Detect Cross-Language Bugs
cs.SE · 2025-07-29 · conditional · none · ref 55
Fine-tuning 13 CodeLMs on a constructed CLB dataset with nine interaction types improves detection, with UniXcoder-base reaching F1 0.7407 and small models outperforming large ones.

SafeTrans: LLM-assisted Transpilation from C to Rust
cs.CR · 2025-05-15 · accept · none · ref 32
SafeTrans achieves up to 80% successful C-to-Rust translations via LLM iterative repair on 2653 programs and two real projects, with some C vulnerabilities carrying over to the Rust output.

Neural Code Translation of Legacy Code: APL to C#
cs.SE · 2026-05-12 · unverdicted · none · ref 12
Guided LLM strategies with custom datasets and execution-based verification enable functional APL-to-C# translation across a range of program complexities.

Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection
cs.AI · 2026-05-04 · unverdicted · none · ref 37
Reasoning-oriented knowledge distillation from DeepSeek-R1 plus response stabilization improves reliability and often performance of compact models for cross-language code clone detection on pairs like Python-Java and Rust-Java.

Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation
cs.SE · 2026-05-04 · unverdicted · none · ref 25 · 2 links
A large-scale study finds that many LLM code translation failures are false negatives due to improper evaluation configurations rather than incorrect translations.

PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection
cs.SE · 2026-04-28 · unverdicted · none · ref 25
Controlled experiments show PLM-GNN hybrids improve code tasks over GNN-only baselines, with PLM source having larger impact than GNN backbone.

An Initial Exploration of Contrastive Prompt Tuning to Generate Energy-Efficient Code
cs.LG · 2026-03-03 · unverdicted · none · ref 22
Contrastive Prompt Tuning raises code accuracy on two of three tested models but produces inconsistent energy-efficiency gains that depend on model, language, and task.

TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment
cs.SE · 2024-09-30 · unverdicted · none · ref 34
TransAgent improves LLM code translation by up to 33.3% via multi-agent fine-grained execution alignment on a new benchmark of recent tasks.

Self-Refine: Iterative Refinement with Self-Feedback
cs.CL · 2023-03-30 · unverdicted · none · ref 34
Self-Refine boosts LLM outputs by ~20% on average across seven tasks by having the same model iteratively generate, critique, and refine its own responses.

Large Language Models for Multilingual Code Intelligence: A Survey
cs.SE · 2026-04-27 · unverdicted · none · ref 34
A survey of methods, benchmarks, and open challenges for large language models in multilingual code generation and translation.

LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning
cs.SE · 2026-04-17 · unverdicted · none · ref 18
LLMSniffer improves detection of LLM-generated code on GPTSniffer and Whodunit benchmarks by fine-tuning GraphCodeBERT via two-stage supervised contrastive learning plus preprocessing and MLP classification.

Specification-Driven Code Translation Powered by Large Language Models: How Far Are We?
cs.SE · 2024-12-05 · unverdicted · none · ref 18
NL specifications alone do not improve LLM code translation performance, but combining them with source code yields gains in select language pairs with no overall consistent benefit.