hub Mixed citations

In: Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 34, pp 27

Allamanis M, Jackson-Flux H, Brockschmidt M · 2021 · arXiv 6229.2023

Mixed citation behavior. The most common role is background (67% of classified citations).

35 Pith papers citing it

citation-role summary

background 15 · baseline 2 · method 2 · dataset 1 · other 1

citation-polarity summary

background 14 · baseline 2 · use method 2 · support 1 · unclear 1 · use dataset 1

representative citing papers

LLM-based vs. Search-based Merge Conflict Resolution: An Empirical Study of Competing Paradigms

cs.SE · 2026-05-15 · unverdicted · novelty 7.0

LLM-based merge conflict resolution performs well on imbalanced conflicts but struggles with large or non-English inputs, while search-based methods show better generalization and strength on balanced conflicts.

SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models

cs.SE · 2026-05-07 · unverdicted · novelty 7.0

SiblingRepair uses LLMs with semantic sibling detection and simultaneous/iterative repair strategies to outperform prior multi-hunk APR tools like Hercules on Defects4J and GHRB benchmarks.

Single-Language Evidence Is Insufficient for Automated Logging: A Multilingual Benchmark and Empirical Study with LLMs

cs.SE · 2026-04-19 · unverdicted · novelty 7.0

MultiLogBench shows that LLM performance on automated logging varies substantially across programming languages, demonstrating that single-language evidence is insufficient for general claims about model behavior or tool design.

The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE

cs.SE · 2026-04-16 · unverdicted · novelty 7.0

Software engineering scope expands beyond executable code to semi-executable artifacts best diagnosed by the new six-ring Semi-Executable Stack model.

Choose, Don't Label: Multiple-Choice Query Synthesis for Program Disambiguation

cs.PL · 2026-04-09 · unverdicted · novelty 7.0

Multiple-choice queries synthesized from Hoare triples enable more reliable identification of intended programs than labeled-example supervision in active learning for program disambiguation.

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

cs.CR · 2026-04-03 · accept · novelty 7.0 · 2 refs

Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.

AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits

cs.SE · 2026-04-03 · conditional · novelty 7.0

AgentSZZ is an LLM-agent framework that identifies bug-inducing commits with up to 27.2% higher F1 scores than prior methods by enabling adaptive exploration and causal tracing, especially for cross-file and ghost commits.

When Specifications Meet Reality: Uncovering API Inconsistencies in Ethereum Infrastructure

cs.SE · 2026-03-06 · conditional · novelty 7.0 · 2 refs

APIDiffer automatically detects 72 API inconsistencies across 11 Ethereum clients using specification-guided test generation and LLM-based false-positive filtering, with 90% of bugs confirmed by developers.

AgenticSZZ: Temporal Knowledge Graph-Guided Agentic Bug-Inducing Commit Identification

cs.SE · 2026-02-03 · conditional · novelty 7.0

AgenticSZZ reframes bug-inducing commit identification as temporal knowledge graph search navigated by an LLM agent, reporting F1 scores of 0.47-0.79 and up to 34% improvement over prior SZZ methods on three datasets.

A Methodological Analysis of Empirical Studies in Quantum Software Testing

quant-ph · 2026-01-13 · accept · novelty 7.0 · 2 refs

A systematic analysis of 59 quantum software testing empirical studies reveals highly diverse designs, inconsistent reporting, and open methodological challenges, leading to recommendations for future work.

Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries

cs.SE · 2025-09-26 · unverdicted · novelty 7.0

A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.

Once4All: Skeleton-Guided SMT Solver Fuzzing with LLM-Synthesized Generators

cs.SE · 2025-08-28 · conditional · novelty 7.0 · 2 refs

Once4All synthesizes LLM-based generators from extracted SMT grammars and populates formula skeletons to fuzz Z3 and cvc5, discovering 43 confirmed bugs with 40 fixed.

Beyond the Tip of the Iceberg: Understanding SATD in Dockerfiles through the Lens of Co-evolution

cs.SE · 2026-05-20 · unverdicted · novelty 6.0

Analysis of SATD in Dockerfiles shows 27% of admissions and 40% of repayments are coupled to non-Dockerfile artifacts, with coupled events repaid faster overall and external dependencies as a key trigger.

QUTest: A Native Testing Framework for Quantum Programs

quant-ph · 2026-05-19 · unverdicted · novelty 6.0

QUTest is a native OpenQASM testing framework that encodes Arrange/Act/Assert tests and 12 assertion types via pragma comments while remaining compatible with existing tools.

MuMuTestUp: Mutation-based Multi-Agent Test Case Update

cs.SE · 2026-05-19 · unverdicted · novelty 6.0

MuMuTestUp is a mutation-guided multi-agent framework for updating test cases in evolving software that strengthens assertions via surviving mutants, targets specific coverage gaps, and uses semantic search instead of exact matching.

Robust Mutation Analysis of Quantum Programs Under Noise

cs.SE · 2026-05-13 · conditional · novelty 6.0

Noise from quantum hardware simulators significantly alters mutant detection distances, making equivalent mutants harder to separate from faults, with output-distribution metrics reaching 73.03% accuracy and 74.89% F1-score under device-specific thresholds.

AutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verification

cs.SE · 2026-05-11 · unverdicted · novelty 6.0

AutoSOUP automates component-level memory-safety verification by generating Safety-Oriented Unit Proofs via three techniques and a hybrid LLM-plus-program-synthesis architecture called LLM-As-Function-Call.

Quality-Driven Selective Mutation for Deep Learning

cs.SE · 2026-04-24 · unverdicted · novelty 6.0

A dual-axis quality framework ranks DL mutation operators by statistical resistance and Jaccard-based realism to real faults, enabling up to 55.6% fewer mutants on held-out validation data without dropping baseline performance.

QuanForge: A Mutation Testing Framework for Quantum Neural Networks

cs.SE · 2026-04-22 · unverdicted · novelty 6.0 · 2 refs

QuanForge introduces statistical mutation killing and nine post-training mutation operators for QNNs to distinguish test suites and localize vulnerable circuit regions.

SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection

cs.CR · 2026-04-21 · unverdicted · novelty 6.0 · 2 refs

SAGE uses sparse autoencoders to boost vulnerability signals in LLMs, raising internal SNR 12.7x and delivering up to 318% MCC gains on vulnerability detection benchmarks.

Program Structure-aware Language Models: Targeted Software Testing beyond Textual Semantics

cs.SE · 2026-04-20 · unverdicted · novelty 6.0

GLMTest integrates code property graphs and GNNs with LLMs to steer test case generation toward targeted branches, raising branch accuracy from 27.4% to 50.2% on the TestGenEval benchmark.

Debugging Performance Issues in WebAssembly Runtimes via Mutation-based Inference

cs.SE · 2026-04-15 · unverdicted · novelty 6.0

WarpL uses mutation to find and isolate suboptimal instruction sequences causing performance issues in WebAssembly runtimes by comparing machine code of original and non-problematic mutant programs.

Beyond Crash-to-Patch: Patch Evolution for Linux Kernel Repair

cs.SE · 2026-04-04 · unverdicted · novelty 6.0

Reconstructing 6946 syzbot bug-fix lifecycles reveals that accepted kernel patches are non-local and reviewer-constrained, enabling PatchAdvisor to improve automated repair quality over baselines via retrieval and diagnostic guidance.

PAFT: Preservation Aware Fine-Tuning for Minimal-Edit Program Repair

cs.SE · 2026-04-03 · unverdicted · novelty 6.0

PAFT improves LLM-based program repair pass rates by up to 65.6% while cutting average edit distance by up to 32.6% through explicit preservation signals and curriculum training.

citing papers explorer

Showing 35 of 35 citing papers.

LLM-based vs. Search-based Merge Conflict Resolution: An Empirical Study of Competing Paradigms cs.SE · 2026-05-15 · unverdicted · none · ref 8
LLM-based merge conflict resolution performs well on imbalanced conflicts but struggles with large or non-English inputs, while search-based methods show better generalization and strength on balanced conflicts.
SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models cs.SE · 2026-05-07 · unverdicted · none · ref 42
SiblingRepair uses LLMs with semantic sibling detection and simultaneous/iterative repair strategies to outperform prior multi-hunk APR tools like Hercules on Defects4J and GHRB benchmarks.
Single-Language Evidence Is Insufficient for Automated Logging: A Multilingual Benchmark and Empirical Study with LLMs cs.SE · 2026-04-19 · unverdicted · none · ref 37
MultiLogBench shows that LLM performance on automated logging varies substantially across programming languages, demonstrating that single-language evidence is insufficient for general claims about model behavior or tool design.
The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE cs.SE · 2026-04-16 · unverdicted · none · ref 17
Software engineering scope expands beyond executable code to semi-executable artifacts best diagnosed by the new six-ring Semi-Executable Stack model.
Choose, Don't Label: Multiple-Choice Query Synthesis for Program Disambiguation cs.PL · 2026-04-09 · unverdicted · none · ref 14
Multiple-choice queries synthesized from Hoare triples enable more reliable identification of intended programs than labeled-example supervision in active learning for program disambiguation.
Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study cs.CR · 2026-04-03 · accept · none · ref 20 · 2 links
Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.
AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits cs.SE · 2026-04-03 · conditional · none · ref 35
AgentSZZ is an LLM-agent framework that identifies bug-inducing commits with up to 27.2% higher F1 scores than prior methods by enabling adaptive exploration and causal tracing, especially for cross-file and ghost commits.
When Specifications Meet Reality: Uncovering API Inconsistencies in Ethereum Infrastructure cs.SE · 2026-03-06 · conditional · none · ref 86 · 2 links
APIDiffer automatically detects 72 API inconsistencies across 11 Ethereum clients using specification-guided test generation and LLM-based false-positive filtering, with 90% of bugs confirmed by developers.
AgenticSZZ: Temporal Knowledge Graph-Guided Agentic Bug-Inducing Commit Identification cs.SE · 2026-02-03 · conditional · none · ref 37
AgenticSZZ reframes bug-inducing commit identification as temporal knowledge graph search navigated by an LLM agent, reporting F1 scores of 0.47-0.79 and up to 34% improvement over prior SZZ methods on three datasets.
A Methodological Analysis of Empirical Studies in Quantum Software Testing quant-ph · 2026-01-13 · accept · none · ref 123 · 2 links
A systematic analysis of 59 quantum software testing empirical studies reveals highly diverse designs, inconsistent reporting, and open methodological challenges, leading to recommendations for future work.
Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries cs.SE · 2025-09-26 · unverdicted · none · ref 32
A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.
Once4All: Skeleton-Guided SMT Solver Fuzzing with LLM-Synthesized Generators cs.SE · 2025-08-28 · conditional · none · ref 37 · 2 links
Once4All synthesizes LLM-based generators from extracted SMT grammars and populates formula skeletons to fuzz Z3 and cvc5, discovering 43 confirmed bugs with 40 fixed.
Beyond the Tip of the Iceberg: Understanding SATD in Dockerfiles through the Lens of Co-evolution cs.SE · 2026-05-20 · unverdicted · none · ref 46
Analysis of SATD in Dockerfiles shows 27% of admissions and 40% of repayments are coupled to non-Dockerfile artifacts, with coupled events repaid faster overall and external dependencies as a key trigger.
QUTest: A Native Testing Framework for Quantum Programs quant-ph · 2026-05-19 · unverdicted · none · ref 17
QUTest is a native OpenQASM testing framework that encodes Arrange/Act/Assert tests and 12 assertion types via pragma comments while remaining compatible with existing tools.
MuMuTestUp: Mutation-based Multi-Agent Test Case Update cs.SE · 2026-05-19 · unverdicted · none · ref 15
MuMuTestUp is a mutation-guided multi-agent framework for updating test cases in evolving software that strengthens assertions via surviving mutants, targets specific coverage gaps, and uses semantic search instead of exact matching.
Robust Mutation Analysis of Quantum Programs Under Noise cs.SE · 2026-05-13 · conditional · none · ref 82
Noise from quantum hardware simulators significantly alters mutant detection distances, making equivalent mutants harder to separate from faults, with output-distribution metrics reaching 73.03% accuracy and 74.89% F1-score under device-specific thresholds.
AutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verification cs.SE · 2026-05-11 · unverdicted · none · ref 15
AutoSOUP automates component-level memory-safety verification by generating Safety-Oriented Unit Proofs via three techniques and a hybrid LLM-plus-program-synthesis architecture called LLM-As-Function-Call.
Quality-Driven Selective Mutation for Deep Learning cs.SE · 2026-04-24 · unverdicted · none · ref 15
A dual-axis quality framework ranks DL mutation operators by statistical resistance and Jaccard-based realism to real faults, enabling up to 55.6% fewer mutants on held-out validation data without dropping baseline performance.
QuanForge: A Mutation Testing Framework for Quantum Neural Networks cs.SE · 2026-04-22 · unverdicted · none · ref 18 · 2 links
QuanForge introduces statistical mutation killing and nine post-training mutation operators for QNNs to distinguish test suites and localize vulnerable circuit regions.
SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection cs.CR · 2026-04-21 · unverdicted · none · ref 23 · 2 links
SAGE uses sparse autoencoders to boost vulnerability signals in LLMs, raising internal SNR 12.7x and delivering up to 318% MCC gains on vulnerability detection benchmarks.
Program Structure-aware Language Models: Targeted Software Testing beyond Textual Semantics cs.SE · 2026-04-20 · unverdicted · none · ref 20
GLMTest integrates code property graphs and GNNs with LLMs to steer test case generation toward targeted branches, raising branch accuracy from 27.4% to 50.2% on the TestGenEval benchmark.
Debugging Performance Issues in WebAssembly Runtimes via Mutation-based Inference cs.SE · 2026-04-15 · unverdicted · none · ref 32
WarpL uses mutation to find and isolate suboptimal instruction sequences causing performance issues in WebAssembly runtimes by comparing machine code of original and non-problematic mutant programs.
Beyond Crash-to-Patch: Patch Evolution for Linux Kernel Repair cs.SE · 2026-04-04 · unverdicted · none · ref 11
Reconstructing 6946 syzbot bug-fix lifecycles reveals that accepted kernel patches are non-local and reviewer-constrained, enabling PatchAdvisor to improve automated repair quality over baselines via retrieval and diagnostic guidance.
PAFT: Preservation Aware Fine-Tuning for Minimal-Edit Program Repair cs.SE · 2026-04-03 · unverdicted · none · ref 13
PAFT improves LLM-based program repair pass rates by up to 65.6% while cutting average edit distance by up to 32.6% through explicit preservation signals and curriculum training.
Knowledge-Graph-Driven Data Synthesis for Low-Resource Software Development: A HarmonyOS Case Study cs.SE · 2025-11-29 · unverdicted · none · ref 1 · 2 links
APIKG4Syn synthesizes API-oriented training data via knowledge graphs and Monte Carlo search to fine-tune a 7B model that reaches 25% pass@1 on HarmonyOS code generation, beating untuned GPT-4o at 17.59%.
Multi-LLM Orchestration for High-Quality Code Generation: Exploiting Complementary Model Strengths cs.SE · 2025-10-01 · conditional · none · ref 21
PerfOrch is a four-agent multi-LLM system that uses offline profiling to build language-and-category rankings for routing tasks, achieving 97.19% and 95.83% pass@1 on HumanEval-X and EffiBench-X with generalization across benchmarks.
PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes cs.SE · 2025-05-12 · conditional · none · ref 54
Empirical analysis of 338 PRs with self-admitted ChatGPT usage shows low full integration (median 25%), selective adaptation patterns, and broader influence on developer reasoning during reviews.
A Study of LLMs' Preferences for Libraries and Programming Languages cs.SE · 2025-03-21 · unverdicted · none · ref 48
Empirical study of eight LLMs finds overuse of popular libraries like NumPy in up to 45% of unnecessary cases and strong default preference for Python even when suboptimal.
UntrustVul: An Automated Approach for Identifying Untrustworthy Alerts in Vulnerability Detection Models cs.SE · 2025-03-19 · unverdicted · none · ref 66
UntrustVul identifies untrustworthy vulnerability predictions by marking lines that neither match historical vulnerability patterns nor influence vulnerable lines through dependencies, reporting AUC 70-88% and F1 82-94% on 115K predictions.
XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants cs.CR · 2025-03-18 · unverdicted · none · ref 61
XOXO is a cross-origin context poisoning attack on AI coding assistants that uses a Cayley Graph search algorithm (GCGS) to find stealthy perturbations, achieving 75.72% average success rate across five tasks and eleven models.
Improving MPI Error Detection and Repair with Large Language Models and Bug References cs.SE · 2026-04-02 · unverdicted · none · ref 21
Augmenting LLMs with bug references, few-shot learning, chain-of-thought, and RAG improves MPI error detection accuracy from 44% to 77% and generalizes across models.
OpDiffer: LLM-Assisted Opcode-Level Differential Testing of Ethereum Virtual Machine cs.SE · 2025-04-16 · unverdicted · none · ref 38 · 2 links
OpDiffer applies LLMs and static analysis to opcode-level differential testing of EVMs, reporting 26 previously unknown bugs across nine implementations along with coverage gains and an estimate that 7.21% of real contracts could trigger the bugs.
LLM4Log: A Systematic Review of Large Language Model-based Log Analysis cs.SE · 2026-03-18 · unverdicted · none · ref 78 · 2 links
Systematic review of 145 papers on LLM-based log analysis, providing a unified taxonomy, common design patterns, evaluation practices, and challenges for deployment under drift and limited labels.
MultiMend: Multilingual Program Repair with Context Augmentation and Multi-Hunk Patch Generation cs.SE · 2025-01-27 · unverdicted · none · ref 1
MultiMend augments buggy function context via retrieval and generates multi-hunk patches, fixing 2,227 of 5,501 bugs across six benchmarks in four languages.
To Vibe Research or Not to Vibe Research? Generative AI in Qualitative Research cs.SE · 2026-04-30 · unverdicted · none · ref 273
Generative AI suitability in qualitative research depends primarily on the approach (small-q positivist/post-positivist or Big Q non-positivist) along with skills, ethics, and personal preferences.
