Identify and update test cases when production code changes: A transformer-based approach

Ningke Li, Shenao Wang, Mingxi Feng, Kailong Wang, Meizhen Wang, Haoyu Wang · 2023 · arXiv 6229.2023

Mixed citation behavior. Most common role is background (67%).

46 Pith papers citing it

Background 67% of classified citations

citation-role summary

background 15 · baseline 2 · method 2 · dataset 1 · other 1

citation-polarity summary

background 14 · baseline 2 · use method 2 · support 1 · unclear 1 · use dataset 1

representative citing papers

Mind your key: An Empirical Study of LLM API Credential Leakage in iOS Apps

cs.SE · 2026-06-10 · unverdicted · novelty 8.0

Empirical analysis of 444 iOS apps using dynamic traffic interception found 282 leaking LLM API keys across ten providers, with only 28% remediation after three months.

Sakura: An Approach for Generating Complex Tests from Natural Language Test Descriptions

cs.SE · 2026-05-30 · unverdicted · novelty 7.0

Sakura is a multi-agent system that generates structurally complex tests from NL descriptions, achieving 50-78% higher compilability and 38-66% higher coverage overlap than baselines on 1,464 scenarios from 20 Apache Commons applications.

LLM-based vs. Search-based Merge Conflict Resolution: An Empirical Study of Competing Paradigms

cs.SE · 2026-05-15 · unverdicted · novelty 7.0

LLM-based merge conflict resolution performs well on imbalanced conflicts but struggles with large or non-English inputs, while search-based methods show better generalization and strength on balanced conflicts.

SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models

cs.SE · 2026-05-07 · unverdicted · novelty 7.0

SiblingRepair uses LLMs with semantic sibling detection and simultaneous/iterative repair strategies to outperform prior multi-hunk APR tools like Hercules on Defects4J and GHRB benchmarks.

Single-Language Evidence Is Insufficient for Automated Logging: A Multilingual Benchmark and Empirical Study with LLMs

cs.SE · 2026-04-19 · unverdicted · novelty 7.0

MultiLogBench shows that LLM performance on automated logging varies substantially across programming languages, demonstrating that single-language evidence is insufficient for general claims about model behavior or tool design.

The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE

cs.SE · 2026-04-16 · unverdicted · novelty 7.0

Software engineering scope expands beyond executable code to semi-executable artifacts best diagnosed by the new six-ring Semi-Executable Stack model.

Choose, Don't Label: Multiple-Choice Query Synthesis for Program Disambiguation

cs.PL · 2026-04-09 · unverdicted · novelty 7.0

Multiple-choice queries synthesized from Hoare triples enable more reliable identification of intended programs than labeled-example supervision in active learning for program disambiguation.

AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits

cs.SE · 2026-04-03 · conditional · novelty 7.0

AgentSZZ is an LLM-agent framework that identifies bug-inducing commits with up to 27.2% higher F1 scores than prior methods by enabling adaptive exploration and causal tracing, especially for cross-file and ghost commits.

When Specifications Meet Reality: Uncovering API Inconsistencies in Ethereum Infrastructure

cs.SE · 2026-03-06 · conditional · novelty 7.0 · 2 refs

APIDiffer automatically detects 72 API inconsistencies across 11 Ethereum clients using specification-guided test generation and LLM-based false-positive filtering, with 90% of bugs confirmed by developers.

AgenticSZZ: Temporal Knowledge Graph-Guided Agentic Bug-Inducing Commit Identification

cs.SE · 2026-02-03 · conditional · novelty 7.0

AgenticSZZ reframes bug-inducing commit identification as temporal knowledge graph search navigated by an LLM agent, reporting F1 scores of 0.47-0.79 and up to 34% improvement over prior SZZ methods on three datasets.

A Methodological Analysis of Empirical Studies in Quantum Software Testing

quant-ph · 2026-01-13 · accept · novelty 7.0 · 2 refs

A systematic analysis of 59 quantum software testing empirical studies reveals highly diverse designs, inconsistent reporting, and open methodological challenges, leading to recommendations for future work.

Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries

cs.SE · 2025-09-26 · unverdicted · novelty 7.0

A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.

Once4All: Skeleton-Guided SMT Solver Fuzzing with LLM-Synthesized Generators

cs.SE · 2025-08-28 · conditional · novelty 7.0 · 2 refs

Once4All synthesizes LLM-based generators from extracted SMT grammars and populates formula skeletons to fuzz Z3 and cvc5, discovering 43 confirmed bugs, 40 of which have been fixed.

cs.SE · 2026-06-29 · unverdicted · novelty 6.0 · 3 refs

Large-scale analysis of 200K PyPI packages identifies 1,361 replicated popular packages, 256 replicated vulnerable packages, and 7 new replicated malicious packages, showing replication as a security threat vector.

Finding Compiler-Platform Interaction Bugs in Deep Learning Pipelines via Cross-Layer Constraints

cs.SE · 2026-06-16 · unverdicted · novelty 6.0

XCheck extracts cross-layer constraints to generate test models and monitor behaviors, revealing 2,034 compiler-platform interaction bugs in three DL compilers.

Beyond the Tip of the Iceberg: Understanding SATD in Dockerfiles through the Lens of Co-evolution

cs.SE · 2026-05-20 · unverdicted · novelty 6.0

Analysis of SATD in Dockerfiles shows 27% of admissions and 40% of repayments are coupled to non-Dockerfile artifacts, with coupled events repaid faster overall and external dependencies as a key trigger.

QUTest: A Native Testing Framework for Quantum Programs

quant-ph · 2026-05-19 · unverdicted · novelty 6.0

QUTest is a native OpenQASM testing framework that encodes Arrange/Act/Assert tests and 12 assertion types via pragma comments while remaining compatible with existing tools.

MuMuTestUp: Mutation-based Multi-Agent Test Case Update

cs.SE · 2026-05-19 · unverdicted · novelty 6.0

MuMuTestUp is a mutation-guided multi-agent framework for updating test cases in evolving software that strengthens assertions via surviving mutants, targets specific coverage gaps, and uses semantic search instead of exact matching.

Robust Mutation Analysis of Quantum Programs Under Noise

cs.SE · 2026-05-13 · conditional · novelty 6.0

Noise from quantum hardware simulators significantly alters mutant detection distances, making equivalent mutants harder to separate from faults, with output-distribution metrics reaching 73.03% accuracy and 74.89% F1-score under device-specific thresholds.

AutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verification

cs.SE · 2026-05-11 · unverdicted · novelty 6.0

AutoSOUP automates component-level memory-safety verification by generating Safety-Oriented Unit Proofs via three techniques and a hybrid LLM-plus-program-synthesis architecture called LLM-As-Function-Call.

Quality-Driven Selective Mutation for Deep Learning

cs.SE · 2026-04-24 · unverdicted · novelty 6.0

A dual-axis quality framework ranks DL mutation operators by statistical resistance and Jaccard-based realism to real faults, enabling up to 55.6% fewer mutants on held-out validation data without dropping baseline performance.

QuanForge: A Mutation Testing Framework for Quantum Neural Networks

cs.SE · 2026-04-22 · unverdicted · novelty 6.0 · 2 refs

QuanForge introduces statistical mutation killing and nine post-training mutation operators for QNNs to distinguish test suites and localize vulnerable circuit regions.

SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection

cs.CR · 2026-04-21 · unverdicted · novelty 6.0 · 2 refs

SAGE uses sparse autoencoders to boost vulnerability signals in LLMs, raising internal SNR 12.7x and delivering up to 318% MCC gains on vulnerability detection benchmarks.

Program Structure-aware Language Models: Targeted Software Testing beyond Textual Semantics

cs.SE · 2026-04-20 · unverdicted · novelty 6.0

GLMTest integrates code property graphs and GNNs with LLMs to steer test case generation toward targeted branches, raising branch accuracy from 27.4% to 50.2% on the TestGenEval benchmark.

citing papers explorer

Choose, Don't Label: Multiple-Choice Query Synthesis for Program Disambiguation cs.PL · 2026-04-09 · unverdicted · none · ref 14