ClozeMaster masks bracketed structures in historical Rust bug code and uses LLMs to infill them, generating test programs that discovered 27 confirmed bugs in rustc and mrustc while outperforming existing fuzzers.
hub Canonical reference
In2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE)
Canonical reference. 79% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
GraphQLify automates REST-to-GraphQL migration via static source code analysis, delivering 100% type-safe conversions on 834 APIs and 2-4x faster performance than REST for multi-call workflows.
A systematic analysis of 59 quantum software testing empirical studies reveals highly diverse designs, inconsistent reporting, and open methodological challenges, leading to recommendations for future work.
EyeMulator augments CodeLLM fine-tuning loss with token weights derived from human eye-tracking scan paths, producing large gains on code translation and summarization across StarCoder, Llama-3.2 and DeepSeek-Coder.
QUTest is a native OpenQASM testing framework that encodes Arrange/Act/Assert tests and 12 assertion types via pragma comments while remaining compatible with existing tools.
Noise from quantum hardware simulators significantly alters mutant detection distances, making equivalent mutants harder to separate from faults, with output-distribution metrics reaching 73.03% accuracy and 74.89% F1-score under device-specific thresholds.
Code language models show no transferable security understanding from code diffs alone, rely on commit messages, miss over 93% of fixes at 0.5% false positive rate, and suffer large drops under group or temporal splits.
A dual-axis quality framework ranks DL mutation operators by statistical resistance and Jaccard-based realism to real faults, enabling up to 55.6% fewer mutants on held-out validation data without dropping baseline performance.
Ethics testing is introduced as a systematic approach to generate tests that identify software harms induced by unethical behavior in generative AI outputs.
QuanForge introduces statistical mutation killing and nine post-training mutation operators for QNNs to distinguish test suites and localize vulnerable circuit regions.
Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.
QTyBERT matches or exceeds BERT-based log anomaly detection effectiveness while reducing embedding generation time to near static word embedding levels.
LLM agents resolve fewer than half of issues while satisfying design constraints despite passing tests, as shown by a benchmark of 495 issues and 1787 constraints from six repositories.
APIKG4Syn synthesizes API-oriented training data via knowledge graphs and Monte Carlo search to fine-tune a 7B model that reaches 25% pass@1 on HarmonyOS code generation, beating untuned GPT-4o at 17.59%.
PerfOrch is a four-agent multi-LLM system that uses offline profiling to build language-and-category rankings for routing tasks, achieving 97.19% and 95.83% pass@1 on HumanEval-X and EffiBench-X with generalization across benchmarks.
CoCoMagic applies constrained cooperative co-evolution to metamorphic and differential testing to find up to 287% more distinct behavioral divergences in an end-to-end ADS than baseline search methods.
UntrustVul identifies untrustworthy vulnerability predictions by marking lines that neither match historical vulnerability patterns nor influence vulnerable lines through dependencies, reporting AUC 70-88% and F1 82-94% on 115K predictions.
MR-Adopt deduces input transformations from hard-coded MR test cases using LLMs, data-flow refinement, and output-relation selection to enable reuse with new source inputs.
MR-Scout extracts over 11,000 metamorphic-relation-encoded test cases from 701 OSS projects, codifies 97% of them as high-quality generators, and shows they raise line coverage by 13.52% and mutation score by 9.42% on programs that already have developer tests.
Multi-stage LLM training plus compiler-guided error repair boosts functional equivalence in Java-to-Cangjie translation by 6.06% over prior methods despite scarce parallel data.
Presents hierarchical adaptive refinement to accelerate near-optimal policy synthesis in MDPs up to 1M states with up to 2x speedup over PRISM and formal error bounds.
Proposes a context-aware unit testing framework for quantum subroutines modeled as parametrized quantum channels, using probabilistic assertions and demonstrated on GHZ preparation and Shor's algorithm subroutines.
DeepFWI is a multi-modal LSTM model with cross-attention that identifies bug-sensitive warnings at warning granularity, reaching 67.06% F1 on a 280k-warning dataset and surfacing 25 confirmed bugs in four open-source projects.
HYDRA is a hybrid model that uses heuristics plus deep embeddings and a VAE to predict latent zero-day vulnerabilities in patched functions from Chrome, Android, and ImageMagick.
citing papers explorer
-
ClozeMaster: Fuzzing Rust Compiler by Harnessing LLMs for Infilling Masked Real Programs
ClozeMaster masks bracketed structures in historical Rust bug code and uses LLMs to infill them, generating test programs that discovered 27 confirmed bugs in rustc and mrustc while outperforming existing fuzzers.
-
GraphQLify: Automated and Type Safety-Preserving GraphQL API Adoption
GraphQLify automates REST-to-GraphQL migration via static source code analysis, delivering 100% type-safe conversions on 834 APIs and 2-4x faster performance than REST for multi-call workflows.
-
A Methodological Analysis of Empirical Studies in Quantum Software Testing
A systematic analysis of 59 quantum software testing empirical studies reveals highly diverse designs, inconsistent reporting, and open methodological challenges, leading to recommendations for future work.
-
EyeMulator: Improving Code Language Models by Mimicking Human Visual Attention
EyeMulator augments CodeLLM fine-tuning loss with token weights derived from human eye-tracking scan paths, producing large gains on code translation and summarization across StarCoder, Llama-3.2 and DeepSeek-Coder.
-
QUTest: A Native Testing Framework for Quantum Programs
QUTest is a native OpenQASM testing framework that encodes Arrange/Act/Assert tests and 12 assertion types via pragma comments while remaining compatible with existing tools.
-
Robust Mutation Analysis of Quantum Programs Under Noise
Noise from quantum hardware simulators significantly alters mutant detection distances, making equivalent mutants harder to separate from faults, with output-distribution metrics reaching 73.03% accuracy and 74.89% F1-score under device-specific thresholds.
-
Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical Study
Code language models show no transferable security understanding from code diffs alone, rely on commit messages, miss over 93% of fixes at 0.5% false positive rate, and suffer large drops under group or temporal splits.
-
Quality-Driven Selective Mutation for Deep Learning
A dual-axis quality framework ranks DL mutation operators by statistical resistance and Jaccard-based realism to real faults, enabling up to 55.6% fewer mutants on held-out validation data without dropping baseline performance.
-
Ethics Testing: Proactive Identification of Generative AI System Harms
Ethics testing is introduced as a systematic approach to generate tests that identify software harms induced by unethical behavior in generative AI outputs.
-
QuanForge: A Mutation Testing Framework for Quantum Neural Networks
QuanForge introduces statistical mutation killing and nine post-training mutation operators for QNNs to distinguish test suites and localize vulnerable circuit regions.
-
Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.
-
A Comparative Study of Semantic Log Representations for Software Log-based Anomaly Detection
QTyBERT matches or exceeds BERT-based log anomaly detection effectiveness while reducing embedding generation time to near static word embedding levels.
-
Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution
LLM agents resolve fewer than half of issues while satisfying design constraints despite passing tests, as shown by a benchmark of 495 issues and 1787 constraints from six repositories.
-
Knowledge-Graph-Driven Data Synthesis for Low-Resource Software Development: A HarmonyOS Case Study
APIKG4Syn synthesizes API-oriented training data via knowledge graphs and Monte Carlo search to fine-tune a 7B model that reaches 25% pass@1 on HarmonyOS code generation, beating untuned GPT-4o at 17.59%.
-
Multi-LLM Orchestration for High-Quality Code Generation: Exploiting Complementary Model Strengths
PerfOrch is a four-agent multi-LLM system that uses offline profiling to build language-and-category rankings for routing tasks, achieving 97.19% and 95.83% pass@1 on HumanEval-X and EffiBench-X with generalization across benchmarks.
-
Constrained Co-evolutionary Metamorphic Differential Testing for Autonomous Systems with an Interpretability Approach
CoCoMagic applies constrained cooperative co-evolution to metamorphic and differential testing to find up to 287% more distinct behavioral divergences in an end-to-end ADS than baseline search methods.
-
UntrustVul: An Automated Approach for Identifying Untrustworthy Alerts in Vulnerability Detection Models
UntrustVul identifies untrustworthy vulnerability predictions by marking lines that neither match historical vulnerability patterns nor influence vulnerable lines through dependencies, reporting AUC 70-88% and F1 82-94% on 115K predictions.
-
MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing
MR-Adopt deduces input transformations from hard-coded MR test cases using LLMs, data-flow refinement, and output-relation selection to enable reuse with new source inputs.
-
MR-Scout: Automated Synthesis of Metamorphic Relations from Existing Test Cases
MR-Scout extracts over 11,000 metamorphic-relation-encoded test cases from 701 OSS projects, codifies 97% of them as high-quality generators, and shows they raise line coverage by 13.52% and mutation score by 9.42% on programs that already have developer tests.
-
Boosting Automatic Java-to-Cangjie Translation with Multi-Stage LLM Training and Error Repair
Multi-stage LLM training plus compiler-guided error repair boosts functional equivalence in Java-to-Cangjie translation by 6.06% over prior methods despite scarce parallel data.
-
Accelerating Policy Synthesis in Large-Scale MDPs via Hierarchical Adaptive Refinement
Presents hierarchical adaptive refinement to accelerate near-optimal policy synthesis in MDPs up to 1M states with up to 2x speedup over PRISM and formal error bounds.
-
Context-Aware Unit Testing for Quantum Subroutines
Proposes a context-aware unit testing framework for quantum subroutines modeled as parametrized quantum channels, using probabilistic assertions and demonstrated on GHZ preparation and Shor's algorithm subroutines.
-
DeepFWI: Identifying Bug-Sensitive Warnings with Multi-Modal Code-Warning Semantics
DeepFWI is a multi-modal LSTM model with cross-attention that identifies bug-sensitive warnings at warning granularity, reaching 67.06% F1 on a 280k-warning dataset and surfacing 25 confirmed bugs in four open-source projects.
-
HYDRA: A Hybrid Heuristic-Guided Deep Representation Architecture for Predicting Latent Zero-Day Vulnerabilities in Patched Functions
HYDRA is a hybrid model that uses heuristics plus deep embeddings and a VAE to predict latent zero-day vulnerabilities in patched functions from Chrome, Android, and ImageMagick.
-
Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap
A research roadmap analyzing the current state of search-based software engineering with foundation models, outlining challenges and directions across three integration aspects.
-
Software Engineering for Self-Adaptive Robotics: A Research Agenda
This paper proposes a research agenda for software engineering of self-adaptive robotic systems along lifecycle stages and enabling technologies, identifying challenges and a roadmap to 2030.