Incomplete constrainers in constrained decoding push LLMs into low-probability regions of program space, so unconstrained decoding outperforms constrained decoding on functional correctness across seven models and three benchmarks.
A hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. Software Testing, Verification and Reliability, 24(3):219–250.
6 Pith papers cite this work.
Representative citing papers
FIESTA uses bandit algorithms to adaptively decide how many seeds and splits to run for each candidate model, focusing effort on promising ones while providing guarantees on selecting the optimal model.
CausalSE applies structural causal models (SCMs) and propensity score matching to prompt engineering for GPT-3 code generation, and finds that causal analysis often shows no significant effect where associational analysis suggests an improvement.
Noise from quantum hardware simulators significantly alters mutant detection distances, making equivalent mutants harder to distinguish from faults; under device-specific thresholds, output-distribution metrics reach 73.03% accuracy and a 74.89% F1-score.
MR-Scout extracts over 11,000 metamorphic-relation-encoded test cases from 701 OSS projects, codifies 97% of them as high-quality generators, and shows they raise line coverage by 13.52% and mutation score by 9.42% on programs that already have developer tests.
Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap is a research roadmap that analyzes the current state of search-based software engineering with foundation models and outlines challenges and directions across three integration aspects.