The paper presents OverEager-Gen, a 500-scenario benchmark showing that removing consent declarations from prompts increases overeager actions by 11.9-17.2 percentage points across models, with agent framework choice dominating base-model effects.
An analysis and survey of the development of mutation testing
8 Pith papers cite this work.
representative citing papers
LLM code modernizers produce semantic drift in 39.7% of legacy-Python-2 cases and endorse 31.7% of those drifts in self-review, with rates varying widely across models but not tracking capability.
Noise from quantum hardware simulators significantly alters mutant detection distances, making equivalent mutants harder to separate from faults, with output-distribution metrics reaching 73.03% accuracy and 74.89% F1-score under device-specific thresholds.
Quantum circuits show high average condition (97.56%) and decision (97.63%) coverage but lower path coverage (71.84%), with probabilistic versions adding confidence levels (averages 88.87%, 88.65%, 37.18%); mutation testing reveals weak or no correlation between structural coverage and fault finding.
A dual-axis quality framework ranks DL mutation operators by statistical resistance and Jaccard-based realism to real faults, enabling up to 55.6% fewer mutants on held-out validation data without dropping baseline performance.
QuanForge introduces statistical mutation killing and nine post-training mutation operators for QNNs to distinguish test suites and localize vulnerable circuit regions.
MutDafny uses 40 mutation operators on 794 real-world Dafny programs to detect weak specifications, manually confirming five such cases at a rate of one per 241 lines.
Neural Change Prediction generates mutation data to train bidirectional models linking code changes to behavioral effects for any executable program.