Ponta, Henrik Plate, Antonino Sabetta, Michele Bezzi, and Cédric Dangremont
4 Pith papers cite this work.
Citation roles: background (1). Citation polarities: background (1).
Citing papers:
- Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical Study
  Code language models show no transferable security understanding from code diffs alone. They rely on commit messages, miss over 93% of fixes at a 0.5% false positive rate, and suffer large performance drops under group or temporal splits.
- Realistic noise synthesis reduces bias and improves tissue microstructure estimation with supervised machine learning
  Realistic noise synthesis incorporating Rician expectation and effective variance into simulated training data reduces bias in supervised ML for diffusion MRI microstructure estimation.
- Context-Guided Decompilation: A Step Towards Re-executability
  ICL4Decomp applies in-context learning to guide LLMs in generating re-executable decompiled code from binaries, reporting roughly 40% higher re-executability than prior methods across datasets and optimization levels.
- Benchmarking Mythos-Linked Bug Rediscovery
  A benchmarking experiment finds low rediscovery rates for three models on six Mythos-linked bug tasks, with only six target matches across 54 attempts under controlled prompting.
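The Code-Centric Detection entry reports results at a 0.5% false positive rate. The following is a minimal sketch of how recall at a fixed FPR is commonly computed from classifier scores. It is a generic illustration and not the benchmark's own evaluation code. The function name `recall_at_fpr` and the toy data are hypothetical. Missing over 93% of fixes corresponds to a recall below 7% at that operating point.

```python
from typing import Sequence


def recall_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float) -> float:
    """Highest recall (true-positive rate) reachable at a false positive rate <= max_fpr.

    scores: classifier scores; higher means "more likely a vulnerability fix".
    labels: 1 for a real fix, 0 for any other commit.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0  # threshold above every score: nothing flagged, recall 0
    i = 0
    while i < len(ranked):
        # A threshold cannot separate tied scores, so consume the whole tie group.
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg > max_fpr:
            break  # FPR only grows as the threshold drops, so stop here
        best = tp / n_pos
    return best


# Toy data (hypothetical): 3 fixes and 3 non-fixes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 1, 0, 0]
recall = recall_at_fpr(scores, labels, max_fpr=0.005)
miss_rate = 1 - recall  # share of fixes missed at this operating point
```

With only three negatives, any single false positive already exceeds a 0.5% FPR. On this toy set, therefore, only the top-ranked fix is recovered. A low-FPR operating point like the one reported is only meaningful on a benchmark with many non-fix commits.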
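The diffusion MRI entry mentions the Rician expectation, which this background note illustrates. Suppose a magnitude signal M is formed from a true signal ν corrupted by Gaussian noise of standard deviation σ in both the real and imaginary channels. Then M is Rician-distributed, and its mean has the standard closed form below, where I_0 and I_1 are modified Bessel functions of the first kind. This is textbook Rician statistics, shown as a sketch. How the cited paper combines it with an "effective variance" is not stated here and is not reproduced.

```latex
\mathbb{E}[M] = \sigma\sqrt{\tfrac{\pi}{2}}\; L_{1/2}\!\left(-\frac{\nu^{2}}{2\sigma^{2}}\right),
\qquad
L_{1/2}(x) = e^{x/2}\left[(1-x)\,I_{0}\!\left(-\tfrac{x}{2}\right) - x\,I_{1}\!\left(-\tfrac{x}{2}\right)\right]
```

At ν = 0 this gives E[M] = σ√(π/2) rather than 0, and E[M] exceeds ν for any σ > 0. The two quantities converge only at high SNR. This upward bias is why training on noise-free or Gaussian-noise simulations can bias estimates fitted to magnitude data.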