A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

2025 · arXiv 2508.18106

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

An Empirical Study of Security Calibration in Large Language Models for Code

cs.SE · 2026-06-30 · unverdicted · novelty 7.0

Empirical evaluation of three LLMs finds prevalent overconfidence in insecure code generation, with security calibration outperforming functional calibration but both degrading in repository-level settings.

citing papers explorer

Showing 1 of 1 citing paper.

An Empirical Study of Security Calibration in Large Language Models for Code · cs.SE · 2026-06-30 · unverdicted · none · ref 12
