{"total":16,"items":[{"citing_arxiv_id":"2606.28125","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"How Humans, Bots, and Agents Communicate About Vulnerabilities in Pull Requests","primary_cat":"cs.SE","submitted_at":"2026-06-26T14:26:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"The authors present a registered report outlining their planned large-scale empirical study of vulnerability communication in pull requests by different account types.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.20173","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Qiskit Code Migration with LLMs","primary_cat":"cs.SE","submitted_at":"2026-06-18T12:40:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A taxonomy-guided RAG system with LLMs reduces hallucinations and improves migration suggestions for Qiskit code compared to unconstrained retrieval.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.19042","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Where Did the Variability Go? From Vibe Coding to Product Lines by Regeneration","primary_cat":"cs.SE","submitted_at":"2026-06-17T13:10:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Exploratory study of vibe-coded projects shows variability is bound at generation time; proposes VbR as an SPL method using LLMs to generate variant-specific code from specifications.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.18423","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Critical Discourse Analysis of Gender Representation in Software Engineering Education Videos on YouTube","primary_cat":"cs.SE","submitted_at":"2026-06-16T19:18:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A critical discourse analysis of 200 YouTube software engineering tutorials finds male characters and masculine defaults dominate, with an agency gap assigning technical roles almost exclusively to males.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12212","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mind your key: An Empirical Study of LLM API Credential Leakage in iOS Apps","primary_cat":"cs.SE","submitted_at":"2026-06-10T15:29:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Empirical analysis of 444 iOS apps using dynamic traffic interception found 282 leaking LLM API keys across ten providers, with only 28% remediation after three months.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01882","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Comparing ML-Specific and General Python Code Smells Across Project Characteristics","primary_cat":"cs.SE","submitted_at":"2026-06-01T08:27:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ML-specific code smells occur 41-94 times less often than general Python smells in 279 projects, with associations to commit frequency and domain but none for general smells or most other project characteristics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30777","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"What Breaks When LLMs Code? Characterizing Operational Safety Failures of Agentic Code Assistants","primary_cat":"cs.SE","submitted_at":"2026-05-29T03:09:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"An empirical study of 547 confirmed safety incidents from GitHub and literature derives a 33-type taxonomy showing constraint violations, destructive actions, and deception dominate in everyday coding-agent use.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24300","ref_index":48,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Enhancing Reliability in LLM-Based Secure Code Generation","primary_cat":"cs.CR","submitted_at":"2026-05-22T23:58:56+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MA-CoT prompting reduces security findings in LLM-generated code by 57.6% on a 200-task dataset and 94.5% on LLMSecEval across C, Java, and Python, outperforming vanilla, zero-shot, and standard CoT strategies.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22976","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM Code Smells: A Taxonomy and Detection Approach","primary_cat":"cs.SE","submitted_at":"2026-05-21T19:10:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Introduces a taxonomy of nine LLM code smells, a static detection tool, and reports 73.5% prevalence with 91.3% precision and 71.8% recall across 692 projects.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16829","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Constrained Code Generation with Discrete Diffusion","primary_cat":"cs.CL","submitted_at":"2026-05-16T06:15:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17814","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective","primary_cat":"cs.CR","submitted_at":"2026-04-20T05:12:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"BPE tokenization creates gibberish bias in CLLMs, causing secrets with high character entropy but low token entropy to be preferentially memorized due to training data distribution shifts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17763","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Quasi-Experimental Developer Study of Security Training in LLM-Assisted Web Application Development","primary_cat":"cs.CR","submitted_at":"2026-04-20T03:34:29+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A within-subject study of 12 developers found that security training reduced validated weaknesses by 31.5% and critical issues by 79.2% in LLM-assisted backend coding.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"code generation. Siddiq and Santos introduced SecurityEval, a dataset of 130 Python samples spanning 75 CWE-mapped vulnerability types [33]. SALLM later proposed a systematic evaluation pipeline combining security-centric prompts with automated analysis [34]. Tonyet al.presented LLMSecEval, a dataset of 150 natural-language prompts for assessing code security [35]. CWEval evaluates both functional correctness and security, showing that models can pass tests while still producing vulnerable code [36]. BaxBench uses exploit-based testing for backend code and shows that current models struggle to achieve both correctness and security [37]. While these benchmarks provide useful context, they remain model- centric and do not address how developer behavior can be"},{"citing_arxiv_id":"2604.15872","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Feature Toggle Dynamics in Large-Scale Systems: Prevalence, Growth, Lifespan, and Benchmarking","primary_cat":"cs.SE","submitted_at":"2026-04-17T09:22:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Longitudinal analysis of over 4000 toggle events in Kubernetes and GitLab shows removals lag additions, leading to growing inventories with median lifespans of 734 and 185 days, plus a benchmarking framework with five metrics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08352","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot","primary_cat":"cs.SE","submitted_at":"2026-04-09T15:19:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Forum discussions highlight four security concerns with GitHub Copilot: data leakage, code licensing problems, adversarial attacks such as prompt injection, and generation of insecure code.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.20491","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ML Code Smells: From Specification to Detection","primary_cat":"cs.SE","submitted_at":"2025-09-24T19:09:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SpecDetect4ML detects 22 ML code smells via DSL specifications and CPG-based analysis, reporting 95.82% precision and 88.14% recall on 890 ML systems while outperforming prior tools.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.15503","ref_index":64,"ref_count":4,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Guidelines for Empirical Studies in Software Engineering involving Large Language Models","primary_cat":"cs.SE","submitted_at":"2025-08-21T12:30:30+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"The paper delivers a taxonomy of seven LLM study types in software engineering along with eight guidelines that separate mandatory requirements from recommended practices to address reproducibility challenges.","context_count":2,"top_context_role":"background","top_context_polarity":"background","context_text":"user interactions, collaborative coding environments, and software usability assessments. This approach enables data collection that closely reflects human reactions while avoiding the need for di- rect human involvement. To achieve this, prompt engineering tech- niques are widely employed, with a common approach being the use of thePersonas Pattern[ 64], which involves tailoring LLM responses to align with predefined profiles or roles that emulate specific user Baltes et al. archetypes. Zhao et al. outline opportunities and challenges of us- ing LLMs as research subjects in detail [140]. Furthermore, recent sociological studies have emphasized that, to be effectively utilized in this capacity, LLMs-including their agentic versions-should"}],"limit":50,"offset":0}