In: Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp
15 Pith papers cite this work. Polarity classification is still indexing.
roles: background 2
polarities: background 2
representative citing papers
Exploratory study of vibe-coded projects shows variability is bound at generation time; proposes VbR as an SPL method using LLMs to generate variant-specific code from specifications.
An empirical study of 547 confirmed safety incidents from GitHub and literature derives a 33-type taxonomy showing constraint violations, destructive actions, and deception dominate in everyday coding-agent use.
Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.
SpecDetect4ML detects 22 ML code smells via DSL specifications and CPG-based analysis, reporting 95.82% precision and 88.14% recall on 890 ML systems while outperforming prior tools.
The paper delivers a taxonomy of seven LLM study types in software engineering along with eight guidelines that separate mandatory requirements from recommended practices to address reproducibility challenges.
A critical discourse analysis of 200 YouTube software engineering tutorials finds male characters and masculine defaults dominate, with an agency gap assigning technical roles almost exclusively to males.
A within-subject study of 12 developers found that security training reduced validated weaknesses by 31.5% and critical issues by 79.2% in LLM-assisted backend coding.
Longitudinal analysis of over 4000 toggle events in Kubernetes and GitLab shows removals lag additions, leading to growing inventories with median lifespans of 734 and 185 days, plus a benchmarking framework with five metrics.
A taxonomy-guided RAG system with LLMs reduces hallucinations and improves migration suggestions for Qiskit code compared to unconstrained retrieval.
ML-specific code smells occur 41-94 times less often than general Python smells in 279 projects, with associations to commit frequency and domain but none for general smells or most other project characteristics.
Introduces a taxonomy of nine LLM code smells, a static detection tool, and reports 73.5% prevalence with 91.3% precision and 71.8% recall across 692 projects.
BPE tokenization creates gibberish bias in CLLMs, causing secrets with high character entropy but low token entropy to be preferentially memorized due to training data distribution shifts.
Forum discussions highlight four security concerns with GitHub Copilot: data leakage, code licensing problems, adversarial attacks such as prompt injection, and generation of insecure code.
The authors present a registered report outlining their planned large-scale empirical study of vulnerability communication in pull requests by different account types.
citing papers explorer
- Mind your key: An Empirical Study of LLM API Credential Leakage in iOS Apps
  Empirical analysis of 444 iOS apps using dynamic traffic interception found 282 leaking LLM API keys across ten providers, with only 28% remediation after three months.
- Where Did the Variability Go? From Vibe Coding to Product Lines by Regeneration
  Exploratory study of vibe-coded projects shows variability is bound at generation time; proposes VbR as an SPL method using LLMs to generate variant-specific code from specifications.
- What Breaks When LLMs Code? Characterizing Operational Safety Failures of Agentic Code Assistants
  An empirical study of 547 confirmed safety incidents from GitHub and literature derives a 33-type taxonomy showing constraint violations, destructive actions, and deception dominate in everyday coding-agent use.
- ML Code Smells: From Specification to Detection
  SpecDetect4ML detects 22 ML code smells via DSL specifications and CPG-based analysis, reporting 95.82% precision and 88.14% recall on 890 ML systems while outperforming prior tools.
- Guidelines for Empirical Studies in Software Engineering involving Large Language Models
  The paper delivers a taxonomy of seven LLM study types in software engineering along with eight guidelines that separate mandatory requirements from recommended practices to address reproducibility challenges.
- A Critical Discourse Analysis of Gender Representation in Software Engineering Education Videos on YouTube
  A critical discourse analysis of 200 YouTube software engineering tutorials finds male characters and masculine defaults dominate, with an agency gap assigning technical roles almost exclusively to males.
- Feature Toggle Dynamics in Large-Scale Systems: Prevalence, Growth, Lifespan, and Benchmarking
  Longitudinal analysis of over 4000 toggle events in Kubernetes and GitLab shows removals lag additions, leading to growing inventories with median lifespans of 734 and 185 days, plus a benchmarking framework with five metrics.
- Qiskit Code Migration with LLMs
  A taxonomy-guided RAG system with LLMs reduces hallucinations and improves migration suggestions for Qiskit code compared to unconstrained retrieval.
- Comparing ML-Specific and General Python Code Smells Across Project Characteristics
  ML-specific code smells occur 41-94 times less often than general Python smells in 279 projects, with associations to commit frequency and domain but none for general smells or most other project characteristics.
- LLM Code Smells: A Taxonomy and Detection Approach
  Introduces a taxonomy of nine LLM code smells, a static detection tool, and reports 73.5% prevalence with 91.3% precision and 71.8% recall across 692 projects.
- Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot
  Forum discussions highlight four security concerns with GitHub Copilot: data leakage, code licensing problems, adversarial attacks such as prompt injection, and generation of insecure code.
- How Humans, Bots, and Agents Communicate About Vulnerabilities in Pull Requests
  The authors present a registered report outlining their planned large-scale empirical study of vulnerability communication in pull requests by different account types.