arXiv:2502.06215 [cs.SE] https://arxiv.org/abs/2502.06215 Manuscript submitted to ACM

Guo, D · 2025 · arXiv 2502.06215

8 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering

cs.SE · 2025-07-20 · conditional · novelty 8.0

AIDev is a new open dataset of 456k AI-agent pull requests showing agents submit code faster than humans but with lower acceptance rates and simpler changes.

Guidelines for Empirical Studies in Software Engineering involving Large Language Models

cs.SE · 2025-08-21 · accept · novelty 7.0 · 2 refs

The paper delivers a taxonomy of seven LLM study types in software engineering along with eight guidelines that separate mandatory requirements from recommended practices to address reproducibility challenges.

BT-APE: A Computationally Light Backtracking Approach to Automatic Prompt Engineering for Requirements Classification

cs.SE · 2026-07-01 · unverdicted · novelty 6.0

BT-APE automates prompt engineering for requirements classification using backtracking search and dynamic examples, matching PE2 accuracy while using 72% fewer tokens and 66% less time than that baseline.

SrDetection: A Self-Referential Framework for Data Leakage Detection in Code Large Language Models

cs.CL · 2026-06-29 · unverdicted · novelty 6.0

SrDetection detects data leakage in Code LLMs via contrast between original benchmark samples and their semantic variants, reporting F1 gains of 21.52 (gray-box) and 14.46 (black-box) over baselines in a controlled testbed.

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

cs.LG · 2026-06-08 · unverdicted · novelty 6.0

Empirical benchmarks show that distribution similarity between adaptation and pretraining data increases practical privacy leakage in DP-adapted LLMs at fixed theoretical guarantees, with LoRA providing the strongest protection for OOD cases.

PRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelines

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

PRISM detects and stops credential leakage during LLM generation in multi-agent pipelines using per-token risk scores from lexical, structural, and behavioral signals, achieving zero observed leaks and an F1 of 0.832 on a 2000-task benchmark.

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

cs.CR · 2026-06-17 · unverdicted · novelty 5.0

OpenAnt is an open-source pipeline that uses code decomposition, LLM-based adversarial verification, and automated dynamic testing to find vulnerabilities in large projects like OpenSSL and WordPress while claiming fewer false positives.

Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt

cs.CL · 2026-06-01 · unverdicted · novelty 5.0

Larger LLMs reproduce constructional productivity via entrenchment in coercion cases with nonce words but fail to use statistical preemption to avoid overgeneralizing semantically plausible but unobserved patterns.

citing papers explorer

Showing 6 of 6 citing papers after filters.

BT-APE: A Computationally Light Backtracking Approach to Automatic Prompt Engineering for Requirements Classification · cs.SE · 2026-07-01 · unverdicted · none · ref 59
SrDetection: A Self-Referential Framework for Data Leakage Detection in Code Large Language Models · cs.CL · 2026-06-29 · unverdicted · none · ref 5
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models · cs.LG · 2026-06-08 · unverdicted · none · ref 294
PRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelines · cs.AI · 2026-05-11 · unverdicted · none · ref 13
OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing · cs.CR · 2026-06-17 · unverdicted · none · ref 15
Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt · cs.CL · 2026-06-01 · unverdicted · none · ref 104
