archive

Every paper Pith has read.

1797 papers in cs.SE · page 16

cs.SE 2026-04-23 reviewed

Models generate correct code without public tests
You Don't Need Public Tests to Generate Correct Code

Kaushitha Silva +1
cs.SE 2026-04-23 reviewed

Bug variants reveal memorization in LLM repair models
A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair

Milan De Koning +3
cs.SE 2026-04-23 reviewed

Decomposition plus mutation refines LLM specs to verify real programs
SpecSyn: LLM-based Synthesis and Refinement of Formal Specifications for Real-world Program Verification

Lezhi Ma +5
cs.AI 2026-04-23 reviewed

Bounds on neural-net safety probability under random inputs
Probabilistic Verification of Neural Networks via Efficient Probabilistic Hull Generation

Jingyang Li +3
cs.SE 2026-04-23 reviewed

GPT dominates generative AI use in IT project management
A systematic review of generative AI usage for IT project management

Ionut Anghel +1
cs.SE 2026-04-23 reviewed

Ambiguous requirements cut LLM code accuracy
Assessing the Impact of Requirement Ambiguity on LLM-based Function-Level Code Generation

Di Yang +9
cs.SE 2026-04-23 reviewed

IRAP turns vague specs into math functions with 40x gains
Conjecture and Inquiry: Quantifying Software Performance Requirements via Interactive Retrieval-Augmented Preference Elicitation

Shihai Wang +1
cs.CL 2026-04-23 reviewed

Modular checks push GUI agents past human performance on OSWorld
VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

Qijun Han +13
cs.LG 2026-04-23 reviewed

The authors adapted their prior mdok method for machine-generated text detection to…
mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code

Adam Skurla +2
cs.CR 2026-04-23 reviewed

Three LLM experts detect code vulnerabilities at 77% F1 for two cents
Strategic Heterogeneous Multi-Agent Architecture for Cost-Effective Code Vulnerability Detection

Zhaohui Geoffrey Wang
cs.SE 2026-04-23 reviewed

SBOM mismatches produce inconsistent vulnerability reports
Hidden Dependencies and Component Variants in SBOM-Based Software Composition Analysis

Shawn Rasheed +4
cs.AI 2026-04-23 reviewed

Meta-predicates flag unsuitable evidence in clinical AI rules upfront
Trustworthy Clinical Decision Support Using Meta-Predicates and Domain-Specific Languages

Michael Bouzinier +4
cs.SE 2026-04-23 reviewed

Execution feedback beats pipeline complexity for 1-3B code models
Feedback Over Form: Why Execution Feedback Matters More Than Pipeline Topology in 1-3B Code Generation

Charles Junichi McAndrews
cs.SE 2026-04-22 reviewed

Ground-truth dataset exposes differences in vulnerability detectors
A Ground-Truth-Based Evaluation of Vulnerability Detection Across Multiple Ecosystems

Peter Mandl +3
cs.DC 2026-04-22 reviewed

GPU runs 20,000 GWAS phenotypes in 20 minutes
TorchGWAS: GPU-accelerated GWAS for thousands of quantitative phenotypes

Xingzhong Zhao +7
cs.AI 2026-04-22 reviewed

POMDP models hidden user states to auto-refine LLM prompts
Mind the Prompt: Self-adaptive Generation of Task Plan Explanations via LLMs

Gricel Vázquez +4
cs.SE 2026-04-22 reviewed

37% of AI governance prompts miss key structure
Structural Quality Gaps in Practitioner AI Governance Prompts: An Empirical Study Using a Five-Principle Evaluation Framework

Christo Zietsman
cs.CR 2026-04-22 reviewed

LLM gateways often swap models and misbill users
Behavioral Consistency and Transparency Analysis on Large Language Model API Gateways

Guanjie Lin +5
cs.SE 2026-04-22 reviewed

The paper examines residual security risks in patched code by measuring semantic and…
Residual Risk Analysis in Benign Code: How Far Are We? A Multi-Model Semantic and Structural Similarity Approach

Mohammad Farhad +1
cs.AI 2026-04-22 reviewed

Value conflict tests show alignment faking in 7B models
Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models

Inderjeet Nair +2
cs.SE 2026-04-22 reviewed

LLM tool provides 24/7 feedback for software engineering students
Autonomous LLM-generated Feedback for Student Exercises in Introductory Software Engineering Courses

Andreas Metzger
cs.AI 2026-04-22 reviewed

Coding agents retain just 44% of code in real commits
SWE-chat: Coding Agent Interactions From Real Users in the Wild

Joachim Baumann +5
cs.HC 2026-04-22 reviewed

Serverless toolkit builds urban VA prototypes in hours
Autark: A Serverless Toolkit for Prototyping Urban Visual Analytics Systems

Lucas Alexandre +7
cs.SE 2026-04-22 reviewed

High AUC does not ensure defect models beat random at all thresholds
Evaluating Software Defect Prediction Models via the Area Under the ROC Curve Can Be Misleading

Luigi Lavazza +2
cs.SE 2026-04-22 reviewed

QuanForge distinguishes QNN test suites and finds weak circuit regions
QuanForge: A Mutation Testing Framework for Quantum Neural Networks

Minqi Shao +2
cs.SE 2026-04-22 reviewed

GNNs spot LLM-written safety cases at F1 0.94
Evaluating Assurance Cases as Text-Attributed Graphs for Structure and Provenance Analysis

Fariz Ikhwantri +1
cs.SE 2026-04-22 reviewed

LLM regex masks raise log parsing accuracy to 97.6%
DeepParse: Hybrid Log Parsing with LLM-Synthesized Regex Masks

Amir Shetaia +1
cs.SE 2026-04-22 reviewed

LLMs reach 88-89% accuracy on product line blueprint analysis
Early-Stage Product Line Validation Using LLMs: A Study on Semi-Formal Blueprint Analysis

Viet-Man Le +4
cs.SE 2026-04-22 reviewed

Hybrid detector finds 893k eliminable duplicate BDD steps
Reducing Maintenance Burden in Behaviour-Driven Development: A Paraphrase-Robust Duplicate-Step Detector with a 1.1M-Step Open Benchmark

Ali Hassaan Mughal +2
cs.SE 2026-04-22 reviewed

Security commit messages remain largely uninformative
On the Informativeness of Security Commit Messages: A Large-scale Replication Study

Syful Islam +1
cs.SE 2026-04-22 reviewed

Guardrails from requirements and models stabilize AI agents
Shift-Up: A Framework for Software Engineering Guardrails in AI-native Software Development – Initial Findings

Petrus Lipsanen +5
cs.CL 2026-04-22 reviewed

RL trains 7B model to build websites rivaling 671B LLMs
WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

Juyong Jiang +6
cs.SE 2026-04-22 reviewed

Reasoning models forecast parallel code races without tool calls
Learning Reasoning World Models for Parallel Code

Gautam Singh +3
cs.SE 2026-04-22 reviewed

LLMs detect logging security issues at 13-52 percent accuracy
Towards Secure Logging: Characterizing and Benchmarking Logging Code Security Issues with LLMs

He Yang Yuan +5
cs.SE 2026-04-22 reviewed

Static tool catches invented symbols in LLM API migrations
Hallucination Inspector: A Fact-Checking Judge for API Migration

Marcos Tileria +3
cs.CR 2026-04-22 reviewed

LLM agents confirm 84% of Node.js taint vulnerabilities
Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning

Ronghao Ni +2
cs.LG 2026-04-22 reviewed

Dual tasks test if LLMs grasp code execution flow
The Path Not Taken: Duality in Reasoning about Program Execution

Eshgin Hasanov +3
cs.LG 2026-04-22 reviewed

LLM absorbs long contexts into fixed parameters with causal sync
Absorber LLM: Harnessing Causal Synchronization for Test-Time Training

Zhixin Zhang +4
cs.LG 2026-04-22 reviewed

Joint optimizations cut multi-agent edge latency by 62 percent at 200 agents
A Delta-Aware Orchestration Framework for Scalable Multi-Agent Edge Computing

Samaresh Kumar Singh +1
quant-ph 2026-04-22 reviewed

Nine quantum-HPC stacks share design patterns for unifying layers
Quantum-HPC Software Stacks and the openQSE Reference Architecture: A Survey

Amir Shehata +24
cs.SE 2026-04-21 reviewed

Code snippets prove 20 percent more library calls executable
FIKA: Expanding Dependency Reachability with Executability Guarantees

Yogya Gamage +3
cs.SE 2026-04-21 reviewed

Review charts automation routes for quantum software and AI
Automated Quantum Software and AI Engineering

Nazanin Siavash +1
cs.SE 2026-04-21 reviewed

Platform uses containers and supervised AI chat for reproducible biomedical workflows
Biomedical systems biology workflow orchestration and execution with PoSyMed

Simon Süwer +5
cs.SE 2026-04-21 reviewed

AI security PRs introduce recurring flaws but often merge
Insights into Security-Related AI-Generated Pull Requests

Md Fazle Rabbi +3
cs.SE 2026-04-21 reviewed

Vision models turn GUI bug videos into replays 72% of the time
ViBR: Automated Bug Replay from Video-based Reports using Vision-Language Models

Sidong Feng +5
cs.SE 2026-04-21 reviewed

LLM GUI code compiles but rarely plays without errors
PlayCoder: Making LLM-Generated GUI Code Playable

Zhiyuan Peng +5
cs.RO 2026-04-21 reviewed

One open codebase trains vision-language-action models end-to-end
VLA Foundry: A Unified Framework for Training Vision-Language-Action Models

Jean Mercat +7
cs.SE 2026-04-21 reviewed

Predictive autoscaler holds Node.js latency at 26 ms in ramps
Predictive Autoscaling for Node.js on Kubernetes: Lower Latency, Right-Sized Capacity

Ivan Tymoshenko +2
cs.SE 2026-04-21 reviewed

Reflection and planning lift theorem proving 22% with fixed LLM calls
On Reasoning-Centric LLM-based Automated Theorem Proving

Yican Sun +3
cs.CR 2026-04-21 reviewed

Fine-tuned LLMs raise XSS obfuscation match rate to 0.22
Evaluating LLM-Generated Obfuscated XSS Payloads for Machine Learning-Based Detection

Divyesh Gabbireddy +1