archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 8

cs.SE 2026-05-08 reviewed

RAG with LLMs catches 91 percent of false kernel bug reports
Characterizing and Mitigating False-Positive Bug Reports in the Linux Kernel

Jiashuo Tian +5
cs.SE 2026-05-08 reviewed

Natural-language rewrite lifts code retrieval scores
Do not copy and paste! Rewriting strategies for code retrieval

Andrea Gurioli +2
cs.SE 2026-05-08 reviewed

Scenario models automate VR app tests and catch more failures
System Test Generation for Virtual Reality Applications using Scenario Models

Gerry Longfils +3
cs.RO 2026-05-08 reviewed

Search finds small perturbations that break robot vision 3-7x better
Search-based Robustness Testing of Laptop Refurbishing Robotic Software

Erblin Isaku +4
cs.SE 2026-05-08 reviewed

Iterative refinement boosts LLM quantum solver success
Can LLMs Solve Science or Just Write Code? Evaluating Quantum Solver Generation

Luciano Baresi +5
cs.SE 2026-05-08 reviewed

Iterative checks boost LLM quantum solver success
Can LLMs Solve Science or Just Write Code? Evaluating Quantum Solver Generation

Luciano Baresi +5
cs.SE 2026-05-08 reviewed

Prefill signals from small models locate multi-agent failures
MASPrism: Lightweight Failure Attribution for Multi-Agent Systems Using Prefill-Stage Signals

Yang Liu +3
cs.SE 2026-05-08 reviewed

Prefill signals from small LLMs locate root failures in agent traces
MASPrism: Lightweight Failure Attribution for Multi-Agent Systems Using Prefill-Stage Signals

Yang Liu +3
cs.SE 2026-05-08 reviewed

Multi-shot prompts boost agreement only for Claude Haiku
Prompt Engineering Strategies for LLM-based Qualitative Coding of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study

Moaath Alshaikh +9
cs.SE 2026-05-08 reviewed

Multi-stage training boosts Java-to-Cangjie code translation 6%
Boosting Automatic Java-to-Cangjie Translation with Multi-Stage LLM Training and Error Repair

Xinyue Liang +4
cs.SE 2026-05-08 reviewed

Unclear roles top ML team challenges in semiconductors
Exploring CoCo Challenges in ML Engineering Teams: Insights From the Semiconductor Industry

A. Azamnouri +5
cs.SE 2026-05-08 reviewed

Open-source low-code editor builds and deploys AI web apps
Low-code and no-code with BESSER to create and deploy smart web applications

Iván Alfonso +3
cs.LG 2026-05-08 reviewed

Compile rate misleads on LLM game scene quality
Mage: Multi-Axis Evaluation of LLM-Generated Executable Game Scenes Beyond Compile-Pass Rate

Hugh Xuechen Liu +1
cs.LG 2026-05-08 reviewed

Dual-space loop refines virtual cell models by routing failures to right level
CellScientist: Dual-Space Hierarchical Orchestration for Closed-Loop Refinement of Virtual Cell Models

Mengran Li +14
cs.SE 2026-05-08 reviewed

AI backends gain one admission seam for governance across requests
Execution Envelopes: A Shared Admission Contract for Backend AI Execution Requests

Krti Tallam
cs.SE 2026-05-08 reviewed

LLM agents reach only 30-55% on full repo generation from scratch
RepoZero: Can LLMs Generate a Code Repository from Scratch?

Zhaoxi Zhang +4
cs.SE 2026-05-08 reviewed

Top LLM agents complete only 30-55% of code repositories from scratch
RepoZero: Can LLMs Generate a Code Repository from Scratch?

Zhaoxi Zhang +4
cs.CL 2026-05-08 reviewed

Framework ties agent architecture to lifecycle for reliable CUAs
Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability

Zejian Chen +8
cs.SE 2026-05-08 reviewed

Authority transfer, not task performance, defines agentic CI/CD
From Assistance to Agency: Rethinking Autonomy and Control in CI/CD Pipelines

Marcus Emmanuel Barnes +2
cs.SE 2026-05-07 reviewed

Replay script matches frontier models on computer-use benchmarks
Computer Use at the Edge of the Statistical Precipice

Pierluca D'Oro +8
cs.SE 2026-05-07 reviewed

LLM agents fix under half of architectural code smells
SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair

Ion George Dinu +7
cs.RO 2026-05-07 reviewed

Language descriptions become solvable constraints for AV tests
Traffic Scenario Orchestration from Language via Constraint Satisfaction

Frieda Rong +3
cs.SE 2026-05-07 reviewed

This paper reviews studies linking lack of belonging to higher burnout in software…
Guidelines for Cultivating a Sense of Belonging to Reduce Developer Burnout

Bianca Trinkenreich +3
cs.SE 2026-05-07 reviewed

MySQL and PostgreSQL top DBMS use in open-source Java history
Analyzing the Adoption of Database Management Systems Throughout the History of Open Source Projects

Camila A. Paiva +10
cs.SE 2026-05-07 reviewed

Best coding agents pass under 16 percent of Java framework migrations
ScarfBench: A Benchmark for Cross-Framework Application Migration in Enterprise Java

Advait Pavuluri +8
cs.SE 2026-05-07 reviewed

Agents pass only 15% of Java framework migration tests
ScarfBench: A Benchmark for Cross-Framework Application Migration in Enterprise Java

Advait Pavuluri +8
cs.SE 2026-05-07 reviewed

AI code needs fewer updates than human code
To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study

Shota Sawada +5
cs.SE 2026-05-07 reviewed

AI code receives less maintenance than human code
To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study

Shota Sawada +5
cs.SE 2026-05-07 reviewed

LLM agents drop 30 points on backend tasks with full constraints
Constraint Decay: The Fragility of LLM Agents in Backend Code Generation

Francesco Dente +2
cs.AI 2026-05-07 reviewed

DAG replay preserves AI work state exactly with zero churn
From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work

Josh Rosen +1
cs.SE 2026-05-07 reviewed

LLMs pick vulnerable library versions in 37-56% of tasks
Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions

Chengjie Wang +4
cs.SE 2026-05-07 reviewed

LLM-based method repairs sibling code bugs across locations
SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models

Xinyu Liu +5
cs.SE 2026-05-07 reviewed

Self-healing framework raises LLM agent success rates
A Self-Healing Framework for Reliable LLM-Based Autonomous Agents

Cheonsu Jeong +1
cs.SE 2026-05-07 reviewed

Symbolic traces train 8B model to beat 32B on code violation detection
Teaching LLMs Program Semantics via Symbolic Execution Traces

Jonas Bayer +5
cs.SE 2026-05-07 reviewed

0.1% of PyPI packages carry 80% of maintenance impact
Modeling Dependency-Propagated Ecosystem Impact of Changes in Maintenance Activities: Evaluating Support Strategies in the PyPI Network

Alexandros Tsakpinis +2
cs.SE 2026-05-07 reviewed

0.1% of PyPI packages carry 80% of ecosystem impact
Modeling Dependency-Propagated Ecosystem Impact of Changes in Maintenance Activities: Evaluating Support Strategies in the PyPI Network

Alexandros Tsakpinis +2
cs.AI 2026-05-07 reviewed

LLM judges flip up to 9% of safety verdicts on equivalent policy rewordings
Beyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judges

Shihao Weng +2
cs.SE 2026-05-07 reviewed

Protocol tests agent effort to recover design intent from code
BUILD-AND-FIND: An Effort-Aware Protocol for Evaluating Agent-Managed Codebases

Jhen-Ke Lin
cs.SE 2026-05-07 reviewed

Agents top out near 47% F1 on updating project tests after changes
Breaking, Stale, or Missing? Benchmarking Coding Agents on Project-Level Test Evolution

Ye Shang +5
cs.SE 2026-05-07 reviewed

One model beats coding specialists by 9% with utility-driven RL
Schedule-and-Calibrate: Utility-Guided Multi-Task Reinforcement Learning for Code LLMs

Yujia Chen +4
cs.SE 2026-05-07 reviewed

AST patterns identify algorithms more accurately than LLMs or clone detectors
Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition

Denis Neumüller +3
cs.CR 2026-05-07 reviewed

Tool detects how LLMs create risks in GitHub CI workflows
Heimdallr: Characterizing and Detecting LLM-Induced Security Risks in GitHub CI Workflows

Bonan Ruan +5
cs.AI 2026-05-07 reviewed

Multi-agent workflow lifts AI coding success by 6.5 percent
MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System

Yuliang Xu +4
cs.AI 2026-05-07 reviewed

Multi-agent workflow lifts algorithmic solving by 6.5 percent
MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System

Yuliang Xu +4
cs.SE 2026-05-07 reviewed

Automatic metrics fail to judge non-English code comments
Evaluating Non-English Developer Support in Machine Learning for Software Engineering

Jonathan Katzy +7
cs.SE 2026-05-07 reviewed

AI code security fixes often create new weaknesses
On Fixing Insecure AI-Generated Code through Model Fine-Tuning and Prompting Strategies

Ali Soltanian Fard Jahromi +3
cs.SE 2026-05-07 reviewed

Ontology guides agent for better requirements interviews
From Chat to Interview: Agentic Requirements Elicitation with an Experience Ontology

Dongming Jin +7
cs.SE 2026-05-07 reviewed

Real IDE traces expose overestimation in simulated coding assistant tests
An Empirical Study of Proactive Coding Assistants in Real-World Software Development

Lehui Li +3
cs.SE 2026-05-07 reviewed

Coding agents need insight policy quality
Agentic Coding Needs Proactivity, Not Just Autonomy

Nghi D. Q. Bui +1