archive

Every paper Pith has read.

1797 papers in cs.SE · page 11

cs.SE 2026-05-04 reviewed

LLM repair models drop over 50% on minor code tweaks
HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair

Fazle Rabbi +1
cs.SE 2026-05-04 reviewed

Evaluation issues cause many false failures in LLM code translation
Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation

Fazle Rabbi +2
cs.SE 2026-05-04 reviewed

Evaluation errors inflate LLM code translation failure rates
Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation

Fazle Rabbi +2
cs.SE 2026-05-04 reviewed

Agentic critic loop keeps code docs synced to changes
DocSync: Agentic Documentation Maintenance via Critic-Guided Reflexion

Sidhesh Badrinarayan +1
cs.CR 2026-05-04 reviewed

Binary patching works via decompile-repair-recompile
SCRIBE: Practical Static Binary Patching via Binary-Aware Recompilation of Decompiled Code

Han Dai +4
cs.SE 2026-05-04 reviewed

Datalog DSL in Lean translates queries to provable theorems
A Shallow Embedding of Datalog in Lean

Ramy Shahin
cs.SE 2026-05-03 reviewed

Foundation models detect Java refactoring bugs at 93.8% accuracy
Foundation Models as Oracles for Refactoring Correctness Detection

Rohit Gheyi +4
cs.SE 2026-05-03 reviewed

GitHub Actions audit finds 28% compliance with LLM hybrid checks
How Compliant Are GitHub Actions Workflows? A Checklist-Based Study with LLM-Assisted Auditing

Edward Abrokwah +1
cs.SE 2026-05-03 reviewed

This paper evaluates training-free classification of conventional commit messages using…
Conventional Commit Classification using Large Language Models and Prompt Engineering

H. M. Sazzad Quadir +2
cs.AI 2026-05-03 reviewed

ACDL standardizes precise descriptions of LLM agent contexts
A Language for Describing Agentic LLM Contexts

Noga Peleg Pelc +2
cs.CR 2026-05-03 reviewed

LLM agents cut false positives in security scans by 88 percent
QASecClaw: A Multi-Agent LLM Approach for False Positive Reduction in Static Application Security Testing

Mohd Ruhul Ameen +2
cs.LG 2026-05-03 reviewed

Declarative framework cuts RAG tuning code changes by 95%
AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines

Xintan Zeng +3
cs.SE 2026-05-03 reviewed

QSAF turns 34 circuit primitives into reusable hybrid-system components
Quantum Software Architecture Framework (QSAF): A Component-Based Framework for Designing Hybrid Quantum-Classical Systems

Arvind W. Kiwelekar +5
cs.CR 2026-05-03 reviewed

Expert patterns boost LLM vulnerability repair accuracy
VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns

Jia Li +3
cs.SE 2026-05-02 reviewed

Sprint simulation teaches empirical control in Scrum projects
A Lightweight Scrum Sprint Simulation to Help Learners Traverse the Empirical Process Control Threshold Concept

Eduardo Miranda +2
cs.SE 2026-05-02 reviewed

Safety-gated memory for RL coding agents hits 80% accuracy
Feedback-Normalized Developer Memory for Reinforcement-Learning Coding Agents: A Safety-Gated MCP Architecture

Mehmet Iscan
cs.SE 2026-05-02 reviewed

Neuro-symbolic agents block invalid requirements by design
Neuro-Symbolic Agents for Hallucination-Free Requirements Reuse

Ahmed Ibrahim
cs.SE 2026-05-02 reviewed

Genetic programming evolves scaling policies that cut microservice resource use
Genetic Programming for Self-Adaptive Auto-Scaling of Microservices

Jia Li +2
cs.SE 2026-05-02 reviewed

Unrestricted autonomy breaks LLM test repair in enterprise UIs
Practical Limits of Autonomous Test Repair: A Multi-Agent Case Study with LLM-Driven Discovery and Self-Correction

Hyukjoo Lee
cs.SE 2026-05-02 reviewed

LLM spec accuracy drops 20 percent after removing deceptive outputs
LiveFMBench: Unveiling the Power and Limits of Agentic Workflows in Specification Generation

Dong Xu +11
cs.SE 2026-05-02 reviewed

ChatGPT supports nine categories of software design tasks
Using LLMs in Software Design: An Empirical Study of GitHub and A Practitioner Survey

Yifei Wang +7
cs.DC 2026-05-02 reviewed

Turing machine extension defines context-awareness
On defining and modeling context-awareness

Panteleimon Rodis
cs.SE 2026-05-02 reviewed

LLM feedback agents improve test coverage on C and Python code
FeedbackLLM: Metadata driven Multi-Agentic Language Agnostic Test Case Generator with Evolving prompt and Coverage Feedback

Kushal Jasti +4
cs.SE 2026-05-02 reviewed

Interactive agents clarify vague specs before STL generation
ClarifySTL: An Interactive LLM Agent Framework for STL Transformation through Requirements Clarification

Yue Fang +5
cs.SE 2026-05-01 reviewed

AI code output rises but reliability lags without strong specs
The Productivity-Reliability Paradox: Specification-Driven Governance for AI-Augmented Software Development

Sabry E. Farrag
cs.SE 2026-05-01 reviewed

DDD simulator runs same microservice code under multiple consistency models
A Domain-Driven Design Simulator for Business Logic-Rich Microservice Systems

Daniel da Palma Pereira +1
cs.SE 2026-05-01 reviewed

Platform links every AI prompt to its code edits for replay
RECAP: An End-to-End Platform for Capturing, Replaying, and Analyzing AI-Assisted Programming Interactions

Keyu He +4
cs.SE 2026-05-01 reviewed

ProMoTA links high-level models to code with full traceability
ProMoTA: a model-driven framework for end-to-end traceability analysis

Sadaf Mustafiz +2
cs.SE 2026-05-01 reviewed

Shor ECDLP oracle in Qrisp breaks control semantics
Semantics-Based Verification of an Implemented Shor Oracle for ECDLP in Qrisp

Lei Zhang +1
cs.SE 2026-05-01 reviewed

LLM agents reproduce materials findings at 54 percent
Can Coding Agents Reproduce Findings in Computational Materials Science?

Ziyang Huang +17
cs.SE 2026-05-01 reviewed

GeoContra lifts LLM GIS correctness by 26 percent via contracts
GeoContra: From Fluent GIS Code to Verifiable Spatial Analysis with Geography-Grounded Repair

Yinhao Xiao +2
cs.SE 2026-05-01 reviewed

350k code preference pairs train multi-criteria reward models
Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

Indraneil Paul +2
cs.SE 2026-05-01 reviewed

350k code preferences train flexible multilingual reward models
Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

Indraneil Paul +2
cs.LG 2026-05-01 reviewed

Pass-rate rewards fail to beat binary rewards in code RL
Exploring Pass-Rate Reward in Reinforcement Learning for Code Generation

Xin-Ye Li +5
cs.SE 2026-05-01 reviewed

Practitioners identify gaps in end-to-end autonomous driving tests
From Research to Practice: An Interactive Rapid Review of Autonomous Driving System Testing in Industry

Qunying Song +3
cs.SE 2026-05-01 reviewed

ML predicts energy of code blocks from static features
EnCoDe: Energy Estimation of Source Code At Design-Time

Shailender Goyal +2
cs.SE 2026-05-01 reviewed

Dataset shows API recommenders weaken on deep calls
Q-ARE: An Evaluation Dataset for Query Based API Recommendation

Shenglong Wu +2
cs.SE 2026-05-01 reviewed

Dense retrieval beats sparse for issue-commit links
Think Harder and Don't Overlook Your Options: Revisiting Issue-Commit Linking with LLM-Assisted Retrieval

Cole Morgan +3
cs.SE 2026-05-01 reviewed

PPO agent picks prompts for higher test coverage
PPO guided Agentic Pipeline for Adaptive Prompt Selection and Test Case Generation

Gourisetty Venkata Sai Koushik +5
cs.SE 2026-05-01 reviewed

Curriculum training lifts LLM code generation accuracy
Improving LLM Code Generation via Requirement-Aware Curriculum Reinforcement Learning

Shouyu Yin +3
cs.CR 2026-05-01 reviewed

Agent skills remain untrusted until verified by runtime
Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

Alfredo Metere
cs.CR 2026-05-01 reviewed

Agent skills stay untrusted until they pass verification tests
Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

Alfredo Metere
cs.SE 2026-05-01 reviewed

LLMs infill masked bug reports to uncover 27 Rust compiler bugs
ClozeMaster: Fuzzing Rust Compiler by Harnessing LLMs for Infilling Masked Real Programs

Hongyan Gao +5
cs.SE 2026-05-01 reviewed

Fairness monitor agent cuts bias in LLM code by 65 percent
Social Bias in LLM-Generated Code: Benchmark and Mitigation

Fazle Rabbi +3
cs.SE 2026-05-01 reviewed

Agile team embeds log-based fraud alerts via weekly iterations
Integrating Log-Based Security Analytics in Agile Workflows: A Real-World Experience Report

Arpit Thool +1
cs.SE 2026-05-01 reviewed

Code model released openly after risk checks find no new threats
Code World Model Preparedness Report

Daniel Song +23
cs.SE 2026-05-01 reviewed

Code model cleared for open release after risk checks
Code World Model Preparedness Report

Daniel Song +23
cs.CR 2026-04-30 reviewed

Encrypted string operations enable private conformance checking
A Privacy-Preserving Approach to Conformance Checking

Luis Rodríguez-Flores +3
cs.SE 2026-04-30 reviewed

Software leadership is managerial and interpersonal
What Characterizes a Software Leader? Identifying Leadership Practices from Practitioners Social Media

Murilo Coelho +5
cs.SE 2026-04-30 reviewed

Deptex finds true vulnerability reach by combining graphs and language models
DEPTEX: Organization-First, Open Source Dependency Risk Monitoring

Henry Ruckman-Utting +5