archive

Every paper Pith has read.

1797 papers in cs.SE · page 2

cs.SE 2026-05-20 reviewed

Fortran scientific codes harbor many undefined-behavior-like defects
RSE of a Quantum Transport Code and its Effects

Christoph Conrads +1
cs.SE 2026-05-20 reviewed

LLMs turn technical privacy details into clear reports for non-technical stakeholders
Transforming Privacy Artifacts into Accessible Reports for Non-Technical Stakeholders

Zoe Pfister +6
cs.SE 2026-05-20 reviewed

27% of Dockerfile SATD admissions couple with other files
Beyond the Tip of the Iceberg: Understanding SATD in Dockerfiles through the Lens of Co-evolution

Wei Minn +7
cs.LG 2026-05-20 reviewed

RL fine-tuning lifts code generation pass@1 by 19% on MBPP
Domain-Adaptable Reinforcement Learning for Code Generation with Dense Rewards

Erfan Aghadavoodi Jolfaei +4
cs.CR 2026-05-20 reviewed

Spectral distances flag Trojaned DNN updates after one step
Detecting Trojaned DNNs via Spectral Regression Analysis

Samuele Pasini +2
cs.CL 2026-05-20 reviewed

Small classifier beats LLMs at pulling exact text from papers
ACL-Verbatim: hallucination-free question answering for research

Gábor Recski +4
cs.SE 2026-05-20 reviewed

Refusal rate misranks LLMs on bio safety
RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts

Lukas Weidener +4
cs.AI 2026-05-20 reviewed

Five checkpoints enforce policy in generalist agents
Governance by Construction for Generalist Agents

Segev Shlomov +9
cs.SE 2026-05-20 reviewed

Bioinformatics bug detection rises 30-38% with new full-context dataset
BioDefect: The First Dataset for Defect Detection in Bioinformatics Software

Tianxiang Xu +5
cs.SE 2026-05-20 reviewed

LLMs endorse 32% of their own behavior-changing code rewrites
Articulate but Wrong: Self-Review Failures in LLM-Based Code Modernization

Gokul Chandra Purnachandra Reddy +2
cs.SE 2026-05-20 reviewed

Contextual data makes code smell detection more actionable
An Event-Driven Tool for Context-Aware Code Smell Detection Using SmellDSL

Matheus dos Santos Viegas +3
cs.MA 2026-05-19 reviewed

State management beats workspace isolation in multi-agent tasks
Multi-agent Collaboration with State Management

Mengyang Liu +4
cs.AI 2026-05-19 reviewed

LLM agent accuracy drops to 0.54-0.62 without labels
AgentAtlas: Beyond Outcome Leaderboards for LLM Agents

Parsa Mazaheri +1
cs.SE 2026-05-19 reviewed

Privacy views raise coaching adherence from 0.48 to 0.74
Privacy-by-Design Adaptive Group Assignment for Digital Lifestyle Coaching at Scale

Nariman Mani +1
cs.PL 2026-05-19 reviewed

Frama-C plugin checks non-functional rules for automotive C
Contract Based Verification of Non-functional Requirements for Embedded Automotive C Code

Jesper Amilon +3
cs.SE 2026-05-19 reviewed

LLM tests catch all 16 anomalies where manual checks find only 7
A Multi-Layer Testing Framework for Automated Data Quality Assurance in Cloud-Native ELT Pipelines

Ismail Gargouri +1
cs.SE 2026-05-19 reviewed

Code gen picks winner by clustering behaviors on auto-generated inputs
Code Generation by Differential Test Time Scaling

Yifeng He +4
cs.SE 2026-05-19 reviewed

Agentic AI coding improves with structured verification loops
Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development

Christopher Koch
cs.SE 2026-05-19 reviewed

Methodology turns Bodies of Knowledge into assessable competencies
A Semantic-Web Oriented Competency Model for Engineering Programs

Nicolas Evain (LIUPPA) +2
cs.AI 2026-05-19 reviewed

Four-part SDB contract organizes LLM agent runtimes
A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents

Vasundra Srinivasan
cs.SE 2026-05-19 reviewed

Taxonomy organizes 248 studies on combined program analyses
Combined Program Analysis Techniques: A Systematic Mapping Study

Pietro Braione +5
cs.SE 2026-05-19 reviewed

Staged analysis improves LLM recovery of ROS 2 architectures
Towards LLM-Assisted Architecture Recovery for Real-World ROS 2 Systems: An Agent-Based Multi-Level Approach to Hierarchical Structural Architecture Reconstruction

Dominique Briechle +7
cs.SE 2026-05-19 reviewed

Cleaner code reduces agent token use by 7-8% with no change in success rate
Does Code Cleanliness Affect Coding Agents? A Controlled Minimal-Pair Study

Priyansh Trivedi +1
cs.SE 2026-05-19 reviewed

Agent skills from expert methods beat docs for PostgreSQL tuning
A Case for Agentic Tuning: From Documentation to Action in PostgreSQL

Hongyu Lin +6
cs.SE 2026-05-19 reviewed

Health data lakehouse shown usable for mixed-skill teams
OpenHealth Lake: Designing and testing a data lakehouse platform for health applications

Danilo Silva +5
cs.SE 2026-05-19 reviewed

LLMs simplify object-oriented designs but omit key abstractions
Can LLMs Produce Better Object-Oriented Designs than Human-Involved Development?

Zushuai Zhang +2
cs.AI 2026-05-19 reviewed

LLM agents optimize hardware-aware code via prior knowledge
Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization

Dmitry Redko +9
cs.AI 2026-05-19 reviewed

Hard-coded verifiers beat LLM judges at matching human evaluations
OpenComputer: Verifiable Software Worlds for Computer-Use Agents

Jinbiao Wei +6
quant-ph 2026-05-19 reviewed

Quantum tests can live inside .qasm circuit files
QUTest: A Native Testing Framework for Quantum Programs

José Campos
cs.CR 2026-05-19 reviewed

Agent fixes 89% of flaws in source-free industrial software
SCARA: A Semantics-Constrained Autonomous Remediation Agent for Opaque Industrial Software Vulnerabilities

Bowei Ning +6
cs.SE 2026-05-19 reviewed

Criterion-level pairwise judgments lift code judge accuracy to 66.3%
CriterAlign: Criterion-Centric Rationale Alignment for Code Preference Judging

Zhenyu Li +3
cs.SE 2026-05-19 reviewed

Study catalogs 301 real tile-program bugs from GitHub
Characterizing Real-World Bugs in Tile Programs for Automated Bug Detection

Ravishka Rathnasuriya +6
cs.HC 2026-05-19 reviewed

Single-file AI tools push accessibility boundaries outward
The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems

Rizwan Jahangir +1
cs.CL 2026-05-19 reviewed

One LLM system optimizes text to beat specialists on six tasks
optimize_anything: A Universal API for Optimizing any Text Parameter

Lakshya A Agrawal +13
cs.AI 2026-05-19 reviewed

Governance recipe lifts LLM skill-library performance from 0.26 to 0.58
Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries

Xing Zhang +6
cs.SE 2026-05-19 reviewed

MILP solves fairness repair for neural networks with formal guarantees
Provable Fairness Repair for Deep Neural Networks

Jianan Ma +3
cs.SE 2026-05-19 reviewed

Dependency repair shrinks programs 52% more than syntax-only reducers
DRReduce: Enhancing Syntax-Guided Program Reduction with Dependency Reconstruction

Qiong Feng +4
cs.SE 2026-05-19 reviewed

Code models now decide when to answer and when to defer
When to Answer and When to Defer: A Decision Framework for Reliable Code Predictions

Ravishka Rathnasuriya +1
cs.SE 2026-05-19 reviewed

Input adaptation cuts code model mispredictions without retraining
On-the-Fly Input Adaptation for Reliable Code Intelligence

Ravishka Rathnasuriya +1
cs.AI 2026-05-19 reviewed

MOCHA improves agent skill correctness on every task
MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization

Md Mehrab Tanjim +8
cs.SE 2026-05-19 reviewed

Multi-agent system hardens test updates with mutations
MuMuTestUp: Mutation-based Multi-Agent Test Case Update

Dawei Tian +9
cs.SE 2026-05-19 reviewed

Self-healing web apps detect faults at 90.7% and recover 56% faster
When Web Apps Heal Themselves: A MAPE-K Based Approach to Fault Tolerance and Adaptive Recovery

Sales Aribe Jr +1
cs.SE 2026-05-18 reviewed

LLM agents turn switch manuals into graphs at 97-99% accuracy
Supporting System Testing with a Multi-Agent LLM-based Framework for Knowledge Graph Extraction: A Case Study with Ethernet Switch Systems

Rongqi Pan +5
cs.SE 2026-05-18 reviewed

AI restructures open-source onboarding docs to cut cognitive overload
Restructure This: Using AI to Restructure Onboarding Documents to Reduce Cognitive Overload

Zixuan Feng +4
cs.SE 2026-05-18 reviewed

RL agent refines prompts to boost LLM code pass rates
Prompt Optimization for LLM Code Generation via Reinforcement Learning

Ali Mohammadi Esfahani +2
cs.SE 2026-05-18 reviewed

Multi-agent pipeline extracts traceable specs from legacy code
Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents

Sanderson Oliveira de Macedo +1
cs.SE 2026-05-18 reviewed

Stripping consent declarations raises overeager rate in coding agents
Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

Yubin Qu +6
cs.IR 2026-05-18 reviewed

q-log odds lift BM25 NDCG@10 by 89% on code search
Improving BM25 Code Retrieval Under Fixed Generic Tokenization: Adaptive q-Log Odds as a Drop-In BM25 Fix

Santosh Kumar Radha +1
cs.SE 2026-05-18 reviewed

One engineer with AI agents finishes four-person job in half the time
One Developer Is All You Need: A Case Study of an AI-Augmented One-Person Squad in a Brownfield Enterprise

Marcelo Vilas Boas +4