archive

Every paper Pith has read.

1797 papers in cs.SE

cs.SE 2026-05-07 reviewed

Developers put ethical rules for AI agents into repo files
Operationalizing Ethics for AI Agents: How Developers Encode Values into Repository Context Files

Christoph Treude +2
cs.SE 2026-05-07 reviewed

Semi-supervised models flag unrelated CI build failures
Is this Build Failure Related to my Patch? An Empirical Study of Unrelated Build Failures in Continuous Integration

Andie Huang +3
cs.SE 2026-05-06 reviewed

Two hours of prep lets AI agents build full-stack platform in parallel
Mise en Place for Agentic Coding: Deliberate Preparation as Context Engineering Methodology

Andrew Zigler
cs.CR 2026-05-06 reviewed

Security detection rules keep oscillating between goals
Evolution of Log-Based Detection Rules in Public Repositories

Minjun Long +1
cs.CR 2026-05-06 reviewed

Detection rules keep adding and removing conditions instead of stabilizing
Evolution of Log-Based Detection Rules in Public Repositories

Minjun Long +1
cs.SE 2026-05-06 reviewed

Claude leads public single-file HTML generation tests
The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking

Diego Cabezas Palacios
cs.CR 2026-05-06 reviewed

Policy gating blocks cross-tenant leaks in shared AI retrieval
Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use

Francisco Javier Arceo +1
cs.DC 2026-05-06 reviewed

Nine-dimension model explains root causes in five of twelve DeFi incidents
Toward a Risk Assessment Framework for Institutional DeFi: A Nine-Dimension Approach

Eva Oberholzer +3
cs.SE 2026-05-06 reviewed

Retrieval scaffolding aligns AI services with production rules
Architectural Constraints Alignment in AI-assisted, Platform-based Service Development

Julius Irion +7
cs.SE 2026-05-06 reviewed

Symbolic conflict essences detect rule interference exactly
Conflict Essences for Transformation Rules with Nested Application Conditions -- Long Version

Alexander Lauer +3
cs.AI 2026-05-06 reviewed

Grid overlay beats semantic prompts for LLM chart reading
Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction

Andrei Lazarev +2
cs.SE 2026-05-06 reviewed

Syntax routing lets small code models exceed large-model accuracy at 58 percent lower cost
SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs

Kishanthan Thangarajah +2
cs.SE 2026-05-06 reviewed

LLM agents classify repo artifacts competitively without context limits
Agentic Repository Mining: A Multi-Task Evaluation

Johannes Härtel
cs.SE 2026-05-06 reviewed

Case study maps SIL rules and memory limits in real car software
Shedding Light onto Safety Integrity Level and Basic Software Constraints in a Real-World Automotive Application: Case Study with Driverator Framework

Tobias Denzinger (CARIAD SE) +2
cs.SE 2026-05-06 reviewed

Developers accept most LLM refactoring suggestions unchanged
Patterns of Developer Adoption of LLM-Generated Code Refactoring Suggestions

David Schön +6
cs.SE 2026-05-06 reviewed

Bug tools need more than accuracy to be adopted
Toward an Understanding of Developer Behaviour while Using Bug Localization Tools

Pablo Diaz Pedreira +2
cs.SE 2026-05-06 reviewed

GenAI speeds up coding but leaves learning unchanged
A meta-analysis of the effect of generative AI on productivity and learning in programming

Sebastian Maier +4
cs.SE 2026-05-06 reviewed

Function chunking lowers RAG code completion performance
How Does Chunking Affect Retrieval-Augmented Code Completion? A Controlled Empirical Study

Xinjian Wu +3
cs.CR 2026-05-06 reviewed

Spec-guided fuzzer finds 24 new bugs in industrial protocols
AFL-ICP: Enhancing Industrial Control Protocol Reliability via Specification-Guided Fuzzing

Jiaying Meng +4
cs.HC 2026-05-06 reviewed

AI pipeline with teacher review upgrades peer feedback quality
AICoFe: Implementation and Deployment of an AI-Based Collaborative Feedback System for Higher Education

Alvaro Becerra +2
cs.HC 2026-05-06 reviewed

AI tool scales rubric feedback for student slides
AISSA: Implementation and Deployment of an AI-based Student Slides Analysis tool for Academic Presentations

Alvaro Becerra +2
cs.LG 2026-05-06 reviewed

Controlled protocols shrink attention-model gains in PKT
Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols

Jaewook Kim +1
cs.AR 2026-05-06 reviewed

LLM framework builds UVM testbenches in 4.5 hours at 95.65% coverage
UVMarvel: an Automated LLM-aided UVM Machine for Subsystem-level RTL Verification

Junhao Ye +9
cs.SE 2026-05-06 reviewed

Training data flaws cause code defects in LLMs via 18 paths
Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

Kaifeng He +9
cs.SE 2026-05-06 reviewed

LLM evolution with runtime targets yields 15x Java speedup
CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement

Ajay Krishna Borra +11
cs.CY 2026-05-06 reviewed

Digital twin trust maps to four integration patterns across domains
Trustworthiness in Digital Twin Systems: Systematic Review and Research Horizons

Chi Fai David Lam +3
cs.MA 2026-05-06 reviewed

AI app builders oversimplify specs and skip secure backends
SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies

Siddhant Saxena +2
cs.AI 2026-05-06 reviewed

Screening cuts agent-repair leaderboard rank shifts by 62%
AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair

Yuelin Hu +4
cs.SE 2026-05-06 reviewed

Fine-tuned reranker improves code search across three tasks
Beyond Retrieval: A Multitask Benchmark and Model for Code Search

Siqiao Xue +6
cs.SE 2026-05-06 reviewed

Fine-tuned reranker lifts code search on all three tasks
Beyond Retrieval: A Multitask Benchmark and Model for Code Search

Siqiao Xue +6
cs.SE 2026-05-06 reviewed

AI coding tools shift responsibility for errors to users
Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap

Christoph Treude
cs.SE 2026-05-06 reviewed

Declarative YAML lets AI agents run any scientific workflow
PARNESS: A Paper Harness for End-to-End Automated Scientific Research with Dynamic Workflows, Full-Text Indexing, and Cross-Run Knowledge Accumulation

Yuchen Wang +1
cs.SE 2026-05-06 reviewed

Governance routines cut enterprise tech modernization effort by 30 percent
EMRGF: A Practitioner Framework for Governance-Driven Enterprise Technology Modernization

Harveen Punihani
cs.AI 2026-05-06 reviewed

Model benchmarks cannot confirm deployed AI alignment
Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone

Varad Vishwarupe +3
cs.SE 2026-05-06 reviewed

Automatic framework fixes failures in LLM reinforcement tuning
Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning

Lingzhe Zhang +8
cs.AI 2026-05-05 reviewed

Context hurts design exploration on some tasks by 46%
When Context Hurts: The Crossover Effect of Knowledge Transfer on Multi-Agent Design Exploration

Saranyan Vigraham
cs.SE 2026-05-05 reviewed

Benchmark and adapted tool generate Java reproduction tests from issues
Reproduction Test Generation for Java SWE Issues

Toufique Ahmed +3
cs.SE 2026-05-05 reviewed

Adapted tool generates reproduction tests for Java issues
Reproduction Test Generation for Java SWE Issues

Toufique Ahmed +3
cs.CR 2026-05-05 reviewed

Token n-grams and code metrics triage C vulnerabilities at PR-AUC 0.64
Lightweight Vulnerability Detection from Code Metrics and Token Features

Chun Yin Chiu
cs.SE 2026-05-05 reviewed

EngThrive pairs speed, ease and quality metrics with wellbeing guardrails
EngThrive: Make It Fast and Easy to Do Great Work

Brian Houck +3
cs.CR 2026-05-05 reviewed

Kumushi steers LLMs to root causes for deeper vulnerability fixes
Root-Cause-Driven Automated Vulnerability Repair

Hulin Wang +15
cs.SE 2026-05-05 reviewed

AI extracts PDF data to audit every transaction
Automated Population-Level Audit Assurance via AI-Based Document Intelligence

Santosh Vasudevan +1
cs.HC 2026-05-05 reviewed

50 testing tools converge on similar output formats
Exploring the Output of Software Testing Tools through a Visual Comparative Analysis

Brandon Lit +2
cs.SE 2026-05-05 reviewed

Agents negotiate stable software module decompositions
A Multi-Agent Consensus Protocol for Stable Software Remodularization

Ahmed F. Ibrahim
cs.CR 2026-05-05 reviewed

Embeddings link source code to decompiled binaries without names
Identifier-Free Code Embedding Models for Scalable Search

Eric Wolos +1
cs.CY 2026-05-05 reviewed

NeurIPS should require reproducible evidence for AI safety claims
NeurIPS Should Require Reproducibility Standards for Frontier AI Safety Claims

Varad Vishwarupe +3
cs.SE 2026-05-05 reviewed

RL agent raises Rust static analysis precision from 26% to 59%
Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning

P Akilesh +3
cs.SE 2026-05-05 reviewed

RL agent cuts false positives in Rust safety checks
Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning

P Akilesh +3
cs.SE 2026-05-05 reviewed

Requirements engineering divides into two isolated pathways
Two Integration Pathways in Human-Centered Requirements Engineering: A Systematic Mapping Study of Structural Gaps

Imen Benzarti +4
cs.SE 2026-05-05 reviewed

Brick-Circuit generator spans quantum states more uniformly at low depth
Randomized and Diverse Input State Generation for Quantum Program Testing

Maryse Ernzer +3