archive

Every paper Pith has read.

1797 papers in cs.SE · page 18

cs.SE 2026-04-20 reviewed

Two-agent system repairs LLM agent bugs more effectively
SelfHeal: Empirical Fix Pattern Analysis and Bug Repair in LLM Agents

Niful Islam +2
cs.SE 2026-04-19 reviewed

Three patterns mark how teams respond to GitHub Actions failures
Beyond the YAML File: Understanding Real-World GitHub Actions Workflow Adoption

Ali Khatami +2
cs.AI 2026-04-19 reviewed

Hugging Face data drives dynamic AI model card updates
Toward Reusability of AI Models Using Dynamic Updates of AI Documentation

Peter Bajcsy +1
cs.SE 2026-04-19 reviewed

AI code shows 1.8 times more quiet-failure risks than human code
AIRA: AI-Induced Risk Audit: A Structured Inspection Framework for AI-Generated Code

William M. Parris
cs.SE 2026-04-19 reviewed

Logging tools need multilingual checks to be reliable
Single-Language Evidence Is Insufficient for Automated Logging: A Multilingual Benchmark and Empirical Study with LLMs

Renyi Zhong +5
cs.SE 2026-04-19 reviewed

QRisk cuts quantum noise 45% by avoiding recurring error patterns
Isolating Recurring Execution-Dependent Abnormal Patterns on NISQ Quantum Devices

Zhenyu Qi +4
cs.SE 2026-04-19 reviewed

Analysis extracts unit tests from integration tests
Augmenting unit test suites from integration tests

Katerina Paltoglou +1
cs.SE 2026-04-19 reviewed

Technology research software forms its own overlooked category
Technology Research Software: An Often Overlooked Category of Research Software

Wilhelm Hasselbring +2
cs.SE 2026-04-19 reviewed

Reverse-engineered specs yield 94% APR success on Defects4J
Project Prometheus: Bridging the Intent Gap in Agentic Program Repair via Reverse-Engineered Executable Specifications

Yongchao Wang +1
cs.CY 2026-04-19 reviewed

Adaptive AI personas teach coding tool use
Agentic Education: Using Claude Code to Teach Claude Code

Zain Naboulsi
cs.SE 2026-04-19 reviewed

Modeling projects as networks provides more consistent estimates of resilience to key…
Project resilience as network robustness

Sebastiano A. Piccolo +1
cs.SE 2026-04-19 reviewed

ML automation targets RISC-V certification costs for cars
RISC-V Functional Safety for Autonomous Automotive Systems: An Analytical Framework and Research Roadmap for ML-Assisted Certification

Nick Andreasyan +4
cs.SE 2026-04-19 reviewed

Models pass tests by regenerating code
Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?

Wang Bill Zhu +7
cs.SE 2026-04-19 reviewed

LLMs pass 76% of tests but edit with under 45% precision
Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?

Wang Bill Zhu +7
cs.SE 2026-04-19 reviewed

LLMs detect design patterns with promising accuracy
A Pilot Study on Detecting Software Design Patterns with Large Language Models: An Empirical Evaluation

Oishik Chowdhury +2
cs.SE 2026-04-19 reviewed

KnowPilot improves domain text generation by merging priors
KnowPilot: Your Knowledge-Driven Copilot for Domain Tasks

Zekun Xi +7
cs.SE 2026-04-19 reviewed

T2MRec matches tasks to MCP servers via semantic and structural cues
From Language to Action: Enhancing LLM Task Efficiency with Task-Aware MCP Server Recommendation

Shiyu He +5
cs.SE 2026-04-19 reviewed

Kimi-K2.5 at 3 bits tops models on React Native app task
React-ing to Grace Hopper 200: Five Open-Weights Coding Models, One React Native App, One GH200, One Weekend

Alex Potanin
cs.SE 2026-04-19 reviewed

Personas in requirements engineering align clinical AI trainers with real practice
Persona-Based Requirements Engineering for Explainable Multi-Agent Educational Systems: A Scenario Simulator for Clinical Reasoning Training

Weibing Zheng +5
cs.SE 2026-04-19 reviewed

Adaptive router lifts LLM code repair accuracy by 32 percent
SynthFix: Adaptive Neuro-Symbolic Code Vulnerability Repair

Yifan Zhang +4
cs.SE 2026-04-19 reviewed

MoE routing overlaps 11x random even for different code tokens
Layer-wise MoE Routing Locality under Shared-Prefix Code Generation: Token-Identity Decomposition and Compile-Equivalent Fork Redundancy

Shun-ichiro Hayashi +3
cs.SE 2026-04-18 reviewed

Agentic AI governance misses links from rules to provable actions
Beyond Task Success: An Evidence-Synthesis Framework for Evaluating, Governing, and Orchestrating Agentic AI

Christopher Koch +1
cs.SE 2026-04-18 reviewed

Real token tracking matches AI dev costs within 2%
AI Observability for Developer Productivity Tools: Bridging Cost Awareness and Code Quality

Happy Bhati +1
cs.SE 2026-04-18 reviewed

Local command center unifies dev tools and raises AI readiness
Workstream: A Local-First Developer Command Center for the AI-Augmented Engineering Workflow

Happy Bhati
cs.SE 2026-04-18 reviewed

Transfer from C++ improves Ruby and Rust repair Pass@1 by 17 points
HELO-APR: Enhancing Low-Resource Program Repair through Cross-Lingual Knowledge Transfer

Zhipeng Wang +7
cs.SE 2026-04-18 reviewed

Memory cascade resolves 86% of Python dependency issues
MEMRES: A Memory-Augmented Resolver with Confidence Cascade for Agentic Python Dependency Resolution

Dao Sy Duy Minh +5
cs.SE 2026-04-18 reviewed

Co-versioning run-time behavior with code reveals hidden changes
Treating Run-time Execution History as a First-Class Citizen: Co-Versioning Run-time Behavior alongside Code

Marcus Kessel
cs.SE 2026-04-18 reviewed

Gleaner sampler raises RCA accuracy above full dataset at 1 percent rate
Gleaner: A Semantically-Rich and Efficient Online Sampler for Microservice Diagnostics

Yifan Yang +4
cs.SE 2026-04-18 reviewed

Prompt tweaks flip LLM judge verdicts on identical code
Bias in the Loop: Auditing LLM-as-a-Judge for Software Engineering

Zixiao Zhao +2
cs.SE 2026-04-18 reviewed

App reviews flag persistent ethical barriers in mobile apps
Exploring Ethical Concerns of Mobile Applications from App Reviews: A Literature Survey

Aakash Sorathiya +1
cs.SE 2026-04-18 reviewed

Prompt method halves AI bias sensitivity in software tasks
Mitigating Prompt-Induced Cognitive Biases in General-Purpose AI for Software Engineering

Francesco Sovrano +2
cs.SE 2026-04-17 reviewed

AI slop creates a tragedy of the commons in software
AI Slop and the Software Commons

Sebastian Baltes +2
cs.AI 2026-04-17 reviewed

This paper empirically tests 22 agentic AI frameworks on three reasoning benchmarks and…
Agentic Frameworks for Reasoning Tasks: An Empirical Study

Zeeshan Rasheed +5
cs.HC 2026-04-17 reviewed

Conversational agents help high school students with CSP
Investigating Conversational Agents to Support Secondary School Students Learning CSP

Matthew Frazier +2
cs.SE 2026-04-17 reviewed

Survey of 280 researchers diagnoses barriers to cumulative knowledge in software
From Papers to Progress: Rethinking Knowledge Accumulation in Software Engineering

Jason Cusati +1
cs.SE 2026-04-17 reviewed

Fixing requirement mismatches raises LLM code success
Bridging the Gap between User Intent and LLM: A Requirement Alignment Approach for Code Generation

Jia Li +9
cs.SE 2026-04-17 reviewed

Multi-modal verifier raises certified synthesis success rate
Certified Program Synthesis with a Multi-Modal Verifier

Yueyang Feng +7
cs.SE 2026-04-17 reviewed

Contrastive training lifts LLM code detection accuracy to 78 percent
LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning

Mahir Labib Dihan +1
cs.SE 2026-04-17 reviewed

The paper identifies a 'Keyword Shortcut' bias in existing code localization benchmarks…
Neurosymbolic Repo-level Code Localization

Xiufeng Xu +3
cs.AR 2026-04-17 reviewed

MLIR unifies equivalence checking from algorithms to netlists
EquivFusion: Unifying Hardware Equivalence Checking from Algorithms to Netlists via MLIR

Jiaying Zhu +6
cs.SE 2026-04-17 reviewed

The paper introduces flowR, a VS Code and Positron extension that builds dataflow graphs…
Supporting the Comprehension of Data Analysis Scripts

Florian Sihler +4
cs.SE 2026-04-17 reviewed

Small programs can have up to 76 configuration options
Small Yet Configurable: Unveiling Null Variability in Software

Xhevahire Tërnava +3
cs.SE 2026-04-17 reviewed

Removals lag additions so toggle counts keep rising in large systems
Feature Toggle Dynamics in Large-Scale Systems: Prevalence, Growth, Lifespan, and Benchmarking

Xhevahire Tërnava
cs.SE 2026-04-17 reviewed

QMutBench gives 700k quantum mutants to benchmark tests
QMutBench: A Dataset of Quantum Circuit Mutants

Eñaut Mendiluze Usandizaga +3
cs.SE 2026-04-17 reviewed

Tool pairs LLMs with symbolic checks to create Python contracts
SpecPylot: Python Specification Generation using Large Language Models

Ragib Shahariar Ayon +1
cs.SE 2026-04-17 reviewed

LLM evolves coding skill by generating its own failure tests
ACE: Self-Evolving LLM Coding Framework via Adversarial Unit Test Generation and Preference Optimization

Yixu Huang +2
cs.SE 2026-04-17 reviewed

One LLM improves code by making its own adversarial tests
ACE: Self-Evolving LLM Coding Framework via Adversarial Unit Test Generation and Preference Optimization

Yixu Huang +2
cs.SE 2026-04-17 reviewed

Model unites text, code and images in one retrieval system
CodeMMR: Bridging Natural Language, Code, and Image for Unified Retrieval

Jiahui Geng +3
quant-ph 2026-04-17 reviewed

The paper models quantum error budget allocation as a potential game among logical…
A Game Theoretic Approach for Optimizing Quantum Error Budget Distribution

Asif Akhtab Ronggon +1
cs.SE 2026-04-16 reviewed

Symbolic guardrails enforce 74% of agent safety policies
Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

Yining Hong +4