archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 14

cs.SE 2026-04-28 reviewed

Brief role-model stories in lectures support belonging in software courses
Supporting Belonging in Software Engineering Through Role Models Exposure

Ronnie de Souza Santos
cs.AI 2026-04-27 reviewed

Intent compilation turns partial goals into binding AI artifacts
Toward a Science of Intent: Closure Gaps and Delegation Envelopes for Open-World AI Agents

Maximiliano Armesto +1
cs.SE 2026-04-27 reviewed

Product context retrieval lifts AI coding compliance from 46% to 95%
Context-Augmented Code Generation: How Product Context Improves AI Coding Agent Decision Compliance by 49%

Drew Dillon +1
cs.HC 2026-04-27 reviewed

Speculative societies prompt OSS practitioners to rethink designer roles
What If We Work Together? Fostering Reflections on Designer Inclusion in Open Source Software Through Speculative Design

Rozhan Hozhabri Nezhad +2
cs.CL 2026-04-27 reviewed

Evidence rules stop research agents at the right time
Don't Stop Early: Scalable Enterprise Deep Research with Controlled Information Flow and Evidence-Aware Termination

Prafulla Kumar Choubey +7
cs.SE 2026-04-27 reviewed

LLMs biased to Python limit multilingual code tasks
Large Language Models for Multilingual Code Intelligence: A Survey

Chao Jiang +8
cs.CL 2026-04-27 reviewed

LLM auditors find fatal errors in agent benchmarks
BenchGuard: Who Guards the Benchmarks? Automated Auditing of LLM Agent Benchmarks

Xinming Tu +5
cs.CY 2026-04-27 reviewed

Fine-tuning shifts AI safety scores in unpredictable ways
Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains

Emaan Bilal Khan +3
cs.SE 2026-04-27 reviewed

The paper introduces FGDM, a four-agent framework that converts code into flow graphs and…
FGDM: Reasoning Aware Multi-Agentic Framework for Software Bug Detection using Chain of Thought and Tree of Thought Prompting

Srita Padmanabhuni +4
cs.SE 2026-04-27 reviewed

Under-specified prompts raise code correctness on rich tasks
When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation

Amal Akli +3
cs.SE 2026-04-27 reviewed

Small finetuned model detects bad LLM code prompts at F1 0.80
Defective Task Descriptions in LLM-Based Code Generation: Detection and Analysis

Amal Akli +3
cs.SE 2026-04-27 reviewed

Fine-tuned LLMs hit 1.00 structural fidelity on multi-file DSL edits
Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study

Sivajeet Chand +3
cs.SE 2026-04-27 reviewed

SLMs on phones work only when given the smallest tasks
Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application

William Oliveira
cs.SE 2026-04-27 reviewed

Mobile AI works reliably only when models do the least
Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application

William Oliveira
cs.SE 2026-04-27 reviewed

LLM tools break standard evaluation rules in software engineering
Evaluation of LLM-Based Software Engineering Tools: Practices, Challenges, and Future Directions

Utku Boran Torun +3
cs.SE 2026-04-27 reviewed

Markov chains predict LLM agent success times from traces
Measuring the Unmeasurable: Markov Chain Reliability for LLM Agents

Phat T. Tran-Truong +1
cs.SE 2026-04-27 reviewed

Pipeline migrates monoliths to serverless with 100% deployment success
Mono2Sls: Automated Monolith-to-Serverless Migration via Multi-Stage Pipeline with Static Analysis

Xingyan Chen +4
cs.SE 2026-04-27 reviewed

Review of 80 studies charts transformer use for finding code vulnerabilities
A Systematic Literature Review for Transformer-Based Software Vulnerability Detection

Fiza Naseer +4
cs.SE 2026-04-27 reviewed

Automated checks match developer labels only 44-62% for code review bots
Understanding the Limits of Automated Evaluation for Code Review Bots in Practice

Veli Karakaya +3
cs.SE 2026-04-27 reviewed

Survey maps student AI use across capstone projects
How Do Software Engineering Students Use Generative AI in Real-World Capstone Projects? An Empirical Baseline Study

Michael Mircea +3
cs.SE 2026-04-27 reviewed

Structured knowledge turns LLM training into debuggable code
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

Chenkai Pan +8
cs.HC 2026-04-27 reviewed

Tool generates personas to boost OSS developer empathy
Putting a Face to the Issue: Fostering User Empathy of Open Source Software Developers With PersonaFlow

Boniface Bahati Tadjuidje +2
cs.SE 2026-04-27 reviewed

More reviewer bot comments slow agentic PR resolution
On the Footprints of Reviewer Bots Feedback on Agentic Pull Requests in OSS GitHub Repositories

Syeda Kaneez Fatima +5
cs.SE 2026-04-27 reviewed

Models reach only 74% on code questions linking definitions to calls
SWE-QA: A Dataset and Benchmark for Complex Code Understanding

Laïla Elkoussy +3
cs.CR 2026-04-27 reviewed

Multi-agent SZZ raises F1 scores for vulnerability commit detection by up to 65%
MAS-SZZ: Multi-Agentic SZZ Algorithm for Vulnerability-Inducing Commit Identification

Sicong Cao +6
cs.SE 2026-04-27 reviewed

Humans drive creativity in design even when using LLMs
Exploring Creativity in Human-Human-LLM Collaborative Software Design

Victoria Jackson +3
cs.LG 2026-04-27 reviewed

One plugin interface unifies controls across diffusion models
Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion

Zhongjie Duan +2
cs.SE 2026-04-27 reviewed

Evolving memory boosts private library code generation by 16%
MEMCoder: Multi-dimensional Evolving Memory for Private-Library-Oriented Code Generation

Mofei Li +3
cs.SE 2026-04-27 reviewed

Dynamic agents hit 95% success generating hardware reference models
RefEvo: Agentic Design with Co-Evolutionary Verification for Agile Reference Model Generation

Yifan Zhang +3
cs.SE 2026-04-27 reviewed

Basic agent with ADI fixes 63.8% of SWE-bench tasks
Empowering Autonomous Debugging Agents with Efficient Dynamic Analysis

Jiahong Xiang +4
cs.SE 2026-04-27 reviewed

Software framework lets AI close the business experimentation loop
Closing the Loop: A Software Framework for AI to Support Business Decision Making

Jeffrey Wong +1
cs.CR 2026-04-27 reviewed

Go projects contain 7,473 crypto API misuses with uneven detector coverage
Evaluating Cryptographic API Misuse Detectors for Go

Vivi Andersson +1
cs.SE 2026-04-27 reviewed

Developers link to full migration guides in 83% of pull requests
How Do Developers Use Migration Guides? A Case Study of Log4j

Takahiro Monno +4
cs.AI 2026-04-27 reviewed

Benchmark plus sentiment predicts AI agent adoption
AgentPulse: A Continuous Multi-Signal Framework for Evaluating AI Agents in Deployment

Yuxuan Gao +2
cs.SE 2026-04-27 reviewed

Linking bug reports to fixes lifts vulnerability detection to 0.941 F1
Vulnerability Identification by Harnessing Inter-connected Multi-Source Information

Liyou Chen +5
cs.SE 2026-04-27 reviewed

Multi-agent constraints make decompiled binaries executable in 84-97% of cases
Constraint-Guided Multi-Agent Decompilation for Executable Binary Recovery

Yifan Zhang +4
cs.PF 2026-04-26 reviewed

Optimas automates GPU code optimization with 100% correctness
Optimas: An Intelligent Analytics-Informed Generative AI Framework for Performance Optimization

Mohammad Zaeed +2
cs.CL 2026-04-26 reviewed

LLM system automates 45% of support sessions from copilot corrections
Learning Selective LLM Autonomy from Copilot Feedback in Enterprise Customer Support Workflows

Nikita Borovkov +6
cs.SE 2026-04-26 reviewed

6-33% of code review comments in scientific software are not useful
Characterizing the Usefulness of Code Review Comments in Scientific Software for Software Quality and Scientific Rigor

Sharif Ahmed +1
cs.SE 2026-04-26 reviewed

Five-layer AI agent matches top coding tools on benchmarks
KISS Sorcar: A Stupidly-Simple General-Purpose and Software Engineering AI Assistant

Koushik Sen
cs.SE 2026-04-26 reviewed

Fine-tuned LLMs answer code queries with focused UML diagrams
Query2Diagram: Answering Developer Queries with UML Diagrams

Oleg Baryshnikov +7
cs.CV 2026-04-26 reviewed

Frontier agents succeed in only 20% of multi-day coworker tasks
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

Fanqing Meng +48
cs.SE 2026-04-26 reviewed

LLMs classify code review comments using comment and diff
Automated Classification of Human Code Review Comments with Large Language Models

Semih Çağlar +2
cs.SE 2026-04-26 reviewed

DAG modeling doubles agent failure detection over end-to-end checks
AgentEval: DAG-Structured Step-Level Evaluation for Agentic Workflows with Error Propagation Tracking

Dongxin Guo +2
cs.SE 2026-04-26 reviewed

Grammar loop aligns CPS safety rules with simulations
Grammar-Constrained Refinement of Safety Operational Rules Using Language in the Loop: What Could Go Wrong

Khouloud Gaaloul +3
cs.SE 2026-04-26 reviewed

Requirements guide tests to detect 22-25 more business logic bugs
Uncovering Business Logic Bugs via Semantics-Driven Unit Test Generation

Chen Yang +1
cs.SE 2026-04-26 reviewed

LLM uncertainty propagates across workflows and people
Uncertainty Propagation in LLM-Based Systems

Boming Xia +5
cs.SE 2026-04-25 reviewed

Agents link browser symptoms to backend causes at 19.7% accuracy
CUJBench: Benchmarking LLM-Agent on Cross-Modal Failure Diagnosis from Browser to Backend

Haoming Meng
cs.IR 2026-04-25 reviewed

Prompt chaining lifts LLM accuracy on scientific text classification
Automating Categorization of Scientific Texts with In-Context Learning and Prompt-Chaining in Large Language Models

Gautam Kishore Shahi +1