archive

Every paper Pith has read.

1797 papers in cs.SE · page 3

cs.SE 2026-05-18 reviewed

Framework choice reverses meaning of agent behavior signals
Same Signal, Different Semantics: A Cross-Framework Behavioral Analysis of Software Engineering Agents

Wei Ma +5
cs.SE 2026-05-18 reviewed

CommitDistill hits 0.75 retrieval rate from git history at 256-char budget
CommitDistill: A Lightweight Knowledge-Centric Memory Layer for Software Repositories

Divya Chukkapalli +4
cs.SE 2026-05-18 reviewed

Debating LLMs catch more code vulnerabilities
Three Heads Are Better Than One: A Multi-perspective Reasoning Framework for Enhanced Vulnerability Detection

Xin Peng +7
cs.SE 2026-05-18 reviewed

Multi-model feedback doubles AI solves on contest problems
A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback

Anika Tabassum +4
cs.SE 2026-05-18 reviewed

ProcCtrlBench detects process defects in LLM coding agents missed by outcome scores
ProcCtrlBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents

Jiawei He +6
cs.SE 2026-05-18 reviewed

Process benchmark catches mid-task defects in LLM coding agents
ProcCtrlBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents

Jiawei He +6
cs.CL 2026-05-18 reviewed

Tool localizes node errors in multi-agent LLM workflows
PROTEA: Offline Evaluation and Iterative Refinement for Multi-Agent LLM Workflows

Kazuki Kawamura +2
cs.LG 2026-05-18 reviewed

Two-level router cuts log QA latency 55%
LogRouter: Adaptive Two-Level LLM Routing for Log Question Answering in Big Data Systems

Mert Coskuner +2
cs.SE 2026-05-18 reviewed

Verify gate turns agent completion into inspectable admission control
Verify-Gated Completion as Admission Control in a Governed Multi-Agent Runtime: A Bounded Architecture Case Study

Hai-Duong Nguyen +1
cs.SE 2026-05-18 reviewed

Verify gate renders multi-agent completions inspectable and fail-closed
Verify-Gated Completion as Admission Control in a Governed Multi-Agent Runtime: A Bounded Architecture Case Study

Hai-Duong Nguyen +1
cs.SE 2026-05-18 reviewed

Agentic RAG reaches 78% top-1 file bug localization
BLAgent: Agentic RAG for File-Level Bug Localization

Md Afif Al Mamun +1
cs.SE 2026-05-18 reviewed

Call-site context lifts code model pass rates
Contextualized Code Pretraining for Code Generation

Chen Liu +5
cs.SE 2026-05-18 reviewed

Two-stage LLM workflow verifies code against natural language rules
LLM-Based Static Verification of Code Against Natural-Language Requirements: An Industrial Experience Report

Zhi Quan Zhou +2
cs.LO 2026-05-18 reviewed

Retrieval system compresses Lean proofs over 70 percent
Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

Jialin Lu +6
cs.AI 2026-05-17 reviewed

AI feedback helps Scrum Masters spot their own negative emotions live
EGI: A Multimodal Emotional AI Framework for Enhancing Scrum Master Real-time Self-Awareness

Jingni Huang +1
cs.SE 2026-05-17 reviewed

Framework keeps AI-assisted scientific code traceable under NQA-1
Bridging the Gap on AI-Assisted Scientific Software Development Through Transparency and Traceability

Chaitanya Bhave +5
cs.LG 2026-05-17 reviewed

Guided checks at code boundaries boost translation pass rates
Verifier-Guided Code Translation via Meta-Step Decoding

Tianyang Zhou +4
cs.SE 2026-05-17 reviewed

CFS and GA tuning lift fault prediction accuracy to 88.4%
A Feature-Driven Framework for Software Fault Prediction

Ahmad Nauman Ghazi +5
cs.SE 2026-05-17 reviewed

LLMs subclassify invalid bug root causes and generate fixes
Automated Root-Cause Subclassification and No-Code Fix Generation for Invalid Bug Reports

Mahmut Furkan Gon +3
cs.SE 2026-05-17 reviewed

Inverted API exploration yields verified tool-call data
Firefly: Illuminating Large-Scale Verified Tool-Call Data Generation from Real APIs

Yuxuan Lu +14
cs.SE 2026-05-17 reviewed

Five-stage AI workflow could ease the code review bottleneck
Rethinking Code Review in the Age of AI: A Vision for Agentic Code Review

Hüseyin Özgür Kamalı +3
cs.SE 2026-05-17 reviewed

Multi-agent setup with graphs keeps business rules in legacy modernization
AgentModernize: Preserving Business Logic in Legacy Modernization with Multi-Agent LLMs and Behavioral Specification Graphs

Sheikh Nazib Ahmed +1
cs.SE 2026-05-17 reviewed

Agents fail 95% of SaaS tasks before business logic
SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering

Qingnan Ren +13
cs.SE 2026-05-17 reviewed

LLM agent builds formal models by repairing verification errors
Event-B Agent: Towards LLM Agent for Formal Model Synthesis and Repair

Hongshu Wang +5
cs.SE 2026-05-17 reviewed

ContraFix fixes 84% of C/C++ vulnerabilities at low cost
ContraFix: Agentic Vulnerability Repair via Differential Runtime Evidence and Skill Reuse

Simiao Liu +4
cs.SE 2026-05-17 reviewed

Memory layers raise repo vulnerability repair to 58%
MemRepair: Hierarchical Memory for Agentic Repository-Level Vulnerability Repair

Simiao Liu +5
cs.SE 2026-05-17 reviewed

Diagnostic probes recover 45-62% of mislabeled GUI failures
DiagEval: Trajectory-Conditioned Diagnosis for Reliable Software Evaluation with GUI Agents

Sirui Hong +5
cs.SE 2026-05-17 reviewed

DiagEval recovers 45-62% of misattributed GUI failures
DiagEval: Trajectory-Conditioned Diagnosis for Reliable Software Evaluation with GUI Agents

Sirui Hong +5
cs.SE 2026-05-17 reviewed

Models hit only 6 Mythos bug targets out of 54 attempts with files supplied
Benchmarking Mythos-Linked Bug Rediscovery

Isaac David +1
cs.SE 2026-05-17 reviewed

PLC-BinX predicts PLC binary toolchains with 100 percent accuracy
One Step Further: Understanding PLC Binaries Through Cross-Platform Reverse Engineering and Function-Level Semantic Analysis

Ang Jia +5
cs.SE 2026-05-17 reviewed

PLC-BinX predicts toolchain from binaries with 100% accuracy
One Step Further: Understanding PLC Binaries Through Cross-Platform Reverse Engineering and Function-Level Semantic Analysis

Ang Jia +5
cs.SE 2026-05-17 reviewed

Ontology organizes foundations of software languages
Towards an Ontology for the Foundations of Software Languages

Ralf Lämmel
cs.SE 2026-05-17 reviewed

Block-level slicing triples LLM bug finds in 19K-line processor
Debug Like a Human: Scaling LLM-based Fault Localization to Processor Design via Block-Level Instruction-Oriented Slicing

Zizhen Liu +8
cs.SE 2026-05-17 reviewed

No LLM clears 80 percent on observation contract compliance
ContractBench: Can LLM Agents Preserve Observation Contracts?

Jicheng Wang +5
cs.SE 2026-05-17 reviewed

Context graphs guide LLMs to resolve code merge conflicts better
Rover: Context-aware Conflict Resolution with LLM

Qingyu Zhang +4
cs.SE 2026-05-17 reviewed

Automated TDD lifts AI web app success by 34-48 points
From Runnable to Shippable: Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements

Yuxuan Wan +5
cs.SE 2026-05-16 reviewed

Static checks boost diffusion code RL performance
Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code Generation

Shuyin Ouyang +4
cs.PL 2026-05-16 reviewed

Region allocators keep locality edge on modern hardware
Reconsidering "Reconsidering Custom Memory Allocation"

Nicolas van Kempen +1
cs.CR 2026-05-16 reviewed

LLM package hallucinations shrink to 4.6-6.1% but 127 names stay common
The Range Shrinks, the Threat Remains: Re-evaluating LLM Package Hallucinations on the 2026 Frontier-Model Cohort

Aleksandr Churilov (Independent Researcher)
cs.SE 2026-05-16 reviewed

LLMs skip hallucination-prone code tasks via execution checks
Task Abstention for Large Language Models in Code Generation

Yanke Zhou +6
cs.SE 2026-05-16 reviewed

Low-code DevOps speeds tasks but adds security and governance risks
Low-Code Paradox in DevOps: Security and Governance Insights from Practitioners

Muhammad Azeem Akbar +2
cs.CR 2026-05-16 reviewed

FIDO times firmware inputs at availability checks to lift coverage
Stop Starving or Stuffing Me: Boosting Firmware Fuzzing Efficiency with On-demand Input Delivery

Shandian Shen +5
cs.SE 2026-05-15 reviewed

78% of open source AI policies allow GenAI contributions
AI Policy, Disclosure, and Human in the Loop: How Are Contribution Guidelines Adapting to GenAI?

Andre Hora +1
cs.SE 2026-05-15 reviewed

GitHub projects standardize on README
What's Inside a GitHub Repository? An Empirical Study on the Contents of 10K Projects

Andre Hora +2
cs.SE 2026-05-15 reviewed

Core compiler reuse via LSP powers fast IDE for Move
Optimizing an IDE for an Evolving Language Ecosystem

Adam Welc +4
cs.SE 2026-05-15 reviewed

LLM and search methods trade off strengths in fixing merge conflicts
LLM-based vs. Search-based Merge Conflict Resolution: An Empirical Study of Competing Paradigms

Heleno de Souza Campos Junior +1
cs.SE 2026-05-15 reviewed

AR test framework tracks stable areas in videos for 55.8% coverage
TARIPlay: A Test Framework for AR Applications based on Interactive Area Tracking in Playback Videos

Seyed Amir Mousavi +1
cs.SE 2026-05-15 reviewed

Gemini on trillion internal tokens cuts developer iterations 23%
Customizing an LLM for Enterprise Software Engineering

Aditya Kini +17
cs.SE 2026-05-15 reviewed

Adapted LLM cuts developer iterations by 23 percent
Customizing an LLM for Enterprise Software Engineering

Aditya Kini +17
cs.CR 2026-05-15 reviewed

Manufacturing ransomware recovery goes beyond backups
From Backup Restoration to Minimum Viable Factory Recovery: A Systematization of Ransomware Recovery in Manufacturing Systems

Chun Yin Chiu