archive

Every paper Pith has read.

1797 papers in cs.SE · page 7

cs.SE 2026-05-10 reviewed

Regional zoom beats global Pareto in 84-89% of SE tasks
Zoom, Don't Wander: Why Regional Search Outperforms Pareto Reasoning and Global Optimization in Budget-Constrained SBSE

Kishan Kumar Ganguly +1
cs.MA 2026-05-10 reviewed

LLM smart contracts score 8.29 points above human versions
SmartEval: A Benchmark for Evaluating LLM-Generated Smart Contracts from Natural Language Specifications

Abhinav Goel +3
cs.SE 2026-05-10 reviewed

ConCovUp lifts concurrency test coverage from 37% to 68%
ConCovUp: Effective Agent-Based Test Driver Generation for Concurrency Testing

Yuandao Cai +5
cs.SE 2026-05-10 reviewed

Belief-revision agents verify code authorship without training
MACAA: Belief-Revision Multi-Agent Reasoning for Code Authorship Verification

Jingwei Ye +7
cs.SE 2026-05-10 reviewed

Multi-agent belief revision verifies code authors without training
MACAA: Belief-Revision Multi-Agent Reasoning for Code Authorship Verification

Jingwei Ye +7
cs.SE 2026-05-10 reviewed

Multi-agent system verifies code authorship without training
MACAA: Belief-Revision Multi-Agent Reasoning for Code Authorship Verification

Jingwei Ye +7
cs.SE 2026-05-10 reviewed

Ethical safeguards prioritized in cost model for LLM education use
Prediction Model of Motivators and Demotivators of Integrating Large Language Models in Software Engineering Education: An Empirical Study

Maryam Khan +3
cs.SE 2026-05-10 reviewed

Model optimizes cost-efficient LLM integration in software engineering classes
Prediction Model of Motivators and Demotivators of Integrating Large Language Models in Software Engineering Education: An Empirical Study

Maryam Khan +3
cs.SE 2026-05-10 reviewed

Execution traces create first noise-free test for LLM code understanding
An Execution-Verified Multi-Language Benchmark for Code Semantic Reasoning

Yikun Li +9
cs.LG 2026-05-10 reviewed

LLM sim code runs but solves wrong physics
Your Simulation Runs but Solves the Wrong Physics: PDE-Grounded Intent Verification for LLM-Generated Multiphysics Simulation Code

Zhenghan Song +6
cs.SE 2026-05-10 reviewed

Merlin turns natural language into CodeQL queries that raise accuracy 3.8x
Generating Complex Code Analyzers from Natural Language Questions

Amirmohammad Nazari +5
quant-ph 2026-05-10 reviewed

Memoized heuristics scale ion-trap qubit mapping
Scaling Qubit Mapping and Routing With Position Graph Abstraction and Memoization

Brent Russon +3
cs.DB 2026-05-09 reviewed

Krone decomposes logs into entity-action-status units for modular anomaly detection
Detect, Localize, and Explain: Interactive Hierarchical Log Anomaly Analytics with LLM Augmentation

Lei Ma +7
cs.AI 2026-05-09 reviewed

Line-level rewards raise program repair success to 40.7% on SWE-bench
BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models

Yuanhao Li +5
cs.AI 2026-05-09 reviewed

Line-level credit in RL lifts program repair to 40.7% on SWE-bench
BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models

Yuanhao Li +5
cs.AI 2026-05-09 reviewed

Dual rewards boost code repair to 40.7% on SWE-bench
BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models

Yuanhao Li +5
cs.SE 2026-05-09 reviewed

Developer reviews expose LLM code flaws missed by benchmarks
Evaluating LLM-Generated Code: A Benchmark and Developer Study

Joanna Szych +1
cs.SE 2026-05-09 reviewed

Fuzzer finds 64 inconsistencies in Solidity compilers
ParityFuzz: Finding Inconsistencies across Solidity Compilers via Fine-Grained Mutation and Differential Analysis

Bowei Su +4
cs.AI 2026-05-09 reviewed

Containment verification gives AI safety guarantees independent of alignment
Containment Verification: AI Safety Guarantees Independent of Alignment

Royce Moon +1
cs.SE 2026-05-09 reviewed

Semantic distance beats disagreement counts for LLM code uncertainty
Using Semantic Distance to Estimate Uncertainty in LLM-Based Code Generation

Weilin He +2
cs.SE 2026-05-09 reviewed

Skill drift is contract violation in LLM agent libraries
Skill Drift Is Contract Violation: Proactive Maintenance for LLM Agent Skill Libraries

Linfeng Fan +3
cs.SE 2026-05-09 reviewed

Three-layer gate turns agent failures into bounded fixes
Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents

Chenyu Zhao +9
cs.SE 2026-05-09 reviewed

LLMs mine tactics that let CoqHammer prove 24% more theorems
A Learning Method for Symbolic Systems Using Large Language Models

Jian Fang +2
cs.SE 2026-05-09 reviewed

Execution fingerprints beat text voting for LLM code
Semantic Voting: Execution-Grounded Consensus for LLM Code Generation

Shan Jiang +2
cs.LG 2026-05-09 reviewed

Sketching strategies outperform flat sampling for code at fixed budget
Sketch-and-Verify: Structured Inference-Time Scaling via Program Sketching

Shan Jiang +2
cs.SE 2026-05-09 reviewed

EvidenT repairs 54% of RISC-V package build failures
EvidenT: An Evidence-Preserving Framework for Iterative System-Level Package Repair

Chenyu Zhao +7
cs.SE 2026-05-08 reviewed

Models reach 92 percent on code but only 5 percent on provable code
VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation

Zichen Xie +7
cs.LG 2026-05-08 reviewed

Benchmark reveals CUDA LLM fixers often degrade code to pass tests
CUDABeaver: Benchmarking LLM-Based Automated CUDA Debugging

Shiyang Li +3
cs.SE 2026-05-08 reviewed

Dataset collects 15k configs for AI coding tools
A Dataset of Agentic AI Coding Tool Configurations

Matthias Galster +6
cs.SE 2026-05-08 reviewed

AI agents omit runtime details in their own technical talks
What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook

Junyu Huo +3
cs.SE 2026-05-08 reviewed

AI agents talk security and trust more than specific code issues
What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook

Junyu Huo +3
cs.LG 2026-05-08 reviewed

Benchmark scores coding agents on engineering quality beyond bug fixes
SWE Atlas: Benchmarking Coding Agents Beyond Issue Resolution

Mohit Raghavendra +14
cs.CR 2026-05-08 reviewed

Hardware attestation signs build provenance without trusting operators
Kettle: Attested builds for verifiable software provenance

Amean Asad +1
cs.LG 2026-05-08 reviewed

Cyclic tuning raises RAG quality by up to 54 percent
CDS4RAG: Cyclic Dual-Sequential Hyperparameter Optimization for RAG

Pengzhou Chen +1
cs.SE 2026-05-08 reviewed

AI agents start most PRs but humans keep merge authority
Collaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycles

Young Jo (seph) Chung +1
cs.SE 2026-05-08 reviewed

Collaborator AIs open most PRs while humans keep merge control
Collaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycles

Young Jo (seph) Chung +1
cs.CL 2026-05-08 reviewed

Adding one vector switches which tool a language model calls
Tool Calling is Linearly Readable and Steerable in Language Models

Zekun Wu +9
cs.SE 2026-05-08 reviewed

Similar past faults annotated to guide LLMs in test code fault localization
Similar Pattern Annotation via Retrieval Knowledge for LLM-Based Test Code Fault Localization

Golnaz Gharachorlu +4
cs.SE 2026-05-08 reviewed

Trace comparison creates a score for design conformance
Evaluating Design Conformance Through Trace Comparison

Reid Anderson +1
cs.SE 2026-05-08 reviewed

One rules engine powers play
Mazocarta: A Seeded Procedural Deckbuilder for Instrumented Game Development

Timothy C. Cogan
cs.SE 2026-05-08 reviewed

Bidirectional analysis finds 118 unsafe flows in 87 MCP servers
Unsafe by Flow: Uncovering Bidirectional Data-Flow Risks in MCP Ecosystem

Xinyi Hou +2
cs.CR 2026-05-08 reviewed

Security designs link to code checks in only a few ways
Can I Check What I Designed? Mapping Security Design DSLs to Code Analyzers

Sven Peldszus +5
cs.SE 2026-05-08 reviewed

Unified AST labels and graph matching link equivalent code across languages
Bridging the Programming Language Gap: Constructing a Multilingual Shared Semantic Space through AST Unification and Graph Matching

Junhao Chen +4
cs.SE 2026-05-08 reviewed

Agents patch code on 35-65% of already-fixed bugs
Coding Agents Don't Know When to Act

Thibaud Gloaguen +4
cs.SE 2026-05-08 reviewed

Neuro-symbolic method detects threats in stripped industrial binaries
Securing the Dark Matter: A Semantic-Enhanced Neuro-Symbolic Framework for Supply Chain Analysis of Opaque Industrial Software

Bowei Ning +6
cs.SE 2026-05-08 reviewed

SARC enforces agent constraints at runtime for zero hard violations
SARC: A Governance-by-Architecture Framework for Agentic AI Systems

Gaston Besanson
cs.SE 2026-05-08 reviewed

Manifesto recasts scaled agile around AI as first-class participant
The AI-Native Large-Scale Agile Software Development Manifesto

Ricardo Britto +3
cs.SE 2026-05-08 reviewed

Manifesto puts AI at core of large-scale agile development
The AI-Native Large-Scale Agile Software Development Manifesto

Ricardo Britto +3
cs.SE 2026-05-08 reviewed

Search tunes LLMs to cut harmful responses
SafeTune: Search-based Harmfulness Minimisation for Large Language Models

Giordano d'Aloisio +5
cs.LG 2026-05-08 reviewed

First benchmark supplies real data for LLM hyperparameter tuning
LLMSYS-HPOBench: Hyperparameter Optimization Benchmark Suite for Real-World LLM Systems

Siyu Wu +5