archive

Every paper Pith has read.

1797 papers in cs.SE · page 5

cs.CR 2026-05-13 reviewed

Tool finds 545 reference counting bugs in Linux kernel drivers
Automatic Detection of Reference Counting Bugs in Linux Kernel Drivers

Joe Hattori +2
cs.CR 2026-05-13 reviewed

DrvHorn uncovers 545 reference counting bugs in Linux v6.6 drivers
Automatic Detection of Reference Counting Bugs in Linux Kernel Drivers

Joe Hattori +2
cs.AI 2026-05-13 reviewed

Contrastive semantic model improves code translation
Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization

Yuhan Wu +5
cs.SE 2026-05-13 reviewed

LLMs lag experts on system-level performance code
PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

Huihao Jing +7
cs.SE 2026-05-13 reviewed

Toolkit standardizes benchmarks for screenshot-to-code models
UIBenchKit: A unified toolkit for design-to-code model evaluation

Chinh T. Le +4
cs.SE 2026-05-13 reviewed

Code agents solve far fewer issues in full cycles than isolated tasks
SWE-Cycle: Benchmarking Code Agents across the Complete Issue Resolution Cycle

Hao Guan +10
cs.SE 2026-05-13 reviewed

Code models miss over 93% of fixes from changes alone
Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical Study

Nils Loose +4
cs.CR 2026-05-13 reviewed

Bonuses for security scans cut issue density in team code
Security Incentivization: An Empirical Study of how Micropayments Impact Code Security

Stefan Rass +7
cs.CL 2026-05-13 reviewed

LLM JSON stays valid inside tight token budgets
TruncProof: A Guardrail for LLM-based JSON Generation under Token-Length Constraints

Yoshio Kato +1
cs.SE 2026-05-13 reviewed

Deeper thought per algorithm beats more candidates under fixed tokens
Effective Harness Engineering for Algorithm Discovery with Coding Agents

Yoichi Ishibashi +2
cs.SE 2026-05-13 reviewed

Protocols govern generated code via invariants and evidence chains
Protocol-Driven Development: Governing Generated Software Through Invariants and Continuous Evidence

Jun He +1
cs.SE 2026-05-13 reviewed

Protocols admit generated code only via signed compliance evidence
Protocol-Driven Development: Governing Generated Software Through Invariants and Continuous Evidence

Jun He +1
cs.SE 2026-05-13 reviewed

Protocols, not code, decide if generated software is admissible
Protocol-Driven Development: Governing Generated Software Through Invariants and Continuous Evidence

Jun He +1
cs.SE 2026-05-13 reviewed

10.7% of SWE-agent passes are lucky trial-and-error
AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

Priyam Sahoo +6
cs.SE 2026-05-13 reviewed

Metadata layer turns legacy SAS reports into AI-ready data
A Non-Destructive Methodological Framework for Modernizing Legacy Clinical Reporting Systems for AI-Driven Pharmacoinformatics: A SAS Case Study

Jaime Yan
cs.SE 2026-05-12 reviewed

Open-source projects follow product life cycles
Project Life Cycles in Open-Source Software

Sanjiv Das +5
cs.SE 2026-05-12 reviewed

cozy is a comparative binary analysis tool that uses symbolic execution to find…
Finding a Crab in the C: Assured Translation via Comparative Symbolic Execution

Caleb Helbling +2
eess.SY 2026-05-12 reviewed

Natural language runs grid analyses in under two minutes
Grid-Orch: An LLM-Powered Orchestrator for Distribution Grid Simulation and Analytics

Boming Liu +2
cs.SE 2026-05-12 reviewed

Lattice structures LLM judgments for reliable program analysis
Agentic Interpretation: Lattice-Structured Evidence for LLM-Based Program Analysis

Jacqueline L. Mitchell +1
cs.SE 2026-05-12 reviewed

LLMs match human accuracy in spotting usability requirements in reviews
User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models

Cedric Wellhausen +2
cs.SE 2026-05-12 reviewed

Fine-tuned open LLM matches ChatGPT on code feedback quality
Fine-Tuning Models for Automated Code Review Feedback

Smitha S Kumar +3
eess.SY 2026-05-12 reviewed

Docker container makes Basilisk GN&C simulations reproducible
Basilisk and Docker for Reproducible GN&C Simulation: A Workflow Reference

Anubhav Gupta
cs.SE 2026-05-12 reviewed

Nine LLM audits on prompts found 51 defects and converged to zero
Iterative Audit Convergence in LLM-Managed Multi-Agent Systems: A Case Study in Prompt Engineering Quality Assurance

Elias Calboreanu
cs.SE 2026-05-12 reviewed

MinTEJ terminal editor for Julia uses less memory than VS Code
Minimalistic Terminal Editor for Julia Programming -- MinTEJ: A Friendly Approach for a Scientific Programmer

Poornachandratejasvi Laxman Bhattar +3
cs.SE 2026-05-12 reviewed

LLMs fail most at strategy in GitHub issue fixes
Characterizing the Failure Modes of LLMs in Resolving Real-World GitHub Issues

Yanjie Jiang +5
cs.SE 2026-05-12 reviewed

Partial programs control risk in LLM code generation
Uncertainty Quantification for LLM-based Code Generation

Senrong Xu +8
cs.SE 2026-05-12 reviewed

Dataset delivers 449 reproducible locator breaks in web GUI tests
ReproBreak: A Dataset of Reproducible Web Locator Breaks

Thiago Santos de Moura +3
cs.SE 2026-05-12 reviewed

Dataset supplies 2440 proprietary industrial repositories
CIDR: A Large-Scale Industrial Source Code Dataset for Software Engineering Research

Vladislav Savenkov
cs.SE 2026-05-12 reviewed

Harness design stabilizes small language models at 95 percent success
It's Not the Size: Harness Design Determines Operational Stability in Small Language Models

Yong-eun Cho
cs.SE 2026-05-12 reviewed

Metamorphic testing and LLMs strengthen each other for AI quality checks
Bidirectional Empowerment of Metamorphic Testing and Large Language Models: A Systematic Survey

Zheng Zheng +4
cs.SE 2026-05-12 reviewed

Framework embeds values in CPS human monitoring rules
HM-Req: A Framework for Embedding Values within CPS Human Monitoring Requirements

Zoe Pfister +2
cs.PL 2026-05-12 reviewed

Diversified replicas detect correlated faults by ignoring addresses
Divergent Multi-Version Execution (DME): Canonical Instruction-Trace Fault Detection via Structural Address-Space Decorrelation

Petro Baran Yrievich
cs.AI 2026-05-12 reviewed

Microservices process thousands of documents per hour with OCR and LLMs
Operationalizing Document AI: A Microservice Architecture for OCR and LLM Pipelines in Production

Yao Fehlis +11
cs.SE 2026-05-12 reviewed

Agent decision traces vary up to 43 points in completeness across SDKs
Property-Level Reconstructability of Agent Decisions: An Anchor-Level Pilot Across Vendor SDK Adapter Regimes

Oleg Solozobov
cs.SE 2026-05-12 reviewed

Guided LLMs translate APL legacy code to working C#
Neural Code Translation of Legacy Code: APL to C#

Abdulrahman Ramadan +4
cs.SE 2026-05-12 reviewed

Print statements teach code models to reason step by step
StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning

Hao Wang +3
cs.SE 2026-05-12 reviewed

Value and popularity drive OSS survival
The Death Spiral of Open Source Projects: A Post-Mortem Analysis of Pull Request Workflow Dynamics

Mohit Kaushik +1
cs.AI 2026-05-12 reviewed

Compiled interfaces cut agent token use by 57%
SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces

Duling Xu +6
cs.SE 2026-05-12 reviewed

Bug localization replication fails after fixing data leak
An Extensive Replication Study of the ABLoTS Approach for Bug Localization

Feifei Niu +7
cs.SE 2026-05-12 reviewed

SMT-LLM resolves Python deps at 83.6 percent
Breaking the Dependency Chaos: A Constraint-Driven Python Dependency Resolution Strategy with Selective LLM Imputation

Kowshik Chowdhury +2
cs.SE 2026-05-12 reviewed

Seminar sets six research priorities for agents and software engineering
A Research Agenda on Agents and Software Engineering: Outcomes from the Rio A2SE Seminar

Davide Taibi +17
cs.CR 2026-05-12 reviewed

597-line harness supports fair comparisons of LLM pen-testing agents
Cochise: A Reference Harness for Autonomous Penetration Testing

Andreas Happe +1
cs.SE 2026-05-12 reviewed

Compiler feedback lifts neural decompilation success to 83.9 percent
Decaf: Improving Neural Decompilation with Automatic Feedback and Search

Alexander Shypula +2
cs.SE 2026-05-12 reviewed

Mined tokens lift LLM flaky test F1-score to 69.34%
NeuroFlake: A Neuro-Symbolic LLM Framework for Flaky Test Classification

Khondaker Tasnia Hoque +1
cs.CR 2026-05-12 reviewed

Risk lattice turns consent clicks into reusable options
Options, Not Clicks: Lattice Refinement for Consent-Driven MCP Authorization

Ying Li +6
cs.SE 2026-05-11 reviewed

LLMs generate natural language specs to verify code compositionally
Natural Language based Specification and Verification

Zhaorui Li +1
cs.LG 2026-05-11 reviewed

Ranking own code attempts boosts single-sample accuracy
Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling

Yizhu Jiao +5
cs.LG 2026-05-11 reviewed

Ranking own code attempts boosts single-rollout accuracy to match Best-of-4
Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling

Yizhu Jiao +5
cs.SE 2026-05-11 reviewed

SysML model drives hardware verification directly via server link
SHIA: A Direct SysML-Hardware Interface Architecture for Model-Centric Verification

Charles Lewis +2
cs.CR 2026-05-11 reviewed

4714 GitHub workflows hijackable via crafted comments
Comment and Control: Hijacking Agentic Workflows via Context-Grounded Evolution

Neil Fendley +4