archive

Every paper Pith has read.

1797 papers in cs.SE · page 10

cs.CR 2026-05-05 reviewed

LLM method generates PoV tests showing feasible attacks in 55 percent of cases
Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex Software

Shravya Kanchi +4
cs.CR 2026-05-05 reviewed

Staged tickets induce vulnerable code in coding agents at 53-86% rates
MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

Jonathan Steinberg +1
cs.SE 2026-05-05 reviewed

LLM linting outperforms rules for quantum programs
Beyond Rules: LLM-Powered Linting for Quantum Programs

Pietro Cassieri +4
cs.LG 2026-05-05 reviewed

ICU risk prediction improves as clinical pathways unfold
From Data Lifting to Continuous Risk Estimation: A Process-Aware Pipeline for Predictive Monitoring of Clinical Pathways

Pasquale Ardimento +3
cs.LG 2026-05-05 reviewed

Clinical risk estimates improve with each new patient event
From Data Lifting to Continuous Risk Estimation: A Process-Aware Pipeline for Predictive Monitoring of Clinical Pathways

Pasquale Ardimento +3
cs.SE 2026-05-05 reviewed

Dynamic knowledge base generates resilient Rust formal proofs
KVerus: Scalable and Resilient Formal Verification Proof Generation for Rust Code

Yuwei Liu +5
cs.SE 2026-05-05 reviewed

AI Advocates catalyze squad shift to human-AI hybrid teams
AI Advocate: Educational Path to Transform Squads to the Future

Carla Soares +5
cs.CR 2026-05-05 reviewed

Public firmware for crypto miners reveals exploitable flaws
Firmware Distribution as Attack Surface: A Security Study of ASIC Cryptocurrency Miners

Pierre Pouliquen +4
cs.CR 2026-05-05 reviewed

Public firmware reveals remote attack paths in most ASIC miners
Firmware Distribution as Attack Surface: A Security Study of ASIC Cryptocurrency Miners

Pierre Pouliquen +4
cs.DC 2026-05-05 reviewed

HPC workflows pause for human input without idling compute resources
A Workflow-Oriented Framework for Asynchronous Human-AI Collaboration in Hybrid and Compute-Intensive HPC Environments

Sergio Mendoza +7
cs.SE 2026-05-05 reviewed

Graph features fused into LLM layers lift code generation scores
Deep Graph-Language Fusion for Structure-Aware Code Generation

Mert Tiftikci +2
cs.SE 2026-05-05 reviewed

Better-connected U.S
Geographic Variation in Stack Overflow Code Quality: Evidence from a Cross-Regional Study of Coding Practices

Elijah Zolduoarrati +2
physics.soc-ph 2026-05-05 reviewed

Commit time series alpha flags software stability
Long-Range Correlation in Code Commit Dynamics as a Novel Indicator of Software Product Stability: A Detrended Fluctuation Analysis Study

Goran Mitevski
cs.SE 2026-05-05 reviewed

No language model fully rebuilds any program from scratch
ProgramBench: Can Language Models Rebuild Programs From Scratch?

John Yang +11
cs.SE 2026-05-05 reviewed

LLM agents use tree search to find root causes in microservices
Multi-Agent Systems for Root Cause Analysis in Microservices

Alexander Naakka +2
cs.CR 2026-05-05 reviewed

Zorya detects seven bugs in gc Go binaries
From TinyGo to gc Compiler: Extending Zorya's Concolic Framework to Real-World Go Binaries

Karolina Gorna +4
cs.CR 2026-05-05 reviewed

Zorya detects seven bugs in real-world Go binaries
From TinyGo to gc Compiler: Extending Zorya's Concolic Framework to Real-World Go Binaries

Karolina Gorna +4
cs.SE 2026-05-05 reviewed

AI models recover semantics from legacy database code
Semantic Reverse Engineering Legacy Software Applications with ChatGPT, Gemini AI, and Claude AI

Christian Mancas +1
cs.CR 2026-05-05 reviewed

Provenance graph auditing cuts LLM agent injection success to 3.8%
ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

Shihao Weng +5
cs.SE 2026-05-05 reviewed

Benchmark shows LLMs miss complete postconditions on real code
POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference

Gehao Zhang +1
cs.CR 2026-05-05 reviewed

Three cryptographic layers block dependency confusion attacks
Cryptographic Registry Provenance: Structural Defense Against Dependency Confusion in AI Package Ecosystems

Alan L. McCann
cs.SE 2026-05-05 reviewed

Procedure turns abstract SE theories into testable hypotheses
Operationalizing Software Engineering Theories for Practical Validation

Isaque Alves +3
cs.SE 2026-05-05 reviewed

Sustainable scientific software shows higher test coverage
Exploring Sustainability in Scientific Software through Code Quality & Test Coverage Metrics

Sheikh Md. Mushfiqur Rahman +2
cs.SE 2026-05-05 reviewed

Semantic matching raises project assignment quality to 0.74 cosine similarity
TeamUp: Semantic Project Matching and Team Formation for Learning at Scale

Dhruv Gulwani +3
cs.SE 2026-05-04 reviewed

YAML descriptions cut LLM tool context 142-fold
DADL: A Declarative Description Language for Enterprise Tool Libraries in LLM Agent Systems

Axel Dunkel
cs.SE 2026-05-04 reviewed

Kerncap turns full AMD GPU apps into isolated kernel reproducers in one command
Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

Cole Ramos +1
cs.SE 2026-05-04 reviewed

Kerncap extracts isolated kernels from 30 GB AMD GPU apps in one command
Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

Cole Ramos +1
cs.AI 2026-05-04 reviewed

4B model matches frontier LLMs at terminal execution
Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?

Spandan Garg +2
cs.CR 2026-05-04 reviewed

Five LLMs label 1,554 prompts as executable malicious code requests
A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts

Richard J. Young +1
cs.AI 2026-05-04 reviewed

Few traces validate complex agent behavior accurately
Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents

Reshabh K Sharma +2
cs.SE 2026-05-04 reviewed

Data-flow graph lifts agent repair success 4.7 points
ARISE: A Repository-level Graph Representation and Toolset for Agentic Fault Localization and Program Repair

Shahd Seddik +1
cs.SE 2026-05-04 reviewed

ARIS pairs executor with cross-family reviewer to verify research claims
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

Ruofeng Yang +2
cs.CR 2026-05-04 reviewed

Knowledge graph lets AI generate real DeFi exploits at 96 percent success
EvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphs

Ruichao Liang +7
cs.AI 2026-05-04 reviewed

Stabilized distillation makes compact models reliable for cross-language code clones
Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection

Mohamad Khajezade +2
cs.DC 2026-05-04 reviewed

Workflow templates speed sensor app prototyping for non-experts
From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications

Komal Thareja +2
cs.DC 2026-05-04 reviewed

AI reuses sensor workflow template to cut dev time to 1-2 days
(POSTER) From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications

Komal Thareja +2
cs.AI 2026-05-04 reviewed

Tunable rules for human-AI tasks cut fatigue while raising output
HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems

Vicente Pelechano +3
cs.AI 2026-05-04 reviewed

Tighter governance lifts manufacturing output and cuts fatigue
HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems

Vicente Pelechano +3
cs.SE 2026-05-04 reviewed

AI code volume predicts structural decay almost perfectly
AI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Development

Yuecai Zhu +2
cs.SE 2026-05-04 reviewed

Schema compiler lifts small LLMs to 84% tool accuracy at scale
TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments

Furkan Sakizli
cs.SE 2026-05-04 reviewed

Structured specs let LLMs build whole repositories
LLM-Assisted Repository-Level Generation with Structured Spec-Driven Engineering

Shuzhao Feng +3
cs.SE 2026-05-04 reviewed

Causal models replace correlations for software decisions
Causal Software Engineering: A Vision and Roadmap

Roberto Pietrantuono +4
cs.SE 2026-05-04 reviewed

Blackboard MCTS lifts LLM Pass@1 on contest programming benchmarks
ARIADNE: Agentic Reward-Informed Adaptive Decision Exploration via Blackboard-Driven MCTS for Competitive Program Generation

Minnan Wei +3
cs.SE 2026-05-04 reviewed

Symbolic index gives LLMs zero-defect view of large codebases
AOCI: Symbolic-Semantic Indexing for Practical Repository-Scale Code Understanding with LLMs

Jinshi Liu +5
cs.RO 2026-05-04 reviewed

LLM tool helps map uncertainties in self-adaptive robots
Human-in-the-Loop Uncertainty Analysis in Self-Adaptive Robots Using LLMs

Hassan Sartaj +4
cs.SE 2026-05-04 reviewed

MDE user models found disconnected and mostly static
A Low-Code Approach for the Automatic Personalization of Conversational Agents

Aaron Conrardy +2
cs.SE 2026-05-04 reviewed

AI pull requests mostly get AI reviews or none
These Aren't the Reviews You're Looking For: How Humans Review AI-Generated Pull Requests

Kacper Duma +6
cs.SE 2026-05-04 reviewed

63,533-commit benchmark aids AI for commit messages
CommitSuite: A Comprehensive Benchmark for Commit Classification and Message Generation

Zirui Wan +5
cs.SE 2026-05-04 reviewed

Triadic data unlocks long-horizon work for engineering agents
The Conversations Beneath the Code: Triadic Data for Long-Horizon Software Engineering Agents

Yelin Kim
cs.SE 2026-05-04 reviewed

LLM repair models drop over 50% on minor code tweaks
HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair

Fazle Rabbi +1