archive
Every paper Pith has read. Search by title, abstract, or pith.
1797 papers in cs.SE · page 10
-
LLM method generates PoV tests showing feasible attacks in 55 percent of cases
Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex Software
-
Staged tickets induce vulnerable code in coding agents at 53-86% rates
MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents
-
LLM linting outperforms rules for quantum programs
Beyond Rules: LLM-Powered Linting for Quantum Programs
-
ICU risk prediction improves as clinical pathways unfold
From Data Lifting to Continuous Risk Estimation: A Process-Aware Pipeline for Predictive Monitoring of Clinical Pathways
-
Clinical risk estimates improve with each new patient event
From Data Lifting to Continuous Risk Estimation: A Process-Aware Pipeline for Predictive Monitoring of Clinical Pathways
-
Dynamic knowledge base generates resilient Rust formal proofs
KVerus: Scalable and Resilient Formal Verification Proof Generation for Rust Code
-
AI Advocates catalyze squad shift to human-AI hybrid teams
AI Advocate: Educational Path to Transform Squads to the Future
-
Public firmware for crypto miners reveals exploitable flaws
Firmware Distribution as Attack Surface: A Security Study of ASIC Cryptocurrency Miners
-
Public firmware reveals remote attack paths in most ASIC miners
Firmware Distribution as Attack Surface: A Security Study of ASIC Cryptocurrency Miners
-
HPC workflows pause for human input without idling compute resources
A Workflow-Oriented Framework for Asynchronous Human-AI Collaboration in Hybrid and Compute-Intensive HPC Environments
-
Graph features fused into LLM layers lift code generation scores
Deep Graph-Language Fusion for Structure-Aware Code Generation
-
Better-connected U.S
Geographic Variation in Stack Overflow Code Quality: Evidence from a Cross-Regional Study of Coding Practices
-
DFA scaling exponent (alpha) of commit time series flags software stability
Long-Range Correlation in Code Commit Dynamics as a Novel Indicator of Software Product Stability: A Detrended Fluctuation Analysis Study
-
No language model fully rebuilds any program from scratch
ProgramBench: Can Language Models Rebuild Programs From Scratch?
-
LLM agents use tree search to find root causes in microservices
Multi-Agent Systems for Root Cause Analysis in Microservices
-
Zorya detects seven bugs in gc Go binaries
From TinyGo to gc Compiler: Extending Zorya's Concolic Framework to Real-World Go Binaries
-
Zorya detects seven bugs in real-world Go binaries
From TinyGo to gc Compiler: Extending Zorya's Concolic Framework to Real-World Go Binaries
-
AI models recover semantics from legacy database code
Semantic Reverse Engineering Legacy Software Applications with ChatGPT, Gemini AI, and Claude AI
-
Provenance graph auditing cuts LLM agent injection success to 3.8%
ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection
-
Benchmark shows LLMs miss complete postconditions on real code
POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference
-
Three cryptographic layers block dependency confusion attacks
Cryptographic Registry Provenance: Structural Defense Against Dependency Confusion in AI Package Ecosystems
-
Procedure turns abstract SE theories into testable hypotheses
Operationalizing Software Engineering Theories for Practical Validation
-
Sustainable scientific software shows higher test coverage
Exploring Sustainability in Scientific Software through Code Quality & Test Coverage Metrics
-
Semantic matching raises project assignment quality to 0.74 cosine similarity
TeamUp: Semantic Project Matching and Team Formation for Learning at Scale
-
YAML descriptions cut LLM tool context 142 times
DADL: A Declarative Description Language for Enterprise Tool Libraries in LLM Agent Systems
-
Kerncap turns full AMD GPU apps into isolated kernel reproducers in one command
Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs
-
Kerncap extracts isolated kernels from 30 GB AMD GPU apps in one command
Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs
-
4B model matches frontier LLMs at terminal execution
Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?
-
Five LLMs label 1,554 prompts as executable malicious code requests
A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts
-
Few traces validate complex agent behavior accurately
Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents
-
Data-flow graph lifts agent repair success 4.7 points
ARISE: A Repository-level Graph Representation and Toolset for Agentic Fault Localization and Program Repair
-
ARIS pairs executor with cross-family reviewer to verify research claims
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
-
Knowledge graph lets AI generate real DeFi exploits at 96 percent success
EvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphs
-
Stabilized distillation makes compact models reliable for cross-language code clones
Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection
-
Workflow templates speed sensor app prototyping for non-experts
From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications
-
AI reuses sensor workflow template to cut dev time to 1-2 days
(POSTER) From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications
-
Tunable rules for human-AI tasks cut fatigue while raising output
HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems
-
Tighter governance lifts manufacturing output and cuts fatigue
HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems
-
AI code volume predicts structural decay almost perfectly
AI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Development
-
Schema compiler lifts small LLMs to 84% tool accuracy at scale
TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments
-
Structured specs let LLMs build whole repositories
LLM-Assisted Repository-Level Generation with Structured Spec-Driven Engineering
-
Causal models replace correlations for software decisions
Causal Software Engineering: A Vision and Roadmap
-
Blackboard MCTS lifts LLM Pass@1 on contest programming benchmarks
ARIADNE: Agentic Reward-Informed Adaptive Decision Exploration via Blackboard-Driven MCTS for Competitive Program Generation
-
Symbolic index gives LLMs zero-defect view of large codebases
AOCI: Symbolic-Semantic Indexing for Practical Repository-Scale Code Understanding with LLMs
-
LLM tool helps map uncertainties in self-adaptive robots
Human-in-the-Loop Uncertainty Analysis in Self-Adaptive Robots Using LLMs
-
MDE user models found disconnected and mostly static
A Low-Code Approach for the Automatic Personalization of Conversational Agents
-
AI pull requests mostly get AI reviews or none
These Aren't the Reviews You're Looking For: How Humans Review AI-Generated Pull Requests
-
63,533-commit benchmark aids AI for commit messages
CommitSuite: A Comprehensive Benchmark for Commit Classification and Message Generation
-
Triadic data unlocks long-horizon work for engineering agents
The Conversations Beneath the Code: Triadic Data for Long-Horizon Software Engineering Agents
-
LLM repair models drop over 50% on minor code tweaks
HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair