archive

Every paper Pith has read.

1797 papers in cs.SE · page 12

cs.SE 2026-04-30 reviewed

Util files show 2.75x higher vulnerability rates in mature projects
Unsafe and Unused? A History of Utility Code in Mature Open Source Projects

Brandon Keller +3
cs.SE 2026-04-30 reviewed

Leading LLM agent completes only 67% of live workflow tasks
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Chenxin Li +10
cs.SE 2026-04-30 reviewed

AI suitability in qualitative research depends on positivist versus non-positivist stance
To Vibe Research or Not to Vibe Research? Generative AI in Qualitative Research

Katja Karhu +2
cs.SE 2026-04-30 reviewed

Transformer fault diagnosis reaches 0.96 AUROC with graph method
DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures

Sigma Jahan +3
cs.CY 2026-04-30 reviewed

AI trust can be measured via pillars and agentic interfaces
I hope we don't do to trust what advertising has done to love

Jade Alglave
cs.CY 2026-04-30 reviewed

AI trust needs pillars and vectors to stay meaningful
I hope we don't do to trust what advertising has done to love

Jade Alglave
cs.SE 2026-04-30 reviewed

Communication and teamwork top soft skills in 25 years of agile studies
Beyond Code, We Are People: A Systematic Mapping of 25 Years of Literature on Soft Skills in Agile Development Teams

Israely Lima +4
cs.AI 2026-04-30 reviewed

Four patterns split AI vision into fast reflexes and slow supervision
A Pattern Language for Resilient Visual Agents

Habtom Kahsay Gidey +2
cs.SE 2026-04-30 reviewed

Models generate Verilog from circuit diagrams without using the diagrams
From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation

Guang Yang +3
cs.SE 2026-04-30 reviewed

Tool detects 11 Angular code smells with over 88% accuracy
An Empirical Evaluation of Code Smell Detection in Angular Applications

Maykon Nunes +3
cs.CR 2026-04-30 reviewed

Zero-knowledge sets let consumers check SBOMs for specific risks privately
zkSBOM: Privacy-Preserving SBOM Sharing with Zero-Knowledge Sets

Tom Sorger +5
cs.SE 2026-04-30 reviewed

Evolving specs build requirements debt in AI car perception
Requirements Debt in AI-Enabled Perception Systems Development: An Industrial RE4AI Perspective

Hina Saeeda +1
cs.SE 2026-04-30 reviewed

Four-phase method flags NFT migration incompatibilities in advance
Feature-Centric Methodology for Analyzing Cross-Chain NFT Migration Compatibility

Mohd Sameen Chishti +2
cs.SE 2026-04-30 reviewed

Deployers gate LLM updates with contracts and targeted tests
Test Before You Deploy: Governing Updates in the LLM Supply Chain

Mohd Sameen Chishti +2
cs.SE 2026-04-30 reviewed

AI supply chains hide four integrity gaps across 11,500 packages
The Grand Software Supply Chain of AI Systems

Carmine Cesarano +1
cs.SE 2026-04-30 reviewed

Technical and social heroes overlap by only 10 percent in Apache projects
Multifaceted Hero Developers and Bug-Fixing Outcomes Across Severity

Amit Kumar +4
cs.SE 2026-04-30 reviewed

Rubric framework makes LLM judges comparable in coding co-creation
LLM-as-a-Judge for Human-AI Co-Creation: A Reliability-Aware Evaluation Framework for Coding

Md Faizul Ibne Amin +5
cs.CR 2026-04-30 reviewed

Code representation choice drives LLM false positives across languages
How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection

Maofei Chen +5
cs.SE 2026-04-30 reviewed

Nearly half of template engine bugs cause silent wrong output
Understanding Bugs in Template Engine-Based Applications: Symptoms, Root Causes, and Fix Patterns

Kai Gao +2
cs.SE 2026-04-30 reviewed

Watermarking code datasets achieves 100% verification success
PuzzleMark: Implicit Jigsaw Learning for Robust Code Dataset Watermarking in Neural Code Completion Models

Haocheng Huang +7
cs.SE 2026-04-30 reviewed

N-version models lift API recommendation reliability to 83.8%
Tail-aware N-version Machine Learning Models for Reliable API Recommendation

Aoi Matsuda +2
cs.SE 2026-04-30 reviewed

UTAUT plus Bayesian analysis spots GenAI barriers in software teams
GenAI in Software Engineering: The Role of Technology Acceptance Models

Oscar Johansson +2
cs.SE 2026-04-30 reviewed

Newcomer GFI pull request merge rates fell from 62% to 42%
A Longitudinal Analysis of Good First Issue Practices and Newcomer Pull Requests in Popular OSS Projects

Hirotatsu Hoshikawa +4
cs.SE 2026-04-30 reviewed

ScaleBox scales accurate code verification for LLM training
ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

Jiasheng Zheng +10
cs.SE 2026-04-30 reviewed

Nygard's ADR template outperforms MADR in student usability test
One Size Fits All? An Empirical Comparison of ADR Templates regarding Comprehension, Usability, and Ease of Adoption

Fernando Nogueira +2
cs.CR 2026-04-30 reviewed

New benchmark standardizes LLM tests on stripped binary tasks
REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

Jun Yeon Won +3
cs.SE 2026-04-30 reviewed

Hybrid LLM and tool system creates explainable process models
Pragmos: A Process Agentic Modeling System

Pedro-Aarón Hernández-Ávalos +1
cs.SE 2026-04-30 reviewed

Adaptive diffs match full code edits at 30% lower cost
To Diff or Not to Diff? Structure-Aware and Adaptive Output Formats for Efficient LLM-based Code Editing

Wei Cheng +6
cs.SE 2026-04-29 reviewed

Agents evolve their goals and code on their own
Self-Evolving Software Agents

Marco Robol +1
cs.SE 2026-04-29 reviewed

CS Curricula Must Reframe Algorithms as Foundations for AI Systems
Now's the Time: Computer Science Must Evolve to Emphasize Software and Systems Engineering with Artificial Intelligence (AI)

Chandra N. Sekharan +1
cs.SE 2026-04-29 reviewed

Controller keeps AI research software aligned across 400 commits
Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves

Halley Young +1
cs.SE 2026-04-29 reviewed

Benchmark tests code repairs by re-running original CI workflows
CI-Repair-Bench: A Repository-Aware Benchmark for Automated Patch Validation via CI Workflows

Rabeya Khatun Muna +2
cs.SE 2026-04-29 reviewed

Emote raises modular testing coverage by 15 percent
On the Effectiveness of Modular Testing in EvoSuite

Elizabeth Dinella
cs.AI 2026-04-29 reviewed

Bayesian calibration tunes LLM metrics to human ratings for model swaps
When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems

Emma Casey +3
cs.SE 2026-04-29 reviewed

Tool recreates 92% of failing embedded CI builds
Where did we fail? -- Reproducing build failures in embedded open source software

Han Fu +5
cs.SE 2026-04-29 reviewed

LLMs reach only 45.6% on class-level code benchmark
ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation

Yeheng Chen +6
cs.SE 2026-04-29 reviewed

Hot fixes skip most tests and reviews
Hot Fixing in the Wild

Carol Hanna +5
cs.SE 2026-04-29 reviewed

AI coding tools erode engineers' root-cause skills
Cognitive Atrophy and Systemic Collapse in AI-Dependent Software Engineering

Frank Ginac
cs.DC 2026-04-29 reviewed

Test taxonomy with CI ecosystem improves HPC fault detection
A Test Taxonomy and Continuous Integration Ecosystem for Dynamic Resource Management in HPC

Petter Sandås +3
cs.SE 2026-04-29 reviewed

RAPL tools add up to 47% time overhead at 1 kHz polling
What Is the Cost of Energy Monitoring? An Empirical Study on the Overhead of RAPL-Based Tools

Jeremy Diamond +1
cs.SE 2026-04-29 reviewed

LLM-guided search finds efficient inference params in 3.4 prompts
LLM-Guided Runtime Parameter Optimization for Energy-Efficient Model Inference

Katelyn Crumpacker +1
cs.SE 2026-04-29 reviewed

Move cuts smart contract security checks by 60 percent
Comparing Smart Contract Paradigms: A Preliminary Study of Security and Developer Experience

Matteo Vaccargiu +3
cs.SE 2026-04-29 reviewed

Move cuts explicit security checks by 60% in smart contracts
Comparing Smart Contract Paradigms: A Preliminary Study of Security and Developer Experience

Matteo Vaccargiu +3
cs.SE 2026-04-29 reviewed

Model editing adapts service recommendations without full retraining
When Model Editing Meets Service Evolution: A Knowledge-Update Perspective for Service Recommendation

Guodong Fan +6
cs.SE 2026-04-29 reviewed

21% of Defects4J defects fail strict APR reproducibility checks
Reproducible Automated Program Repair Is Hard -- Experiences With the Defects4J Dataset

Adam Krafczyk +1
cs.SE 2026-04-29 reviewed

Post-release bugs cluster in old
What Makes Software Bugs Escape Testing? Evidence from a Large-Scale Empirical Study

Domenico Cotroneo +3
cs.SE 2026-04-29 reviewed

Asymmetric service-host faults favor heterogeneous graphs for root cause ID
Which Types of Heterogeneity Matter for Root Cause Localization in Microservice Systems?

Runzhou Wang +7
cs.SE 2026-04-29 reviewed

Metric models catch 85-90% of Python residual defects
Will It Break in Production? Metric-Driven Prediction of Residual Defects in Python Systems

Giuseppe De Rosa +1
cs.SE 2026-04-29 reviewed

UK software jobs want design skills universities underteach
Understanding the Skills Gap between Higher Education Institutions and the Software Engineering Industry

Huy Phan +2
cs.SE 2026-04-29 reviewed

TDD manifesto embedded in prompts stabilizes LLM code outputs
TDD Governance for Multi-Agent Code Generation via Prompt Engineering

Tarlan Hasanli +5