archive
Every paper Pith has read.
1797 papers in cs.SE · page 12
-
Util files show 2.75x higher vulnerability rates in mature projects
Unsafe and Unused? A History of Utility Code in Mature Open Source Projects
-
Leading LLM agent completes only 67% of live workflow tasks
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
-
AI suitability in qualitative research depends on positivist versus non-positivist stance
To Vibe Research or Not to Vibe Research? Generative AI in Qualitative Research
-
Transformer fault diagnosis reaches 0.96 AUROC with graph method
DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures
-
AI trust can be measured via pillars and agentic interfaces
I hope we don't do to trust what advertising has done to love
-
AI trust needs pillars and vectors to stay meaningful
I hope we don't do to trust what advertising has done to love
-
Communication and teamwork top soft skills in 25 years of agile studies
Beyond Code, We Are People: A Systematic Mapping of 25 Years of Literature on Soft Skills in Agile Development Teams
-
Four patterns split AI vision into fast reflexes and slow supervision
A Pattern Language for Resilient Visual Agents
-
Models generate Verilog from circuit diagrams without using the diagrams
From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation
-
Tool detects 11 Angular code smells with over 88% accuracy
An Empirical Evaluation of Code Smell Detection in Angular Applications
-
Zero-knowledge sets let consumers check SBOMs for specific risks privately
zkSBOM: Privacy-Preserving SBOM Sharing with Zero-Knowledge Sets
-
Evolving specs build requirements debt in AI car perception
Requirements Debt in AI-Enabled Perception Systems Development: An Industrial RE4AI Perspective
-
Four-phase method flags NFT migration incompatibilities in advance
Feature-Centric Methodology for Analyzing Cross-Chain NFT Migration Compatibility
-
Deployers gate LLM updates with contracts and targeted tests
Test Before You Deploy: Governing Updates in the LLM Supply Chain
-
AI supply chains hide four integrity gaps across 11,500 packages
The Grand Software Supply Chain of AI Systems
-
Technical and social heroes overlap by only 10 percent in Apache projects
Multifaceted Hero Developers and Bug-Fixing Outcomes Across Severity
-
Rubric framework makes LLM judges comparable in coding co-creation
LLM-as-a-Judge for Human-AI Co-Creation: A Reliability-Aware Evaluation Framework for Coding
-
Code representation choice drives LLM false positives across languages
How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection
-
Nearly half of template engine bugs cause silent wrong output
Understanding Bugs in Template Engine-Based Applications: Symptoms, Root Causes, and Fix Patterns
-
Watermarking code datasets achieves 100% verification success
PuzzleMark: Implicit Jigsaw Learning for Robust Code Dataset Watermarking in Neural Code Completion Models
-
N-version models lift API recommendation reliability to 83.8%
Tail-aware N-version Machine Learning Models for Reliable API Recommendation
-
UTAUT plus Bayesian analysis spots GenAI barriers in software teams
GenAI in Software Engineering: The Role of Technology Acceptance Models
-
Newcomer GFI pull request merge rates fell from 62% to 42%
A Longitudinal Analysis of Good First Issue Practices and Newcomer Pull Requests in Popular OSS Projects
-
ScaleBox scales accurate code verification for LLM training
ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models
-
Nygard's ADR template outperforms MADR in student usability test
One Size Fits All? An Empirical Comparison of ADR Templates regarding Comprehension, Usability, and Ease of Adoption
-
New benchmark standardizes LLM tests on stripped binary tasks
REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)
-
Hybrid LLM and tool system creates explainable process models
Pragmos: A Process Agentic Modeling System
-
Adaptive diffs match full code edits at 30% lower cost
To Diff or Not to Diff? Structure-Aware and Adaptive Output Formats for Efficient LLM-based Code Editing
-
Agents evolve their goals and code on their own
Self-Evolving Software Agents
-
CS curricula must reframe algorithms as foundations for AI systems
Now's the Time: Computer Science Must Evolve to Emphasize Software and Systems Engineering with Artificial Intelligence (AI)
-
Controller keeps AI research software aligned across 400 commits
Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves
-
Benchmark tests code repairs by re-running original CI workflows
CI-Repair-Bench: A Repository-Aware Benchmark for Automated Patch Validation via CI Workflows
-
Emote raises modular testing coverage by 15 percent
On the Effectiveness of Modular Testing in EvoSuite
-
Bayesian calibration tunes LLM metrics to human ratings for model swaps
When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems
-
Tool recreates 92% of failing embedded CI builds
Where did we fail? -- Reproducing build failures in embedded open source software
-
LLMs reach only 45.6% on class-level code benchmark
ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation
-
Hot fixes skip most tests and reviews
Hot Fixing in the Wild
-
AI coding tools erode engineers' root-cause skills
Cognitive Atrophy and Systemic Collapse in AI-Dependent Software Engineering
-
Test taxonomy with CI ecosystem improves HPC fault detection
A Test Taxonomy and Continuous Integration Ecosystem for Dynamic Resource Management in HPC
-
RAPL tools add up to 47% time overhead at 1 kHz polling
What Is the Cost of Energy Monitoring? An Empirical Study on the Overhead of RAPL-Based Tools
-
LLM-guided search finds efficient inference params in 3.4 prompts
LLM-Guided Runtime Parameter Optimization for Energy-Efficient Model Inference
-
Move cuts explicit security checks by 60% in smart contracts
Comparing Smart Contract Paradigms: A Preliminary Study of Security and Developer Experience
-
Model editing adapts service recommendations without full retraining
When Model Editing Meets Service Evolution: A Knowledge-Update Perspective for Service Recommendation
-
21% of Defects4J defects fail strict APR reproducibility checks
Reproducible Automated Program Repair Is Hard -- Experiences With the Defects4J Dataset
-
Post-release bugs cluster in old
What Makes Software Bugs Escape Testing? Evidence from a Large-Scale Empirical Study
-
Asymmetric service-host faults favor heterogeneous graphs for root cause ID
Which Types of Heterogeneity Matter for Root Cause Localization in Microservice Systems?
-
Metric models catch 85-90% of Python residual defects
Will It Break in Production? Metric-Driven Prediction of Residual Defects in Python Systems
-
UK software jobs want design skills universities underteach
Understanding the Skills Gap between Higher Education Institutions and the Software Engineering Industry
-
TDD manifesto embedded in prompts stabilizes LLM code outputs
TDD Governance for Multi-Agent Code Generation via Prompt Engineering