pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 12

  1. cs.SE 2026-04-30 reviewed
    Util files show 2.75x higher vulnerability rates in mature projects

    Unsafe and Unused? A History of Utility Code in Mature Open Source Projects

    Brandon Keller +3

  2. cs.SE 2026-04-30 reviewed
    Leading LLM agent completes only 67% of live workflow tasks

    Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

    Chenxin Li +10

  3. cs.SE 2026-04-30 reviewed
    AI suitability in qualitative research depends on positivist versus non-positivist stance

    To Vibe Research or Not to Vibe Research? Generative AI in Qualitative Research

    Katja Karhu +2

  4. cs.SE 2026-04-30 reviewed
    Transformer fault diagnosis reaches 0.96 AUROC with graph method

    DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures

    Sigma Jahan +3

  5. cs.CY 2026-04-30 reviewed
    AI trust can be measured via pillars and agentic interfaces

    I hope we don't do to trust what advertising has done to love

    Jade Alglave

  6. cs.CY 2026-04-30 reviewed
    AI trust needs pillars and vectors to stay meaningful

    I hope we don't do to trust what advertising has done to love

    Jade Alglave

  7. cs.SE 2026-04-30 reviewed
    Communication and teamwork top soft skills in 25 years of agile studies

    Beyond Code, We Are People: A Systematic Mapping of 25 Years of Literature on Soft Skills in Agile Development Teams

    Israely Lima +4

  8. cs.AI 2026-04-30 reviewed
    Four patterns split AI vision into fast reflexes and slow supervision

    A Pattern Language for Resilient Visual Agents

    Habtom Kahsay Gidey +2

  9. cs.SE 2026-04-30 reviewed
    Models generate Verilog from circuit diagrams without using the diagrams

    From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation

    Guang Yang +3

  10. cs.SE 2026-04-30 reviewed
    Tool detects 11 Angular code smells with over 88% accuracy

    An Empirical Evaluation of Code Smell Detection in Angular Applications

    Maykon Nunes +3

  11. cs.CR 2026-04-30 reviewed
    Zero-knowledge sets let consumers check SBOMs for specific risks privately

    zkSBOM: Privacy-Preserving SBOM Sharing with Zero-Knowledge Sets

    Tom Sorger +5

  12. cs.SE 2026-04-30 reviewed
    Evolving specs build requirements debt in AI car perception

    Requirements Debt in AI-Enabled Perception Systems Development: An Industrial RE4AI Perspective

    Hina Saeeda +1

  13. cs.SE 2026-04-30 reviewed
    Four-phase method flags NFT migration incompatibilities in advance

    Feature-Centric Methodology for Analyzing Cross-Chain NFT Migration Compatibility

    Mohd Sameen Chishti +2

  14. cs.SE 2026-04-30 reviewed
    Deployers gate LLM updates with contracts and targeted tests

    Test Before You Deploy: Governing Updates in the LLM Supply Chain

    Mohd Sameen Chishti +2

  15. cs.SE 2026-04-30 reviewed
    AI supply chains hide four integrity gaps across 11,500 packages

    The Grand Software Supply Chain of AI Systems

    Carmine Cesarano +1

  16. cs.SE 2026-04-30 reviewed
    Technical and social heroes overlap by only 10 percent in Apache projects

    Multifaceted Hero Developers and Bug-Fixing Outcomes Across Severity

    Amit Kumar +4

  17. cs.SE 2026-04-30 reviewed
    Rubric framework makes LLM judges comparable in coding co-creation

    LLM-as-a-Judge for Human-AI Co-Creation: A Reliability-Aware Evaluation Framework for Coding

    Md Faizul Ibne Amin +5

  18. cs.CR 2026-04-30 reviewed
    Code representation choice drives LLM false positives across languages

    How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection

    Maofei Chen +5

  19. cs.SE 2026-04-30 reviewed
    Nearly half of template engine bugs cause silent wrong output

    Understanding Bugs in Template Engine-Based Applications: Symptoms, Root Causes, and Fix Patterns

    Kai Gao +2

  20. cs.SE 2026-04-30 reviewed
    Watermarking code datasets achieves 100% verification success

    PuzzleMark: Implicit Jigsaw Learning for Robust Code Dataset Watermarking in Neural Code Completion Models

    Haocheng Huang +7

  21. cs.SE 2026-04-30 reviewed
    N-version models lift API recommendation reliability to 83.8%

    Tail-aware N-version Machine Learning Models for Reliable API Recommendation

    Aoi Matsuda +2

  22. cs.SE 2026-04-30 reviewed
    UTAUT plus Bayesian analysis spots GenAI barriers in software teams

    GenAI in Software Engineering: The Role of Technology Acceptance Models

    Oscar Johansson +2

  23. cs.SE 2026-04-30 reviewed
    Newcomer GFI pull request merge rates fell from 62% to 42%

    A Longitudinal Analysis of Good First Issue Practices and Newcomer Pull Requests in Popular OSS Projects

    Hirotatsu Hoshikawa +4

  24. cs.SE 2026-04-30 reviewed
    ScaleBox scales accurate code verification for LLM training

    ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

    Jiasheng Zheng +10

  25. cs.SE 2026-04-30 reviewed
    Nygard's ADR template outperforms MADR in student usability test

    One Size Fits All? An Empirical Comparison of ADR Templates regarding Comprehension, Usability, and Ease of Adoption

    Fernando Nogueira +2

  26. cs.CR 2026-04-30 reviewed
    New benchmark standardizes LLM tests on stripped binary tasks

    REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

    Jun Yeon Won +3

  27. cs.SE 2026-04-30 reviewed
    Hybrid LLM and tool system creates explainable process models

    Pragmos: A Process Agentic Modeling System

    Pedro-Aarón Hernández-Ávalos +1

  28. cs.SE 2026-04-30 reviewed
    Adaptive diffs match full code edits at 30% lower cost

    To Diff or Not to Diff? Structure-Aware and Adaptive Output Formats for Efficient LLM-based Code Editing

    Wei Cheng +6

  29. cs.SE 2026-04-29 reviewed
    Agents evolve their goals and code on their own

    Self-Evolving Software Agents

    Marco Robol +1

  30. cs.SE 2026-04-29 reviewed
    CS curricula must reframe algorithms as foundations for AI systems

    Now's the Time: Computer Science Must Evolve to Emphasize Software and Systems Engineering with Artificial Intelligence (AI)

    Chandra N. Sekharan +1

  31. cs.SE 2026-04-29 reviewed
    Controller keeps AI research software aligned across 400 commits

    Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves

    Halley Young +1

  32. cs.SE 2026-04-29 reviewed
    Benchmark tests code repairs by re-running original CI workflows

    CI-Repair-Bench: A Repository-Aware Benchmark for Automated Patch Validation via CI Workflows

    Rabeya Khatun Muna +2

  33. cs.SE 2026-04-29 reviewed
    Emote raises modular testing coverage by 15 percent

    On the Effectiveness of Modular Testing in EvoSuite

    Elizabeth Dinella

  34. cs.AI 2026-04-29 reviewed
    Bayesian calibration tunes LLM metrics to human ratings for model swaps

    When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems

    Emma Casey +3

  35. cs.SE 2026-04-29 reviewed
    Tool recreates 92% of failing embedded CI builds

    Where did we fail? -- Reproducing build failures in embedded open source software

    Han Fu +5

  36. cs.SE 2026-04-29 reviewed
    LLMs reach only 45.6% on class-level code benchmark

    ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation

    Yeheng Chen +6

  37. cs.SE 2026-04-29 reviewed
    Hot fixes skip most tests and reviews

    Hot Fixing in the Wild

    Carol Hanna +5

  38. cs.SE 2026-04-29 reviewed
    AI coding tools erode engineers' root-cause skills

    Cognitive Atrophy and Systemic Collapse in AI-Dependent Software Engineering

    Frank Ginac

  39. cs.DC 2026-04-29 reviewed
    Test taxonomy with CI ecosystem improves HPC fault detection

    A Test Taxonomy and Continuous Integration Ecosystem for Dynamic Resource Management in HPC

    Petter Sandås +3

  40. cs.SE 2026-04-29 reviewed
    RAPL tools add up to 47% time overhead at 1 kHz polling

    What Is the Cost of Energy Monitoring? An Empirical Study on the Overhead of RAPL-Based Tools

    Jeremy Diamond +1

  41. cs.SE 2026-04-29 reviewed
    LLM-guided search finds efficient inference params in 3.4 prompts

    LLM-Guided Runtime Parameter Optimization for Energy-Efficient Model Inference

    Katelyn Crumpacker +1

  42. cs.SE 2026-04-29 reviewed
    Move cuts smart contract security checks by 60 percent

    Comparing Smart Contract Paradigms: A Preliminary Study of Security and Developer Experience

    Matteo Vaccargiu +3

  43. cs.SE 2026-04-29 reviewed
    Move cuts explicit security checks by 60% in smart contracts

    Comparing Smart Contract Paradigms: A Preliminary Study of Security and Developer Experience

    Matteo Vaccargiu +3

  44. cs.SE 2026-04-29 reviewed
    Model editing adapts service recommendations without full retraining

    When Model Editing Meets Service Evolution: A Knowledge-Update Perspective for Service Recommendation

    Guodong Fan +6

  45. cs.SE 2026-04-29 reviewed
    21% of Defects4J defects fail strict APR reproducibility checks

    Reproducible Automated Program Repair Is Hard -- Experiences With the Defects4J Dataset

    Adam Krafczyk +1

  46. cs.SE 2026-04-29 reviewed
    Post-release bugs cluster in old

    What Makes Software Bugs Escape Testing? Evidence from a Large-Scale Empirical Study

    Domenico Cotroneo +3

  47. cs.SE 2026-04-29 reviewed
    Asymmetric service-host faults favor heterogeneous graphs for root cause ID

    Which Types of Heterogeneity Matter for Root Cause Localization in Microservice Systems?

    Runzhou Wang +7

  48. cs.SE 2026-04-29 reviewed
    Metric models catch 85-90% of Python residual defects

    Will It Break in Production? Metric-Driven Prediction of Residual Defects in Python Systems

    Giuseppe De Rosa +1

  49. cs.SE 2026-04-29 reviewed
    UK software jobs want design skills universities underteach

    Understanding the Skills Gap between Higher Education Institutions and the Software Engineering Industry

    Huy Phan +2

  50. cs.SE 2026-04-29 reviewed
    TDD manifesto embedded in prompts stabilizes LLM code outputs

    TDD Governance for Multi-Agent Code Generation via Prompt Engineering

    Tarlan Hasanli +5