archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 4

  1. cs.SE 2026-05-15 reviewed
    Non-self-fixed ATD lingers longer with many developers' changes

    The Dangers of Non-Self-Fixed Architecture Technical Debt and Its Impact on Time-to-Fix

    Edi Sutoyo +2

  2. cs.SE 2026-05-15 reviewed
    Concept alignment lifts code search accuracy 15x on new data

    XSearch: Explainable Code Search via Concept-to-Code Alignment

    Yiming Liu +9

  3. cs.SE 2026-05-15 reviewed
    Small open LLMs match large ones at grammar-based DSL generation

    From Text to DSL: Evaluating Grammar-Based Model Generation Using Open LLMs

    Junaid Baber +4

  4. cs.SE 2026-05-15 reviewed
    AI agents solve at most 39% of real version upgrade tasks

    RoadmapBench: Evaluating Long-Horizon Agentic Software Development Across Version Upgrades

    Xinbo Xu +15

    1 Piths
  5. cs.SE 2026-05-15 reviewed
    BootstrapAgent distills repo setup into reusable contracts

    BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge

    Sihan Fu +4

  6. cs.SE 2026-05-15 reviewed
    Early QA in annotation pipelines cuts costs more than late checks

    Position: Early-Stage Quality Assurance in Annotation Pipelines Is More Cost-Effective Than Late-Stage Validation

    Sunil Kothari +10

  7. cs.AR 2026-05-15 reviewed
    Intra-thread duplication catches 39% more defective servers

    ITHICA: Intra-Thread Instruction Checking Approach for Defect-Induced Silent Data Corruptions

    Ioanna Vavelidou +5

  8. cs.SE 2026-05-15 reviewed
    Bayesian sequential tests cut quantum verification costs

    Bayesian Sequential Verification for Budget-Aware Quantum Program Testing

    Lei Zhang

  9. cs.CR 2026-05-15 reviewed
    Chained mutators mostly interfere but some synergize in LLM jailbreaks

    Compositional Jailbreaking: An Empirical Analysis of Mutator Chain Interactions in Aligned LLMs

    Reinelle Jan Bugnot +3

  10. cs.CR 2026-05-15 reviewed
    LLM agent finds 24 zero-day privilege escalations in microservices

    Detecting Privilege Escalation in Polyglot Microservices via Agentic Program Analysis

    Penghui Li +3

  11. cs.SE 2026-05-14 reviewed
    Runtime structure cuts retry costs in agentic coding by 51.7%

    Runtime-Structured Task Decomposition for Agentic Coding Systems

    Shubhi Asthana +4

  12. cs.LG 2026-05-14 reviewed
    Agent turns I/O examples into code via guided evolutionary search

    From I/O to Code with Discovery Agent

    Yihong Dong +9

  13. cs.SE 2026-05-14 reviewed
    Semantically grounded agents detect memory bugs in binaries

    Veritas: A Semantically Grounded Agentic Framework for Memory Corruption Vulnerability Detection in Binaries

    Xinran Zheng +4

  14. cs.SE 2026-05-14 reviewed
    Viverra adds verified assertions to LLM-generated C code

    Viverra: Text-to-Code with Guarantees

    Haoze Wu +3

  15. cs.SE 2026-05-14 reviewed
    Test generation uncovers 2.56x more privacy leaks in code LLMs

    Probing Privacy Leaks in LLM-based Code Generation via Test Generation

    Yifei Ge +9

  16. cs.SE 2026-05-14 reviewed
    Agentic AI matures fastest where outputs can be tested automatically

    Assistance to Autonomy: A Systematic Literature Review of Agentic AI across the Software Development Life Cycle

    Spyridon Alvanakis Apostolou +2

  17. cs.SE 2026-05-14 reviewed
    Architecture docs let agents migrate eight C repos to Rust

    Documentation-Guided Agentic Codebase Migration from C to Rust

    Minh Le-Anh +3

  18. cs.SE 2026-05-14 reviewed
    Documentation blueprint enables full C-to-Rust repo migration

    Documentation-Guided Agentic Codebase Migration from C to Rust

    Minh Le-Anh +3

  19. cs.SE 2026-05-14 reviewed
    ML classifier beats rules at spotting BDD refactoring chances

    Mining Subscenario Refactoring Opportunities in Behaviour-Driven Software Test Suites: ML Classifiers and LLM-Judge Baselines

    Ali Hassaan Mughal +2

  20. cs.SE 2026-05-14 reviewed
    Memory agent keeps repo documentation consistent

    Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation

    Suyoung Bae +4

  21. cs.SE 2026-05-14 reviewed
    Retriever beats generator in RAG for code tasks

    Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks

    Qiang Ke +4

  22. cs.SE 2026-05-14 reviewed
    Stale code snippets make models output outdated helpers

    When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context

    Haojun Weng +4

  23. cs.CR 2026-05-14 reviewed
    Disguised compliance rules let attackers hijack LLM agents

    Exploiting LLM Agent Supply Chains via Payload-less Skills

    Xinyu Liu +3

  24. cs.SE 2026-05-14 reviewed
    Multi-agent system automates full library fuzzing lifecycle

    FuzzAgent: Multi-Agent System for Evolutionary Library Fuzzing

    Yunlong Lyu +5

  25. cs.SE 2026-05-14 reviewed
    Agents resolve 45 percent of chained package upgrades

    SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

    Man Ho Lam +7

  26. cs.SE 2026-05-14 reviewed
    Size filter trims 80 percent of tokens from LLM repo inputs

    Correctness-Aware Repository Filtering Under Maximum Effective Context Window Constraints

    Shweta Mishra

  27. cs.SE 2026-05-14 reviewed
    Valid microservice APIs often fail for AI agents

    Making OpenAPI Documentation Agent-Ready: Detecting Documentation and REST Smells with a Multi-Agent LLM System

    Rayfran Rocha Lima +2

  28. cs.SE 2026-05-14 reviewed
    Hydra cuts LLM code gen latency up to 71% with rollback repairs

    Hydra: Efficient, Correct Code Generation via Checkpoint-and-Rollback Support

    Alexander Du +3

  29. cs.CR 2026-05-14 reviewed
    Web agents should plan before seeing page content

    Web Agents Should Adopt the Plan-Then-Execute Paradigm

    Julien Piet +7

  30. cs.SE 2026-05-14 reviewed
    Failure-guided fuzzing beats random testing for HQC programs

    Failure-Guided Fuzzing for Hybrid Quantum-Classical Programs

    Lei Zhang

  31. cs.SE 2026-05-13 reviewed
    Prompt strategy explains more variation in test diversity than model size when using LLMs…

    LLM-Based Robustness Testing of Microservice Applications: An Empirical Study

    Hrushitha Goud Tigulla +1

  32. cs.SE 2026-05-13 reviewed
    Constrained edits merge checkpoints to lift code agent scores

    CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing

    Mingzhi Zhu +3

  33. cs.SE 2026-05-13 reviewed
    AI agents speed creation of digital music instruments

    Case Studies and Reflections on Agentic Software Engineering for Rapid Development of Digital Music Instruments

    Matthew John Yee-King

  34. cs.SE 2026-05-13 reviewed
    Method-level change-proneness beats class-level for test minimization

    Method-level Change-proneness: A Better Metric for Black-box Test Suite Minimization

    Md Siam +1

  35. cs.SE 2026-05-13 reviewed
    Benchmark shows AI agents recall 42-83 percent of property-based testing bugs

    PBT-Bench: Benchmarking AI Agents on Property-Based Testing

    Lucas Jing +3

  36. cs.SE 2026-05-13 reviewed
    LLMs detect 42-83% of semantic bugs with property-test prompts

    PBT-Bench: Benchmarking AI Agents on Property-Based Testing

    Lucas Jing +3

  37. cs.SE 2026-05-13 reviewed
    LLM with SMT solver audits natural-language requirements

    Neurosymbolic Auditing of Natural-Language Software Requirements

    Bethel Hall +1

  38. cs.SE 2026-05-13 reviewed
    LLMs reach only 52% accuracy on HMSC semantic tasks

    (How) Do Large Language Models Understand High-Level Message Sequence Charts?

    Mohammad Reza Mousavi

  39. cs.SE 2026-05-13 reviewed
    LLMs reach only 52% accuracy on HMSC formal semantics

    (How) Do Large Language Models Understand High-Level Message Sequence Charts?

    Mohammad Reza Mousavi

  40. cs.RO 2026-05-13 reviewed
    CARS attributes AV collisions to driver faults

    Learning Responsibility-Attributed Adversarial Scenarios for Testing Autonomous Vehicles

    Yizhuo Xiao +7

  41. cs.SE 2026-05-13 reviewed
    SkillOps is a plug-in framework that maintains LLM agent skill libraries by representing…

    SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems

    Hongji Pu +2

  42. cs.SE 2026-05-13 reviewed
    Quantifier rewrites and non-alias specs speed GPU verification ninefold

    Scalable Deductive Verification of Data-Level Parallel Programs

    Lars B. van den Haak +2

  43. cs.AR 2026-05-13 reviewed
    AI agents drop 37-58% on hardware vs software tasks

    Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench

    Qingyun Zou +4

  44. cs.RO 2026-05-13 reviewed
    Open standards let one agent model run consistently in three simulators

    Integration of an Agent Model into an Open Simulation Architecture for Scenario-Based Testing of Automated Vehicles

    Christian Geller +3

  45. cs.SE 2026-05-13 reviewed
    Runtime pruning cuts tokens 49% for local LLM fault localization

    SieveFL: Hierarchical Runtime-Aware Pruning for Scalable LLM-Based Fault Localization

    Mahdi Farzandway +1

  46. cs.SE 2026-05-13 reviewed
    Call stack data improves RL game testing agents

    CA2: Code-Aware Agent for Automated Game Testing

    Valliappan Chidambaram Adaikkappan +3

  47. cs.SE 2026-05-13 reviewed
    Runtime harness mediates AI agent actions on code projects

    AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents

    Hailin Zhong +1

  48. cs.SE 2026-05-13 reviewed
    This paper finds that code generated by large language models has overall readability…

    The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code

    Hengzhi Ye +3

  49. cs.SE 2026-05-13 reviewed
    Noise reshapes mutant detection in quantum programs

    Robust Mutation Analysis of Quantum Programs Under Noise

    Sophie Fortz +4

  50. cs.SE 2026-05-13 reviewed
    Readiness metrics show near-zero link to research software execution success

    ReproScore: Separating Readiness from Outcome in Research Software Reproducibility Assessment

    Sheeba Samuel +4