pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 8

  1. cs.SE 2026-05-08 reviewed
    RAG with LLMs catches 91 percent of false kernel bug reports

    Characterizing and Mitigating False-Positive Bug Reports in the Linux Kernel

    Jiashuo Tian +5

  2. cs.SE 2026-05-08 reviewed
    Natural-language rewrite lifts code retrieval scores

    Do not copy and paste! Rewriting strategies for code retrieval

    Andrea Gurioli +2

  3. cs.SE 2026-05-08 reviewed
    Scenario models automate VR app tests and catch more failures

    System Test Generation for Virtual Reality Applications using Scenario Models

    Gerry Longfils +3

  4. cs.RO 2026-05-08 reviewed
    Search finds small perturbations that break robot vision 3-7x better

    Search-based Robustness Testing of Laptop Refurbishing Robotic Software

    Erblin Isaku +4

  5. cs.SE 2026-05-08 reviewed
    Iterative refinement boosts LLM quantum solver success

    Can LLMs Solve Science or Just Write Code? Evaluating Quantum Solver Generation

    Luciano Baresi +5

  6. cs.SE 2026-05-08 reviewed
    Iterative checks boost LLM quantum solver success

    Can LLMs Solve Science or Just Write Code? Evaluating Quantum Solver Generation

    Luciano Baresi +5

  7. cs.SE 2026-05-08 reviewed
    Prefill signals from small models locate multi-agent failures

    MASPrism: Lightweight Failure Attribution for Multi-Agent Systems Using Prefill-Stage Signals

    Yang Liu +3

  8. cs.SE 2026-05-08 reviewed
    Prefill signals from small LLMs locate root failures in agent traces

    MASPrism: Lightweight Failure Attribution for Multi-Agent Systems Using Prefill-Stage Signals

    Yang Liu +3

  9. cs.SE 2026-05-08 reviewed
    Multi-shot prompts boost agreement only for Claude Haiku

    Prompt Engineering Strategies for LLM-based Qualitative Coding of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study

    Moaath Alshaikh +9

  10. cs.SE 2026-05-08 reviewed
    Multi-stage training boosts Java-to-Cangjie code translation 6%

    Boosting Automatic Java-to-Cangjie Translation with Multi-Stage LLM Training and Error Repair

    Xinyue Liang +4

  11. cs.SE 2026-05-08 reviewed
    Unclear roles top ML team challenges in semiconductors

    Exploring CoCo Challenges in ML Engineering Teams: Insights From the Semiconductor Industry

    A. Azamnouri +5

  12. cs.SE 2026-05-08 reviewed
    Open-source low-code editor builds and deploys AI web apps

    Low-code and no-code with BESSER to create and deploy smart web applications

    Iván Alfonso +3

  13. cs.LG 2026-05-08 reviewed
    Compile rate misleads on LLM game scene quality

    Mage: Multi-Axis Evaluation of LLM-Generated Executable Game Scenes Beyond Compile-Pass Rate

    Hugh Xuechen Liu +1

  14. cs.LG 2026-05-08 reviewed
    Dual-space loop refines virtual cell models by routing failures to right level

    CellScientist: Dual-Space Hierarchical Orchestration for Closed-Loop Refinement of Virtual Cell Models

    Mengran Li +14

  15. cs.SE 2026-05-08 reviewed
    AI backends gain one admission seam for governance across requests

    Execution Envelopes: A Shared Admission Contract for Backend AI Execution Requests

    Krti Tallam

  16. cs.SE 2026-05-08 reviewed
    LLM agents reach only 30-55% on full repo generation from scratch

    RepoZero: Can LLMs Generate a Code Repository from Scratch?

    Zhaoxi Zhang +4

  17. cs.SE 2026-05-08 reviewed
    Top LLM agents complete only 30-55% of code repositories from scratch

    RepoZero: Can LLMs Generate a Code Repository from Scratch?

    Zhaoxi Zhang +4

  18. cs.CL 2026-05-08 reviewed
    Framework ties agent architecture to lifecycle for reliable CUAs

    Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability

    Zejian Chen +8

  19. cs.SE 2026-05-08 reviewed
    Authority transfer, not task performance, defines agentic CI/CD

    From Assistance to Agency: Rethinking Autonomy and Control in CI/CD Pipelines

    Marcus Emmanuel Barnes +2

  20. cs.SE 2026-05-07 reviewed
    Replay script matches frontier models on computer-use benchmarks

    Computer Use at the Edge of the Statistical Precipice

    Pierluca D'Oro +8

  21. cs.SE 2026-05-07 reviewed
    LLM agents fix under half of architectural code smells

    SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair

    Ion George Dinu +7

  22. cs.SE 2026-05-07 reviewed
    LLM agents fix under half of architectural code smells

    SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair

    Ion George Dinu +7

  23. cs.RO 2026-05-07 reviewed
    Language descriptions become solvable constraints for AV tests

    Traffic Scenario Orchestration from Language via Constraint Satisfaction

    Frieda Rong +3

  24. cs.SE 2026-05-07 reviewed
    This paper reviews studies linking lack of belonging to higher burnout in software…

    Guidelines for Cultivating a Sense of Belonging to Reduce Developer Burnout

    Bianca Trinkenreich +3

  25. cs.SE 2026-05-07 reviewed
    MySQL and PostgreSQL top DBMS use in open-source Java history

    Analyzing the Adoption of Database Management Systems Throughout the History of Open Source Projects

    Camila A. Paiva +10

  26. cs.SE 2026-05-07 reviewed
    Best coding agents pass under 16 percent of Java framework migrations

    ScarfBench: A Benchmark for Cross-Framework Application Migration in Enterprise Java

    Advait Pavuluri +8

  27. cs.SE 2026-05-07 reviewed
    Agents pass only 15% of Java framework migration tests

    ScarfBench: A Benchmark for Cross-Framework Application Migration in Enterprise Java

    Advait Pavuluri +8

  28. cs.SE 2026-05-07 reviewed
    AI code needs fewer updates than human code

    To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study

    Shota Sawada +5

  29. cs.SE 2026-05-07 reviewed
    AI code receives less maintenance than human code

    To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study

    Shota Sawada +5

  30. cs.SE 2026-05-07 reviewed
    LLM agents drop 30 points on backend tasks with full constraints

    Constraint Decay: The Fragility of LLM Agents in Backend Code Generation

    Francesco Dente +2

  31. cs.AI 2026-05-07 reviewed
    DAG replay preserves AI work state exactly with zero churn

    From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work

    Josh Rosen +1

  32. cs.SE 2026-05-07 reviewed
    LLMs pick vulnerable library versions in 37-56% of tasks

    Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions

    Chengjie Wang +4

  33. cs.SE 2026-05-07 reviewed
    LLM-based method repairs sibling code bugs across locations

    SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models

    Xinyu Liu +5

  34. cs.SE 2026-05-07 reviewed
    Self-healing framework raises LLM agent success rates

    A Self-Healing Framework for Reliable LLM-Based Autonomous Agents

    Cheonsu Jeong +1

  35. cs.SE 2026-05-07 reviewed
    Symbolic traces train 8B model to beat 32B on code violation detection

    Teaching LLMs Program Semantics via Symbolic Execution Traces

    Jonas Bayer +5

  36. cs.SE 2026-05-07 reviewed
    0.1% of PyPI packages carry 80% of maintenance impact

    Modeling Dependency-Propagated Ecosystem Impact of Changes in Maintenance Activities: Evaluating Support Strategies in the PyPI Network

    Alexandros Tsakpinis +2

  37. cs.SE 2026-05-07 reviewed
    0.1% of PyPI packages carry 80% of ecosystem impact

    Modeling Dependency-Propagated Ecosystem Impact of Changes in Maintenance Activities: Evaluating Support Strategies in the PyPI Network

    Alexandros Tsakpinis +2

  38. cs.AI 2026-05-07 reviewed
    LLM judges flip up to 9% of safety verdicts on equivalent policy rewordings

    Beyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judges

    Shihao Weng +2

  39. cs.SE 2026-05-07 reviewed
    Protocol tests agent effort to recover design intent from code

    BUILD-AND-FIND: An Effort-Aware Protocol for Evaluating Agent-Managed Codebases

    Jhen-Ke Lin

  40. cs.SE 2026-05-07 reviewed
    Agents top out near 47% F1 on updating project tests after changes

    Breaking, Stale, or Missing? Benchmarking Coding Agents on Project-Level Test Evolution

    Ye Shang +5

  41. cs.SE 2026-05-07 reviewed
    One model beats coding specialists by 9% with utility-driven RL

    Schedule-and-Calibrate: Utility-Guided Multi-Task Reinforcement Learning for Code LLMs

    Yujia Chen +4

  42. cs.SE 2026-05-07 reviewed
    AST patterns identify algorithms more accurately than LLMs or clone detectors

    Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition

    Denis Neumüller +3

  43. cs.CR 2026-05-07 reviewed
    Tool detects how LLMs create risks in GitHub CI workflows

    Heimdallr: Characterizing and Detecting LLM-Induced Security Risks in GitHub CI Workflows

    Bonan Ruan +5

  44. cs.AI 2026-05-07 reviewed
    Multi-agent workflow lifts AI coding success by 6.5 percent

    MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System

    Yuliang Xu +4

  45. cs.AI 2026-05-07 reviewed
    Multi-agent workflow lifts algorithmic solving by 6.5 percent

    MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System

    Yuliang Xu +4

  46. cs.SE 2026-05-07 reviewed
    Automatic metrics fail to judge non-English code comments

    Evaluating Non-English Developer Support in Machine Learning for Software Engineering

    Jonathan Katzy +7

  47. cs.SE 2026-05-07 reviewed
    AI code security fixes often create new weaknesses

    On Fixing Insecure AI-Generated Code through Model Fine-Tuning and Prompting Strategies

    Ali Soltanian Fard Jahromi +3

  48. cs.SE 2026-05-07 reviewed
    Ontology guides agent for better requirements interviews

    From Chat to Interview: Agentic Requirements Elicitation with an Experience Ontology

    Dongming Jin +7

  49. cs.SE 2026-05-07 reviewed
    Real IDE traces expose overestimation in simulated coding assistant tests

    An Empirical Study of Proactive Coding Assistants in Real-World Software Development

    Lehui Li +3

  50. cs.SE 2026-05-07 reviewed
    Coding agents need insight policy quality

    Agentic Coding Needs Proactivity, Not Just Autonomy

    Nghi D. Q. Bui +1