archive

Every paper Pith has read.

1797 papers in cs.SE · page 3

  1. cs.SE 2026-05-18 reviewed
    Framework choice reverses meaning of agent behavior signals

    Same Signal, Different Semantics: A Cross-Framework Behavioral Analysis of Software Engineering Agents

    Wei Ma +5

  2. cs.SE 2026-05-18 reviewed
    CommitDistill hits 0.75 retrieval rate from git history at 256-char budget

    CommitDistill: A Lightweight Knowledge-Centric Memory Layer for Software Repositories

    Divya Chukkapalli +4

  3. cs.SE 2026-05-18 reviewed
    Debating LLMs catch more code vulnerabilities

    Three Heads Are Better Than One: A Multi-perspective Reasoning Framework for Enhanced Vulnerability Detection

    Xin Peng +7

  4. cs.SE 2026-05-18 reviewed
    Multi-model feedback doubles AI solves on contest problems

    A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback

    Anika Tabassum +4

  5. cs.SE 2026-05-18 reviewed
    ProcCtrlBench detects process defects in LLM coding agents missed by outcome scores

    ProcCtrlBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents

    Jiawei He +6

  6. cs.SE 2026-05-18 reviewed
    Process benchmark catches mid-task defects in LLM coding agents

    ProcCtrlBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents

    Jiawei He +6

  7. cs.CL 2026-05-18 reviewed
    Tool localizes node errors in multi-agent LLM workflows

    PROTEA: Offline Evaluation and Iterative Refinement for Multi-Agent LLM Workflows

    Kazuki Kawamura +2

  8. cs.LG 2026-05-18 reviewed
    Two-level router cuts log QA latency 55%

    LogRouter: Adaptive Two-Level LLM Routing for Log Question Answering in Big Data Systems

    Mert Coskuner +2

  9. cs.SE 2026-05-18 reviewed
    Verify gate turns agent completion into inspectable admission control

    Verify-Gated Completion as Admission Control in a Governed Multi-Agent Runtime: A Bounded Architecture Case Study

    Hai-Duong Nguyen +1

  10. cs.SE 2026-05-18 reviewed
    Verify gate renders multi-agent completions inspectable and fail-closed

    Verify-Gated Completion as Admission Control in a Governed Multi-Agent Runtime: A Bounded Architecture Case Study

    Hai-Duong Nguyen +1

  11. cs.SE 2026-05-18 reviewed
    Agentic RAG reaches 78% top-1 file bug localization

    BLAgent: Agentic RAG for File-Level Bug Localization

    Md Afif Al Mamun +1

  12. cs.SE 2026-05-18 reviewed
    Call-site context lifts code model pass rates

    Contextualized Code Pretraining for Code Generation

    Chen Liu +5

  13. cs.SE 2026-05-18 reviewed
    Two-stage LLM workflow verifies code against natural language rules

    LLM-Based Static Verification of Code Against Natural-Language Requirements: An Industrial Experience Report

    Zhi Quan Zhou +2

  14. cs.LO 2026-05-18 reviewed
    Retrieval system compresses Lean proofs over 70 percent

    Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

    Jialin Lu +6

  15. cs.AI 2026-05-17 reviewed
    AI feedback helps Scrum Masters spot their own negative emotions live

    EGI: A Multimodal Emotional AI Framework for Enhancing Scrum Master Real-time Self-Awareness

    Jingni Huang +1

  16. cs.SE 2026-05-17 reviewed
    Framework keeps AI-assisted scientific code traceable under NQA-1

    Bridging the Gap on AI-Assisted Scientific Software Development Through Transparency and Traceability

    Chaitanya Bhave +5

  17. cs.LG 2026-05-17 reviewed
    Guided checks at code boundaries boost translation pass rates

    Verifier-Guided Code Translation via Meta-Step Decoding

    Tianyang Zhou +4

  18. cs.SE 2026-05-17 reviewed
    CFS and GA tuning lift fault prediction accuracy to 88.4%

    A Feature-Driven Framework for Software Fault Prediction

    Ahmad Nauman Ghazi +5

  19. cs.SE 2026-05-17 reviewed
    LLMs subclassify invalid bug root causes and generate fixes

    Automated Root-Cause Subclassification and No-Code Fix Generation for Invalid Bug Reports

    Mahmut Furkan Gon +3

  20. cs.SE 2026-05-17 reviewed
    Inverted API exploration yields verified tool-call data

    Firefly: Illuminating Large-Scale Verified Tool-Call Data Generation from Real APIs

    Yuxuan Lu +14

  21. cs.SE 2026-05-17 reviewed
    Five-stage AI workflow could ease the code review bottleneck

    Rethinking Code Review in the Age of AI: A Vision for Agentic Code Review

    Hüseyin Özgür Kamalı +3

  22. cs.SE 2026-05-17 reviewed
    Multi-agent setup with graphs keeps business rules in legacy modernization

    AgentModernize: Preserving Business Logic in Legacy Modernization with Multi-Agent LLMs and Behavioral Specification Graphs

    Sheikh Nazib Ahmed +1

  23. cs.SE 2026-05-17 reviewed
    Agents fail 95% of SaaS tasks before business logic

    SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering

    Qingnan Ren +13

  24. cs.SE 2026-05-17 reviewed
    LLM agent builds formal models by repairing verification errors

    Event-B Agent: Towards LLM Agent for Formal Model Synthesis and Repair

    Hongshu Wang +5

  25. cs.SE 2026-05-17 reviewed
    ContraFix fixes 84% of C/C++ vulnerabilities at low cost

    ContraFix: Agentic Vulnerability Repair via Differential Runtime Evidence and Skill Reuse

    Simiao Liu +4

  26. cs.SE 2026-05-17 reviewed
    Memory layers raise repo vulnerability repair to 58%

    MemRepair: Hierarchical Memory for Agentic Repository-Level Vulnerability Repair

    Simiao Liu +5

  27. cs.SE 2026-05-17 reviewed
    Diagnostic probes recover 45-62% of mislabeled GUI failures

    DiagEval: Trajectory-Conditioned Diagnosis for Reliable Software Evaluation with GUI Agents

    Sirui Hong +5

  28. cs.SE 2026-05-17 reviewed
    DiagEval recovers 45-62% of misattributed GUI failures

    DiagEval: Trajectory-Conditioned Diagnosis for Reliable Software Evaluation with GUI Agents

    Sirui Hong +5

  29. cs.SE 2026-05-17 reviewed
    Models hit only 6 Mythos bug targets out of 54 attempts with files supplied

    Benchmarking Mythos-Linked Bug Rediscovery

    Isaac David +1

  30. cs.SE 2026-05-17 reviewed
    PLC-BinX predicts PLC binary toolchains with 100 percent accuracy

    One Step Further: Understanding PLC Binaries Through Cross-Platform Reverse Engineering and Function-Level Semantic Analysis

    Ang Jia +5

  31. cs.SE 2026-05-17 reviewed
    PLC-BinX predicts toolchain from binaries with 100% accuracy

    One Step Further: Understanding PLC Binaries Through Cross-Platform Reverse Engineering and Function-Level Semantic Analysis

    Ang Jia +5

  32. cs.SE 2026-05-17 reviewed
    Ontology organizes foundations of software languages

    Towards an Ontology for the Foundations of Software Languages

    Ralf Lämmel

  33. cs.SE 2026-05-17 reviewed
    Block-level slicing triples LLM bug finds in 19K-line processor

    Debug Like a Human: Scaling LLM-based Fault Localization to Processor Design via Block-Level Instruction-Oriented Slicing

    Zizhen Liu +8

  34. cs.SE 2026-05-17 reviewed
    No LLM clears 80 percent on observation contract compliance

    ContractBench: Can LLM Agents Preserve Observation Contracts?

    Jicheng Wang +5

  35. cs.SE 2026-05-17 reviewed
    Context graphs guide LLMs to resolve code merge conflicts better

    Rover: Context-aware Conflict Resolution with LLM

    Qingyu Zhang +4

  36. cs.SE 2026-05-17 reviewed
    Automated TDD lifts AI web app success by 34-48 points

    From Runnable to Shippable: Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements

    Yuxuan Wan +5

  37. cs.SE 2026-05-16 reviewed
    Static checks boost diffusion code RL performance

    Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code Generation

    Shuyin Ouyang +4

  38. cs.PL 2026-05-16 reviewed
    Region allocators keep locality edge on modern hardware

    Reconsidering "Reconsidering Custom Memory Allocation"

    Nicolas van Kempen +1

  39. cs.CR 2026-05-16 reviewed
    LLM package hallucinations shrink to 4.6-6.1% but 127 names stay common

    The Range Shrinks, the Threat Remains: Re-evaluating LLM Package Hallucinations on the 2026 Frontier-Model Cohort

    Aleksandr Churilov (Independent Researcher)

  40. cs.SE 2026-05-16 reviewed
    LLMs skip hallucination-prone code tasks via execution checks

    Task Abstention for Large Language Models in Code Generation

    Yanke Zhou +6

  41. cs.SE 2026-05-16 reviewed
    Low-code DevOps speeds tasks but adds security and governance risks

    Low-Code Paradox in DevOps: Security and Governance Insights from Practitioners

    Muhammad Azeem Akbar +2

  42. cs.CR 2026-05-16 reviewed
    FIDO times firmware inputs at availability checks to lift coverage

    Stop Starving or Stuffing Me: Boosting Firmware Fuzzing Efficiency with On-demand Input Delivery

    Shandian Shen +5

  43. cs.SE 2026-05-15 reviewed
    78% of open source AI policies allow GenAI contributions

    AI Policy, Disclosure, and Human in the Loop: How Are Contribution Guidelines Adapting to GenAI?

    Andre Hora +1

  44. cs.SE 2026-05-15 reviewed
    GitHub projects standardize on README

    What's Inside a GitHub Repository? An Empirical Study on the Contents of 10K Projects

    Andre Hora +2

  45. cs.SE 2026-05-15 reviewed
    Core compiler reuse via LSP powers fast IDE for Move

    Optimizing an IDE for an Evolving Language Ecosystem

    Adam Welc +4

  46. cs.SE 2026-05-15 reviewed
    LLM and search methods trade off strengths in fixing merge conflicts

    LLM-based vs. Search-based Merge Conflict Resolution: An Empirical Study of Competing Paradigms

    Heleno de Souza Campos Junior +1

  47. cs.SE 2026-05-15 reviewed
    AR test framework tracks stable areas in videos for 55.8% coverage

    TARIPlay: A Test Framework for AR Applications based on Interactive Area Tracking in Playback Videos

    Seyed Amir Mousavi +1

  48. cs.SE 2026-05-15 reviewed
    Gemini on trillion internal tokens cuts developer iterations 23%

    Customizing an LLM for Enterprise Software Engineering

    Aditya Kini +17

  49. cs.SE 2026-05-15 reviewed
    Adapted LLM cuts developer iterations by 23 percent

    Customizing an LLM for Enterprise Software Engineering

    Aditya Kini +17

  50. cs.CR 2026-05-15 reviewed
    Manufacturing ransomware recovery goes beyond backups

    From Backup Restoration to Minimum Viable Factory Recovery: A Systematization of Ransomware Recovery in Manufacturing Systems

    Chun Yin Chiu