archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 11

  1. cs.SE 2026-05-04 reviewed
    LLM repair models drop over 50% on minor code tweaks

    HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair

    Fazle Rabbi +1

  2. cs.SE 2026-05-04 reviewed
    Evaluation issues cause many false failures in LLM code translation

    Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation

    Fazle Rabbi +2

  3. cs.SE 2026-05-04 reviewed
    Evaluation errors inflate LLM code translation failure rates

    Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation

    Fazle Rabbi +2

  4. cs.SE 2026-05-04 reviewed
    Agentic critic loop keeps code docs synced to changes

    DocSync: Agentic Documentation Maintenance via Critic-Guided Reflexion

    Sidhesh Badrinarayan +1

  5. cs.CR 2026-05-04 reviewed
    Binary patching works via decompile-repair-recompile

    SCRIBE: Practical Static Binary Patching via Binary-Aware Recompilation of Decompiled Code

    Han Dai +4

  6. cs.SE 2026-05-04 reviewed
    Datalog DSL in Lean translates queries to provable theorems

    A Shallow Embedding of Datalog in Lean

    Ramy Shahin

  7. cs.SE 2026-05-03 reviewed
    Foundation models detect Java refactoring bugs at 93.8% accuracy

    Foundation Models as Oracles for Refactoring Correctness Detection

    Rohit Gheyi +4

  8. cs.SE 2026-05-03 reviewed
    GitHub Actions audit finds 28% compliance with LLM hybrid checks

    How Compliant Are GitHub Actions Workflows? A Checklist-Based Study with LLM-Assisted Auditing

    Edward Abrokwah +1

  9. cs.SE 2026-05-03 reviewed
    This paper evaluates training-free classification of conventional commit messages using…

    Conventional Commit Classification using Large Language Models and Prompt Engineering

    H. M. Sazzad Quadir +2

  10. cs.AI 2026-05-03 reviewed
    ACDL standardizes precise descriptions of LLM agent contexts

    A Language for Describing Agentic LLM Contexts

    Noga Peleg Pelc +2

  11. cs.CR 2026-05-03 reviewed
    LLM agents cut false positives in security scans by 88 percent

    QASecClaw: A Multi-Agent LLM Approach for False Positive Reduction in Static Application Security Testing

    Mohd Ruhul Ameen +2

  12. cs.LG 2026-05-03 reviewed
    Declarative framework cuts RAG tuning code changes by 95%

    AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines

    Xintan Zeng +3

  13. cs.SE 2026-05-03 reviewed
    QSAF turns 34 circuit primitives into reusable hybrid-system components

    Quantum Software Architecture Framework (QSAF): A Component-Based Framework for Designing Hybrid Quantum-Classical Systems

    Arvind W. Kiwelekar +5

  14. cs.CR 2026-05-03 reviewed
    Expert patterns boost LLM vulnerability repair accuracy

    VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns

    Jia Li +3

  15. cs.SE 2026-05-02 reviewed
    Sprint simulation teaches empirical control in Scrum projects

    A Lightweight Scrum Sprint Simulation to Help Learners Traverse the Empirical Process Control Threshold Concept

    Eduardo Miranda +2

  16. cs.SE 2026-05-02 reviewed
    Safety-gated memory for RL coding agents hits 80% accuracy

    Feedback-Normalized Developer Memory for Reinforcement-Learning Coding Agents: A Safety-Gated MCP Architecture

    Mehmet Iscan

  17. cs.SE 2026-05-02 reviewed
    Neuro-symbolic agents block invalid requirements by design

    Neuro-Symbolic Agents for Hallucination-Free Requirements Reuse

    Ahmed Ibrahim

  18. cs.SE 2026-05-02 reviewed
    Genetic programming evolves scaling policies that cut microservice resource use

    Genetic Programming for Self-Adaptive Auto-Scaling of Microservices

    Jia Li +2

  19. cs.SE 2026-05-02 reviewed
    Unrestricted autonomy breaks LLM test repair in enterprise UIs

    Practical Limits of Autonomous Test Repair: A Multi-Agent Case Study with LLM-Driven Discovery and Self-Correction

    Hyukjoo Lee

  20. cs.SE 2026-05-02 reviewed
    LLM spec accuracy drops 20 percent after removing deceptive outputs

    LiveFMBench: Unveiling the Power and Limits of Agentic Workflows in Specification Generation

    Dong Xu +11

  21. cs.SE 2026-05-02 reviewed
    ChatGPT supports nine categories of software design tasks

    Using LLMs in Software Design: An Empirical Study of GitHub and A Practitioner Survey

    Yifei Wang +7

  22. cs.DC 2026-05-02 reviewed
    Turing machine extension defines context-awareness

    On defining and modeling context-awareness

    Panteleimon Rodis

  23. cs.SE 2026-05-02 reviewed
    LLM feedback agents improve test coverage on C and Python code

    FeedbackLLM: Metadata driven Multi-Agentic Language Agnostic Test Case Generator with Evolving prompt and Coverage Feedback

    Kushal Jasti +4

  24. cs.SE 2026-05-02 reviewed
    Interactive agents clarify vague specs before STL generation

    ClarifySTL: An Interactive LLM Agent Framework for STL Transformation through Requirements Clarification

    Yue Fang +5

  25. cs.SE 2026-05-01 reviewed
    AI code output rises but reliability lags without strong specs

    The Productivity-Reliability Paradox: Specification-Driven Governance for AI-Augmented Software Development

    Sabry E. Farrag

  26. cs.SE 2026-05-01 reviewed
    DDD simulator runs same microservice code under multiple consistency models

    A Domain-Driven Design Simulator for Business Logic-Rich Microservice Systems

    Daniel da Palma Pereira +1

  27. cs.SE 2026-05-01 reviewed
    Platform links every AI prompt to its code edits for replay

    RECAP: An End-to-End Platform for Capturing, Replaying, and Analyzing AI-Assisted Programming Interactions

    Keyu He +4

  28. cs.SE 2026-05-01 reviewed
    ProMoTA links high-level models to code with full traceability

    ProMoTA: a model-driven framework for end-to-end traceability analysis

    Sadaf Mustafiz +2

  29. cs.SE 2026-05-01 reviewed
    Shor ECDLP oracle in Qrisp breaks control semantics

    Semantics-Based Verification of an Implemented Shor Oracle for ECDLP in Qrisp

    Lei Zhang +1

  30. cs.SE 2026-05-01 reviewed
    LLM agents reproduce materials findings at 54 percent

    Can Coding Agents Reproduce Findings in Computational Materials Science?

    Ziyang Huang +17

  31. cs.SE 2026-05-01 reviewed
    GeoContra lifts LLM GIS correctness by 26 percent via contracts

    GeoContra: From Fluent GIS Code to Verifiable Spatial Analysis with Geography-Grounded Repair

    Yinhao Xiao +2

  32. cs.SE 2026-05-01 reviewed
    350k code preference pairs train multi-criteria reward models

    Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

    Indraneil Paul +2

  33. cs.SE 2026-05-01 reviewed
    350k code preferences train flexible multilingual reward models

    Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

    Indraneil Paul +2

  34. cs.LG 2026-05-01 reviewed
    Pass-rate rewards fail to beat binary rewards in code RL

    Exploring Pass-Rate Reward in Reinforcement Learning for Code Generation

    Xin-Ye Li +5

  35. cs.SE 2026-05-01 reviewed
    Practitioners identify gaps in end-to-end autonomous driving tests

    From Research to Practice: An Interactive Rapid Review of Autonomous Driving System Testing in Industry

    Qunying Song +3

  36. cs.SE 2026-05-01 reviewed
    ML predicts energy of code blocks from static features

    EnCoDe: Energy Estimation of Source Code At Design-Time

    Shailender Goyal +2

  37. cs.SE 2026-05-01 reviewed
    Dataset shows API recommenders weaken on deep calls

    Q-ARE: An Evaluation Dataset for Query Based API Recommendation

    Shenglong Wu +2

  38. cs.SE 2026-05-01 reviewed
    Dense retrieval beats sparse for issue-commit links

    Think Harder and Don't Overlook Your Options: Revisiting Issue-Commit Linking with LLM-Assisted Retrieval

    Cole Morgan +3

  39. cs.SE 2026-05-01 reviewed
    PPO agent picks prompts for higher test coverage

    PPO guided Agentic Pipeline for Adaptive Prompt Selection and Test Case Generation

    Gourisetty Venkata Sai Koushik +5

  40. cs.SE 2026-05-01 reviewed
    Curriculum training lifts LLM code generation accuracy

    Improving LLM Code Generation via Requirement-Aware Curriculum Reinforcement Learning

    Shouyu Yin +3

  41. cs.CR 2026-05-01 reviewed
    Agent skills remain untrusted until verified by runtime

    Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

    Alfredo Metere

  42. cs.CR 2026-05-01 reviewed
    Agent skills stay untrusted until they pass verification tests

    Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

    Alfredo Metere

  43. cs.SE 2026-05-01 reviewed
    LLMs infill masked bug reports to uncover 27 Rust compiler bugs

    ClozeMaster: Fuzzing Rust Compiler by Harnessing LLMs for Infilling Masked Real Programs

    Hongyan Gao +5

  44. cs.SE 2026-05-01 reviewed
    Fairness monitor agent cuts bias in LLM code by 65 percent

    Social Bias in LLM-Generated Code: Benchmark and Mitigation

    Fazle Rabbi +3

  45. cs.SE 2026-05-01 reviewed
    Agile team embeds log-based fraud alerts via weekly iterations

    Integrating Log-Based Security Analytics in Agile Workflows: A Real-World Experience Report

    Arpit Thool +1

  46. cs.SE 2026-05-01 reviewed
    Code model released openly after risk checks find no new threats

    Code World Model Preparedness Report

    Daniel Song +23

  47. cs.SE 2026-05-01 reviewed
    Code model cleared for open release after risk checks

    Code World Model Preparedness Report

    Daniel Song +23

  48. cs.CR 2026-04-30 reviewed
    Encrypted string operations enable private conformance checking

    A Privacy-Preserving Approach to Conformance Checking

    Luis Rodríguez-Flores +3

  49. cs.SE 2026-04-30 reviewed
    Software leadership is managerial and interpersonal

    What Characterizes a Software Leader? Identifying Leadership Practices from Practitioners Social Media

    Murilo Coelho +5

  50. cs.SE 2026-04-30 reviewed
    Deptex finds true vulnerability reach by combining graphs and language models

    DEPTEX: Organization-First, Open Source Dependency Risk Monitoring

    Henry Ruckman-Utting +5