archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 1

  1. cs.AI 2026-05-22 reviewed
    Claude agent verifies programs at 98 percent success rate

    Agentic Proving for Program Verification

    Alessandro Sosso +2

  2. cs.LG 2026-05-22 reviewed
    Agents fail quantitative goals without progress tracking

    Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents

    Yuandao Cai +4

  3. cs.PL 2026-05-22 reviewed
    JVM microbenchmarks yield misleading results from unrealistic profiles

    Misleading Microbenchmarks on the Java Virtual Machines

    Filippo Schiavio +2

  4. cs.PL 2026-05-22 reviewed
    SQL benchmarks turned into Java Stream tests expose best parallel patterns

    JEDI: Java Evaluation of Declarative and Imperative Queries

    Filippo Schiavio +1

  5. cs.SE 2026-05-22 reviewed
    Rust auto-enforces 48% of applicable MISRA C++ rules

    MISRust: Mapping MISRA-C++ Coding Guidelines to the Rust Programming Language

    Marius Molz +4

  6. cs.SE 2026-05-22 reviewed
    Enterprise AI needs risk reduction testing

    AI Assurance: A Comprehensive Testing Strategy for Enterprise AI Systems

    Chitra Badagi +3

  7. cs.PL 2026-05-22 reviewed
    Compiler framework cuts runtime 45% while holding energy fixed

    MileStone: A Multi-Objective Compiler Phase Ordering Framework for Graph-based IR-Level Optimization

    Amirhosein Sadr +1

  8. cs.SE 2026-05-22 reviewed
    AI coding assistants cut coding time but double reports of worsened experience

    The Impact of AI Coding Assistants on Software Engineering: A Longitudinal Study

    Annie Vella +1

  9. cs.SE 2026-05-21 reviewed
    Philosophical dispositions produce 51% unique AI code review findings

    Philosophical Dispositions as Behavioral Constraints for AI-Assisted Code Review: An Empirical Study

    Kaushal Bansal

  10. cs.SE 2026-05-21 reviewed
    All seven LLMs generate vulnerable code in developer-like tests

    Security of LLM-generated Code: A Comparative Analysis

    Srivathsan G Morkonda +2

  11. cs.SE 2026-05-21 reviewed
    Kubernetes agent framework shows retrieval yields only partial falsification

    A measurement substrate for agentic Kubernetes operations: Methodology and a case study in retrieval-compounding falsification

    Joshua Odmark +2

  12. cs.SE 2026-05-21 reviewed
    Input-output time proxies best match expert code comprehension rankings

    On the Reliability of Code Comprehension Proxies

    Erfan Arvan +3

  13. cs.SE 2026-05-21 reviewed
    Flipping optimization branches reveals 21 DBMS performance bugs

    Finding Performance Issues in Database Systems by Exploiting Dormant Code Paths

    Jinsheng Ba +1

  14. cs.SE 2026-05-21 reviewed
    LLM code smells found in 73.5% of analyzed systems

    LLM Code Smells: A Taxonomy and Detection Approach

    Zacharie Chenail-Larcher +4

  15. cs.CV 2026-05-21 reviewed
    Toolkit automates annotation of child-caregiver eye-tracking videos

    GazeBehavior Annotation Toolkit (GBAT): AI-powered toolkit for automatic annotation of egocentric eye-tracking and video data of child-caregiver interaction

    Iba Baig +7

  16. cs.SE 2026-05-21 reviewed
    FAME detects log anomalies per message with 76x less labeling

    FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection

    Huanchi Wang +5

  17. cs.AI 2026-05-21 reviewed
    One handler generates both streaming API and MCP tool

    HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools

    Edwin Jose

  18. cs.SE 2026-05-21 reviewed
    Contractual skills turn agent instructions into inspectable task contracts

    Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents

    Ting Liu

  19. cs.CR 2026-05-21 reviewed
    AI framework secures cardless banking against fraud

    Innovations in Cardless Artificial Intelligence Banking: A Comprehensive Framework for Cyber Secure and Fraud Mitigation using Machine Learning Algorithms

    Md Israfeel

  20. cs.CL 2026-05-21 reviewed
    Multiple metrics required to judge synthetic data for tool-calling agents

    SynAE: A Framework for Measuring the Quality of Synthetic Data for Tool-Calling Agent Evaluations

    Shuaiqi Wang +3

  21. cs.SE 2026-05-21 reviewed
    Rejections overstate AI agent errors in open source PRs

    Why Are Agentic Pull Requests Merged or Rejected? An Empirical Study

    Sien Reeve O. Peralta +10

  22. cs.SE 2026-05-21 reviewed
    Refinement more than doubles compilability of agent patches

    "Refactoring Runaway": Understanding and Mitigating Tangled Refactorings in Coding Agents for Issue Resolution

    Zhao Tian +6

  23. cs.CV 2026-05-21 reviewed
    Explicit baseline fixes attribution errors in neural explanations

    The Neglected Baseline in Model Interpretation

    Yongjin Cui +1

  24. cs.LG 2026-05-21 reviewed
    Adversarial scaling reveals LLM code weaknesses

    VeriScale: Adversarial Test-Suite Scaling for Verifiable Code Generation

    Yifan Bai +9

  25. cs.SE 2026-05-21 reviewed
    GenAI adds hidden costs to developer well-being

    At What Cost? Software Developers' Well-Being in the Age of GenAI

    Mariam Guizani +3

  26. cs.MA 2026-05-21 reviewed
    Trial harnesses let AI agents turn outcomes into process updates

    Sibyl-AutoResearch: Autonomous Research Needs Self-Evolving Trial-and-Error Harnesses, Not Paper Generators

    Chengcheng Wang +5

  27. cs.CR 2026-05-21 reviewed
    Attacks lift autonomous agent risk rate from 28.3% to 52.6%

    Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions

    Jianan Ma +10

  28. cs.SE 2026-05-21 reviewed
    The paper describes an architecture that combines DevOps practices with decentralized…

    An Architecture for Decentralised Deployment and Operation of Blockchain Applications

    Fabian Stiehle +2

  29. cs.SE 2026-05-21 reviewed
    LLMs verify only 10% of test suites on code mutations

    SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering?

    Yuxuan Sun +8

  30. cs.NI 2026-05-21 reviewed
    Syntax-driven repair fixes 97.5% of network config errors

    Astragalus: Automatic Configuration Repair for Production Networks

    Zhenrong Gu +3

  31. cs.SE 2026-05-21 reviewed
    System repairs TEE partitioning errors at 87.6 percent success

    Automated Repair of TEE Partitioning Issues via DSL-Guided and LLM-Assisted Patching

    Chengyan Ma +6

  32. cs.SE 2026-05-21 reviewed
    LLM mocks let symbolic execution find TEE input flaws

    Finding Missing Input Validation in TEEs via LLM-Assisted Symbolic Execution

    Chengyan Ma +5

  33. cs.SE 2026-05-21 reviewed
    Patch-guided trajectories raise SWE agent fixes by 10.8 points at 15% lower cost

    From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents

    Murong Ma +9

  34. cs.SE 2026-05-21 reviewed
    LLM summaries add context

    Deterministic vs. Probabilistic Summarisation: An Empirical Trade-off Study in Design Pattern Centric Java Code

    Najam Nazar +1

  35. cs.SE 2026-05-21 reviewed
    PITMuS maps bytecode mutants to source edits for fresh bug datasets

    PITMuS: A Tool for Automated Bug Dataset Generation via Source-Level Mutant Reconstruction

    Tasfia Tasnim +1

  36. cs.CR 2026-05-20 reviewed
    Four principles let LLM agent build correct fuzz harnesses

    Quality-Assured Fuzz Harness Generation via the Four Principles Framework

    Ze Sheng +5

  37. cs.CR 2026-05-20 reviewed
    Multi-agent LLM system finds 29 zero-day vulnerabilities

    FuzzingBrain V2: A Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction

    Ze Sheng +4

  38. cs.SE 2026-05-20 reviewed
    Agile workshop formulates four propositions to close research gaps

    The 2nd Workshop on Agile Practice & Research: A Summary and Call For Research

    Karen Eilers +5

  39. cs.SE 2026-05-20 reviewed
    ReproFlake supplies scripts to reproduce failures in 1115 flaky tests

    A Dataset of Reproducible Flaky-Test Failures

    Suzzana Rafi +5

  40. cs.CR 2026-05-20 reviewed
    Dataset unifies 73k binaries with build variations and CVE history

    ASSEMBLAGE-DEEPHISTORY: A Cross-Build Binary Dataset with Temporal Coverage

    Chang Liu +5

  41. cs.AI 2026-05-20 reviewed
    Hybrid OOD monitors lift LLM failure recall from 39 to 45 percent

    Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs

    Dylan Feng +3

  42. cs.CL 2026-05-20 reviewed
    LLMs reach 100% consistency adapting grammars to metamodel changes

    Leveraging LLMs for Grammar Adaptation: A Study on Metamodel-Grammar Co-Evolution

    Weixing Zhang +4

  43. cs.SE 2026-05-20 reviewed
    AI refactoring PRs improve quality in 22.5% of cases

    Quality and Security Signals in AI-Generated Python Refactoring Pull Requests

    Mohamed Almukhtar +2

  44. cs.SE 2026-05-20 reviewed
    Agents propose specs, solvers verify LLM-generated code

    Agentic Model Checking

    Youcheng Sun +3

  45. cs.SE 2026-05-20 reviewed
    Stdlib reimplementations match third-party Python library speeds

    Stdlib or Third-Party? Empirical Performance and Correctness of LLM-Assisted Zero-Dependency Python Libraries

    Peng Ding +1

  46. cs.SE 2026-05-20 reviewed
    Voxel reconstruction validates navmeshes with less exploration

    Validating Navmesh using Geometry: Voxel-Based Analysis with Prioritized Exploration

    Ramesh Raghavan +5

  47. cs.SE 2026-05-20 reviewed
    Agents pass visible tests but fail held-out usage tests as tasks lengthen

    SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

    Bingchen Zhao +3

  48. cs.SE 2026-05-20 reviewed
    SPLE review compares adoption models and AI challenges

    Software Product Line Engineering: Adoption, Tooling and AI Era Challenges

    Najam Nazar

  49. cs.AI 2026-05-20 reviewed
    Multi-agent system turns full LLM traces into evidence-backed insights

    Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

    Akshay Manglik +8

  50. cs.AI 2026-05-20 reviewed
    Multi-agent reports raise LLM scaffold performance by 30 points

    Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

    Akshay Manglik +8