archive

Every paper Pith has read.

1797 papers in cs.SE · page 5

  1. cs.CR 2026-05-13 reviewed
    Tool finds 545 reference counting bugs in Linux kernel drivers

    Automatic Detection of Reference Counting Bugs in Linux Kernel Drivers

    Joe Hattori +2

  2. cs.CR 2026-05-13 reviewed
    DrvHorn uncovers 545 reference counting bugs in Linux v6.6 drivers

    Automatic Detection of Reference Counting Bugs in Linux Kernel Drivers

    Joe Hattori +2

  3. cs.AI 2026-05-13 reviewed
    Contrastive semantic model improves code translation

    Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization

    Yuhan Wu +5

  4. cs.SE 2026-05-13 reviewed
    LLMs lag experts on system-level performance code

    PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

    Huihao Jing +7

  5. cs.SE 2026-05-13 reviewed
    Toolkit standardizes benchmarks for screenshot-to-code models

    UIBenchKit: A unified toolkit for design-to-code model evaluation

    Chinh T. Le +4

  6. cs.SE 2026-05-13 reviewed
    Code agents solve far fewer issues in full cycles than isolated tasks

    SWE-Cycle: Benchmarking Code Agents across the Complete Issue Resolution Cycle

    Hao Guan +10

  7. cs.SE 2026-05-13 reviewed
    Code models miss over 93% of fixes from changes alone

    Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical Study

    Nils Loose +4

  8. cs.CR 2026-05-13 reviewed
    Bonuses for security scans cut issue density in team code

    Security Incentivization: An Empirical Study of how Micropayments Impact Code Security

    Stefan Rass +7

  9. cs.CL 2026-05-13 reviewed
    LLM JSON stays valid inside tight token budgets

    TruncProof: A Guardrail for LLM-based JSON Generation under Token-Length Constraints

    Yoshio Kato +1

  10. cs.SE 2026-05-13 reviewed
    Deeper thought per algorithm beats more candidates under fixed tokens

    Effective Harness Engineering for Algorithm Discovery with Coding Agents

    Yoichi Ishibashi +2

  11. cs.SE 2026-05-13 reviewed
    Protocols govern generated code via invariants and evidence chains

    Protocol-Driven Development: Governing Generated Software Through Invariants and Continuous Evidence

    Jun He +1

  12. cs.SE 2026-05-13 reviewed
    Protocols admit generated code only via signed compliance evidence

    Protocol-Driven Development: Governing Generated Software Through Invariants and Continuous Evidence

    Jun He +1

  13. cs.SE 2026-05-13 reviewed
    Protocols, not code, decide if generated software is admissible

    Protocol-Driven Development: Governing Generated Software Through Invariants and Continuous Evidence

    Jun He +1

  14. cs.SE 2026-05-13 reviewed
    10.7% of SWE-agent passes are lucky trial-and-error

    AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

    Priyam Sahoo +6

  15. cs.SE 2026-05-13 reviewed
    Metadata layer turns legacy SAS reports into AI-ready data

    A Non-Destructive Methodological Framework for Modernizing Legacy Clinical Reporting Systems for AI-Driven Pharmacoinformatics: A SAS Case Study

    Jaime Yan

  16. cs.SE 2026-05-12 reviewed
    Open-source projects follow product life cycles

    Project Life Cycles in Open-Source Software

    Sanjiv Das +5

  17. cs.SE 2026-05-12 reviewed
    cozy is a comparative binary analysis tool that uses symbolic execution to find…

    Finding a Crab in the C: Assured Translation via Comparative Symbolic Execution

    Caleb Helbling +2

  18. eess.SY 2026-05-12 reviewed
    Natural language runs grid analyses in under two minutes

    Grid-Orch: An LLM-Powered Orchestrator for Distribution Grid Simulation and Analytics

    Boming Liu +2

  19. cs.SE 2026-05-12 reviewed
    Lattice structures LLM judgments for reliable program analysis

    Agentic Interpretation: Lattice-Structured Evidence for LLM-Based Program Analysis

    Jacqueline L. Mitchell +1

  20. cs.SE 2026-05-12 reviewed
    LLMs match human accuracy in spotting usability requirements in reviews

    User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models

    Cedric Wellhausen +2

  21. cs.SE 2026-05-12 reviewed
    Fine-tuned open LLM matches ChatGPT on code feedback quality

    Fine-Tuning Models for Automated Code Review Feedback

    Smitha S Kumar +3

  22. eess.SY 2026-05-12 reviewed
    Docker container makes Basilisk GN&C simulations reproducible

    Basilisk and Docker for Reproducible GN&C Simulation: A Workflow Reference

    Anubhav Gupta

  23. cs.SE 2026-05-12 reviewed
    Nine LLM audits on prompts found 51 defects and converged to zero

    Iterative Audit Convergence in LLM-Managed Multi-Agent Systems: A Case Study in Prompt Engineering Quality Assurance

    Elias Calboreanu

  24. cs.SE 2026-05-12 reviewed
    MinTEJ terminal editor for Julia uses less memory than VS Code

    Minimalistic Terminal Editor for Julia Programming -- MinTEJ: A Friendly Approach for a Scientific Programmer

    Poornachandratejasvi Laxman Bhattar +3

  25. cs.SE 2026-05-12 reviewed
    LLMs fail most at strategy in GitHub issue fixes

    Characterizing the Failure Modes of LLMs in Resolving Real-World GitHub Issues

    Yanjie Jiang +5

  26. cs.SE 2026-05-12 reviewed
    Partial programs control risk in LLM code generation

    Uncertainty Quantification for LLM-based Code Generation

    Senrong Xu +8

  27. cs.SE 2026-05-12 reviewed
    Dataset delivers 449 reproducible locator breaks in web GUI tests

    ReproBreak: A Dataset of Reproducible Web Locator Breaks

    Thiago Santos de Moura +3

  28. cs.SE 2026-05-12 reviewed
    Dataset supplies 2440 proprietary industrial repositories

    CIDR: A Large-Scale Industrial Source Code Dataset for Software Engineering Research

    Vladislav Savenkov

  29. cs.SE 2026-05-12 reviewed
    Harness design stabilizes small language models at 95 percent success

    It's Not the Size: Harness Design Determines Operational Stability in Small Language Models

    Yong-eun Cho

  30. cs.SE 2026-05-12 reviewed
    Metamorphic testing and LLMs strengthen each other for AI quality checks

    Bidirectional Empowerment of Metamorphic Testing and Large Language Models: A Systematic Survey

    Zheng Zheng +4

  31. cs.SE 2026-05-12 reviewed
    Framework embeds values in CPS human monitoring rules

    HM-Req: A Framework for Embedding Values within CPS Human Monitoring Requirements

    Zoe Pfister +2

  32. cs.PL 2026-05-12 reviewed
    Diversified replicas detect correlated faults by ignoring addresses

    Divergent Multi-Version Execution (DME): Canonical Instruction-Trace Fault Detection via Structural Address-Space Decorrelation

    Petro Baran Yrievich

  33. cs.AI 2026-05-12 reviewed
    Microservices process thousands of documents per hour with OCR and LLMs

    Operationalizing Document AI: A Microservice Architecture for OCR and LLM Pipelines in Production

    Yao Fehlis +11

  34. cs.SE 2026-05-12 reviewed
    Agent decision traces vary up to 43 points in completeness across SDKs

    Property-Level Reconstructability of Agent Decisions: An Anchor-Level Pilot Across Vendor SDK Adapter Regimes

    Oleg Solozobov

  35. cs.SE 2026-05-12 reviewed
    Guided LLMs translate APL legacy code to working C#

    Neural Code Translation of Legacy Code: APL to C#

    Abdulrahman Ramadan +4

  36. cs.SE 2026-05-12 reviewed
    Print statements teach code models to reason step by step

    StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning

    Hao Wang +3

  37. cs.SE 2026-05-12 reviewed
    Value and popularity drive OSS survival

    The Death Spiral of Open Source Projects: A Post-Mortem Analysis of Pull Request Workflow Dynamics

    Mohit Kaushik +1

  38. cs.AI 2026-05-12 reviewed
    Compiled interfaces cut agent token use by 57%

    SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces

    Duling Xu +6

  39. cs.SE 2026-05-12 reviewed
    Bug localization replication fails after fixing data leak

    An Extensive Replication Study of the ABLoTS Approach for Bug Localization

    Feifei Niu +7

  40. cs.SE 2026-05-12 reviewed
    SMT-LLM resolves Python deps at 83.6 percent

    Breaking the Dependency Chaos: A Constraint-Driven Python Dependency Resolution Strategy with Selective LLM Imputation

    Kowshik Chowdhury +2

  41. cs.SE 2026-05-12 reviewed
    Seminar sets six research priorities for agents and software engineering

    A Research Agenda on Agents and Software Engineering: Outcomes from the Rio A2SE Seminar

    Davide Taibi +17

  42. cs.CR 2026-05-12 reviewed
    597-line harness supports fair comparisons of LLM pen-testing agents

    Cochise: A Reference Harness for Autonomous Penetration Testing

    Andreas Happe +1

  43. cs.SE 2026-05-12 reviewed
    Compiler feedback lifts neural decompilation success to 83.9 percent

    Decaf: Improving Neural Decompilation with Automatic Feedback and Search

    Alexander Shypula +2

  44. cs.SE 2026-05-12 reviewed
    Mined tokens lift LLM flaky test F1-score to 69.34%

    NeuroFlake: A Neuro-Symbolic LLM Framework for Flaky Test Classification

    Khondaker Tasnia Hoque +1

  45. cs.CR 2026-05-12 reviewed
    Risk lattice turns consent clicks into reusable options

    Options, Not Clicks: Lattice Refinement for Consent-Driven MCP Authorization

    Ying Li +6

  46. cs.SE 2026-05-11 reviewed
    LLMs generate natural language specs to verify code compositionally

    Natural Language based Specification and Verification

    Zhaorui Li +1

  47. cs.LG 2026-05-11 reviewed
    Ranking own code attempts boosts single-sample accuracy

    Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling

    Yizhu Jiao +5

  48. cs.LG 2026-05-11 reviewed
    Ranking own code attempts boosts single-rollout accuracy to match Best-of-4

    Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling

    Yizhu Jiao +5

  49. cs.SE 2026-05-11 reviewed
    SysML model drives hardware verification directly via server link

    SHIA: A Direct SysML-Hardware Interface Architecture for Model-Centric Verification

    Charles Lewis +2

  50. cs.CR 2026-05-11 reviewed
    4714 GitHub workflows hijackable via crafted comments

    Comment and Control: Hijacking Agentic Workflows via Context-Grounded Evolution

    Neil Fendley +4