pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 7

  1. cs.SE 2026-05-10 reviewed
    Regional zoom beats global Pareto in 84-89% of SE tasks

    Zoom, Don't Wander: Why Regional Search Outperforms Pareto Reasoning and Global Optimization in Budget-Constrained SBSE

    Kishan Kumar Ganguly +1

  2. cs.MA 2026-05-10 reviewed
    LLM smart contracts score 8.29 points above human versions

    SmartEval: A Benchmark for Evaluating LLM-Generated Smart Contracts from Natural Language Specifications

    Abhinav Goel +3

  3. cs.SE 2026-05-10 reviewed
    ConCovUp lifts concurrency test coverage from 37% to 68%

    ConCovUp: Effective Agent-Based Test Driver Generation for Concurrency Testing

    Yuandao Cai +5

  4. cs.SE 2026-05-10 reviewed
    Belief-revision agents verify code authorship without training

    MACAA: Belief-Revision Multi-Agent Reasoning for Code Authorship Verification

    Jingwei Ye +7

  5. cs.SE 2026-05-10 reviewed
    Multi-agent belief revision verifies code authors without training

    MACAA: Belief-Revision Multi-Agent Reasoning for Code Authorship Verification

    Jingwei Ye +7

  6. cs.SE 2026-05-10 reviewed
    Multi-agent system verifies code authorship without training

    MACAA: Belief-Revision Multi-Agent Reasoning for Code Authorship Verification

    Jingwei Ye +7

  7. cs.SE 2026-05-10 reviewed
    Ethical safeguards prioritized in cost model for LLM education use

    Prediction Model of Motivators and Demotivators of Integrating Large Language Models in Software Engineering Education: An Empirical Study

    Maryam Khan +3

  8. cs.SE 2026-05-10 reviewed
    Model optimizes cost-efficient LLM integration in software engineering classes

    Prediction Model of Motivators and Demotivators of Integrating Large Language Models in Software Engineering Education: An Empirical Study

    Maryam Khan +3

  9. cs.SE 2026-05-10 reviewed
    Execution traces create first noise-free test for LLM code understanding

    An Execution-Verified Multi-Language Benchmark for Code Semantic Reasoning

    Yikun Li +9

  10. cs.LG 2026-05-10 reviewed
    LLM sim code runs but solves wrong physics

    Your Simulation Runs but Solves the Wrong Physics: PDE-Grounded Intent Verification for LLM-Generated Multiphysics Simulation Code

    Zhenghan Song +6

  11. cs.SE 2026-05-10 reviewed
    Merlin turns natural language into CodeQL queries that raise accuracy 3.8x

    Generating Complex Code Analyzers from Natural Language Questions

    Amirmohammad Nazari +5

  12. quant-ph 2026-05-10 reviewed
    Memoized heuristics scale ion-trap qubit mapping

    Scaling Qubit Mapping and Routing With Position Graph Abstraction and Memoization

    Brent Russon +3

  13. cs.DB 2026-05-09 reviewed
    Krone decomposes logs into entity-action-status units for modular anomaly detection

    Detect, Localize, and Explain: Interactive Hierarchical Log Anomaly Analytics with LLM Augmentation

    Lei Ma +7

  14. cs.AI 2026-05-09 reviewed
    Line-level rewards raise program repair success to 40.7% on SWE-bench

    BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models

    Yuanhao Li +5

  15. cs.AI 2026-05-09 reviewed
    Line-level credit in RL lifts program repair to 40.7% on SWE-bench

    BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models

    Yuanhao Li +5

  16. cs.AI 2026-05-09 reviewed
    Dual rewards boost code repair to 40.7% on SWE-bench

    BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models

    Yuanhao Li +5

  17. cs.SE 2026-05-09 reviewed
    Developer reviews expose LLM code flaws missed by benchmarks

    Evaluating LLM-Generated Code: A Benchmark and Developer Study

    Joanna Szych +1

  18. cs.SE 2026-05-09 reviewed
    Fuzzer finds 64 inconsistencies in Solidity compilers

    ParityFuzz: Finding Inconsistencies across Solidity Compilers via Fine-Grained Mutation and Differential Analysis

    Bowei Su +4

  19. cs.AI 2026-05-09 reviewed
    AI safety guarantees proven in the framework

    Containment Verification: AI Safety Guarantees Independent of Alignment

    Royce Moon +1

  20. cs.SE 2026-05-09 reviewed
    Semantic distance beats disagreement counts for LLM code uncertainty

    Using Semantic Distance to Estimate Uncertainty in LLM-Based Code Generation

    Weilin He +2

  21. cs.SE 2026-05-09 reviewed
    Skill drift is contract violation in LLM agent libraries

    Skill Drift Is Contract Violation: Proactive Maintenance for LLM Agent Skill Libraries

    Linfeng Fan +3

  22. cs.SE 2026-05-09 reviewed
    Three-layer gate turns agent failures into bounded fixes

    Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents

    Chenyu Zhao +9

  23. cs.SE 2026-05-09 reviewed
    LLMs mine tactics that let CoqHammer prove 24% more theorems

    A Learning Method for Symbolic Systems Using Large Language Models

    Jian Fang +2

  24. cs.SE 2026-05-09 reviewed
    Execution fingerprints beat text voting for LLM code

    Semantic Voting: Execution-Grounded Consensus for LLM Code Generation

    Shan Jiang +2

  25. cs.LG 2026-05-09 reviewed
    Sketching strategies outperform flat sampling for code at fixed budget

    Sketch-and-Verify: Structured Inference-Time Scaling via Program Sketching

    Shan Jiang +2

  26. cs.SE 2026-05-09 reviewed
    EvidenT repairs 54% of RISC-V package build failures

    EvidenT: An Evidence-Preserving Framework for Iterative System-Level Package Repair

    Chenyu Zhao +7

  27. cs.SE 2026-05-08 reviewed
    Models reach 92 percent on code but only 5 percent on provable code

    VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation

    Zichen Xie +7

  28. cs.LG 2026-05-08 reviewed
    Benchmark reveals CUDA LLM fixers often degenerate code for tests

    CUDABeaver: Benchmarking LLM-Based Automated CUDA Debugging

    Shiyang Li +3

  29. cs.SE 2026-05-08 reviewed
    Dataset collects 15k configs for AI coding tools

    A Dataset of Agentic AI Coding Tool Configurations

    Matthias Galster +6

  30. cs.SE 2026-05-08 reviewed
    AI agents omit runtime details in their own technical talks

    What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook

    Junyu Huo +3

  31. cs.SE 2026-05-08 reviewed
    AI agents talk security and trust more than specific code issues

    What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook

    Junyu Huo +3

  32. cs.LG 2026-05-08 reviewed
    Benchmark scores coding agents on engineering quality beyond bug fixes

    SWE Atlas: Benchmarking Coding Agents Beyond Issue Resolution

    Mohit Raghavendra +14

  33. cs.CR 2026-05-08 reviewed
    Hardware attestation signs build provenance without trusting operators

    Kettle: Attested builds for verifiable software provenance

    Amean Asad +1

  34. cs.LG 2026-05-08 reviewed
    Cyclic tuning raises RAG quality by up to 54 percent

    CDS4RAG: Cyclic Dual-Sequential Hyperparameter Optimization for RAG

    Pengzhou Chen +1

  35. cs.SE 2026-05-08 reviewed
    AI agents start most PRs but humans keep merge authority

    Collaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycles

    Young Jo (seph) Chung +1

  36. cs.SE 2026-05-08 reviewed
    Collaborator AIs open most PRs while humans keep merge control

    Collaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycles

    Young Jo (seph) Chung +1

  37. cs.CL 2026-05-08 reviewed
    Adding one vector switches which tool a language model calls

    Tool Calling is Linearly Readable and Steerable in Language Models

    Zekun Wu +9

  38. cs.SE 2026-05-08 reviewed
    Similar past faults annotated to guide LLMs in test code

    Similar Pattern Annotation via Retrieval Knowledge for LLM-Based Test Code Fault Localization

    Golnaz Gharachorlu +4

  39. cs.SE 2026-05-08 reviewed
    Trace comparison creates a score for design conformance

    Evaluating Design Conformance Through Trace Comparison

    Reid Anderson +1

  40. cs.SE 2026-05-08 reviewed
    One rules engine powers play

    Mazocarta: A Seeded Procedural Deckbuilder for Instrumented Game Development

    Timothy C. Cogan

  41. cs.SE 2026-05-08 reviewed
    Bidirectional analysis finds 118 unsafe flows in 87 MCP servers

    Unsafe by Flow: Uncovering Bidirectional Data-Flow Risks in MCP Ecosystem

    Xinyi Hou +2

  42. cs.CR 2026-05-08 reviewed
    Security designs link to code checks in only a few ways

    Can I Check What I Designed? Mapping Security Design DSLs to Code Analyzers

    Sven Peldszus +5

  43. cs.SE 2026-05-08 reviewed
    Unified AST labels and graph matching link equivalent code across languages

    Bridging the Programming Language Gap: Constructing a Multilingual Shared Semantic Space through AST Unification and Graph Matching

    Junhao Chen +4

  44. cs.SE 2026-05-08 reviewed
    Agents patch code on 35-65% of already-fixed bugs

    Coding Agents Don't Know When to Act

    Thibaud Gloaguen +4

  45. cs.SE 2026-05-08 reviewed
    Neuro-symbolic method detects threats in stripped industrial binaries

    Securing the Dark Matter: A Semantic-Enhanced Neuro-Symbolic Framework for Supply Chain Analysis of Opaque Industrial Software

    Bowei Ning +6

  46. cs.SE 2026-05-08 reviewed
    SARC enforces agent constraints at runtime for zero hard violations

    SARC: A Governance-by-Architecture Framework for Agentic AI Systems

    Gaston Besanson

  47. cs.SE 2026-05-08 reviewed
    Manifesto recasts scaled agile around AI as first-class participant

    The AI-Native Large-Scale Agile Software Development Manifesto

    Ricardo Britto +3

  48. cs.SE 2026-05-08 reviewed
    Manifesto puts AI at core of large-scale agile development

    The AI-Native Large-Scale Agile Software Development Manifesto

    Ricardo Britto +3

  49. cs.SE 2026-05-08 reviewed
    Search tunes LLMs to cut harmful responses

    SafeTune: Search-based Harmfulness Minimisation for Large Language Models

    Giordano d'Aloisio +5

  50. cs.LG 2026-05-08 reviewed
    First benchmark supplies real data for LLM hyperparameter tuning

    LLMSYS-HPOBench: Hyperparameter Optimization Benchmark Suite for Real-World LLM Systems

    Siyu Wu +5