pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 2

  1. cs.SE 2026-05-20 reviewed
    Fortran scientific codes harbor many undefined-behavior-like defects

    RSE of a Quantum Transport Code and its Effects

    Christoph Conrads +1

  2. cs.SE 2026-05-20 reviewed
    LLMs turn technical privacy details into clear reports for workers

    Transforming Privacy Artifacts into Accessible Reports for Non-Technical Stakeholders

    Zoe Pfister +6

  3. cs.SE 2026-05-20 reviewed
    27% of Dockerfile SATD admissions couple with other files

    Beyond the Tip of the Iceberg: Understanding SATD in Dockerfiles through the Lens of Co-evolution

    Wei Minn +7

  4. cs.LG 2026-05-20 reviewed
    RL fine-tuning lifts code generation pass@1 by 19% on MBPP

    Domain-Adaptable Reinforcement Learning for Code Generation with Dense Rewards

    Erfan Aghadavoodi Jolfaei +4

  5. cs.CR 2026-05-20 reviewed
    Spectral distances flag Trojaned DNN updates after one step

    Detecting Trojaned DNNs via Spectral Regression Analysis

    Samuele Pasini +2

  6. cs.CL 2026-05-20 reviewed
    Small classifier beats LLMs at pulling exact text from papers

    ACL-Verbatim: hallucination-free question answering for research

    Gábor Recski +4

  7. cs.SE 2026-05-20 reviewed
    Refusal rate misranks LLMs on bio safety

    RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts

    Lukas Weidener +4

    4 Piths
  8. cs.AI 2026-05-20 reviewed
    Five checkpoints enforce policy in generalist agents

    Governance by Construction for Generalist Agents

    Segev Shlomov +9

  9. cs.SE 2026-05-20 reviewed
    Bioinformatics bug detection rises 30-38% with new full-context dataset

    BioDefect: The First Dataset for Defect Detection in Bioinformatics Software

    Tianxiang Xu +5

  10. cs.SE 2026-05-20 reviewed
    LLMs endorse 32% of their own behavior-changing code rewrites

    Articulate but Wrong: Self-Review Failures in LLM-Based Code Modernization

    Gokul Chandra Purnachandra Reddy +2

  11. cs.SE 2026-05-20 reviewed
    Contextual data makes code smell detection more actionable

    An Event-Driven Tool for Context-Aware Code Smell Detection Using SmellDSL

    Matheus dos Santos Viegas +3

  12. cs.MA 2026-05-19 reviewed
    State management beats workspace isolation in multi-agent tasks

    Multi-agent Collaboration with State Management

    Mengyang Liu +4

  13. cs.AI 2026-05-19 reviewed
    LLM agent accuracy drops to 0.54-0.62 without labels

    AgentAtlas: Beyond Outcome Leaderboards for LLM Agents

    Parsa Mazaheri +1

  14. cs.SE 2026-05-19 reviewed
    Privacy views raise coaching adherence from 0.48 to 0.74

    Privacy-by-Design Adaptive Group Assignment for Digital Lifestyle Coaching at Scale

    Nariman Mani +1

  15. cs.PL 2026-05-19 reviewed
    Frama-C plugin checks non-functional rules for automotive C

    Contract Based Verification of Non-functional Requirements for Embedded Automotive C Code

    Jesper Amilon +3

  16. cs.SE 2026-05-19 reviewed
    LLM tests catch all 16 anomalies where manual checks find only 7

    A Multi-Layer Testing Framework for Automated Data Quality Assurance in Cloud-Native ELT Pipelines

    Ismail Gargouri +1

  17. cs.SE 2026-05-19 reviewed
    Code gen picks winner by clustering behaviors on auto-generated inputs

    Code Generation by Differential Test Time Scaling

    Yifeng He +4

  18. cs.SE 2026-05-19 reviewed
    Agentic AI coding improves with structured verification loops

    Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development

    Christopher Koch

  19. cs.SE 2026-05-19 reviewed
    Methodology turns Bodies of Knowledge into assessable competencies

    A Semantic-Web Oriented Competency Model for Engineering Programs

    Nicolas Evain (LIUPPA) +2

  20. cs.AI 2026-05-19 reviewed
    Four-part SDB contract organizes LLM agent runtimes

    A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents

    Vasundra Srinivasan

  21. cs.SE 2026-05-19 reviewed
    Taxonomy organizes 248 studies on combined program analyses

    Combined Program Analysis Techniques: A Systematic Mapping Study

    Pietro Braione +5

  22. cs.SE 2026-05-19 reviewed
    Staged analysis improves LLM recovery of ROS 2 architectures

    Towards LLM-Assisted Architecture Recovery for Real-World ROS 2 Systems: An Agent-Based Multi-Level Approach to Hierarchical Structural Architecture Reconstruction

    Dominique Briechle +7

  23. cs.SE 2026-05-19 reviewed
    Cleaner code reduces agent token use by 7-8% with no change in success

    Does Code Cleanliness Affect Coding Agents? A Controlled Minimal-Pair Study

    Priyansh Trivedi +1

  24. cs.SE 2026-05-19 reviewed
    Agent skills from expert methods beat docs for PostgreSQL tuning

    A Case for Agentic Tuning: From Documentation to Action in PostgreSQL

    Hongyu Lin +6

  25. cs.SE 2026-05-19 reviewed
    Health data lakehouse shown usable for mixed-skill teams

    OpenHealth Lake: Designing and testing a data lakehouse platform for health applications

    Danilo Silva +5

  26. cs.SE 2026-05-19 reviewed
    LLMs simplify OOD but omit key abstractions

    Can LLMs Produce Better Object-Oriented Designs than Human-Involved Development?

    Zushuai Zhang +2

  27. cs.AI 2026-05-19 reviewed
    LLM agents optimize code via prior knowledge

    Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization

    Dmitry Redko +9

  28. cs.AI 2026-05-19 reviewed
    Hard-coded verifiers beat LLM judges at matching human evaluations

    OpenComputer: Verifiable Software Worlds for Computer-Use Agents

    Jinbiao Wei +6

  29. quant-ph 2026-05-19 reviewed
    Quantum tests can live inside .qasm circuit files

    QUTest: A Native Testing Framework for Quantum Programs

    José Campos

  30. cs.CR 2026-05-19 reviewed
    Agent fixes 89% of flaws in source-free industrial software

    SCARA: A Semantics-Constrained Autonomous Remediation Agent for Opaque Industrial Software Vulnerabilities

    Bowei Ning +6

  31. cs.SE 2026-05-19 reviewed
    Criterion-level pairwise judgments lift code judge accuracy to 66.3%

    CriterAlign: Criterion-Centric Rationale Alignment for Code Preference Judging

    Zhenyu Li +3

  32. cs.SE 2026-05-19 reviewed
    Study catalogs 301 real tile-program bugs from GitHub

    Characterizing Real-World Bugs in Tile Programs for Automated Bug Detection

    Ravishka Rathnasuriya +6

  33. cs.HC 2026-05-19 reviewed
    Single-file AI tools push accessibility boundaries outward

    The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems

    Rizwan Jahangir +1

  34. cs.CL 2026-05-19 reviewed
    One LLM system optimizes text to beat specialists on six tasks

    optimize_anything: A Universal API for Optimizing any Text Parameter

    Lakshya A Agrawal +13

  35. cs.AI 2026-05-19 reviewed
    Governance recipe lifts LLM skill-library performance from 0.26 to 0.58

    Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries

    Xing Zhang +6

  36. cs.SE 2026-05-19 reviewed
    MILP solves fairness repair for neural networks with formal guarantees

    Provable Fairness Repair for Deep Neural Networks

    Jianan Ma +3

  37. cs.SE 2026-05-19 reviewed
    Dependency repair shrinks programs 52% more than syntax-only reducers

    DRReduce: Enhancing Syntax-Guided Program Reduction with Dependency Reconstruction

    Qiong Feng +4

  38. cs.SE 2026-05-19 reviewed
    Code models now decide when to answer and when to defer

    When to Answer and When to Defer: A Decision Framework for Reliable Code Predictions

    Ravishka Rathnasuriya +1

  39. cs.SE 2026-05-19 reviewed
    Input adaptation cuts code model mispredictions without retraining

    On-the-Fly Input Adaptation for Reliable Code Intelligence

    Ravishka Rathnasuriya +1

  40. cs.AI 2026-05-19 reviewed
    MOCHA improves agent skill correctness on every task

    MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization

    Md Mehrab Tanjim +8

  41. cs.SE 2026-05-19 reviewed
    Multi-agent system hardens test updates with mutations

    MuMuTestUp: Mutation-based Multi-Agent Test Case Update

    Dawei Tian +9

  42. cs.SE 2026-05-19 reviewed
    Self-healing web apps detect faults at 90.7% and recover 56% faster

    When Web Apps Heal Themselves: A MAPE-K Based Approach to Fault Tolerance and Adaptive Recovery

    Sales Aribe Jr +1

  43. cs.SE 2026-05-18 reviewed
    LLM agents turn switch manuals into graphs at 97-99% accuracy

    Supporting System Testing with a Multi-Agent LLM-based Framework for Knowledge Graph Extraction: A Case Study with Ethernet Switch Systems

    Rongqi Pan +5

  44. cs.SE 2026-05-18 reviewed
    AI restructures open source docs to cut cognitive overload

    Restructure This: Using AI to Restructure Onboarding Documents to Reduce Cognitive Overload

    Zixuan Feng +4

  45. cs.SE 2026-05-18 reviewed
    RL agent refines prompts to boost LLM code pass rates

    Prompt Optimization for LLM Code Generation via Reinforcement Learning

    Ali Mohammadi Esfahani +2

  46. cs.SE 2026-05-18 reviewed
    Multi-agent pipeline extracts traceable specs from legacy code

    Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents

    Sanderson Oliveira de Macedo +1

  47. cs.SE 2026-05-18 reviewed
    Stripping consent declarations raises overeager rate in coding agents

    Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

    Yubin Qu +6

  48. cs.IR 2026-05-18 reviewed
    q-log odds lift BM25 NDCG@10 by 89% on code search

    Improving BM25 Code Retrieval Under Fixed Generic Tokenization: Adaptive q-Log Odds as a Drop-In BM25 Fix

    Santosh Kumar Radha +1

  49. cs.SE 2026-05-18 reviewed
    One engineer with AI agents finishes four-person job in half the time

    One Developer Is All You Need: A Case Study of an AI-Augmented One-Person Squad in a Brownfield Enterprise

    Marcelo Vilas Boas +4