pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 14

  1. cs.SE 2026-04-28 reviewed
    Brief role-model stories in lectures support belonging in software courses

    Supporting Belonging in Software Engineering Through Role Models Exposure

    Ronnie de Souza Santos

  2. cs.AI 2026-04-27 reviewed
    Intent compilation turns partial goals into binding AI artifacts

    Toward a Science of Intent: Closure Gaps and Delegation Envelopes for Open-World AI Agents

    Maximiliano Armesto +1

  3. cs.SE 2026-04-27 reviewed
    Product context retrieval lifts AI coding compliance from 46% to 95%

    Context-Augmented Code Generation: How Product Context Improves AI Coding Agent Decision Compliance by 49%

    Drew Dillon +1

  4. cs.HC 2026-04-27 reviewed
    Speculative societies prompt OSS practitioners to rethink designer roles

    What If We Work Together? Fostering Reflections on Designer Inclusion in Open Source Software Through Speculative Design

    Rozhan Hozhabri Nezhad +2

  5. cs.CL 2026-04-27 reviewed
    Evidence rules stop research agents at the right time

    Don't Stop Early: Scalable Enterprise Deep Research with Controlled Information Flow and Evidence-Aware Termination

    Prafulla Kumar Choubey +7

  6. cs.SE 2026-04-27 reviewed
    LLMs biased to Python limit multilingual code tasks

    Large Language Models for Multilingual Code Intelligence: A Survey

    Chao Jiang +8

  7. cs.CL 2026-04-27 reviewed
    LLM auditors find fatal errors in agent benchmarks

    BenchGuard: Who Guards the Benchmarks? Automated Auditing of LLM Agent Benchmarks

    Xinming Tu +5

  8. cs.CY 2026-04-27 reviewed
    Fine-tuning shifts AI safety scores in unpredictable ways

    Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains

    Emaan Bilal Khan +3

  9. cs.SE 2026-04-27 reviewed
    The paper introduces FGDM, a four-agent framework that converts code into flow graphs and…

    FGDM: Reasoning Aware Multi-Agentic Framework for Software Bug Detection using Chain of Thought and Tree of Thought Prompting

    Srita Padmanabhuni +4

  10. cs.SE 2026-04-27 reviewed
    Under-specified prompts raise code correctness on rich tasks

    When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation

    Amal Akli +3

  11. cs.SE 2026-04-27 reviewed
    Small finetuned model detects bad LLM code prompts at F1 0.80

    Defective Task Descriptions in LLM-Based Code Generation: Detection and Analysis

    Amal Akli +3

  12. cs.SE 2026-04-27 reviewed
    Fine-tuned LLMs hit 1.00 structural fidelity on multi-file DSL edits

    Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study

    Sivajeet Chand +3

  13. cs.SE 2026-04-27 reviewed
    SLMs on phones work only when given the smallest tasks

    Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application

    William Oliveira

  14. cs.SE 2026-04-27 reviewed
    Mobile AI works reliably only when models do the least

    Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application

    William Oliveira

  15. cs.SE 2026-04-27 reviewed
    LLM tools break standard evaluation rules in software engineering

    Evaluation of LLM-Based Software Engineering Tools: Practices, Challenges, and Future Directions

    Utku Boran Torun +3

  16. cs.SE 2026-04-27 reviewed
    Markov chains predict LLM agent success times from traces

    Measuring the Unmeasurable: Markov Chain Reliability for LLM Agents

    Phat T. Tran-Truong +1

  17. cs.SE 2026-04-27 reviewed
    Pipeline migrates monoliths to serverless with 100% deployment success

    Mono2Sls: Automated Monolith-to-Serverless Migration via Multi-Stage Pipeline with Static Analysis

    Xingyan Chen +4

  18. cs.SE 2026-04-27 reviewed
    Review of 80 studies charts transformer use for finding code vulnerabilities

    A systematic literature Review for Transformer-based Software Vulnerability detection

    Fiza Naseer +4

  19. cs.SE 2026-04-27 reviewed
    Automated checks match developer labels only 44-62% for code review bots

    Understanding the Limits of Automated Evaluation for Code Review Bots in Practice

    Veli Karakaya +3

  20. cs.SE 2026-04-27 reviewed
    Survey maps student AI use across capstone projects

    How Do Software Engineering Students Use Generative AI in Real-World Capstone Projects? An Empirical Baseline Study

    Michael Mircea +3

  21. cs.SE 2026-04-27 reviewed
    Structured knowledge turns LLM training into debuggable code

    Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

    Chenkai Pan +8

  22. cs.HC 2026-04-27 reviewed
    Tool generates personas to boost OSS developer empathy

    Putting a Face to the Issue: Fostering User Empathy of Open Source Software Developers With PersonaFlow

    Boniface Bahati Tadjuidje +2

  23. cs.SE 2026-04-27 reviewed
    More reviewer bot comments slow agentic PR resolution

    On the Footprints of Reviewer Bots Feedback on Agentic Pull Requests in OSS GitHub Repositories

    Syeda Kaneez Fatima +5

  24. cs.SE 2026-04-27 reviewed
    Models reach only 74% on code questions linking definitions to calls

    SWE-QA: A Dataset and Benchmark for Complex Code Understanding

    Laïla Elkoussy (LRE +3

  25. cs.CR 2026-04-27 reviewed
    Multi-agent SZZ raises F1 scores for vulnerability commit detection by up to 65%

    MAS-SZZ: Multi-Agentic SZZ Algorithm for Vulnerability-Inducing Commit Identification

    Sicong Cao +6

  26. cs.SE 2026-04-27 reviewed
    Humans drive creativity in design even when using LLMs

    Exploring Creativity in Human-Human-LLM Collaborative Software Design

    Victoria Jackson +3

  27. cs.LG 2026-04-27 reviewed
    One plugin interface unifies controls across diffusion models

    Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion

    Zhongjie Duan +2

  28. cs.SE 2026-04-27 reviewed
    Evolving memory boosts private library code generation by 16%

    MEMCoder: Multi-dimensional Evolving Memory for Private-Library-Oriented Code Generation

    Mofei Li +3

  29. cs.SE 2026-04-27 reviewed
    Dynamic agents hit 95% success generating hardware reference models

    RefEvo: Agentic Design with Co-Evolutionary Verification for Agile Reference Model Generation

    Yifan Zhang +3

  30. cs.SE 2026-04-27 reviewed
    Basic agent with ADI fixes 63.8% of SWE-bench tasks

    Empowering Autonomous Debugging Agents with Efficient Dynamic Analysis

    Jiahong Xiang +4

  31. cs.SE 2026-04-27 reviewed
    Software framework lets AI close the business experimentation loop

    Closing the Loop: A Software Framework for AI to Support Business Decision Making

    Jeffrey Wong +1

  32. cs.CR 2026-04-27 reviewed
    Go projects contain 7,473 crypto API misuses with uneven detector coverage

    Evaluating Cryptographic API Misuse Detectors for Go

    Vivi Andersson +1

  33. cs.SE 2026-04-27 reviewed
    Developers link to full migration guides in 83% of pull requests

    How Do Developers Use Migration Guides? A Case Study of Log4j

    Takahiro Monno +4

  34. cs.SE 2026-04-27 reviewed
    Developers link to full migration guides in 83 percent of pull requests

    How Do Developers Use Migration Guides? A Case Study of Log4j

    Takahiro Monno +4

  35. cs.AI 2026-04-27 reviewed
    Benchmark plus sentiment predicts AI agent adoption

    AgentPulse: A Continuous Multi-Signal Framework for Evaluating AI Agents in Deployment

    Yuxuan Gao +2

  36. cs.SE 2026-04-27 reviewed
    Linking bug reports to fixes lifts vulnerability detection to 0.941 F1

    Vulnerability Identification by Harnessing Inter-connected Multi-Source Information

    Liyou Chen +5

  37. cs.SE 2026-04-27 reviewed
    Multi-agent constraints make decompiled binaries executable in 84-97% of cases

    Constraint-Guided Multi-Agent Decompilation for Executable Binary Recovery

    Yifan Zhang +4

  38. cs.PF 2026-04-26 reviewed
    Optimas automates GPU code optimization with 100% correctness

    Optimas: An Intelligent Analytics-Informed Generative AI Framework for Performance Optimization

    Mohammad Zaeed +2

  39. cs.CL 2026-04-26 reviewed
    LLM system automates 45% of support sessions from copilot corrections

    Learning Selective LLM Autonomy from Copilot Feedback in Enterprise Customer Support Workflows

    Nikita Borovkov +6

  40. cs.SE 2026-04-26 reviewed
    6-33% of code review comments in scientific software are not useful

    Characterizing the Usefulness of Code Review Comments in Scientific Software for Software Quality and Scientific Rigor

    Sharif Ahmed +1

  41. cs.SE 2026-04-26 reviewed
    Five-layer AI agent matches top coding tools on benchmarks

    KISS Sorcar: A Stupidly-Simple General-Purpose and Software Engineering AI Assistant

    Koushik Sen

  42. cs.SE 2026-04-26 reviewed
    Fine-tuned LLMs answer code queries with focused UML diagrams

    Query2Diagram: Answering Developer Queries with UML Diagrams

    Oleg Baryshnikov +7

  43. cs.CV 2026-04-26 reviewed
    Frontier agents succeed in only 20% of multi-day coworker tasks

    ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

    Fanqing Meng +48

  44. cs.SE 2026-04-26 reviewed
    LLMs classify code review comments using comment and diff

    Automated Classification of Human Code Review Comments with Large Language Models

    Semih Çağlar +2

  45. cs.SE 2026-04-26 reviewed
    DAG modeling doubles agent failure detection over end-to-end checks

    AgentEval: DAG-Structured Step-Level Evaluation for Agentic Workflows with Error Propagation Tracking

    Dongxin Guo +2

  46. cs.SE 2026-04-26 reviewed
    Grammar loop aligns CPS safety rules with simulations

    Grammar-Constrained Refinement of Safety Operational Rules Using Language in the Loop: What Could Go Wrong

    Khouloud Gaaloul +3

  47. cs.SE 2026-04-26 reviewed
    Requirements guide tests to detect 22-25 more business logic bugs

    Uncovering Business Logic Bugs via Semantics-Driven Unit Test Generation

    Chen Yang +1

  48. cs.SE 2026-04-26 reviewed
    LLM uncertainty propagates across workflows and people

    Uncertainty Propagation in LLM-Based Systems

    Boming Xia +5

  49. cs.SE 2026-04-25 reviewed
    Agents link browser symptoms to backend causes at 19.7% accuracy

    CUJBench: Benchmarking LLM-Agent on Cross-Modal Failure Diagnosis from Browser to Backend

    Haoming Meng

  50. cs.IR 2026-04-25 reviewed
    Prompt chaining lifts LLM accuracy on scientific text classification

    Automating Categorization of Scientific Texts with In-Context Learning and Prompt-Chaining in Large Language Models

    Gautam Kishore Shahi +1