archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 17

  1. cs.SE 2026-04-21 reviewed
    Fuzzing finds bugs in deductive verifiers

    Crash-free Deductive Verifiers

    Wander Nauta +2

  2. cs.CR 2026-04-21 reviewed
    DynaHug catches malicious ML models by watching runtime behavior

    Malicious ML Model Detection by Learning Dynamic Behaviors

    Sarang Nambiar +2

  3. cs.SE 2026-04-21 reviewed
    Tool flags code-doc mismatches only when tests prove the mismatch

    CASCADE: Detecting Inconsistencies between Code and Documentation with Automatic Test Generation

    Tobias Kiecker +4

  4. cs.SE 2026-04-21 reviewed
    Framework maps stakeholder views to formal SysML v2 architecture

    Towards Formalising Stakeholder Context using SysML v2

    Matthew Harrison +4

  5. cs.SE 2026-04-21 reviewed
    EnergyTrackr flags energy spikes in Java commits

    Systematic Detection of Energy Regression and Corresponding Code Patterns in Java Projects

    François Bechet +4

  6. cs.AI 2026-04-21 reviewed
    LLM agents reach only 35 percent CTF checkpoint completion

    Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges

    Ali Al-Kaswan +5

  7. cs.SE 2026-04-21 reviewed
    Mocking info from tests guides LLMs to better unit tests

    Improving LLM-Driven Test Generation by Learning from Mocking Information

    Jamie Lee +5

  8. cs.SE 2026-04-21 reviewed
    Simulated debugging boosts LLM bug fixes by 26% on Defects4J

    DebugRepair: Enhancing LLM-Based Automated Program Repair via Self-Directed Debugging

    Linhao Wu +11

  9. cs.HC 2026-04-21 reviewed
    Four-layer workspace structures human-AI co-development of VA tools

    BONSAI: A Mixed-Initiative Workspace for Human-AI Co-Development of Visual Analytics Applications

    Thilo Spinner +3

  10. cs.SE 2026-04-21 reviewed
    Iterative retriever lifts bug test generation rates by 20-32 percent

    iCoRe: An Iterative Correlation-Aware Retriever for Bug Reproduction Test Generation

    Junyi Wang +2

  11. cs.SE 2026-04-21 reviewed
    Large models sketch edits, small models apply them

    Cascaded Code Editing: Large-Small Model Collaboration for Effective and Efficient Code Editing

    Chaozheng Wang +9

  12. cs.SE 2026-04-21 reviewed
    Empathic IDE matches standard tools on learning but helps more with errors

    Towards More Empathic Programming Environments: An Experimental Empathic AI-Enhanced IDE

    Justin Rainier Go +4

  13. cs.SE 2026-04-21 reviewed
    Mutations expose inconsistencies in 15% of Code LLM responses

    MUCOCO: Automated Consistency Testing of Code LLMs

    Chua Jin Chou +2

  14. cs.SE 2026-04-21 reviewed
    Multimodal AI spots GUI defects in multi-window mobile apps

    Proactive Detection of GUI Defects in Multi-Window Scenarios via Multimodal Reasoning

    Xinyao Zhang +7

  15. cs.CR 2026-04-21 reviewed
    Adversarial agents eliminate 79% of LLM defect candidates

    Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery

    Abhinav Agarwal

  16. cs.CR 2026-04-21 reviewed
    Security is relative to project contracts

    Security Is Relative: Training-Free Vulnerability Detection via Multi-Agent Behavioral Contract Synthesis

    Yongchao Wang +1

  17. cs.SE 2026-04-21 reviewed
    Framework turns aerospace requirements into LTL at 85% precision

    Automated LTL Specification Generation from Industrial Aerospace Requirements

    Zhi Ma +7

  18. cs.SE 2026-04-20 reviewed
    SVGD seeds raise ADS safety violation rates

    From Particles to Perils: SVGD-Based Hazardous Scenario Generation for Autonomous Driving Systems Testing

    Linfeng Liang +3

  19. cs.HC 2026-04-20 reviewed
    Graph interface turns AI coding into branching explorations

    Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph

    Vassilios Exarhakos +2

  20. cs.SE 2026-04-20 reviewed
    AI-human loop cuts bug report labeling effort by 196%

    Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning

    Guoming Long +3

  21. cs.SE 2026-04-20 reviewed
    Structural checks raise EDA code success without runtime debugging

    Structural Verification for Reliable EDA Code Generation without Tool-in-the-Loop Debugging

    Dinithi Jayasuriya +4

  22. cs.SE 2026-04-20 reviewed
    Cutoff theorem bounds verification search for DSLTrans properties

    Tractable Verification of Model Transformations: A Cutoff-Theorem Approach for DSLTrans

    Levi Lucio

  23. cs.SE 2026-04-20 reviewed
    AI transformation methods lack systematic guidance on ML task derivation

    From Business Problems to AI Solutions: Where Does Transformation Support Fail

    Abir Trabelsi +3

  24. cs.CR 2026-04-20 reviewed
    Only 0.4% of Android apps match privacy policies to their logs

    Do Privacy Policies Match with the Logs? An Empirical Study of Privacy Disclosure in Android Application Logs

    Zhiyuan Chen +6

  25. cs.SE 2026-04-20 reviewed
    Sentence transformers filter SCA alerts to 89% F1

    Towards Better Static Code Analysis Reports: Sentence Transformer-based Filtering of Non-Actionable Alerts

    Tamás Aladics +3

  26. cs.SE 2026-04-20 reviewed
    Direct TypeScript compiler parser speeds up large-repo indexing for AI agents

    TypeScript Repository Indexing for Code Agent Retrieval

    Junsong Pu +2

  27. cs.SE 2026-04-20 reviewed
    Agent builds playable web games from prompts where LLMs fail

    OpenGame: Open Agentic Coding for Games

    Yilei Jiang +10

  28. cs.SE 2026-04-20 reviewed
    AI software ecosystems show emergent failures from agent interactions

    More Is Different: Toward a Theory of Emergence in AI-Native Software Ecosystems

    Daniel Russo

  29. cs.SE 2026-04-20 reviewed
    Co-locating tests yields near-perfect AI code preservation

    Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation

    Éric Jacopin

  30. cs.SE 2026-04-20 reviewed
    AI bot PR frequency tied to lower CI/CD success rates

    Reliability of AI Bots Footprints in GitHub Actions CI/CD Workflows

    Syed Muhammad Ashhar Shah +5

  31. cs.SE 2026-04-20 reviewed
    Context composition causally shapes LLM failure explanation quality

    From Program Slices to Causal Clarity: Evaluating Faithful, Actionable LLM-Generated Failure Explanations via Context Partitioning and LLM-as-a-Judge

    Julius Porbeck +4

  32. cs.SE 2026-04-20 reviewed
    Context composition causally shapes LLM bug explanation quality

    From Program Slices to Causal Clarity: Evaluating Faithful, Actionable LLM-Generated Failure Explanations via Context Partitioning and LLM-as-a-Judge

    Julius Porbeck +4

  33. cs.AI 2026-04-20 reviewed
    Modular adapters beat fine-tuning on hard SQL queries

    LeGo-Code: Can Modular Curriculum Learning Advance Complex Code Generation? Insights from Text-to-SQL

    Salmane Chafik +2

  34. cs.SE 2026-04-20 reviewed
    LLM pipeline formalizes specs into properties at 77.8% accuracy

    Towards an Agentic LLM-based Approach to Requirement Formalization from Unstructured Specifications

    Alberto Tagliaferro +3

  35. cs.SE 2026-04-20 reviewed
    WebCompass benchmark evaluates full web coding workflows for AI models

    WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

    Xinping Lei +18

  36. cs.SE 2026-04-20 reviewed
    Real execution replaces mental simulation in LLM coding

    SolidCoder: Bridging the Mental-Reality Gap in LLM Code Generation through Concrete Execution

    Woojin Lee +1

  37. cs.SE 2026-04-20 reviewed
    SystemC prototypes remove false positives from embedded fuzzing

    Stateful Embedded Fuzzing with Peripheral-Accurate SystemC Virtual Prototypes

    Chiara Ghinami +4

  38. cs.OS 2026-04-20 reviewed
    Processes and pipes made lightweight for far memory accelerators

    Proxics: an efficient programming model for far memory accelerators

    Zikai Liu +5

  39. cs.SE 2026-04-20 reviewed
    Fairness-first design thinking embeds equity into software architecture

    Fairness-First Design Thinking for Software Architecture

    Iffat Fatima +2

  40. cs.SE 2026-04-20 reviewed
    7B model beats larger LLMs at code translation without parallel examples

    CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora

    Shangyu Li +9

  41. cs.SE 2026-04-20 reviewed
    AI systems should treat choices as governed tuned variables

    Statistical Software Engineering with Tuned Variables

    Nimrod Busany

  42. cs.SE 2026-04-20 reviewed
    API sequence mining boosts library fuzz coverage by 8.54%

    MASFuzzer: Fuzz Driver Generation and Adaptive Scheduling via Multidimensional API Sequences

    Xingyu Liu +3

  43. cs.SE 2026-04-20 reviewed
    PTMs are added late in projects and accumulate rather than being replaced

    When AI Models Become Dependencies: Studying the Evolution of Pre-Trained Model Reuse in Downstream Software Systems

    Peerachai Banyongrakkul +4

  44. cs.SE 2026-04-20 reviewed
    Framework detects every GitHub abuse type above 89% accuracy

    Weaponizing the Commons: A Taxonomy and Detection Framework of Abuse on GitHub

    Yuli Cheng +5

  45. cs.SE 2026-04-20 reviewed
    Ten cache smells affect 89% of GitLab CI/CD projects

    Cache-Related Smells in GitLab CI/CD: Comprehensive Catalog, Automated Detection, and Empirical Evidence

    Francesco Urdih +2

  46. cs.SE 2026-04-20 reviewed
    Graph consensus layer must replace code as AI coding artifact

    Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer

    Tianfu Wang +7

  47. cs.AI 2026-04-20 reviewed
    Joint prompt and tool optimization raises agent success 5-20%

    JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents

    Sandip Ghoshal +11

  48. cs.SE 2026-04-20 reviewed
    Video analysis lets AI grade diverse Scratch programs accurately

    Raven: Rethinking Automated Assessment for Scratch Programs via Video-Grounded Evaluation

    Donglin Li +3

  49. cs.SE 2026-04-20 reviewed
    Ground truth tests show debloaters remove needed code or keep extras

    Revisiting Code Debloating with Ground Truth-based Evaluation

    Muhammad Bilal +6

  50. cs.SE 2026-04-20 reviewed
    GLMTest raises branch accuracy to 50% by conditioning on code graphs

    Program Structure-aware Language Models: Targeted Software Testing beyond Textual Semantics

    Khang Tran +3