archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 16

  1. cs.SE 2026-04-23 reviewed
    Models generate correct code without public tests

    You Don't Need Public Tests to Generate Correct Code

    Kaushitha Silva +1

  2. cs.SE 2026-04-23 reviewed
    Bug variants reveal memorization in LLM repair models

    A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair

    Milan De Koning +3

  3. cs.SE 2026-04-23 reviewed
    Decomposition plus mutation refines LLM specs to verify real programs

    SpecSyn: LLM-based Synthesis and Refinement of Formal Specifications for Real-world Program Verification

    Lezhi Ma +5

  4. cs.AI 2026-04-23 reviewed
    Bounds on neural-net safety probability under random inputs

    Probabilistic Verification of Neural Networks via Efficient Probabilistic Hull Generation

    Jingyang Li +3

  5. cs.SE 2026-04-23 reviewed
    GPT dominates generative AI use in IT project management

    A systematic review of generative AI usage for IT project management

    Ionut Anghel +1

  6. cs.SE 2026-04-23 reviewed
    Ambiguous requirements cut LLM code accuracy

    Assessing the Impact of Requirement Ambiguity on LLM-based Function-Level Code Generation

    Di Yang +9

  7. cs.SE 2026-04-23 reviewed
    IRAP turns vague specs into math functions with 40x gains

    Conjecture and Inquiry: Quantifying Software Performance Requirements via Interactive Retrieval-Augmented Preference Elicitation

    Shihai Wang +1

  8. cs.CL 2026-04-23 reviewed
    Modular checks push GUI agents past human performance on OSWorld

    VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

    Qijun Han +13

  9. cs.LG 2026-04-23 reviewed
    The authors adapted their prior mdok method for machine-generated text detection to…

    mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code

    Adam Skurla +2

  10. cs.CR 2026-04-23 reviewed
    Three LLM experts detect code vulnerabilities at 77% F1 for two cents

    Strategic Heterogeneous Multi-Agent Architecture for Cost-Effective Code Vulnerability Detection

    Zhaohui Geoffrey Wang

  11. cs.SE 2026-04-23 reviewed
    SBOM mismatches produce inconsistent vulnerability reports

    Hidden Dependencies and Component Variants in SBOM-Based Software Composition Analysis

    Shawn Rasheed +4

  12. cs.AI 2026-04-23 reviewed
    Meta-predicates flag unsuitable evidence in clinical AI rules upfront

    Trustworthy Clinical Decision Support Using Meta-Predicates and Domain-Specific Languages

    Michael Bouzinier +4

  13. cs.SE 2026-04-23 reviewed
    Execution feedback beats pipeline complexity for 1-3B code models

    Feedback Over Form: Why Execution Feedback Matters More Than Pipeline Topology in 1-3B Code Generation

    Charles Junichi McAndrews

  14. cs.SE 2026-04-22 reviewed
    Ground-truth dataset exposes differences in vulnerability detectors

    A Ground-Truth-Based Evaluation of Vulnerability Detection Across Multiple Ecosystems

    Peter Mandl +3

  15. cs.DC 2026-04-22 reviewed
    GPU runs 20,000 GWAS phenotypes in 20 minutes

    TorchGWAS: GPU-accelerated GWAS for thousands of quantitative phenotypes

    Xingzhong Zhao +7

  16. cs.AI 2026-04-22 reviewed
    POMDP models hidden user states to auto-refine LLM prompts

    Mind the Prompt: Self-adaptive Generation of Task Plan Explanations via LLMs

    Gricel Vázquez +4

  17. cs.SE 2026-04-22 reviewed
    37% of AI governance prompts miss key structure

    Structural Quality Gaps in Practitioner AI Governance Prompts: An Empirical Study Using a Five-Principle Evaluation Framework

    Christo Zietsman

  18. cs.CR 2026-04-22 reviewed
    LLM gateways often swap models and misbill users

    Behavioral Consistency and Transparency Analysis on Large Language Model API Gateways

    Guanjie Lin +5

  19. cs.SE 2026-04-22 reviewed
    The paper examines residual security risks in patched code by measuring semantic and…

    Residual Risk Analysis in Benign Code: How Far Are We? A Multi-Model Semantic and Structural Similarity Approach

    Mohammad Farhad +1

  20. cs.AI 2026-04-22 reviewed
    Value conflict tests show alignment faking in 7B models

    Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models

    Inderjeet Nair +2

  21. cs.SE 2026-04-22 reviewed
    LLM tool provides 24/7 feedback for software engineering students

    Autonomous LLM-generated Feedback for Student Exercises in Introductory Software Engineering Courses

    Andreas Metzger

  22. cs.AI 2026-04-22 reviewed
    Coding agents retain just 44% of code in real commits

    SWE-chat: Coding Agent Interactions From Real Users in the Wild

    Joachim Baumann +5

  23. cs.HC 2026-04-22 reviewed
    Serverless toolkit builds urban VA prototypes in hours

    Autark: A Serverless Toolkit for Prototyping Urban Visual Analytics Systems

    Lucas Alexandre +7

  24. cs.SE 2026-04-22 reviewed
    High AUC does not ensure defect models beat random at all thresholds

    Evaluating Software Defect Prediction Models via the Area Under the ROC Curve Can Be Misleading

    Luigi Lavazza +2

  25. cs.SE 2026-04-22 reviewed
    QuanForge distinguishes QNN test suites and finds weak circuit regions

    QuanForge: A Mutation Testing Framework for Quantum Neural Networks

    Minqi Shao +2

  26. cs.SE 2026-04-22 reviewed
    GNNs spot LLM-written safety cases at F1 0.94

    Evaluating Assurance Cases as Text-Attributed Graphs for Structure and Provenance Analysis

    Fariz Ikhwantri +1

  27. cs.SE 2026-04-22 reviewed
    LLM regex masks raise log parsing accuracy to 97.6%

    DeepParse: Hybrid Log Parsing with LLM-Synthesized Regex Masks

    Amir Shetaia +1

  28. cs.SE 2026-04-22 reviewed
    LLMs reach 88-89% accuracy on product line blueprint analysis

    Early-Stage Product Line Validation Using LLMs: A Study on Semi-Formal Blueprint Analysis

    Viet-Man Le +4

  29. cs.SE 2026-04-22 reviewed
    Hybrid detector finds 893k eliminable duplicate BDD steps

    Reducing Maintenance Burden in Behaviour-Driven Development: A Paraphrase-Robust Duplicate-Step Detector with a 1.1M-Step Open Benchmark

    Ali Hassaan Mughal +2

  30. cs.SE 2026-04-22 reviewed
    Security commit messages remain largely uninformative

    On the Informativeness of Security Commit Messages: A Large-scale Replication Study

    Syful Islam +1

  31. cs.SE 2026-04-22 reviewed
    Guardrails from requirements and models stabilize AI agents

    Shift-Up: A Framework for Software Engineering Guardrails in AI-native Software Development -- Initial Findings

    Petrus Lipsanen +5

  32. cs.CL 2026-04-22 reviewed
    RL trains 7B model to build websites rivaling 671B LLMs

    WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

    Juyong Jiang +6

  33. cs.SE 2026-04-22 reviewed
    Reasoning models forecast parallel code races without tool calls

    Learning Reasoning World Models for Parallel Code

    Gautam Singh +3

  34. cs.SE 2026-04-22 reviewed
    LLMs detect logging security issues at 13-52 percent accuracy

    Towards Secure Logging: Characterizing and Benchmarking Logging Code Security Issues with LLMs

    He Yang Yuan +5

  35. cs.SE 2026-04-22 reviewed
    Static tool catches invented symbols in LLM API migrations

    Hallucination Inspector: A Fact-Checking Judge for API Migration

    Marcos Tileria +3

  36. cs.CR 2026-04-22 reviewed
    LLM agents confirm 84% of Node.js taint vulnerabilities

    Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning

    Ronghao Ni +2

  37. cs.LG 2026-04-22 reviewed
    Dual tasks test if LLMs grasp code execution flow

    The Path Not Taken: Duality in Reasoning about Program Execution

    Eshgin Hasanov +3

  38. cs.LG 2026-04-22 reviewed
    LLM absorbs long contexts into fixed parameters with causal sync

    Absorber LLM: Harnessing Causal Synchronization for Test-Time Training

    Zhixin Zhang +4

  39. cs.LG 2026-04-22 reviewed
    Joint optimizations cut multi-agent edge latency by 62 percent at 200 agents

    A Delta-Aware Orchestration Framework for Scalable Multi-Agent Edge Computing

    Samaresh Kumar Singh +1

  40. quant-ph 2026-04-22 reviewed
    Nine quantum-HPC stacks share design patterns for unifying layers

    Quantum-HPC Software Stacks and the openQSE Reference Architecture: A Survey

    Amir Shehata +24

  41. cs.SE 2026-04-21 reviewed
    Code snippets prove 20 percent more library calls executable

    FIKA: Expanding Dependency Reachability with Executability Guarantees

    Yogya Gamage +3

  42. cs.SE 2026-04-21 reviewed
    Review charts automation routes for quantum software and AI

    Automated Quantum Software and AI Engineering

    Nazanin Siavash +1

  43. cs.SE 2026-04-21 reviewed
    Platform uses containers and supervised AI chat for reproducible biomedical workflows

    Biomedical systems biology workflow orchestration and execution with PoSyMed

    Simon Süwer +5

  44. cs.SE 2026-04-21 reviewed
    AI security PRs introduce recurring flaws but often merge

    Insights into Security-Related AI-Generated Pull Requests

    Md Fazle Rabbi +3

  45. cs.SE 2026-04-21 reviewed
    Vision models turn GUI bug videos into replays 72% of the time

    ViBR: Automated Bug Replay from Video-based Reports using Vision-Language Models

    Sidong Feng +5

  46. cs.SE 2026-04-21 reviewed
    LLM GUI code compiles but rarely plays without errors

    PlayCoder: Making LLM-Generated GUI Code Playable

    Zhiyuan Peng +5

  47. cs.RO 2026-04-21 reviewed
    One open codebase trains vision-language-action models end-to-end

    VLA Foundry: A Unified Framework for Training Vision-Language-Action Models

    Jean Mercat +7

  48. cs.SE 2026-04-21 reviewed
    Predictive autoscaler holds Node.js latency at 26 ms in ramps

    Predictive Autoscaling for Node.js on Kubernetes: Lower Latency, Right-Sized Capacity

    Ivan Tymoshenko +2

  49. cs.SE 2026-04-21 reviewed
    Reflection and planning lift theorem proving 22% with fixed LLM calls

    On Reasoning-Centric LLM-based Automated Theorem Proving

    Yican Sun +3

  50. cs.CR 2026-04-21 reviewed
    Fine-tuned LLMs raise XSS obfuscation match rate to 0.22

    Evaluating LLM-Generated Obfuscated XSS Payloads for Machine Learning-Based Detection

    Divyesh Gabbireddy +1