archive

Every paper Pith has read.

1797 papers in cs.SE · page 10

  1. cs.CR 2026-05-05 reviewed
    LLM method generates PoV tests showing feasible attacks in 55 percent of cases

    Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex Software

    Shravya Kanchi +4

  2. cs.CR 2026-05-05 reviewed
    Staged tickets induce vulnerable code in coding agents at 53-86% rates

    MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

    Jonathan Steinberg +1

  3. cs.SE 2026-05-05 reviewed
    LLM linting outperforms rules for quantum programs

    Beyond Rules: LLM-Powered Linting for Quantum Programs

    Pietro Cassieri +4

  4. cs.LG 2026-05-05 reviewed
    ICU risk prediction improves as clinical pathways unfold

    From Data Lifting to Continuous Risk Estimation: A Process-Aware Pipeline for Predictive Monitoring of Clinical Pathways

    Pasquale Ardimento +3

  5. cs.LG 2026-05-05 reviewed
    Clinical risk estimates improve with each new patient event

    From Data Lifting to Continuous Risk Estimation: A Process-Aware Pipeline for Predictive Monitoring of Clinical Pathways

    Pasquale Ardimento +3

  6. cs.SE 2026-05-05 reviewed
    Dynamic knowledge base generates resilient Rust formal proofs

    KVerus: Scalable and Resilient Formal Verification Proof Generation for Rust Code

    Yuwei Liu +5

  7. cs.SE 2026-05-05 reviewed
    AI Advocates catalyze squad shift to human-AI hybrid teams

    AI Advocate: Educational Path to Transform Squads to the Future

    Carla Soares +5

  8. cs.CR 2026-05-05 reviewed
    Public firmware for crypto miners reveals exploitable flaws

    Firmware Distribution as Attack Surface: A Security Study of ASIC Cryptocurrency Miners

    Pierre Pouliquen +4

  9. cs.CR 2026-05-05 reviewed
    Public firmware reveals remote attack paths in most ASIC miners

    Firmware Distribution as Attack Surface: A Security Study of ASIC Cryptocurrency Miners

    Pierre Pouliquen +4

  10. cs.DC 2026-05-05 reviewed
    HPC workflows pause for human input without idling compute resources

    A Workflow-Oriented Framework for Asynchronous Human-AI Collaboration in Hybrid and Compute-Intensive HPC Environments

    Sergio Mendoza +7

  11. cs.SE 2026-05-05 reviewed
    Graph features fused into LLM layers lift code generation scores

    Deep Graph-Language Fusion for Structure-Aware Code Generation

    Mert Tiftikci +2

  12. cs.SE 2026-05-05 reviewed
    Better-connected U.S

    Geographic Variation in Stack Overflow Code Quality: Evidence from a Cross-Regional Study of Coding Practices

    Elijah Zolduoarrati +2

  13. physics.soc-ph 2026-05-05 reviewed
    Commit time series alpha flags software stability

    Long-Range Correlation in Code Commit Dynamics as a Novel Indicator of Software Product Stability: A Detrended Fluctuation Analysis Study

    Goran Mitevski

  14. cs.SE 2026-05-05 reviewed
    No language model fully rebuilds any program from scratch

    ProgramBench: Can Language Models Rebuild Programs From Scratch?

    John Yang +11

  15. cs.SE 2026-05-05 reviewed
    LLM agents use tree search to find root causes in microservices

    Multi-Agent Systems for Root Cause Analysis in Microservices

    Alexander Naakka +2

  16. cs.CR 2026-05-05 reviewed
    Zorya detects seven bugs in gc Go binaries

    From TinyGo to gc Compiler: Extending Zorya's Concolic Framework to Real-World Go Binaries

    Karolina Gorna +4

  17. cs.CR 2026-05-05 reviewed
    Zorya detects seven bugs in real-world Go binaries

    From TinyGo to gc Compiler: Extending Zorya's Concolic Framework to Real-World Go Binaries

    Karolina Gorna +4

  18. cs.SE 2026-05-05 reviewed
    AI models recover semantics from legacy database code

    Semantic Reverse Engineering Legacy Software Applications with ChatGPT, Gemini AI, and Claude AI

    Christian Mancas +1

  19. cs.CR 2026-05-05 reviewed
    Provenance graph auditing cuts LLM agent injection success to 3.8%

    ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

    Shihao Weng +5

  20. cs.SE 2026-05-05 reviewed
    Benchmark shows LLMs miss complete postconditions on real code

    POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference

    Gehao Zhang +1

  21. cs.CR 2026-05-05 reviewed
    Three cryptographic layers block dependency confusion attacks

    Cryptographic Registry Provenance: Structural Defense Against Dependency Confusion in AI Package Ecosystems

    Alan L. McCann

  22. cs.SE 2026-05-05 reviewed
    Procedure turns abstract SE theories into testable hypotheses

    Operationalizing Software Engineering Theories for Practical Validation

    Isaque Alves +3

  23. cs.SE 2026-05-05 reviewed
    Sustainable scientific software shows higher test coverage

    Exploring Sustainability in Scientific Software through Code Quality & Test Coverage Metrics

    Sheikh Md. Mushfiqur Rahman +2

  24. cs.SE 2026-05-05 reviewed
    Semantic matching raises project assignment quality to 0.74 cosine similarity

    TeamUp: Semantic Project Matching and Team Formation for Learning at Scale

    Dhruv Gulwani +3

  25. cs.SE 2026-05-04 reviewed
    YAML descriptions cut LLM tool context 142 times

    DADL: A Declarative Description Language for Enterprise Tool Libraries in LLM Agent Systems

    Axel Dunkel

  26. cs.SE 2026-05-04 reviewed
    Kerncap turns full AMD GPU apps into isolated kernel reproducers in one command

    Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

    Cole Ramos +1

  27. cs.SE 2026-05-04 reviewed
    Kerncap extracts isolated kernels from 30 GB AMD GPU apps in one command

    Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

    Cole Ramos +1

  28. cs.AI 2026-05-04 reviewed
    4B model matches frontier LLMs at terminal execution

    Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?

    Spandan Garg +2

  29. cs.CR 2026-05-04 reviewed
    Five LLMs label 1,554 prompts as executable malicious code requests

    A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts

    Richard J. Young +1

  30. cs.AI 2026-05-04 reviewed
    Few traces validate complex agent behavior accurately

    Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents

    Reshabh K Sharma +2

  31. cs.SE 2026-05-04 reviewed
    Data-flow graph lifts agent repair success 4.7 points

    ARISE: A Repository-level Graph Representation and Toolset for Agentic Fault Localization and Program Repair

    Shahd Seddik +1

  32. cs.SE 2026-05-04 reviewed
    ARIS pairs executor with cross-family reviewer to verify research claims

    ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

    Ruofeng Yang +2

  33. cs.CR 2026-05-04 reviewed
    Knowledge graph lets AI generate real DeFi exploits at 96 percent success

    EvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphs

    Ruichao Liang +7

  34. cs.AI 2026-05-04 reviewed
    Stabilized distillation makes compact models reliable for cross-language code clones

    Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection

    Mohamad Khajezade +2

  35. cs.DC 2026-05-04 reviewed
    Workflow templates speed sensor app prototyping for non-experts

    From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications

    Komal Thareja +2

  36. cs.DC 2026-05-04 reviewed
    AI reuses sensor workflow template to cut dev time to 1-2 days

    (POSTER) From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications

    Komal Thareja +2

  37. cs.AI 2026-05-04 reviewed
    Tunable rules for human-AI tasks cut fatigue while raising output

    HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems

    Vicente Pelechano +3

  38. cs.AI 2026-05-04 reviewed
    Tighter governance lifts manufacturing output and cuts fatigue

    HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems

    Vicente Pelechano +3

  39. cs.SE 2026-05-04 reviewed
    AI code volume predicts structural decay almost perfectly

    AI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Development

    Yuecai Zhu +2

  40. cs.SE 2026-05-04 reviewed
    Schema compiler lifts small LLMs to 84% tool accuracy at scale

    TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments

    Furkan Sakizli

  41. cs.SE 2026-05-04 reviewed
    Structured specs let LLMs build whole repositories

    LLM-Assisted Repository-Level Generation with Structured Spec-Driven Engineering

    Shuzhao Feng +3

  42. cs.SE 2026-05-04 reviewed
    Causal models replace correlations for software decisions

    Causal Software Engineering: A Vision and Roadmap

    Roberto Pietrantuono +4

  43. cs.SE 2026-05-04 reviewed
    Blackboard MCTS lifts LLM Pass@1 on contest programming benchmarks

    ARIADNE: Agentic Reward-Informed Adaptive Decision Exploration via Blackboard-Driven MCTS for Competitive Program Generation

    Minnan Wei +3

  44. cs.SE 2026-05-04 reviewed
    Symbolic index gives LLMs zero-defect view of large codebases

    AOCI: Symbolic-Semantic Indexing for Practical Repository-Scale Code Understanding with LLMs

    Jinshi Liu +5

  45. cs.RO 2026-05-04 reviewed
    LLM tool helps map uncertainties in self-adaptive robots

    Human-in-the-Loop Uncertainty Analysis in Self-Adaptive Robots Using LLMs

    Hassan Sartaj +4

  46. cs.SE 2026-05-04 reviewed
    MDE user models found disconnected and mostly static

    A Low-Code Approach for the Automatic Personalization of Conversational Agents

    Aaron Conrardy +2

  47. cs.SE 2026-05-04 reviewed
    AI pull requests mostly get AI reviews or none

    These Aren't the Reviews You're Looking For: How Humans Review AI-Generated Pull Requests

    Kacper Duma +6

  48. cs.SE 2026-05-04 reviewed
    63,533-commit benchmark aids AI for commit messages

    CommitSuite: A Comprehensive Benchmark for Commit Classification and Message Generation

    Zirui Wan +5

  49. cs.SE 2026-05-04 reviewed
    Triadic data unlocks long-horizon work for engineering agents

    The Conversations Beneath the Code: Triadic Data for Long-Horizon Software Engineering Agents

    Yelin Kim

  50. cs.SE 2026-05-04 reviewed
    LLM repair models drop over 50% on minor code tweaks

    HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair

    Fazle Rabbi +1