pith.

archive

Every paper Pith has read.

1797 papers in cs.SE · page 6

  1. cs.CR 2026-05-11 reviewed
    Fuzzer finds 15 vulnerabilities in LLM serving engines

    Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing

    Yunze Zhao +4

  2. cs.PL 2026-05-11 reviewed
    Symbolic analysis quantifies fraction of inputs changed by patches

    Quantitative Symbolic Patch Impact Analysis

    Laboni Sarker +2

  3. cs.LG 2026-05-11 reviewed
    DMI-Lib cuts LLM internal observability overhead to 0.4-6.8 percent

    Enabling Performant and Flexible Model-Internal Observability for LLM Inference

    Nengneng Yu +4

  4. cs.SE 2026-05-11 reviewed
    Code editor plugin logs student sessions for education datasets

    Using Logs to support Programming Education

    Gilmar Gomes do Nascimento +3

  5. cs.AI 2026-05-11 reviewed
    Git-like trace lets meta-agents fork past states 5x faster than Docker

    Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace

    Simon Yu +6

  6. cs.SE 2026-05-11 reviewed
    Pipeline builds dataset of 347 real C++ performance patches

    CppPerf: An Automated Pipeline and Dataset for Performance-Improving C++ Commits

    Tommy Ho +2

  7. cs.AI 2026-05-11 reviewed
    Benchmark shows CAD models miss fine details and complex operations

    BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD

    Haozhe Zhang +6

  8. cs.AI 2026-05-11 reviewed
    BenchCAD benchmark shows AI simplifies complex CAD designs

    BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD

    Haozhe Zhang +6

  9. cs.HC 2026-05-11 reviewed
    StartFlow helps non-experts build clearer startup prototypes

    StartFlow: From Method Conception to Multi-Perspective Evaluation in UX Prototyping for Software Startups

    Guilherme Corredato Guerino +3

  10. cs.AI 2026-05-11 reviewed
    LLM agents top out below 60% success in complex tool sandboxes

    ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox

    Yuanyang Li +4

  11. cs.AI 2026-05-11 reviewed
    LLM agents top out below 60% on complex tool tasks

    ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox

    Yuanyang Li +4

  12. quant-ph 2026-05-11 reviewed
    Unitaria composes quantum block encodings like NumPy arrays

    Unitaria: Quantum Linear Algebra via Block Encodings

    Matthias Deiml +4

  13. cs.SE 2026-05-11 reviewed
    AutoSOUP uses an LLM hybrid to generate unit proofs for component-level memory safety

    AutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verification

    Paschal C. Amusuo +7

  14. cs.SE 2026-05-11 reviewed
    AI leaves all problem-solving behaviors intact in code extension tasks

    ChatGPT: Friend or Foe When Comprehending and Changing Unfamiliar Code

    Norman Anderson +9

  15. cs.LG 2026-05-11 reviewed
    Masking bad steps inside failed runs lifts agent resolution 3.7 percent

    Step Rejection Fine-Tuning: A Practical Distillation Recipe

    Igor Slinko +3

  16. cs.SE 2026-05-11 reviewed
    Autoencoder context compression fails on multi-step coding agents

    On Problems of Implicit Context Compression for Software Engineering Agents

    Kirill Gelvan +5

  17. cs.SE 2026-05-11 reviewed
    New benchmark tests agents on cracking binaries from executables

    CrackMeBench: Binary Reverse Engineering for Agents

    Isaac David +1

  18. cs.AI 2026-05-11 reviewed
    LLARS unifies LLM prompting, generation, and evaluation for domain experts and developers

    LLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluation

    Philipp Steigerwald +4

  19. cs.LO 2026-05-11 reviewed
    Logic prover turns G-code collisions into LLM correction signals

    Correct-by-Construction G-Code Generation: A Neuro-Symbolic Approach via Separation Logic

    Yeonseok Lee

  20. cs.LO 2026-05-11 reviewed
    Neuro-symbolic loop fixes G-code via spatial proof failures

    Correct-by-Construction G-Code Generation: A Neuro-Symbolic Approach via Separation Logic

    Yeonseok Lee

  21. cs.LO 2026-05-11 reviewed
    Neuro-symbolic system turns G-code collisions into bounding-box fixes

    Correct-by-Construction G-Code Generation: A Neuro-Symbolic Approach via Separation Logic

    Yeonseok Lee

  22. cs.LO 2026-05-11 reviewed
    Separation logic catches CNC collisions as spatial data races

    Separation Logic for Verifying Physical Collisions of CNC Programs

    Yeonseok Lee

  23. cs.LO 2026-05-11 reviewed
    Separation logic verifies CNC collisions as spatial data races

    Separation Logic for Verifying Physical Collisions of CNC Programs

    Yeonseok Lee

  24. cs.SE 2026-05-11 reviewed
    VLMs automate robot task oracles from video

    VISOR: A Vision-Language Model-based Test Oracle for Testing Robots

    Prasun Saurabh +4

  25. cs.SE 2026-05-11 reviewed
    VISOR automates robot test oracles using vision-language models

    VISOR: A Vision-Language Model-based Test Oracle for Testing Robots

    Prasun Saurabh +4

  26. cs.SE 2026-05-11 reviewed
    DREAMS tool cuts time for DRM model creation and revision

    DREAMS: Modelling Support for Research into Engineering and Artistic Design

    Apala Chakrabarti

  27. cs.AI 2026-05-11 reviewed
    Vision loop polishes LaTeX documents to publication standards

    PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

    Bihui Yu +8

  28. cs.SE 2026-05-11 reviewed
    ReXCL tool automates requirements extraction and classification

    Read, Extract, Classify: A Tool for Smarter Requirements Engineering

    Paheli Bhattacharya +3

  29. cs.SE 2026-05-11 reviewed
    Margin-aware geometry reduces distortions in imbalanced vulnerability detection

    MARGIN: Margin-Aware Regularized Geometry for Imbalanced Vulnerability Detection

    Yuteng Zhang +4

  30. cs.AI 2026-05-11 reviewed
    Tiered AI agent framework adapts review to risk and separates duties

    Beyond Autonomy: A Dynamic Tiered AgentRunner Framework for Governable and Resilient Enterprise AI Execution

    Kai Pan +1

  31. cs.CR 2026-05-11 reviewed
    Usability demands trick LLMs into writing insecure code

    Usability as a Weapon: Attacking the Safety of LLM-Based Code Generation via Usability Requirements

    Yue Li +7

  32. cs.CR 2026-05-11 reviewed
    LLM agents discover 40 bugs in V8 JavaScript engine

    Agentic Fuzzing: Opportunities and Challenges

    Junyoung Park +1

  33. cs.NI 2026-05-11 reviewed
    Simulator models edge computing on optical networks

    GenioSim: A Novel Simulation Platform for Edge Computing over Optical Networks

    Carmine Cesarano +2

  34. cs.SE 2026-05-11 reviewed
    Config file structure has no effect on coding agent adherence

    Instruction Adherence in Coding Agent Configuration Files: A Factorial Study of Four File-Structure Variables

    Damon McMillan

  35. cs.PL 2026-05-11 reviewed
    Move prover checks first-class functions with state changes

    Formal Verification of Imperative First-Class Functions in Move

    Wolfgang Grieskamp +3

  36. cs.PL 2026-05-11 reviewed
    Move Prover verifies first-class imperative functions

    Formal Verification of Imperative First-Class Functions in Move

    Wolfgang Grieskamp +3

  37. cs.PL 2026-05-11 reviewed
    Hybrid of mechanical analysis and AI agents infers Move specifications

    Combining Mechanical and Agentic Specification Inference for Move

    Wolfgang Grieskamp +2

  38. cs.PL 2026-05-11 reviewed
    Tool pairs weakest-precondition analysis with AI to infer Move specs

    Combining Mechanical and Agentic Specification Inference for Move

    Wolfgang Grieskamp +2

  39. cs.SI 2026-05-11 reviewed
    Iterative prompting beats single-pass for complex graph tasks

    GraphInstruct: A Progressive Benchmark for Diagnosing Capability Gaps in LLM Graph Generation

    Zihe Wei +3

  40. cs.SI 2026-05-11 reviewed
    Benchmark finds LLM graph failures peak at multi-constraint tasks

    GraphInstruct: A Progressive Benchmark for Diagnosing Capability Gaps in LLM Graph Generation

    Zihe Wei +3

  41. cs.LG 2026-05-11 reviewed
    LLMs recover from flawed partial reasoning only 29% of the time

    TeleResilienceBench: Quantifying Resilience for LLM Reasoning in Telecommunications

    Pranshav Gajjar +2

  42. cs.SE 2026-05-11 reviewed
    Deterministic orchestration matches LLM accuracy at 3.5x lower cost

    Deterministic vs. LLM-Controlled Orchestration for COBOL-to-Python Modernization

    Naing Oo Lwin +1

  43. cs.SE 2026-05-10 reviewed
    Cloning duplicates many agent tools in public marketplaces

    Evaluating Tool Cloning in Agentic-AI Ecosystems

    Taein Kim +4

  44. cs.SE 2026-05-10 reviewed
    Cloning duplicates 60-85% of high-similarity tool pairs in agent ecosystems

    Evaluating Tool Cloning in Agentic-AI Ecosystems

    Taein Kim +4

  45. cs.SE 2026-05-10 reviewed
    Shared contract makes agent benchmark gate change controller choice

    An Executable Benchmarking Suite for Tool-Using Agents

    Zhiqing Zhong +3

  46. cs.SE 2026-05-10 reviewed
    GenAI shifts software engineering from code writing to intent oversight

    From Code-Centric to Intent-Centric Software Engineering: A Reflexive Thematic Analysis of Generative AI, Agentic Systems, and Engineering Accountability

    Elyson De La Cruz

  47. cs.SE 2026-05-10 reviewed
    Trajectory context lifts tool accuracy from 39% to 57%

    Trajectory Supervision for Continual Tool-Use Learning in LLMs

    Vishnu Vardhan Reddy +2

  48. cs.LG 2026-05-10 reviewed
    Pre-execution rubrics lift tool agents to 0.86 accuracy

    RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

    Will LeVine +3

  49. cs.LG 2026-05-10 reviewed
    Pre-execution rubrics lift tool-use success to 0.86 average

    RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

    Will LeVine +3

  50. cs.LG 2026-05-10 reviewed
    Pre-execution rubrics lift tool agent reliability to 0.86

    RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

    Will LeVine +3