archive

Every paper Pith has read.

1797 papers in cs.SE · page 6

cs.CR 2026-05-11 reviewed

Fuzzer finds 15 vulnerabilities in LLM serving engines
Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing

Yunze Zhao +4
cs.PL 2026-05-11 reviewed

Symbolic analysis quantifies fraction of inputs changed by patches
Quantitative Symbolic Patch Impact Analysis

Laboni Sarker +2
cs.LG 2026-05-11 reviewed

DMI-Lib cuts LLM internal observability overhead to 0.4-6.8 percent
Enabling Performant and Flexible Model-Internal Observability for LLM Inference

Nengneng Yu +4
cs.SE 2026-05-11 reviewed

Code editor plugin logs student sessions for education datasets
Using Logs to support Programming Education

Gilmar Gomes do Nascimento +3
cs.AI 2026-05-11 reviewed

Git-like trace lets meta-agents fork past states 5x faster than Docker
Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace

Simon Yu +6
cs.SE 2026-05-11 reviewed

Pipeline builds dataset of 347 real C++ performance patches
CppPerf: An Automated Pipeline and Dataset for Performance-Improving C++ Commits

Tommy Ho +2
cs.AI 2026-05-11 reviewed

Benchmark shows CAD models miss fine details and complex operations
BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD

Haozhe Zhang +6
cs.AI 2026-05-11 reviewed

BenchCAD benchmark shows AI simplifies complex CAD designs
BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD

Haozhe Zhang +6
cs.HC 2026-05-11 reviewed

StartFlow helps non-experts build clearer startup prototypes
StartFlow: From Method Conception to Multi-Perspective Evaluation in UX Prototyping for Software Startups

Guilherme Corredato Guerino +3
cs.AI 2026-05-11 reviewed

LLM agents top out below 60% success in complex tool sandboxes
ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox

Yuanyang Li +4
cs.AI 2026-05-11 reviewed

LLM agents top out below 60% on complex tool tasks
ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox

Yuanyang Li +4
quant-ph 2026-05-11 reviewed

Unitaria composes quantum block encodings like NumPy arrays
Unitaria: Quantum Linear Algebra via Block Encodings

Matthias Deiml +4
cs.SE 2026-05-11 reviewed

AutoSOUP generates unit proofs for component memory safety via LLM hybrid
AutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verification

Paschal C. Amusuo +7
cs.SE 2026-05-11 reviewed

AI leaves all problem-solving behaviors intact in code extension tasks
ChatGPT: Friend or Foe When Comprehending and Changing Unfamiliar Code

Norman Anderson +9
cs.LG 2026-05-11 reviewed

Masking bad steps inside failed runs lifts agent resolution 3.7 percent
Step Rejection Fine-Tuning: A Practical Distillation Recipe

Igor Slinko +3
cs.SE 2026-05-11 reviewed

Autoencoder context compression fails on multi-step coding agents
On Problems of Implicit Context Compression for Software Engineering Agents

Kirill Gelvan +5
cs.SE 2026-05-11 reviewed

New benchmark tests agents on cracking binaries from executables
CrackMeBench: Binary Reverse Engineering for Agents

Isaac David +1
cs.AI 2026-05-11 reviewed

LLARS unifies LLM prompt engineering
LLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluation

Philipp Steigerwald +4
cs.LO 2026-05-11 reviewed

Logic prover turns G-code collisions into LLM correction signals
Correct-by-Construction G-Code Generation: A Neuro-Symbolic Approach via Separation Logic

Yeonseok Lee
cs.LO 2026-05-11 reviewed

Neuro-symbolic loop fixes G-code via spatial proof failures
Correct-by-Construction G-Code Generation: A Neuro-Symbolic Approach via Separation Logic

Yeonseok Lee
cs.LO 2026-05-11 reviewed

Neuro-symbolic system turns G-code collisions into bounding-box fixes
Correct-by-Construction G-Code Generation: A Neuro-Symbolic Approach via Separation Logic

Yeonseok Lee
cs.LO 2026-05-11 reviewed

Separation logic catches CNC collisions as spatial data races
Separation Logic for Verifying Physical Collisions of CNC Programs

Yeonseok Lee
cs.LO 2026-05-11 reviewed

Separation logic verifies CNC collisions as spatial data races
Separation Logic for Verifying Physical Collisions of CNC Programs

Yeonseok Lee
cs.SE 2026-05-11 reviewed

VLMs automate robot task oracles from video
VISOR: A Vision-Language Model-based Test Oracle for Testing Robots

Prasun Saurabh +4
cs.SE 2026-05-11 reviewed

VISOR automates robot test oracles using vision-language models
VISOR: A Vision-Language Model-based Test Oracle for Testing Robots

Prasun Saurabh +4
cs.SE 2026-05-11 reviewed

DREAMS tool cuts time for DRM model creation and revision
DREAMS: Modelling Support for Research into Engineering and Artistic Design

Apala Chakrabarti
cs.AI 2026-05-11 reviewed

Vision loop polishes LaTeX documents to publication standards
PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

Bihui Yu +8
cs.SE 2026-05-11 reviewed

ReXCL tool automates requirements extraction and classification
Read, Extract, Classify: A Tool for Smarter Requirements Engineering

Paheli Bhattacharya +3
cs.SE 2026-05-11 reviewed

Margin-aware geometry reduces distortions in imbalanced vulnerability detection
MARGIN: Margin-Aware Regularized Geometry for Imbalanced Vulnerability Detection

Yuteng Zhang +4
cs.AI 2026-05-11 reviewed

Tiered AI agent framework adapts review to risk and separates duties
Beyond Autonomy: A Dynamic Tiered AgentRunner Framework for Governable and Resilient Enterprise AI Execution

Kai Pan +1
cs.CR 2026-05-11 reviewed

Usability demands trick LLMs into insecure code
Usability as a Weapon: Attacking the Safety of LLM-Based Code Generation via Usability Requirements

Yue Li +7
cs.CR 2026-05-11 reviewed

LLM agents discover 40 bugs in V8 JavaScript engine
Agentic Fuzzing: Opportunities and Challenges

Junyoung Park +1
cs.NI 2026-05-11 reviewed

Simulator models edge computing on optical networks
GenioSim: A Novel Simulation Platform for Edge Computing over Optical Networks

Carmine Cesarano +2
cs.SE 2026-05-11 reviewed

Config file structure has no effect on coding agent adherence
Instruction Adherence in Coding Agent Configuration Files: A Factorial Study of Four File-Structure Variables

Damon McMillan
cs.PL 2026-05-11 reviewed

Move prover checks first-class functions with state changes
Formal Verification of Imperative First-Class Functions in Move

Wolfgang Grieskamp +3
cs.PL 2026-05-11 reviewed

Move Prover verifies first-class imperative functions
Formal Verification of Imperative First-Class Functions in Move

Wolfgang Grieskamp +3
cs.PL 2026-05-11 reviewed

Hybrid of mechanical analysis and AI infers Move specifications
Combining Mechanical and Agentic Specification Inference for Move

Wolfgang Grieskamp +2
cs.PL 2026-05-11 reviewed

Tool pairs weakest-precondition analysis with AI to infer Move specs
Combining Mechanical and Agentic Specification Inference for Move

Wolfgang Grieskamp +2
cs.SI 2026-05-11 reviewed

Iterative prompting beats single-pass for complex graph tasks
GraphInstruct: A Progressive Benchmark for Diagnosing Capability Gaps in LLM Graph Generation

Zihe Wei +3
cs.SI 2026-05-11 reviewed

Benchmark finds LLM graph failures peak at multi-constraint tasks
GraphInstruct: A Progressive Benchmark for Diagnosing Capability Gaps in LLM Graph Generation

Zihe Wei +3
cs.LG 2026-05-11 reviewed

LLMs recover from flawed partial reasoning only 29% of the time
TeleResilienceBench: Quantifying Resilience for LLM Reasoning in Telecommunications

Pranshav Gajjar +2
cs.SE 2026-05-11 reviewed

Deterministic orchestration matches LLM accuracy with 3.5x lower costs
Deterministic vs. LLM-Controlled Orchestration for COBOL-to-Python Modernization

Naing Oo Lwin +1
cs.SE 2026-05-10 reviewed

Cloning duplicates many agent tools in public marketplaces
Evaluating Tool Cloning in Agentic-AI Ecosystems

Taein Kim +4
cs.SE 2026-05-10 reviewed

Cloning duplicates 60-85% of high-similarity tool pairs in agent ecosystems
Evaluating Tool Cloning in Agentic-AI Ecosystems

Taein Kim +4
cs.SE 2026-05-10 reviewed

Shared contract makes agent benchmark gate change controller choice
An Executable Benchmarking Suite for Tool-Using Agents

Zhiqing Zhong +3
cs.SE 2026-05-10 reviewed

GenAI turns software engineering from code writing to intent oversight
From Code-Centric to Intent-Centric Software Engineering: A Reflexive Thematic Analysis of Generative AI, Agentic Systems, and Engineering Accountability

Elyson De La Cruz
cs.SE 2026-05-10 reviewed

Trajectory context lifts tool accuracy from 39% to 57%
Trajectory Supervision for Continual Tool-Use Learning in LLMs

Vishnu Vardhan Reddy +2
cs.LG 2026-05-10 reviewed

Pre-execution rubrics lift tool agents to 0.86 accuracy
RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

Will LeVine +3
cs.LG 2026-05-10 reviewed

Pre-execution rubrics lift tool-use success to 0.86 average
RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

Will LeVine +3
cs.LG 2026-05-10 reviewed

Pre-execution rubrics lift tool agent reliability to 0.86
RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

Will LeVine +3