archive

Every paper Pith has read.

1797 papers in cs.SE · page 13

quant-ph 2026-04-29 reviewed

Quantum circuits cover conditions well but paths poorly
Probabilistic Condition, Decision and Path Coverage of Circuit-based Quantum Programs

Daniel Fortunato +2
cs.AI 2026-04-29 reviewed

MoE models match human graders on math rubrics where 70B model fails
Human-in-the-Loop Benchmarking of Heterogeneous LLMs for Automated Competency Assessment in Secondary Level Mathematics

Jatin Bhusal +3
cs.SE 2026-04-29 reviewed

Seven recommendations guide LLM adoption in software teams
Recommendations for Efficient and Responsible LLM Adoption within Industrial Software Development

Krishna Ronanki +5
cs.SE 2026-04-29 reviewed

Pipeline builds consistent graphs from C
Graph Construction and Matching for Imperative Programs using Neural and Structural Methods

Arshad Beg +2
cs.SE 2026-04-29 reviewed

Natural language scenarios generate higher-coverage tests than BDD
PICKLES: a Natural Language Framework for Requirement Specification and Model-Based Testing

María Belén Rodríguez +1
cs.SE 2026-04-29 reviewed

Solidity semantic clones detected with 97% recall via code and comments
Identifying and Characterizing Semantic Clones of Solidity Functions

Ermanno Francesco Sannini +6
cs.SE 2026-04-29 reviewed

Knowledge graph drives 3x faster documentation with 85% fewer tokens
RepoDoc: A Knowledge Graph-Based Framework to Automatic Documentation Generation and Incremental Updates

Dong Xu +4
cs.SE 2026-04-29 reviewed

Speculative decoding speeds up SE tasks more for small models
An Empirical Study of Speculative Decoding on Software Engineering Tasks

Yijia Li +3
cs.SE 2026-04-29 reviewed

LLMs vary widely in screening papers for software SLRs
Beyond Accuracy: LLM Variability in Evidence Screening for Software Engineering SLRs

Gilberto Sussumu Hida +2
cs.SE 2026-04-29 reviewed

Swarm optimizer cuts vehicle offload response times
Towards Intelligent Computation Offloading in Dynamic Vehicular Networks: A Scalable Multilayer Pipeline

Falk Dettinger +5
cs.SE 2026-04-29 reviewed

Asset shells keep OCL constraints inside MBSE models
Asset Administration Shell-Based OCL Validation Framework for Model-Based System Engineering

Om Parkash +4
cs.SE 2026-04-29 reviewed

Software engineering shifts from code generation to AI delegation
Agentic AI in the Software Development Lifecycle: Architecture, Empirical Evidence, and the Reshaping of Software Engineering

Happy Bhati
cs.CR 2026-04-29 reviewed

Only 23% of LLM-generated Rust crypto code compiles
An Empirical Security Evaluation of LLM-Generated Cryptographic Rust Code

Mohamed Elsayed +2
cs.SE 2026-04-29 reviewed

Survey finds disconnect between program structure and adaptive security tests
Adaptive and AI-Augmented Security Testing: A Systematic Survey of Program Analysis, Feedback-Driven Testing, and Hybrid Learning-Based Approaches

Michael Wienczkowski
cs.SE 2026-04-29 reviewed

Review shows LLMs automate data tasks in software engineering studies
LLM-Assisted Empirical Software Engineering: Systematic Literature Review and Research Agenda

Victoria Gomes +4
cs.SE 2026-04-28 reviewed

LLM observability layers mature but integration lags
AI Observability for Large Language Model Systems: A Multi-Layer Analysis of Monitoring Approaches from Confidence Calibration to Infrastructure Tracing

Twinkll Sisodia
cs.SE 2026-04-28 reviewed

LLM pipeline lifts bug report completeness from 8% to 96%
ImproBR: Bug Report Improver Using LLMs

Emre Furkan Akyol +2
cs.SE 2026-04-28 reviewed

Multi-view training detects AI code on unseen languages at 0.845 F1
UCSC-NLP at SemEval-2026 Task 13: Multi-View Generalization and Diagnostic Analysis of Machine-Generated Code Detection

Kargi Chauhan +1
cs.SE 2026-04-28 reviewed

LLM turns uncovered code into valid bug reports at an 85 percent rate
LLM-Guided Issue Generation from Uncovered Code Segments

Diany Pressato +3
cs.SE 2026-04-28 reviewed

LLM tool turns uncovered code into prioritized bug reports
LLM-Guided Issue Generation from Uncovered Code Segments

Diany Pressato +3
cs.SE 2026-04-28 reviewed

Splitting code viewing from editing raises agent success 2.1% at 17.9% lower cost
SWE-Edit: Rethinking Code Editing for Efficient SWE-Agent

Yikai Zhang +11
cs.CR 2026-04-28 reviewed

GenDetect turns a single observed DeFi attack into reusable detection rules by…
GenDetect: Generalizing Reactive Detection for Resilience Against Imitative DeFi Attack Cascade

Bowen Cai +6
cs.SE 2026-04-28 reviewed

Carbon-tax ordering cuts LLM memory by up to 49x
Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models

Ajmain Inqiad Alam +4
cs.SE 2026-04-28 reviewed

Multi-LLM pipeline extracts 734 trajectories from GitHub issues
From Threads to Trajectories: A Multi-LLM Pipeline for Community Knowledge Extraction from GitHub Issue Discussions

Nazia Shehnaz Joynab +1
cs.SE 2026-04-28 reviewed

LLM REST tests lose effectiveness on faulty code and vague specs
RESTestBench: A Benchmark for Evaluating the Effectiveness of LLM-Generated REST API Test Cases from NL Requirements

Leon Kogler +5
cs.CL 2026-04-28 reviewed

Evolved harnesses raise coding-agent pass@1 from 69.7% to 77%
Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

Jiahang Lin +10
cs.CL 2026-04-28 reviewed

Ten AHE iterations lift coding-agent pass@1 to 77%
Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

Jiahang Lin +10
cs.SE 2026-04-28 reviewed

RSEs form a collective identity that shapes their wellbeing
Does social identity matter in software engineering? Assessing the case of research software engineers

Chukwudi Uwasomba +7
cs.SE 2026-04-28 reviewed

Developer roles drive microservices coupling more than architecture
Key Developer Roles and Organizational Coupling in Microservices: A Longitudinal Analysis

Xiaozhou Li +3
cs.SE 2026-04-28 reviewed

Code metrics match plagiarism tools in ranking performance
Can Code Evaluation Metrics Detect Code Plagiarism?

Fahad Ebrahim +1
cs.SE 2026-04-28 reviewed

Scenarios compose into online tests for robot systems
Scenario-based System Testing for Distributed Robotics Applications

Jan Peleska +3
cs.SE 2026-04-28 reviewed

Multi-agent editing lifts code success to 68.6 percent
SAFEdit: Does Multi-Agent Decomposition Resolve the Reliability Challenges of Instructed Code Editing?

Noam Tarshish +6
cs.SE 2026-04-28 reviewed

Code-comment alignment lifts F1 scores by up to 27% in vulnerability detection
Learning Generalizable Multimodal Representations for Software Vulnerability Detection

Zeming Dong +7
cs.SE 2026-04-28 reviewed

Classical ML beats transformers for bug report fault localization
Bug-Report-Driven Fault Localization: Industrial Benchmarking and Lesson Learned at ABB Robotics

Pernilla Hall +3
cs.SE 2026-04-28 reviewed

Bug report text trains models to find faults in robotics code
Bug-Report-Driven Fault Localization: Industrial Benchmarking and Lesson Learned at ABB Robotics

Pernilla Hall +3
cs.SE 2026-04-28 reviewed

GPT tools draft spreadsheet models but fail to reproduce them consistently
Spreadsheet Modeling Experiments Using GPTs on Small Problem Statements and the Wall Task

Thomas A. Grossman +2
cs.SE 2026-04-28 reviewed

LLMs generate Given-When-Then tests for FMU simulations
Using Large Language Models for Black-Box Testing of FMU-Based Simulations

Abdullah Mughees +5
cs.SE 2026-04-28 reviewed

PLM choice outweighs GNN backbone in code hybrid models
PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection

Mohamed Taoufik Kaouthar El Idrissi +2
cs.SE 2026-04-28 reviewed

12,000 tests quantify energy costs of mobile settings
An Empirical Analysis of Mobile Energy Consumption Across User Configurations

Wellington Oliveira
cs.SE 2026-04-28 reviewed

MBSE models must be co-designed as AI-queryable knowledge bases
AI as Consumer and Participant: A Co-Design Agenda for MBSE Substrates and Methodology

Siyuan Ji
cs.SE 2026-04-28 reviewed

MLLMs suggest ranked usability fixes from videos
Recommending Usability Improvements with Multimodal Large Language Models

Sebastian Lubos +4
cs.SE 2026-04-28 reviewed

LLMs inconsistent on equivalent code versions
CoRE: A Fine-Grained Code Reasoning Benchmark Beyond Output Prediction

Jun Gao +8
cs.SE 2026-04-28 reviewed

Commit structure lifts test prioritization in CI
Commit-Aware Learning-Based Test Case Prioritization for Continuous Integration

Lorenzo Abbondante +1
cs.SE 2026-04-28 reviewed

R³-SQL reaches 75.03 accuracy on BIRD-dev for Text-to-SQL
R³-SQL: Ranking Reward and Resampling for Text-to-SQL

Hojae Han +4
cs.DB 2026-04-28 reviewed

VisualNeo connects visual queries to Neo4j for graph searches
VisualNeo: Bridging the Gap between Visual Query Interfaces and Graph Query Engines

Kai Huang +7
cs.CR 2026-04-28 reviewed

MARD is a multi-agent system that uses large language models to detect Android malware by…
MARD: A Multi-Agent Framework for Robust Android Malware Detection

Xueying Zeng +6
cs.LG 2026-04-28 reviewed

DiRe preserves 3-4 times more topology than UMAP at equal speed
DiRe-RAPIDS: Topology-faithful dimensionality reduction at scale

Alexander Kolpakov +1
cs.CR 2026-04-28 reviewed

Conformance checking runs on homomorphically encrypted logs
Secure Conformance Checking using Token-based Replay and Homomorphic Encryption

Luis-Armando Rodríguez-Flores +3
cs.CR 2026-04-28 reviewed

Four agents turn incomplete Rust CVEs into analyzable tests
Symbolic Execution Meets Multi-LLM Orchestration: Detecting Memory Vulnerabilities in Incomplete Rust CVE Snippets

Zeyad Abdelrazek +1