archive

Every paper Pith has read.

1797 papers in cs.SE · page 17

cs.SE 2026-04-21 reviewed

Fuzzing finds bugs in deductive verifiers
Crash-free Deductive Verifiers

Wander Nauta +2
cs.CR 2026-04-21 reviewed

DynaHug catches malicious ML models by watching runtime behavior
Malicious ML Model Detection by Learning Dynamic Behaviors

Sarang Nambiar +2
cs.SE 2026-04-21 reviewed

Tool flags code-doc mismatches only when tests prove the mismatch
CASCADE: Detecting Inconsistencies between Code and Documentation with Automatic Test Generation

Tobias Kiecker +4
cs.SE 2026-04-21 reviewed

Framework maps stakeholder views to formal SysML v2 architecture
Towards Formalising Stakeholder Context using SysML v2

Matthew Harrison +4
cs.SE 2026-04-21 reviewed

EnergyTrackr flags energy spikes in Java commits
Systematic Detection of Energy Regression and Corresponding Code Patterns in Java Projects

François Bechet +4
cs.AI 2026-04-21 reviewed

LLM agents reach only 35 percent CTF checkpoint completion
Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges

Ali Al-Kaswan +5
cs.SE 2026-04-21 reviewed

Mocking info from tests guides LLMs to better unit tests
Improving LLM-Driven Test Generation by Learning from Mocking Information

Jamie Lee +5
cs.SE 2026-04-21 reviewed

Simulated debugging boosts LLM bug fixes by 26% on Defects4J
DebugRepair: Enhancing LLM-Based Automated Program Repair via Self-Directed Debugging

Linhao Wu +11
cs.HC 2026-04-21 reviewed

Four-layer workspace structures human-AI co-development of VA tools
BONSAI: A Mixed-Initiative Workspace for Human-AI Co-Development of Visual Analytics Applications

Thilo Spinner +3
cs.SE 2026-04-21 reviewed

Iterative retriever lifts bug test generation rates by 20-32 percent
iCoRe: An Iterative Correlation-Aware Retriever for Bug Reproduction Test Generation

Junyi Wang +2
cs.SE 2026-04-21 reviewed

Large models sketch edits, small models apply them
Cascaded Code Editing: Large-Small Model Collaboration for Effective and Efficient Code Editing

Chaozheng Wang +9
cs.SE 2026-04-21 reviewed

Empathic IDE matches standard tools on learning but helps more with errors
Towards More Empathic Programming Environments: An Experimental Empathic AI-Enhanced IDE

Justin Rainier Go +4
cs.SE 2026-04-21 reviewed

Mutations expose inconsistencies in 15% of Code LLM responses
MUCOCO: Automated Consistency Testing of Code LLMs

Chua Jin Chou +2
cs.SE 2026-04-21 reviewed

Multimodal AI spots GUI defects in multi-window mobile apps
Proactive Detection of GUI Defects in Multi-Window Scenarios via Multimodal Reasoning

Xinyao Zhang +7
cs.CR 2026-04-21 reviewed

Adversarial agents eliminate 79% of LLM defect candidates
Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery

Abhinav Agarwal
cs.CR 2026-04-21 reviewed

Security is relative to project contracts
Security Is Relative: Training-Free Vulnerability Detection via Multi-Agent Behavioral Contract Synthesis

Yongchao Wang +1
cs.SE 2026-04-21 reviewed

Framework turns aerospace requirements into LTL at 85% precision
Automated LTL Specification Generation from Industrial Aerospace Requirements

Zhi Ma +7
cs.SE 2026-04-20 reviewed

SVGD seeds raise ADS safety violation rates
From Particles to Perils: SVGD-Based Hazardous Scenario Generation for Autonomous Driving Systems Testing

Linfeng Liang +3
cs.HC 2026-04-20 reviewed

Graph interface turns AI coding into branching explorations
Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph

Vassilios Exarhakos +2
cs.SE 2026-04-20 reviewed

AI-human loop cuts bug report labeling effort by 196%
Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning

Guoming Long +3
cs.SE 2026-04-20 reviewed

Structural checks raise EDA code success without runtime debugging
Structural Verification for Reliable EDA Code Generation without Tool-in-the-Loop Debugging

Dinithi Jayasuriya +4
cs.SE 2026-04-20 reviewed

Cutoff theorem bounds verification search for DSLTrans properties
Tractable Verification of Model Transformations: A Cutoff-Theorem Approach for DSLTrans

Levi Lucio
cs.SE 2026-04-20 reviewed

AI transformation methods lack systematic guidance on ML task derivation
From Business Problems to AI Solutions: Where Does Transformation Support Fail

Abir Trabelsi +3
cs.CR 2026-04-20 reviewed

Only 0.4% of Android apps match privacy policies to their logs
Do Privacy Policies Match with the Logs? An Empirical Study of Privacy Disclosure in Android Application Logs

Zhiyuan Chen +6
cs.SE 2026-04-20 reviewed

Sentence transformers filter SCA alerts to 89% F1
Towards Better Static Code Analysis Reports: Sentence Transformer-based Filtering of Non-Actionable Alerts

Tamás Aladics +3
cs.SE 2026-04-20 reviewed

Direct TypeScript compiler parser speeds up large-repo indexing for AI agents
TypeScript Repository Indexing for Code Agent Retrieval

Junsong Pu +2
cs.SE 2026-04-20 reviewed

Agent builds playable web games from prompts where LLMs fail
OpenGame: Open Agentic Coding for Games

Yilei Jiang +10
cs.SE 2026-04-20 reviewed

AI software ecosystems show emergent failures from agent interactions
More Is Different: Toward a Theory of Emergence in AI-Native Software Ecosystems

Daniel Russo
cs.SE 2026-04-20 reviewed

Co-locating tests yields near-perfect AI code preservation
Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation

Éric Jacopin
cs.SE 2026-04-20 reviewed

AI bot PR frequency tied to lower CI/CD success rates
Reliability of AI Bots Footprints in GitHub Actions CI/CD Workflows

Syed Muhammad Ashhar Shah +5
cs.SE 2026-04-20 reviewed

Context composition causally shapes LLM failure explanation quality
From Program Slices to Causal Clarity: Evaluating Faithful, Actionable LLM-Generated Failure Explanations via Context Partitioning and LLM-as-a-Judge

Julius Porbeck +4
cs.SE 2026-04-20 reviewed

Context composition causally shapes LLM bug explanation quality
From Program Slices to Causal Clarity: Evaluating Faithful, Actionable LLM-Generated Failure Explanations via Context Partitioning and LLM-as-a-Judge

Julius Porbeck +4
cs.AI 2026-04-20 reviewed

Modular adapters beat fine-tuning on hard SQL queries
LeGo-Code: Can Modular Curriculum Learning Advance Complex Code Generation? Insights from Text-to-SQL

Salmane Chafik +2
cs.SE 2026-04-20 reviewed

LLM pipeline formalizes specs into properties at 77.8% accuracy
Towards an Agentic LLM-based Approach to Requirement Formalization from Unstructured Specifications

Alberto Tagliaferro +3
cs.SE 2026-04-20 reviewed

WebCompass benchmark evaluates full web coding workflows for AI models
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

Xinping Lei +18
cs.SE 2026-04-20 reviewed

Real execution replaces mental simulation in LLM coding
SolidCoder: Bridging the Mental-Reality Gap in LLM Code Generation through Concrete Execution

Woojin Lee +1
cs.SE 2026-04-20 reviewed

SystemC prototypes remove false positives from embedded fuzzing
Stateful Embedded Fuzzing with Peripheral-Accurate SystemC Virtual Prototypes

Chiara Ghinami +4
cs.OS 2026-04-20 reviewed

Processes and pipes made lightweight for far memory accelerators
Proxics: an efficient programming model for far memory accelerators

Zikai Liu +5
cs.SE 2026-04-20 reviewed

Fairness-first design thinking embeds equity into software architecture
Fairness-First Design Thinking for Software Architecture

Iffat Fatima +2
cs.SE 2026-04-20 reviewed

7B model beats larger LLMs at code translation without parallel examples
CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora

Shangyu Li +9
cs.SE 2026-04-20 reviewed

AI systems should treat choices as governed tuned variables
Statistical Software Engineering with Tuned Variables

Nimrod Busany
cs.SE 2026-04-20 reviewed

API sequence mining boosts library fuzz coverage by 8.54%
MASFuzzer: Fuzz Driver Generation and Adaptive Scheduling via Multidimensional API Sequences

Xingyu Liu +3
cs.SE 2026-04-20 reviewed

PTMs are added late in projects and accumulate rather than being replaced
When AI Models Become Dependencies: Studying the Evolution of Pre-Trained Model Reuse in Downstream Software Systems

Peerachai Banyongrakkul +4
cs.SE 2026-04-20 reviewed

Framework detects every GitHub abuse type above 89% accuracy
Weaponizing the Commons: A Taxonomy and Detection Framework of Abuse on GitHub

Yuli Cheng +5
cs.SE 2026-04-20 reviewed

Ten cache smells affect 89% of GitLab CI/CD projects
Cache-Related Smells in GitLab CI/CD: Comprehensive Catalog, Automated Detection, and Empirical Evidence

Francesco Urdih +2
cs.SE 2026-04-20 reviewed

Graph consensus layer must replace code as AI coding artifact
Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer

Tianfu Wang +7
cs.AI 2026-04-20 reviewed

Joint prompt and tool optimization raises agent success 5-20%
JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents

Sandip Ghoshal +11
cs.SE 2026-04-20 reviewed

Video analysis lets AI grade diverse Scratch programs accurately
Raven: Rethinking Automated Assessment for Scratch Programs via Video-Grounded Evaluation

Donglin Li +3
cs.SE 2026-04-20 reviewed

Ground truth tests show debloaters remove needed code or keep extras
Revisiting Code Debloating with Ground Truth-based Evaluation

Muhammad Bilal +6
cs.SE 2026-04-20 reviewed

GLMTest raises branch accuracy to 50% by conditioning on code graphs
Program Structure-aware Language Models: Targeted Software Testing beyond Textual Semantics

Khang Tran +3