archive

Every paper Pith has read.

1797 papers in cs.SE · page 15

cs.SE 2026-04-25 reviewed

UniAda attack fools self-driving cars on both steering and speed
UniAda: Universal Adaptive Multi-objective Adversarial Attack for End-to-End Autonomous Driving Systems

Jingyu Zhang +5
cs.SE 2026-04-25 reviewed

Local LLMs spot 43-45% of Python bugs in real projects
An Empirical Evaluation of Locally Deployed LLMs for Bug Detection in Python Code

Jelena Ilić Vulićević
cs.SE 2026-04-25 reviewed

Study evaluates 15 test metrics across 1640 DL scenarios
Empirical Insights of Test Selection Metrics under Multiple Testing Objectives and Distribution Shifts

Jingyu Zhang +5
cs.SE 2026-04-25 reviewed

LLMs match only 0-60% of real open-source commits
Can LLMs be Effective Code Contributors? A Study on Open-source Projects

Chun Jie Chong +3
cs.NI 2026-04-25 reviewed

Agents map 6G intents to pre-validated services via TMF graphs
Towards Agentic Test-Driven Quality Assurance for 6G Networks

Christos Tranoris +3
cs.SE 2026-04-25 reviewed

Knowledge levers raise software project capital by 63.8%
Knowledge Lever Risk Management for Software Engineering: A Stochastic Framework for Mitigating Knowledge Loss

Mark Chua +1
cs.SE 2026-04-25 reviewed

AI reviewer in GitHub PRs sustains 33% follow-up activity
AI-Assisted Code Review as a Scaffold for Code Quality and Self-Regulated Learning: An Experience Report

Eduardo Oliveira +4
cs.SE 2026-04-25 reviewed

Layered procedures and accountability underpin effective fintech ISMS
Operationalising Information Security Management: A Procedural Framework Analysis of ISO/IEC 27001:2022 Implementation in a Financial-Technology Organisation

Ratul Ali
cs.SE 2026-04-25 reviewed

Framework sets up any code repository automatically
RAT: RunAnyThing via Fully Automated Environment Configuration

Renhong Huang +6
cs.NI 2026-04-25 reviewed

RANalyzer ties performance drops to specific code changes
RANalyzer: Automated Continuous RAN Software Evaluation and Regression Analysis

Ravis Shirkhani +4
cs.SE 2026-04-25 reviewed

Argumentation resolves multi-agent requirements conflicts with traceability
ArgRE: Formal Argumentation for Conflict Resolution in Multi-Agent Requirements Negotiation

Haowei Cheng +8
cs.SE 2026-04-25 reviewed

Seven modeling challenges limit iFogSim for complex IoT setups
Source-Code Analysis of iFogSim for Simulating Distributed IoT Architectures: Coverage, Challenges, and Enhancements

Milliam Maxime Zekeng Ndadji
cs.SE 2026-04-25 reviewed

New framework generates scientific code without test cases
No Test Cases, No Problem: Distillation-Driven Code Generation for Scientific Workflows

Siddeshwar Raghavan +1
cs.SE 2026-04-25 reviewed

Parallel agents produce readable code quality feedback
Code Broker: A Multi-Agent System for Automated Code Quality Assessment

Samer Attrah
cs.CY 2026-04-24 reviewed

Frontier AI firms should disclose internal deployment details
What Should Frontier AI Developers Disclose About Internal Deployments?

Jacob Charnock +3
cs.SE 2026-04-24 reviewed

Testing documentation correlates with higher test engagement in OSS pull requests
The Impact of Documentation on Test Engagement in Pull Requests in OSS

Teal Amore +2
cs.CV 2026-04-24 reviewed

Smartphone photos detect anemia at 96 percent accuracy
AnemiaVision: Non-Invasive Anemia Detection via Smartphone Imagery Using EfficientNet-B3 with TrivialAugmentWide, Mixup Augmentation, and Persistent Patient History Management

Rahul Patel
cs.CL 2026-04-24 reviewed

Coding agents burn 1000x more tokens than chats or reasoning
How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks

Longju Bai +7
cs.SE 2026-04-24 reviewed

Vibe coding hackathon lets all skill levels build apps with AI only
Code for All: Educational Applications of the "Vibe Coding" Hackathon in Programming Education across All Skill Levels

Ashley J. Chen +7
cs.SE 2026-04-24 reviewed

Binary analysis infers test equivalence classes from legacy firmware
Inferring Equivalence Classes from Legacy Undocumented Embedded Binaries for ISO 26262-Compliant Testing

Marco De Luca +4
cs.SE 2026-04-24 reviewed

RealBench shows LLMs lag at full repo code gen even with UML specs
RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices

Jia Li +11
cs.SE 2026-04-24 reviewed

Verifier warnings add no value to comprehensibility prediction models
Verifier Warnings Do Not Improve Comprehensibility Prediction

Nadeeshan De Silva +2
cs.SE 2026-04-24 reviewed

Selective mutation cuts deep learning mutants by over half
Quality-Driven Selective Mutation for Deep Learning

Zaheed Ahmed +3
cs.SE 2026-04-24 reviewed

LLMs turn natural language into Dafny-verified code at high success rates
From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification

Md Erfan +3
cs.SE 2026-04-24 reviewed

ZF requirements show gaps in ROS 2 and AUTOSAR Adaptive
A Comparison of ROS 2 and AUTOSAR Adaptive Platform Against Industry-Elicited Automotive Middleware Requirements

Lucas Hegerath +8
cs.SE 2026-04-24 reviewed

Template breaks AI tests into goal
Test Design and Review Argumentation in AI-Assisted Test Generation

Eduard Paul Enoiu +1
cs.SE 2026-04-24 reviewed

Game points cut cross-service code contributions
Gamifying Architectural Governance to Reduce Organizational Coupling in Microservice Systems

Xiaozhou Li
cs.SE 2026-04-24 reviewed

LLM framework boosts traceability F1 by 7.4% with 41.7% fewer tokens
R2Code: A Self-Reflective LLM Framework for Requirements-to-Code Traceability

Yifei Wang +5
cs.SE 2026-04-24 reviewed

Modular refactor adds use case diagrams to gamified UML tool
Enhancing a gamified tool for UML modeling education

Giacomo Garaccione +2
cs.CR 2026-04-24 reviewed

Poisoning 10% of code data blocks unauthorized AI training
Train in Vain: Functionality-Preserving Poisoning to Prevent Unauthorized Use of Code Datasets

Yuan Xiao +10
cs.AR 2026-04-24 reviewed

Helpers from high-level features speed HLS verification up to 6x
AutoINV: Automated Invariant Generation Framework for Formal Verification on High-Level Synthesis Designs

Xiaofeng Zhou +5
cs.SE 2026-04-24 reviewed

Reflections boost RAG to 0.78 F1 for predicting SO code edits without training
RAG-Reflect: Agentic Retrieval-Augmented Generation with Reflections for Comment-Driven Code Maintenance on Stack Overflow

Mehedi Hasan Shanto +2
cs.SE 2026-04-24 reviewed

LLM feedback loop pulls low-level goals at 61% accuracy
Evaluating LLM-Based Goal Extraction in Requirements Engineering: Prompting Strategies and Their Limitations

Anna Arnaudo +6
cs.CR 2026-04-23 reviewed

Smart contracts lock fraud detection trails into blockchain
Who Audits the Auditor? Tamper-Proof Fraud Detection with Blockchain-Anchored Explainable ML

Zhaohui Wang
cs.SE 2026-04-23 reviewed

Ethics testing detects harms in generative AI outputs
Ethics Testing: Proactive Identification of Generative AI System Harms

Shin Hwei Tan +2
cs.SE 2026-04-23 reviewed

Pipeline rebuilds real crashes in CARLA with exact road maps
TRACE: Topology-aware Reconstruction of Accidents in CARLA for AV Evaluation

Nahian Salsabil +1
cs.SE 2026-04-23 reviewed

Call-chain context lifts LLM test coverage
Call-Chain-Aware LLM-Based Test Generation for Java Projects

Guancheng Wang +4
cs.CY 2026-04-23 reviewed

Framework helps universities update rules for student GenAI use
A Systematic AI Adoption Framework for Higher Education: From Student GenAI Usage to Institutional Integration

Michael Neumann +3
cs.SE 2026-04-23 reviewed

Tests yield 300 correct runtime checkers across four systems
FlyCatcher: Neural Inference of Runtime Checkers from Tests

Beatriz Souza +3
cs.SE 2026-04-23 reviewed

Nominal group interviews enable documentless assessments
Documentless Assessments Using Nominal Group Interviews

Eduardo Miranda
cs.CR 2026-04-23 reviewed

87% of multi-commit Python vulnerabilities evade per-commit SAST
CrossCommitVuln-Bench: A Dataset of Multi-Commit Python Vulnerabilities Invisible to Per-Commit Static Analysis

Arunabh Majumdar
cs.CL 2026-04-23 reviewed

Models create harder math problems than they can solve in dual tests
MathDuels: Evaluating LLMs as Problem Posers and Solvers

Zhiqiu Xu +3
cs.OH 2026-04-23 reviewed

Framework eases onboarding for research computing newcomers
Institutionalizing Best Practices in Research Computing: A Framework and Case Study for Improving User Onboarding

Ayush Chaturvedi +6
cs.SE 2026-04-23 reviewed

One test generalizes into full scenario coverage
Generalizing Test Cases for Comprehensive Test Scenario Coverage

Binhang Qi +6
cs.LG 2026-04-23 reviewed

PrismaDV produces task-aware data unit tests automatically
PrismaDV: Automated Task-Aware Data Unit Test Generation

Hao Chen +2
cs.SE 2026-04-23 reviewed

Structured JSON output beats direct and agentic LLM methods for analysis queries
Less Is More: Measuring How LLM Involvement affects Chatbot Accuracy in Static Analysis

Krishna Narasimhan
cs.SE 2026-04-23 reviewed

Grounding document steers AI coding to valid scientific results
Agentic AI-assisted coding offers a unique opportunity to instill epistemic grounding during software development

Magnus Palmblad +2
cs.CL 2026-04-23 reviewed

AI code generators include sensitive attributes in 88% of ML pipelines
From If-Statements to ML Pipelines: Revisiting Bias in Code-Generation

Minh Duc Bui +4
cs.SE 2026-04-23 reviewed

LLMs answer 98 percent of ROS2 architecture questions correctly
Can Large Language Models Assist the Comprehension of ROS2 Software Architectures?

Laura Duits +2
cs.SE 2026-04-23 reviewed

Provenance data turns ML interpretability into verifiable requirements
Verifying Machine Learning Interpretability Requirements through Provenance

Lynn Vonderhaar +3