archive
Every paper Pith has read.
1797 papers in cs.SE · page 9
-
Developers put ethical rules for AI agents into repo files
Operationalizing Ethics for AI Agents: How Developers Encode Values into Repository Context Files
-
Semi-supervised models flag unrelated CI build failures
Is this Build Failure Related to my Patch? An Empirical Study of Unrelated Build Failures in Continuous Integration
-
Two hours of prep lets AI agents build full-stack platform in parallel
Mise en Place for Agentic Coding: Deliberate Preparation as Context Engineering Methodology
-
Security detection rules keep oscillating between goals
Evolution of Log-Based Detection Rules in Public Repositories
-
Detection rules keep adding and removing conditions instead of stabilizing
Evolution of Log-Based Detection Rules in Public Repositories
-
Claude leads public single-file HTML generation tests
The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking
-
Policy gating blocks cross-tenant leaks in shared AI retrieval
Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use
-
Nine-dimension model explains root causes in five of twelve DeFi incidents
Toward a Risk Assessment Framework for Institutional DeFi: A Nine-Dimension Approach
-
Retrieval scaffolding aligns AI services with production rules
Architectural Constraints Alignment in AI-assisted, Platform-based Service Development
-
Symbolic conflict essences detect rule interference exactly
Conflict Essences for Transformation Rules with Nested Application Conditions -- Long Version
-
Grid overlay beats semantic prompts for LLM chart reading
Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction
-
Syntax routing lets small code models exceed large-model accuracy at 58 percent lower cost
SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs
-
LLM agents classify repo artifacts competitively without context limits
Agentic Repository Mining: A Multi-Task Evaluation
-
Case study maps SIL rules and memory limits in real car software
Shedding Light onto Safety Integrity Level and Basic Software Constraints in a Real-World Automotive Application: Case Study with Driverator Framework
-
Developers accept most LLM refactoring suggestions unchanged
Patterns of Developer Adoption of LLM-Generated Code Refactoring Suggestions
-
Bug tools need more than accuracy to be adopted
Toward an Understanding of Developer Behaviour while Using Bug Localization Tools
-
GenAI speeds up coding but leaves learning unchanged
A meta-analysis of the effect of generative AI on productivity and learning in programming
-
Function chunking lowers RAG code completion performance
How Does Chunking Affect Retrieval-Augmented Code Completion? A Controlled Empirical Study
-
Spec-guided fuzzer finds 24 new bugs in industrial protocols
AFL-ICP: Enhancing Industrial Control Protocol Reliability via Specification-Guided Fuzzing
-
AI pipeline with teacher review upgrades peer feedback quality
AICoFe: Implementation and Deployment of an AI-Based Collaborative Feedback System for Higher Education
-
AI tool scales rubric feedback for student slides
AISSA: Implementation and Deployment of an AI-based Student Slides Analysis tool for Academic Presentations
-
Controlled protocols shrink attention-model gains in PKT
Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols
-
LLM framework builds UVM testbenches in 4.5 hours at 95.65% coverage
UVMarvel: an Automated LLM-aided UVM Machine for Subsystem-level RTL Verification
-
Training data flaws cause code defects in LLMs via 18 paths
Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code
-
LLM evolution with runtime targets yields 15x Java speedup
CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement
-
Digital twin trust maps to four integration patterns across domains
Trustworthiness in Digital Twin Systems: Systematic Review and Research Horizons
-
AI app builders oversimplify specs and skip secure backends
SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies
-
Screening cuts agent-repair leaderboard rank shifts by 62%
AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair
-
Fine-tuned reranker improves code search across three tasks
Beyond Retrieval: A Multitask Benchmark and Model for Code Search
-
AI coding tools shift responsibility for errors to users
Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap
-
Declarative YAML lets AI agents run any scientific workflow
PARNESS: A Paper Harness for End-to-End Automated Scientific Research with Dynamic Workflows, Full-Text Indexing, and Cross-Run Knowledge Accumulation
-
Governance routines cut enterprise tech modernization effort by 30 percent
EMRGF: A Practitioner Framework for Governance-Driven Enterprise Technology Modernization
-
Model benchmarks cannot confirm deployed AI alignment
Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone
-
Automatic framework fixes failures in LLM reinforcement tuning
Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning
-
Context hurts design exploration on some tasks by 46%
When Context Hurts: The Crossover Effect of Knowledge Transfer on Multi-Agent Design Exploration
-
Benchmark and adapted tool generate Java reproduction tests from issues
Reproduction Test Generation for Java SWE Issues
-
Token n-grams and code metrics triage C vulnerabilities at PR-AUC 0.64
Lightweight Vulnerability Detection from Code Metrics and Token Features
-
EngThrive pairs speed, ease and quality metrics with wellbeing guardrails
EngThrive: Make It Fast and Easy to Do Great Work
-
Kumushi steers LLMs to root causes for deeper vulnerability fixes
Root-Cause-Driven Automated Vulnerability Repair
-
AI extracts PDF data to audit every transaction
Automated Population-Level Audit Assurance via AI-Based Document Intelligence
-
50 testing tools converge on similar output formats
Exploring the Output of Software Testing Tools through a Visual Comparative Analysis
-
Agents negotiate stable software module decompositions
A Multi-Agent Consensus Protocol for Stable Software Remodularization
-
Embeddings link source code to decompiled binaries without names
Identifier-Free Code Embedding Models for Scalable Search
-
NeurIPS should require reproducible evidence for AI safety claims
NeurIPS Should Require Reproducibility Standards for Frontier AI Safety Claims
-
RL agent raises Rust static analysis precision from 26% to 59%
Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning
-
Requirements engineering divides into two isolated pathways
Two Integration Pathways in Human-Centered Requirements Engineering: A Systematic Mapping Study of Structural Gaps
-
Brick-Circuit generator spans quantum states more uniformly at low depth
Randomized and Diverse Input State Generation for Quantum Program Testing