pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

1797 papers in cs.SE · page 9

  1. cs.SE 2026-05-07 reviewed
    Developers put ethical rules for AI agents into repo files

    Operationalizing Ethics for AI Agents: How Developers Encode Values into Repository Context Files

    Christoph Treude +2

  2. cs.SE 2026-05-07 reviewed
    Semi-supervised models flag unrelated CI build failures

    Is this Build Failure Related to my Patch? An Empirical Study of Unrelated Build Failures in Continuous Integration

    Andie Huang +3

  3. cs.SE 2026-05-06 reviewed
    Two hours of prep lets AI agents build full-stack platform in parallel

    Mise en Place for Agentic Coding: Deliberate Preparation as Context Engineering Methodology

    Andrew Zigler

  4. cs.CR 2026-05-06 reviewed
    Security detection rules keep oscillating between goals

    Evolution of Log-Based Detection Rules in Public Repositories

    Minjun Long +1

  5. cs.CR 2026-05-06 reviewed
    Detection rules keep adding and removing conditions instead of stabilizing

    Evolution of Log-Based Detection Rules in Public Repositories

    Minjun Long +1

  6. cs.SE 2026-05-06 reviewed
    Claude leads public single-file HTML generation tests

    The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking

    Diego Cabezas Palacios

  7. cs.CR 2026-05-06 reviewed
    Policy gating blocks cross-tenant leaks in shared AI retrieval

    Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use

    Francisco Javier Arceo +1

  8. cs.DC 2026-05-06 reviewed
    Nine-dimension model explains root causes in five of twelve DeFi incidents

    Toward a Risk Assessment Framework for Institutional DeFi: A Nine-Dimension Approach

    Eva Oberholzer +3

  9. cs.SE 2026-05-06 reviewed
    Retrieval scaffolding aligns AI services with production rules

    Architectural Constraints Alignment in AI-assisted, Platform-based Service Development

    Julius Irion +7

  10. cs.SE 2026-05-06 reviewed
    Symbolic conflict essences detect rule interference exactly

    Conflict Essences for Transformation Rules with Nested Application Conditions -- Long Version

    Alexander Lauer +3

  11. cs.AI 2026-05-06 reviewed
    Grid overlay beats semantic prompts for LLM chart reading

    Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction

    Andrei Lazarev +2

  12. cs.SE 2026-05-06 reviewed
    Syntax routing lets small code models exceed large-model accuracy at 58 percent lower cost

    SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs

    Kishanthan Thangarajah +2

  13. cs.SE 2026-05-06 reviewed
    LLM agents classify repo artifacts competitively without context limits

    Agentic Repository Mining: A Multi-Task Evaluation

    Johannes Härtel

  14. cs.SE 2026-05-06 reviewed
    Case study maps SIL rules and memory limits in real car software

    Shedding Light onto Safety Integrity Level and Basic Software Constraints in a Real-World Automotive Application: Case Study with Driverator Framework

    Tobias Denzinger (CARIAD SE) +2

  15. cs.SE 2026-05-06 reviewed
    Developers accept most LLM refactoring suggestions unchanged

    Patterns of Developer Adoption of LLM-Generated Code Refactoring Suggestions

    David Schön +6

  16. cs.SE 2026-05-06 reviewed
    Bug tools need more than accuracy to be adopted

    Toward an Understanding of Developer Behaviour while Using Bug Localization Tools

    Pablo Diaz Pedreira +2

  17. cs.SE 2026-05-06 reviewed
    GenAI speeds up coding but leaves learning unchanged

    A meta-analysis of the effect of generative AI on productivity and learning in programming

    Sebastian Maier +4

  18. cs.SE 2026-05-06 reviewed
    Function chunking lowers RAG code completion performance

    How Does Chunking Affect Retrieval-Augmented Code Completion? A Controlled Empirical Study

    Xinjian Wu +3

  19. cs.CR 2026-05-06 reviewed
    Spec-guided fuzzer finds 24 new bugs in industrial protocols

    AFL-ICP: Enhancing Industrial Control Protocol Reliability via Specification-Guided Fuzzing

    Jiaying Meng +4

  20. cs.HC 2026-05-06 reviewed
    AI pipeline with teacher review upgrades peer feedback quality

    AICoFe: Implementation and Deployment of an AI-Based Collaborative Feedback System for Higher Education

    Alvaro Becerra +2

  21. cs.HC 2026-05-06 reviewed
    AI tool scales rubric feedback for student slides

    AISSA: Implementation and Deployment of an AI-based Student Slides Analysis tool for Academic Presentations

    Alvaro Becerra +2

  22. cs.LG 2026-05-06 reviewed
    Controlled protocols shrink attention-model gains in PKT

    Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols

    Jaewook Kim +1

  23. cs.AR 2026-05-06 reviewed
    LLM framework builds UVM testbenches in 4.5 hours at 95.65% coverage

    UVMarvel: an Automated LLM-aided UVM Machine for Subsystem-level RTL Verification

    Junhao Ye +9

  24. cs.SE 2026-05-06 reviewed
    Training data flaws cause code defects in LLMs via 18 paths

    Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

    Kaifeng He +9

  25. cs.SE 2026-05-06 reviewed
    LLM evolution with runtime targets yields 15x Java speedup

    CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement

    Ajay Krishna Borra +11

  26. cs.CY 2026-05-06 reviewed
    Digital twin trust maps to four integration patterns across domains

    Trustworthiness in Digital Twin Systems: Systematic Review and Research Horizons

    Chi Fai David Lam +3

  27. cs.MA 2026-05-06 reviewed
    AI app builders oversimplify specs and skip secure backends

    SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies

    Siddhant Saxena +2

  28. cs.AI 2026-05-06 reviewed
    Screening cuts agent-repair leaderboard rank shifts by 62%

    AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair

    Yuelin Hu +4

  29. cs.SE 2026-05-06 reviewed
    Fine-tuned reranker improves code search across three tasks

    Beyond Retrieval: A Multitask Benchmark and Model for Code Search

    Siqiao Xue +6

  30. cs.SE 2026-05-06 reviewed
    Fine-tuned reranker lifts code search on all three tasks

    Beyond Retrieval: A Multitask Benchmark and Model for Code Search

    Siqiao Xue +6

  31. cs.SE 2026-05-06 reviewed
    AI coding tools shift responsibility for errors to users

    Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap

    Christoph Treude

  32. cs.SE 2026-05-06 reviewed
    Declarative YAML lets AI agents run any scientific workflow

    PARNESS: A Paper Harness for End-to-End Automated Scientific Research with Dynamic Workflows, Full-Text Indexing, and Cross-Run Knowledge Accumulation

    Yuchen Wang +1

  33. cs.SE 2026-05-06 reviewed
    Governance routines cut enterprise tech modernization effort by 30 percent

    EMRGF: A Practitioner Framework for Governance-Driven Enterprise Technology Modernization

    Harveen Punihani

  34. cs.AI 2026-05-06 reviewed
    Model benchmarks cannot confirm deployed AI alignment

    Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone

    Varad Vishwarupe +3

  35. cs.SE 2026-05-06 reviewed
    Automatic framework fixes failures in LLM reinforcement tuning

    Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning

    Lingzhe Zhang +8

  36. cs.AI 2026-05-05 reviewed
    Context hurts design exploration on some tasks by 46%

    When Context Hurts: The Crossover Effect of Knowledge Transfer on Multi-Agent Design Exploration

    Saranyan Vigraham

  37. cs.SE 2026-05-05 reviewed
    Benchmark and adapted tool generate Java reproduction tests from issues

    Reproduction Test Generation for Java SWE Issues

    Toufique Ahmed +3

  38. cs.SE 2026-05-05 reviewed
    Adapted tool generates reproduction tests for Java issues

    Reproduction Test Generation for Java SWE Issues

    Toufique Ahmed +3

  39. cs.CR 2026-05-05 reviewed
    Token n-grams and code metrics triage C vulnerabilities at PR-AUC 0.64

    Lightweight Vulnerability Detection from Code Metrics and Token Features

    Chun Yin Chiu

  40. cs.SE 2026-05-05 reviewed
    EngThrive pairs speed, ease and quality metrics with wellbeing guardrails

    EngThrive: Make It Fast and Easy to Do Great Work

    Brian Houck +3

  41. cs.CR 2026-05-05 reviewed
    Kumushi steers LLMs to root causes for deeper vulnerability fixes

    Root-Cause-Driven Automated Vulnerability Repair

    Hulin Wang +15

  42. cs.SE 2026-05-05 reviewed
    AI extracts PDF data to audit every transaction

    Automated Population-Level Audit Assurance via AI-Based Document Intelligence

    Santosh Vasudevan +1

  43. cs.HC 2026-05-05 reviewed
    50 testing tools converge on similar output formats

    Exploring the Output of Software Testing Tools through a Visual Comparative Analysis

    Brandon Lit +2

  44. cs.SE 2026-05-05 reviewed
    Agents negotiate stable software module decompositions

    A Multi-Agent Consensus Protocol for Stable Software Remodularization

    Ahmed F. Ibrahim

  45. cs.CR 2026-05-05 reviewed
    Embeddings link source code to decompiled binaries without names

    Identifier-Free Code Embedding Models for Scalable Search

    Eric Wolos +1

  46. cs.CY 2026-05-05 reviewed
    NeurIPS should require reproducible evidence for AI safety claims

    NeurIPS Should Require Reproducibility Standards for Frontier AI Safety Claims

    Varad Vishwarupe +3

  47. cs.SE 2026-05-05 reviewed
    RL agent raises Rust static analysis precision from 26% to 59%

    Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning

    P Akilesh +3

  48. cs.SE 2026-05-05 reviewed
    RL agent cuts false positives in Rust safety checks

    Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning

    P Akilesh +3

  49. cs.SE 2026-05-05 reviewed
    Requirements engineering divides into two isolated pathways

    Two Integration Pathways in Human-Centered Requirements Engineering: A Systematic Mapping Study of Structural Gaps

    Imen Benzarti +4

  50. cs.SE 2026-05-05 reviewed
    Brick-Circuit generator spans quantum states more uniformly at low depth

    Randomized and Diverse Input State Generation for Quantum Program Testing

    Maryse Ernzer +3