An empirical study of 1,004 bugs in template engine-based applications finds that abnormal rendering results are the most common symptom (48.61%) and documents 17 root causes, with fix patterns that often involve host-side logic changes.
Sánchez, Pedro Delgado-Pérez, Inmaculada Medina-Bulo, and Sergio Segura
Canonical reference. 100% of citing Pith papers cite this work as background.
roles: background 5
polarities: background 5
representative citing papers
Gleaner replaces slow graph-based trace analysis with bag-of-edges set operations plus log semantics and alarm-driven diversity to deliver faster, higher-fidelity sampling that improves RCA accuracy even at 1% rates.
AgenticFlict is a public dataset of 29K+ textual merge conflicts from AI agent PRs, collected via merge simulation on 107K processed PRs and showing a 27.67% conflict rate with variation across agents.
AI coding agents produce pull requests with substantially more commits and slightly higher description-to-diff similarity than human developers, based on analysis of 29,095 merged PRs.
JunoBench is the first benchmark of 111 reproducible crashes in Python ML Jupyter notebooks from Kaggle, with verified fixes and rich annotations for bug research.
LLMs propose volatile performance improvements on real-world Java tasks that lag human developers on average, showing algorithmic benchmarks overestimate capabilities.
Case study of 18,020 Kubernetes PRs shows label-diff congruence is prevalent and stable, with higher congruence linked to fewer review participants among core developers and more among one-time contributors.
MutDafny uses 40 mutation operators on 794 real-world Dafny programs to detect weak specifications, manually confirming five such cases at a rate of one per 241 lines.
Large-scale review mining of 1M+ comments from 171 Gen-AI apps using an LLM framework reveals top topics plus three opportunities and three challenges for developers.
Hidden dependencies and component variants in SBOMs cause inconsistent vulnerability reporting and VEX handling across scanners.
Large-scale analysis of AI bot PRs shows Copilot and Codex achieve the highest CI/CD success rates but more frequent AI contributions correlate with reduced workflow reliability.
Microbenchmarks on the JVM can produce misleading results due to unrealistic profiles collected during isolated execution despite following JMH guidelines.
StartFlow is a new structured method that helps startup teams without UX expertise produce clearer wireflow prototypes with fewer usability problems.
MNAL reduces human effort in bug report labeling by up to 95.8% for readability and 196% for identifiability while improving identification performance and working with various neural models.
citing papers explorer
Misleading Microbenchmarks on the Java Virtual Machines