Sánchez, Pedro Delgado-Pérez, Inmaculada Medina-Bulo, and Sergio Segura
Canonical reference. 100% of citing Pith papers cite this work as background.
Citation-role summary: background (5)
Citation-polarity summary: background (5)
Representative citing papers
- Understanding Bugs in Template Engine-Based Applications: Symptoms, Root Causes, and Fix Patterns
  An empirical study of 1,004 bugs in template engine-based applications finds abnormal rendering results as the most common symptom (48.61%) and documents 17 root causes with fix patterns that often involve host-side logic changes.
- Gleaner: A Semantically-Rich and Efficient Online Sampler for Microservice Diagnostics
  Gleaner replaces slow graph-based trace analysis with bag-of-edges set operations plus log semantics and alarm-driven diversity to deliver faster, higher-fidelity sampling that improves RCA accuracy even at 1% rates.
- AgenticFlict: A Large-Scale Dataset of Merge Conflicts in AI Coding Agent Pull Requests on GitHub
  AgenticFlict is a public dataset of 29K+ textual merge conflicts from AI agent PRs, collected via merge simulation on 107K processed PRs and showing a 27.67% conflict rate with variation across agents.
- How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests
  AI coding agents produce pull requests with substantially more commits and slightly higher description-to-diff similarity than human developers, based on analysis of 29,095 merged PRs.
- Classport: Designing Runtime Dependency Introspection for Java
  Classport adds dependency information to Java class files to enable runtime introspection of used dependencies, shown feasible on six real-world projects.
- JunoBench: A Benchmark Dataset of Crashes in Python Machine Learning Jupyter Notebooks
  JunoBench is the first benchmark of 111 reproducible crashes in Python ML Jupyter notebooks from Kaggle, with verified fixes and rich annotations for bug research.
- Do AI Models Dream of Faster Code? An Empirical Study on LLM-Proposed Performance Improvements in Real-World Software
  LLMs propose volatile performance improvements on real-world Java tasks that lag human developers on average, showing algorithmic benchmarks overestimate capabilities.
- Efficiency for Experts, Visibility for Newcomers: A Case Study of Label-Code Alignment in Kubernetes
  Case study of 18,020 Kubernetes PRs shows label-diff congruence is prevalent and stable, with higher congruence linked to fewer review participants among core developers and more among one-time contributors.
- MutDafny: A Mutation-Based Approach to Assess Dafny Specifications
  MutDafny uses 40 mutation operators on 794 real-world Dafny programs to detect weak specifications, manually confirming five such cases at a rate of one per 241 lines.
- Understanding the Challenges and Opportunities of Generative AI Apps: An Empirical Study
  Large-scale review mining of 1M+ comments from 171 Gen-AI apps using an LLM framework reveals top topics plus three opportunities and three challenges for developers.
- Hidden Dependencies and Component Variants in SBOM-Based Software Composition Analysis
  Hidden dependencies and component variants in SBOMs cause inconsistent vulnerability reporting and VEX handling across scanners.
- Reliability of AI Bots Footprints in GitHub Actions CI/CD Workflows
  Large-scale analysis of AI bot PRs shows Copilot and Codex achieve the highest CI/CD success rates but more frequent AI contributions correlate with reduced workflow reliability.
- Misleading Microbenchmarks on the Java Virtual Machines
  Microbenchmarks on the JVM can produce misleading results due to unrealistic profiles collected during isolated execution despite following JMH guidelines.
- StartFlow: From Method Conception to Multi-Perspective Evaluation in UX Prototyping for Software Startups
  StartFlow is a new structured method that helps startup teams without UX expertise produce clearer wireflow prototypes with fewer usability problems.
- Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning
  MNAL reduces human effort in bug report labeling by up to 95.8% for readability and 196% for identifiability while improving identification performance and working with various neural models.