month = sep, year =
9 Pith papers cite this work.
Citation summary
Verdicts: UNVERDICTED (9)
Roles: background (2)
Polarities: background (2)
Citing papers explorer
-
Single-Language Evidence Is Insufficient for Automated Logging: A Multilingual Benchmark and Empirical Study with LLMs
MultiLogBench shows that LLM performance on automated logging varies substantially across programming languages, demonstrating that single-language evidence is insufficient for general claims about model behavior or tool design.
-
Efficient Black-Box Fault Localization for System-Level Test Code Using Large Language Models
A black-box LLM approach for fault localization in system-level test code that estimates execution traces from failure logs to rank potential faults with reduced inference cost.
-
Similar Pattern Annotation via Retrieval Knowledge for LLM-Based Test Code Fault Localization
SPARK improves LLM-based test code fault localization by retrieving similar past faults and selectively annotating suspicious lines in new failing tests.
-
Knowledge-Graph-Driven Data Synthesis for Low-Resource Software Development: A HarmonyOS Case Study
APIKG4Syn synthesizes API-oriented training data via knowledge graphs and Monte Carlo search to fine-tune a 7B model that reaches 25% pass@1 on HarmonyOS code generation, beating untuned GPT-4o at 17.59%.
-
MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing
MR-Adopt deduces input transformations from hard-coded MR test cases using LLMs, data-flow refinement, and output-relation selection to enable reuse with new source inputs.
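To make the terminology in the MR-Adopt summary concrete, the following Python sketch contrasts a hard-coded metamorphic-relation (MR) test case, where the source and follow-up inputs are both written out literally, with a generalized version where an input transformation function derives the follow-up input from any source input. The sorting example and the names `transform_input`, `output_relation`, and `check_mr` are hypothetical illustrations chosen for this sketch; they are not MR-Adopt's algorithm or its output.

```python
# Hard-coded MR test case: both inputs are fixed literals, so the relation
# ("sorting a permutation gives the same result") is checked for one pair only.
def test_sort_permutation_hardcoded():
    source_input = [3, 1, 2]
    followup_input = [2, 3, 1]  # a permutation of source_input, written by hand
    assert sorted(source_input) == sorted(followup_input)


# Hypothetical input transformation: reversing a list always yields a
# permutation of it, so it can stand in for the hand-written follow-up input.
def transform_input(source_input):
    return list(reversed(source_input))


# Output relation for this MR: permuted inputs must produce equal sorted outputs.
def output_relation(source_output, followup_output):
    return source_output == followup_output


# Reusable check: derive the follow-up input from any new source input,
# run the program on both, and test the output relation.
def check_mr(program, source_input):
    followup_input = transform_input(source_input)
    return output_relation(program(source_input), program(followup_input))
```

With the transformation separated out, the same relation applies to fresh inputs: `check_mr(sorted, [5, 4, 9, 1])` holds, while a faulty "sort" that just copies its input, such as `lambda xs: list(xs)`, violates it on `[1, 2]`.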
-
MR-Scout: Automated Synthesis of Metamorphic Relations from Existing Test Cases
MR-Scout extracts over 11,000 metamorphic-relation-encoded test cases from 701 OSS projects, codifies 97% of them as high-quality generators, and shows they raise line coverage by 13.52% and mutation score by 9.42% on programs that already have developer tests.
-
How Do Developers Use Migration Guides? A Case Study of Log4j
Developers most frequently reference the full Log4j migration guide in pull request descriptions (82.81% of cases) and continue consulting it during post-update maintenance tasks.
-
From Paradigm Shift to Audit Rift: Empirical Analysis and Validation of Security Audit Methodologies for Asynchronous Smart Contract Systems
Empirical review of 233 real-world vulnerabilities from 34 TON audits produces a specialized checklist for asynchronous message handling, supported by case studies and an 11-person practitioner survey.
-
To Vibe Research or Not to Vibe Research? Generative AI in Qualitative Research
Generative AI suitability in qualitative research depends primarily on the approach (small-q positivist/post-positivist or Big Q non-positivist) along with skills, ethics, and personal preferences.