LLMs achieve strong results on syntax parsing tasks but show limited and variable performance on dynamic reasoning, with a clear performance hierarchy across model scales.
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.SE (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
representative citing papers
NeuroFlake integrates discriminative token mining into LLMs to classify flaky tests, raising F1-score to 69.34% on FlakeBench while showing greater robustness to semantic-preserving perturbations than prior methods.
FlaXifyer applies few-shot learning on pre-trained language models to categorize intermittent CI job failures from logs at 84.3% Macro F1 and 92.0% Top-2 accuracy using 12 examples per category, with LogSift reducing log review effort by 74.4%.
citing papers explorer
- Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs
  LLMs achieve strong results on syntax parsing tasks but show limited and variable performance on dynamic reasoning, with a clear performance hierarchy across model scales.
- NeuroFlake: A Neuro-Symbolic LLM Framework for Flaky Test Classification
  NeuroFlake integrates discriminative token mining into LLMs to classify flaky tests, raising F1-score to 69.34% on FlakeBench while showing greater robustness to semantic-preserving perturbations than prior methods.
- Predicting Intermittent Job Failure Categories for Diagnosis Using Few-Shot Fine-Tuned Language Models
  FlaXifyer applies few-shot learning on pre-trained language models to categorize intermittent CI job failures from logs at 84.3% Macro F1 and 92.0% Top-2 accuracy using 12 examples per category, with LogSift reducing log review effort by 74.4%.