Introduces EPC-AW to mitigate epistemic miscalibration in LLM multi-agent planning via consistency-based selection and refinement, reporting 9.75% average success improvement.
arXiv preprint arXiv:2409.13642
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles: background 1
polarities: background 1
representative citing papers
SieveFL combines vector retrieval and JaCoCo runtime pruning to cut LLM token use by 49% while achieving 41.8% Top-1 accuracy on 395 Defects4J bugs, outperforming AgentFL.
DEFault++ applies hierarchical learning with a Fault Propagation Graph to detect, localize, and diagnose faults in transformers, improving F1 to 0.826-0.909 and developer repair accuracy from 57.1% to 83.3% on a new benchmark of 5,556 mutation-tested runs.
GenLoc integrates semantic retrieval and LLM-based iterative code exploration to outperform prior IRBL and LLM methods on Java and Python bug localization benchmarks.
citing papers explorer
- When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems
Introduces EPC-AW to mitigate epistemic miscalibration in LLM multi-agent planning via consistency-based selection and refinement, reporting 9.75% average success improvement.
- SieveFL: Hierarchical Runtime-Aware Pruning for Scalable LLM-Based Fault Localization
SieveFL combines vector retrieval and JaCoCo runtime pruning to cut LLM token use by 49% while achieving 41.8% Top-1 accuracy on 395 Defects4J bugs, outperforming AgentFL.
- Hierarchical Fault Detection and Diagnosis for Transformer Architectures
DEFault++ applies hierarchical learning with a Fault Propagation Graph to detect, localize, and diagnose faults in transformers, improving F1 to 0.826-0.909 and developer repair accuracy from 57.1% to 83.3% on a new benchmark of 5,556 mutation-tested runs.