GraphBit is a DAG-based engine-orchestrated framework for agentic LLMs that achieves 67.6% accuracy with zero hallucinations on GAIA benchmarks.
Science China Information Sciences , volume=
7 Pith papers cite this work. Polarity classification is still indexing.
years
2026 7verdicts
UNVERDICTED 7representative citing papers
HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
Expanded recall in LLM agents erodes cooperative intent in multi-agent social dilemmas, observed in 18 of 28 model-game settings.
SafetyALFRED shows multimodal LLMs recognize kitchen hazards accurately in QA tests but achieve low success rates when required to mitigate those hazards through embodied planning.
SOCIA-EVO generates statistically consistent simulators by separating structural refinement from parameter calibration via bi-level optimization and falsifying strategies through execution feedback in a Bayesian-weighted playbook.
GraphMind equips LLM agents with graph awareness to construct human-like social networks, producing botnets that substantially degrade performance of both text-based and graph-based detectors.
Presents a new question-based evaluation framework for LLMs on aggregated social media text and reports that performance declines with input scale, task complexity, and numerical operations beyond 500 instances.
citing papers explorer
-
GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration
GraphBit is a DAG-based engine-orchestrated framework for agentic LLMs that achieves 67.6% accuracy with zero hallucinations on GAIA benchmarks.
-
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution
HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
-
The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents
Expanded recall in LLM agents erodes cooperative intent in multi-agent social dilemmas, observed in 18 of 28 model-game settings.
-
SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models
SafetyALFRED shows multimodal LLMs recognize kitchen hazards accurately in QA tests but achieve low success rates when required to mitigate those hazards through embodied planning.
-
SOCIA-EVO: Automated Simulator Construction via Dual-Anchored Bi-Level Optimization
SOCIA-EVO generates statistically consistent simulators by separating structural refinement from parameter calibration via bi-level optimization and falsifying strategies through execution feedback in a Bayesian-weighted playbook.
-
Beyond Individual Mimicry: Constructing Human-Like Social network with Graph-Augmented LLM Agents
GraphMind equips LLM agents with graph awareness to construct human-like social networks, producing botnets that substantially degrade performance of both text-based and graph-based detectors.
-
Text Analytics Evaluation Framework: A Case Study on LLMs and Social Media
Presents a new question-based evaluation framework for LLMs on aggregated social media text and reports that performance declines with input scale, task complexity, and numerical operations beyond 500 instances.