LLM-ReSum uses LLM self-evaluation in a closed feedback loop to refine summaries, improving factual accuracy by up to 33% and coverage by 39% with 89% human preference.
Meteor: An automatic metric for mt evalua- tion with improved correlation with human judgments
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Temporal conditioning in three LLM-based planner architectures for AV scene-to-plan reasoning yields no statistically significant gains on NLP correctness metrics but enables predictive hazard reasoning and stable corrections on BDD-X subsets.
citing papers explorer
-
LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation
LLM-ReSum uses LLM self-evaluation in a closed feedback loop to refine summaries, improving factual accuracy by up to 33% and coverage by 39% with 89% human preference.
-
From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning
Temporal conditioning in three LLM-based planner architectures for AV scene-to-plan reasoning yields no statistically significant gains on NLP correctness metrics but enables predictive hazard reasoning and stable corrections on BDD-X subsets.