Can LLMs produce faithful explanations for fact-checking? Towards faithful explainable fact-checking via multi-agent debate
6 Pith papers cite this work.
Citation roles: background (1)
Citation polarities: background (1)
Citing papers
-
ChartFI: Benchmarking Faithfulness and Insightfulness of Chart Descriptions from Multimodal Large Language Models
ChartFI-Bench supplies 896 chart-description pairs and four metrics (Faithfulness, Coverage, Informativeness, Acuity) to evaluate MLLM-generated chart descriptions on faithfulness and insightfulness.
-
Whose Story Gets Told? Positionality and Bias in LLM Summaries of Life Narratives
A proposed pipeline shows LLMs introduce detectable race and gender biases when summarizing life narratives, creating potential for representational harm in research.
-
RW-Post: Auditable Evidence-Grounded Multimodal Fact-Checking in the Wild
RW-Post is an auditable benchmark linking social media posts to evidence from human fact-check articles for evaluating multimodal AI fact-checking across different evidence regimes.
-
Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents
DebateCV uses opposing debater agents and a trained moderator agent to verify complex claims via debate, outperforming single-agent baselines in accuracy and justification quality.
-
RW-Post: Auditable Evidence-Grounded Multimodal Fact-Checking in the Wild
RW-Post is an auditable text-image benchmark for real-world multimodal fact-checking that links posts to evidence traces from human fact-check articles and includes the AgentFact baseline for evaluation.
-
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.