SABER combines self-prior with multi-trace PK and CK reasoning representations to estimate reliability beliefs and drive trust-or-abstain decisions in knowledge-conflict RAG, improving accuracy over baselines.
Entity-based knowledge conflicts in question answering
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Larger LLM compressors in lossy setups often yield less faithful context reconstructions due to knowledge overwriting and semantic drift, with mid-sized models outperforming larger ones across 27 tested configurations.
ART automatically generates multi-step reasoning programs with tool integration for LLMs, yielding substantial gains over few-shot and auto-CoT prompting on BigBench and MMLU while matching hand-crafted CoT on most tasks.
MSR-MEL synthesizes instance-centric, group-level, lexical, and statistical evidence with LLMs and asymmetric teacher-student GNNs to outperform prior unsupervised methods on multimodal entity linking benchmarks.
Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.
citing papers explorer
-
Trust or Abstain? A Self-Aware RAG Approach
SABER combines self-prior with multi-trace PK and CK reasoning representations to estimate reliability beliefs and drive trust-or-abstain decisions in knowledge-conflict RAG, improving accuracy over baselines.
-
When Less is More: The LLM Scaling Paradox in Context Compression
Larger LLM compressors in lossy setups often yield less faithful context reconstructions due to knowledge overwriting and semantic drift, with mid-sized models outperforming larger ones across 27 tested configurations.
-
ART: Automatic multi-step reasoning and tool-use for large language models
ART automatically generates multi-step reasoning programs with tool integration for LLMs, yielding substantial gains over few-shot and auto-CoT prompting on BigBench and MMLU while matching hand-crafted CoT on most tasks.
-
Multi-Perspective Evidence Synthesis and Reasoning for Unsupervised Multimodal Entity Linking
MSR-MEL synthesizes instance-centric, group-level, lexical, and statistical evidence with LLMs and asymmetric teacher-student GNNs to outperform prior unsupervised methods on multimodal entity linking benchmarks.
-
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.