Entity-based knowledge conflicts in question answering
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: background 1
representative citing papers
Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.
citing papers explorer
- Trust or Abstain? A Self-Aware RAG Approach
  SABER combines self-prior with multi-trace PK and CK reasoning representations to estimate reliability beliefs and drive trust-or-abstain decisions in knowledge-conflict RAG, improving accuracy over baselines.
- When Less is More: The LLM Scaling Paradox in Context Compression
  Larger LLM compressors in lossy setups often yield less faithful context reconstructions due to knowledge overwriting and semantic drift, with mid-sized models outperforming larger ones across 27 tested configurations.
- ART: Automatic multi-step reasoning and tool-use for large language models
  ART automatically generates multi-step reasoning programs with tool integration for LLMs, yielding substantial gains over few-shot and auto-CoT prompting on BigBench and MMLU while matching hand-crafted CoT on most tasks.
- SingGuard: A Policy-Adaptive Multimodal LLM Guardrail with Dynamic Reasoning
  SingGuard introduces a policy-adaptive multimodal LLM guardrail with dynamic reasoning regimes and SingGuard-Bench, reporting SOTA F1 scores across 35 datasets and improved policy-following accuracy under runtime shifts.
- Multi-Perspective Evidence Synthesis and Reasoning for Unsupervised Multimodal Entity Linking
  MSR-MEL synthesizes instance-centric, group-level, lexical, and statistical evidence with LLMs and asymmetric teacher-student GNNs to outperform prior unsupervised methods on multimodal entity linking benchmarks.